martins/ctdb.git
Ronnie Sahlberg [Wed, 17 Oct 2007 05:03:58 +0000 (15:03 +1000)]
use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
about this packet any more and just forget it ever saw it

Ronnie Sahlberg [Wed, 17 Oct 2007 03:42:42 +0000 (13:42 +1000)]
reverse the order in which public ips are listed so it matches the order
of the public_addresses file

Ronnie Sahlberg [Wed, 17 Oct 2007 00:10:52 +0000 (10:10 +1000)]
merge from tridge

Andrew Tridgell [Tue, 16 Oct 2007 10:14:04 +0000 (20:14 +1000)]
increase release number

Andrew Tridgell [Tue, 16 Oct 2007 10:13:28 +0000 (20:13 +1000)]
more detail on multipath config

Ronnie Sahlberg [Tue, 16 Oct 2007 05:27:07 +0000 (15:27 +1000)]
add back the test inside the daemon that, if someone asks us to drop
recovery mode back to NORMAL, we cannot lock the reclock file,
since at this stage it MUST be locked by the recovery daemon.

to prevent a non-blocking fcntl() lock from blocking and causing
"issues", we move the 'test that we cannot lock the reclock file' into a
child process.
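A minimal sketch of a child-process lock probe like the one described above, assuming POSIX fork(), fcntl() and alarm(). The function probe_reclock(), its result enum and the alarm()-based timeout are illustrative assumptions, not ctdb's actual code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for probing the reclock file. */
enum reclock_probe {
	RECLOCK_PROBE_ERROR = -1, /* fork/wait failed, or the child timed out */
	RECLOCK_HELD = 0,         /* child could not lock: someone holds it */
	RECLOCK_FREE = 1          /* child got the lock: nobody else holds it */
};

/*
 * Try a non-blocking write lock on fd from a forked child, so a stalled
 * fcntl() can only hang the child. fcntl() locks are not inherited across
 * fork(), so a lock held by the calling process also conflicts with the
 * child's attempt.
 */
enum reclock_probe probe_reclock(int fd, unsigned int timeout_secs)
{
	int status;
	pid_t pid = fork();

	if (pid == -1) {
		return RECLOCK_PROBE_ERROR;
	}
	if (pid == 0) {
		struct flock lock = {
			.l_type = F_WRLCK,
			.l_whence = SEEK_SET,
			.l_start = 0,
			.l_len = 1,
		};
		/* If fcntl() stalls despite F_SETLK, SIGALRM kills the child. */
		alarm(timeout_secs);
		_exit(fcntl(fd, F_SETLK, &lock) == 0 ? 1 : 0);
	}
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR) {
			return RECLOCK_PROBE_ERROR;
		}
	}
	if (!WIFEXITED(status)) {
		return RECLOCK_PROBE_ERROR; /* e.g. killed by SIGALRM */
	}
	return WEXITSTATUS(status) == 1 ? RECLOCK_FREE : RECLOCK_HELD;
}
```

Here the parent still waits synchronously in waitpid() for at most timeout_secs; an event-driven daemon would more likely watch the child asynchronously instead.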

Ronnie Sahlberg [Tue, 16 Oct 2007 02:15:02 +0000 (12:15 +1000)]
add a new tunable: DeterministicIPs, which makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 to /etc/sysconfig/ctdb

When this is set, the first entry in /etc/ctdb/public_addresses will
always be hosted by node 0 when that node is available, the second
entry by node 1, and so on.

This tunable can leave the allocation of addresses very
unbalanced and is intended only for debugging/testing.
Beware: this feature requires that /etc/ctdb/public_addresses is
identical on all nodes in the cluster.
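A sketch of the deterministic mapping described above. The function name, the NO_NODE sentinel, wrapping with % when there are more addresses than nodes, and the fallback to the next available node are all assumptions for illustration; the text only states that entry N prefers node N.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "nobody can host it" return value. */
#define NO_NODE (-1)

/*
 * Address number ip_index (its position in public_addresses, counting
 * from 0) prefers node ip_index % num_nodes. If that node is down,
 * try the following nodes in order (assumed fallback policy).
 */
int deterministic_node_for_ip(size_t ip_index, const bool *node_available,
			      size_t num_nodes)
{
	if (num_nodes == 0) {
		return NO_NODE;
	}
	size_t preferred = ip_index % num_nodes;
	for (size_t step = 0; step < num_nodes; step++) {
		size_t node = (preferred + step) % num_nodes;
		if (node_available[node]) {
			return (int)node;
		}
	}
	return NO_NODE;
}
```

Because the result depends only on each address's position in the file, every node computes the same answer, which is why the file has to be identical cluster-wide.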

Ronnie Sahlberg [Tue, 16 Oct 2007 01:29:33 +0000 (11:29 +1000)]
include system/network.h so we get the prototype for inet_aton()

Ronnie Sahlberg [Tue, 16 Oct 2007 01:26:22 +0000 (11:26 +1000)]
merge from tridge

Ronnie Sahlberg [Mon, 15 Oct 2007 23:50:31 +0000 (09:50 +1000)]
don't try to lock the file from inside the ctdb daemon.
even though we don't want a blocking lock, it does appear that the fcntl()
call can block for a while if gpfs is in the process of rebuilding
itself after a node arrives in or leaves the cluster

Andrew Tridgell [Mon, 15 Oct 2007 04:44:06 +0000 (14:44 +1000)]
only link to -lipq if needed

Andrew Tridgell [Mon, 15 Oct 2007 04:37:54 +0000 (14:37 +1000)]
improved handling of systems without libipq.h

Andrew Tridgell [Mon, 15 Oct 2007 04:29:47 +0000 (14:29 +1000)]
disable ipmux code until we have a configure test

Andrew Tridgell [Mon, 15 Oct 2007 04:28:51 +0000 (14:28 +1000)]
sync flags between nodes in monitor loop in recmaster

Andrew Tridgell [Mon, 15 Oct 2007 04:17:49 +0000 (14:17 +1000)]
merge from ronnie

Andrew Tridgell [Mon, 15 Oct 2007 03:31:09 +0000 (13:31 +1000)]
disable optimisation for now, until we track down an occasional segv

Andrew Tridgell [Mon, 15 Oct 2007 03:22:58 +0000 (13:22 +1000)]
add config option for disabling bans

Ronnie Sahlberg [Wed, 10 Oct 2007 21:51:57 +0000 (07:51 +1000)]
use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb

Ronnie Sahlberg [Wed, 10 Oct 2007 21:30:10 +0000 (07:30 +1000)]
use kill_tcp_connections() to kill off all tcp connections to the
"single public ip" address when we do a recovery

Ronnie Sahlberg [Wed, 10 Oct 2007 21:27:38 +0000 (07:27 +1000)]
move the kill_tcp_connections() function from 10.interfaces to functions

Ronnie Sahlberg [Wed, 10 Oct 2007 21:10:17 +0000 (07:10 +1000)]
first check that the recovery master is connected (we know this from our own
flags)

then pull the flags from the recovery master before checking if it is banned

Ronnie Sahlberg [Wed, 10 Oct 2007 20:16:36 +0000 (06:16 +1000)]
simplify election handling

make sure we read and update the flags from all remote nodes before we
reach the first codepath that can call do_recovery(),
since do_recovery() needs to know what the flags are.

Ronnie Sahlberg [Wed, 10 Oct 2007 00:49:55 +0000 (10:49 +1000)]
merge from tridge

Andrew Tridgell [Wed, 10 Oct 2007 00:45:22 +0000 (10:45 +1000)]
make sure reconnected nodes start off as unhealthy so they don't get a public IP

Ronnie Sahlberg [Tue, 9 Oct 2007 23:42:32 +0000 (09:42 +1000)]
add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip
address

update the killtcp control in the daemon so that if a socketpair that is to
be killed does not match a normal public address, it checks whether the
destination address matches the single public ip address and, if so, uses
that vnn structure from the ctdb context

this allows killtcp to also kill connections to the single public ip,
not only connections to normal public addresses
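A sketch of the lookup order described above: normal public addresses first, then the single public ip. struct vnn_sketch and find_vnn_for_killtcp() are hypothetical stand-ins; ctdb's real vnn structure carries much more state.

```c
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a vnn structure. */
struct vnn_sketch {
	struct in_addr public_address;
};

/*
 * Return the vnn owning dst: search the normal public addresses first,
 * then fall back to the single public ip vnn (if one is configured).
 */
const struct vnn_sketch *find_vnn_for_killtcp(const struct vnn_sketch *vnns,
					      size_t num_vnns,
					      const struct vnn_sketch *single_ip_vnn,
					      struct in_addr dst)
{
	for (size_t i = 0; i < num_vnns; i++) {
		if (vnns[i].public_address.s_addr == dst.s_addr) {
			return &vnns[i];
		}
	}
	if (single_ip_vnn != NULL &&
	    single_ip_vnn->public_address.s_addr == dst.s_addr) {
		return single_ip_vnn;
	}
	return NULL; /* not one of our addresses: nothing to kill */
}
```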

Ronnie Sahlberg [Tue, 9 Oct 2007 03:45:42 +0000 (13:45 +1000)]
remove some debug outputs

Ronnie Sahlberg [Tue, 9 Oct 2007 02:00:12 +0000 (12:00 +1000)]
send out gratuitous arps when we start serving the "single
public ip" but before we start the ipmux tool

Ronnie Sahlberg [Tue, 9 Oct 2007 01:56:09 +0000 (11:56 +1000)]
add a control to send gratuitous arps from the ctdb daemon
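To show what a gratuitous ARP contains, the sketch below fills in the 28-byte Ethernet/IPv4 ARP payload (RFC 826 field layout) for an announcement whose sender and target protocol addresses are both the address being announced. build_gratuitous_arp() is a hypothetical name; using the request opcode is an assumption (gratuitous ARPs are also sent as replies), and the Ethernet header and raw-socket send are omitted.

```c
#include <stdint.h>
#include <string.h>

/* Ethernet/IPv4 ARP payload size, filled byte by byte to avoid padding. */
#define ARP_PAYLOAD_LEN 28

/* ip is 4 bytes in network byte order; mac is the sender's 6-byte MAC. */
void build_gratuitous_arp(uint8_t buf[ARP_PAYLOAD_LEN], const uint8_t mac[6],
			  const uint8_t ip[4])
{
	buf[0] = 0x00; buf[1] = 0x01; /* hardware type: Ethernet */
	buf[2] = 0x08; buf[3] = 0x00; /* protocol type: IPv4 */
	buf[4] = 6;                   /* hardware address length */
	buf[5] = 4;                   /* protocol address length */
	buf[6] = 0x00; buf[7] = 0x01; /* opcode: request (assumed) */
	memcpy(&buf[8], mac, 6);      /* sender hardware address */
	memcpy(&buf[14], ip, 4);      /* sender protocol address */
	memset(&buf[18], 0, 6);       /* target hardware address: unknown */
	memcpy(&buf[24], ip, 4);      /* target protocol address = sender */
}
```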

Ronnie Sahlberg [Mon, 8 Oct 2007 04:05:22 +0000 (14:05 +1000)]
add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.

this ip address is attached to lo on all nodes, but only the recmaster
will respond to arp requests for this address.
the recmaster then runs an ipmux process that passes any incoming
packets for this ip address on to the other nodes in the cluster, based on
the ip address of the client host

to use this feature one must
1. have one fixed ip address in the customer's network permanently
   attached to an interface
2. set CTDB_PUBLIC_INTERFACE=
   to specify on which interface the clients attach to the node
3. set CTDB_SINGLE_PUBLIC_IP=ip-address
   to specify which ip address should be the "single public ip address"

to test with only one single client, attach several ip addresses to
the client and ping the public address from the client with different -I
options. look in a network trace to see which node the packet is
passed on to.
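The text says only that ipmux picks the destination node "based on the ip address of the client host". One simple way to do that is sketched below; the modulo mapping and the function name ipmux_node_for_client() are assumptions, not ipmux's actual algorithm.

```c
#include <netinet/in.h>

/*
 * Pick the node for a packet sent to the single public ip, keyed only
 * on the client's IPv4 address so every packet from one client lands
 * on the same node. Returns -1 if there are no nodes.
 */
int ipmux_node_for_client(struct in_addr client, unsigned int num_nodes)
{
	if (num_nodes == 0) {
		return -1;
	}
	return (int)(ntohl(client.s_addr) % num_nodes);
}
```

This is also why the test procedure above uses several client addresses: with a per-client mapping, a single source address would always reach the same node.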

Ronnie Sahlberg [Sun, 7 Oct 2007 23:47:20 +0000 (09:47 +1000)]
add a function in the ctdb tool to determine whether the local node is
the recmaster or not.

return 0 if the node is the recmaster and 1 (true) if it is not or if
we could not communicate with the ctdb daemon.

call it 'isnotrecmaster' to cope with the fact that if the tool cannot bind to
the socket to talk to the daemon, the tool will automatically return an
error and exit code 1.
thus the tool will only return 0 if it could talk successfully to the
local daemon and the local daemon confirms this node is the recmaster
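The exit-status rule above can be sketched as a small function. isnotrecmaster_exit_status() and its parameters are hypothetical; the point is that the "could not talk to the daemon" case and the "not the recmaster" case share status 1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 0 only when we reached the local daemon AND it reports this node
 * as the recmaster; any failure to talk to the daemon also yields 1.
 */
int isnotrecmaster_exit_status(bool talked_to_daemon, uint32_t my_pnn,
			       uint32_t recmaster_pnn)
{
	if (!talked_to_daemon) {
		return 1;
	}
	return my_pnn == recmaster_pnn ? 0 : 1;
}
```

A caller can therefore treat status 0 as the only safe signal to act as the recmaster.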

Ronnie Sahlberg [Fri, 5 Oct 2007 22:11:24 +0000 (08:11 +1000)]
merge from tridge

Andrew Tridgell [Fri, 5 Oct 2007 03:51:31 +0000 (13:51 +1000)]
fixed several places where we set the recovery culprit incorrectly

Andrew Tridgell [Fri, 5 Oct 2007 03:28:21 +0000 (13:28 +1000)]
- catch ESTALE in the recovery lock by trying a read()
- prioritise nodes that are unbanned and healthy in the election
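Errors such as ESTALE are only reported when an operation is performed on the descriptor, so one way to notice a stale recovery lock is a tiny read. The sketch below assumes POSIX pread(); using pread() rather than read() (to leave the file offset untouched) and the name reclock_fd_is_stale() are our choices, not necessarily ctdb's.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* True if a one-byte read of the lock file fails with ESTALE. */
bool reclock_fd_is_stale(int fd)
{
	char byte;

	if (pread(fd, &byte, 1, 0) == -1 && errno == ESTALE) {
		return true;
	}
	return false; /* readable, empty, or failed for another reason */
}
```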

Andrew Tridgell [Fri, 5 Oct 2007 02:01:40 +0000 (12:01 +1000)]
we are the culprit if we can't get the reclock

Ronnie Sahlberg [Wed, 26 Sep 2007 04:25:32 +0000 (14:25 +1000)]
change async.private to async.private_data since private is a reserved
word in c++

Ronnie Sahlberg [Tue, 25 Sep 2007 01:43:42 +0000 (11:43 +1000)]
merge from tridge

Andrew Tridgell [Mon, 24 Sep 2007 05:27:01 +0000 (15:27 +1000)]
upped version number

Andrew Tridgell [Mon, 24 Sep 2007 03:52:35 +0000 (13:52 +1000)]
merge from ronnie

Ronnie Sahlberg [Mon, 24 Sep 2007 00:52:26 +0000 (10:52 +1000)]
when we have a public ip address mismatch (i.e. we hold addresses we
shouldn't, or we are not holding addresses we should)
we must first freeze the local node before we set the recovery mode

Ronnie Sahlberg [Mon, 24 Sep 2007 00:27:48 +0000 (10:27 +1000)]
merge from tridge

Andrew Tridgell [Mon, 24 Sep 2007 00:19:07 +0000 (10:19 +1000)]
fixed a fd leak on the recovery lock

Andrew Tridgell [Mon, 24 Sep 2007 00:12:18 +0000 (10:12 +1000)]
run monitoring more quickly when unhealthy and at startup

Andrew Tridgell [Mon, 24 Sep 2007 00:00:14 +0000 (10:00 +1000)]
no longer wait at startup for services to become available; instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, which refused to start because secrets.tdb was not populated,
but we could not populate it through ctdbd, because the net command would not run while ctdbd was still starting up
and thus frozen

Andrew Tridgell [Sun, 23 Sep 2007 23:57:14 +0000 (09:57 +1000)]
fixed a valgrind error, and some warnings

Andrew Tridgell [Fri, 21 Sep 2007 06:12:04 +0000 (16:12 +1000)]
make the persistent dbdir configurable

Ronnie Sahlberg [Fri, 21 Sep 2007 05:45:48 +0000 (15:45 +1000)]
merge from tridge

Andrew Tridgell [Fri, 21 Sep 2007 05:44:13 +0000 (15:44 +1000)]
avoid using connected nodes that aren't in the vnn map yet

Andrew Tridgell [Fri, 21 Sep 2007 05:32:11 +0000 (15:32 +1000)]
merge bugfix from ronnie

Ronnie Sahlberg [Fri, 21 Sep 2007 05:19:33 +0000 (15:19 +1000)]
in ctdb_control_persistent_store() we must talloc_steal() the pointer to
c to prevent it from being immediately freed (and our persistent store
state with it) if we need to wait asynchronously for other nodes before
we can reply back to the client

Andrew Tridgell [Fri, 21 Sep 2007 04:47:32 +0000 (14:47 +1000)]
merge from ronnie

Ronnie Sahlberg [Fri, 21 Sep 2007 03:47:40 +0000 (13:47 +1000)]
when ctdb attaches to a database, it broadcasts the attach to all other
nodes so that the db is created on them as well

when we send this broadcast, we must use the correct control and not
assume all databases created are of the temporary kind

Ronnie Sahlberg [Fri, 21 Sep 2007 03:20:29 +0000 (13:20 +1000)]
merge from tridge

Andrew Tridgell [Fri, 21 Sep 2007 02:24:02 +0000 (12:24 +1000)]
added support for persistent databases in ctdbd

Ronnie Sahlberg [Wed, 19 Sep 2007 01:54:45 +0000 (11:54 +1000)]
merge from tridge

Ronnie Sahlberg [Wed, 19 Sep 2007 01:53:48 +0000 (11:53 +1000)]
one more command to run to enable winbind for vsftpd

Andrew Tridgell [Wed, 19 Sep 2007 01:46:37 +0000 (11:46 +1000)]
make sure we set close on exec on any possibly inherited fds
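A minimal sketch of setting close-on-exec with POSIX fcntl(). The helper names and the idea of sweeping a fixed descriptor range are illustrative assumptions, not taken from ctdb.

```c
#include <fcntl.h>

/* Set FD_CLOEXEC on fd, keeping other descriptor flags.
 * Returns 0 on success, -1 (errno set by fcntl) on failure. */
int set_close_on_exec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1) {
		return -1;
	}
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Mark every descriptor in [lowfd, highfd]; unopened fds fail with
 * EBADF and are simply skipped. */
void set_close_on_exec_range(int lowfd, int highfd)
{
	for (int fd = lowfd; fd <= highfd; fd++) {
		(void)set_close_on_exec(fd);
	}
}
```

With FD_CLOEXEC set, descriptors the daemon inherited or opened are closed automatically when a child process calls exec(), so event scripts and helpers do not keep them open.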

Andrew Tridgell [Wed, 19 Sep 2007 01:46:11 +0000 (11:46 +1000)]
separate out the various fs display ops

Andrew Tridgell [Mon, 17 Sep 2007 05:31:33 +0000 (15:31 +1000)]
expanded ctdb_diagnostics a bit

Ronnie Sahlberg [Mon, 17 Sep 2007 03:01:16 +0000 (13:01 +1000)]
add documentation of additional requirements for FTP so that users can
log in and access files using the AD username/password

Ronnie Sahlberg [Sun, 16 Sep 2007 21:43:15 +0000 (07:43 +1000)]
merge from tridge

Andrew Tridgell [Fri, 14 Sep 2007 09:27:11 +0000 (19:27 +1000)]
increase release number

Andrew Tridgell [Fri, 14 Sep 2007 05:23:23 +0000 (15:23 +1000)]
merge from ronnie

Ronnie Sahlberg [Fri, 14 Sep 2007 05:19:44 +0000 (15:19 +1000)]
let ctdb ip only print the ip addresses known to the specified node
and not the entire cluster

Ronnie Sahlberg [Fri, 14 Sep 2007 04:24:53 +0000 (14:24 +1000)]
update vnn -> pnn in documentation

Ronnie Sahlberg [Fri, 14 Sep 2007 04:19:12 +0000 (14:19 +1000)]
documentation updates

it is --event-script-dir, not --event-script

add explanation of the public_addresses file

Andrew Tridgell [Fri, 14 Sep 2007 04:14:03 +0000 (14:14 +1000)]
cope with non-standard install dirs in event scripts

Ronnie Sahlberg [Fri, 14 Sep 2007 02:18:34 +0000 (12:18 +1000)]
merge from tridge

Andrew Tridgell [Fri, 14 Sep 2007 01:59:04 +0000 (11:59 +1000)]
fix pkill args

Andrew Tridgell [Fri, 14 Sep 2007 01:56:40 +0000 (11:56 +1000)]
make sure all public IPs are removed at startup

Ronnie Sahlberg [Fri, 14 Sep 2007 00:37:10 +0000 (10:37 +1000)]
during startup make sure to delete any public addresses from any
interface

Ronnie Sahlberg [Fri, 14 Sep 2007 00:16:36 +0000 (10:16 +1000)]
let each node verify that it has a correct assignment of public ip
addresses (i.e. it holds those it should hold and doesn't hold
any of those it shouldn't hold)

if an inconsistency is found, mark the local node as recovery mode
active and wait for the recovery master to trigger a full-blown recovery
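The two-sided check above (hold what you should, and nothing you shouldn't) can be sketched as follows. struct ip_check and ip_assignment_consistent() are hypothetical; how ctdb actually gathers the assigned and locally configured addresses is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-address view used for the check. */
struct ip_check {
	int assigned_pnn;  /* node the address is assigned to */
	bool held_locally; /* configured on one of our interfaces? */
};

/*
 * True if this node holds exactly the addresses assigned to it.
 * A false result is the case where the node would set recovery
 * mode to active and wait for the recovery master.
 */
bool ip_assignment_consistent(const struct ip_check *ips, size_t n, int my_pnn)
{
	for (size_t i = 0; i < n; i++) {
		bool should_hold = (ips[i].assigned_pnn == my_pnn);

		if (should_hold != ips[i].held_locally) {
			return false; /* missing address or extra address */
		}
	}
	return true;
}
```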

Andrew Tridgell [Thu, 13 Sep 2007 23:49:12 +0000 (09:49 +1000)]
- merge from ronnie
- add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring

Andrew Tridgell [Thu, 13 Sep 2007 23:25:11 +0000 (09:25 +1000)]
wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done

Andrew Tridgell [Thu, 13 Sep 2007 23:24:34 +0000 (09:24 +1000)]
nicer use of testparm

Ronnie Sahlberg [Thu, 13 Sep 2007 22:56:27 +0000 (08:56 +1000)]
update the section about event scripts

Ronnie Sahlberg [Thu, 13 Sep 2007 22:15:24 +0000 (08:15 +1000)]
disable nfsv4 in etc/sysconfig/nfs

Ronnie Sahlberg [Thu, 13 Sep 2007 04:51:37 +0000 (14:51 +1000)]
when a ctdb_takeover_run has failed, we must make sure that
need_takeover_run is set to true, or else we might forget to rerun it
during the next recovery

otherwise, need_takeover_run is only set to true IFF the node flags for
a remote node and the local node differ.
It is possible that a takeover run fails, leaving the reassignment of
ip addresses incomplete, and that before we get back to the test in
monitor_cluster() the node flags of all nodes have converged and
match each other again. monitor_cluster() would then fail to realize
that a takeover run is needed.
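A sketch of the "sticky" flag described above: the flag is cleared only when a takeover run succeeds, so converged node flags cannot hide an earlier failure. struct takeover_state and both function names are hypothetical simplifications of the recovery daemon's state.

```c
#include <stdbool.h>

/* Hypothetical slice of the recovery daemon's state. */
struct takeover_state {
	bool need_takeover_run;
};

/* A run is needed if the flags differ now, or if an earlier run failed. */
bool takeover_needed(const struct takeover_state *st, bool flags_differ)
{
	return flags_differ || st->need_takeover_run;
}

/* Only success clears the flag; a failure keeps it set for the next pass. */
void record_takeover_result(struct takeover_state *st, bool succeeded)
{
	st->need_takeover_run = !succeeded;
}
```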

Andrew Tridgell [Thu, 13 Sep 2007 04:36:23 +0000 (14:36 +1000)]
ensure smbd and winbindd do die in 50.samba

Ronnie Sahlberg [Thu, 13 Sep 2007 04:28:18 +0000 (14:28 +1000)]
merge from tridge

Andrew Tridgell [Thu, 13 Sep 2007 04:08:18 +0000 (14:08 +1000)]
prevent recursion in the calling of ctdb_takeover_run
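The commit does not say how the recursion is prevented; a common approach is an "in progress" flag that refuses re-entry, sketched below. struct takeover_guard, guarded_takeover_run() and the demo callback are hypothetical and only illustrate the technique.

```c
#include <stdbool.h>

/* Hypothetical guard; a real daemon would keep this in its context. */
struct takeover_guard {
	bool in_progress;
};

/* Run fn(arg) unless a run is already in progress; -1 means refused. */
int guarded_takeover_run(struct takeover_guard *g, int (*fn)(void *), void *arg)
{
	int ret;

	if (g->in_progress) {
		return -1; /* re-entered from inside a run: refuse */
	}
	g->in_progress = true;
	ret = fn(arg);
	g->in_progress = false;
	return ret;
}

/* Demo callback that tries to start a nested run, as a re-entrant event might. */
struct demo_ctx {
	struct takeover_guard *guard;
	int calls;
	int nested_ret;
};

int demo_run(void *arg)
{
	struct demo_ctx *ctx = arg;

	ctx->calls++;
	ctx->nested_ret = guarded_takeover_run(ctx->guard, demo_run, ctx);
	return 0;
}
```

Running demo_run() through the guard executes it once; its nested attempt is refused with -1 instead of recursing.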

Andrew Tridgell [Thu, 13 Sep 2007 01:57:42 +0000 (11:57 +1000)]
more shell scripting fixes in 10.interface

Andrew Tridgell [Thu, 13 Sep 2007 01:19:49 +0000 (11:19 +1000)]
force recovery if unable to tell a node to release an IP

Andrew Tridgell [Thu, 13 Sep 2007 01:19:30 +0000 (11:19 +1000)]
fixed script errors in 10.interface

Andrew Tridgell [Thu, 13 Sep 2007 00:45:06 +0000 (10:45 +1000)]
we don't need the is_loopback logic in ctdb any more

Andrew Tridgell [Thu, 13 Sep 2007 00:39:05 +0000 (10:39 +1000)]
remove more cruft from the logs

Andrew Tridgell [Thu, 13 Sep 2007 00:24:48 +0000 (10:24 +1000)]
new approach for killing TCP connections on IP release

Andrew Tridgell [Thu, 13 Sep 2007 00:03:18 +0000 (10:03 +1000)]
remove clutter from ctdb log file

Andrew Tridgell [Thu, 13 Sep 2007 00:02:56 +0000 (10:02 +1000)]
fixed return code

Andrew Tridgell [Wed, 12 Sep 2007 03:26:24 +0000 (13:26 +1000)]
handle hung or slow ctdb daemons on shutdown

Andrew Tridgell [Wed, 12 Sep 2007 03:23:36 +0000 (13:23 +1000)]
- set arp_ignore to prevent replying to arp requests for addresses on loopback
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call

Andrew Tridgell [Wed, 12 Sep 2007 03:22:31 +0000 (13:22 +1000)]
- don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier

Andrew Tridgell [Wed, 12 Sep 2007 03:21:19 +0000 (13:21 +1000)]
changed some debug levels

Ronnie Sahlberg [Tue, 11 Sep 2007 21:28:24 +0000 (07:28 +1000)]
use the public addresses variable instead of hardcoding the path

Ronnie Sahlberg [Tue, 11 Sep 2007 21:26:30 +0000 (07:26 +1000)]
move all ip addresses onto loopback when we startup ctdb

Andrew Tridgell [Tue, 11 Sep 2007 06:38:32 +0000 (16:38 +1000)]
fixed location of arp_filter

Andrew Tridgell [Mon, 10 Sep 2007 10:45:27 +0000 (20:45 +1000)]
get interface right

Ronnie Sahlberg [Mon, 10 Sep 2007 06:34:11 +0000 (16:34 +1000)]
grab the interface name from tok and not from the uninitialized array

Ronnie Sahlberg [Mon, 10 Sep 2007 06:23:06 +0000 (16:23 +1000)]
merged patch from tridge

Andrew Tridgell [Mon, 10 Sep 2007 05:16:17 +0000 (15:16 +1000)]
fixed a pointer cast warning

Andrew Tridgell [Mon, 10 Sep 2007 05:09:28 +0000 (15:09 +1000)]
added back --public-interface to startup script