Ronnie Sahlberg [Thu, 22 Nov 2007 22:51:41 +0000 (09:51 +1100)]
merge from tridge
(This used to be ctdb commit
f9e3531747d293711016bce99f08f42366c9d85b)
Andrew Tridgell [Sun, 18 Nov 2007 04:15:19 +0000 (15:15 +1100)]
increase release number
(This used to be ctdb commit
1178fcce1a701441d43d4ed959f0ba6b50a5b07d)
Andrew Tridgell [Sun, 18 Nov 2007 04:14:54 +0000 (15:14 +1100)]
- merge from ronnie
- auto-detect CTDB_MANAGES_WINBIND from smb.conf if not set
(This used to be ctdb commit
3d675c7bcedbd483c923df54d1af068758edc206)
Andrew Tridgell [Sun, 18 Nov 2007 04:01:26 +0000 (15:01 +1100)]
need public_addresses for test suite
(This used to be ctdb commit
6d79994eace4802ab72dda2793028264c47d2d56)
Ronnie Sahlberg [Fri, 16 Nov 2007 02:37:27 +0000 (13:37 +1100)]
from Christian A
when monitoring that all nfs shares are available, allow both ' ' and
'\t' characters to separate the exported directory from the options
in /etc/exports
(This used to be ctdb commit
ac6cfe9de0acdcf9461068684fa890504454aae4)
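A minimal sketch of the separator handling described in the /etc/exports
change above. It assumes the monitor only needs the directory field of each
line; the function name exported_directories is hypothetical, and this is not
the actual event script.

```python
import re


def exported_directories(exports_text):
    """Return the directory field of every export line in the given text.

    The directory is taken to be everything before the first run of spaces
    or tabs. Blank lines and comment lines are skipped.
    """
    directories = []
    for line in exports_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Accept both ' ' and '\t' between the directory and its options.
        directories.append(re.split(r"[ \t]+", line, maxsplit=1)[0])
    return directories
```

Quoted paths containing spaces are not handled by this sketch.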
Ronnie Sahlberg [Wed, 14 Nov 2007 19:56:02 +0000 (06:56 +1100)]
only check port 21 when monitoring vsftpd
(This used to be ctdb commit
41b0d71aaee186138eddc97d49503841fa26f234)
Ronnie Sahlberg [Wed, 14 Nov 2007 05:17:52 +0000 (16:17 +1100)]
add CTDB_MANAGES_WINBIND to /etc/sysconfig/ctdb to allow ctdb to be used
in environments where samba is used without winbind
(This used to be ctdb commit
1ae5af14f90fd81a20b14c02c0c5ad355a609134)
Ronnie Sahlberg [Tue, 13 Nov 2007 02:27:00 +0000 (13:27 +1100)]
merge from tridge
(This used to be ctdb commit
8e5d488ae2f7cae2a7a6386ed85a3d26f7d39261)
Andrew Tridgell [Mon, 12 Nov 2007 23:28:06 +0000 (10:28 +1100)]
make it easier to test starting large numbers of virtual nodes
(This used to be ctdb commit
cf61bf8b8806d29772985c904d5ee15c24d4d767)
Andrew Tridgell [Mon, 12 Nov 2007 23:27:44 +0000 (10:27 +1100)]
make election handling much more scalable
(This used to be ctdb commit
05938d462b92bd9ecb8e35f53651bded47c48675)
Ronnie Sahlberg [Mon, 12 Nov 2007 20:38:58 +0000 (07:38 +1100)]
merge from tridge
(This used to be ctdb commit
f2c8ef106e41c38f73adfc196b6cc328174fbd58)
Andrew Tridgell [Mon, 12 Nov 2007 02:10:15 +0000 (13:10 +1100)]
don't do the first startup event until we are out of recovery
(This used to be ctdb commit
689940eb6e23f16ee063331caf3986613a8963ea)
Ronnie Sahlberg [Mon, 12 Nov 2007 01:28:20 +0000 (12:28 +1100)]
merge from tridge
(This used to be ctdb commit
6ccf7a2545c57545111a6236c9b4b493b8464060)
Andrew Tridgell [Sun, 11 Nov 2007 23:53:11 +0000 (10:53 +1100)]
prevent a deadly embrace between smbd and ctdbd by moving the calling
of the startup event scripts after the point where recovery has
started and the node is in normal operation
This makes the 'startup' script just a special type of the 'monitor'
script which is called first
(This used to be ctdb commit
7424c30a5fd04aea0137c466b4318c3f185280d8)
Ronnie Sahlberg [Sun, 11 Nov 2007 23:23:35 +0000 (10:23 +1100)]
revert 773
(This used to be ctdb commit
5a1c8f458ddc9b0ff532afda6007e32db10a71c8)
Ronnie Sahlberg [Mon, 5 Nov 2007 02:36:11 +0000 (13:36 +1100)]
add a new tunable "CheckNodesFile" that when set to 0 will disable the
check in the recovery daemon that all nodes are using the same
/etc/ctdb/nodes file.
Also add some more missing checks that the pnn used is a valid pnn
before using it to dereference the ctdb->nodes array
This is useful since it allows us to add more physical nodes to an
existing cluster without having to bring down the entire cluster.
The procedure to add an additional node to an existing cluster would
then be
1, on all nodes set CheckNodesFile=0 using 'ctdb setvar'
2, on all nodes add CTDB_SET_CheckNodesFile=0 to /etc/sysconfig/ctdb
For each node, one at a time :
3, use 'ctdb disable' to stop the hosted services
4, service ctdb stop
5, service ctdb start
Once all nodes have been restarted
6, on all nodes remove CTDB_SET_CheckNodesFile=0 from
/etc/sysconfig/ctdb
7, on all nodes set CheckNodesFile=1 using 'ctdb setvar'
8, configure and start up the new node
During this procedure, only one node at a time is brought
down/restarted, and only for a short period.
(This used to be ctdb commit
462501a32143e943ce350bd904a47c0955414a51)
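The ordering of the add-a-node procedure above can be sketched as a plan
generator. This is only an illustration: add_node_plan is a hypothetical
helper, the 'ctdb setvar CheckNodesFile <value>' argument form is an
assumption, and the last setvar step is assumed to switch the check back on.
Nothing is executed; the function only returns the steps in order.

```python
def add_node_plan(existing_nodes, new_node):
    """Return ordered (node, action) pairs for adding new_node to a cluster."""
    plan = []
    # Steps 1 and 2: disable the nodes-file check now and across restarts.
    for node in existing_nodes:
        plan.append((node, "ctdb setvar CheckNodesFile 0"))
    for node in existing_nodes:
        plan.append((node, "add CTDB_SET_CheckNodesFile=0 to /etc/sysconfig/ctdb"))
    # Steps 3 to 5: restart the nodes strictly one at a time.
    for node in existing_nodes:
        plan.append((node, "ctdb disable"))
        plan.append((node, "service ctdb stop"))
        plan.append((node, "service ctdb start"))
    # Steps 6 and 7: restore the check (assumed to mean CheckNodesFile=1).
    for node in existing_nodes:
        plan.append((node, "remove CTDB_SET_CheckNodesFile=0 from /etc/sysconfig/ctdb"))
    for node in existing_nodes:
        plan.append((node, "ctdb setvar CheckNodesFile 1"))
    # Step 8: bring up the new node last.
    plan.append((new_node, "configure and start ctdb"))
    return plan
```

Because each node's stop/start pair is emitted before the next node is
disabled, at most one existing node is down at any point in the plan.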
Andrew Tridgell [Fri, 2 Nov 2007 02:20:29 +0000 (13:20 +1100)]
patch from michael adam
(This used to be ctdb commit
a7a3bef90f033bab5cb110a6ef77a8bef48f2588)
Ronnie Sahlberg [Wed, 31 Oct 2007 22:00:14 +0000 (09:00 +1100)]
merge from tridge
(This used to be ctdb commit
10302eeecc36c4ce94a4e2e0e57864be790325da)
Andrew Tridgell [Mon, 29 Oct 2007 23:19:43 +0000 (10:19 +1100)]
increase release number
(This used to be ctdb commit
dc648b1bb6becc52dcf900add97418a5634367eb)
Andrew Tridgell [Mon, 29 Oct 2007 23:18:52 +0000 (10:18 +1100)]
added bonding info to ctdb_diagnostics
(This used to be ctdb commit
71b5fc434bc5d88eb0669ee29aa932ba12737e07)
Andrew Tridgell [Mon, 29 Oct 2007 02:43:12 +0000 (13:43 +1100)]
merge from ronnie
(This used to be ctdb commit
22b110549ff35f2560043abd5d85bed4b35295ee)
root [Mon, 29 Oct 2007 01:34:45 +0000 (12:34 +1100)]
the while loop in the startup event runs as a subshell so we need an extra || exit 1 at the end
to propagate the error code back to the caller of the script
(This used to be ctdb commit
c30d5c328784059949f5e82a07008e9632234f20)
Ronnie Sahlberg [Sun, 28 Oct 2007 23:51:16 +0000 (10:51 +1100)]
if bond* interfaces are used as public interfaces we can not rely on ethtool but
have to check /proc for the status instead
(This used to be ctdb commit
4ed7747267aea265b7a71c651abf6d5db4f4718b)
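A rough sketch of the /proc based check for bond* interfaces mentioned above.
It assumes the status is read from the bonding driver's
/proc/net/bonding/<interface> file and that the first "MII Status:" line
describes the bond itself; bond_is_up is a hypothetical name and the real
event script may test something different.

```python
def bond_is_up(bonding_status_text):
    """Return True if the first 'MII Status' line in the text reports 'up'."""
    for line in bonding_status_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "MII Status":
            # The first match is taken as the state of the bond as a whole;
            # later matches describe the individual slave interfaces.
            return value.strip() == "up"
    return False
```

A caller would pass in the contents of the file for the interface being
monitored, for example the text read from /proc/net/bonding/bond0.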
Ronnie Sahlberg [Sun, 28 Oct 2007 21:50:51 +0000 (08:50 +1100)]
merge from tridge
(This used to be ctdb commit
c7777b966f6a6e0f4126c03300338fdc822ac6c9)
Ronnie Sahlberg [Sun, 28 Oct 2007 21:40:46 +0000 (08:40 +1100)]
merge from tridge
(This used to be ctdb commit
919ba610c61cfaf5ecc1ab64ad8be34a80d928f4)
Andrew Tridgell [Fri, 26 Oct 2007 04:53:09 +0000 (14:53 +1000)]
added monitoring of ftp ports
(This used to be ctdb commit
4780e078fb55d69053f78a4bbc7c67e569bb5dae)
Ronnie Sahlberg [Tue, 23 Oct 2007 02:35:43 +0000 (12:35 +1000)]
since service nfs stop/start sometimes fail to bring up the mount daemon on rhel5
check if mountd is running during monitoring and if it is not, try to restart it
(This used to be ctdb commit
3d4b74669164b519398aeeacd59714f1e3884eff)
Andrew Tridgell [Tue, 23 Oct 2007 01:56:52 +0000 (11:56 +1000)]
update release number
(This used to be ctdb commit
fe6766940b2cf8a84ed51824158c956362a5806d)
Andrew Tridgell [Tue, 23 Oct 2007 01:45:36 +0000 (11:45 +1000)]
merge from ronnie
(This used to be ctdb commit
cc70a2cc5f5400d6480cb609e1fa203236917976)
Ronnie Sahlberg [Mon, 22 Oct 2007 20:42:45 +0000 (06:42 +1000)]
merge from tridge
(This used to be ctdb commit
938e375a80ce2f1827117c38554f576f73a5c71e)
Andrew Tridgell [Mon, 22 Oct 2007 11:11:02 +0000 (21:11 +1000)]
fixed a problem with backgrounding onnode
(This used to be ctdb commit
4e23630224bb219cfbbf129c4562da5a4c2d601a)
Andrew Tridgell [Mon, 22 Oct 2007 06:41:11 +0000 (16:41 +1000)]
fixed a double close of a socket, leading to an EPOLL error
(This used to be ctdb commit
bbe8ad842bdfedd37ef14a6be07ad939113fe9b1)
Ronnie Sahlberg [Mon, 22 Oct 2007 05:14:49 +0000 (15:14 +1000)]
nfs may take a while to stop so do it in the background
(This used to be ctdb commit
2ccaeaf6a65731c17173a4945e3e00e230e67d35)
Andrew Tridgell [Mon, 22 Oct 2007 05:13:32 +0000 (15:13 +1000)]
another place where we need to mark connect_fde as freed
(This used to be ctdb commit
d047fbeafebe4b150602f9a91802795659058b16)
Andrew Tridgell [Mon, 22 Oct 2007 05:13:08 +0000 (15:13 +1000)]
fixed a valgrind uninitialised memory error due to pad bytes
(This used to be ctdb commit
aea9b0c8d467fe19815c046969e9c1049a3a20ac)
Andrew Tridgell [Mon, 22 Oct 2007 04:07:35 +0000 (14:07 +1000)]
prevent a double free
(This used to be ctdb commit
5a1b923abb36c6deb99ae178fdd54f12235dc309)
Ronnie Sahlberg [Mon, 22 Oct 2007 02:34:51 +0000 (12:34 +1000)]
when shutting down, we should stop monitoring
(This used to be ctdb commit
325683ef8f326f0565a827ff2c493adcab6e0d64)
Ronnie Sahlberg [Mon, 22 Oct 2007 02:34:08 +0000 (12:34 +1000)]
when we are shutting down, we should first shut down the recovery daemon
(This used to be ctdb commit
39ade6b329adcd3234124d6a8daaa6181abf739b)
Andrew Tridgell [Mon, 22 Oct 2007 00:26:25 +0000 (10:26 +1000)]
merge from ronnie
(This used to be ctdb commit
b47fdc1fc86431c9159b595047faa76ba31f6829)
Ronnie Sahlberg [Mon, 22 Oct 2007 00:18:38 +0000 (10:18 +1000)]
dont set parameters in statd-callout. if they should be set they
should be set from 10.interfaces
(This used to be ctdb commit
0c7c2dae0a976922de58793d576855bc37cd38e1)
Ronnie Sahlberg [Sat, 20 Oct 2007 20:42:33 +0000 (06:42 +1000)]
dont set some of the sysctl variables in statd-callout. these are
mainly useful for avoiding ack-storms when doing very rapid
failover/failback during testing but should not be required in
real-world.
this gets rid of a lot of annoying messages from the messages file
(This used to be ctdb commit
50d289dcce2caa7c7be9b6faa3b38b69c2237038)
Ronnie Sahlberg [Fri, 19 Oct 2007 05:19:25 +0000 (15:19 +1000)]
merge from tridge
(This used to be ctdb commit
a45cfb29d9a0babccddc6aa26e71c00524da1d97)
Andrew Tridgell [Fri, 19 Oct 2007 02:22:24 +0000 (12:22 +1000)]
increase release number
(This used to be ctdb commit
747ff96f1d93c52ba7548d0540266b0277d88ac1)
Ronnie Sahlberg [Fri, 19 Oct 2007 01:03:12 +0000 (11:03 +1000)]
dont close the file, just set the fd to -1
(This used to be ctdb commit
04b26aa09e69b3c9fa1db245b5123c3cc02db8af)
Andrew Tridgell [Thu, 18 Oct 2007 23:39:07 +0000 (09:39 +1000)]
merge from ronnie
(This used to be ctdb commit
d444fdc7782496abe4b27003b647ac49fb52e6be)
Andrew Tridgell [Thu, 18 Oct 2007 23:30:55 +0000 (09:30 +1000)]
remove an incorrectly added file
(This used to be ctdb commit
ff01a32db81b6c04d42634f5660181c270988264)
Ronnie Sahlberg [Thu, 18 Oct 2007 23:05:37 +0000 (09:05 +1000)]
add missing ) in the IB transport (which i dont compile for)
(This used to be ctdb commit
7f7a184bae87d46bd589d11068b6443b007366b4)
Ronnie Sahlberg [Thu, 18 Oct 2007 23:04:52 +0000 (09:04 +1000)]
add a stub restart method for IB
(This used to be ctdb commit
d318504ad5a49dbdfa307be39ae88df839e6308d)
Ronnie Sahlberg [Thu, 18 Oct 2007 22:58:30 +0000 (08:58 +1000)]
add a new transport method so that when a node is marked as dead, we
shut down and restart the transport
otherwise, if we use the tcp transport the tcp connection might try to
retransmit the queued data during the time the node is unavailable.
this together with the exponential backoff for tcp means that the tcp
connection quickly reaches the maximum backoff rto which is often 60 or
120 seconds. this would mean that it could take up to 60/120 seconds
before the tcp layer detects that the connection is dead and it has to
be reestablished.
(This used to be ctdb commit
0256db470879ce556b0f00070f7ebeaf37e529ab)
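The backoff arithmetic behind the transport restart above can be illustrated
with a small worked example. The 200 ms starting timeout and the plain
doubling rule are assumptions made for illustration; only the 60/120 second
ceiling comes from the commit message. Times are in milliseconds.

```python
def backoff_schedule(initial_rto_ms, max_rto_ms, retransmissions):
    """Return the timeout used before each successive retransmission."""
    schedule = []
    rto = initial_rto_ms
    for _ in range(retransmissions):
        schedule.append(rto)
        # Double the timeout after every unanswered retransmission,
        # but never beyond the configured ceiling.
        rto = min(rto * 2, max_rto_ms)
    return schedule
```

Once the ceiling is reached, every further attempt is a full max_rto_ms
apart, which is the delay that restarting the transport avoids.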
Ronnie Sahlberg [Thu, 18 Oct 2007 06:54:00 +0000 (16:54 +1000)]
set the flags explicitly instead of masking them in
(This used to be ctdb commit
27a5f9dead44890683f9dbc4f07cda11264aa03b)
Andrew Tridgell [Thu, 18 Oct 2007 06:27:36 +0000 (16:27 +1000)]
added some debug lines to help track down a problem
(This used to be ctdb commit
2ca31e9de179f76e392a26cc8305e2473357c760)
Ronnie Sahlberg [Thu, 18 Oct 2007 05:53:50 +0000 (15:53 +1000)]
merge from tridge
(This used to be ctdb commit
ad03e63906270c9c076ffdb1f62f912bb414ea10)
Andrew Tridgell [Thu, 18 Oct 2007 05:51:15 +0000 (15:51 +1000)]
merge from ronnie
(This used to be ctdb commit
a6b094fdede0ae850e87877fad0b9dd1f3a26869)
Andrew Tridgell [Thu, 18 Oct 2007 05:44:02 +0000 (15:44 +1000)]
merge from ronnie
(This used to be ctdb commit
75d4b386293e186a6bb8532515585ab72670d663)
Ronnie Sahlberg [Thu, 18 Oct 2007 04:13:48 +0000 (14:13 +1000)]
flush the route cache when we have added the single public ip to the
node
cleanup and remove everything when we do a shutdown event
(This used to be ctdb commit
221432f45073bc7624803058c8bbf18838e7ceeb)
Ronnie Sahlberg [Wed, 17 Oct 2007 05:03:58 +0000 (15:03 +1000)]
use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
about this packet any more and just forget it ever saw it
(This used to be ctdb commit
42a2a777cbc15a8cbbea7ecf2fb1c6dafa242d0c)
Ronnie Sahlberg [Wed, 17 Oct 2007 03:42:42 +0000 (13:42 +1000)]
reverse the order in which public ips are listed so it matches the order
of the public_addresses file
(This used to be ctdb commit
ce987661edd9160982e65866fb773445d296e5c7)
Ronnie Sahlberg [Wed, 17 Oct 2007 00:10:52 +0000 (10:10 +1000)]
merge from tridge
(This used to be ctdb commit
87760a95ec0a9e3cb2c415c569235a1ff58318cb)
Andrew Tridgell [Tue, 16 Oct 2007 10:14:04 +0000 (20:14 +1000)]
increase release number
(This used to be ctdb commit
69fe7ce1d7874ce51d79de29adc53c207cb8869f)
Andrew Tridgell [Tue, 16 Oct 2007 10:13:28 +0000 (20:13 +1000)]
more detail on multipath config
(This used to be ctdb commit
78c44f2267cbef5fbc57d56dfd5ff40972733a1f)
Ronnie Sahlberg [Tue, 16 Oct 2007 05:27:07 +0000 (15:27 +1000)]
add back the test inside the daemon that if someone asks us to drop
recovery mode back to NORMAL that we can not lock the reclock file
since at this stage it MUST be locked by the recovery daemon.
in order to prevent a non-blocking fcntl() lock from blocking and causing
"issues" we move the 'test that we can not lock reclock file' into a
child process.
(This used to be ctdb commit
3af994641ec2234e37da1fa1f693441586471a7e)
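A sketch of the "test the lock from a child process" idea above, written in
Python rather than the daemon's C. The function names, the exit status
convention and the use of lockf() are all assumptions for illustration. The
point is only that a stall inside the lock call would hang the child, not the
caller.

```python
import fcntl
import os


def start_reclock_test(path):
    """Fork a child that tries a non-blocking lock on path; return its pid.

    Child exit status: 0 = lock refused (someone else holds it),
    1 = lock acquired (nobody holds it), 2 = the file could not be opened.
    """
    pid = os.fork()
    if pid != 0:
        return pid
    status = 2
    try:
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 1
        except OSError:
            status = 0
    finally:
        # Exit immediately without running the parent's cleanup handlers.
        os._exit(status)


def reclock_is_held(pid):
    """Collect the child's verdict; True only if the lock was refused."""
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Here the result is collected with a blocking waitpid() for brevity; a daemon
would pick up the child's exit status from its event loop instead.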
Ronnie Sahlberg [Tue, 16 Oct 2007 02:15:02 +0000 (12:15 +1000)]
add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.
Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb
When this is set, the first entry in /etc/ctdb/public_addresses will
always be hosted by node 0, when that node is available, the second
entry by node1 and so on.
This tunable allows the allocation of addresses to become very
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are
identical on all the nodes in the cluster.
(This used to be ctdb commit
f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
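The DeterministicIPs mapping above can be sketched as follows. The first
entry goes to node 0, the second to node 1 and so on, as the message says.
Wrapping around when there are more addresses than nodes is an assumption,
and preferred_nodes is a hypothetical name. What happens when the preferred
node is unavailable is not modelled.

```python
def preferred_nodes(public_addresses, num_nodes):
    """Map each public address to the node number that should host it."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Entry i of the public_addresses file belongs to node i, wrapping
    # around (assumed) once every node has been given one address.
    return {
        address: index % num_nodes
        for index, address in enumerate(public_addresses)
    }
```

The result only agrees across the cluster if every node feeds in the same
list in the same order, hence the requirement for identical
public_addresses files.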
Ronnie Sahlberg [Tue, 16 Oct 2007 01:29:33 +0000 (11:29 +1000)]
include system/network.h so we get the prototype for inet_aton()
(This used to be ctdb commit
7145764b2d217f88a723dcb0ffd4e5a1567d64cf)
Ronnie Sahlberg [Tue, 16 Oct 2007 01:26:22 +0000 (11:26 +1000)]
merge from tridge
(This used to be ctdb commit
9e6bc12c9be2dabcfb9c6aeef257ef4737287fab)
Ronnie Sahlberg [Mon, 15 Oct 2007 23:50:31 +0000 (09:50 +1000)]
dont try to lock the file from inside the ctdb daemon.
even though we dont want a blocking lock it does appear that the fcntl()
call can block for a while if gpfs is in the process of rebuilding
itself after a node arriving/leaving the cluster
(This used to be ctdb commit
6c0d206dea7116db71bccb4802a93dd7283249f6)
Andrew Tridgell [Mon, 15 Oct 2007 04:44:06 +0000 (14:44 +1000)]
only link to -lipq if needed
(This used to be ctdb commit
7c378d881e37db0f14e07ccba19fde1f9f4f0831)
Andrew Tridgell [Mon, 15 Oct 2007 04:37:54 +0000 (14:37 +1000)]
improved handling of systems without libipq.h
(This used to be ctdb commit
cfa8ddd3ca53c0160558137cccfc7e73e46ec36c)
Andrew Tridgell [Mon, 15 Oct 2007 04:29:47 +0000 (14:29 +1000)]
disable ipmux code until we have a configure test
(This used to be ctdb commit
fd83f0f3eb233f22ce9b5b4afbc4f26e3c865b3c)
Andrew Tridgell [Mon, 15 Oct 2007 04:28:51 +0000 (14:28 +1000)]
sync flags between nodes in monitor loop in recmaster
(This used to be ctdb commit
6eef86e06388fc53a1212f1e2783ae174c6cd210)
Andrew Tridgell [Mon, 15 Oct 2007 04:17:49 +0000 (14:17 +1000)]
merge from ronnie
(This used to be ctdb commit
d18712caba11855010be52f90bac656683076676)
Andrew Tridgell [Mon, 15 Oct 2007 03:31:09 +0000 (13:31 +1000)]
disable optimisation for now, until we find an occasional segv
(This used to be ctdb commit
d09570c70551aa40390ce9ceffe7bc234e1afafe)
Andrew Tridgell [Mon, 15 Oct 2007 03:22:58 +0000 (13:22 +1000)]
add config option for disabling bans
(This used to be ctdb commit
153b911f7f957d4c564b04f5aa878033a02da9e4)
Ronnie Sahlberg [Wed, 10 Oct 2007 21:51:57 +0000 (07:51 +1000)]
use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb
(This used to be ctdb commit
6abb46b010851f5719f12273b4a3d46ec986f0c7)
Ronnie Sahlberg [Wed, 10 Oct 2007 21:30:10 +0000 (07:30 +1000)]
use kill_tcp_connections() to kill off all tcp connections to the
"single public ip" address when we do a recovery
(This used to be ctdb commit
19b52a2d5db31efa9e7c77037097ff8539986ac3)
Ronnie Sahlberg [Wed, 10 Oct 2007 21:27:38 +0000 (07:27 +1000)]
move the kill_tcp_connections() function from 10.interfaces to functions
(This used to be ctdb commit
055948530fb16bf49c42fc4489f29a21665156c0)
Ronnie Sahlberg [Wed, 10 Oct 2007 21:10:17 +0000 (07:10 +1000)]
first check that recovery master is connected (we know this from our own
flags)
then pull the flags off recovery master before checking if it is banned
(This used to be ctdb commit
94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433)
Ronnie Sahlberg [Wed, 10 Oct 2007 20:16:36 +0000 (06:16 +1000)]
simplify election handling
make sure we read and update the flags from all remote nodes before we
reach the first codepath that can call do_recovery()
since during do_recovery() we need to know what the flags are.
(This used to be ctdb commit
e85f3806483ea420559d449e0e4d81bec996740f)
Ronnie Sahlberg [Wed, 10 Oct 2007 00:49:55 +0000 (10:49 +1000)]
merge from tridge
(This used to be ctdb commit
4690a205fe4325b03ab044bdb5fbc9aa3e94db6e)
Andrew Tridgell [Wed, 10 Oct 2007 00:45:22 +0000 (10:45 +1000)]
make sure reconnected nodes start off as unhealthy so they don't get a public IP
(This used to be ctdb commit
c733ec6760cae01ce277f491caf1355e46de5cf7)
Ronnie Sahlberg [Tue, 9 Oct 2007 23:42:32 +0000 (09:42 +1000)]
add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.
add a vnn structure to the ctdb context to describe the single public ip
address
update the killtcp control in the daemon that if a socketpair that is to
be killed does not match a normal public address it checks if the
destination address matches the single public ip address and if so uses
that vnn structure from the ctdb context
this allows killtcp to kill also connections to the single public ip
instead of only normal public addresses
(This used to be ctdb commit
5661ba17b91f62821dec1c76056c78b99752a90b)
Ronnie Sahlberg [Tue, 9 Oct 2007 03:45:42 +0000 (13:45 +1000)]
remove some debug outputs
(This used to be ctdb commit
f29c0b52df1f455909ba133e3ad3bc462dc32929)
Ronnie Sahlberg [Tue, 9 Oct 2007 02:00:12 +0000 (12:00 +1000)]
send out gratuitous arps when we are starting up serving the "single
public ip" but before we start the ipmux tool
(This used to be ctdb commit
dad1a80f39763314825939095f7656c13dcdbdc3)
Ronnie Sahlberg [Tue, 9 Oct 2007 01:56:09 +0000 (11:56 +1000)]
add a control to send gratuitous arps from the ctdb daemon
(This used to be ctdb commit
563819dd1acb344f95aabb4bad990b36f7ea4520)
Ronnie Sahlberg [Mon, 8 Oct 2007 04:05:22 +0000 (14:05 +1000)]
add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.
this ip address is attached to lo on all nodes but only the recmaster
will respond to arp requests for this address.
the recmaster then runs an ipmux process that will pass any incoming
packets to this ip address onto the other nodes in the cluster based on
the ip address of the client host
to use this feature one must
1, have one fixed ip address in the customers network
permanently attached to an interface
2, set CTDB_PUBLIC_INTERFACE=
to specify on which interface the clients attach to the node
3, set CTDB_SINGLE_PUBLIC_IP=ip-address
to specify which ipaddress should be the "single public ip address"
to test with only one single client, attach several ip addresses to
the client and ping the public address from the client with different -I
options. look in network trace to see to which node the packet is
passed onto.
(This used to be ctdb commit
50d648c95e4e6d7c2867a034c2b550086d853320)
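One way to picture "based on the ip address of the client host" in the ipmux
description above is a simple modulo over the client address. This is only a
guess at the idea, not the ipmux tool's actual selection rule; pick_node and
the node names are hypothetical.

```python
import ipaddress


def pick_node(client_ip, nodes):
    """Choose the node that should receive packets from client_ip."""
    if not nodes:
        raise ValueError("no nodes available")
    # Treat the IPv4 address as an integer so that the same client is
    # always forwarded to the same node.
    client_number = int(ipaddress.IPv4Address(client_ip))
    return nodes[client_number % len(nodes)]
```

With such a rule, pinging from several source addresses on one client (the
-I test described above) would spread the packets over different nodes.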
Ronnie Sahlberg [Sun, 7 Oct 2007 23:47:20 +0000 (09:47 +1000)]
add a function in the ctdb tool to determine whether the local node is
the recmaster or not.
return 0 if the node is the recmaster and 1 (true) if it is not or if
we could not communicate with the ctdb daemon.
it is called 'isnotrecmaster' because if the tool can not bind to
the socket to talk to the daemon, the tool will automatically return an
error and exit code 1
thus the tool will only return 0 if it could talk successfully to the
local daemon and if the local daemon confirms this node is the recmaster
(This used to be ctdb commit
ae5fcb790b6c3985f514fa8a96bc00c2619f2a28)
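A caller could consume the exit code convention above roughly like this. The
wrapper name is_recmaster is hypothetical, and the command is a parameter so
the sketch can be exercised without a running daemon. Only exit code 0 is
treated as "this node is the recmaster"; any failure counts as "not".

```python
import subprocess


def is_recmaster(command=("ctdb", "isnotrecmaster")):
    """Return True only if the command ran and exited with status 0."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The tool is missing or could not be started: treat as "not".
        return False
    return result.returncode == 0
```

This keeps the fail-safe property of the naming: a node that can not reach
its daemon never concludes that it is the recmaster.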
Ronnie Sahlberg [Fri, 5 Oct 2007 22:11:24 +0000 (08:11 +1000)]
merge from tridge
(This used to be ctdb commit
02cda01c032804cb1c53593ceb98685c827e2d58)
Andrew Tridgell [Fri, 5 Oct 2007 03:51:31 +0000 (13:51 +1000)]
fixed several places where we set the recovery culprit incorrectly
(This used to be ctdb commit
d9da73395fa443801fc68ec53a42b548e832d58a)
Andrew Tridgell [Fri, 5 Oct 2007 03:28:21 +0000 (13:28 +1000)]
- catch ESTALE in the recovery lock by trying a read()
- prioritise nodes that are unbanned and healthy in the election
(This used to be ctdb commit
929feb475dfdf7283f0e99b50b179e1c91d3a39f)
Andrew Tridgell [Fri, 5 Oct 2007 02:01:40 +0000 (12:01 +1000)]
we are the culprit if we can't get the reclock
(This used to be ctdb commit
1d320e113c6134ff6822b985a47131d8204af35a)
Ronnie Sahlberg [Wed, 26 Sep 2007 04:25:32 +0000 (14:25 +1000)]
change async.private to async.private_data since private is a reserved
word in c++
(This used to be ctdb commit
79eb28f6cd5dcc30b04966d202a050eaf98a2552)
Ronnie Sahlberg [Tue, 25 Sep 2007 01:43:42 +0000 (11:43 +1000)]
merge from tridge
(This used to be ctdb commit
5655fab1284dce8f4a09ad426d53f5151c88968b)
Andrew Tridgell [Mon, 24 Sep 2007 05:27:01 +0000 (15:27 +1000)]
upped version number
(This used to be ctdb commit
4312e20e047ddb0f825c5e0c51d85dfa6a1b7df8)
Andrew Tridgell [Mon, 24 Sep 2007 03:52:35 +0000 (13:52 +1000)]
merge from ronnie
(This used to be ctdb commit
c67f516f01f8033e3fbd0f338eaa3a8afb862495)
Ronnie Sahlberg [Mon, 24 Sep 2007 00:52:26 +0000 (10:52 +1000)]
when we have a public ip address mismatch (i.e. we hold addresses we
shouldnt or we are not holding addresses we should)
we must first freeze the local node before we set the recovery mode
(This used to be ctdb commit
a77a77e8b5180f6a4a1f3d7d4ff03811f3b71b56)
Ronnie Sahlberg [Mon, 24 Sep 2007 00:27:48 +0000 (10:27 +1000)]
merge from tridge
(This used to be ctdb commit
7f9242747543ea1a2cc05f5c8afc51ab26e7d4bb)
Andrew Tridgell [Mon, 24 Sep 2007 00:19:07 +0000 (10:19 +1000)]
fixed a fd leak on the recovery lock
(This used to be ctdb commit
186f35c42ed4fcc9ed44390b0dd036ece475d45e)
Andrew Tridgell [Mon, 24 Sep 2007 00:12:18 +0000 (10:12 +1000)]
run monitoring more quickly when unhealthy and at startup
(This used to be ctdb commit
ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)
Andrew Tridgell [Mon, 24 Sep 2007 00:00:14 +0000 (10:00 +1000)]
no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen
(This used to be ctdb commit
3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
Andrew Tridgell [Sun, 23 Sep 2007 23:57:14 +0000 (09:57 +1000)]
fixed a valgrind error, and some warnings
(This used to be ctdb commit
c0f52dbb385fa0748680adb7c40755c92e577551)
Andrew Tridgell [Fri, 21 Sep 2007 06:12:04 +0000 (16:12 +1000)]
make the persistent dbdir configurable
(This used to be ctdb commit
2587b887dcfce26b12c66fcb5d34e92da42a1776)