ctdb.git
13 years agovacuum: close fd leak 1.0.112b
Rusty Russell [Wed, 4 Aug 2010 22:31:55 +0000 (08:01 +0930)]
vacuum: close fd leak

Commit 517f05e42f17766b1e8db8f1f4789cbad968e304 "freeze: abort vacuuming
when we're going to freeze." introduced a file descriptor leak, because
the abortfd used to talk to the child wasn't closed.

Do this in the destructor.

CQ:S1019190
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
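
The fix described above can be sketched in plain C. This is only an illustration of "close the descriptor in the destructor": ctdb itself uses talloc destructors, and the names `vacuum_child`, `abort_fd` and `vacuum_child_destroy` are hypothetical stand-ins, not the real identifiers.

```c
#include <unistd.h>

/* Hypothetical bookkeeping for a forked vacuuming child. */
struct vacuum_child {
    int abort_fd;   /* descriptor used to tell the child to abort; -1 once closed */
};

/*
 * Destructor-style cleanup. Closing the descriptor here means every path
 * that tears down the structure also releases the fd, so it cannot leak.
 * Safe to call more than once.
 */
static int vacuum_child_destroy(struct vacuum_child *vc)
{
    if (vc->abort_fd != -1) {
        close(vc->abort_fd);
        vc->abort_fd = -1;
    }
    return 0;
}
```

Resetting the field to -1 after closing avoids a double close if the cleanup runs twice.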
13 years agoNew version : 1.0.112-30
Ronnie Sahlberg [Mon, 26 Jul 2010 06:52:36 +0000 (16:52 +1000)]
New version : 1.0.112-30

13 years agoMerge commit 'rusty/1.0.112' into 1.0.112
Ronnie sahlberg [Mon, 26 Jul 2010 06:49:16 +0000 (16:49 +1000)]
Merge commit 'rusty/1.0.112' into 1.0.112

13 years agovacuum: fix crash on vacuum abort
Rusty Russell [Mon, 26 Jul 2010 06:38:07 +0000 (16:08 +0930)]
vacuum: fix crash on vacuum abort

Martin Schwenke discovered that 517f05e42f17766b1e8db8f1f4789cbad968e304
("freeze: abort vacuuming when we're going to freeze.") used ctdb_db for
a logging message which is in fact uninitialized, causing a crash (even
if it wasn't actually logged).

Initialize it properly.  Also fix incorrect format in another logging
message introduced in that same change.

CQ:S1019093
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agoNew version 1.0.112-29
Ronnie Sahlberg [Mon, 26 Jul 2010 05:26:33 +0000 (15:26 +1000)]
New version 1.0.112-29

 - Fix for a SEGV that can happen when tcp tickles fail/timeout
   CQ:S1019041

13 years agotakeover: prevent crash by avoiding free in traverse on RST timeout
Rusty Russell [Mon, 26 Jul 2010 04:28:48 +0000 (13:58 +0930)]
takeover: prevent crash by avoiding free in traverse on RST timeout

After 5 attempts to send a RST to a client without any response, we free
"con"; this is done during a traverse.  This frees the node we are walking
through (the node is made a child of "con" down in rb_tree.c's
trbt_create_node()).  Valgrind would catch this, as Martin confirmed.

So, we create a temporary parent and reparent onto that; then we free
that parent after the traverse, thus deleting the unwanted nodes.

CQ:S1019041
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
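
A minimal sketch of the same idea without talloc: the traverse callback never frees the node it is visiting, it only queues it, and the queued nodes are freed once the walk is over. The list layout, `MAX_RST_ATTEMPTS`, and all function names here are assumptions for illustration; the real code reparents talloc children onto a temporary parent instead of keeping an explicit list.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_RST_ATTEMPTS 5

struct conn {
    struct conn *next;         /* live list linkage, read by the traverse */
    struct conn *doomed_next;  /* linkage on the deferred-free list */
    int rst_attempts;          /* RSTs sent without any response */
};

struct reaper {
    struct conn *doomed;       /* nodes to free once the traverse is done */
};

/* Reads n->next only after the callback returns, so the callback must
 * not free n. */
static void conn_traverse(struct conn *head,
                          void (*cb)(struct conn *, void *), void *priv)
{
    for (struct conn *n = head; n != NULL; n = n->next)
        cb(n, priv);
}

/* Callback: queue timed-out connections instead of freeing them. */
static void mark_if_timed_out(struct conn *c, void *priv)
{
    struct reaper *r = priv;

    if (c->rst_attempts >= MAX_RST_ATTEMPTS) {
        c->doomed_next = r->doomed;
        r->doomed = c;
    }
}

/* Walk the list, then unlink and free the queued nodes. Returns the
 * number of connections freed. */
static int reap_timed_out(struct conn **head)
{
    struct reaper r = { NULL };
    int freed = 0;

    conn_traverse(*head, mark_if_timed_out, &r);

    while (r.doomed != NULL) {
        struct conn *c = r.doomed;

        r.doomed = c->doomed_next;
        for (struct conn **pp = head; *pp != NULL; pp = &(*pp)->next) {
            if (*pp == c) {
                *pp = c->next;
                break;
            }
        }
        free(c);
        freed++;
    }
    return freed;
}
```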
13 years agoNew version 1.0.112-28
Rusty Russell [Wed, 21 Jul 2010 07:51:09 +0000 (17:21 +0930)]
New version 1.0.112-28

13 years agoRevert "Make deterministic ips off by the default."
Rusty Russell [Wed, 21 Jul 2010 07:49:34 +0000 (17:19 +0930)]
Revert "Make deterministic ips off by the default."

This reverts commit 09d5dc94930a1349bb74b5557a4e71144ad525a4.

We decided more review is needed, and we should not change this
for 1.0.112.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agoNew version 1.0.112-27
Rusty Russell [Wed, 21 Jul 2010 03:03:17 +0000 (12:33 +0930)]
New version 1.0.112-27

13 years agoMake deterministic ips off by the default.
Rusty Russell [Wed, 21 Jul 2010 03:09:04 +0000 (12:39 +0930)]
Make deterministic ips off by the default.

The git log makes it clear that it's mainly useful for debugging; we
should turn it off in production to minimize IP address movement.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agofreeze: abort vacuuming when we're going to freeze.
Rusty Russell [Wed, 21 Jul 2010 02:59:55 +0000 (12:29 +0930)]
freeze: abort vacuuming when we're going to freeze.

There are some reports of freeze timeouts, and it looks like vacuuming might
be the culprit.  So we add code to tell them to abort when a freeze is
going on.

CQ:S1018154 & S1018349
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agovacuum: disabling vacuuming during a freeze
Rusty Russell [Wed, 21 Jul 2010 02:58:04 +0000 (12:28 +0930)]
vacuum: disabling vacuuming during a freeze

We shouldn't even think about vacuuming when we've frozen the database
(which is earlier than when we set CTDB_RECOVERY_ACTIVE)

CQ:S1018154 & S1018349
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agologging: give a unique logging name to each forked child.
Rusty Russell [Mon, 19 Jul 2010 09:59:09 +0000 (19:29 +0930)]
logging: give a unique logging name to each forked child.

This means we can distinguish which child is logging, esp. via syslog where we have no pid.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agoNew version 1.0.112-26
Rusty Russell [Thu, 15 Jul 2010 08:13:46 +0000 (17:43 +0930)]
New version 1.0.112-26

13 years agoconfig: wrap iptables in flock to avoid concurrency.
Rusty Russell [Mon, 12 Jul 2010 05:41:42 +0000 (15:11 +0930)]
config: wrap iptables in flock to avoid concurrency.

When doing a releaseip event, we do them in parallel for all the separate
IPs.  This creates a problem for iptables, which isn't reentrant, giving
the strange message:
iptables encountered unknown error "18446744073709551615" while initializing table "filter"

The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.

The simple workaround is to flock-wrap iptables.  Better would be to rework
the code so we didn't need to use iptables in these paths.

CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
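
The commit applies the workaround in the event scripts; the sketch below only shows the same serialization idea in C using flock(2). The helper name `run_locked` and the lock file path are assumptions, not something taken from ctdb.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Run `command` through the shell while holding an exclusive lock on
 * lock_path, so that concurrent callers execute one at a time.
 * Returns the status from system(), or -1 if the lock file could not be
 * opened or locked.
 */
static int run_locked(const char *lock_path, const char *command)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    int status;

    if (fd == -1)
        return -1;

    /* Blocks until no other process holds the lock. */
    if (flock(fd, LOCK_EX) == -1) {
        close(fd);
        return -1;
    }

    status = system(command);

    flock(fd, LOCK_UN);
    close(fd);
    return status;
}
```

A caller would pass the full iptables command line as `command`; every releaseip invocation using the same lock file then runs iptables strictly one after another.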
13 years agoNew version 1.0.112-25
Rusty Russell [Tue, 6 Jul 2010 08:03:00 +0000 (17:33 +0930)]
New version 1.0.112-25
* Tue Jul 6 2010 : Version 1.0.112-25
 - natgw firewall fix
   BZ62613

13 years agoMove NAT gateway firewall rules to recovered|updatenatgw events.
Martin Schwenke [Tue, 6 Jul 2010 07:54:43 +0000 (17:54 +1000)]
Move NAT gateway firewall rules to recovered|updatenatgw events.

The existing code wasn't working as designed in the start event.  It
should work here.

BZ: 62613
Signed-off-by: Martin Schwenke <martin@meltin.net>
13 years agoNew version 1.0.112-24
Rusty Russell [Mon, 5 Jul 2010 02:44:06 +0000 (12:14 +0930)]
New version 1.0.112-24

* Mon Jul 5 2010 : Version 1.0.112-24
 - Extra logging on tdb_chainunlock failures.
 - Extra sanity check in ctdb_become_dmaster
   BZ65158
 - More robustness against IDR wrap
   BZ65158
 - Recovery failure under stress fix
   BZ:65158

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agoctdb_freeze: extend db priority hack to cover serverid.tdb deadlock.
Rusty Russell [Thu, 1 Jul 2010 11:46:55 +0000 (21:46 +1000)]
ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock.

We discovered that recent smbd locks the serverid tdb while
holding a lock on another tdb (locking.tdb):
  7: POSIX  ADVISORY  WRITE smbd-2224318 locking.tdb.0 10600 10600
  22: -> POSIX  ADVISORY  READ  smbd-2224318 serverid.tdb.0 26580 26580

The result is a deadlock against the ctdb_freeze code called for
recovery.  We extend the "notify" workaround to this case, too.

BZ:65158
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agoWrap the IDR early, but not too early.
Ronnie Sahlberg [Thu, 10 Jun 2010 04:30:38 +0000 (14:30 +1000)]
Wrap the IDR early, but not too early.

We don't want it to wrap almost immediately, or basically all "ctdb ..."
commands would log the "Reqid wrap" warning.

13 years agoDelay reusing ids to make protocol more robust
Rusty Russell [Wed, 9 Jun 2010 23:28:55 +0000 (08:58 +0930)]
Delay reusing ids to make protocol more robust

Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.

The result was that we unlocked the wrong record, leading to the
following:

ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
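
One way to delay id reuse is to hand out ids in increasing order and only return to a freed id after the counter has wrapped. The sketch below assumes a tiny fixed id space and hypothetical names (`id_alloc`, `id_get`, `id_put`); it is not the idtree implementation used by ctdb.

```c
#include <stdbool.h>
#include <stddef.h>

#define ID_SPACE 8   /* deliberately tiny, for illustration only */

struct id_alloc {
    bool in_use[ID_SPACE];
    unsigned next;   /* where the next search starts */
};

/*
 * Allocate the first free id at or after `next`, wrapping around.
 * A freed id is therefore not handed out again until the search has
 * passed every id above it. Returns -1 if all ids are in use.
 */
static int id_get(struct id_alloc *a)
{
    for (unsigned i = 0; i < ID_SPACE; i++) {
        unsigned id = (a->next + i) % ID_SPACE;

        if (!a->in_use[id]) {
            a->in_use[id] = true;
            a->next = (id + 1) % ID_SPACE;
            return (int)id;
        }
    }
    return -1;
}

/* Release an id; out-of-range values are ignored. */
static void id_put(struct id_alloc *a, int id)
{
    if (id >= 0 && id < ID_SPACE)
        a->in_use[id] = false;
}
```

With a lowest-free-id policy, an id released by a timed-out request would be reused immediately, which is exactly the window in which a late reply can be matched to the wrong request.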
13 years agoidtree: fix handling of large ids (eg INT_MAX)
Rusty Russell [Wed, 9 Jun 2010 23:25:56 +0000 (08:55 +0930)]
idtree: fix handling of large ids (eg INT_MAX)

Since idtree assigns sequentially, it rarely reaches high numbers.
But such numbers can be forced with idr_get_new_above(), and that
reveals two bugs:
1) Crash in sub_remove() caused by pa array being too short.
2) Shift by more than 32 in _idr_find(), which is undefined, causing
   the "outside the current tree" optimization to misfire and return NULL.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
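
The second bug comes from the C rule that shifting a value by its full width or more is undefined. A hedged sketch of a guarded "is this id outside the tree" test follows; `id_outside_tree` and its `bits` parameter are hypothetical and do not mirror the actual `_idr_find()` code.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Return true if `id` cannot be represented in a tree that covers
 * `bits` bits of id space. The shift is only performed when it is well
 * defined; larger `bits` values are answered without shifting.
 */
static bool id_outside_tree(int id, unsigned bits)
{
    const unsigned int_bits = (unsigned)(sizeof(int) * CHAR_BIT);

    if (id < 0)
        return true;

    /* A non-negative int always fits in int_bits - 1 bits. */
    if (bits >= int_bits - 1)
        return false;

    return (id >> bits) != 0;
}
```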
13 years agofix a debug message
Ronnie Sahlberg [Wed, 9 Jun 2010 06:22:01 +0000 (16:22 +1000)]
fix a debug message

13 years agoidr can timeout and wrap/be reused quite quickly.
Ronnie Sahlberg [Wed, 9 Jun 2010 06:12:36 +0000 (16:12 +1000)]
idr can timeout and wrap/be reused quite quickly.

If a remote node hangs for an extended period, it is possible
that we have a DMASTER request in flight for record A to that node.
Eventually we will reuse the idr, and may reuse it for a DMASTER request
to a different node for a different record B.

If the first node un-hangs and responds while the request for B is in
flight, we would receive a dmaster reply for the wrong record.

This would cause a record to become perpetually locked, since inside the
daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the
migration completed we would chainunlock idr->state->call->key.

Add code to verify that when we receive a dmaster reply packet, its key
matches the key held in the state variable for the idr in flight.
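
The check described above amounts to comparing the key carried in the reply with the key stored for the in-flight request before acting on it. The following is a sketch under assumed types: `struct key`, `struct call_state` and `reply_matches` are illustrative names, not the ctdb structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record key: a pointer plus a length. */
struct key {
    const unsigned char *dptr;
    size_t dsize;
};

/* Hypothetical state kept for a request that is still in flight. */
struct call_state {
    unsigned reqid;
    struct key key;
};

static bool key_equal(struct key a, struct key b)
{
    return a.dsize == b.dsize &&
           (a.dsize == 0 || memcmp(a.dptr, b.dptr, a.dsize) == 0);
}

/*
 * Accept a reply only if both the request id and the record key match
 * the state we hold. A reply for a reused id but a different record is
 * rejected instead of unlocking the wrong record.
 */
static bool reply_matches(const struct call_state *state,
                          unsigned reply_reqid, struct key reply_key)
{
    return state != NULL &&
           state->reqid == reply_reqid &&
           key_equal(state->key, reply_key);
}
```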

13 years agoWe can not be holding a chainlock at this stage, so the tdb_chainunlock() call is...
Ronnie Sahlberg [Wed, 9 Jun 2010 05:12:26 +0000 (15:12 +1000)]
We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus

( a child process might be holding the lock, but not the main daemon)

13 years agoadd extra logging for failed ctdb_ltdb_unlock() for a few more places
Ronnie Sahlberg [Wed, 9 Jun 2010 04:31:05 +0000 (14:31 +1000)]
add extra logging for failed ctdb_ltdb_unlock() for a few more places
it is called from

13 years agoadd additional logging when tdb_chainunlock() fails
Ronnie Sahlberg [Wed, 9 Jun 2010 04:17:35 +0000 (14:17 +1000)]
add additional logging when tdb_chainunlock() fails
so we can see where it was called from when it fails

13 years agoprint the db name when a chainunlock fails too
Ronnie Sahlberg [Wed, 9 Jun 2010 03:54:10 +0000 (13:54 +1000)]
print the db name when a chainunlock fails too

13 years agowhen tdb_chainunlock() fails, print the tdb error that occurred
Ronnie Sahlberg [Wed, 9 Jun 2010 03:52:22 +0000 (13:52 +1000)]
when tdb_chainunlock() fails, print the tdb error that occurred

13 years agoNew version 1.0.112-23
Ronnie Sahlberg [Tue, 8 Jun 2010 02:17:01 +0000 (12:17 +1000)]
New version 1.0.112-23

* Tue Jun 8 2010 : Version 1.0.112-23
 - Fix a SEGV that can be triggered by "ctdb delip"
   BZ 62783
 - Add iptables filters to stop clients from connecting to the NATGW
   address.
   BZ62613
 - Add timestamps to the ctdb statistics output
 - Change "ctdb addip" to block until the address is active
   BZ63191
 - Add additional log messages when tdbs can no longer be locked/chain
   unlocked
   BZ64688

13 years agoAdditional log messages when tdb databases can no longer be chainlocked or chainunlocked
Ronnie Sahlberg [Tue, 8 Jun 2010 02:09:19 +0000 (12:09 +1000)]
Additional log messages when tdb databases can no longer be chainlocked or chainunlocked

BZ64688

13 years agochange the addip command to wait until the ip address is taken by the proper node
Ronnie Sahlberg [Mon, 7 Jun 2010 04:26:08 +0000 (14:26 +1000)]
change the addip command to wait until the ip address is taken by the proper node

BZ63191

13 years agoWhen we say "current time of statistics" in the "ctdb statistics" output,
Ronnie Sahlberg [Wed, 2 Jun 2010 07:06:14 +0000 (17:06 +1000)]
When we say "current time of statistics" in the "ctdb statistics" output,
print the current time and not the start time

13 years agoAdd a variable for start/current time to ctdb statistics
Ronnie Sahlberg [Wed, 2 Jun 2010 03:13:09 +0000 (13:13 +1000)]
Add a variable for start/current time to ctdb statistics
and print the time the statistics were taken, and for how long they have been collected, in the "ctdb statistics" output.

13 years ago Prevent clients from connecting to the natgw address.
Ronnie Sahlberg [Wed, 2 Jun 2010 02:47:01 +0000 (12:47 +1000)]
Prevent clients from connecting to the natgw address.
    This address is dedicated for outgoing connections.

    BZ62613

13 years agoFrom rusty
Ronnie Sahlberg [Wed, 26 May 2010 03:38:12 +0000 (13:38 +1000)]
From rusty
Fix a SEGV that could happen when deleting a public ip.

BZ62783

13 years agonew version 1.0.112-22
Ronnie Sahlberg [Mon, 24 May 2010 05:20:39 +0000 (15:20 +1000)]
new version 1.0.112-22

* Mon May 24 2010 : Version 1.0.112-22
 - Fix bug in 62.cnfs to allow exports that are quoted.
 - Add monitoring of quorum for the 62.cnfs script
 - Allow restoredb to restore a backup to a different database

13 years agoEnhance the "ctdb restoredb" command so you can restore a backup into a different...
Ronnie Sahlberg [Thu, 20 May 2010 01:26:37 +0000 (11:26 +1000)]
Enhance the "ctdb restoredb" command so you can restore a backup into a different database.

13 years agoAdd monitoring of quorum and make the node UNHEALTHY when quorum is lost
Ronnie Sahlberg [Mon, 24 May 2010 02:33:47 +0000 (12:33 +1000)]
Add monitoring of quorum and make the node UNHEALTHY when quorum is lost

13 years agoin 62.cnfs, lines in /etc/exports can have the exports quoted,
Ronnie Sahlberg [Sun, 23 May 2010 23:51:52 +0000 (09:51 +1000)]
in 62.cnfs, lines in /etc/exports can have the exports quoted,
so strip off any initial " on the exports line

13 years agoNew version 1.0.112-21
Ronnie Sahlberg [Fri, 21 May 2010 04:32:18 +0000 (14:32 +1000)]
New version 1.0.112-21

* Fri May 21 2010 : Version 1.0.112-21
 - Fix a bug where we would fail to remove the natgw configuration
   and would leave an ip address for natgw still remaining on loopback.
   BZ 58317
 - In the ctdb command line tool, we did not wait long enough for the
   ipreallocate command to finish on the recovery daemon, so it would
   sometimes time out if the recovery daemon was busy or we had rolling
   recoveries.
   BZ 61783
 - During failed recoveries we could get in a state where recovery attempts stopped
   while database priority level 2 or 3 remain frozen.
   BZ 63951

13 years agoIt was possible for ->recovery_mode to get out of sync with the new three db prioriti...
Ronnie Sahlberg [Fri, 21 May 2010 04:25:47 +0000 (14:25 +1000)]
It was possible for ->recovery_mode to get out of sync with the new three db priorities in such a way that
->recovery_mode was set to normal but database priority level 2 or 3 was still set to frozen,

causing the recovery daemon to fail to detect that a recovery was needed to recover access to the database.

BZ63951

13 years agoIn control_ipreallocate() we wait at most 5 tries before aborting the command
Ronnie Sahlberg [Thu, 20 May 2010 02:35:57 +0000 (12:35 +1000)]
In control_ipreallocate() we wait at most 5 tries before aborting the command
and returning an error.
This might not be sufficient if there are several recoveries in a row.

Instead loop as long as it takes for the recovery master to finish the recoveries and
respond to the ipreallocate call.

Increase the log level of the error message when the recovery master was busy and could
not perform the ipreallocation promptly

BZ61783

13 years agoFix a bug where we failed to remove the natgw address from loopback properly
Ronnie Sahlberg [Wed, 19 May 2010 03:23:43 +0000 (13:23 +1000)]
Fix a bug where we failed to remove the natgw address from loopback properly
bz58317

13 years agoNew version 1.0.112-20
Ronnie Sahlberg [Mon, 10 May 2010 23:35:03 +0000 (09:35 +1000)]
New version 1.0.112-20

* Tue May 11 2010 : Version 1.0.112-20
 - Add number of recoveries done to the ctdb statistics output.
 - Use the full range for IDR values.
   BZ 60540
 - When performing a recovery, make sure all nodes agree with recmaster
   on the reclock file.
   BZ 62748
 - Lower the loglevel for some debug messages
   BZ 63074

13 years agoAdd the number of performed recoveries to the "ctdb statistics" output.
Ronnie Sahlberg [Mon, 10 May 2010 23:28:59 +0000 (09:28 +1000)]
Add the number of performed recoveries to the "ctdb statistics" output.

13 years agoctdb: use full range of IDR
Rusty Russell [Sat, 8 May 2010 12:54:11 +0000 (22:24 +0930)]
ctdb: use full range of IDR

This resolves a problem with huge numbers of requests which could overflow
16 bits.  Fortunately, the IDR should scale reasonably well, so we can simply
hold all the requests.

Although no one checks for failure, I added a constant for that.

BZ: 60540
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agoWhen we perform a recovery, make sure that the recmaster will sync the reclock file...
Ronnie Sahlberg [Wed, 5 May 2010 23:25:53 +0000 (09:25 +1000)]
When we perform a recovery, make sure that the recmaster will sync the reclock file across the cluster.

This prevents issues such as when the recovery master is configured to NOT use a reclock file but other nodes are configured to use one.

BZ 62748

13 years agoLower the debug level for the messages when a client registers/deregisters a server id
Ronnie Sahlberg [Tue, 4 May 2010 22:47:30 +0000 (08:47 +1000)]
Lower the debug level for the messages when a client registers/deregisters a server id

BZ63074

13 years agoNew version 1.0.112-19
Ronnie Sahlberg [Tue, 4 May 2010 03:50:26 +0000 (13:50 +1000)]
New version 1.0.112-19

This version adds a new eventscript 62.cnfs that
allows better integration with GPFS.

BZ 61913

13 years agoNew version 1.0.112
Ronnie Sahlberg [Mon, 3 May 2010 06:09:34 +0000 (16:09 +1000)]
New version 1.0.112

* Mon May 3 2010 : Version 1.0.112-18
 - Make the NATGW eventscript check for when the system is misconfigured
   to use the same address for both natgw as well as a public address
   and warn.  BZ60933
 - A recent change to monitor the ip address assignment broke "ctdb moveip".
   Fix ctdb moveip so it works again. BZ62782

13 years agoDon't check ip assignment across the cluster while ip-verification
Ronnie Sahlberg [Wed, 28 Apr 2010 05:47:19 +0000 (15:47 +1000)]
Don't check ip assignment across the cluster while ip-verification
checks are disabled

13 years agoThe recent change to the recovery daemon to keep track of and
Ronnie Sahlberg [Wed, 28 Apr 2010 05:43:11 +0000 (15:43 +1000)]
The recent change to the recovery daemon to keep track of and
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.

Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.

BZ62782

13 years agoMake create_merged_ip_list() a static function since
Ronnie Sahlberg [Wed, 28 Apr 2010 04:47:37 +0000 (14:47 +1000)]
Make create_merged_ip_list() a static function since
it is not called from outside of ctdb_takeover.c

13 years agoIn the log message when we have found an inconsistent ip address allocation,
Ronnie Sahlberg [Wed, 28 Apr 2010 04:44:53 +0000 (14:44 +1000)]
In the log message when we have found an inconsistent ip address allocation,
add extra log information about what the inconsistency is.

13 years agoIf the admin makes a configuration mistake and configures NATGW to use the
Ronnie Sahlberg [Tue, 27 Apr 2010 22:46:41 +0000 (08:46 +1000)]
If the admin makes a configuration mistake and configures NATGW to use the
same ip address as a normal public-address,
check for this in the natgw script and warn the user.

Also prevent ctdb from starting up since this configuration will not work.

BZ60933

14 years agoNew version 1.0.112-17
Ronnie Sahlberg [Thu, 22 Apr 2010 23:07:03 +0000 (09:07 +1000)]
New version 1.0.112-17

* Fri Apr 23 2010 : Version 1.0.112-17
 - In ctdb-crash-cleanup.sh  also check for and remove the NATGW address
   if set.
 - Add monitoring and warning of low memory condition
   CTDB_MONITOR_FREE_MEMORY_WARN

14 years agoAdd a setting where CTDB will monitor and warn for low memory conditions.
Ronnie Sahlberg [Thu, 22 Apr 2010 22:52:09 +0000 (08:52 +1000)]
Add a setting where CTDB will monitor and warn for low memory conditions.

    CTDB_MONITOR_FREE_MEMORY_WARN

BZ 59747

14 years agoIn the example script to remove all ip addresses after a ctdb crash,
Ronnie Sahlberg [Thu, 22 Apr 2010 22:35:01 +0000 (08:35 +1000)]
In the example script to remove all ip addresses after a ctdb crash,
add the NATGW address as one to be removed in addition to the
public addresses.

14 years agoNew version 1.0.112-16
Ronnie Sahlberg [Thu, 22 Apr 2010 04:16:39 +0000 (14:16 +1000)]
New version 1.0.112-16

14 years agoadd an example script that can be called from crontab to cleanup
Ronnie Sahlberg [Thu, 22 Apr 2010 04:02:11 +0000 (14:02 +1000)]
add an example script that can be called from crontab to cleanup
and release public ip addresses if ctdbd is no longer running

14 years agoadd a missing ||
Ronnie Sahlberg [Thu, 22 Apr 2010 03:36:13 +0000 (13:36 +1000)]
add a missing ||
to make the 10.interface script not fail with a syntax error

14 years agoNew version 1.0.112-15
Ronnie Sahlberg [Wed, 21 Apr 2010 05:42:00 +0000 (15:42 +1000)]
New version 1.0.112-15

* Wed Apr 21 2010 : Version 1.0.112-15
 - Change how we add/remove iptable rules during recovery to make
   rule leaks less likely.
 - change a debug message loglevel from ERR to NOTICE
   BZ62086
 - In the recovery daemon, track public ip assignment
   across the cluster and verify consistency.
 - From Martins: change the 10.interface script to handle
   virtio interfaces correctly for virtual clusters.
 - Make the recovery master verify the reclock setting across the cluster
   and ban nodes with inconsistencies.
   BZ56354

14 years agoLet the recovery master verify the reclock setting on all nodes in the cluster.
Ronnie Sahlberg [Wed, 21 Apr 2010 03:46:54 +0000 (13:46 +1000)]
Let the recovery master verify the reclock setting on all nodes in the cluster.

Any node that is found to use a different filename from the first node
will be banned.

Nodes that have no reclock file at all configured are ignored in this check.

BZ56354
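
The rule above can be sketched as a small pure function. It assumes that "the first node" means the first node that has a reclock file configured, and that an unconfigured node is represented by NULL; both are assumptions, as are the function and parameter names.

```c
#include <stddef.h>
#include <string.h>

/*
 * reclock[i] is the reclock file configured on node i, or NULL if the
 * node has none. Sets ban[i] to 1 for every node whose file name differs
 * from the first configured one and returns how many nodes were marked.
 * Nodes without a reclock file are skipped.
 */
static size_t find_reclock_mismatches(const char *const reclock[],
                                      size_t n_nodes, int ban[])
{
    const char *ref = NULL;
    size_t banned = 0;

    for (size_t i = 0; i < n_nodes; i++) {
        ban[i] = 0;
        if (reclock[i] == NULL)
            continue;               /* no reclock configured: ignored */
        if (ref == NULL) {
            ref = reclock[i];       /* first configured node is the reference */
            continue;
        }
        if (strcmp(ref, reclock[i]) != 0) {
            ban[i] = 1;
            banned++;
        }
    }
    return banned;
}
```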

14 years agoBackport of patch to handle virtio interfaces correctly.
Ronnie Sahlberg [Tue, 20 Apr 2010 02:28:16 +0000 (12:28 +1000)]
Backport of patch to handle virtio interfaces correctly.
(ethtool does not work on these)

14 years agoIn the recovery daemon, keep track of which node we have assigned public ip
Ronnie Sahlberg [Thu, 8 Apr 2010 04:07:57 +0000 (14:07 +1000)]
In the recovery daemon, keep track of which node we have assigned public ip
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.

If a remote node has an inconsistent view of addresses vis-à-vis the recovery
master this will trigger a full ip reallocation.
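
A minimal sketch of the consistency check, assuming each view is an array holding the owning node number (pnn) for every public IP, with -1 meaning unassigned. The representation and the name `first_mismatch` are assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Compare the recovery master's view of IP ownership with the view
 * reported by a remote node. Returns the index of the first public IP
 * on which they disagree, or -1 if the views are consistent. A caller
 * would trigger a full IP reallocation on any result other than -1.
 */
static long first_mismatch(const int expected_pnn[],
                           const int reported_pnn[], size_t n_ips)
{
    for (size_t i = 0; i < n_ips; i++) {
        if (expected_pnn[i] != reported_pnn[i])
            return (long)i;
    }
    return -1;
}
```

Returning the index rather than a plain boolean lets the caller log which address is inconsistent.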

14 years agoLower the loglevel for "Recovery lock successfully taken"
Ronnie Sahlberg [Wed, 7 Apr 2010 00:42:51 +0000 (10:42 +1000)]
Lower the loglevel for "Recovery lock successfully taken"
from ERR to NOTICE

BZ62086

14 years agoVolker experienced a situation where we leaked iptable rules
Ronnie Sahlberg [Wed, 31 Mar 2010 00:20:25 +0000 (11:20 +1100)]
Volker experienced a situation where we leaked iptable rules
and continued to block an ipaddress after a recovery had completed.

Rework how we handle the iptables blocking and use a new separate
table for all failover related blocks so that we can find these rules and
remove them more easily from outside of the takeip and releaseip events.

14 years agoNew version 1.0.112-14
Ronnie Sahlberg [Mon, 29 Mar 2010 05:58:53 +0000 (16:58 +1100)]
New version 1.0.112-14

-   events:50.samba: wipe the local part of the serverid db before starting winb
    This is necessary for the new serverid approach.

14 years agoevents:50.samba: wipe the local part of the serverid db before starting winbind/smbd...
Michael Adam [Fri, 26 Mar 2010 16:33:51 +0000 (17:33 +0100)]
events:50.samba: wipe the local part of the serverid db before starting winbind/smbd/nmbd

This is necessary for the new serverid approach.

Michael

14 years agoNew Version 1.0.112-13
Ronnie Sahlberg [Wed, 24 Mar 2010 04:54:15 +0000 (15:54 +1100)]
New Version    1.0.112-13

14 years agoTry to restart NFS if "service nfs start" failed.
Ronnie Sahlberg [Wed, 24 Mar 2010 04:51:02 +0000 (15:51 +1100)]
Try to restart NFS if "service nfs start" failed.

BZ61827

14 years ago new version 1.0.112-12
Ronnie Sahlberg [Thu, 11 Mar 2010 07:05:06 +0000 (18:05 +1100)]
 new version 1.0.112-12

* Thu Mar 11 2010 : Version 1.0.112-12
 - From Christian Ambach : drop the loglevel of a vacuuming message
 - From Wolfgang Mueller-Friedt : fix bug in ctdb_setstatus
 - Use "service nfs status" instead of rpcinfo when probing nfs
 - make sure to always create the tickles directory

14 years agoadjust a vacuum log level
Christian Ambach [Wed, 10 Mar 2010 17:46:15 +0000 (18:46 +0100)]
adjust a vacuum log level

made the severity of the decreasing interval log level the same as for the increasing,
they are both just info logs because they don't report errors

14 years agoctdb_setstatus in /etc/ctdb/functions was not working correctly because it was called...
Wolfgang Mueller-Friedt [Wed, 10 Mar 2010 09:39:31 +0000 (10:39 +0100)]
ctdb_setstatus in /etc/ctdb/functions was not working correctly because it was called with a wrong parameter list

14 years agoMake sure to always create the tickle directories for nfs
Ronnie Sahlberg [Mon, 8 Mar 2010 18:27:41 +0000 (05:27 +1100)]
Make sure to always create the tickle directories for nfs
on the monitor event

BZ56707

14 years agoChange the test if NFS is working to use "service nfs status"
Ronnie Sahlberg [Mon, 8 Mar 2010 18:15:33 +0000 (05:15 +1100)]
Change the test if NFS is working to use "service nfs status"
instead of rpcinfo to the nfs program.

14 years agoNew version 1.0.112-11
Ronnie Sahlberg [Thu, 25 Feb 2010 01:34:27 +0000 (12:34 +1100)]
New version 1.0.112-11

* Thu Feb 25 2010 : Version 1.0.112-11
 - Increase default script timeout to 90 seconds.
   BZ 61113
 - Add a new tunable EventScriptLogTimeout which will log a message every time
   a monitor script takes longer than this to finish.
   BZ 61118

14 years agoAdd a tunable EventScriptLogTimeout that will log an error every time a
Ronnie Sahlberg [Thu, 25 Feb 2010 01:22:32 +0000 (12:22 +1100)]
Add a tunable EventScriptLogTimeout that will log an error every time a
monitoring script takes longer than this.

BZ 61118

14 years agoIncrease the default script timeout to 90 seconds.
Ronnie Sahlberg [Thu, 25 Feb 2010 00:58:07 +0000 (11:58 +1100)]
Increase the default script timeout to 90 seconds.

BZ 61113

14 years agonew version 1.0.112-10
Ronnie Sahlberg [Tue, 23 Feb 2010 05:19:51 +0000 (16:19 +1100)]
new version 1.0.112-10

* Tue Feb 23 2010 : Version 1.0.112-10
 - revert the change in 1.0.112-9 and make a new attempt to make the scripts
   behave.
 - make writing the ticklelist in 61.nfstickle a background task to avoid
   having a long cluster fs pause cause a node to become unhealthy
 - critical bugfix. during an error path in the "end recovery" code
   we could release a memory block before we had finished referencing it
   which could lead to a segv.   bz 61068
 - make sure we tear down the natgw configuration when a node becomes stopped
   or else we might end up with a duplicate ip address when a different node
   takes over the natgw role.   bz 61036

14 years agostore the nfs tickles for 61.nfstickle in a background shell
Ronnie Sahlberg [Tue, 23 Feb 2010 05:09:09 +0000 (16:09 +1100)]
store the nfs tickles for 61.nfstickle in a background shell
instead of blocking while it finishes.

this avoids having the eventscript hang/timeout if the underlying cluster filesystem hangs and blocks for 30+ seconds.

14 years agoRevert "Ignore any scripts that timesout for most events, except startup."
Ronnie Sahlberg [Tue, 23 Feb 2010 05:07:17 +0000 (16:07 +1100)]
Revert "Ignore any scripts that timesout for most events, except startup."

This reverts commit 527597ed6d9142c0b47a9c419c828793826ac95e.

14 years agoIn ctdb_control_end_recovery,
Ronnie Sahlberg [Tue, 23 Feb 2010 01:43:49 +0000 (12:43 +1100)]
In ctdb_control_end_recovery,

We used to talloc_steal c (the command packet) and make it a child of the
"event script state context".
If we failed to create an eventscript child context for some reason,
this would have talloc freed state, but at the same time it would also
implicitly have freed c.
Once ctdb_control_end_recovery() returns the error back to the caller,
the caller would dereference both c, and also outdata which is a child of c
and we would either read garbage data or segv.

Change the ordering so we only talloc_steal c as a child of state IFF
we have successfully created a child context for the script.

BZ61068
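
The reordering can be illustrated without talloc: transfer ownership of the packet to the state object only after the step that can fail has succeeded. Everything below (`struct packet`, `struct state`, `start_event`, the `child_ok` flag standing in for eventscript context creation) is a hypothetical analogy, not the ctdb code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct packet {
    int opcode;
};

struct state {
    struct packet *owned;   /* freed together with the state */
};

/* Freeing the state also frees whatever it owns. */
static void state_free(struct state *s)
{
    free(s->owned);
    free(s);
}

/*
 * Returns 0 on success, with *out owning pkt. On failure returns -1 and
 * the caller still owns pkt, so it may keep dereferencing it.
 */
static int start_event(struct packet *pkt, bool child_ok, struct state **out)
{
    struct state *s = calloc(1, sizeof(*s));

    if (s == NULL)
        return -1;

    if (!child_ok) {
        /* s->owned is still NULL here, so pkt is left untouched. */
        state_free(s);
        return -1;
    }

    s->owned = pkt;   /* ownership moves only after success */
    *out = s;
    return 0;
}
```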

14 years ago Make sure that the natgw eventscript also triggers on the "stopped" event
Ronnie Sahlberg [Mon, 22 Feb 2010 23:14:51 +0000 (10:14 +1100)]
Make sure that the natgw eventscript also triggers on the "stopped" event
    to remove the natgw configuration and ip assignments used.

BZ61036

14 years agonew version 1.0.112-9
Ronnie Sahlberg [Tue, 16 Feb 2010 00:20:19 +0000 (11:20 +1100)]
new version 1.0.112-9

14 years agoIgnore any scripts that timesout for most events, except startup.
Ronnie Sahlberg [Tue, 16 Feb 2010 00:18:43 +0000 (11:18 +1100)]
Ignore any scripts that timesout for most events, except startup.

Treat hung scripts (except during startup) as success.

14 years agonew version 1.0.112-8
Ronnie Sahlberg [Sun, 14 Feb 2010 23:49:33 +0000 (10:49 +1100)]
new version 1.0.112-8

14 years agotry to restart rpc-rquotad if it is not running
Ronnie Sahlberg [Fri, 12 Feb 2010 02:19:57 +0000 (13:19 +1100)]
try to restart rpc-rquotad if it is not running

bz60317

14 years agoLeave sequence number alone when merely migrating records.
Rusty Russell [Fri, 12 Feb 2010 06:32:56 +0000 (17:02 +1030)]
Leave sequence number alone when merely migrating records.

(Based on earlier version from Ronnie which modified tdb; this one
is standalone).

When storing records in a tdb that has "automatic seqnum updates"
also check if the actual data for the record has changed or not.

If it has not changed at all, except for possibly the header,
this is likely just a dmaster migration operation in which case
we want to write the record to the tdb but we do not want the tdb
sequence number to be increased.

This resolves the problem of notify.tdb being thrashed under load:
the heuristic in smbd to only reread this when the sequence number
increases (rarely) breaks down.

Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like:
 512      1496  118.33 MB/sec  execute 60 sec  latency 0.00 msec
And turning on latency tracking, this was typical in the logs:
 ctdbd: High latency 9380914.000000s for operation lockwait on database notify.tdb

After this commit:
  512      2451  143.85 MB/sec  execute 60 sec  latency 0.00 msec
And no more latency messages...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
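
A rough sketch of the check this commit describes: compare the record payload that follows the header, and bump the sequence number only when the payload differs. The fixed `HDR_SIZE`, the `struct db` layout and the function names are assumptions; the real header format and store path are not shown in the log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HDR_SIZE 8   /* hypothetical per-record header length */

struct db {
    unsigned seqnum;
};

/* True if the data after the header differs between the two records. */
static bool payload_changed(const unsigned char *old_rec, size_t old_len,
                            const unsigned char *new_rec, size_t new_len)
{
    if (old_len != new_len)
        return true;
    if (new_len <= HDR_SIZE)
        return false;
    return memcmp(old_rec + HDR_SIZE, new_rec + HDR_SIZE,
                  new_len - HDR_SIZE) != 0;
}

/*
 * Store a record. A header-only change (for example a dmaster
 * migration) leaves the sequence number alone; a new record or a
 * payload change increments it.
 */
static void store_record(struct db *db,
                         const unsigned char *old_rec, size_t old_len,
                         const unsigned char *new_rec, size_t new_len)
{
    /* ... the record itself would be written to the database here ... */
    if (old_rec == NULL ||
        payload_changed(old_rec, old_len, new_rec, new_len))
        db->seqnum++;
}
```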
14 years agonew version 1.0.112-7
Ronnie Sahlberg [Thu, 11 Feb 2010 05:49:48 +0000 (16:49 +1100)]
new version 1.0.112-7

14 years agoReduce loglevel for two eventscript related debug messages
Ronnie Sahlberg [Thu, 11 Feb 2010 01:00:43 +0000 (12:00 +1100)]
Reduce loglevel for two eventscript related debug messages

14 years agoReducing the log level for a debug message
Ronnie Sahlberg [Thu, 11 Feb 2010 00:54:46 +0000 (11:54 +1100)]
Reducing the log level for a debug message

              DEBUG(DEBUG_DEBUG,("pnn %u starting migration of %08x t\

14 years agoReduce the log level for two debug messages
Ronnie Sahlberg [Thu, 11 Feb 2010 00:49:48 +0000 (11:49 +1100)]
Reduce the log level for two debug messages

       DEBUG(DEBUG_DEBUG,("pnn %u dmaster response %08x\n", ctdb->pnn, ctdb_has
       DEBUG(DEBUG_DEBUG,("pnn %u dmaster request on %08x for %u from %u\n",

14 years agoAdd a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
Ronnie Sahlberg [Thu, 11 Feb 2010 00:32:22 +0000 (11:32 +1100)]
Add a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
to control whether or not to check if we are swapping, and produce
useful output into the logfile if we are.

For production systems with dedicated nas-heads we should never swap.
But for developer/test systems we often use smaller nondedicated systems where
we can no longer guarantee that we will not be using swap.

14 years agolower the loglevel for a debug message for redundant releases of public ips
Ronnie Sahlberg [Thu, 11 Feb 2010 00:19:08 +0000 (11:19 +1100)]
lower the loglevel for a debug message for redundant releases of public ips

14 years agoAdd a new variable : CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
Ronnie Sahlberg [Thu, 11 Feb 2010 00:09:39 +0000 (11:09 +1100)]
Add a new variable : CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
when set to "yes" this will skip checking if knfsd has hung or not.

bz59626

14 years agoevent scripts: add logging for low memory conditions
Rusty Russell [Tue, 9 Feb 2010 02:16:35 +0000 (12:46 +1030)]
event scripts: add logging for low memory conditions

We should never enter swap; if we do, show the memory state of the machine and the process list.  This will help us diagnose what caused the condition before it's too late and the box starts OOM-killing processes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
14 years agonew version 1.0.112-6
Ronnie Sahlberg [Mon, 8 Feb 2010 21:33:24 +0000 (08:33 +1100)]
new version 1.0.112-6

14 years agoctdb: migrate to new dlinklist.h from Samba
Andrew Tridgell [Sun, 7 Feb 2010 08:02:06 +0000 (19:02 +1100)]
ctdb: migrate to new dlinklist.h from Samba