metze/ctdb/wip.git
Martin Schwenke [Mon, 15 Dec 2008 06:52:12 +0000 (17:52 +1100)]
3 new tests.  24_ctdb_getdbmap.sh is only 1/2 implemented but does
something vaguely useful.  ctdb_test_exit unsets $ctdb_test_exit_hook.
Fix bug in 17_ctdb_config_delete_ip.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Dec 2008 07:44:21 +0000 (18:44 +1100)]
Add a recovery to ctdb_test_exit to improve test stability.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Dec 2008 06:25:38 +0000 (17:25 +1100)]
Rename $CTDB_NUM_NODES to $CTDB_TEST_NUM_DAEMONS and only set it if
$CTDB_TEST_REAL_CLUSTER is not set.  After a ctdb restart, force a
recovery to help the tests that follow.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Dec 2008 04:39:53 +0000 (15:39 +1100)]
Merge commit 'origin/master' into martins

Ronnie Sahlberg [Thu, 11 Dec 2008 22:39:55 +0000 (09:39 +1100)]
New version 1.0.68

Michael Adam [Wed, 10 Dec 2008 21:27:36 +0000 (22:27 +0100)]
Improve the monitor event test for ethernet interfaces (link detection).

On some systems, the ethtool link detection is not successful when a
cable is plugged but the interface has not been brought up previously.
This improves the test by bringing the interface up (without checking
for success here) and trying the ethtool test again afterwards.

Michael

Michael Adam [Wed, 10 Dec 2008 21:19:31 +0000 (22:19 +0100)]
Use "grep -q" instead of "grep ... > /dev/null" in events.d/10.interfaces
This enhances readability.

Michael

Martin Schwenke [Thu, 11 Dec 2008 07:14:17 +0000 (18:14 +1100)]
Add message about restart to 18_ctdb_freeze.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 10 Dec 2008 05:13:42 +0000 (16:13 +1100)]
With local daemons the sockets are now numbered starting from 0.  Fix
setup of local daemons so that it correctly assigns no public IPs to a
single node each time.  Separate out daemon_setup so that the
selection of the node with no public IPs is only done once at the
beginning of testing.  Clean up all current tests, mostly with a view
to ensuring that a node selected for testing some kind of failover
actually has public addresses assigned.  Reenabled 01_ctdb_version.sh
- it now passes if rpm doesn't do anything useful on the node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
root [Wed, 10 Dec 2008 01:06:51 +0000 (12:06 +1100)]
update the "ctdb recover" command.

block and wait until the cluster has completed the recovery before returning.
this makes it easier to script, since it avoids the common need for
   ctdb recover
   ... complex loop to wait for recovery to complete ...
   script continues

root [Wed, 10 Dec 2008 01:01:19 +0000 (12:01 +1100)]
add a CTDB_TIMEOUT variable for the ctdb tool.
If set, this specifies the maximum runtime for the ctdb tool, after which it terminates with status 20,
just like the -T ... option would.
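A minimal C sketch of the CTDB_TIMEOUT behaviour, for illustration only. The variable name and exit status 20 come from the commit message. The helper names ctdb_tool_timeout() and arm_tool_timeout() are hypothetical, and alarm()/SIGALRM is an assumed stand-in. It is not necessarily how the ctdb tool implements -T.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Exit status the commit message gives for a timed-out ctdb tool run. */
#define CTDB_TOOL_TIMEOUT_STATUS 20

/* Hypothetical helper: read the timeout (in seconds) from CTDB_TIMEOUT,
 * falling back to default_secs if it is unset or not a positive integer. */
long ctdb_tool_timeout(long default_secs)
{
    const char *s = getenv("CTDB_TIMEOUT");
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return default_secs;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0)
        return default_secs;
    return v;
}

/* SIGALRM handler: _exit() is async-signal-safe, exit() is not. */
static void tool_timed_out(int sig)
{
    (void)sig;
    _exit(CTDB_TOOL_TIMEOUT_STATUS);
}

/* Illustrative stand-in: terminate the process with status 20 if it is
 * still running after secs seconds. */
void arm_tool_timeout(long secs)
{
    signal(SIGALRM, tool_timed_out);
    alarm((unsigned)secs);
}
```

A tool would call arm_tool_timeout(ctdb_tool_timeout(default_secs)) early in main(), where default_secs is whatever value it would otherwise use.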

root [Wed, 10 Dec 2008 00:49:51 +0000 (11:49 +1100)]
make sure we return an error code when the ctdb command has hung and is timed out by the -T <timeout> setting

Martin Schwenke [Wed, 10 Dec 2008 00:42:02 +0000 (11:42 +1100)]
Merge commit 'origin/master' into martins

Martin Schwenke [Wed, 10 Dec 2008 00:32:24 +0000 (11:32 +1100)]
Merge commit 'origin/master' into martins

Martin Schwenke [Wed, 10 Dec 2008 00:22:59 +0000 (11:22 +1100)]
Merge commit 'origin/master' into martins

Martin Schwenke [Tue, 9 Dec 2008 07:20:11 +0000 (18:20 +1100)]
Added use of $ctdb_test_exit_hook to function ctdb_test_exit.  Removed
sleeps from ban/unban tests.  Now expect "ctdb ping" to return false
if it fails, so made relevant change to 09_ctdb_ping.sh.  New
functions install_eventscript and uninstall_eventscript.  New
setup/cleanup tests 00_ctdb_install_eventscript.sh and
99_ctdb_uninstall_eventscript.sh.  New test 21_ctdb_disablemonitor.sh,
which is incredibly complex.

Signed-off-by: Martin Schwenke <martin@meltin.net>
root [Tue, 9 Dec 2008 01:03:42 +0000 (12:03 +1100)]
add a helper that waits until the cluster is no longer in recovery mode and returns the generation number.

change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.

also wait until the cluster has recovered from the ban/unban before returning, so that the cluster is in recovery mode == normal when the command returns.  this makes it much easier to script things ...
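A rough C sketch of the wait-for-recovery helper described above. It assumes the cluster can be queried for its recovery mode and generation number. struct cluster_ops, wait_for_recovery_done() and the simulated fake_cluster are hypothetical names for illustration, not ctdb's API; the real tool queries ctdbd instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

enum recmode { RECMODE_NORMAL, RECMODE_ACTIVE };

/* Hypothetical callbacks standing in for the real recmode/generation queries. */
struct cluster_ops {
    enum recmode (*get_recmode)(void *ctx);
    unsigned (*get_generation)(void *ctx);
    void *ctx;
};

/* Poll until the cluster is no longer in recovery, then store the current
 * generation number. Gives up (returns false) after max_polls attempts. */
bool wait_for_recovery_done(const struct cluster_ops *ops, int max_polls,
                            long poll_ms, unsigned *generation)
{
    struct timespec delay = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };

    for (int i = 0; i < max_polls; i++) {
        if (ops->get_recmode(ops->ctx) == RECMODE_NORMAL) {
            *generation = ops->get_generation(ops->ctx);
            return true;
        }
        nanosleep(&delay, NULL);
    }
    return false;
}

/* Simulated cluster for illustration: stays in recovery for `remaining` polls. */
struct fake_cluster { int remaining; unsigned generation; };

static enum recmode fake_recmode(void *ctx)
{
    struct fake_cluster *c = ctx;

    if (c->remaining > 0) {
        c->remaining--;
        return RECMODE_ACTIVE;
    }
    return RECMODE_NORMAL;
}

static unsigned fake_generation(void *ctx)
{
    return ((struct fake_cluster *)ctx)->generation;
}
```

Following the commit, ban/unban would wait like this both before changing the node and again afterwards. It therefore returns only once the cluster is back in normal mode.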

Martin Schwenke [Tue, 9 Dec 2008 00:46:34 +0000 (11:46 +1100)]
Merge commit 'origin/master' into martins

root [Mon, 8 Dec 2008 23:45:14 +0000 (10:45 +1100)]
update to the flags handling
make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node

root [Mon, 8 Dec 2008 06:29:17 +0000 (17:29 +1100)]
If ctdbd was started with the --socket option then we also set the CTDB_SOCKET variable so that the eventscripts can pick up the socket name properly

Martin Schwenke [Mon, 8 Dec 2008 06:03:50 +0000 (17:03 +1100)]
Merge commit 'origin/master' into martins

root [Mon, 8 Dec 2008 01:57:40 +0000 (12:57 +1100)]
return -1 if ctdb ping failed

Martin Schwenke [Sun, 7 Dec 2008 21:57:46 +0000 (08:57 +1100)]
Merge commit 'origin/master' into martins

Martin Schwenke [Sun, 7 Dec 2008 21:15:18 +0000 (08:15 +1100)]
When running with local daemons, provided there are more than 2 of
them, randomly pick a single node that will not have any public IPs
assigned.  This will make life a bit more interesting and will
simulate what happens on real clusters with a management node.  Some
tests were disabling a node to implicitly trigger a ctdb restart - now
use an explicit restart of ctdb when it is required.
17_ctdb_config_delete_ip.sh now randomly chooses a public IP on any
node to disable - this works around a problem where the hardcoded node
might not have any public addresses.

Signed-off-by: Martin Schwenke <martin@meltin.net>
root [Fri, 5 Dec 2008 05:32:30 +0000 (16:32 +1100)]
redo and update how we synchronize flags across the cluster.
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.

root [Thu, 4 Dec 2008 23:33:38 +0000 (10:33 +1100)]
some platforms are very picky about the third argument passed to bind()
and will complain if sa.family is AF_INET and the third argument is not exactly the size of a sockaddr_in.

We used to pass a union containing both a sockaddr_in and a sockaddr_in6, which meant that on those platforms bind() would fail, since the structure passed for AF_INET would be too big.

Thus we need to set and pass the appropriate size to bind(). At the same time, for those platforms we can also set sin[6]_size to the expected size.
(bind() on those platforms was, surprisingly, perfectly OK with sin_len being "too big")
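A C sketch of the fix, assuming a union like the one described. sock_addr_u, sock_addr_len() and bind_sock_addr() are illustrative names. HAVE_SOCK_SIN_LEN is a hypothetical configure macro for platforms whose address structures carry a length member.

```c
#define _POSIX_C_SOURCE 200809L
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* A union large enough for either address family, as the commit describes. */
typedef union {
    struct sockaddr sa;
    struct sockaddr_in ip;
    struct sockaddr_in6 ip6;
} sock_addr_u;

/* Exact length for the family in use. Passing sizeof(sock_addr_u) for an
 * AF_INET address is what the picky platforms reject. */
socklen_t sock_addr_len(const sock_addr_u *addr)
{
    switch (addr->sa.sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}

/* Create a stream socket and bind it, passing the per-family length. */
int bind_sock_addr(sock_addr_u *addr)
{
    socklen_t len = sock_addr_len(addr);
    int fd;

    if (len == 0)
        return -1;
#ifdef HAVE_SOCK_SIN_LEN
    /* Platforms with a length member can also have it set to match. */
    if (addr->sa.sa_family == AF_INET)
        addr->ip.sin_len = sizeof(struct sockaddr_in);
    else
        addr->ip6.sin6_len = sizeof(struct sockaddr_in6);
#endif
    fd = socket(addr->sa.sa_family, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (bind(fd, &addr->sa, len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```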

Martin Schwenke [Thu, 4 Dec 2008 06:19:51 +0000 (17:19 +1100)]
New test for getmonmode.  Overload node_has_status some more to
support checking the monitoring mode.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 4 Dec 2008 04:25:03 +0000 (15:25 +1100)]
new version 1.0.67

root [Thu, 4 Dec 2008 04:03:40 +0000 (15:03 +1100)]
fix an incorrect path

Martin Schwenke [Thu, 4 Dec 2008 03:42:04 +0000 (14:42 +1100)]
Merge commit 'origin/master' into martins

Ronnie Sahlberg [Thu, 4 Dec 2008 03:35:00 +0000 (14:35 +1100)]
add a description of the recovery-process

Martin Schwenke [Wed, 3 Dec 2008 07:08:21 +0000 (18:08 +1100)]
ctdb_test_init now contains a trap to force ctdb_test_exit to be run
if the shell exits and ctdb_test_exit cancels this trap.  This means
that a testcase executing under set -e will call ctdb_test_exit on
failure, allowing the cluster to be restarted if necessary so that
following tests can complete successfully.  ctdb_test_exit now
respects $?, so a test will fail if the last thing executed before
ctdb_test_exit failed - this probably means the above trap was
triggered.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 3 Dec 2008 04:48:24 +0000 (15:48 +1100)]
$PATH only includes $CTDB_DIR/bin if we're using local sockets.  Rename
$TEST_WRAP to $CTDB_TEST_WRAPPER - value now set using
$CTDB_TEST_REMOTE_SCRIPTS_DIR if that is set.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Tue, 2 Dec 2008 03:08:10 +0000 (14:08 +1100)]
print the list of valid debug level literals when an invalid debug level
is specified in 'ctdb setdebug'

Ronnie Sahlberg [Tue, 2 Dec 2008 02:26:30 +0000 (13:26 +1100)]
redesign how reloadnodes is implemented.

modify the transport methods to allow individual connections to be restarted
and set up destructors properly.

only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.

make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded
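A hypothetical C sketch of the per-node decision made when the nodes file is reloaded. It assumes a node's PNN is its position in the nodes file and that a NULL entry means there is no node at that position. The CONN_RESTART case for a changed address goes beyond the commit text.

```c
#include <stddef.h>
#include <string.h>

enum conn_action {
    CONN_NONE,      /* no node at this position before or after */
    CONN_KEEP,      /* unchanged node: leave the tcp connection alone */
    CONN_SETUP,     /* node added: create a connection */
    CONN_TEARDOWN,  /* node removed: destroy the connection */
    CONN_RESTART    /* address changed: tear down, then set up again */
};

static const char *addr_at(const char *const *nodes, size_t n, size_t pnn)
{
    return pnn < n ? nodes[pnn] : NULL;
}

/* Decide what to do with the connection to node `pnn` after a reload,
 * given the old and new node address lists (NULL = no node). */
enum conn_action reload_action(const char *const *old_nodes, size_t old_n,
                               const char *const *new_nodes, size_t new_n,
                               size_t pnn)
{
    const char *before = addr_at(old_nodes, old_n, pnn);
    const char *after = addr_at(new_nodes, new_n, pnn);

    if (before == NULL && after == NULL)
        return CONN_NONE;
    if (before == NULL)
        return CONN_SETUP;
    if (after == NULL)
        return CONN_TEARDOWN;
    return strcmp(before, after) == 0 ? CONN_KEEP : CONN_RESTART;
}
```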

root [Fri, 28 Nov 2008 00:29:43 +0000 (11:29 +1100)]
debuglevel is a signed int, not unsigned.

Ronnie Sahlberg [Thu, 27 Nov 2008 22:52:26 +0000 (09:52 +1100)]
make it possible to delete an ip from all nodes at once using
"ctdb delip x.x.x.x -n all"

This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.

Thus we first delete the ip from all connected nodes which are not
currently hosting it.

After this we delete the ip from the node which is hosting it.
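A C sketch of the ordering described above. struct node_state and delip_order() are hypothetical names. The function only computes the order in which nodes would be asked to delete the address; it does not perform the deletion.

```c
#include <stdbool.h>
#include <stddef.h>

struct node_state {
    unsigned pnn;
    bool connected;
    bool hosting_ip;   /* currently serving the address being deleted */
};

/* Fill `order` with the PNNs to delete the IP from: first every connected
 * node that is not hosting it, then the hosting node last, so the address
 * has nowhere to bounce to. Returns the number of entries written. */
size_t delip_order(const struct node_state *nodes, size_t n, unsigned *order)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++)
        if (nodes[i].connected && !nodes[i].hosting_ip)
            order[k++] = nodes[i].pnn;
    for (size_t i = 0; i < n; i++)
        if (nodes[i].connected && nodes[i].hosting_ip)
            order[k++] = nodes[i].pnn;
    return k;
}
```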

Martin Schwenke [Thu, 27 Nov 2008 07:11:22 +0000 (18:11 +1100)]
4 new tests.  Hacked function node_has_status to support
frozen/unfrozen via ctdb statistics command.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 25 Nov 2008 06:53:28 +0000 (17:53 +1100)]
4 new tests.  Marked more ctdbd.sh tests as done - will remove this
file soon.  Simplify 06_ctdb_getpid.sh by using -v option to
try_command_on_node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 24 Nov 2008 08:06:02 +0000 (19:06 +1100)]
new version 1.0.66

Martin Schwenke [Mon, 24 Nov 2008 06:47:09 +0000 (17:47 +1100)]
New test 09_ctdb_ping.sh.  Add documentation and command-line
processing to all tests.  New script ctdb_test_env sets up environment
for tests, is now sourced by run_tests, and can also take a test on
the command-line, complete with options.  Various cleanups and
improvements.  Document tests that have been properly implemented in
ctdbd.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Nov 2008 08:12:22 +0000 (19:12 +1100)]
Incorporate temporary patch from Ronnie that adds --nopublicipcheck
option to ctdbd.  Commit here because it seems to work.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Nov 2008 08:01:48 +0000 (19:01 +1100)]
Move tests/*.c to tests/src/*.c and adjust Makefile.in accordingly.
Move setting of $CTDB_NODES_SOCKETS to tests/scripts/run_tests and
make it only happen if $CTDB_TEST_REAL_CLUSTER is not set.  Bugfix in
function ips_are_on_nodeglob.  New/proper implementations of functions
stop_daemons and start_daemons, now called by function restart_ctdb.
In start_daemons.sh, add public addresses file generation/usage, use
new option --nopublicipcheck to ctdbd to avoid crazy behaviour and
kill ctdbd more carefully to avoid killing real daemons on a real
cluster - this should be able to coexist on a node of a real cluster.
start_daemons.sh is temporarily incompatible with start_daemons
function, but expecting to replace that script with function calls
very soon anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Fri, 21 Nov 2008 05:24:12 +0000 (16:24 +1100)]
allow changing the recmaster even if the database is not frozen

Martin Schwenke [Fri, 21 Nov 2008 02:00:37 +0000 (13:00 +1100)]
Merge commit 'origin/master' into martins

Ronnie Sahlberg [Fri, 21 Nov 2008 00:30:32 +0000 (11:30 +1100)]
remove two variables no longer used from the example sysconfig file

Andrew Tridgell [Thu, 20 Nov 2008 21:05:59 +0000 (08:05 +1100)]
fixed problem with looping ctdb recoveries

After a node failure, GPFS can get into a state where non-blocking
fcntl() locks can take a long time. This causes the ctdb set_recmode
test to time out, which leads to a recovery failure and a new
recovery. The recovery loop can last a long time.

The fix is to treat an fcntl() timeout as success for this test. The
test checks that we can't lock the shared reclock file, so a timeout
counts as success.
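A C sketch of the test's pass/fail rule after this fix. try_reclock() and the lock_outcome values are hypothetical. Detecting the timeout itself is not shown; one option is to run the lock attempt in a child and give up after a deadline. Treating other fcntl() errors as failures is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

enum lock_outcome { LOCK_ACQUIRED, LOCK_CONTENDED, LOCK_TIMED_OUT, LOCK_ERROR };

/* Non-blocking attempt to take a write lock on the first byte of the
 * reclock file. A timeout can only be detected by the caller. */
enum lock_outcome try_reclock(int fd)
{
    struct flock lock = {0};

    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 1;
    if (fcntl(fd, F_SETLK, &lock) == 0)
        return LOCK_ACQUIRED;
    if (errno == EACCES || errno == EAGAIN)
        return LOCK_CONTENDED;
    return LOCK_ERROR;
}

/* The test expects the reclock to be held elsewhere, so either contention
 * or a timeout (the slow GPFS case) counts as success. */
bool reclock_test_passed(enum lock_outcome r)
{
    return r == LOCK_CONTENDED || r == LOCK_TIMED_OUT;
}
```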

Andrew Tridgell [Thu, 20 Nov 2008 10:23:26 +0000 (21:23 +1100)]
Merge commit 'ronnie/master'

Martin Schwenke [Thu, 20 Nov 2008 09:40:01 +0000 (20:40 +1100)]
Add some simple tests that can be run from within the tree.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 20 Nov 2008 05:39:56 +0000 (16:39 +1100)]
don't override/change CTDB_BASE if it is already set by the shell

Ronnie Sahlberg [Thu, 20 Nov 2008 02:35:08 +0000 (13:35 +1100)]
Keepalive packets were only sent every KeepaliveInterval if the socket
had been completely idle during that interval.
If we had been sending other packets such as Messages, Calls or Controls,
there wouldn't be any need for an explicit keepalive and thus we didn't
send one.

This does make it somewhat awkward when analyzing traces since it is
non-intuitive when keepalives are sent and when they are not sent.

Change the keepalive logic to always send a keepalive regardless of
whether the link is idle or not.
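A small C model contrasting the old and new rules. It assumes one keepalive decision per KeepaliveInterval. keepalives_sent() is a hypothetical helper, not ctdb code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the keepalives sent over n intervals, where sent[i] is the number
 * of other packets (messages, calls, controls) sent during interval i.
 * always_send selects the new behaviour described in the commit. */
unsigned keepalives_sent(const unsigned *sent, size_t n, bool always_send)
{
    unsigned count = 0;

    for (size_t i = 0; i < n; i++) {
        /* Old rule: only an idle interval triggers a keepalive.
         * New rule: every interval does. */
        if (always_send || sent[i] == 0)
            count++;
    }
    return count;
}
```

With traffic of {0, 5, 0, 3} packets over four intervals, the old rule sends 2 keepalives and the new rule sends 4. This is why the new behaviour is easier to follow in traces.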

Ronnie Sahlberg [Wed, 19 Nov 2008 03:43:46 +0000 (14:43 +1100)]
rewrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both trying at once
to push flag changes across the cluster.

Martin Schwenke [Wed, 19 Nov 2008 02:21:07 +0000 (13:21 +1100)]
Merge branch 'master' into martins

Ronnie Sahlberg [Wed, 12 Nov 2008 23:55:20 +0000 (10:55 +1100)]
new version 1.0.65

update the example sysconfig file. the default log level is 2, not 0

Ronnie Sahlberg [Tue, 11 Nov 2008 03:49:30 +0000 (14:49 +1100)]
add a CTDB_SOCKET variable that can be used to override the default
/tmp/ctdb.socket

Ronnie Sahlberg [Mon, 3 Nov 2008 10:54:52 +0000 (21:54 +1100)]
we actually need a ctdb_db variable

Ronnie Sahlberg [Thu, 30 Oct 2008 02:34:10 +0000 (13:34 +1100)]
latency is measured in us, not ms

use an explicit ctdb_db variable instead of dereferencing state

Ronnie Sahlberg [Thu, 30 Oct 2008 01:49:53 +0000 (12:49 +1100)]
add control and logging of very high latencies.

log the type of operation and the database name for all latencies higher
than a threshold
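A C sketch of threshold-based latency logging, with latencies in microseconds as the "latency is measured in us" commit notes. elapsed_us(), log_high_latency() and the message format are made up for illustration. The commit does not say how the threshold is configured.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/time.h>

/* Elapsed time between two timestamps, in microseconds. */
unsigned long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    long long us = (long long)(end->tv_sec - start->tv_sec) * 1000000LL
                 + (end->tv_usec - start->tv_usec);

    return us < 0 ? 0 : (unsigned long)us;
}

/* Log the operation type and database name when the latency exceeds the
 * threshold. Returns true if a message was logged. */
bool log_high_latency(FILE *log, const char *op, const char *db_name,
                      unsigned long latency_us, unsigned long threshold_us)
{
    if (latency_us <= threshold_us)
        return false;
    fprintf(log, "high latency: %lu us for %s on database %s\n",
            latency_us, op, db_name);
    return true;
}
```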

Ronnie Sahlberg [Wed, 22 Oct 2008 00:06:18 +0000 (11:06 +1100)]
new version 1.0.64

Ronnie Sahlberg [Wed, 22 Oct 2008 00:04:41 +0000 (11:04 +1100)]
add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses

Ronnie Sahlberg [Sun, 19 Oct 2008 22:47:54 +0000 (09:47 +1100)]
new version 1.0.63

Ronnie Sahlberg [Sun, 19 Oct 2008 22:45:15 +0000 (09:45 +1100)]
don't log "running periodic cleanup" ...

Ronnie Sahlberg [Fri, 17 Oct 2008 10:38:42 +0000 (21:38 +1100)]
null out the pointer before we reload the nodes file

Ronnie Sahlberg [Fri, 17 Oct 2008 10:18:06 +0000 (21:18 +1100)]
when we reload the nodes file, we may need to reload the nodes file
inside the recovery daemon as well.

Ronnie Sahlberg [Thu, 16 Oct 2008 22:02:03 +0000 (09:02 +1100)]
make it possible to set the script log level in CTDB sysconfig

Ronnie Sahlberg [Thu, 16 Oct 2008 20:56:12 +0000 (07:56 +1100)]
specify a "script log level" on the command line that sets the log
level at which any/all output from eventscripts will be logged

Ronnie Sahlberg [Thu, 16 Oct 2008 06:59:55 +0000 (17:59 +1100)]
new version 1.0.62

Ronnie Sahlberg [Thu, 16 Oct 2008 06:57:50 +0000 (17:57 +1100)]
allow multiple eventscripts using the same prefix.
this eases the pain for users that use out of tree eventscripts

Martin Schwenke [Thu, 16 Oct 2008 03:15:15 +0000 (14:15 +1100)]
Merge commit 'origin/master' into martins

Andrew Tridgell [Thu, 16 Oct 2008 01:58:25 +0000 (12:58 +1100)]
Merge commit 'ronnie/master'

Ronnie Sahlberg [Wed, 15 Oct 2008 05:40:44 +0000 (16:40 +1100)]
new version 1.0.61

Ronnie Sahlberg [Wed, 15 Oct 2008 05:29:09 +0000 (16:29 +1100)]
install the new multipath monitoring event script

Ronnie Sahlberg [Wed, 15 Oct 2008 05:27:33 +0000 (16:27 +1100)]
add an eventscript to monitor that the multipath devices are healthy

Ronnie Sahlberg [Tue, 14 Oct 2008 21:33:37 +0000 (08:33 +1100)]
we must also check the status returned from the get tickles control to
determine whether it was successful or not

Ronnie Sahlberg [Tue, 14 Oct 2008 16:02:09 +0000 (03:02 +1100)]
lower the loglevel for the informational message that a TCP_ADD operation
described an ip address not known to be a public address.

This could happen if someone for genuine reasons accesses a share
through a static ip address.
It can also happen if non-homogeneous public address configurations are
used and a tcp description is pushed out to a different node that
does not serve/know the specific ip address.

Ronnie Sahlberg [Tue, 14 Oct 2008 14:49:19 +0000 (01:49 +1100)]
change ip route add to route add -net, since this works more reliably

update the makefile and rpm to install 99.routing

Ronnie Sahlberg [Tue, 14 Oct 2008 14:32:46 +0000 (01:32 +1100)]
new version 1.0.60

Ronnie Sahlberg [Tue, 14 Oct 2008 14:23:57 +0000 (01:23 +1100)]
verify that the nodes we try to ban/unban are operational and print an
error to the user otherwise.

Ronnie Sahlberg [Tue, 14 Oct 2008 14:08:29 +0000 (01:08 +1100)]
Revert "from Mathieu Parent <math.parent@gmail.com>"

This reverts commit dc9cd4779db4a89697731e4cf415be51067a07c1.

Conflicts:

Ronnie Sahlberg [Tue, 14 Oct 2008 13:24:44 +0000 (00:24 +1100)]
update the client side of getnodemap and getpublicips controls to
fallback to the old-style ipv4-only controls if the new-style ipv4/ipv6
control fails.

this allows a 1.0.59+ (ipv4/ipv6) ctdb daemon acting as recmaster to be
compatible with pre-1.0.59 versions of ctdb that are ipv4-only.
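A C sketch of the client-side fallback. It assumes each control can be modelled as a function returning 0 on success. control_fn, control_with_fallback() and the two stub controls are hypothetical stand-ins, not ctdb's client API.

```c
#include <stdbool.h>

/* Hypothetical signature for a control sent to node pnn; 0 means success. */
typedef int (*control_fn)(unsigned pnn, void *out);

/* Try the new-style (ipv4/ipv6) control first. If that fails, for example
 * because the remote node predates 1.0.59, retry with the old ipv4-only
 * control. *used_old tells the caller which reply format to decode. */
int control_with_fallback(control_fn new_style, control_fn old_style,
                          unsigned pnn, void *out, bool *used_old)
{
    int ret;

    *used_old = false;
    ret = new_style(pnn, out);
    if (ret == 0)
        return 0;
    ret = old_style(pnn, out);
    if (ret == 0)
        *used_old = true;
    return ret;
}

/* Stand-ins for illustration: an old node that only knows the ipv4 control. */
static int new_control_unsupported(unsigned pnn, void *out)
{
    (void)pnn;
    (void)out;
    return -1;
}

static int old_control_ok(unsigned pnn, void *out)
{
    *(unsigned *)out = pnn;
    return 0;
}
```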

Ronnie Sahlberg [Mon, 13 Oct 2008 23:40:29 +0000 (10:40 +1100)]
update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an
older ipv4-only version of these controls.

We need this so that we are backward compatible with old versions of ctdb
and so that we can interoperate with an ipv4-only recmaster during a
rolling upgrade.

Ronnie Sahlberg [Sun, 12 Oct 2008 21:27:33 +0000 (08:27 +1100)]
from Mathieu Parent <math.parent@gmail.com>
Hi,

I have attached a patch necessary as debian log dir (/var/log) is not
a subdir of VARDIR (/var/lib on rpm systems, /var/lib/ctdb on debian).
As I don't know much about autotools and friends, this patch may be
hacky.

This is part of the process to minimize diff between distributions.

Ronnie Sahlberg [Sun, 12 Oct 2008 21:21:20 +0000 (08:21 +1100)]
From Mathieu Parent
patch to make debian systems log the package versions in
ctdb_diagnostics

Andrew Tridgell [Thu, 9 Oct 2008 07:45:12 +0000 (18:45 +1100)]
added some more gpfs commands per-filesystem

Ronnie Sahlberg [Tue, 7 Oct 2008 08:34:34 +0000 (19:34 +1100)]
skip empty lines in the public addresses file, rather than skipping all
non-empty lines

Ronnie Sahlberg [Tue, 7 Oct 2008 08:25:10 +0000 (19:25 +1100)]
from Michael Adams : allow #-style comments in the nodes and public
addresses file
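A C sketch of the line filter these two commits describe. config_line_is_entry() is a hypothetical name. Treating whitespace before a '#' as part of a comment line is an assumption. The bug fixed by the first commit was that non-empty lines were skipped instead of empty ones.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if a line from the nodes or public addresses file holds an
 * entry. Blank lines and #-style comment lines are skipped. */
bool config_line_is_entry(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;
    return *line != '\0' && *line != '#';
}
```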

Ronnie Sahlberg [Tue, 7 Oct 2008 07:23:12 +0000 (18:23 +1100)]
new version 1.0.59

Ronnie Sahlberg [Tue, 7 Oct 2008 07:14:44 +0000 (18:14 +1100)]
remove an unused variable

Ronnie Sahlberg [Tue, 7 Oct 2008 07:12:54 +0000 (18:12 +1100)]
When we reload the nodes file
instead of shutting down/restarting the entire tcp layer
just bounce all outgoing connections and reconnect

Ronnie Sahlberg [Tue, 7 Oct 2008 00:03:30 +0000 (11:03 +1100)]
add a new eventscript : 99.routing that is used to add static routes to
interfaces when they are activated (an ip address is added during
takeip)

Andrew Tridgell [Tue, 30 Sep 2008 14:16:17 +0000 (07:16 -0700)]
The author of the upstream code asked for this code to be GPLv2+ not GPLv3

Andrew Tridgell [Tue, 30 Sep 2008 14:09:06 +0000 (07:09 -0700)]
merged a bugfix for the idtree code from the Linux kernel. This
matches commit 7aae6dd80e265aa9402ed507caaff4a5dba55069 in the kernel.

Many thanks to Jim Houston for pointing out this fix to us

Ronnie Sahlberg [Mon, 22 Sep 2008 15:38:28 +0000 (01:38 +1000)]
Check that a database exists first before we dump its content (and
implicitly also create it) using 'ctdb catdb'

Martin Schwenke [Wed, 17 Sep 2008 20:33:48 +0000 (06:33 +1000)]
Merge commit 'origin/master' into martins

Andrew Tridgell [Wed, 17 Sep 2008 11:00:04 +0000 (21:00 +1000)]
expanded ctdb_diagnostics based on recent experience

Ronnie Sahlberg [Wed, 17 Sep 2008 04:24:12 +0000 (14:24 +1000)]
use the correct tunable: failcount, not timeout

Ronnie Sahlberg [Wed, 17 Sep 2008 04:17:41 +0000 (14:17 +1000)]
The ctdb daemon keeps track of whether the recovery process is running
correctly by measuring how long it was since the last successful
communication with the recovery daemon was recorded.

After a certain timeout the ctdb daemon would deem the recovery daemon
as inoperable and shut down.

If the system clock is suddenly changed forward by many (60 or more)
seconds this could cause the timeout to trigger prematurely/immediately
where ctdb would incorrectly think that more than 60 seconds had passed
since last successful communications and thus abort.

Instead of checking for a single timeout, only deem the recovery
daemon to be "down" and trigger a shutdown if communications have
timed out for three intervals in a row.
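A C sketch of the three-in-a-row rule. It assumes the daemon checks once per interval whether it has heard from the recovery daemon. struct recd_monitor and recd_check() are hypothetical names; the limit of three comes from the commit message.

```c
#include <stdbool.h>

/* Tracks consecutive check intervals without contact from the recovery
 * daemon. */
struct recd_monitor {
    unsigned missed;
    unsigned limit;
};

/* Call once per check interval. Returns true when the recovery daemon
 * should be deemed down. Any successful contact resets the count, so a
 * single premature timeout (e.g. after a forward clock jump) is tolerated. */
bool recd_check(struct recd_monitor *m, bool heard_from_recd)
{
    if (heard_from_recd) {
        m->missed = 0;
        return false;
    }
    m->missed++;
    return m->missed >= m->limit;
}
```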

Ronnie Sahlberg [Mon, 15 Sep 2008 23:00:48 +0000 (09:00 +1000)]
fix a slow memory leak in the recovery daemon in the error paths for the
memdump function

Ronnie Sahlberg [Mon, 15 Sep 2008 21:55:57 +0000 (07:55 +1000)]
fix some slow memory leaks in the vacuuming handler in the recovery
daemon

Ronnie Sahlberg [Mon, 15 Sep 2008 20:50:28 +0000 (06:50 +1000)]
From Volker L
Fix a slow memory leak in the recovery daemon if there is a recovery
triggered during the public ip reassignment process