git.samba.org - metze/ctdb/wip.git/log


Martin Schwenke [Mon, 15 Dec 2008 06:52:12 +0000 (17:52 +1100)]

3 new tests. 24_ctdb_getdbmap.sh is only 1/2 implemented but does
something vaguely useful. ctdb_test_exit unsets $ctdb_test_exit_hook.
Fix bug in 17_ctdb_config_delete_ip.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Fri, 12 Dec 2008 07:44:21 +0000 (18:44 +1100)]

Add a recovery to ctdb_test_exit to improve test stability.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Fri, 12 Dec 2008 06:25:38 +0000 (17:25 +1100)]

Rename $CTDB_NUM_NODES to $CTDB_TEST_NUM_DAEMONS and only set it if
$CTDB_TEST_REAL_CLUSTER is not set. After a ctdb restart, force a
recovery to help the tests that follow.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Fri, 12 Dec 2008 04:39:53 +0000 (15:39 +1100)]

Merge commit 'origin/master' into martins


Ronnie Sahlberg [Thu, 11 Dec 2008 22:39:55 +0000 (09:39 +1100)]

New version 1.0.68


Michael Adam [Wed, 10 Dec 2008 21:27:36 +0000 (22:27 +0100)]

Improve the monitor event test for ethernet interfaces (link detection).

On some systems, the ethtool link detection is not successful when a
cable is plugged but the interface has not been brought up previously.
This improves the test by bringing the interface up (without checking
for success here) and trying the ethtool test again afterwards.

Michael
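
The retry described in this message could look roughly like the following. This is a minimal sketch, not the actual events.d/10.interfaces code: the function names link_detected and check_link_with_retry are made up for illustration, and it assumes that ethtool prints a "Link detected: yes" line and that "ip link set dev ... up" is available on the node.

```sh
# Hypothetical helper: succeeds if ethtool reports a link on interface $1.
link_detected () {
    ethtool "$1" 2>/dev/null | grep -q 'Link detected: yes'
}

# Hypothetical helper: test the link, and if that fails bring the
# interface up (without checking for success) and test once more.
check_link_with_retry () {
    link_detected "$1" && return 0
    ip link set dev "$1" up >/dev/null 2>&1 || true
    link_detected "$1"
}
```

A monitor event could then call something like "check_link_with_retry eth0" and report the interface as down only if the second attempt also fails.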


Michael Adam [Wed, 10 Dec 2008 21:19:31 +0000 (22:19 +0100)]

Use "grep -q" instead of "grep ... > /dev/null" in events.d/10.interfaces
This enhances readability.

Michael
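
For readers unfamiliar with the two spellings, here is a small sketch of the change. The function names and the idea of searching a file for an interface name are assumptions made for illustration, not taken from the eventscript. Both functions return the same exit status; the second just avoids the redirection.

```sh
# Hypothetical check: does file $2 contain a line matching $1?
# Old spelling: discard the output by redirecting it.
has_match_old () {
    grep "$1" "$2" > /dev/null
}

# New spelling: ask grep itself to stay quiet.
has_match_new () {
    grep -q "$1" "$2"
}
```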


Martin Schwenke [Thu, 11 Dec 2008 07:14:17 +0000 (18:14 +1100)]

Add message about restart to 18_ctdb_freeze.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Wed, 10 Dec 2008 05:13:42 +0000 (16:13 +1100)]

With local daemons the sockets are now numbered starting from 0.  Fix
setup of local daemons so that it correctly assigns no public IPs to a
single node each time.  Separate out daemon_setup so that the
selection of the node with no public IPs is only done once at the
beginning of testing.  Clean up all current tests, mostly with a view
to ensuring that a node selected for testing some kind of failover
actually has public addresses assigned.  Reenabled 01_ctdb_version.sh
- it now passes if rpm doesn't do anything useful on the node.

Signed-off-by: Martin Schwenke <martin@meltin.net>


root [Wed, 10 Dec 2008 01:06:51 +0000 (12:06 +1100)]

update the "ctdb recover" command.

block and wait until the cluster has completed the recovery before returning.
this makes it easier to script since it avoids the common need for
   ctdb recover
   ... complex loop to wait for recovery to complete ...
   script continues
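
The "complex loop" that scripts needed before this change might have looked something like the sketch below. The helper wait_until is a hypothetical illustration (the name, the one-second polling interval and the return codes are all assumptions); it simply polls an arbitrary command until it succeeds or a time limit is reached.

```sh
# Hypothetical helper: run "$@" about once a second until it succeeds
# or $1 seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until () {
    timeout=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A script would previously have followed "ctdb recover" with something like "wait_until 30 recovery_finished", where recovery_finished stands for whatever check the script used. With the blocking behaviour described above, that second step is no longer needed.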


root [Wed, 10 Dec 2008 01:01:19 +0000 (12:01 +1100)]

add a CTDB_TIMEOUT variable for the ctdb tool.
If set, this specifies the maximum runtime for the ctdb tool before it terminates with status == 20,
just like the -T ... option would.
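
A script can use the exit status to tell a timeout apart from other failures. The wrapper below is a hedged sketch: run_and_report is a made-up name, and the only detail taken from the message above is that status 20 means the tool hit its runtime limit.

```sh
# Hypothetical wrapper: run a command and report how it ended.
# Status 20 is the timeout status named in the commit message above.
run_and_report () {
    rc=0
    "$@" || rc=$?
    case $rc in
        0)  echo "ok" ;;
        20) echo "timed out" ;;
        *)  echo "failed with status $rc" ;;
    esac
    return $rc
}
```

With CTDB_TIMEOUT exported in the environment, a call such as "run_and_report ctdb recover" would print "timed out" if the tool gave up after the configured limit.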


root [Wed, 10 Dec 2008 00:49:51 +0000 (11:49 +1100)]

make sure we return an error code when the ctdb command has hung and is timed out by the -T <timeout> setting


Martin Schwenke [Wed, 10 Dec 2008 00:42:02 +0000 (11:42 +1100)]

Merge commit 'origin/master' into martins


Martin Schwenke [Wed, 10 Dec 2008 00:32:24 +0000 (11:32 +1100)]

Merge commit 'origin/master' into martins


Martin Schwenke [Wed, 10 Dec 2008 00:22:59 +0000 (11:22 +1100)]

Merge commit 'origin/master' into martins


Martin Schwenke [Tue, 9 Dec 2008 07:20:11 +0000 (18:20 +1100)]

Added use of $ctdb_test_exit_hook to function ctdb_test_exit.  Removed
sleeps from ban/unban tests.  Now expect "ctdb ping" to return false
if it fails, so made relevant change to 09_ctdb_ping.sh.  New
functions install_eventscript and uninstall_eventscript.  New
setup/cleanup tests 00_ctdb_install_eventscript.sh and
99_ctdb_uninstall_eventscript.sh.  New test 21_ctdb_disablemonitor.sh,
which is incredibly complex.

Signed-off-by: Martin Schwenke <martin@meltin.net>


root [Tue, 9 Dec 2008 01:03:42 +0000 (12:03 +1100)]

add a helper that waits until the cluster is no longer in recovery mode and returns the generation number.

change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.

also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns. this makes it much easier to script things ...


Martin Schwenke [Tue, 9 Dec 2008 00:46:34 +0000 (11:46 +1100)]

Merge commit 'origin/master' into martins


root [Mon, 8 Dec 2008 23:45:14 +0000 (10:45 +1100)]

update to the flags handling
make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node


root [Mon, 8 Dec 2008 06:29:17 +0000 (17:29 +1100)]

If ctdbd was started with the --socket option then we also set the CTDB_SOCKET variable so that the eventscripts can pick up the name proper


Martin Schwenke [Mon, 8 Dec 2008 06:03:50 +0000 (17:03 +1100)]

Merge commit 'origin/master' into martins


root [Mon, 8 Dec 2008 01:57:40 +0000 (12:57 +1100)]

return -1 if ctdb ping failed


Martin Schwenke [Sun, 7 Dec 2008 21:57:46 +0000 (08:57 +1100)]

Merge commit 'origin/master' into martins


Martin Schwenke [Sun, 7 Dec 2008 21:15:18 +0000 (08:15 +1100)]

When running with local daemons, provided there are more than 2 of
them, randomly pick a single node that will not have any public IPs
assigned. This will make life a bit more interesting and will
simulate what happens on real clusters with a management node. Some
tests were disabling a node to implicitly trigger a ctdb restart - now
use an explicit restart of ctdb when it is required.
17_ctdb_config_delete_ip.sh now randomly chooses a public IP on any
node to disable - this works around a problem where the hardcoded node
might not have any public addresses.

Signed-off-by: Martin Schwenke <martin@meltin.net>
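
The node selection described in this message could be sketched as follows. This is an illustration only: pick_no_public_ip_node is a hypothetical name, the use of awk for the random number is an assumption, and nodes are assumed to be numbered from 0.

```sh
# Hypothetical helper: given the number of local daemons in $1, print
# the number of the one node that gets no public IPs, or print nothing
# when there are 2 or fewer daemons. Nodes are assumed to start at 0.
pick_no_public_ip_node () {
    num=$1
    [ "$num" -gt 2 ] || return 0
    awk -v n="$num" 'BEGIN { srand(); print int(rand() * n) }'
}
```

Because the choice is random, a test that needs a node with public addresses has to check for them rather than rely on a hardcoded node number, which is the problem the last sentence of the message works around.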


root [Fri, 5 Dec 2008 05:32:30 +0000 (16:32 +1100)]

redo and update how we synchronize flags across the cluster.
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.


root [Thu, 4 Dec 2008 23:33:38 +0000 (10:33 +1100)]

some platforms are very picky about the third argument passed to bind().
and would complain if sa.family is AF_INET and the third argument is not exactly the size of a sockaddr_in.

We used to pass a union containing both a sockaddr_in and a sockaddr_in6 which would mean that on those platforms bind() would fail since the passed structure for AF_INET would be too big.

Thus we need to set and pass the appropriate size to bind. At the same time for those platforms we can also set sin[6]_size to the expected size.
(bind() on those platforms was, surprisingly, perfectly ok when sin_len was "too big")


Martin Schwenke [Thu, 4 Dec 2008 06:19:51 +0000 (17:19 +1100)]

New test for getmonmode. Overload node_has_status some more to
support checking the monitoring mode.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Ronnie Sahlberg [Thu, 4 Dec 2008 04:25:03 +0000 (15:25 +1100)]

new version 1.0.67


root [Thu, 4 Dec 2008 04:03:40 +0000 (15:03 +1100)]

fix an incorrect path


Martin Schwenke [Thu, 4 Dec 2008 03:42:04 +0000 (14:42 +1100)]

Merge commit 'origin/master' into martins


Ronnie Sahlberg [Thu, 4 Dec 2008 03:35:00 +0000 (14:35 +1100)]

add a description of the recovery-process


Martin Schwenke [Wed, 3 Dec 2008 07:08:21 +0000 (18:08 +1100)]

ctdb_test_init now contains a trap to force ctdb_test_exit to be run
if the shell exits and ctdb_test_exit cancels this trap. This means
that a testcase executing under set -e will call ctdb_test_exit on
failure, allowing the cluster to be restarted if necessary so that
following tests can complete successfully. ctdb_test_exit now
respects $?, so a test will fail if the last thing executed before
ctdb_test_exit failed - this probably means the above trap was
triggered.

Signed-off-by: Martin Schwenke <martin@meltin.net>
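
The trap mechanism described in this message can be illustrated with a pair of stand-in functions. This is a sketch under assumptions, not the code from the test suite: sketch_test_init and sketch_test_exit are hypothetical names, and the real functions do more (such as restarting the cluster when needed).

```sh
# Hypothetical stand-in for ctdb_test_exit: remember the status of the
# last command, cancel the trap, clean up, and exit with that status.
sketch_test_exit () {
    status=$?
    trap - 0
    echo "cleanup, exiting with status $status"
    exit $status
}

# Hypothetical stand-in for ctdb_test_init: arrange for
# sketch_test_exit to run whenever the shell exits.
sketch_test_init () {
    trap sketch_test_exit 0
}
```

In a test that runs under set -e, a failing command makes the shell exit, the trap fires, and the cleanup runs with the failing status. When the test reaches its explicit call to the exit function instead, the trap is cancelled first, so the cleanup runs only once.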


Martin Schwenke [Wed, 3 Dec 2008 04:48:24 +0000 (15:48 +1100)]

$PATH only includes $CTDB_DIR/bin if we're using local sockets. Rename
$TEST_WRAP to $CTDB_TEST_WRAPPER - value now set using
$CTDB_TEST_REMOTE_SCRIPTS_DIR if that is set.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Ronnie Sahlberg [Tue, 2 Dec 2008 03:08:10 +0000 (14:08 +1100)]

print the list of valid debug level literals when an invalid debug level
is specified in 'ctdb setdebug'


Ronnie Sahlberg [Tue, 2 Dec 2008 02:26:30 +0000 (13:26 +1100)]

redesign how reloadnodes is implemented.

modify the transport methods to allow to restart individual connections
and set up destructors properly.

only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.

make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded


root [Fri, 28 Nov 2008 00:29:43 +0000 (11:29 +1100)]

debuglevel is a signed int, not unsigned.


Ronnie Sahlberg [Thu, 27 Nov 2008 22:52:26 +0000 (09:52 +1100)]

make it possible to delete an ip from all nodes at once using
"ctdb delip x.x.x.x -n all"

This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as each node in turn deletes it.

Thus we first delete the ip from all connected nodes which are not
currently hosting it.

After this we delete the ip from the node which is hosting it.
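
The two-phase ordering described above can be sketched as a small helper that prints the nodes in deletion order. The name delip_order and its argument convention are assumptions made for illustration; the sketch also assumes the hosting node is one of the connected nodes.

```sh
# Hypothetical helper: print the nodes in the order the IP should be
# deleted from them. $1 is the node currently hosting the IP, the
# remaining arguments are all connected nodes.
delip_order () {
    hosting=$1
    shift
    # Phase 1: every connected node that is not hosting the IP.
    for node in "$@"; do
        [ "$node" = "$hosting" ] || echo "$node"
    done
    # Phase 2: the hosting node goes last, so the IP has nowhere to move.
    echo "$hosting"
}
```

For a three-node cluster where node 1 hosts the address, "delip_order 1 0 1 2" prints 0, 2 and then 1.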


Martin Schwenke [Thu, 27 Nov 2008 07:11:22 +0000 (18:11 +1100)]

4 new tests. Hacked function node_has_status to support
frozen/unfrozen via ctdb statistics command.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Tue, 25 Nov 2008 06:53:28 +0000 (17:53 +1100)]

4 new tests. Marked more ctdbd.sh tests as done - will remove this
file soon. Simplify 06_ctdb_getpid.sh by using -v option to
try_command_on_node.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Ronnie Sahlberg [Mon, 24 Nov 2008 08:06:02 +0000 (19:06 +1100)]

new version 1.0.66


Martin Schwenke [Mon, 24 Nov 2008 06:47:09 +0000 (17:47 +1100)]

New test 09_ctdb_ping.sh.  Add documentation and command-line
processing to all tests.  New script ctdb_test_env sets up environment
for tests, is now sourced by run_tests, and can also take a test on
the command-line, complete with options.  Various cleanups and
improvements.  Document tests that have been properly implemented in
ctdbd.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Fri, 21 Nov 2008 08:12:22 +0000 (19:12 +1100)]

Incorporate temporary patch from Ronnie that adds --nopublicipcheck
option to ctdbd. Commit here because it seems to work.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Martin Schwenke [Fri, 21 Nov 2008 08:01:48 +0000 (19:01 +1100)]

Move tests/*.c to tests/src/*.c and adjust Makefile.in accordingly.
Move setting of $CTDB_NODES_SOCKETS to tests/scripts/run_tests and
make it only happen if $CTDB_TEST_REAL_CLUSTER is not set. Bugfix in
function ips_are_on_nodeglob. New/proper implementations of functions
stop_daemons and start_daemons, now called by function restart_ctdb.
In start_daemons.sh, add public addresses file generation/usage, use
new option --nopublicipcheck to ctdbd to avoid crazy behaviour and
kill ctdbd more carefully to avoid killing real daemons on a real
cluster - this should be able to coexist on a node of a real cluster.
start_daemons.sh is temporarily incompatible with start_daemons
function, but expecting to replace that script with function calls
very soon anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>


Ronnie Sahlberg [Fri, 21 Nov 2008 05:24:12 +0000 (16:24 +1100)]

allow changing the recmaster even if the database is not frozen


Martin Schwenke [Fri, 21 Nov 2008 02:00:37 +0000 (13:00 +1100)]

Merge commit 'origin/master' into martins


Ronnie Sahlberg [Fri, 21 Nov 2008 00:30:32 +0000 (11:30 +1100)]

remove two variables no longer used from the example sysconfig file


Andrew Tridgell [Thu, 20 Nov 2008 21:05:59 +0000 (08:05 +1100)]

fixed problem with looping ctdb recoveries

After a node failure, GPFS can get into a state where non-blocking
fcntl() locks can take a long time. This causes the ctdb set_recmode
test to time out, which leads to a recovery failure, and a new
recovery. The recovery loop can last a long time.

The fix is to consider a fcntl timeout as a success of this test. The
test is to see that we can't lock the shared reclock file, so a
timeout is fine for a success.
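
The reasoning in the last paragraph amounts to a small decision table, sketched below. The function name and the three outcome words are hypothetical labels chosen for illustration; the actual check is done in the daemon, not in shell.

```sh
# Hypothetical mapping from the outcome of the lock attempt to the
# result of the check. The check passes when we can NOT take the lock.
reclock_check_result () {
    case $1 in
        denied)  return 0 ;;   # lock is held elsewhere: expected
        timeout) return 0 ;;   # treated as success after this fix
        locked)  return 1 ;;   # we got the lock: the check fails
        *)       return 2 ;;   # unknown outcome
    esac
}
```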


Andrew Tridgell [Thu, 20 Nov 2008 10:23:26 +0000 (21:23 +1100)]

Merge commit 'ronnie/master'


Martin Schwenke [Thu, 20 Nov 2008 09:40:01 +0000 (20:40 +1100)]

Add some simple tests that can be run from within the tree.

Signed-off-by: Martin Schwenke <martin@meltin.net>


Ronnie Sahlberg [Thu, 20 Nov 2008 05:39:56 +0000 (16:39 +1100)]

don't override/change CTDB_BASE if it is already set by the shell


Ronnie Sahlberg [Thu, 20 Nov 2008 02:35:08 +0000 (13:35 +1100)]

Keepalive packets were only sent every KeepaliveInterval if the socket
had been completely idle during that interval.
If we had been sending other packets such as Messages, Calls or Controls
there wouldn't be any need for an explicit keepalive and thus we didn't
send one.

This does make it somewhat awkward when analyzing traces since it is
non-intuitive when keepalives are sent and when they are not sent.

Change the keepalive logic to always send a keepalive regardless of
whether the link is idle or not.


Ronnie Sahlberg [Wed, 19 Nov 2008 03:43:46 +0000 (14:43 +1100)]

rewrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.


Martin Schwenke [Wed, 19 Nov 2008 02:21:07 +0000 (13:21 +1100)]

Merge branch 'master' into martins


Ronnie Sahlberg [Wed, 12 Nov 2008 23:55:20 +0000 (10:55 +1100)]

new version 1.0.65

update the example sysconfig file. the default log level is 2, not 0


Ronnie Sahlberg [Tue, 11 Nov 2008 03:49:30 +0000 (14:49 +1100)]

add a CTDB_SOCKET variable that can be used to override the default
/tmp/ctdb.socket
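
From a script's point of view, the override behaves like an ordinary shell default. The helper below is a sketch with a made-up name; only the variable name and the default path come from the message above.

```sh
# Hypothetical helper: print the socket path to use, preferring
# $CTDB_SOCKET and falling back to the default named above.
ctdb_socket_path () {
    echo "${CTDB_SOCKET:-/tmp/ctdb.socket}"
}
```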


Ronnie Sahlberg [Mon, 3 Nov 2008 10:54:52 +0000 (21:54 +1100)]

we actually need a ctdb_db variable


Ronnie Sahlberg [Thu, 30 Oct 2008 02:34:10 +0000 (13:34 +1100)]

latency is measured in us, not ms

use an explicit ctdb_db variable instead of dereferencing state


Ronnie Sahlberg [Thu, 30 Oct 2008 01:49:53 +0000 (12:49 +1100)]

add control and logging of very high latencies.

log the type of operation and the database name for all latencies higher
than a threshold


Ronnie Sahlberg [Wed, 22 Oct 2008 00:06:18 +0000 (11:06 +1100)]

new version 1.0.64


Ronnie Sahlberg [Wed, 22 Oct 2008 00:04:41 +0000 (11:04 +1100)]

add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses


Ronnie Sahlberg [Sun, 19 Oct 2008 22:47:54 +0000 (09:47 +1100)]

new version 1.0.63


Ronnie Sahlberg [Sun, 19 Oct 2008 22:45:15 +0000 (09:45 +1100)]

don't log "running periodic cleanup" ...


Ronnie Sahlberg [Fri, 17 Oct 2008 10:38:42 +0000 (21:38 +1100)]

null out the pointer before we reload the nodes file


Ronnie Sahlberg [Fri, 17 Oct 2008 10:18:06 +0000 (21:18 +1100)]

when we reload the nodes file, we may need to reload the nodes file
inside the recovery daemon as well.


Ronnie Sahlberg [Thu, 16 Oct 2008 22:02:03 +0000 (09:02 +1100)]

make it possible to set the script log level in CTDB sysconfig


Ronnie Sahlberg [Thu, 16 Oct 2008 20:56:12 +0000 (07:56 +1100)]

specify a "script log level" on the commandline to set under which log
level any/all output from eventscripts will be logged as


Ronnie Sahlberg [Thu, 16 Oct 2008 06:59:55 +0000 (17:59 +1100)]

new version 1.0.62


Ronnie Sahlberg [Thu, 16 Oct 2008 06:57:50 +0000 (17:57 +1100)]

allow multiple eventscripts using the same prefix.
this eases the pain for users that use out of tree eventscripts


Martin Schwenke [Thu, 16 Oct 2008 03:15:15 +0000 (14:15 +1100)]

Merge commit 'origin/master' into martins


Andrew Tridgell [Thu, 16 Oct 2008 01:58:25 +0000 (12:58 +1100)]

Merge commit 'ronnie/master'


Ronnie Sahlberg [Wed, 15 Oct 2008 05:40:44 +0000 (16:40 +1100)]

new version 1.0.61


Ronnie Sahlberg [Wed, 15 Oct 2008 05:29:09 +0000 (16:29 +1100)]

install the new multipath monitoring event script


Ronnie Sahlberg [Wed, 15 Oct 2008 05:27:33 +0000 (16:27 +1100)]

add an eventscript to monitor that the multipath devices are healthy


Ronnie Sahlberg [Tue, 14 Oct 2008 21:33:37 +0000 (08:33 +1100)]

we must also check the status returned from the get tickles control to
determine whether it was successful or not


Ronnie Sahlberg [Tue, 14 Oct 2008 16:02:09 +0000 (03:02 +1100)]

lower the loglevel for the informational message that a TCP_ADD operation
described an ip address not known to be a public address.

This could happen if someone for genuine reasons accesses a share
through a static ip address.
It can also happen if non-homogeneous public address configurations are
used and when a tcp description is pushed out to a different node that
does not serve/know the specific ip address.


Ronnie Sahlberg [Tue, 14 Oct 2008 14:49:19 +0000 (01:49 +1100)]

change ip route add to route add -net since this works more reliably

update the makefile and rpm to install 99.routing


Ronnie Sahlberg [Tue, 14 Oct 2008 14:32:46 +0000 (01:32 +1100)]

new version 1.0.60


Ronnie Sahlberg [Tue, 14 Oct 2008 14:23:57 +0000 (01:23 +1100)]

verify that the nodes we try to ban/unban are operational and print an
error to the user otherwise.


Ronnie Sahlberg [Tue, 14 Oct 2008 14:08:29 +0000 (01:08 +1100)]

Revert "from Mathieu Parent <math.parent@gmail.com>"

This reverts commit dc9cd4779db4a89697731e4cf415be51067a07c1.

Conflicts:


Ronnie Sahlberg [Tue, 14 Oct 2008 13:24:44 +0000 (00:24 +1100)]

update the client side of getnodemap and getpublicips controls to
fallback to the old-style ipv4-only controls if the new-style ipv4/ipv6
control fails.

this allows a 1.0.59+ (ipv4/ipv6) ctdb daemon being recmaster to be
compatible with pre-1.0.59 versions of ctdb that are ipv4 only.
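
The fallback pattern itself is simple and can be sketched in a few lines. The helper with_fallback is hypothetical, and the real logic lives in the C client code; this only illustrates the "try the new-style control, then the old-style one" ordering.

```sh
# Hypothetical helper: run the first command; if it fails, run the
# second one instead. $1 and $2 are command names without arguments.
with_fallback () {
    "$1" 2>/dev/null || "$2"
}
```

Given two placeholder functions, one for the new ipv4/ipv6 control and one for the old ipv4-only control, "with_fallback new_style old_style" returns the new-style answer when the remote node understands it and the old-style answer otherwise.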


Ronnie Sahlberg [Mon, 13 Oct 2008 23:40:29 +0000 (10:40 +1100)]

update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an
older ipv4-only version of these controls.

We need this so that we are backward compatible with old versions of ctdb
and so that we can interoperate with an ipv4-only recmaster during a
rolling upgrade.


Ronnie Sahlberg [Sun, 12 Oct 2008 21:27:33 +0000 (08:27 +1100)]

from Mathieu Parent <math.parent@gmail.com>
Hi,

I have attached a patch necessary as debian log dir (/var/log) is not
a subdir of VARDIR (/var/lib on rpm systems, /var/lib/ctdb on debian).
As I don't know much about autotools and friends, this patch may be
hacky.

This is part of the process to minimize diff between distributions.


Ronnie Sahlberg [Sun, 12 Oct 2008 21:21:20 +0000 (08:21 +1100)]

From Mathieu Parent
patch to make debian systems log the package versions in
ctdb_diagnostics


Andrew Tridgell [Thu, 9 Oct 2008 07:45:12 +0000 (18:45 +1100)]

added some more gpfs commands per-filesystem


Ronnie Sahlberg [Tue, 7 Oct 2008 08:34:34 +0000 (19:34 +1100)]

skip empty lines in the public addresses file, not skip all non-empty
lines


Ronnie Sahlberg [Tue, 7 Oct 2008 08:25:10 +0000 (19:25 +1100)]

from Michael Adam : allow #-style comments in the nodes and public
addresses file
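
The intended parsing behaviour (skip empty lines, skip #-style comments, keep everything else) can be sketched as below. The helper name is made up, and the sketch treats only lines that begin with "#" as comments, which is an assumption about the file format.

```sh
# Hypothetical helper: print the meaningful lines of a nodes or public
# addresses file, dropping empty lines and lines starting with "#".
config_lines () {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
}
```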


Ronnie Sahlberg [Tue, 7 Oct 2008 07:23:12 +0000 (18:23 +1100)]

new version 1.0.59


Ronnie Sahlberg [Tue, 7 Oct 2008 07:14:44 +0000 (18:14 +1100)]

remove an unused variable


Ronnie Sahlberg [Tue, 7 Oct 2008 07:12:54 +0000 (18:12 +1100)]

When we reload the nodes file
instead of shutting down/restarting the entire tcp layer
just bounce all outgoing connections and reconnect


Ronnie Sahlberg [Tue, 7 Oct 2008 00:03:30 +0000 (11:03 +1100)]

add a new eventscript : 99.routing that is used to add static routes to
interfaces when they are activated (an ip address is added during
takeip)


Andrew Tridgell [Tue, 30 Sep 2008 14:16:17 +0000 (07:16 -0700)]

The author of the upstream code asked for this code to be GPLv2+ not GPLv3


Andrew Tridgell [Tue, 30 Sep 2008 14:09:06 +0000 (07:09 -0700)]

merged a bugfix for the idtree code from the Linux kernel. This
matches commit 7aae6dd80e265aa9402ed507caaff4a5dba55069 in the kernel.

Many thanks to Jim Houston for pointing out this fix to us


Ronnie Sahlberg [Mon, 22 Sep 2008 15:38:28 +0000 (01:38 +1000)]

Check that a database exists first before we dump its content (and
implicitly also create it) using 'ctdb catdb'


Martin Schwenke [Wed, 17 Sep 2008 20:33:48 +0000 (06:33 +1000)]

Merge commit 'origin/master' into martins


Andrew Tridgell [Wed, 17 Sep 2008 11:00:04 +0000 (21:00 +1000)]

expanded ctdb_diagnostics based on recent experience


Ronnie Sahlberg [Wed, 17 Sep 2008 04:24:12 +0000 (14:24 +1000)]

use the correct tunable failcount not timeout


Ronnie Sahlberg [Wed, 17 Sep 2008 04:17:41 +0000 (14:17 +1000)]

The ctdb daemon keeps track of whether the recovery process is running
correctly by measuring how long it was since the last successful
communication with the recovery daemon was recorded.

After a certain timeout the ctdb daemon would deem the recovery daemon
as inoperable and shut down.

If the system clock is suddenly changed forward by many (60 or more)
seconds this could cause the timeout to trigger prematurely/immediately
where ctdb would incorrectly think that more than 60 seconds had passed
since last successful communications and thus abort.

Instead of checking for one timeout occurring, only deem the recovery
daemon to be "down" and trigger a shutdown if communications have
timed out for three intervals in a row.
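
The "three intervals in a row" rule is a consecutive-failure counter, sketched below. The function name and the ok/miss vocabulary are assumptions for illustration; the real check is implemented in the ctdb daemon, not in shell.

```sh
# Hypothetical helper: takes one argument per interval ("ok" or "miss")
# and prints "shutdown" once 3 intervals in a row were missed,
# otherwise "alive".
recd_watchdog () {
    misses=0
    for outcome in "$@"; do
        if [ "$outcome" = ok ]; then
            misses=0            # any success resets the counter
        else
            misses=$((misses + 1))
        fi
        if [ "$misses" -ge 3 ]; then
            echo shutdown
            return 0
        fi
    done
    echo alive
}
```

A single missed interval, such as the one a forward clock jump could cause, no longer triggers a shutdown on its own, because the counter resets on the next successful communication.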


Ronnie Sahlberg [Mon, 15 Sep 2008 23:00:48 +0000 (09:00 +1000)]

fix a slow memory leak in the recovery daemon in the error paths for the
memdump function


Ronnie Sahlberg [Mon, 15 Sep 2008 21:55:57 +0000 (07:55 +1000)]

fix some slow memory leaks in the vacuuming handler in the recovery
daemon


Ronnie Sahlberg [Mon, 15 Sep 2008 20:50:28 +0000 (06:50 +1000)]

From Volker L
Fix a slow memory leak in the recovery daemon if there is a recovery
triggered during the public ip reassignment process
