Martin Schwenke [Mon, 15 Dec 2008 06:52:12 +0000 (17:52 +1100)]
3 new tests. 24_ctdb_getdbmap.sh is only 1/2 implemented but does
something vaguely useful. ctdb_test_exit unsets $ctdb_test_exit_hook.
Fix bug in 17_ctdb_config_delete_ip.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Dec 2008 07:44:21 +0000 (18:44 +1100)]
Add a recovery to ctdb_test_exit to improve test stability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Dec 2008 06:25:38 +0000 (17:25 +1100)]
Rename $CTDB_NUM_NODES to $CTDB_TEST_NUM_DAEMONS and only set it if
$CTDB_TEST_REAL_CLUSTER is not set. After a ctdb restart, force a
recovery to help the tests that follow.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Dec 2008 04:39:53 +0000 (15:39 +1100)]
Merge commit 'origin/master' into martins
Ronnie Sahlberg [Thu, 11 Dec 2008 22:39:55 +0000 (09:39 +1100)]
New version 1.0.68
Michael Adam [Wed, 10 Dec 2008 21:27:36 +0000 (22:27 +0100)]
Improve the monitor event test for ethernet interfaces (link detection).
On some systems, the ethtool link detection is not successful when a
cable is plugged but the interface has not been brought up previously.
This improves the test by bringing the interface up (without checking
for success here) and trying the ethtool test again afterwards.
Michael
Michael Adam [Wed, 10 Dec 2008 21:19:31 +0000 (22:19 +0100)]
Use "grep -q" instead of "grep ... > /dev/null" in events.d/10.interfaces
This enhances readability.
Michael
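
The two forms give the same exit status; "grep -q" simply prints nothing. A minimal sketch, using a hypothetical helper name and sample input rather than the actual code in events.d/10.interfaces:

```sh
# Hypothetical helper: succeeds if the given text contains the pattern.
# "grep -q" prints nothing and reports the match through its exit status,
# so no redirection to /dev/null is needed.
has_pattern() {
    # $1 = pattern, $2 = text to search
    printf '%s\n' "$2" | grep -q "$1"
}

# Sample input only; not taken from a real ethtool run.
if has_pattern "Link detected: yes" "Link detected: yes"; then
    echo "link is up"
fi
```
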
Martin Schwenke [Thu, 11 Dec 2008 07:14:17 +0000 (18:14 +1100)]
Add message about restart to 18_ctdb_freeze.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 10 Dec 2008 05:13:42 +0000 (16:13 +1100)]
With local daemons the sockets are now numbered starting from 0. Fix
setup of local daemons so that it correctly assigns no public IPs to a
single node each time. Separate out daemon_setup so that the
selection of the node with no public IPs is only done once at the
beginning of testing. Clean up all current tests, mostly with a view
to ensuring that a node selected for testing some kind of failover
actually has public addresses assigned. Reenabled 01_ctdb_version.sh
- it now passes if rpm doesn't do anything useful on the node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
root [Wed, 10 Dec 2008 01:06:51 +0000 (12:06 +1100)]
update the "ctdb recover" command.
block and wait until the cluster has completed the recovery before returning.
this makes it easier to script since it avoids the common need for
ctdb recover
... complex loop to wait for recovery to complete ...
script continues
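
A sketch of the kind of polling loop this change makes unnecessary. The names wait_until and recovery_is_complete are hypothetical and used only for illustration; only "ctdb recover" comes from the commit above.

```sh
# Hypothetical helper: retry a check command once per second until it
# succeeds or the given number of attempts has been used up.
wait_until() {
    # $1 = maximum attempts, remaining arguments = command to run
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep 1
    done
    return 1
}

# Before this change a script needed something like:
#     ctdb recover
#     wait_until 60 recovery_is_complete   # recovery_is_complete is hypothetical
# Now that "ctdb recover" blocks, this is enough:
#     ctdb recover
```
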
root [Wed, 10 Dec 2008 01:01:19 +0000 (12:01 +1100)]
add a CTDB_TIMEOUT variable for the ctdb tool.
If set, this specifies the maximum runtime for the ctdb tool before it will terminate with status == 20
Just like the -T ... option would.
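
A minimal sketch of how a script might use this. The wrapper name run_ctdb_with_timeout is hypothetical; the CTDB_TIMEOUT variable and exit status 20 are the ones named in the commit message.

```sh
# Hypothetical wrapper: run a ctdb command with a time limit and report
# whether it timed out. Status 20 is the timeout status from the commit
# message above.
run_ctdb_with_timeout() {
    # $1 = limit passed through CTDB_TIMEOUT, remaining arguments = ctdb arguments
    limit=$1
    shift
    status=0
    CTDB_TIMEOUT=$limit ctdb "$@" || status=$?
    if [ "$status" -eq 20 ]; then
        echo "ctdb $* timed out (limit: $limit)" >&2
    fi
    return "$status"
}

# Possible usage:
#     run_ctdb_with_timeout 30 status || echo "ctdb status failed"
```
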
root [Wed, 10 Dec 2008 00:49:51 +0000 (11:49 +1100)]
make sure we return an error code when the ctdb command has hung and is timed out by the -T <timeout> setting
Martin Schwenke [Wed, 10 Dec 2008 00:42:02 +0000 (11:42 +1100)]
Merge commit 'origin/master' into martins
Martin Schwenke [Wed, 10 Dec 2008 00:32:24 +0000 (11:32 +1100)]
Merge commit 'origin/master' into martins
Martin Schwenke [Wed, 10 Dec 2008 00:22:59 +0000 (11:22 +1100)]
Merge commit 'origin/master' into martins
Martin Schwenke [Tue, 9 Dec 2008 07:20:11 +0000 (18:20 +1100)]
Added use of $ctdb_test_exit_hook to function ctdb_test_exit. Removed
sleeps from ban/unban tests. Now expect "ctdb ping" to return false
if it fails, so made relevant change to 09_ctdb_ping.sh. New
functions install_eventscript and uninstall_eventscript. New
setup/cleanup tests 00_ctdb_install_eventscript.sh and
99_ctdb_uninstall_eventscript.sh. New test 21_ctdb_disablemonitor.sh,
which is incredibly complex.
Signed-off-by: Martin Schwenke <martin@meltin.net>
root [Tue, 9 Dec 2008 01:03:42 +0000 (12:03 +1100)]
add a helper that waits until the cluster is no longer in recovery mode and returns the generation number.
change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.
also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns. this makes it much easier to script things ...
Martin Schwenke [Tue, 9 Dec 2008 00:46:34 +0000 (11:46 +1100)]
Merge commit 'origin/master' into martins
root [Mon, 8 Dec 2008 23:45:14 +0000 (10:45 +1100)]
update to the flags handling
make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node
root [Mon, 8 Dec 2008 06:29:17 +0000 (17:29 +1100)]
If ctdbd was started with the --socket option then we also set the CTDB_SOCKET variable so that the eventscripts can pick up the name proper
Martin Schwenke [Mon, 8 Dec 2008 06:03:50 +0000 (17:03 +1100)]
Merge commit 'origin/master' into martins
root [Mon, 8 Dec 2008 01:57:40 +0000 (12:57 +1100)]
return -1 if ctdb ping failed
Martin Schwenke [Sun, 7 Dec 2008 21:57:46 +0000 (08:57 +1100)]
Merge commit 'origin/master' into martins
Martin Schwenke [Sun, 7 Dec 2008 21:15:18 +0000 (08:15 +1100)]
When running with local daemons, provided there are more than 2 of
them, randomly pick a single node that will not have any public IPs
assigned. This will make life a bit more interesting and will
simulate what happens on real clusters with a management node. Some
tests were disabling a node to implicitly trigger a ctdb restart - now
use an explicit restart of ctdb when it is required.
17_ctdb_config_delete_ip.sh now randomly chooses a public IP on any
node to disable - this works around a problem where the hardcoded node
might not have any public addresses.
Signed-off-by: Martin Schwenke <martin@meltin.net>
root [Fri, 5 Dec 2008 05:32:30 +0000 (16:32 +1100)]
redo and update how we synchronize flags across the cluster.
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.
root [Thu, 4 Dec 2008 23:33:38 +0000 (10:33 +1100)]
some platforms are very picky about the third argument passed to bind()
and would complain if sa.family is AF_INET and the third argument is not exactly the size of a sockaddr_in.
We used to pass a union containing both a sockaddr_in and a sockaddr_in6, which meant that on those platforms bind() would fail since the passed structure for AF_INET would be too big.
Thus we need to set and pass the appropriate size to bind(). At the same time, for those platforms we can also set sin[6]_len to the expected size.
(bind() on those platforms was, surprisingly, perfectly ok when sin_len was "too big")
Martin Schwenke [Thu, 4 Dec 2008 06:19:51 +0000 (17:19 +1100)]
New test for getmonmode. Overload node_has_status some more to
support checking the monitoring mode.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 4 Dec 2008 04:25:03 +0000 (15:25 +1100)]
new version 1.0.67
root [Thu, 4 Dec 2008 04:03:40 +0000 (15:03 +1100)]
fix an incorrect path
Martin Schwenke [Thu, 4 Dec 2008 03:42:04 +0000 (14:42 +1100)]
Merge commit 'origin/master' into martins
Ronnie Sahlberg [Thu, 4 Dec 2008 03:35:00 +0000 (14:35 +1100)]
add a description of the recovery-process
Martin Schwenke [Wed, 3 Dec 2008 07:08:21 +0000 (18:08 +1100)]
ctdb_test_init now contains a trap to force ctdb_test_exit to be run
if the shell exits and ctdb_test_exit cancels this trap. This means
that a testcase executing under set -e will call ctdb_test_exit on
failure, allowing the cluster to be restarted if necessary so that
following tests can complete successfully. ctdb_test_exit now
respects $?, so a test will fail if the last thing executed before
ctdb_test_exit failed - this probably means the above trap was
triggered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
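
The trap handling described above can be sketched as follows. The demo_* functions are hypothetical stand-ins for ctdb_test_init and ctdb_test_exit and show only the trap logic, not the real cluster cleanup.

```sh
# Hypothetical stand-in for ctdb_test_exit.
demo_test_exit() {
    status=$?          # respect the status of the last command executed
    trap - 0           # cancel the trap so cleanup runs only once
    echo "cleanup, status=$status"
    exit "$status"
}

# Hypothetical stand-in for ctdb_test_init.
demo_test_init() {
    trap demo_test_exit 0   # run demo_test_exit whenever the shell exits
}

# A test body running under "set -e" in its own subshell: the failing
# command makes the shell exit, the trap fires, and cleanup still runs.
run_demo_test() (
    set -e
    demo_test_init
    false
    echo "not reached"
)
```

Note that "set -e" is ignored when the function is called as part of an "if" condition or a "||" list, so run_demo_test should be called as a plain command.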
Martin Schwenke [Wed, 3 Dec 2008 04:48:24 +0000 (15:48 +1100)]
$PATH only includes $CTDB_DIR/bin if we're using local sockets. Rename
$TEST_WRAP to $CTDB_TEST_WRAPPER - value now set using
$CTDB_TEST_REMOTE_SCRIPTS_DIR if that is set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Tue, 2 Dec 2008 03:08:10 +0000 (14:08 +1100)]
print the list of valid debug level literals when an invalid debug level
is specified in 'ctdb setdebug'
Ronnie Sahlberg [Tue, 2 Dec 2008 02:26:30 +0000 (13:26 +1100)]
redesign how reloadnodes is implemented.
modify the transport methods to allow restarting individual connections
and set up destructors properly.
only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.
make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded
root [Fri, 28 Nov 2008 00:29:43 +0000 (11:29 +1100)]
debuglevel is a signed int, not unsigned.
Ronnie Sahlberg [Thu, 27 Nov 2008 22:52:26 +0000 (09:52 +1100)]
make it possible to delete an ip from all nodes at once using
"ctdb delip x.x.x.x -n all"
This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.
Thus we first delete the ip from all connected nodes which are not
currently hosting it.
After this we delete the ip from the node which is hosting it.
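
The ordering described above can be sketched as follows. This illustrates the order only and is not the code in the ctdb tool; delip_order is a hypothetical helper and node numbers are plain integers.

```sh
# Hypothetical helper: print the order in which nodes would be asked to
# delete an IP: every node that is not hosting it first, the hosting
# node last.
delip_order() {
    # $1 = node currently hosting the IP, remaining arguments = connected nodes
    hosting=$1
    shift
    for node in "$@"; do
        [ "$node" = "$hosting" ] || echo "$node"
    done
    echo "$hosting"
}

# Possible usage: node 1 hosts the IP in a three-node cluster.
#     delip_order 1 0 1 2
```
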
Martin Schwenke [Thu, 27 Nov 2008 07:11:22 +0000 (18:11 +1100)]
4 new tests. Hacked function node_has_status to support
frozen/unfrozen via ctdb statistics command.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 25 Nov 2008 06:53:28 +0000 (17:53 +1100)]
4 new tests. Marked more ctdbd.sh tests as done - will remove this
file soon. Simplify 06_ctdb_getpid.sh by using -v option to
try_command_on_node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 24 Nov 2008 08:06:02 +0000 (19:06 +1100)]
new version 1.0.66
Martin Schwenke [Mon, 24 Nov 2008 06:47:09 +0000 (17:47 +1100)]
New test 09_ctdb_ping.sh. Add documentation and command-line
processing to all tests. New script ctdb_test_env sets up environment
for tests, is now sourced by run_tests, and can also take a test on
the command-line, complete with options. Various cleanups and
improvements. Document tests that have been properly implemented in
ctdbd.sh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Nov 2008 08:12:22 +0000 (19:12 +1100)]
Incorporate temporary patch from Ronnie that adds --nopublicipcheck
option to ctdbd. Commit here because it seems to work.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 21 Nov 2008 08:01:48 +0000 (19:01 +1100)]
Move tests/*.c to tests/src/*.c and adjust Makefile.in accordingly.
Move setting of $CTDB_NODES_SOCKETS to tests/scripts/run_tests and
make it only happen if $CTDB_TEST_REAL_CLUSTER is not set. Bugfix in
function ips_are_on_nodeglob. New/proper implementations of functions
stop_daemons and start_daemons, now called by function restart_ctdb.
In start_daemons.sh, add public addresses file generation/usage, use
new option --nopublicipcheck to ctdbd to avoid crazy behaviour and
kill ctdbd more carefully to avoid killing real daemons on a real
cluster - this should be able to coexist on a node of a real cluster.
start_daemons.sh is temporarily incompatible with start_daemons
function, but expecting to replace that script with function calls
very soon anyway...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Fri, 21 Nov 2008 05:24:12 +0000 (16:24 +1100)]
allow changing the recmaster even if the database is not frozen
Martin Schwenke [Fri, 21 Nov 2008 02:00:37 +0000 (13:00 +1100)]
Merge commit 'origin/master' into martins
Ronnie Sahlberg [Fri, 21 Nov 2008 00:30:32 +0000 (11:30 +1100)]
remove two variables no longer used from the example sysconfig file
Andrew Tridgell [Thu, 20 Nov 2008 21:05:59 +0000 (08:05 +1100)]
fixed problem with looping ctdb recoveries
After a node failure, GPFS can get into a state where non-blocking
fcntl() locks can take a long time. This results in the ctdb set_recmode
test timing out, which leads to a recovery failure, and a new
recovery. The recovery loop can last a long time.
The fix is to consider a fcntl timeout as a success of this test. The
test is to see that we can't lock the shared reclock file, so a
timeout is fine for a success.
Andrew Tridgell [Thu, 20 Nov 2008 10:23:26 +0000 (21:23 +1100)]
Merge commit 'ronnie/master'
Martin Schwenke [Thu, 20 Nov 2008 09:40:01 +0000 (20:40 +1100)]
Add some simple tests that can be run from within the tree.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 20 Nov 2008 05:39:56 +0000 (16:39 +1100)]
don't override/change CTDB_BASE if it is already set by the shell
Ronnie Sahlberg [Thu, 20 Nov 2008 02:35:08 +0000 (13:35 +1100)]
Keepalive packets were only sent every KeepaliveInterval if the socket
had been completely idle during that interval.
If we had been sending other packets such as Messages, Calls or Controls
there wouldn't be any need for an explicit keepalive and thus we didn't
send one.
This does make it somewhat awkward when analyzing traces since it is
non-intuitive when keepalives are sent and when they are not sent.
Change the keepalive logic to always send a keepalive regardless of
whether the link is idle or not.
Ronnie Sahlberg [Wed, 19 Nov 2008 03:43:46 +0000 (14:43 +1100)]
rewrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.
Martin Schwenke [Wed, 19 Nov 2008 02:21:07 +0000 (13:21 +1100)]
Merge branch 'master' into martins
Ronnie Sahlberg [Wed, 12 Nov 2008 23:55:20 +0000 (10:55 +1100)]
new version 1.0.65
update the example sysconfig file. the default log level is 2, not 0
Ronnie Sahlberg [Tue, 11 Nov 2008 03:49:30 +0000 (14:49 +1100)]
add a CTDB_SOCKET variable that can be used to override the default
/tmp/ctdb.socket
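
A script that needs the socket path can apply the same fallback. A minimal sketch, with a hypothetical helper name:

```sh
# Hypothetical helper: print the socket path a script should use.
# /tmp/ctdb.socket is the default named in the commit message above.
ctdb_socket_path() {
    echo "${CTDB_SOCKET:-/tmp/ctdb.socket}"
}
```
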
Ronnie Sahlberg [Mon, 3 Nov 2008 10:54:52 +0000 (21:54 +1100)]
we actually need a ctdb_db variable
Ronnie Sahlberg [Thu, 30 Oct 2008 02:34:10 +0000 (13:34 +1100)]
latency is measured in us, not ms
use an explicit ctdb_db variable instead of dereferencing state
Ronnie Sahlberg [Thu, 30 Oct 2008 01:49:53 +0000 (12:49 +1100)]
add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a threshold
Ronnie Sahlberg [Wed, 22 Oct 2008 00:06:18 +0000 (11:06 +1100)]
new version 1.0.64
Ronnie Sahlberg [Wed, 22 Oct 2008 00:04:41 +0000 (11:04 +1100)]
add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses
Ronnie Sahlberg [Sun, 19 Oct 2008 22:47:54 +0000 (09:47 +1100)]
new version 1.0.63
Ronnie Sahlberg [Sun, 19 Oct 2008 22:45:15 +0000 (09:45 +1100)]
don't log "running periodic cleanup" ...
Ronnie Sahlberg [Fri, 17 Oct 2008 10:38:42 +0000 (21:38 +1100)]
null out the pointer before we reload the nodes file
Ronnie Sahlberg [Fri, 17 Oct 2008 10:18:06 +0000 (21:18 +1100)]
when we reload the nodes file, we may need to reload the nodes file
inside the recovery daemon as well.
Ronnie Sahlberg [Thu, 16 Oct 2008 22:02:03 +0000 (09:02 +1100)]
make it possible to set the script log level in CTDB sysconfig
Ronnie Sahlberg [Thu, 16 Oct 2008 20:56:12 +0000 (07:56 +1100)]
specify a "script log level" on the commandline to set under which log
level any/all output from eventscripts will be logged as
Ronnie Sahlberg [Thu, 16 Oct 2008 06:59:55 +0000 (17:59 +1100)]
new version 1.0.62
Ronnie Sahlberg [Thu, 16 Oct 2008 06:57:50 +0000 (17:57 +1100)]
allow multiple eventscripts using the same prefix.
this eases the pain for users that use out of tree eventscripts
Martin Schwenke [Thu, 16 Oct 2008 03:15:15 +0000 (14:15 +1100)]
Merge commit 'origin/master' into martins
Andrew Tridgell [Thu, 16 Oct 2008 01:58:25 +0000 (12:58 +1100)]
Merge commit 'ronnie/master'
Ronnie Sahlberg [Wed, 15 Oct 2008 05:40:44 +0000 (16:40 +1100)]
new version 1.0.61
Ronnie Sahlberg [Wed, 15 Oct 2008 05:29:09 +0000 (16:29 +1100)]
install the new multipath monitoring event script
Ronnie Sahlberg [Wed, 15 Oct 2008 05:27:33 +0000 (16:27 +1100)]
add an eventscript to monitor that the multipath devices are healthy
Ronnie Sahlberg [Tue, 14 Oct 2008 21:33:37 +0000 (08:33 +1100)]
we must also check the status returned from the get tickles control to
determine whether it was successful or not
Ronnie Sahlberg [Tue, 14 Oct 2008 16:02:09 +0000 (03:02 +1100)]
lower the loglevel for the informational message that a TCP_ADD operation
described an ip address not known to be a public address.
This could happen if someone for genuine reasons accesses a share
through a static ip address.
It can also happen if non-homogeneous public address configurations are
used and when a tcp description is pushed out to a different node that
does not serve/know the specific ip address.
Ronnie Sahlberg [Tue, 14 Oct 2008 14:49:19 +0000 (01:49 +1100)]
change ip route add to route add -net since this works more reliably
update the makefile and rpm to install 99.routing
Ronnie Sahlberg [Tue, 14 Oct 2008 14:32:46 +0000 (01:32 +1100)]
new version 1.0.60
Ronnie Sahlberg [Tue, 14 Oct 2008 14:23:57 +0000 (01:23 +1100)]
verify that the nodes we try to ban/unban are operational and print an
error to the user otherwise.
Ronnie Sahlberg [Tue, 14 Oct 2008 14:08:29 +0000 (01:08 +1100)]
Revert "from Mathieu Parent <math.parent@gmail.com>"
This reverts commit
dc9cd4779db4a89697731e4cf415be51067a07c1.
Conflicts:
Ronnie Sahlberg [Tue, 14 Oct 2008 13:24:44 +0000 (00:24 +1100)]
update the client side of getnodemap and getpublicips controls to
fallback to the old-style ipv4-only controls if the new-style ipv4/ipv6
control fails.
this allows a 1.0.59+ (ipv4/ipv6) ctdb daemon being recmaster to be
compatible with pre-1.0.59 versions of ctdb that are ipv4 only.
Ronnie Sahlberg [Mon, 13 Oct 2008 23:40:29 +0000 (10:40 +1100)]
update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an
older ipv4-only version of these controls.
We need this so that we are backward compatible with old versions of ctdb
and so that we can interoperate with an ipv4-only recmaster during a
rolling upgrade.
Ronnie Sahlberg [Sun, 12 Oct 2008 21:27:33 +0000 (08:27 +1100)]
from Mathieu Parent <math.parent@gmail.com>
Hi,
I have attached a patch necessary as debian log dir (/var/log) is not
a subdir of VARDIR (/var/lib on rpm systems, /var/lib/ctdb on debian).
As I don't know much about autotools and friends, this patch may be
hacky.
This is part of the process to minimize diff between distributions.
Ronnie Sahlberg [Sun, 12 Oct 2008 21:21:20 +0000 (08:21 +1100)]
From Mathieu Parent
patch to make debian systems log the package versions in
ctdb_diagnostics
Andrew Tridgell [Thu, 9 Oct 2008 07:45:12 +0000 (18:45 +1100)]
added some more gpfs commands per-filesystem
Ronnie Sahlberg [Tue, 7 Oct 2008 08:34:34 +0000 (19:34 +1100)]
skip empty lines in the public addresses file, rather than skipping all
non-empty lines
Ronnie Sahlberg [Tue, 7 Oct 2008 08:25:10 +0000 (19:25 +1100)]
from Michael Adam: allow #-style comments in the nodes and public
addresses file
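
The filtering that this commit and the fix above it describe can be illustrated as follows. This is only a shell sketch under the assumption that a comment line starts with "#" in the first column; the actual parsing lives in the ctdb C code, and strip_comments is a hypothetical name.

```sh
# Hypothetical helper: print the meaningful lines of a nodes or public
# addresses file read from standard input, dropping empty lines and
# lines that start with "#".
strip_comments() {
    while IFS= read -r line; do
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done
}

# Possible usage (the path is an example only):
#     strip_comments < /etc/ctdb/nodes
```
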
Ronnie Sahlberg [Tue, 7 Oct 2008 07:23:12 +0000 (18:23 +1100)]
new version 1.0.59
Ronnie Sahlberg [Tue, 7 Oct 2008 07:14:44 +0000 (18:14 +1100)]
remove an unused variable
Ronnie Sahlberg [Tue, 7 Oct 2008 07:12:54 +0000 (18:12 +1100)]
When we reload the nodes file
instead of shutting down/restarting the entire tcp layer
just bounce all outgoing connections and reconnect
Ronnie Sahlberg [Tue, 7 Oct 2008 00:03:30 +0000 (11:03 +1100)]
add a new eventscript : 99.routing that is used to add static routes to
interfaces when they are activated (an ip address is added during
takeip)
Andrew Tridgell [Tue, 30 Sep 2008 14:16:17 +0000 (07:16 -0700)]
The author of the upstream code asked for this code to be GPLv2+ not GPLv3
Andrew Tridgell [Tue, 30 Sep 2008 14:09:06 +0000 (07:09 -0700)]
merged a bugfix for the idtree code from the Linux kernel. This
matches commit
7aae6dd80e265aa9402ed507caaff4a5dba55069 in the kernel.
Many thanks to Jim Houston for pointing out this fix to us
Ronnie Sahlberg [Mon, 22 Sep 2008 15:38:28 +0000 (01:38 +1000)]
Check that a database exists first before we dump its content (and
implicitly also create it) using 'ctdb catdb'
Martin Schwenke [Wed, 17 Sep 2008 20:33:48 +0000 (06:33 +1000)]
Merge commit 'origin/master' into martins
Andrew Tridgell [Wed, 17 Sep 2008 11:00:04 +0000 (21:00 +1000)]
expanded ctdb_diagnostics based on recent experience
Ronnie Sahlberg [Wed, 17 Sep 2008 04:24:12 +0000 (14:24 +1000)]
use the correct tunable failcount not timeout
Ronnie Sahlberg [Wed, 17 Sep 2008 04:17:41 +0000 (14:17 +1000)]
The ctdb daemon keeps track of whether the recovery process is running
correctly by measuring how long it was since the last successful
communication with the recovery daemon was recorded.
After a certain timeout the ctdb daemon would deem the recovery daemon
as inoperable and shut down.
If the system clock is suddenly changed forward by many (60 or more)
seconds this could cause the timeout to trigger prematurely/immediately
where ctdb would incorrectly think that more than 60 seconds had passed
since last successful communications and thus abort.
Instead of checking for one timeout occurring, only deem the recovery
daemon to be "down" and trigger a shutdown if communications have
timed out for three intervals in a row.
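
The "three intervals in a row" rule can be illustrated with a small counter. This is a sketch of the idea only, not the daemon's C code; the variable and function names are hypothetical.

```sh
# Hypothetical sketch: count consecutive missed intervals and only report
# the recovery daemon as down after three in a row. Any successful
# interval resets the counter, so a single clock jump is not enough.
missed_intervals=0

record_interval() {
    # $1 = "ok" if the recovery daemon was heard from in this interval
    if [ "$1" = "ok" ]; then
        missed_intervals=0
    else
        missed_intervals=$((missed_intervals + 1))
    fi
    [ "$missed_intervals" -lt 3 ]   # fails once three intervals are missed
}
```
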
Ronnie Sahlberg [Mon, 15 Sep 2008 23:00:48 +0000 (09:00 +1000)]
fix a slow memory leak in the recovery daemon in the error paths for the
memdump function
Ronnie Sahlberg [Mon, 15 Sep 2008 21:55:57 +0000 (07:55 +1000)]
fix some slow memory leaks in the vacuuming handler in the recovery
daemon
Ronnie Sahlberg [Mon, 15 Sep 2008 20:50:28 +0000 (06:50 +1000)]
From Volker L
Fix a slow memory leak in the recovery daemon if there is a recovery
triggered during the public ip reassignment process