Martin Schwenke [Fri, 19 Apr 2013 03:05:02 +0000 (13:05 +1000)]
ctdbd: New control CTDB_CONTROL_IPREALLOCATED
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
27a44685f0d7a88804b61a1542bb42adc8f88cb1)
Martin Schwenke [Tue, 30 Apr 2013 07:22:23 +0000 (17:22 +1000)]
ctdbd: Avoid freeing non-monitor event callback when monitoring is disabled
When running a non-monitor event, a check is made for any active monitor
events. If there is an active monitor event, then it is cancelled. This
is done by freeing state->callback, which is allocated from
monitor_context.
When CTDB is stopped or shut down, monitoring is disabled by freeing
monitor_context, which frees the callback, and then the stopped or
shutdown event is run. This creates a new callback structure, which is
allocated at exactly the same memory location as the monitor callback
that was just freed. So the check for active monitor events frees the
new callback belonging to the non-monitor event. Since that callback
function is what flags successful completion of the event, the event is
never marked complete and CTDB is stuck in a loop waiting for it.
Move the monitor cancellation to the top of the function so that this
can't happen.
The following log snippet highlights the problem.
2013/04/30 16:54:10.673807 [21505]: Received SHUTDOWN command. Stopping CTDB daemon.
2013/04/30 16:54:10.673814 [21505]: Shutting down recovery daemon
2013/04/30 16:54:10.673852 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
2013/04/30 16:54:10.673858 [21505]: Monitoring has been stopped
2013/04/30 16:54:10.673899 [21505]: server/eventscript.c:594 Sending SIGTERM to child pid:23847
2013/04/30 16:54:10.673913 [21505]: server/eventscript.c:629 searching for callback 0x1c6d5c0
2013/04/30 16:54:10.673932 [21505]: server/eventscript.c:641 running callback
2013/04/30 16:54:10.673939 [21505]: server/eventscript.c:866 in event_script_callback
2013/04/30 16:54:10.673946 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
05f785b51cfd8b22b3ae35bf034127fbc07005be)
Martin Schwenke [Wed, 20 Feb 2013 23:43:35 +0000 (10:43 +1100)]
recoverd: Interface reference count changes should not cause takeover runs
At the moment a naive comparison of all the interface data is done.
So, if any IPs move then the reference counts for the relevant
interfaces change, the interfaces appear to have changed, and another
takeover run is initiated by each node that took/released IPs.
This change stops the spurious takeover runs by changing the interface
comparison to ignore the reference counts.
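A minimal sketch of such a comparison, assuming a simplified, hypothetical iface_info structure (name, link state, reference count) and that both lists are in the same order; the real CTDB structures and comparison code differ:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified interface record; not CTDB's real structure. */
struct iface_info {
	char name[16];
	uint16_t link_state;   /* 1 = up, 0 = down */
	uint32_t references;   /* number of public IPs hosted on it */
};

/* Returns 1 if the two interface lists differ in anything other than
 * the reference counts, 0 otherwise. */
static int interfaces_changed(const struct iface_info *a, size_t na,
			      const struct iface_info *b, size_t nb)
{
	size_t i;

	if (na != nb) {
		return 1;
	}
	for (i = 0; i < na; i++) {
		/* Compare name and link state; deliberately ignore
		 * 'references', which changes whenever IPs move. */
		if (strcmp(a[i].name, b[i].name) != 0 ||
		    a[i].link_state != b[i].link_state) {
			return 1;
		}
	}
	return 0;
}
```

With this comparison, moving an IP from eth0 to eth1 only changes the reference counts and no longer looks like an interface change, while a link going down still does.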
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
0b7257642f62ebd83c05b6e2922f0dc2737f175c)
Michael Adam [Fri, 19 Apr 2013 14:24:32 +0000 (16:24 +0200)]
recover: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
b5a8791268e938d7e017056e0e2bd2cbec1fa690)
Michael Adam [Fri, 19 Apr 2013 14:23:16 +0000 (16:23 +0200)]
ctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
c7eab97c7a939710b73aae2d75b404b235a998f5)
Michael Adam [Fri, 19 Apr 2013 14:22:49 +0000 (16:22 +0200)]
ctdb_call: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f99eb2f56d8ca27110a45ae0e1c4bff40ac7a60e)
Michael Adam [Fri, 19 Apr 2013 14:09:34 +0000 (16:09 +0200)]
vacuum: use CTDB_REC_RO_FLAGS in the vacuuming code
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a62775334aa20d1d850d2df705eb70303b04ac5c)
Michael Adam [Fri, 19 Apr 2013 13:55:38 +0000 (15:55 +0200)]
ltdb_server: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
61f17e53576197def46bc61fdf0cdb5282333a3e)
Michael Adam [Fri, 19 Apr 2013 14:01:45 +0000 (16:01 +0200)]
include: define CTDB_REC_RO_FLAGS - all read-only related record flags
This is used for some checks
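A sketch of such a combined mask and a typical check. The individual flag names follow CTDB's read-only record flags, but the bit values shown here are assumptions for illustration only, and record_has_ro_flags() is a hypothetical helper:

```c
#include <stdint.h>

/* Bit values are assumed for this sketch; the authoritative definitions
 * live in CTDB's protocol header. */
#define CTDB_REC_RO_HAVE_DELEGATIONS   0x01000000
#define CTDB_REC_RO_HAVE_READONLY      0x02000000
#define CTDB_REC_RO_REVOKING_READONLY  0x04000000
#define CTDB_REC_RO_REVOKE_COMPLETE    0x08000000

/* One mask covering every read-only related record flag. */
#define CTDB_REC_RO_FLAGS (CTDB_REC_RO_HAVE_DELEGATIONS | \
			   CTDB_REC_RO_HAVE_READONLY | \
			   CTDB_REC_RO_REVOKING_READONLY | \
			   CTDB_REC_RO_REVOKE_COMPLETE)

/* Hypothetical helper: returns 1 if any read-only related flag is set,
 * e.g. so that a caller such as vacuuming can skip the record. */
static int record_has_ro_flags(uint32_t flags)
{
	return (flags & CTDB_REC_RO_FLAGS) != 0;
}
```

A single mask means call sites test one expression instead of repeating a four-way OR, and a new read-only flag only has to be added in one place.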
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
c7924ce6404bb18641b00d5fbd2fe9da9aaf7959)
Michael Adam [Fri, 22 Feb 2013 15:12:17 +0000 (16:12 +0100)]
vacuum: Update (C)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
61264debba58355b9716ac1637fdedef5ed249c8)
Michael Adam [Sat, 29 Dec 2012 16:23:27 +0000 (17:23 +0100)]
vacuum: extend the header comment for ctdb_process_delete_list()
Describe the (new) process more precisely.
Also mention that this is the last step of the vacuuming process
that is performed on the lmaster.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
06de786c786f1cab4c6721adf47c2cb1e8a72adb)
Michael Adam [Sat, 5 Jan 2013 00:20:18 +0000 (01:20 +0100)]
vacuum: turn the vacuuming on lmaster into a three-phase process.
More precisely, before locally deleting an empty record that has been
migrated with data and for which we are both dmaster and lmaster, we now
perform the deletion on the other nodes in two steps instead of a single step.
- First send out the list of records to be deleted to all
other nodes with the new RECEIVE_RECORDS control to store
the lmaster's current empty copy.
- Then send those records that could be deleted on all nodes
to all nodes again with the TRY_DELETE_RECORDS control
as before for deletion.
- Finally delete those records locally that were successfully
deleted remotely in the previous step.
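The three phases can be modelled as follows. This is a simulation only: remote_results, MAX_NODES, MAX_RECORDS and the boolean outcome arrays are hypothetical stand-ins for the replies to the RECEIVE_RECORDS and TRY_DELETE_RECORDS controls, not CTDB's actual data structures:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES   8
#define MAX_RECORDS 16

/* Hypothetical per-node outcomes for each record. */
struct remote_results {
	bool stored[MAX_NODES][MAX_RECORDS];   /* RECEIVE_RECORDS succeeded */
	bool deleted[MAX_NODES][MAX_RECORDS];  /* TRY_DELETE_RECORDS succeeded */
};

/* Returns the number of records deleted locally; delete_local[r] is set
 * for each record that passed both remote phases.
 * num_nodes <= MAX_NODES and num_records <= MAX_RECORDS are assumed. */
static size_t vacuum_three_phase(const struct remote_results *res,
				 size_t num_nodes, size_t num_records,
				 bool delete_local[])
{
	bool candidate[MAX_RECORDS];
	size_t n, r, count = 0;

	/* Phase 1: send all records with RECEIVE_RECORDS; keep those
	 * that every other node stored. */
	for (r = 0; r < num_records; r++) {
		candidate[r] = true;
		for (n = 0; n < num_nodes; n++) {
			if (!res->stored[n][r]) {
				candidate[r] = false;
			}
		}
	}
	/* Phase 2: send only the survivors with TRY_DELETE_RECORDS;
	 * keep those that every other node deleted. */
	for (r = 0; r < num_records; r++) {
		for (n = 0; candidate[r] && n < num_nodes; n++) {
			if (!res->deleted[n][r]) {
				candidate[r] = false;
			}
		}
	}
	/* Phase 3: delete locally only those records. */
	for (r = 0; r < num_records; r++) {
		delete_local[r] = candidate[r];
		if (candidate[r]) {
			count++;
		}
	}
	return count;
}
```

With two remote nodes, a record that one node fails to store, or fails to delete, is kept on the lmaster; only records that pass both remote phases are deleted locally.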
This fixes an old race where a recovery that hits the vacuum process
square between the eyes can create gaps in the record's history and
hence let the records resurrect. In the case of locking.tdb,
that could mean that a file that was already closed was recorded as
being open and locked again, so Samba clients were locked out of that
file until Samba was restarted.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
eee23d44b6427be8ab49bbfcee3abb62f37dfcc7)
Michael Adam [Thu, 20 Dec 2012 23:24:47 +0000 (00:24 +0100)]
vacuum: introduce the RECEIVE_RECORDS control
This is in preparation for turning the vacuuming on the lmaster
into a two-phase process:
- First the node sends the list of records to be vacuumed
to all other nodes with this new RECEIVE_RECORDS control.
The remote nodes should store the lmaster's current empty copy.
- Only those records that could be stored on all other nodes
are processed further. They are sent to all other nodes with
the TRY_DELETE_RECORDS control as before for deletion.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e397702e271af38204fd99733bbeba7c1db3a999)
Michael Adam [Sat, 29 Dec 2012 17:32:39 +0000 (18:32 +0100)]
vacuum: reorder some of ctdb_process_delete_list() more intuitively
Now that the nodemap and its talloc children don't hang off of the
delete_records_list talloc context, we can build the nodemap
earlier, and move the construction of the delete_records_list
to where it is more obvious what it is used for.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e3740899c1af6962f93c85ad7d1cb71bddce45c6)
Michael Adam [Sat, 29 Dec 2012 16:16:33 +0000 (17:16 +0100)]
vacuum: add explicit temporary memory context to ctdb_process_delete_list()
This removes the implicit artificial talloc hierarchy and makes the
code easier to understand.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
b7c3b8cdf92c597e621e3dae28b110d321de5ea8)
Michael Adam [Sat, 5 Jan 2013 00:19:06 +0000 (01:19 +0100)]
vacuum: fix indentation in ctdb_process_delete_list()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
59a887e12469266e514ad7d4e34810e7ea888ba3)
Michael Adam [Mon, 17 Dec 2012 16:31:55 +0000 (17:31 +0100)]
vacuum: free temporary allocated memory correctly in ctdb_process_delete_list().
Add a common exit point for cleanup.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
11d728465a9c635e1829abaae17e2f7720433b69)
Michael Adam [Mon, 17 Dec 2012 16:26:22 +0000 (17:26 +0100)]
vacuum: move variable into scope of use in ctdb_process_delete_list()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
3710dd0f313f551f1b302b4961e0203243e3d661)
Michael Adam [Mon, 17 Dec 2012 12:07:21 +0000 (13:07 +0100)]
vacuum: move variable into scope of use in ctdb_process_delete_list()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
4640979b526b6dac69a6a0555bfce75fe0206dac)
Michael Adam [Mon, 17 Dec 2012 12:03:42 +0000 (13:03 +0100)]
vacuum: simplify ctdb_process_delete_list(): reduce indentation
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f3e6e7f8ef22bd70dd2f101d818e2e5ab5ed3cd8)
Michael Adam [Wed, 3 Apr 2013 12:12:27 +0000 (14:12 +0200)]
vacuum: add DEBUG to skip conditions in delete_record_traverse()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
817c77a3d0a3546bf46389cec5f6b54778dd1693)
Michael Adam [Fri, 5 Apr 2013 15:14:43 +0000 (17:14 +0200)]
vacuum: break line for RO-flags check in delete_record_traverse() for readability
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
3f7e35ff0db740cdcb6d27c43a59bb6ca6066efb)
Michael Adam [Mon, 22 Apr 2013 14:21:02 +0000 (10:21 -0400)]
client: fix ctdb_control() to be able to cope with CTDB_CTRL_FLAG_NOREPLY
This was apparently not used before in this context, and hence the bug
was not detected. It becomes necessary when ctdb_local_schedule_for_deletion()
is called from a ctdbd client process (the vacuuming child), which then needs
to send the SCHEDULE_FOR_DELETION control to its parent.
Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e72a5e11845fe445baaee4730bb0bea8588ee9e3)
Amitay Isaacs [Fri, 19 Apr 2013 03:29:04 +0000 (13:29 +1000)]
ctdbd: Set num_clients statistic from ctdb->num_clients
This fixes the problem of "ctdb statisticsreset" clearing the number of
clients even when there are active clients.
Values returned in statistics for frozen, recovering, memory_used are based on
the current state of CTDB and are not maintained as statistics. This should
include num_clients as well.
Currently ctdb->num_clients is unused. So use it to track the number of
clients and fill in the statistics field only when requested.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
dc4ca816630ed44b419108da53421331243fb8c7)
Martin Schwenke [Mon, 22 Apr 2013 03:52:04 +0000 (13:52 +1000)]
ctdbd: Log PID file creation and removal at NOTICE level
Unexpected removal of this file can have serious consequences, so it
is best if this is logged at the default level.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
bfed6a8d1771db3401d12b819204736c33acb312)
Martin Schwenke [Mon, 22 Apr 2013 03:48:06 +0000 (13:48 +1000)]
scripts: Ensure even external scripts get tagged in logs as "ctdbd"
Our practice is to search logs for "ctdbd:". We want to make sure we
find everything.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93)
Martin Schwenke [Sun, 21 Apr 2013 20:52:49 +0000 (06:52 +1000)]
eventscripts: Ensure directories are created
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.
Create those directories instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
0076cfc4666e5a96eb2c8affb59585b090840e00)
Martin Schwenke [Wed, 17 Apr 2013 03:26:04 +0000 (13:26 +1000)]
scripts: Clean up update_tickles() and handling of associated directory
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
700cf95a1f29b4b88460a00a55d57a9e397011e0)
Martin Schwenke [Wed, 17 Apr 2013 03:12:32 +0000 (13:12 +1000)]
scripts: Use $CTDB_SCRIPT_DEBUGLEVEL instead of something more complex
The current logic is horrible and creates an unnecessary file. Let's
make the script debug level independent of ctdbd's debug level.
* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly
* Remove ctdb_set_current_debuglevel()
* Remove the "getdebug" command from ctdb stub in eventscript unit
tests
* Update relevant eventscript unit tests to use
$CTDB_SCRIPT_DEBUGLEVEL
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
85efa446c7f5c5af1c3a960001aa777775ae562f)
Martin Schwenke [Fri, 19 Apr 2013 03:10:27 +0000 (13:10 +1000)]
scripts: Ensure service command is in $PATH in ctdb-crash-cleanup.sh
Move the use of the service command below inclusion of functions file,
which sets $PATH.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
d254d03f69cbdc3e473202b759af6e1392cbb59c)
Martin Schwenke [Mon, 15 Apr 2013 09:15:22 +0000 (19:15 +1000)]
initscript: Remove duplicate setting of $ctdbd
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
e7a4b7e35a1e4b826846e2494a3803abb57065ee)
Martin Schwenke [Tue, 16 Apr 2013 01:40:55 +0000 (11:40 +1000)]
util: Removed unused declaration of ctdbd_start()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
1e989894764e4cd1d551c44784d91cb295cd790d)
Martin Schwenke [Mon, 15 Apr 2013 03:31:42 +0000 (13:31 +1000)]
include: Move ctdb_start_daemon() from ctdb_client.h to ctdb_private.h
It really is internal.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
abb64f62efaa70df4b87c030b96300eafd98e6a3)
Martin Schwenke [Mon, 15 Apr 2013 05:42:55 +0000 (15:42 +1000)]
scripts: ctdb-crash-cleanup.sh uses initscript to see if ctdbd is running
"ctdb ping" can time out. How many times should we try?
Instead, depend on the initscript to implement something sane.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
90cb337e5ccf397b69a64298559a428ff508f196)
Martin Schwenke [Mon, 15 Apr 2013 05:18:12 +0000 (15:18 +1000)]
initscript: Use a PID file to implement the "status" option
Using "ctdb ping" and "ctdb status" is fraught with danger. These
commands can time out when ctdbd is running, leading callers to believe
that ctdbd is not running. Timeouts could be increased but we would
still have to handle potential timeouts.
Everything else in the world implements the "status" option by
checking if the relevant process is running. This change makes CTDB
do the same thing and uses standard distro functions.
This change is backward compatible in the sense that a missing
/var/run/ctdb/ directory means that we don't do a PID file check but
just depend on the distro's checking method. Therefore, if CTDB was
started with an older version of this script then "service ctdb
status" will still work.
This script does not support changing the value of CTDB_VALGRIND
between calls. If you start with CTDB_VALGRIND=yes then you need to
check status with the same setting. CTDB_VALGRIND is a debug
variable, so this is acceptable.
This also adds sourcing of /lib/lsb/init-functions to make the Debian
function status_of_proc() available.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
687e2eace4f48400cf5029914f62b6ddabb85378)
Martin Schwenke [Mon, 15 Apr 2013 03:32:57 +0000 (13:32 +1000)]
ctdbd: Add --pidfile option
Default is not to create a pid file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
996e74d3db0c50f91b320af8ab7c43ea6b1136af)
Martin Schwenke [Mon, 15 Apr 2013 06:14:40 +0000 (16:14 +1000)]
util: ctdb_fork() should call ctdb_set_child_info()
For now we pass NULL as the child name. Later we'll give ctdb_fork()
and friends an extra argument and pass that through.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
ba8866d40125bab06391a17d48ff06a4a9f9da89)
Martin Schwenke [Tue, 16 Apr 2013 01:11:11 +0000 (11:11 +1000)]
util: New functions ctdb_set_child_info() and ctdb_is_child_process()
Must be called by all child processes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
59b019a97aad9a731f9080ea5be14d0dbdfe03d6)
Michael Adam [Wed, 17 Apr 2013 11:08:49 +0000 (13:08 +0200)]
tests: add a comment to recovery db corruption test
The comment explains that we use "ctdb stop" and "ctdb continue"
but we should use "ctdb setrecmasterrole off".
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
06ac62f890299021220214327f1b611c3cf00145)
Amitay Isaacs [Thu, 11 Apr 2013 06:59:36 +0000 (16:59 +1000)]
tests: Add a test for subsequent recoveries corrupting databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
b1577a11d548479ff1a05702d106af9465921ad4)
Amitay Isaacs [Thu, 11 Apr 2013 06:58:34 +0000 (16:58 +1000)]
tests: Support waiting for "recovered" state in tests
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
2438f3a4944f7adbcae4cc1b9d5452714244afe7)
Michael Adam [Wed, 3 Apr 2013 10:02:59 +0000 (12:02 +0200)]
ctdb_call: don't bump the rsn in ctdb_become_dmaster() any more
This is now done in ctdb_ltdb_store_server(), so this
extra bump can be spared.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
cad3107b12e8392f786f9a758ee38cf3a3d58538)
Michael Adam [Wed, 3 Apr 2013 09:40:25 +0000 (11:40 +0200)]
Fix a severe recovery bug that can lead to data corruption for SMB clients.
Problem:
Recovery can under certain circumstances lead to old record copies
resurrecting: Recovery selects the newest record copy purely by RSN. At
the end of the recovery, the recovery master is the dmaster for all
records in all (non-persistent) databases. And the other nodes locally
hold the complete copy of the databases. The bug is that the recovery
process does not increment the RSN on the recovery master at the end of
the recovery. Now clients acting directly on the recovery master will
directly change a record's content on the recmaster without migration
and hence without an RSN bump. So a subsequent recovery cannot tell that
the recmaster's copy is newer than the copies on the other nodes, since
their RSN is the same. Hence, if the recmaster is not node 0 (or more
precisely not the active node with the lowest node number), the recovery
will choose copies from nodes with lower number and stick to these.
Here is how to reproduce:
- assume we have a cluster with at least 2 nodes
- ensure that the recmaster is not node 0
(maybe ensure with "onnode 0 ctdb setrecmasterrole off")
say recmaster is node 1
- choose a new database name, say "test1.tdb"
(make sure it is not yet attached as persistent)
- choose a key name, say "key1"
- all cluster nodes should be OK and no recovery running
- now do the following on node 1:
1. dbwrap_tool test1.tdb store key1 uint32 1
2. dbwrap_tool test1.tdb fetch key1 uint32
==> 1
3. ctdb recover
4. dbwrap_tool test1.tdb store key1 uint32 2
5. dbwrap_tool test1.tdb fetch key1 uint32
==> 2
6. ctdb recover
7. dbwrap_tool test1.tdb fetch key1 uint32
==> 1
==> BUG
This is a very severe bug, since when applied to Samba's locking.tdb
database, it means that for SMB clients on clustered Samba there is
the potential for locking out oneself from previously opened files
or even worse, data corruption:
Case 1: locking out
- client on recmaster opens file
- recovery propagates open file handle (entry in locking.tdb) to
other nodes
- client closes file
- client opens the same file
- recovery resurrects old copy of open file record in locking.tdb
from lower node
- client closes file but fails to delete entry in locking.tdb
- client tries to open same file again but fails, since
the old record locks it out (since the client is still connected)
Case 2: data corruption
- client1 on recmaster opens file
- recovery propagates open file info to other nodes
- client1 closes the file and disconnects
- client2 opens the same file
- recovery resurrects old copy of locking.tdb record,
where client2 has no entry, but client1 has.
- but client2 believes it still has a handle
- client3 opens the file and succeeds without
conflicting with client2
(the detached entry for client1 is discarded because
the server does not exist any more).
=> both client2 and client3 believe they have exclusive
access to the file and writing creates data corruption
Fix:
When storing a record on the dmaster, bump its RSN.
The ctdb_ltdb_store_server() is the central function for storing
a record to a local tdb from the ctdbd server context.
So this is also the place where the RSN of the record to be stored
should be incremented, when storing on the dmaster.
For the case of the record migration, this is currently done in
ctdb_become_dmaster() in ctdb_call.c, but there are other places
such as in recovery, where we should bump the RSN, but currently
don't do it.
So moving the RSN increment into ctdb_ltdb_store_server() fixes
the recovery-record-resurrection bug.
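To make the fix concrete, here is a small model of the two-node reproduction described above. It assumes a hypothetical rec_copy structure (not the real ctdb_ltdb_header) and models recovery's choice as "highest RSN wins, ties go to the lowest-numbered node", as the problem description states:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record copy held by one node. */
struct rec_copy {
	uint64_t rsn;
	uint32_t dmaster;
	int value;
};

/* Store a new value on this node, bumping the RSN when this node is
 * the dmaster (the behaviour the fix moves into the store path). */
static void store_on_server(struct rec_copy *rec, uint32_t this_node,
			    int value)
{
	if (rec->dmaster == this_node) {
		rec->rsn++;
	}
	rec->value = value;
}

/* Model of recovery: pick the copy with the highest RSN; copies are
 * indexed by node number, so on a tie the lowest node wins. */
static size_t recovery_pick(const struct rec_copy copies[], size_t n)
{
	size_t i, best = 0;

	for (i = 1; i < n; i++) {
		if (copies[i].rsn > copies[best].rsn) {
			best = i;
		}
	}
	return best;
}
```

After a recovery both nodes hold identical copies with node 1 (the recmaster) as dmaster. A local store on node 1 now raises its RSN, so the next recovery picks node 1's newer value; without the bump the RSNs tie and node 0's stale copy wins.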
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
feb1d40b21a160737aead22e398f3c34ff3be8de)
Michael Adam [Mon, 15 Apr 2013 10:50:42 +0000 (12:50 +0200)]
logging: fix comment typo
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
4c0cbfbe8b19f2e6fe17093b52c734bec63dd8b7)
Michael Adam [Wed, 3 Apr 2013 12:03:32 +0000 (14:03 +0200)]
ctdbd: unimplement the unused SET_DMASTER control
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
2e92deef5221ee651028ef87138b3113f1fece91)
Michael Adam [Fri, 22 Mar 2013 16:48:00 +0000 (17:48 +0100)]
recoverd: remove bogus comment "qqq" from "add prototype new banning code"
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
9f01b8db72780acf2f88f1392bc0a796dd4c6176)
Michael Adam [Fri, 5 Apr 2013 14:55:18 +0000 (16:55 +0200)]
build: silence building of porting_test
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e96acf19b4d1e0f951ab92b88869a01ff06398be)
Amitay Isaacs [Thu, 11 Apr 2013 03:20:09 +0000 (13:20 +1000)]
traverse: Ensure backward compatibility for CTDB_CONTROL_TRAVERSE_ALL
This makes sure that CTDB_CONTROL_TRAVERSE_ALL is compatible with older versions
of CTDB (i.e. the 1.2.39 and 1.2.40 branches).
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
5808f0778b39b79ab7a5c7f53ad27947131386ec)
Amitay Isaacs [Thu, 11 Apr 2013 03:18:36 +0000 (13:18 +1000)]
traverse: Add CTDB_CONTROL_TRAVERSE_ALL_EXT to support withemptyrecords
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
e691df43d20871468142c8fb83f7c7303c4ec307)
Amitay Isaacs [Thu, 11 Apr 2013 06:58:59 +0000 (16:58 +1000)]
tests: Fix typo in variable name
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
043e18a8324ccb2c8ddd7b323ebedb5b0de1298d)
Amitay Isaacs [Wed, 27 Mar 2013 01:32:43 +0000 (12:32 +1100)]
tools/ltdbtool: Fix handling of -e option
Also, include description of -e option in usage.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
35264e42ade4676468cf7713fa339c784e932953)
Amitay Isaacs [Fri, 5 Apr 2013 02:34:06 +0000 (13:34 +1100)]
recoverd/takeover: Use IP->node mapping info from nodes hosting that IP
When collating IP information for the IP layout, only trust the nodes that are
hosting an IP to have correct information about that IP. Ignore what all the
other nodes think.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1c7adbccc69ac276d2b957ad16c3802fdb8868ca)
Amitay Isaacs [Wed, 3 Apr 2013 03:44:08 +0000 (14:44 +1100)]
statd-callout: Make sure statd callout script always runs as root
In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands from talking to the daemon since "rpcuser" cannot
access the CTDB socket.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
fe8c4880b371492a38554868d4ca10918c54e412)
Amitay Isaacs [Mon, 18 Mar 2013 02:45:08 +0000 (13:45 +1100)]
client: Set the socket non-blocking only after connect succeeds
If the socket is set non-blocking before connect, then we would have to catch
EAGAIN errors and retry. Instead of adding a random number of retries, it is
better to wait for connect to succeed and then set the socket to
non-blocking.
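A minimal POSIX sketch of this ordering for a Unix domain socket; connect_then_set_nonblocking() is a hypothetical name and this is not CTDB's client code:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a Unix domain socket with a blocking connect(), then
 * switch the descriptor to non-blocking mode.  Returns the fd, or -1. */
static int connect_then_set_nonblocking(const char *path)
{
	struct sockaddr_un addr;
	int fd, flags;

	if (strlen(path) >= sizeof(addr.sun_path)) {
		return -1;
	}
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd == -1) {
		return -1;
	}
	/* Blocking connect: no EAGAIN handling or retry loop needed. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		close(fd);
		return -1;
	}
	/* Only now make the socket non-blocking for normal I/O. */
	flags = fcntl(fd, F_GETFL);
	if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The trade-off is that connect() may block briefly, in exchange for not having to guess how many EAGAIN retries are enough.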
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
524ec206e6a5e8b11723f4d8d1251ed5d84063b0)
Amitay Isaacs [Fri, 5 Apr 2013 02:19:34 +0000 (13:19 +1100)]
Revert "client: handle transient connection errors"
This reverts commit
dc0c58547cd4b20a8e2cd21f3c8363f34fd03e75.
There is a simpler solution than retrying a random number of times: do not set
the socket non-blocking until connect succeeds.
(This used to be ctdb commit
74acc2c568300ef42740cf11299a1b2507047f60)
Volker Lendecke [Wed, 3 Apr 2013 12:59:21 +0000 (14:59 +0200)]
common/messaging: Use the jenkins hash in ctdb_message
This gives a better hash distribution.
(This used to be ctdb commit
f7f8bde2376f8180a0dca6d7b8d7d2a4a12f4bd8)
Volker Lendecke [Fri, 5 Apr 2013 02:11:31 +0000 (13:11 +1100)]
common/messaging: use tdb_parse_record in message_list_db_fetch
This avoids malloc/free in a hot code path.
(This used to be ctdb commit
c137531fae8f7f6392746ce1b9ac6f219775fc29)
Amitay Isaacs [Wed, 3 Apr 2013 04:08:14 +0000 (15:08 +1100)]
common/messaging: Abstract db related operations inside db functions
This simplifies the use of message indexdb API and abstracts tdb related code
inside the API.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
bf7296ce9b98563bcb8426cd035dbeab6d884f59)
Amitay Isaacs [Tue, 2 Apr 2013 05:57:51 +0000 (16:57 +1100)]
common/messaging: Don't forget to free the result returned by tdb_fetch()
This fixes a memory leak in the messaging code.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
20be1f991dd75c2333c9ec9db226432a819f57ba)
Amitay Isaacs [Tue, 2 Apr 2013 01:08:39 +0000 (12:08 +1100)]
common/messaging: Free message list header if all message handlers are freed
This makes sure that even if the srvids are not deregistered, the header
structure is freed when the last message handler has been freed as a result of
a client going away.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
4e1ec7412866f2d31c41de1bec0fbf788c03051b)
Sumit Bose [Mon, 25 Mar 2013 11:28:31 +0000 (12:28 +0100)]
build: Fix for tevent autoconf check
The list of include files is the 4th argument of AC_CHECK_DECLS.
(This used to be ctdb commit
85b777196289646ca37e06ebbf1f7a684d0aabc5)
Amitay Isaacs [Wed, 13 Mar 2013 11:57:44 +0000 (22:57 +1100)]
util: Add hex_decode_talloc() to decode hex string into a binary blob
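The commit gives only the function name. As a rough illustration of what decoding a hex string into a binary blob involves, here is a plain malloc() sketch; hex_decode_sketch() is a hypothetical stand-in and does not reproduce the talloc-based signature of hex_decode_talloc():

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode a hex string such as "0aff10" into a newly malloc()ed blob.
 * Returns NULL on odd length, a non-hex character or allocation
 * failure; *out_len receives the blob length.  Caller must free(). */
static uint8_t *hex_decode_sketch(const char *hex, size_t *out_len)
{
	size_t len = strlen(hex), i;
	uint8_t *blob;

	if (len % 2 != 0) return NULL;
	blob = malloc(len / 2 + 1);  /* +1 so "" still allocates */
	if (blob == NULL) return NULL;
	for (i = 0; i < len / 2; i++) {
		int hi = hex_nibble(hex[2 * i]);
		int lo = hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0) {
			free(blob);
			return NULL;
		}
		blob[i] = (uint8_t)((hi << 4) | lo);
	}
	*out_len = len / 2;
	return blob;
}
```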
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
307416afda707b687f5e89e8438e45c154a4c806)
Amitay Isaacs [Wed, 13 Mar 2013 00:46:18 +0000 (11:46 +1100)]
logging: Do not ignore stdout/stderr from the exec'd children
To log debugging information from child processes that are started
with vfork and exec, do not set close_on_exec on STDOUT and STDERR for
that process.
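The commit avoids setting close_on_exec on these descriptors. For a descriptor that already has the flag, clearing it with fcntl() looks like the following sketch; both helper names are hypothetical:

```c
#include <fcntl.h>
#include <unistd.h>

/* Clear FD_CLOEXEC on fd so it stays open across exec().
 * Returns 0 on success, -1 on error. */
static int clear_close_on_exec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1) {
		return -1;
	}
	return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Intended for the child after vfork() and before exec(), so the
 * exec'd program's output reaches the parent's logging pipe. */
static int keep_output_across_exec(void)
{
	if (clear_close_on_exec(STDOUT_FILENO) == -1) {
		return -1;
	}
	return clear_close_on_exec(STDERR_FILENO);
}
```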
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
08c53ee609b80f87450a7a1d7dd24fbcdf5ab7bc)
Michael Adam [Fri, 22 Feb 2013 11:42:10 +0000 (12:42 +0100)]
server:persistent: fix a debug message (copy'n'paste error)
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
87c89b7c2a14e2ee79a3efc7e8125842bc04bf23)
Volker Lendecke [Tue, 12 Mar 2013 12:53:58 +0000 (13:53 +0100)]
fix a typo
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
98abd344342a011a8599411deae79f94abc09541)
Amitay Isaacs [Fri, 22 Feb 2013 01:59:39 +0000 (12:59 +1100)]
common/io: For scheduling immediate events use tevent_schedule_immediate
tevent_schedule_immediate() is much more efficient at handling events that need
to be processed immediately rather than creating timed events with
timeval_zero().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
11734be353a1e246163eda631d35dfe55d1d6fb1)
Amitay Isaacs [Thu, 21 Feb 2013 02:16:15 +0000 (13:16 +1100)]
ctdbd: Add an index db for message list for faster searches
When CTDB is busy with lots of smbd processes, CTDB was spending too much time
in daemon_check_srvids(), which searches for a list of srvids in the registered
message handlers. Using a hash-based index makes this search significantly
faster than walking a linked list.
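The idea of replacing a linear scan with a hash lookup can be sketched with a plain chained hash table. CTDB's own index is kept in a tdb; the structure, bucket count and mixing function below are hypothetical choices for illustration:

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_BUCKETS 1024  /* arbitrary size for this sketch */

/* Hypothetical index entry: one per registered srvid. */
struct srvid_entry {
	uint64_t srvid;
	struct srvid_entry *next;
};

struct srvid_index {
	struct srvid_entry *buckets[NUM_BUCKETS];
};

static size_t srvid_bucket(uint64_t srvid)
{
	/* Simple 64-bit mixing so nearby srvids spread across buckets. */
	srvid ^= srvid >> 33;
	srvid *= 0xff51afd7ed558ccdULL;
	srvid ^= srvid >> 33;
	return (size_t)(srvid % NUM_BUCKETS);
}

static int srvid_index_add(struct srvid_index *idx, uint64_t srvid)
{
	struct srvid_entry *e = malloc(sizeof(*e));
	size_t b = srvid_bucket(srvid);

	if (e == NULL) {
		return -1;
	}
	e->srvid = srvid;
	e->next = idx->buckets[b];
	idx->buckets[b] = e;
	return 0;
}

/* Average O(1) lookup instead of walking every registered handler. */
static int srvid_index_contains(const struct srvid_index *idx,
				uint64_t srvid)
{
	const struct srvid_entry *e;

	for (e = idx->buckets[srvid_bucket(srvid)]; e != NULL; e = e->next) {
		if (e->srvid == srvid) {
			return 1;
		}
	}
	return 0;
}

/* Free all entries and the index itself (assumed heap-allocated). */
static void srvid_index_free(struct srvid_index *idx)
{
	size_t b;

	for (b = 0; b < NUM_BUCKETS; b++) {
		struct srvid_entry *e = idx->buckets[b];

		while (e != NULL) {
			struct srvid_entry *next = e->next;
			free(e);
			e = next;
		}
	}
	free(idx);
}
```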
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
3e09f25d419635f6dd679b48fa65370f7860be7d)
Martin Schwenke [Wed, 27 Feb 2013 05:01:55 +0000 (16:01 +1100)]
tools/ctdb: delip no longer fails if IP can not be moved
Moving the IP is an optimisation so should not cause failure.
Refactor and simplify the retry-move-IP into new function
try_moveip().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
5402f85dde045576cbaf64e01c68e28ed52204e8)
Michael Adam [Fri, 22 Feb 2013 10:36:00 +0000 (11:36 +0100)]
server:persistent: fix a comment typo.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
6455ce5e4980a63d56ed30f7059869c8356c12ea)
Martin Schwenke [Mon, 18 Feb 2013 05:39:00 +0000 (16:39 +1100)]
recoverd: update_capabilities() should use connected nodes
... as the comment says... not just active nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
4f71dca8df19a63f198e2d6d59e605b49ec5e803)
Martin Schwenke [Tue, 19 Feb 2013 03:30:50 +0000 (14:30 +1100)]
client: Refactor node listing functions to use list_of_nodes()
This reduces repetition.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f505020a5720faa4ecc6414e0bfaa6b3c0e47291)
Martin Schwenke [Tue, 19 Feb 2013 03:29:06 +0000 (14:29 +1100)]
client: New generic node listing function list_of_nodes()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a73bb56991b8c07ed0e9517ffcf0dc264be30487)
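A standalone sketch of the generic-lister idea from the two commits above: one function filters nodes by a flag mask, and the specific listers become one-line wrappers. The flag values, struct node_info and this list_of_nodes() signature are hypothetical simplifications, not the client library's actual definitions. The connected/active distinction is the one update_capabilities() relies on.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bits, loosely modelled on ctdb's node flags. */
#define NODE_FLAGS_DISCONNECTED 0x01u
#define NODE_FLAGS_BANNED       0x02u
#define NODE_FLAGS_STOPPED      0x04u
#define NODE_FLAGS_INACTIVE \
	(NODE_FLAGS_DISCONNECTED | NODE_FLAGS_BANNED | NODE_FLAGS_STOPPED)

struct node_info {
	uint32_t pnn;
	uint32_t flags;
};

/* Generic lister: PNNs of nodes with none of the 'mask' bits set, skipping
 * exclude_pnn (pass -1 to skip nothing). Caller frees the result. */
static uint32_t *list_of_nodes(const struct node_info *nodes, size_t num_nodes,
			       uint32_t mask, int exclude_pnn, size_t *count)
{
	uint32_t *pnns = malloc((num_nodes > 0 ? num_nodes : 1) * sizeof(*pnns));
	if (pnns == NULL) {
		return NULL;
	}
	size_t n = 0;
	for (size_t i = 0; i < num_nodes; i++) {
		if (nodes[i].flags & mask) {
			continue;
		}
		if (exclude_pnn >= 0 && nodes[i].pnn == (uint32_t)exclude_pnn) {
			continue;
		}
		pnns[n++] = nodes[i].pnn;
	}
	*count = n;
	return pnns;
}

/* Active nodes: connected, not banned, not stopped. */
static uint32_t *list_of_active_nodes(const struct node_info *nodes,
				      size_t num_nodes, int exclude_pnn,
				      size_t *count)
{
	return list_of_nodes(nodes, num_nodes, NODE_FLAGS_INACTIVE,
			     exclude_pnn, count);
}

/* Connected nodes: may still be banned or stopped. */
static uint32_t *list_of_connected_nodes(const struct node_info *nodes,
					 size_t num_nodes, int exclude_pnn,
					 size_t *count)
{
	return list_of_nodes(nodes, num_nodes, NODE_FLAGS_DISCONNECTED,
			     exclude_pnn, count);
}
```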
Amitay Isaacs [Thu, 17 Jan 2013 23:42:14 +0000 (10:42 +1100)]
common/io: Rewrite socket handling code to read all available data
This improves packet processing considerably. It has been observed that
there can be as many as 10 packets in the socket buffer, so the current
approach of reading a single packet from the socket at a time is
inefficient. This change reads all the bytes from the socket buffer and
then parses them to extract multiple packets. If there are multiple
packets, a timed event is set up to process the next packet.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
d788bc8f7212b7dc1587ae592242dc8c876f4053)
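A sketch of the parsing step, assuming each packet begins with a uint32_t total length (including the length field) in host byte order. split_packets() is a hypothetical helper: it finds every complete packet in the bytes read so far and leaves a trailing partial packet for the next read(). Unlike the commit, which defers each further packet to a timed event, this sketch only locates them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the offsets of complete packets in offsets[] (at most max_packets).
 * Returns the number found and sets *consumed to the bytes they occupy,
 * or returns -1 if a length field is impossibly small. */
static int split_packets(const uint8_t *buf, size_t buflen,
			 size_t *offsets, int max_packets, size_t *consumed)
{
	size_t offset = 0;
	int n = 0;

	while (n < max_packets && buflen - offset >= sizeof(uint32_t)) {
		uint32_t pkt_len;

		/* memcpy avoids unaligned reads of the length field. */
		memcpy(&pkt_len, buf + offset, sizeof(pkt_len));
		if (pkt_len < sizeof(uint32_t)) {
			return -1;      /* malformed: caller should drop the connection */
		}
		if (buflen - offset < pkt_len) {
			break;          /* incomplete packet: wait for more data */
		}
		offsets[n++] = offset;
		offset += pkt_len;
	}
	*consumed = offset;
	return n;
}
```

The caller would hand the packets at offsets[0..n-1] to the packet handler and move the unconsumed tail to the front of its buffer.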
Martin Schwenke [Fri, 15 Feb 2013 00:18:45 +0000 (11:18 +1100)]
doc: Fix typo in ctdbd manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
855ab348901edb3ec1327499a43f509d279b8182)
Amitay Isaacs [Mon, 11 Feb 2013 02:23:47 +0000 (13:23 +1100)]
ctdbd: Fix the PullDBPreallocation size to 10MB as intended
In commit 1f262deaad0818f159f9c68330f7fec121679023, Ronnie changed the
recovery code to allocate 10MB chunks in traverse_pulldb() and
traverse_recdb(). However, the tunable PullDBPreallocation was set to
100MB instead.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e204fac03412520e877ab04363b3ece02667c55b)
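A sketch of chunked preallocation, assuming "10MB" means 10 * 1024 * 1024 = 10485760 bytes. grow_in_chunks() is a hypothetical helper illustrating how a buffer that collects records might grow by whole chunks; it is not the actual traverse_pulldb() code.

```c
#include <stddef.h>

/* Assumed value: 10MB = 10 * 1024 * 1024 = 10485760 bytes. */
#define PULLDB_PREALLOCATION (10u * 1024u * 1024u)

/* When the data would overflow the current allocation, grow it by enough
 * whole chunks to fit, instead of reallocating for every record.
 * Returns the new allocation size; chunk must be non-zero. */
static size_t grow_in_chunks(size_t allocated, size_t needed, size_t chunk)
{
	if (needed <= allocated) {
		return allocated;
	}
	size_t shortfall = needed - allocated;
	size_t chunks = (shortfall + chunk - 1) / chunk;   /* round up */
	return allocated + chunks * chunk;
}
```

With a 100MB default each growth step would reserve ten times as much memory as the chunked design intends.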
Amitay Isaacs [Mon, 11 Feb 2013 00:25:49 +0000 (11:25 +1100)]
eventscripts: Remove calls to "smbstatus -np" for samba cleanup
This is an artifact of older versions of Samba. In newer versions of
Samba, the "smbstatus -np" command does nothing useful, but it causes a
traverse in CTDB, which is expensive and makes CPU utilization spike.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
053b89c6dbce47001505524606889334559d2ec4)
Martin Schwenke [Wed, 6 Feb 2013 03:15:11 +0000 (14:15 +1100)]
Logging: Fix breakage when freeing the log ringbuffer
Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer. The approach there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!
Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function). The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.
This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed. There are 3 special call sites, in two places:
* Forking the recovery daemon. We want to be able to fetch from the
ringbuffer there.
* The ringbuffer fetching code. Change the 2 calls in this code (main
daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
instead.
While we're here, clear the log ringbuffer when the recovery daemon is
forked, since it will contain a copy of the messages from the main
daemon.
Note to self: always test... even the most obvious patches... ;-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
00db5fa00474f8a83f1aa3b603fd756cc9b49ff4)
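A simplified sketch of the wrapper structure described above, using plain fork() and a malloc'd buffer in place of ctdb's ringbuffer. The names (log_ringbuffer, fork_no_free_ringbuffer, fork_and_free_ringbuffer, demo_child_frees_ringbuffer) are hypothetical stand-ins for ctdb_fork_no_free_ringbuffer()/ctdb_fork().

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the in-memory log ringbuffer. */
static char *log_ringbuffer;

static void log_ringbuffer_free(void)
{
	free(log_ringbuffer);
	log_ringbuffer = NULL;
}

/* The old behaviour: fork (plus any common child setup), keeping the
 * ringbuffer. Used by children that fetch from the ringbuffer. */
static pid_t fork_no_free_ringbuffer(void)
{
	return fork();
}

/* The new default: same as above, then drop the ringbuffer in the child,
 * which never reads it. The parent's copy is untouched. */
static pid_t fork_and_free_ringbuffer(void)
{
	pid_t pid = fork_no_free_ringbuffer();
	if (pid == 0) {
		log_ringbuffer_free();
	}
	return pid;
}

/* Usage check: 1 if the child saw the buffer freed while the parent kept
 * it, 0 otherwise, -1 on error. */
static int demo_child_frees_ringbuffer(void)
{
	pid_t pid = fork_and_free_ringbuffer();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		_exit(log_ringbuffer == NULL ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) != pid) {
		return -1;
	}
	return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
	       log_ringbuffer != NULL;
}
```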
Volker Lendecke [Wed, 6 Feb 2013 09:28:37 +0000 (10:28 +0100)]
Fix a comment typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
b940e3a24daa73ca9b2896b7a449240136442b53)
Martin Schwenke [Tue, 5 Feb 2013 02:16:46 +0000 (13:16 +1100)]
initscript: export CTDB_EXTERNAL_TRACE
This means it can be set like any other configuration option in the
configuration file, without needing to export it there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a0ef73e197dc9147f7718e0813fe803ff0b3d54d)
Martin Schwenke [Tue, 5 Feb 2013 03:36:29 +0000 (14:36 +1100)]
ctdbd: Don't use a fixed length buffer for the hung script command
Nothing constrained the amount of data written into the buffer, so it
could overflow...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
9b0d56b16775aa16f33bdfdf831256e085fa3339)
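A sketch of sizing the command buffer to fit, using the C99 snprintf(NULL, 0, ...) idiom. Inside ctdbd, talloc_asprintf() would do the same job. The "<script> <pid> <event>" layout and the name build_hung_script_cmd() are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Build "<script> <pid> <event>" in a buffer sized to fit exactly.
 * Returns a malloc'd string (caller frees) or NULL on error. */
static char *build_hung_script_cmd(const char *script, pid_t pid,
				   const char *event)
{
	/* First pass: measure. With a NULL buffer and size 0, snprintf()
	 * returns the length the output would have, excluding the NUL. */
	int len = snprintf(NULL, 0, "%s %d %s", script, (int)pid, event);
	if (len < 0) {
		return NULL;
	}
	char *cmd = malloc((size_t)len + 1);
	if (cmd == NULL) {
		return NULL;
	}
	/* Second pass: format into a buffer guaranteed to be big enough. */
	snprintf(cmd, (size_t)len + 1, "%s %d %s", script, (int)pid, event);
	return cmd;
}
```

However long the script path or event name, the buffer grows to match instead of being overrun.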
Martin Schwenke [Tue, 5 Feb 2013 03:25:01 +0000 (14:25 +1100)]
ctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable
This is quite easy to misconfigure by failing to set the execute bit
on the script. Better to complain loudly.
This is a debugging facility rather than core CTDB functionality, so it
doesn't need a subtle mechanism to disable it at run-time. To disable
the designated script at run-time, either edit it to put "exit 0" at
the top or move it aside and replace it with a symlink to /bin/true.
This is implemented by actually removing the code that checks that the
file exists and is executable. The output from the shell when the
system() function fails is just as useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3400b2ed34b6eb9496eb55f1aab6f89d2952060d)
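A sketch of relying on system() and the shell instead of a pre-check, assuming a POSIX sh. That shell reports exit status 127 when a command is not found and 126 when it is not executable, and prints its own diagnostic to stderr. run_debug_script() is a hypothetical name, and the messages are illustrative rather than ctdbd's.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run the configured debug command and complain loudly on any failure.
 * Returns the command's exit status, or -1 if it could not be run or
 * did not exit normally. */
static int run_debug_script(const char *cmd)
{
	int ret = system(cmd);
	if (ret == -1) {
		fprintf(stderr, "ERROR: system() failed for \"%s\"\n", cmd);
		return -1;
	}
	if (!WIFEXITED(ret)) {
		fprintf(stderr, "ERROR: \"%s\" terminated abnormally\n", cmd);
		return -1;
	}
	int status = WEXITSTATUS(ret);
	if (status != 0) {
		/* 126: not executable, 127: not found (per POSIX sh). */
		fprintf(stderr, "ERROR: \"%s\" exited with status %d\n",
			cmd, status);
	}
	return status;
}
```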
Martin Schwenke [Tue, 5 Feb 2013 04:49:52 +0000 (15:49 +1100)]
ctdbd: Remove command-line option --debug-hung-script
Use an environment variable instead. This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.
The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each. So, the convention
will be to use an environment variable for each debug option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
0581f9a84e58764d194f4e04064c2c5b393c348b)
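A sketch of the one-environment-variable-per-debug-option convention, assuming that unset and empty both mean "disabled". debug_option() is a hypothetical helper; the daemon would call it as debug_option("CTDB_DEBUG_HUNG_SCRIPT").

```c
/* getenv() is ISO C; setenv(), used to set options programmatically,
 * is POSIX and needs this feature-test macro under strict -std modes. */
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Return the value of a debug option exported by the initscript, or NULL
 * if the option is unset or empty (disabled). */
static const char *debug_option(const char *name)
{
	const char *value = getenv(name);
	if (value == NULL || value[0] == '\0') {
		return NULL;
	}
	return value;
}
```

Adding another debug option then only requires exporting one more variable from the initscript, with no new command-line parsing.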
Martin Schwenke [Tue, 5 Feb 2013 02:08:55 +0000 (13:08 +1100)]
ctdbd: Remove debug_hung_script_ctx
The only allocation against this context is by
ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler()
anyway. There should be no memory leak.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
501461cc3e132d4adee9e91b5d4513a26bae2846)
Martin Schwenke [Thu, 10 Jan 2013 03:39:09 +0000 (14:39 +1100)]
ctdbd: Message logged at exit should be different for different processes
Some subprocesses print "CTDB daemon shutting down" when they exit, which
can be confusing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f1ffe1112b7e342d7f1228ca816a8e5918f893cf)
Amitay Isaacs [Tue, 22 Jan 2013 02:27:20 +0000 (13:27 +1100)]
daemon: Make sure all the traverse children are terminated if traverse times out
When a traverse times out, the callback function is called with key and data
set to tdb_null. This is also how the end of a traverse is signalled. So if
the traverse times out, the callback function treats it as the end of the
traverse and frees the state without calling the destructor.
Keep track of whether the traverse timed out, so that the callback function
can take the appropriate action for a traverse timeout versus a traverse end.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
35da9a7c2a0f5e54e61588c3c3455f06ebc66822)
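A much simplified sketch of the timed-out flag, assuming an empty key (NULL, 0) plays the role of tdb_null. The state layout and function names are hypothetical; in the daemon, the timeout branch is where the traverse children would be terminated.

```c
#include <stdbool.h>
#include <stddef.h>

enum traverse_outcome { TRAVERSE_RUNNING, TRAVERSE_FINISHED, TRAVERSE_TIMED_OUT };

/* Hypothetical traverse state. */
struct traverse_state {
	bool timed_out;               /* set only by the timeout handler */
	int records_seen;
	enum traverse_outcome outcome;
};

/* Per-record callback. The empty key is delivered both at the normal end
 * of the traverse and on timeout; only the flag tells them apart. */
static void traverse_callback(struct traverse_state *state,
			      const void *key, size_t keylen)
{
	if (key == NULL && keylen == 0) {
		state->outcome = state->timed_out ? TRAVERSE_TIMED_OUT
						  : TRAVERSE_FINISHED;
		return;
	}
	state->records_seen++;
}

/* Timeout handler: record why the traverse is ending, then signal the end. */
static void traverse_timeout(struct traverse_state *state)
{
	state->timed_out = true;
	traverse_callback(state, NULL, 0);
}
```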
Martin Schwenke [Tue, 5 Feb 2013 01:09:36 +0000 (12:09 +1100)]
Logging: Free the ringbuffer in child processes created with ctdb_fork()
At the moment the log ringbuffer is duplicated in every child process.
Although it is copy-on-write, we want to see if it is contributing to
out-of-memory situations when there are many children.
The ringbuffer isn't accessible from any of the children anyway...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a82d3ec12f0fda16d6bfa8442a07595de897c10e)
Martin Schwenke [Tue, 5 Feb 2013 01:08:11 +0000 (12:08 +1100)]
Logging: New function ctdb_log_ringbuffer_free()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a4f622e85168f59417c11705f1734e0352e1d44a)
Martin Schwenke [Tue, 5 Feb 2013 01:13:57 +0000 (12:13 +1100)]
build: Fix a Makefile.in typo
Objects are named *.o ;-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
25a20409fb39a94b64c13990c0eba4f75d482ecd)
Martin Schwenke [Fri, 11 Jan 2013 01:39:37 +0000 (12:39 +1100)]
tools/ctdb: Fix a compiler warning
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
d1ec06d30148e6fd344625a2fbf1c22391bd908a)
Amitay Isaacs [Wed, 23 Jan 2013 03:35:47 +0000 (14:35 +1100)]
recoverd: Fix printing of node flags from local information
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
124e2a471aeda9c900fd898178a30522d7d74221)
Mathieu Parent [Mon, 14 Jan 2013 16:48:01 +0000 (17:48 +0100)]
common: Don't lie on unimplemented gratuitous arp
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
b054193d1d19a8eef998fa690899501f79badb8a)
Mathieu Parent [Mon, 14 Jan 2013 16:21:01 +0000 (17:21 +0100)]
tests: Test portability
Curiously, test_ctdb_sys_check_iface_exists fails on Linux.
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
109f428aa34f8f4cc0329880d2f4a5593a6cc6f3)
Mathieu Parent [Mon, 14 Jan 2013 11:13:24 +0000 (12:13 +0100)]
common: FreeBSD+kFreeBSD: Implement get_process_name (same as in Linux)
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
258092aaf6b7a9bdc14f0fb35e8bd7f7dc742b3f)
Mathieu Parent [Mon, 14 Jan 2013 10:23:46 +0000 (11:23 +0100)]
common: Detailed platform-specific FIXME
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
d202b2fdd4fd70172e5e44583627b57a1b7ad2ed)
Mathieu Parent [Sun, 13 Jan 2013 13:15:20 +0000 (14:15 +0100)]
build: Update config.guess to 2012-12-30 and config.sub to 2013-01-11
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
3c6a9b73364c9543366fa033c778145dc7a152a9)
Mathieu Parent [Sat, 12 Jan 2013 15:43:03 +0000 (16:43 +0100)]
doc: allows to -> allows one to
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
95fc493a7d4145f976cb3fe928d9e92faec4dd71)
Mathieu Parent [Sat, 12 Jan 2013 14:14:48 +0000 (15:14 +0100)]
build: Add missing LDFLAGS
Original Author: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Mathieu Parent <math.parent@gmail.com>
(This used to be ctdb commit
506ecd186759675a1cf50a0a05a285fee03fc51e)
Srikrishan Malik [Wed, 9 Jan 2013 10:41:39 +0000 (16:11 +0530)]
Changes for unobtrusive recovery and new method for health check.
Unobtrusive recovery: Ganesha will not be restarted on failovers.
Ganesha health: use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call, which can time out if the server is too busy.
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>
(This used to be ctdb commit
0e651e9da0f1f3c836b4474612ab13d0ccd272d9)
Amitay Isaacs [Wed, 9 Jan 2013 05:22:39 +0000 (16:22 +1100)]
recoverd: Create recoverd monitoring timed events off recoverd context
This ensures that when shutting down CTDB, all the timed events
associated with monitoring recoverd are destroyed and recoverd
is not restarted.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7393e2b290f9879ff72d5c5a9ce933034129f0e8)
Amitay Isaacs [Mon, 29 Oct 2012 03:56:10 +0000 (14:56 +1100)]
daemon: Protect against double free of callback state while shutting down
When CTDB is shut down and monitoring has been stopped, monitor_context
is freed, along with all the callback states hanging off it. This includes
the callback state for current_monitor if the current monitor event has
not yet finished. As a result, when the shutdown event is run,
current_monitor->callback is not NULL, but it has already been freed, so
it is a dangling reference.
So before executing the callback function and freeing the callback state,
check that ctdb->monitor->monitor_context is not NULL.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)
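A heavily simplified model of the guard described above, using malloc/free in place of talloc's hierarchical allocation. The structures and names are hypothetical. Stopping monitoring frees the context and the callback state allocated from it but leaves the callback pointer dangling, so the completion path checks the context before touching the callback.

```c
#include <stdbool.h>
#include <stdlib.h>

struct callback_state {
	int calls;
};

/* Hypothetical monitor: callback state "belongs to" monitor_context. */
struct monitor {
	void *monitor_context;
	struct callback_state *callback;
};

/* Stopping monitoring frees the context and everything allocated from it,
 * but nobody clears m->callback: it now dangles. */
static void stop_monitoring(struct monitor *m)
{
	free(m->callback);
	free(m->monitor_context);
	m->monitor_context = NULL;
}

/* Completion path: only run and free the callback if its owning context
 * still exists; otherwise it was already freed with the context. */
static bool finish_monitor_event(struct monitor *m)
{
	if (m->monitor_context == NULL) {
		return false;   /* would be a use-after-free / double free */
	}
	m->callback->calls++;
	free(m->callback);
	m->callback = NULL;
	return true;
}
```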