obnox/ctdb.git
11 years agotests: add a comment to recovery db corruption test master-rrr-recovery-fix
Michael Adam [Wed, 17 Apr 2013 11:08:49 +0000 (13:08 +0200)]
tests: add a comment to recovery db corruption test

The comment explains that we use "ctdb stop" and "ctdb continue"
but we should use "ctdb setrecmasterrole off".

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agotests: Add a test for subsequent recoveries corrupting databases
Amitay Isaacs [Thu, 11 Apr 2013 06:59:36 +0000 (16:59 +1000)]
tests: Add a test for subsequent recoveries corrupting databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
11 years agotests: Support waiting for "recovered" state in tests
Amitay Isaacs [Thu, 11 Apr 2013 06:58:34 +0000 (16:58 +1000)]
tests: Support waiting for "recovered" state in tests

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
11 years agoctdb_call: don't bump the rsn in ctdb_become_dmaster() any more
Michael Adam [Wed, 3 Apr 2013 10:02:59 +0000 (12:02 +0200)]
ctdb_call: don't bump the rsn in ctdb_become_dmaster() any more

This is now done in ctdb_ltdb_store_server(), so this
extra bump can be spared.

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agoFix a severe recovery bug that can lead to data corruption for SMB clients.
Michael Adam [Wed, 3 Apr 2013 09:40:25 +0000 (11:40 +0200)]
Fix a severe recovery bug that can lead to data corruption for SMB clients.

Problem:
Recovery can under certain circumstances lead to old record copies
resurrecting: Recovery selects the newest record copy purely by RSN. At
the end of the recovery, the recovery master is the dmaster for all
records in all (non-persistent) databases. And the other nodes locally
hold the complete copy of the databases. The bug is that the recovery
process does not increment the RSN on the recovery master at the end of
the recovery. Now clients acting directly on the Recovery master will
directly change a record's content on the recmaster without migration
and hence without RSN bump.  So a subsequent recovery can not tell that
the recmaster's copy is newer than the copies on the other nodes, since
their RSN is the same. Hence, if the recmaster is not node 0 (or more
precisely not the active node with the lowest node number), the recovery
will choose copies from nodes with lower number and stick to these.
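
The selection rule described above can be sketched as follows. This is a minimal model and not the recovery code itself: the struct record_copy type and the pick_copy() function are hypothetical names introduced for illustration, and the tie-break on the lowest node number is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one node's copy of a record. */
struct record_copy {
    uint32_t node;   /* number of the node holding this copy */
    uint64_t rsn;    /* record sequence number of this copy */
    uint32_t value;  /* payload, reduced to a single integer */
};

/*
 * Pick the copy a recovery would keep: the highest RSN wins, and on an
 * RSN tie the copy from the lowest node number is kept.
 */
const struct record_copy *pick_copy(const struct record_copy *copies,
                                    size_t count)
{
    assert(copies != NULL && count > 0);

    const struct record_copy *best = &copies[0];
    for (size_t i = 1; i < count; i++) {
        const struct record_copy *c = &copies[i];
        if (c->rsn > best->rsn ||
            (c->rsn == best->rsn && c->node < best->node)) {
            best = c;
        }
    }
    return best;
}
```

With equal RSNs on node 0 (old content) and node 1 (new content written on the recmaster), this rule keeps the copy from node 0, which is the resurrection described in the problem statement.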

Here is how to reproduce:

- assume we have a cluster with at least 2 nodes
- ensure that the recmaster is not node 0
  (maybe ensure with "onnode 0 ctdb setrecmasterrole off")
  say recmaster is node 1
- choose a new database name, say "test1.tdb"
  (make sure it is not yet attached as persistent)
- choose a key name, say "key1"
- all cluster nodes should be ok and no recovery should be running
- now do the following on node 1:

1. dbwrap_tool test1.tdb store key1 uint32 1
2. dbwrap_tool test1.tdb fetch key1 uint32
   ==> 1
3. ctdb recover
4. dbwrap_tool test1.tdb store key1 uint32 2
5. dbwrap_tool test1.tdb fetch key1 uint32
   ==> 2
6. ctdb recover
7. dbwrap_tool test1.tdb fetch key1 uint32
   ==> 1
   ==> BUG

This is a very severe bug, since when applied to Samba's locking.tdb
database, it means that for SMB clients on clustered Samba there is
the potential for locking out oneself from previously opened files
or even worse, data corruption:

Case 1: locking out

- client on recmaster opens file
- recovery propagates open file handle (entry in locking.tdb) to
  other nodes
- client closes file
- client opens the same file
- recovery resurrects old copy of open file record in locking.tdb
  from lower node
- client closes file but fails to delete entry in locking.tdb
- client tries to open same file again but fails, since
  the old record locks it out (since the client is still connected)

Case 2: data corruption

- client1 on recmaster opens file
- recovery propagates open file info to other nodes
- client1 closes the file and disconnects
- client2 opens the same file
- recovery resurrects old copy of locking.tdb record,
  where client2 has no entry, but client1 has.
- but client2 believes it still has a handle
- client3 opens the file and succeeds without
  conflicting with client2
  (the detached entry for client1 is discarded because
   the server does not exist any more).
=> both client2 and client3 believe they have exclusive
  access to the file and writing creates data corruption

Fix:

When storing a record on the dmaster, bump its RSN.

The ctdb_ltdb_store_server() is the central function for storing
a record to a local tdb from the ctdbd server context.
So this is also the place where the RSN of the record to be stored
should be incremented, when storing on the dmaster.

For the case of the record migration, this is currently done in
ctdb_become_dmaster() in ctdb_call.c, but there are other places
such as in recovery, where we should bump the RSN, but currently
don't do it.

So moving the RSN incrementation into ctdb_ltdb_store_server fixes
the recovery-record-resurrection bug.

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agologging: fix comment typo
Michael Adam [Mon, 15 Apr 2013 10:50:42 +0000 (12:50 +0200)]
logging: fix comment typo

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: unimplement the unused SET_DMASTER control
Michael Adam [Wed, 3 Apr 2013 12:03:32 +0000 (14:03 +0200)]
ctdbd: unimplement the unused SET_DMASTER control

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
11 years agorecoverd: remove bogus comment "qqq" from "add prototype new banning code"
Michael Adam [Fri, 22 Mar 2013 16:48:00 +0000 (17:48 +0100)]
recoverd: remove bogus comment "qqq" from "add prototype new banning code"

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
11 years agobuild: silence building of porting_test
Michael Adam [Fri, 5 Apr 2013 14:55:18 +0000 (16:55 +0200)]
build: silence building of porting_test

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
11 years agotraverse: Ensure backward compatibility for CTDB_CONTROL_TRAVERSE_ALL
Amitay Isaacs [Thu, 11 Apr 2013 03:20:09 +0000 (13:20 +1000)]
traverse: Ensure backward compatibility for CTDB_CONTROL_TRAVERSE_ALL

This makes sure that CTDB_CONTROL_TRAVERSE_ALL is compatible with older versions
of CTDB (i.e. 1.2.39 and 1.2.40 branches).

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
11 years agotraverse: Add CTDB_CONTROL_TRAVERSE_ALL_EXT to support withemptyrecords
Amitay Isaacs [Thu, 11 Apr 2013 03:18:36 +0000 (13:18 +1000)]
traverse: Add CTDB_CONTROL_TRAVERSE_ALL_EXT to support withemptyrecords

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
11 years agotests: Fix typo in variable name
Amitay Isaacs [Thu, 11 Apr 2013 06:58:59 +0000 (16:58 +1000)]
tests: Fix typo in variable name

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotools/ltdbtool: Fix handling of -e option
Amitay Isaacs [Wed, 27 Mar 2013 01:32:43 +0000 (12:32 +1100)]
tools/ltdbtool: Fix handling of -e option

Also, include description of -e option in usage.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agorecoverd/takeover: Use IP->node mapping info from nodes hosting that IP
Amitay Isaacs [Fri, 5 Apr 2013 02:34:06 +0000 (13:34 +1100)]
recoverd/takeover: Use IP->node mapping info from nodes hosting that IP

When collating IP information for IP layout, only trust the nodes that are
hosting an IP, to have correct information about that IP.  Ignore what all the
other nodes think.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agostatd-callout: Make sure statd callout script always runs as root
Amitay Isaacs [Wed, 3 Apr 2013 03:44:08 +0000 (14:44 +1100)]
statd-callout: Make sure statd callout script always runs as root

In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands talking to daemon since "rpcuser" cannot access
CTDB socket.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

11 years agoclient: Set the socket non-blocking only after connect succeeds
Amitay Isaacs [Mon, 18 Mar 2013 02:45:08 +0000 (13:45 +1100)]
client: Set the socket non-blocking only after connect succeeds

If the socket is set non-blocking before connect, then we should catch
EAGAIN errors and retry. Instead of adding a random number of retries,
better to wait for connect to succeed and then set the socket to
non-blocking.
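
A minimal sketch of the ordering described above, using only POSIX calls. The function names set_nonblocking() and connect_then_nonblock() are hypothetical and chosen for illustration; the actual client code is not reproduced here.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Switch an open descriptor to non-blocking mode; returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Connect to a Unix domain socket while the descriptor is still blocking,
 * and only switch to non-blocking mode once connect() has succeeded.
 * Returns the connected descriptor, or -1 on any failure.
 */
int connect_then_nonblock(const char *path)
{
    struct sockaddr_un addr;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);  /* length was checked above */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        set_nonblocking(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Since connect() runs on a blocking socket, it either completes or fails with a definite error, so no retry loop is needed in the caller.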

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoRevert "client: handle transient connection errors"
Amitay Isaacs [Fri, 5 Apr 2013 02:19:34 +0000 (13:19 +1100)]
Revert "client: handle transient connection errors"

This reverts commit dc0c58547cd4b20a8e2cd21f3c8363f34fd03e75.

There is a simpler solution than retrying a random number of times: do not set
the socket non-blocking until connect succeeds.

11 years agocommon/messaging: Use the jenkins hash in ctdb_message
Volker Lendecke [Wed, 3 Apr 2013 12:59:21 +0000 (14:59 +0200)]
common/messaging: Use the jenkins hash in ctdb_message

This gives a better hash distribution.

11 years agocommon/messaging: use tdb_parse_record in message_list_db_fetch
Volker Lendecke [Fri, 5 Apr 2013 02:11:31 +0000 (13:11 +1100)]
common/messaging: use tdb_parse_record in message_list_db_fetch

This avoids malloc/free in a hot code path.

11 years agocommon/messaging: Abstract db related operations inside db functions
Amitay Isaacs [Wed, 3 Apr 2013 04:08:14 +0000 (15:08 +1100)]
common/messaging: Abstract db related operations inside db functions

This simplifies the use of message indexdb API and abstracts tdb related code
inside the API.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agocommon/messaging: Don't forget to free the result returned by tdb_fetch()
Amitay Isaacs [Tue, 2 Apr 2013 05:57:51 +0000 (16:57 +1100)]
common/messaging: Don't forget to free the result returned by tdb_fetch()

This fixes a memory leak in the messaging code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agocommon/messaging: Free message list header if all message handlers are freed
Amitay Isaacs [Tue, 2 Apr 2013 01:08:39 +0000 (12:08 +1100)]
common/messaging: Free message list header if all message handlers are freed

This makes sure that even if the srvids are not deregistered, the header
structure is freed when the last message handler has been freed as a result of
client going away.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agobuild: Fix for tevent autoconf check
Sumit Bose [Mon, 25 Mar 2013 11:28:31 +0000 (12:28 +0100)]
build: Fix for tevent autoconf check

The list of include files is the 4th argument of AC_CHECK_DECLS.

11 years agoutil: Add hex_decode_talloc() to decode hex string into a binary blob
Amitay Isaacs [Wed, 13 Mar 2013 11:57:44 +0000 (22:57 +1100)]
util: Add hex_decode_talloc() to decode hex string into a binary blob

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agologging: Do not ignore stdout/stderr from the exec'd children
Amitay Isaacs [Wed, 13 Mar 2013 00:46:18 +0000 (11:46 +1100)]
logging: Do not ignore stdout/stderr from the exec'd children

To log debugging information from child processes that are started
with vfork and exec, do not set close_on_exec on STDOUT and STDERR for
that process.
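
One way to express this rule is sketched below. The helper name set_close_on_exec_unless_stdio() is hypothetical, and the sketch assumes the decision is made per descriptor; the real change may be structured differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Mark a descriptor close-on-exec, except for stdout and stderr, which
 * are left inheritable so that output of an exec'd child can still be
 * captured. Returns 0 on success, -1 on error.
 */
int set_close_on_exec_unless_stdio(int fd)
{
    assert(fd >= 0);

    if (fd == STDOUT_FILENO || fd == STDERR_FILENO) {
        return 0;  /* leave flags untouched */
    }

    int flags = fcntl(fd, F_GETFD, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```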

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoserver:persistent: fix a debug message (copy'n'paste error)
Michael Adam [Fri, 22 Feb 2013 11:42:10 +0000 (12:42 +0100)]
server:persistent: fix a debug message (copy'n'paste error)

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agofix a typo
Volker Lendecke [Tue, 12 Mar 2013 12:53:58 +0000 (13:53 +0100)]
fix a typo

Reviewed-by: Michael Adam <obnox@samba.org>
11 years agocommon/io: For scheduling immediate events use tevent_schedule_immediate
Amitay Isaacs [Fri, 22 Feb 2013 01:59:39 +0000 (12:59 +1100)]
common/io: For scheduling immediate events use tevent_schedule_immediate

tevent_schedule_immediate() is much more efficient at handling events that need
to be processed immediately rather than creating timed events with
timeval_zero().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: Add an index db for message list for faster searches
Amitay Isaacs [Thu, 21 Feb 2013 02:16:15 +0000 (13:16 +1100)]
ctdbd: Add an index db for message list for faster searches

When CTDB is busy with lots of smbd, CTDB was spending too much time in
daemon_check_srvids() which searches a list of srvids in the registered
message handlers.  Using a hash based index significantly improves the
performance of search in a linked list.
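
To illustrate why an index helps, here is a small chained hash table keyed by srvid. It is a generic sketch and not the index db added by this commit: all type and function names (struct srvid_index, srvid_index_add() and so on) and the hash function are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define INDEX_BITS    10
#define INDEX_BUCKETS (1u << INDEX_BITS)

struct index_entry {
    uint64_t srvid;
    struct index_entry *next;  /* chain of entries sharing a bucket */
};

struct srvid_index {
    struct index_entry *buckets[INDEX_BUCKETS];
};

/* Multiplicative hash: keep the top INDEX_BITS bits of the product. */
static unsigned bucket_of(uint64_t srvid)
{
    return (unsigned)((srvid * 0x9E3779B97F4A7C15ull) >> (64 - INDEX_BITS));
}

/* Register a srvid in the index; returns 0 on success, -1 on ENOMEM. */
int srvid_index_add(struct srvid_index *idx, uint64_t srvid)
{
    assert(idx != NULL);

    struct index_entry *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    unsigned b = bucket_of(srvid);
    e->srvid = srvid;
    e->next = idx->buckets[b];
    idx->buckets[b] = e;
    return 0;
}

/* Look up a srvid by walking only the chain of its bucket. */
bool srvid_index_exists(const struct srvid_index *idx, uint64_t srvid)
{
    assert(idx != NULL);

    const struct index_entry *e = idx->buckets[bucket_of(srvid)];
    for (; e != NULL; e = e->next) {
        if (e->srvid == srvid) {
            return true;
        }
    }
    return false;
}

/* Release all entries; the index itself is owned by the caller. */
void srvid_index_free(struct srvid_index *idx)
{
    assert(idx != NULL);

    for (unsigned b = 0; b < INDEX_BUCKETS; b++) {
        struct index_entry *e = idx->buckets[b];
        while (e != NULL) {
            struct index_entry *next = e->next;
            free(e);
            e = next;
        }
        idx->buckets[b] = NULL;
    }
}
```

A lookup touches one bucket instead of the whole list of registered handlers, which is the effect the commit message describes for daemon_check_srvids().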

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotools/ctdb: delip no longer fails if IP can not be moved
Martin Schwenke [Wed, 27 Feb 2013 05:01:55 +0000 (16:01 +1100)]
tools/ctdb: delip no longer fails if IP can not be moved

Moving the IP is an optimisation so should not cause failure.

Refactor and simplify the retry-move-IP into new function
try_moveip().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agoserver:persistent: fix a comment typo.
Michael Adam [Fri, 22 Feb 2013 10:36:00 +0000 (11:36 +0100)]
server:persistent: fix a comment typo.

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agorecoverd: update_capabilities() should use connected nodes
Martin Schwenke [Mon, 18 Feb 2013 05:39:00 +0000 (16:39 +1100)]
recoverd: update_capabilities() should use connected nodes

... as the comment says... not just active nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agoclient: Refactor node listing functions to use list_of_nodes()
Martin Schwenke [Tue, 19 Feb 2013 03:30:50 +0000 (14:30 +1100)]
client: Refactor node listing functions to use list_of_nodes()

This reduces repetition.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agoclient: New generic node listing function list_of_nodes()
Martin Schwenke [Tue, 19 Feb 2013 03:29:06 +0000 (14:29 +1100)]
client: New generic node listing function list_of_nodes()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agocommon/io: Rewrite socket handling code to read all available data
Amitay Isaacs [Thu, 17 Jan 2013 23:42:14 +0000 (10:42 +1100)]
common/io: Rewrite socket handling code to read all available data

This improves the processing of packets considerably.  It has been
observed that there can be as many as 10 packets in the socket buffer and
the current code of reading a single packet from a socket at a time is
not very optimal.  This change reads all the bytes from socket buffer and
then parses to extract multiple packets.  If there are multiple packets,
set up a timed event to process next packet.
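
The parsing step can be sketched as below. The sketch assumes that each packet starts with a native-endian uint32_t holding the total packet length, including the length field itself. It also handles all complete packets in one loop, whereas the commit message describes scheduling an event for the next packet. The names drain_packets(), packet_cb and count_packet() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*packet_cb)(const uint8_t *pkt, uint32_t len,
                          void *private_data);

/*
 * Hand every complete packet in buf[0..used) to the callback.
 * Returns the number of bytes consumed; the caller keeps the remaining
 * partial tail for the next read. Returns SIZE_MAX if a length field is
 * smaller than the length field itself (malformed stream).
 */
size_t drain_packets(const uint8_t *buf, size_t used,
                     packet_cb cb, void *private_data)
{
    size_t off = 0;

    assert(buf != NULL && cb != NULL);

    while (used - off >= sizeof(uint32_t)) {
        uint32_t len;
        memcpy(&len, buf + off, sizeof(len));  /* avoids unaligned access */

        if (len < sizeof(uint32_t)) {
            return SIZE_MAX;                   /* malformed length */
        }
        if (len > used - off) {
            break;                             /* packet not complete yet */
        }
        cb(buf + off, len, private_data);
        off += len;
    }
    return off;
}

/* Example callback: counts packets through the private_data pointer. */
void count_packet(const uint8_t *pkt, uint32_t len, void *private_data)
{
    (void)pkt;
    (void)len;
    (*(unsigned *)private_data)++;
}
```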

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agodoc: Fix typo in ctdbd manpage
Martin Schwenke [Fri, 15 Feb 2013 00:18:45 +0000 (11:18 +1100)]
doc: Fix typo in ctdbd manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Fix the PullDBPreallocation size to 10MB as intended
Amitay Isaacs [Mon, 11 Feb 2013 02:23:47 +0000 (13:23 +1100)]
ctdbd: Fix the PullDBPreallocation size to 10MB as intended

In 1f262deaad0818f159f9c68330f7fec121679023, Ronnie changed recovery code
to allocate chunks of 10MB in traverse_pulldb() and traverse_recdb().  The
tunable PullDBPreallocation size was set to 100MB.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoeventscripts: Remove calls to "smbstatus -np" for samba cleanup
Amitay Isaacs [Mon, 11 Feb 2013 00:25:49 +0000 (11:25 +1100)]
eventscripts: Remove calls to "smbstatus -np" for samba cleanup

This is an artifact from older versions of Samba. In the newer versions of
Samba, "smbstatus -np" command does not do anything useful, but causes a
traverse in CTDB which is expensive and causes CPU utilization to shoot up.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoLogging: Fix breakage when freeing the log ringbuffer
Martin Schwenke [Wed, 6 Feb 2013 03:15:11 +0000 (14:15 +1100)]
Logging: Fix breakage when freeing the log ringbuffer

Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer.  The solution there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!

Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function).  The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.

This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed.  There are 3 special cases:

* Forking the recovery daemon.  We want to be able to fetch from the
  ringbuffer there.

* The ringbuffer fetching code.  Change the 2 calls in this code (main
  daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
  instead.

While we're here, clear the log ringbuffer when the recovery daemon is
forked, since it will contain a copy of the messages from the main
daemon.

Note to self: always test... even the most obvious patches...  ;-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoFix a comment typo
Volker Lendecke [Wed, 6 Feb 2013 09:28:37 +0000 (10:28 +0100)]
Fix a comment typo

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
11 years agoinitscript: export CTDB_EXTERNAL_TRACE
Martin Schwenke [Tue, 5 Feb 2013 02:16:46 +0000 (13:16 +1100)]
initscript: export CTDB_EXTERNAL_TRACE

This means it can be set like any other configuration option in the
configuration file, without needing to export it there.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Don't use a fixed length buffer for the hung script command
Martin Schwenke [Tue, 5 Feb 2013 03:36:29 +0000 (14:36 +1100)]
ctdbd: Don't use a fixed length buffer for the hung script command

The amount of data to write into the buffer wasn't constrained
anywhere...
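
A common way to avoid a fixed-length buffer in plain C is to let snprintf() report the required size first. The sketch below shows that pattern only. The function name build_debug_command() and the "<script> <pid>" command format are assumptions for illustration, and the actual fix may use a different allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<script> <pid>" in a heap buffer that is sized to fit.
 * The caller frees the result. Returns NULL on failure.
 */
char *build_debug_command(const char *script, long pid)
{
    assert(script != NULL);

    /* First pass: ask snprintf how many characters are needed. */
    int need = snprintf(NULL, 0, "%s %ld", script, pid);
    if (need < 0) {
        return NULL;
    }

    char *cmd = malloc((size_t)need + 1);
    if (cmd == NULL) {
        return NULL;
    }

    /* Second pass: format into the exactly sized buffer. */
    snprintf(cmd, (size_t)need + 1, "%s %ld", script, pid);
    return cmd;
}
```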

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable
Martin Schwenke [Tue, 5 Feb 2013 03:25:01 +0000 (14:25 +1100)]
ctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable

This is quite easy to misconfigure by failing to set the execute bit
on the script.  Better to complain loudly.

This is a debugging facility rather than core CTDB functionality, so it
doesn't need a subtle mechanism to disable it at run-time.  To disable
the designated script at run-time either edit it to put an "exit 0" at
the top or move it aside and symlink to /bin/true.

This is implemented by actually removing the code that checks that the
file exists and is executable.  The output from the shell when the
system() function fails is just as useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Remove command-line option --debug-hung-script
Martin Schwenke [Tue, 5 Feb 2013 04:49:52 +0000 (15:49 +1100)]
ctdbd: Remove command-line option --debug-hung-script

Use an environment variable instead.  This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.

The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each.  So, the convention
will be to use an environment variable for each debug option.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Remove debug_hung_script_ctx
Martin Schwenke [Tue, 5 Feb 2013 02:08:55 +0000 (13:08 +1100)]
ctdbd: Remove debug_hung_script_ctx

The only allocation against this context is by
ctdb_fork_with_logging().  This memory is freed by ctdb_log_handler()
anyway.  There should be no memory leak.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Message logged at exit should be different for different processes
Martin Schwenke [Thu, 10 Jan 2013 03:39:09 +0000 (14:39 +1100)]
ctdbd: Message logged at exit should be different for different processes

Some subprocesses print "CTDB daemon shutting down" when they exit and
this can be confusing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agodaemon: Make sure all the traverse children are terminated if traverse times out
Amitay Isaacs [Tue, 22 Jan 2013 02:27:20 +0000 (13:27 +1100)]
daemon: Make sure all the traverse children are terminated if traverse times out

When traverse times out, callback function is called with key and data set to
tdb_null.  This is also the way to signal end of traverse.  So if the traverse
times out, callback function treats it as traverse ended and frees state without
calling the destructor.

Keep track if the traverse timed out, so callback function can take appropriate
action for traverse timeout and traverse end.
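
The distinction can be sketched as follows. This is a simplified model: struct blob stands in for TDB_DATA, and struct traverse_state, the timed_out flag and the return values are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for TDB_DATA; a null blob has len == 0. */
struct blob {
    const unsigned char *ptr;
    size_t len;
};

enum traverse_outcome {
    TRAVERSE_RECORD,     /* a regular record was delivered */
    TRAVERSE_DONE,       /* normal end of traverse */
    TRAVERSE_TIMED_OUT   /* traverse was aborted by the timeout */
};

struct traverse_state {
    bool timed_out;      /* set by the timeout handler before the callback */
    unsigned records;    /* number of records seen so far */
};

/*
 * A null key and null data are delivered both at the end of a traverse
 * and on timeout, so the flag in the state is what tells them apart.
 */
enum traverse_outcome traverse_callback(struct traverse_state *state,
                                        struct blob key, struct blob data)
{
    assert(state != NULL);

    if (key.len == 0 && data.len == 0) {
        return state->timed_out ? TRAVERSE_TIMED_OUT : TRAVERSE_DONE;
    }
    state->records++;
    return TRAVERSE_RECORD;
}
```

On TRAVERSE_TIMED_OUT the caller can run the cleanup that terminates the remaining traverse children, instead of treating the event as a normal end.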

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoLogging: Free the ringbuffer in child processes created with ctdb_fork()
Martin Schwenke [Tue, 5 Feb 2013 01:09:36 +0000 (12:09 +1100)]
Logging: Free the ringbuffer in child processes created with ctdb_fork()

At the moment the log ringbuffer is duplicated in every child process.
Although it is copy-on-write we want to see if it is contributing to
out-of-memory situations when there are a lot of children.

The ringbuffer isn't accessible from any of the children anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoLogging: New function ctdb_log_ringbuffer_free()
Martin Schwenke [Tue, 5 Feb 2013 01:08:11 +0000 (12:08 +1100)]
Logging: New function ctdb_log_ringbuffer_free()

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agobuild: Fix a Makefile.in typo
Martin Schwenke [Tue, 5 Feb 2013 01:13:57 +0000 (12:13 +1100)]
build: Fix a Makefile.in typo

Objects are named *.o  ;-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotools/ctdb: Fix a compiler warning
Martin Schwenke [Fri, 11 Jan 2013 01:39:37 +0000 (12:39 +1100)]
tools/ctdb: Fix a compiler warning

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

11 years agorecoverd: Fix printing of node flags from local information
Amitay Isaacs [Wed, 23 Jan 2013 03:35:47 +0000 (14:35 +1100)]
recoverd: Fix printing of node flags from local information

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agocommon: Don't lie on unimplemented gratuitous arp
Mathieu Parent [Mon, 14 Jan 2013 16:48:01 +0000 (17:48 +0100)]
common: Don't lie on unimplemented gratuitous arp

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agotests: Test portability
Mathieu Parent [Mon, 14 Jan 2013 16:21:01 +0000 (17:21 +0100)]
tests: Test portability

Curiously test_ctdb_sys_check_iface_exists fails on Linux

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agocommon: FreeBSD+kFreeBSD: Implement get_process_name (same as in Linux)
Mathieu Parent [Mon, 14 Jan 2013 11:13:24 +0000 (12:13 +0100)]
common: FreeBSD+kFreeBSD: Implement get_process_name (same as in Linux)

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agocommon: Detailed platform-specific FIXME
Mathieu Parent [Mon, 14 Jan 2013 10:23:46 +0000 (11:23 +0100)]
common: Detailed platform-specific FIXME

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agobuild: Update config.guess 2012-12-30 and config.sub to 2013-01-11
Mathieu Parent [Sun, 13 Jan 2013 13:15:20 +0000 (14:15 +0100)]
build: Update config.guess 2012-12-30 and config.sub to 2013-01-11

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agodoc: allows to -> allows one to
Mathieu Parent [Sat, 12 Jan 2013 15:43:03 +0000 (16:43 +0100)]
doc: allows to -> allows one to

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agobuild: Add missing LDFLAGS
Mathieu Parent [Sat, 12 Jan 2013 14:14:48 +0000 (15:14 +0100)]
build: Add missing LDFLAGS

Original Author: Simon Ruderich <simon@ruderich.org>

Signed-off-by: Mathieu Parent <math.parent@gmail.com>
11 years agoChanges for unobtrusive recovery and new method for health check.
Srikrishan Malik [Wed, 9 Jan 2013 10:41:39 +0000 (16:11 +0530)]
Changes for unobtrusive recovery and new method for health check.

Unobtrusive recovery: Ganesha will not be restarted on failovers.

Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call which can timeout if the server is too busy.

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>
11 years agorecoverd: Create recoverd monitoring timed events off recoverd context
Amitay Isaacs [Wed, 9 Jan 2013 05:22:39 +0000 (16:22 +1100)]
recoverd: Create recoverd monitoring timed events off recoverd context

This ensures that when shutting down CTDB, all the timed events
associated with monitoring recoverd are destroyed and recoverd
is not restarted.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agodaemon: Protect against double free of callback state while shutting down
Amitay Isaacs [Mon, 29 Oct 2012 03:56:10 +0000 (14:56 +1100)]
daemon: Protect against double free of callback state while shutting down

When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed and all the callback states hanging off it.  This includes
callback state for current_monitor, if the current monitor event has
not yet finished.  As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it's actually freed
and it's a dangling reference.

So before executing callback function and freeing callback state check
if ctdb->monitor->monitor_context is not NULL.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agodaemon: On shutdown, destroy timed events that check if recoverd is active
Amitay Isaacs [Tue, 4 Dec 2012 04:05:44 +0000 (15:05 +1100)]
daemon: On shutdown, destroy timed events that check if recoverd is active

When CTDB is shutting down, recovery daemon is stopped, but the
event that checks if recovery daemon is still alive is not destroyed.
So the recovery daemon is restarted during shutdown if the CTDB daemon takes
longer to shut down.

There are two processes that check if recovery daemon is working.

1. ctdb_check_recd() - which checks every 30 seconds if the recovery
   daemon process exists.

2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon
   fails to ping CTDB daemon.

Both the events are periodic and need to be destroyed when shutting down.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotests: Add a test for recovery of persistent databases
Amitay Isaacs [Tue, 18 Dec 2012 01:52:39 +0000 (12:52 +1100)]
tests: Add a test for recovery of persistent databases

Ensure that RSN based recovery and __db_sequence_number__ based recovery
methods for persistent databases work correctly.  They should not cause
corruption of the database.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotools/ctdb: Add setdbseqnum command to set __db_sequence_number__
Amitay Isaacs [Wed, 19 Dec 2012 04:14:42 +0000 (15:14 +1100)]
tools/ctdb: Add setdbseqnum command to set __db_sequence_number__

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotools/ctdb: Re-factor code to check if db exists given name or id
Amitay Isaacs [Wed, 19 Dec 2012 03:43:26 +0000 (14:43 +1100)]
tools/ctdb: Re-factor code to check if db exists given name or id

Most of the commands related to database operations can now use the
common code (db_exists()) to refer to database with either name or id.

In addition to returning the db_id for a db_name, the function returns all the
flags set for the database.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agotools/ctdb: Add pdelete command to delete a record from persistent database
Amitay Isaacs [Mon, 17 Dec 2012 03:46:14 +0000 (14:46 +1100)]
tools/ctdb: Add pdelete command to delete a record from persistent database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agodaemon: Update the comment and remove redundant check in ctdb_start_transport()
Amitay Isaacs [Tue, 4 Dec 2012 03:58:30 +0000 (14:58 +1100)]
daemon: Update the comment and remove redundant check in ctdb_start_transport()

ctdb_start_transport() is called just before "setup" event, when CTDB
is ready to process the requests. "startup" event happens much later
after a successful recovery.

Transport method ctdb->methods is successfully initialized before
ctdb_start_transport() is called.  No need to check again.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoeventscripts: Fail the setup event if CTDB does not become ready
Martin Schwenke [Tue, 8 Jan 2013 05:49:56 +0000 (16:49 +1100)]
eventscripts: Fail the setup event if CTDB does not become ready

Currently it silently continues without attempting to set tunables.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: Make script_log() use supplied message, stop logger from hanging
Martin Schwenke [Fri, 4 Jan 2013 02:52:01 +0000 (13:52 +1100)]
scripts: Make script_log() use supplied message, stop logger from hanging

When using syslog any provided message arguments are ignored and not
passed to logger.  This means that logger blocks waiting on stdin.
That's bad.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: Rework ctdb-crash-cleanup.sh so that it uses existing functions
Martin Schwenke [Fri, 4 Jan 2013 00:41:03 +0000 (11:41 +1100)]
scripts: Rework ctdb-crash-cleanup.sh so that it uses existing functions

This improves maintainability.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: Make drop_all_public_ips() more robust
Martin Schwenke [Fri, 4 Jan 2013 00:23:29 +0000 (11:23 +1100)]
scripts: Make drop_all_public_ips() more robust

Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.

Factoring out some of the code will allow it to be used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Default value for debug_hung_script should use ETCDIR
Martin Schwenke [Thu, 3 Jan 2013 05:02:52 +0000 (16:02 +1100)]
ctdbd: Default value for debug_hung_script should use ETCDIR

That is, it should use whatever was specified in ./configure and
should not hardcode /etc.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: debug-hung-script.sh doesn't need functions/loadconfig
Martin Schwenke [Thu, 3 Jan 2013 04:33:57 +0000 (15:33 +1100)]
scripts: debug-hung-script.sh doesn't need functions/loadconfig

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: statd-callout should calculate CTDB_BASE if it is not set
Martin Schwenke [Thu, 3 Jan 2013 04:33:10 +0000 (15:33 +1100)]
scripts: statd-callout should calculate CTDB_BASE if it is not set

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoeventscripts: Each script should set CTDB_BASE if it is not set
Martin Schwenke [Thu, 3 Jan 2013 04:26:12 +0000 (15:26 +1100)]
eventscripts: Each script should set CTDB_BASE if it is not set

This makes it easier to run the scripts externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoscripts: Move drop_all_public_ips() to the functions file
Martin Schwenke [Thu, 3 Jan 2013 04:07:07 +0000 (15:07 +1100)]
scripts: Move drop_all_public_ips() to the functions file

... so it can be improved and used elsewhere.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/simple: Add test to check recovery daemon IP verification
Martin Schwenke [Fri, 12 Oct 2012 05:12:38 +0000 (16:12 +1100)]
tests/simple: Add test to check recovery daemon IP verification

Also update ips_are_on_nodeglob() to handle negation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/eventscripts: Ratchet down debug level for ctdb_takeover_tests
Martin Schwenke [Mon, 7 Jan 2013 23:21:49 +0000 (10:21 +1100)]
tests/eventscripts: Ratchet down debug level for ctdb_takeover_tests

The default IP allocation algorithm used by ctdb_takeover_tests
changed from "non-deterministic IPs" to "LCP2".  The latter generates
a lot more debug output.  ctdb_takeover_tests is used by the ctdb tool
stub to calculate IP address changes for failovers.  This resulted in
unexpected debug output that caused tests to fail.  Since eventscript
tests don't care how IP allocations are arrived at, the best solution
is to turn down the debug level.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: Separate each IP allocation algorithm into its own function
Martin Schwenke [Fri, 14 Dec 2012 06:12:01 +0000 (17:12 +1100)]
recoverd: Separate each IP allocation algorithm into its own function

This makes the code much more readable and maintainable.

As a side effect, fix a memory leak in LCP2.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: New function unassign_unsuitable_ips()
Martin Schwenke [Thu, 13 Dec 2012 02:23:32 +0000 (13:23 +1100)]
recoverd: New function unassign_unsuitable_ips()

Move the code into a new function so it can be called from a number of
places.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: Move failback retry loop into basic_failback() and lcp2_failback()
Martin Schwenke [Thu, 13 Dec 2012 01:15:32 +0000 (12:15 +1100)]
recoverd: Move failback retry loop into basic_failback() and lcp2_failback()

The retry loop is currently in ctdb_takeover_run_core().  Pushing it
into each function will make it possible to put each algorithm into a
separate top-level function.  This will make the code much clearer and
more maintainable.

Also keep associated test code compatible.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: Trying to failback more IPs no longer allocates unassigned IPs
Martin Schwenke [Tue, 11 Dec 2012 04:49:17 +0000 (15:49 +1100)]
recoverd: Trying to failback more IPs no longer allocates unassigned IPs

Neither basic_failback() nor lcp2_failback() unassigns IPs anymore, so
there's no point looping back that far.

Also fix a unit test that now fails because looping back to handle
unassigned IPs is no longer logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: basic_failback() can call find_takeover_node() directly
Martin Schwenke [Tue, 11 Dec 2012 04:43:36 +0000 (15:43 +1100)]
recoverd: basic_failback() can call find_takeover_node() directly

Instead of unassigning, looping back and depending on
basic_allocate_unassigned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: Don't do failback at all when deterministic IPs are in use
Martin Schwenke [Tue, 11 Dec 2012 04:01:12 +0000 (15:01 +1100)]
recoverd: Don't do failback at all when deterministic IPs are in use

This seems to be the right thing to do instead of calling into the
failback code and continually skipping the release of an IP.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set
Martin Schwenke [Fri, 14 Dec 2012 06:10:41 +0000 (17:10 +1100)]
recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set

If this is done earlier then some other logic can be improved.  Also,
this should be a warning since no error condition is set.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago recoverd: Fix a memory leak in IP allocation
Martin Schwenke [Fri, 14 Dec 2012 06:10:05 +0000 (17:10 +1100)]
recoverd: Fix a memory leak in IP allocation

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/takeover: Add some LCP2 tests for case when no nodes are healthy
Martin Schwenke [Thu, 20 Dec 2012 05:27:27 +0000 (16:27 +1100)]
tests/takeover: Add some LCP2 tests for case when no nodes are healthy

3 tests should assign IPs to all nodes.

3 tests set NoIPTakeoverOnDisabled=1 and should drop all IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/takeover: Initial tests for deterministic IPs
Martin Schwenke [Thu, 20 Dec 2012 05:26:42 +0000 (16:26 +1100)]
tests/takeover: Initial tests for deterministic IPs

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/takeover: Do output filtering for deterministic IPs algorithm too
Martin Schwenke [Thu, 20 Dec 2012 05:25:53 +0000 (16:25 +1100)]
tests/takeover: Do output filtering for deterministic IPs algorithm too

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/takeover: Support testing of NoIPTakeoverOnDisabled
Martin Schwenke [Thu, 20 Dec 2012 05:24:58 +0000 (16:24 +1100)]
tests/takeover: Support testing of NoIPTakeoverOnDisabled

Via $CTDB_SET_NoIPTakeoverOnDisabled.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests/takeover: IP allocation now selected via $CTDB_IP_ALGORITHM
Martin Schwenke [Thu, 20 Dec 2012 03:52:05 +0000 (14:52 +1100)]
tests/takeover: IP allocation now selected via $CTDB_IP_ALGORITHM

Default to LCP2, like ctdbd.  Also support "det" for deterministic
IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
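
The selection described above can be sketched as a simple environment-variable dispatch. This is an illustration only: the function name ip_algorithm_name is hypothetical, and the value "lcp2" for selecting LCP2 is an assumption (the commit message only confirms "det" and that LCP2 is the default).

```shell
#!/bin/sh
# Hypothetical helper: report which IP allocation algorithm a test run
# would use, based on $CTDB_IP_ALGORITHM.
ip_algorithm_name ()
{
    # An unset or empty variable falls back to LCP2, mirroring the
    # default stated in the commit message.  "lcp2" is an assumed spelling.
    case "${CTDB_IP_ALGORITHM:-lcp2}" in
    lcp2) echo "LCP2" ;;
    det)  echo "deterministic IPs" ;;
    *)    echo "unknown algorithm: $CTDB_IP_ALGORITHM" >&2 ; return 1 ;;
    esac
}
```

Rejecting unknown values instead of silently falling back makes a typo in a test definition fail loudly.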
11 years ago tests/takeover: Support valgrinding the takeover code
Martin Schwenke [Thu, 13 Dec 2012 09:29:22 +0000 (20:29 +1100)]
tests/takeover: Support valgrinding the takeover code

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests: new simple integration test for delip interface garbage collection
Martin Schwenke [Fri, 30 Nov 2012 05:38:08 +0000 (16:38 +1100)]
tests: new simple integration test for delip interface garbage collection

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago tests: new function ip2ipmask() for integration testing
Martin Schwenke [Fri, 30 Nov 2012 05:37:28 +0000 (16:37 +1100)]
tests: new function ip2ipmask() for integration testing

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago ctdbd: Clean up orphaned interfaces when an IP is deleted
Martin Schwenke [Fri, 23 Nov 2012 09:09:07 +0000 (20:09 +1100)]
ctdbd: Clean up orphaned interfaces when an IP is deleted

Add a new function ctdb_remove_orphaned_ifaces() and call it in
ctdb_control_del_public_address().

ctdb_remove_orphaned_ifaces() uses a naive implementation that does
things in a very obvious way.  There are many ways to improve the
performance - some are mentioned in a comment in the code.  However, I
doubt that this will be a bottleneck even with a large number of
public IPs.  Running the eventscript is likely to outweigh the cost of
this cleanup.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
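
The "naive implementation" idea above can be illustrated with a nested loop: for each known interface, scan every interface still referenced by a public IP, and report the ones with no reference. This is a shell sketch of the approach only, not the C code of ctdb_remove_orphaned_ifaces(); the function name orphaned_ifaces and its argument layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a naive orphan scan.
# $1: space-separated list of all known interfaces
# $2: space-separated list of interfaces still referenced by public IPs
# Prints each interface from $1 that does not appear in $2.
orphaned_ifaces ()
{
    for _iface in $1 ; do
        _used=false
        # Obvious O(n*m) scan, in the spirit of the naive approach.
        for _ref in $2 ; do
            if [ "$_iface" = "$_ref" ] ; then
                _used=true
            fi
        done
        if ! $_used ; then
            echo "$_iface"
        fi
    done
}
```

For example, `orphaned_ifaces "eth0 eth1 eth2" "eth0 eth2"` prints `eth1`. The quadratic scan is cheap next to running an eventscript, which matches the cost argument in the commit message.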

11 years ago tests/complex: Add NFS test when CTDB is killed on one of the nodes
Amitay Isaacs [Mon, 7 Jan 2013 01:00:34 +0000 (12:00 +1100)]
tests/complex: Add NFS test when CTDB is killed on one of the nodes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years ago Eventscripts: Change the default reconfigure action to do nothing
Martin Schwenke [Tue, 4 Dec 2012 04:00:44 +0000 (15:00 +1100)]
Eventscripts: Change the default reconfigure action to do nothing

A default action of restarting the service doesn't obey the principle
of least surprise.  It caused the NFS service to be implicitly
reintroduced.

This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago Eventscripts: Do not restart NFS on reconfigure
Martin Schwenke [Tue, 4 Dec 2012 03:52:25 +0000 (14:52 +1100)]
Eventscripts: Do not restart NFS on reconfigure

It looks like this restart was accidentally reintroduced in commit
fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
became unset so the default action of restarting the service would
occur.  From there cleanups have explicitly reintroduced it and
carried it through the code.

Also update the unit tests affected by this change.

The restart was originally removed in commit
bc481c3f1a44c50648488c4f8a7f15ec395d446f.

The default reconfigure action of restarting a service is clearly
suboptimal and will be addressed in a separate patch.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years ago ctdbd: Initialise the node flags in just one place
Martin Schwenke [Tue, 4 Dec 2012 03:28:06 +0000 (14:28 +1100)]
ctdbd: Initialise the node flags in just one place

Currently flags are initialised in 2 places.  One of them is in
ctdb_tcp_listen_automatic(), which just seems wrong.  This makes the
code easier to follow by just doing it in ctdb_start_daemon().

This means that the flags are now initialised later than previously.
However, it is still done before the transport is started and before
clients can connect.

In future it might make sense to do a similar thing with setting the
PNN.  However, the current optimisation is reasonably obvious...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>