metze/ctdb/wip.git
Ronnie Sahlberg [Wed, 18 Aug 2010 21:18:22 +0000 (07:18 +1000)]
On RHEL, "service nfs stop; service nfs start" and "service nfs restart"
    sometimes (very rarely) fail to restart the service.

    Add a function to restart NFSd on SLES and RHEL-like systems.

    If we detect the system is unhealthy due to kNFSd not running,
    try to restart the service again with "service nfs restart" and
    hope for the best.

CQ1019372

Ronnie Sahlberg [Wed, 18 Aug 2010 04:37:16 +0000 (14:37 +1000)]
Add machine-readable output for the "ctdb gettickles <ip>" command

Ronnie Sahlberg [Wed, 18 Aug 2010 02:36:03 +0000 (12:36 +1000)]
Remove the structure ctdb_control_tcp_vnn since this is identical to the structure ctdb_tcp_connection.

Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".

Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds.

Ronnie Sahlberg [Wed, 18 Aug 2010 01:09:32 +0000 (11:09 +1000)]
Add a new "ctdb addtickle" command to manually add tickles to ctdbd

This can be used to set ctdbd up to generate tickles for non-Samba
services (Samba contains code to set tickles up automatically).

Ronnie Sahlberg [Wed, 18 Aug 2010 00:18:35 +0000 (10:18 +1000)]
update the example for the new signature of
ctdb_set_message_handler_send()

Ronnie Sahlberg [Wed, 18 Aug 2010 00:11:59 +0000 (10:11 +1000)]
We use eventloop nesting in a couple of places, notably the sync
parts of the recovery daemon.

Initialize all event contexts to allow nesting

Ronnie Sahlberg [Tue, 17 Aug 2010 23:53:52 +0000 (09:53 +1000)]
Merge commit 'rusty/libctdb-new' into foo

Rusty Russell [Tue, 17 Aug 2010 23:46:31 +0000 (09:16 +0930)]
event: Update events to latest Samba version 0.9.8

In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa939ef1bbb740584f09e76e2ecd5b06.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 17 Aug 2010 23:41:58 +0000 (09:11 +0930)]
talloc: update to 2.0.3 version from SAMBA

This is based on SAMBA as at revision 2de63aa2801a907905b3e05557074af5b896d486.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Volker Lendecke [Fri, 6 Aug 2010 08:12:13 +0000 (10:12 +0200)]
Correctly set docdir

Rusty Russell [Mon, 16 Aug 2010 00:52:21 +0000 (10:22 +0930)]
tdb: workaround starvation problem in locking entire database.

(Imported from SAMBA 11ab43084b10cf53b530cdc3a6036c898b79ca38)

We saw tdb_lockall() take 71 seconds under heavy load; this is because Linux
(at least) doesn't prevent new small locks being obtained while we're waiting
for a big lock.

The workaround is to use divide and conquer with non-blocking chainlocks: if
we get down to a single chain we block.  Using a simple test program where
children did "hold lock for 100ms, sleep for 1 second" the time to do
tdb_lockall() dropped significantly.  There are ln(hashsize) locks taken in
the contended case, but that's slow anyway.
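A minimal sketch of the divide-and-conquer idea, using plain fcntl() byte-range locks on an ordinary file. It is not tdb's code. The layout (hash chain i is the single byte at offset i) and the names range_lock() and lock_chains_gradual() are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Lock (or unlock) the byte range [off, off + len) with fcntl().
 * With wait == false, F_SETLK fails at once if someone else holds a
 * conflicting lock; with wait == true, F_SETLKW blocks. */
static int range_lock(int fd, short type, off_t off, off_t len, bool wait)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = off,
		.l_len = len,
	};
	int ret;

	do {
		ret = fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
	} while (ret == -1 && errno == EINTR);
	return ret;
}

/* Hypothetical layout: chain i is the byte at offset i.
 * Write-lock nchains (>= 1) consecutive chains starting at off. */
static int lock_chains_gradual(int fd, off_t off, off_t nchains)
{
	off_t half;

	/* Down to a single chain: simply block until we get it. */
	if (nchains == 1)
		return range_lock(fd, F_WRLCK, off, 1, true);

	/* Uncontended case: the whole range in one non-blocking call. */
	if (range_lock(fd, F_WRLCK, off, nchains, false) == 0)
		return 0;

	/* Contended: split in two and lock each half the same way. */
	half = nchains / 2;
	if (lock_chains_gradual(fd, off, half) == -1)
		return -1;
	if (lock_chains_gradual(fd, off + half, nchains - half) == -1) {
		range_lock(fd, F_UNLCK, off, half, false);
		return -1;
	}
	return 0;
}
```

When there is no contention, the whole lock is a single fcntl() call. Only contended sub-ranges are split further, and the caller never blocks on more than one chain at a time.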

More analysis is given in my blog at http://rusty.ozlabs.org/?p=120

This may also help transactions, though in that case it's the initial
read lock which uses this gradual locking routine; the update-to-write-lock
code is separate and still tries to update in one go.

Even though the ABI doesn't change, the minor version is bumped so the
behavior change can be easily detected.

CQ:S1018154
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 16 Aug 2010 00:43:32 +0000 (10:13 +0930)]
tdb: Fix tdb_check() to work with read-only tdb databases.

(Imported from SAMBA bc1c82ea137e1bf6cb55139a666c56ebb2226b23)
The function tdb_lockall() uses F_WRLCK internally, which doesn't work on
a fd opened with O_RDONLY. Use tdb_lockall_read() instead.
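The failure mode is easy to reproduce with plain POSIX calls. fcntl() refuses a write lock on a descriptor that was not opened for writing (EBADF), while a read lock on the same descriptor succeeds. This is a small sketch; the helper name try_whole_file_lock() is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try a non-blocking fcntl() lock of the given type on the whole file.
 * Returns 0 on success, otherwise the errno value from fcntl(). */
static int try_whole_file_lock(int fd, short type)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "up to end of file, however large" */
	};

	if (fcntl(fd, F_SETLK, &fl) == -1)
		return errno;
	return 0;
}
```

On an O_RDONLY descriptor, F_WRLCK fails with EBADF and F_RDLCK succeeds. That is why a read-only check has to use a read lock.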

Rusty Russell [Mon, 16 Aug 2010 00:42:02 +0000 (10:12 +0930)]
tdb: remove unused variable in tdb_new_database().

(Imported from SAMBA 2eab1d7fdcb54f9ec27431ca4858eb64cb1bd835)

Rusty Russell [Mon, 16 Aug 2010 00:50:19 +0000 (10:20 +0930)]
tdb: fix short write logic in tdb_new_database

Commit 207a213c/24fed55d purported to fix the problem of signals during
tdb_new_database (which could cause a spurious short write, hence a failure).
However, the code is wrong: newdb+written is not correct.

Fix this by introducing a general tdb_write_all() and using it here and in
the tracing code.
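A minimal sketch of such a helper, in the spirit of tdb_write_all() but not its actual source. The name write_all() is used here for that reason. The sketch advances a char pointer, so the offset is counted in bytes whatever the type of the caller's buffer. It also retries when write() is interrupted by a signal before writing anything (EINTR).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Write all count bytes of buf to fd, retrying after short writes
 * and after EINTR.  Returns false on any other write error. */
static bool write_all(int fd, const void *buf, size_t count)
{
	const char *p = buf;	/* char pointer: offsets count bytes */

	while (count > 0) {
		ssize_t ret = write(fd, p, count);

		if (ret == -1) {
			if (errno == EINTR)
				continue;
			return false;
		}
		p += ret;
		count -= (size_t)ret;
	}
	return true;
}
```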

Cc: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ronnie Sahlberg [Mon, 9 Aug 2010 23:43:17 +0000 (09:43 +1000)]
Create a new command "ctdb sync" that is just an alias for "ctdb ipreallocate"

Ronnie Sahlberg [Mon, 9 Aug 2010 23:41:41 +0000 (09:41 +1000)]
Update a log message to reflect that this no longer happens only
when trying/failing to ban a node.

Rusty Russell [Mon, 9 Aug 2010 06:11:32 +0000 (15:41 +0930)]
libctdb: add synchronous message handling and unregister, with tests.

It turns out that we *do* want a separate private arg for the message
handler and the completion callback, so we change that.

We also fix the prototypes of the remove_message functions as we
implement them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ronnie Sahlberg [Mon, 9 Aug 2010 01:35:38 +0000 (11:35 +1000)]
Merge remote branch 'martins/master'

Martin Schwenke [Fri, 6 Aug 2010 01:10:56 +0000 (11:10 +1000)]
Add some command-line options to ctdb_diagnostics.

In some contexts ctdb_diagnostics generates too many errors when it is
run on heterogeneous and machine-configured clusters.  In some
clusters some nodes are expected to be configured differently, and
machine-generated configuration files can contain comments with
timestamps.

This adds some command-line options that can be used to reduce the
number of errors reported:

    -n <nodes>  Comma-separated list of nodes to operate on
    -c          Ignore comment lines (starting with '#') in file comparisons
    -w          Ignore whitespace in file comparisons
    --no-ads    Do not use commands that assume an Active Directory Server

The -n option simply allows ctdb_diagnostics to operate on a subset of
nodes, avoiding file comparisons with and data collection on nodes
that are differently configured.  For file comparisons, instead of
showing each file on the current node and then comparing other nodes
to that file, the file from the first (available or requested) node
is shown and then other nodes are compared to that.  This changes
the output: ctdb_diagnostics no longer prints messages referencing
the current node.

-c and -w are used to weaken comparisons between configuration files.

--no-ads can be used to avoid running ADS-specific commands if a
cluster uses LDAP (or other non-ADS) configuration.

This also fixes a number of bugs in related code:

* A call to onnode was losing the >> NODE ...  << lines because they
  now go to stderr.  This was changed in onnode long ago but
  ctdb_diagnostics was never updated to match.

* ctdb_diagnostics was counting lines in /etc/ctdb/nodes to determine
  what nodes to operate on.  For some time the nodes file has
  supported syntax that makes this invalid.  "ctdb listnodes -Y" is
  now used to list available nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 5 Aug 2010 06:35:37 +0000 (16:35 +1000)]
update the docs to say that ctdb freeze is no more

Ronnie Sahlberg [Thu, 5 Aug 2010 06:30:47 +0000 (16:30 +1000)]
remove the "ctdb freeze" debugging command

Martin Schwenke [Thu, 5 Aug 2010 06:03:21 +0000 (16:03 +1000)]
Test suite: remove unnecessary verbosity from enable/continue tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 5 Aug 2010 06:01:23 +0000 (16:01 +1000)]
Test suite: Fix typo in continue test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 5 Aug 2010 05:58:56 +0000 (15:58 +1000)]
Test suite: weaken ctdb continue/enable tests for non-deterministic IPs.

These tests currently wait for the old IPs to fail back to the test
node.  This isn't guaranteed with DeterministicIPs disabled.

This changes those tests to wait until the test node gets at least 1
IP assigned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 5 Aug 2010 05:29:40 +0000 (15:29 +1000)]
initscript: wait until we can ping ctdbd before setting tunables.

Currently we do a "sleep 1" after starting and before running
set_ctdb_variables to set the tunables.  This is too arbitrary and
might fail if the system is heavily loaded.  This, for example, could
result in some nodes running with DeterministicIPs and some without,
in which case a different IP allocation algorithm would run depending
on who is the recmaster!

This makes the start function wait until "ctdb ping" succeeds (with 10
second timeout) before trying to run set_ctdb_variables.  If a timeout
occurs then the start function attempts to kill ctdbd before exiting
with a failure.
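The same pattern, polling a readiness probe against a deadline instead of sleeping for a fixed time, can be sketched in C. The probe stands in for "ctdb ping". The typedef, the function names and the poll interval are hypothetical, and the example probe exists only to demonstrate the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* A readiness check; in the initscript, "ctdb ping" plays this role. */
typedef bool (*probe_fn)(void *arg);

static double monotonic_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Call probe(arg) every poll_ms milliseconds until it succeeds or
 * timeout_s seconds have passed.  Returns true on success. */
static bool wait_until_ready(probe_fn probe, void *arg,
			     double timeout_s, long poll_ms)
{
	double deadline = monotonic_seconds() + timeout_s;
	struct timespec interval = {
		.tv_sec = poll_ms / 1000,
		.tv_nsec = (poll_ms % 1000) * 1000000L,
	};

	for (;;) {
		if (probe(arg))
			return true;
		if (monotonic_seconds() >= deadline)
			return false;	/* caller should clean up and fail */
		nanosleep(&interval, NULL);
	}
}

/* Example probe: succeeds on the n-th call, where n is the initial
 * value of *(int *)arg. */
static bool succeeds_on_nth_call(void *arg)
{
	int *remaining = arg;

	return --*remaining <= 0;
}
```

In the initscript the timeout is 10 seconds. On a false return, the start function tries to kill ctdbd and exits with a failure.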

It also cleans up the status reporting code for Red Hat and SUSE so
that the final status code is reported.  Currently there are cases
where a success status is reported prematurely, before a failure
occurs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 5 Aug 2010 03:43:50 +0000 (13:43 +1000)]
Test suite - make the ctdb_fetch test cope with "Reqid wrap!" messages.

Recent CTDB versions notice the wrap and print this message.  The test
needs to cope.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 5 Aug 2010 01:40:05 +0000 (11:40 +1000)]
Test suite: remove thaw/freeze tests.

They test debugging commands that no longer operate as expected.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 4 Aug 2010 06:08:12 +0000 (16:08 +1000)]
Test suite - fix addip test.

The test currently checks that all existing IPs plus the newly added
IP are on the test node after "ctdb addip" is run.  With
DeterministicIPs enabled, if the new IP is "before" other IPs then the
other IPs may be shuffled by the deterministic IPs modulo algorithm.
This will happen on the 1st recovery after the move.  Sometimes this
recovery happens before we get the list of IPs to check and sometimes
after, so the test is racy.
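To see why a new address that sorts "before" the others is disruptive, consider a simplified model. It is an assumption made for illustration, not CTDB's code: the i-th address in sorted order is hosted by node i % N. Inserting one address shifts the index of every address after it, so most of those addresses change node.

```c
/* Simplified, hypothetical model of a "modulo" deterministic
 * allocation: the address at sorted position index is hosted by
 * node (index % num_nodes). */
static int home_node(int index, int num_nodes)
{
	return index % num_nodes;
}

/* How many of n existing addresses change node when one new address
 * is inserted at sorted position insert_at (0 = before all others). */
static int moved_after_insert(int n, int insert_at, int num_nodes)
{
	int moved = 0;

	for (int i = 0; i < n; i++) {
		int new_index = (i < insert_at) ? i : i + 1;

		if (home_node(i, num_nodes) != home_node(new_index, num_nodes))
			moved++;
	}
	return moved;
}
```

In this model, with 6 addresses on 3 nodes, inserting at the front moves all 6, and appending at the end moves none. Whether the test sees the shuffle then depends only on timing.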

The fix is to simply check for the presence of the new IP and not
worry about the others.  This reduces whatever value this test
had... but you can't have everything.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 4 Aug 2010 06:05:39 +0000 (16:05 +1000)]
Merge remote branch 'martins/master'

Martin Schwenke [Wed, 4 Aug 2010 03:16:06 +0000 (13:16 +1000)]
Test suite - try to make addip test more reliable and add some debugging.

This test is failing in some situations.  The "ctdb addip" command
works but the IP never appears in the "ctdb ip" output.

Try restricting the last octet to be between 101 and 199.  At the moment
addresses like 10.0.2.1 are being chosen and these are often the
address of the host machine in autocluster configurations... so might
cause weirdness.

Also add some debugging if checking for the IP address times out.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 3 Aug 2010 01:51:14 +0000 (11:51 +1000)]
Testing: IP allocation simulation - add option to change odds of a failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 3 Aug 2010 01:41:50 +0000 (11:41 +1000)]
Testing: IP allocation simulation - clean up usage message.

Group options better and make the language consistent between options.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 3 Aug 2010 01:37:34 +0000 (11:37 +1000)]
Testing: IP allocation simulation - print maximum number of unhealthy nodes.

This can imply something about imbalance.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 3 Aug 2010 01:36:33 +0000 (11:36 +1000)]
Testing: IP allocation simulation - improve help for options.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 05:46:23 +0000 (15:46 +1000)]
Testing: IP allocation simulation - make usage/failure more obvious.

Tweak the usage message for -g option.

Print an error if no node groups are defined, instead of a curious Python
error.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 05:09:13 +0000 (15:09 +1000)]
Testing: IP allocation simulation - rename an example to node_group_extra.py.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 05:07:56 +0000 (15:07 +1000)]
Testing: IP allocation simulation - rename an example to node_group_simple.py.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 05:06:39 +0000 (15:06 +1000)]
Testing: IP allocation simulation - add general node group example.

This allows node pool configuration to be specified on the
command-line.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 05:01:47 +0000 (15:01 +1000)]
Testing: IP allocation simulation - update options processing in examples.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 04:58:15 +0000 (14:58 +1000)]
Testing: IP allocation simulation - Update README.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 04:24:00 +0000 (14:24 +1000)]
Testing: IP allocation simulation - fix nondeterminism in do_something_random().

The current code makes random choices from unsorted lists.  This
ensures the lists are sorted.

Also, make the code easier to read by doing the random selection from
lists of PNNs rather than lists of Node objects.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 04:20:12 +0000 (14:20 +1000)]
Testing: IP allocation simulation - Tweak options handling and Cluster.diff().

process_args() must now be called by programs importing this module.
Options are put into global variable "options", which can be
referenced using "ctdb_takeover.options".

Can now pass extra option specifications to process_args().

Remove global variable prev and make it a Cluster object variable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 2 Aug 2010 04:16:02 +0000 (14:16 +1000)]
Testing: IP allocation simulation - update copyright message.

There's a lot of new code here, so let's make the copyright message
make sense.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 1 Aug 2010 01:53:28 +0000 (11:53 +1000)]
Testing: IP allocation simulation - add command line option for random seed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 1 Aug 2010 01:41:52 +0000 (11:41 +1000)]
Testing: IP allocation simulation - save some warnings for verbose mode.

We don't need to see warnings about unallocatable IPs unless we're in
verbose mode.  Can now be run with -n (and without -v or -d) to see
just the statistics.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 1 Aug 2010 01:41:02 +0000 (11:41 +1000)]
Testing: IP allocation simulation prints final imbalance in statistics.

This is useful to know.  When things get unbalanced they tend to stay
that way.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 1 Aug 2010 01:39:30 +0000 (11:39 +1000)]
Testing: In IP allocation simulation count total number of events.

This starts at -1 because we always have to do the initial allocation.

No longer print event number for each event by default, only when
verbose is enabled.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sun, 1 Aug 2010 01:37:35 +0000 (11:37 +1000)]
Testing: Add imbalance information to IP allocation simulation.

Implement the imbalance calculations.

Also add command-line option to display imbalance for each step.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Sat, 31 Jul 2010 10:34:45 +0000 (20:34 +1000)]
Merge branch 'master' of git://git.samba.org/sahlberg/ctdb

Martin Schwenke [Fri, 30 Jul 2010 06:45:36 +0000 (16:45 +1000)]
Testing: Add Python IP allocation simulation.

Includes simulation module and example scenarios.  This allows you to
test and perhaps tweak an algorithm that should be the same as the
current CTDB IP reallocation one.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 26 Jul 2010 06:22:59 +0000 (16:22 +1000)]
Optimise 61.nfstickle to write the tickles more efficiently.

Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat.  The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.
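The same accumulate-then-write idea can be transliterated to C as a sketch. The 61.nfstickle script itself does this in awk. The struct, the function and the fixed limits below are hypothetical. Connections are grouped by destination IP in memory, so each IP's file can then be written once instead of being reopened for every socket.

```c
#include <string.h>

#define MAX_IPS 16
#define LINES_LEN 1024

/* All source sockets seen for one destination IP, one per line. */
struct ip_group {
	char ip[64];
	char lines[LINES_LEN];
};

/* Record a connection from src to dst in groups (an array of MAX_IPS
 * entries, ngroups of them in use).  Returns the new number of groups,
 * or -1 if one of the fixed-size limits would be exceeded. */
static int add_connection(struct ip_group *groups, int ngroups,
			  const char *dst, const char *src)
{
	size_t used;
	int i;

	for (i = 0; i < ngroups; i++)
		if (strcmp(groups[i].ip, dst) == 0)
			break;

	used = (i < ngroups) ? strlen(groups[i].lines) : 0;
	if (used + strlen(src) + 2 > LINES_LEN)	/* src, '\n' and NUL */
		return -1;

	if (i == ngroups) {	/* first connection to this IP */
		if (ngroups == MAX_IPS || strlen(dst) >= sizeof(groups[i].ip))
			return -1;
		strcpy(groups[i].ip, dst);
		groups[i].lines[0] = '\0';
		ngroups++;
	}
	strcat(groups[i].lines, src);
	strcat(groups[i].lines, "\n");
	return ngroups;
}
```

Once all input has been processed, each groups[i].lines can be written to the file for groups[i].ip in a single operation.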

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 7 Jun 2010 02:03:25 +0000 (12:03 +1000)]
Test suite: handle extra lines in statistics output.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 7 Jun 2010 02:29:31 +0000 (12:29 +1000)]
Test suite: handle change to disconnected node error message.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 30 Jul 2010 06:45:36 +0000 (16:45 +1000)]
Testing: Add Python IP allocation simulation.

Includes simulation module and example scenarios.  This allows you to
test and perhaps tweak an algorithm that should be the same as the
current CTDB IP reallocation one.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Fri, 30 Jul 2010 06:37:22 +0000 (16:37 +1000)]
Add a code-style document.

Shamelessly sto^H^H^Hborrowed from samba3.

Stefan Metzmacher [Fri, 30 Jul 2010 06:09:40 +0000 (08:09 +0200)]
events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them

metze

Ronnie Sahlberg [Fri, 30 Jul 2010 06:25:40 +0000 (16:25 +1000)]
Merge commit 'rusty/master'

Evan Kinney [Thu, 29 Jul 2010 02:48:46 +0000 (22:48 -0400)]
ctdb: Fixed use of reserved word "private" in typedefs

In include/ctdb.h, ctdb_callback_t and ctdb_rrl_callback_t were
defined with a void *private variable. The variable name was
changed to void *private_data to avoid issues encountered in
the Samba autoconf script.

Evan Kinney <evan.kinney@sas.com>

Martin Schwenke [Mon, 26 Jul 2010 06:22:59 +0000 (16:22 +1000)]
Optimise 61.nfstickle to write the tickles more efficiently.

Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat.  The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 7 Jun 2010 02:03:25 +0000 (12:03 +1000)]
Test suite: handle extra lines in statistics output.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 7 Jun 2010 02:29:31 +0000 (12:29 +1000)]
Test suite: handle change to disconnected node error message.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Stefan Metzmacher [Mon, 12 Jul 2010 12:11:41 +0000 (14:11 +0200)]
config/interface_modify.sh: do the echo before running the script

metze
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Stefan Metzmacher [Mon, 12 Jul 2010 12:05:51 +0000 (14:05 +0200)]
config/interface_modify.sh: before calling a script check if it exists and is executable

For non bash shells $_s_script might end with '/*'.

We do the workaround this way, because it makes sense to check
that a script is executable before trying to execute it.
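A C sketch of the same precaution; the function name is hypothetical. Requiring the path to be an executable regular file also filters out a glob that matched nothing. In that case the shell passes the pattern through literally, and no file with that name exists.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if path names a regular file that we may execute.  An
 * unmatched glob (a path ending in a literal '*') normally does not
 * exist, so it is rejected here instead of being run. */
static bool is_runnable_script(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
		return false;
	return access(path, X_OK) == 0;
}
```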

metze

[ This actually applies to any shell -- Rusty Russell ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 12 Jul 2010 05:41:42 +0000 (15:11 +0930)]
config: wrap iptables in flock to avoid concurrency.

Releaseip events are run in parallel for all the separate
IPs.  This creates a problem for iptables, which isn't reentrant, giving
the strange message:
iptables encountered unknown error "18446744073709551615" while initializing table "filter"

The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.

The simple workaround is to flock-wrap iptables.  Better would be to rework
the code so we didn't need to use iptables in these paths.
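The script-level fix wraps iptables in flock. The equivalent in C is to hold a flock(2) lock on a shared lock file around the non-reentrant operation. This is a sketch: the lock file path and the function names are hypothetical, and the example critical section only increments a counter.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Run fn(arg) while holding an exclusive flock() on lock_path, so
 * callers running in parallel take turns.  Returns fn's result, or -1
 * if the lock file could not be opened or locked. */
static int with_file_lock(const char *lock_path, int (*fn)(void *), void *arg)
{
	int fd, ret;

	fd = open(lock_path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return -1;
	if (flock(fd, LOCK_EX) == -1) {	/* blocks until the lock is ours */
		close(fd);
		return -1;
	}
	ret = fn(arg);
	close(fd);	/* closing the only descriptor also drops the lock */
	return ret;
}

/* Example critical section: bump a counter. */
static int bump_counter(void *arg)
{
	++*(int *)arg;
	return 0;
}
```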

CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 12 Jul 2010 06:38:37 +0000 (16:08 +0930)]
ctdb: fix crash on "ctdb scriptstatus --events=releaseip"

Martin accidentally typed this instead of "ctdb scriptstatus releaseip"
and it crashes.

CQ:S1018859
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 2 Jul 2010 03:21:08 +0000 (13:21 +1000)]
version: generate RPM version from git

This unifies our RPM version handling, based on tags.
1) Tags are of form ctdb-<version>.
2) The first <version> starts with .1.
3) Devel versions end with .0.<patchnum>.<checksum>.devel to reliably
   identify them.

This means that devel versions will correctly supersede releases and earlier
devels, but new releases will correctly supersede older devel RPMs.

Making a new release is as simple as creating a new git tag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Thu, 1 Jul 2010 13:08:49 +0000 (23:08 +1000)]
Report client for queue errors.

We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them.  Add a name for each queue, and print nread.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Thu, 1 Jul 2010 08:33:18 +0000 (18:33 +1000)]
tdb: improve logging

When tdb throws an error, we don't report the name of the tdb; we should.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Thu, 1 Jul 2010 11:46:55 +0000 (21:46 +1000)]
ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock.

We discovered that recent smbd locks the serverid tdb while
holding a lock on another tdb (locking.tdb):
  7: POSIX  ADVISORY  WRITE smbd-2224318 locking.tdb.0 10600 10600
  22: -> POSIX  ADVISORY  READ  smbd-2224318 serverid.tdb.0 26580 26580

The result is a deadlock against the ctdb_freeze code called for
recovery.  We extend the "notify" workaround to this case, too.

BZ:65158
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 22 Jun 2010 13:25:20 +0000 (22:55 +0930)]
speed startup: with --sloppy-start, cut initial election timeout to 1/2 second.

Seconds between ctdbd first log message and node healthy:
BEFORE: 4.03
AFTER: 2.02

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 22 Jun 2010 13:22:34 +0000 (22:52 +0930)]
speed startup: add --sloppy-start.

The extra recovery interval wait was introduced in 821333afb458 but no
explanation was provided in that message.  Nonetheless, if starting
the entire cluster for the first time, it should be safe to skip this.

We use the commandline arg --sloppy-start which should discourage
people from using it outside testing.

Seconds between ctdbd first log message and node healthy:
BEFORE: 16.10
AFTER: 4.03

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 22 Jun 2010 13:20:45 +0000 (22:50 +0930)]
speed startup: run startup immediately after recovery finished.

Seconds between ctdbd first log message and node healthy:
BEFORE: 17.08
AFTER: 16.10

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 22 Jun 2010 13:20:35 +0000 (22:50 +0930)]
speed startup: don't wait a full recovery interval if we've already waited

We currently sleep for one second, whether or not we've already slept.
Change this to sleep for the remainder of the second, if at all.
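A sketch of the change, assuming a monotonic clock and a caller that records when the previous wait began. The names are hypothetical, not ctdbd's. The remaining time is computed in nanoseconds, and the function does not sleep at all once the interval has already passed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t timespec_ns(struct timespec ts)
{
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Nanoseconds still to wait so that interval_ns has elapsed since
 * start; 0 if that much time has already gone by. */
static int64_t remaining_ns(struct timespec start, struct timespec now,
			    int64_t interval_ns)
{
	int64_t elapsed = timespec_ns(now) - timespec_ns(start);

	return elapsed >= interval_ns ? 0 : interval_ns - elapsed;
}

/* Sleep for whatever is left of the interval, if anything. */
static void sleep_remainder(struct timespec start, int64_t interval_ns)
{
	struct timespec now, nap;
	int64_t left;

	clock_gettime(CLOCK_MONOTONIC, &now);
	left = remaining_ns(start, now, interval_ns);
	if (left == 0)
		return;
	nap.tv_sec = (time_t)(left / NSEC_PER_SEC);
	nap.tv_nsec = (long)(left % NSEC_PER_SEC);
	nanosleep(&nap, NULL);
}
```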

Seconds between ctdbd first log message and node healthy:
BEFORE: 18.09
AFTER: 17.08

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 22 Jun 2010 13:20:07 +0000 (22:50 +0930)]
speed startup: immediately run first monitor event after startup.

Once we've done a startup, we need to run a monitor event successfully
to be marked as healthy.  Rather than wait the usual 5 seconds, run it
immediately (which will then reset next_interval to 5 seconds).

Seconds between ctdbd first log message and node healthy:
BEFORE: 23.58
AFTER: 18.09

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 22 Jun 2010 13:20:23 +0000 (22:50 +0930)]
speed startup: alter recovery loop

We do a recovery on startup.  But the code does:
   Sleep for ctdb->tunable.recover_interval.
   Check for recovery.

We want to do it in the other order.  This is best done by extracting
the loop into a separate "main_loop" function.

Seconds between ctdbd first log message and node healthy:
BEFORE: 24.09
AFTER: 23.58

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 21 Jun 2010 06:39:16 +0000 (16:09 +0930)]
libctdb: test: run.sh script

This is a script which starts up a fake ctdbd and runs the libctdb
test suite.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 21 Jun 2010 06:36:00 +0000 (16:06 +0930)]
libctdb: test: add readrecordlock support

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 21 Jun 2010 05:30:46 +0000 (15:00 +0930)]
libctdb: test: add database save and restore

Once we do operations which alter the TDBs, we need to restore them to
pristine state after a failed child dies.

The method used here is a terrible hack: it should at least do a
tdb_lockall() on the database before blatting it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 21 Jun 2010 05:18:54 +0000 (14:48 +0930)]
libctdb: test: --no-failtest

Sometimes you just want to test that the basic test case is sane,
without all the failure paths being tested.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 21 Jun 2010 05:27:11 +0000 (14:57 +0930)]
libctdb: test: improve logging of failure paths

We include the file and line which called the functions, so the printed
failure path now looks like:

[malloc(ctdb.c:144)]:1:S[socket(ctdb.c:168)]:1:S...

The form is:
    [ <function> ( <caller> ) ] : <input line> : <result>

<function> is the function which is called (e.g. malloc).
<caller> is the file and line number which called <function>.
<input line> is the 1-based line number in the input that we had reached.
<result> is 'S' (success) or 'F' (failure).
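To make the format concrete, here is a rough C sketch of a parser for it. `struct fail_step` and `parse_failpath()` are hypothetical names, not ctdb-test's own code, and the sketch accepts only the `[function(caller)]:line:result` form described above; anything else (including a trailing `...`) is rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One step of a failure path, e.g. "[malloc(ctdb.c:144)]:1:S". */
struct fail_step {
	char function[32];   /* failable function, e.g. "malloc" */
	char caller[64];     /* file:line that called it, e.g. "ctdb.c:144" */
	unsigned input_line; /* 1-based line of the test input */
	char result;         /* 'S' (success) or 'F' (failure) */
};

/* Parse up to 'max' steps from a failure-path string.
 * Returns the number of steps parsed, or -1 on malformed input. */
static int parse_failpath(const char *s, struct fail_step *steps, int max)
{
	int n = 0;

	while (*s != '\0') {
		struct fail_step st;
		int used = 0;

		if (n == max)
			return -1;
		/* '[' name '(' caller ')' ']' ':' line ':' result */
		if (sscanf(s, "[%31[^(](%63[^)])]:%u:%c%n", st.function,
			   st.caller, &st.input_line, &st.result, &used) != 4
		    || used == 0)
			return -1;
		if (st.result != 'S' && st.result != 'F')
			return -1;
		steps[n++] = st;
		s += used;
	}
	return n;
}
```

For example, `parse_failpath("[malloc(ctdb.c:144)]:1:S[socket(ctdb.c:168)]:1:F", steps, 4)` yields two steps, the second with result `'F'`.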

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Mon, 21 Jun 2010 05:32:05 +0000 (15:02 +0930)]
libctdb: test: logging enhancement

Make children log through a pipe to the parent, which then spits it out
only if the child has a problem.
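A minimal POSIX sketch of this pattern, assuming one fork per test run. `child_log()`, `run_child_logged()` and the two demo children are hypothetical names, not the ctdb-test implementation. The parent drains the pipe before calling `waitpid()`, so a child that logs a lot cannot block on a full pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int log_fd = -1; /* write end of the pipe, valid in the child */

/* Child-side logging: goes down the pipe instead of to the terminal. */
static void child_log(const char *fmt, ...)
{
	char buf[256];
	va_list ap;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (len > 0) {
		ssize_t w;

		if ((size_t)len >= sizeof(buf))
			len = sizeof(buf) - 1; /* output was truncated */
		w = write(log_fd, buf, (size_t)len);
		(void)w;
	}
}

/* Run fn() in a child.  The parent buffers the child's log and writes it
 * to 'out' only if the child exits non-zero or is killed by a signal.
 * Returns 1 if the log was printed, 0 if discarded, -1 on error. */
static int run_child_logged(int (*fn)(void), FILE *out)
{
	char logbuf[4096], chunk[512];
	size_t used = 0;
	int fds[2], status;
	ssize_t r;
	pid_t pid;

	if (pipe(fds) != 0)
		return -1;
	fflush(out);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		close(fds[0]);
		log_fd = fds[1];
		_exit(fn());
	}
	close(fds[1]);
	/* Drain the pipe before waiting, so a chatty child cannot block. */
	while ((r = read(fds[0], chunk, sizeof(chunk))) > 0) {
		size_t n = (size_t)r;

		if (n > sizeof(logbuf) - used)
			n = sizeof(logbuf) - used; /* sketch: truncate long logs */
		memcpy(logbuf + used, chunk, n);
		used += n;
	}
	close(fds[0]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0; /* child was fine: throw its log away */
	fwrite(logbuf, 1, used, out);
	return 1;
}

/* Two demo children (hypothetical). */
static int quiet_success(void) { child_log("attached db\n"); return 0; }
static int noisy_failure(void) { child_log("malloc failed at step %d\n", 4); return 1; }
```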

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 16 Jul 2010 04:42:40 +0000 (14:12 +0930)]
libctdb: test infrastructure

This introduces 'ctdb-test', a program for testing libctdb.  It takes
commands on standard input (with reduced functionality) or an input file.

It still needs some cleaning up, but you can uncover a bug in libctdb
today simply by running a simple attachdb test:

$ ctdb-test tests/attachdb1.txt

It will report the crash, and the path of successful and failed
operations which led to it:

...
Child signalled 11 on failure path: [malloc]:1:S[socket]:1:S[connect]:1:S[malloc]:1:S[malloc]:1:S[malloc]:1:S[malloc]:4:S[malloc]:4:F

Feed that failure path into ctdb-test using --failpath (under a debugger):

gdb --args ctdb-test tests/attachdb1.txt --failpath='[malloc]:1:S[socket]:1:S[connect]:1:S[malloc]:1:S[malloc]:1:S[malloc]:1:S[malloc]:4:S[malloc]:4:F'

And you hit the exact error.

It is based on the fork-to-fail model of nfsim.  The relevant parts are
on page 154 of the Proceedings of the 2005 Ottawa Linux Symposium, Volume II:
http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf

Or see our presentation of the same material (from slide 21):
http://ozlabs.org/~jk/projects/nfsim/nfsim.sxi
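Roughly, fork-to-fail turns every call that can fail into a fork point: the child takes the failure branch, and the parent waits for the child, then takes the success branch, so one run explores every path. Below is a minimal POSIX sketch assuming just two failure points. `should_fail()`, `finish_path()`, `code_under_test()` and `explore_all_paths()` are hypothetical names; this is not the nfsim or ctdb-test code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char path[32];      /* 'S'/'F' decision taken at each failure point */
static size_t path_len;
static pid_t root_pid;     /* the process that started the exploration */
static int report_fd = -1; /* write end of a pipe: one line per path */

/* Fork at a failure point: child returns true (fail), parent waits for
 * the child to finish its whole path, then returns false (succeed). */
static bool should_fail(void)
{
	pid_t pid;
	int status;

	assert(path_len < sizeof(path));
	fflush(NULL); /* so buffered stdio output is not duplicated */
	pid = fork();
	if (pid < 0) {
		perror("fork");
		exit(1);
	}
	if (pid == 0) {
		path[path_len++] = 'F';
		return true;
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
	    || WEXITSTATUS(status) != 0)
		fprintf(stderr, "failure path %.*sF went wrong\n",
			(int)path_len, path);
	path[path_len++] = 'S';
	return false;
}

/* Report the path taken; every process except the root then exits. */
static void finish_path(void)
{
	char line[sizeof(path) + 1];
	ssize_t w;

	memcpy(line, path, path_len);
	line[path_len] = '\n';
	w = write(report_fd, line, path_len + 1);
	(void)w;
	if (getpid() != root_pid)
		_exit(0);
}

/* Hypothetical code under test with two failable calls. */
static void code_under_test(void)
{
	if (should_fail()) { /* e.g. malloc() fails */
		finish_path();
		return;
	}
	if (should_fail()) { /* e.g. socket() fails */
		finish_path();
		return;
	}
	finish_path();       /* everything succeeded */
}

/* Explore every path; returns a read fd holding one line per path. */
static int explore_all_paths(void)
{
	int fds[2];

	if (pipe(fds) != 0)
		return -1;
	report_fd = fds[1];
	root_pid = getpid();
	path_len = 0;
	code_under_test();
	close(fds[1]);
	return fds[0];
}
```

Because each parent waits for its child, the paths are reported in a fixed order: `F`, then `SF`, then `SS`.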

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Mon, 21 Jun 2010 05:17:34 +0000 (14:47 +0930)]
libctdb: implement synchronous readrecordlock interface.

Because this doesn't use a generic callback, it's not quite as trivial
as the other sync wrappers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 06:05:52 +0000 (15:35 +0930)]
libctdb: implement ctdb_disconnect and ctdb_detachdb

These are important for testing: if any allocations are still
outstanding after calling them, we have leaked memory.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 06:18:48 +0000 (15:48 +0930)]
libctdb: fix io_elem resource leak on realloc failure.

Found by nfsim.

I knew about this, but as we stop when it happens anyway I didn't fix
it.  But it bugs nfsim, so fix it.
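The general bug pattern, sketched in C. `struct buf` and `buf_append()` are hypothetical and are not libctdb's io_elem code. Assigning realloc()'s result straight back to the only pointer loses the old block when realloc() fails, so the sketch assigns to a temporary first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, standing in for an I/O element. */
struct buf {
	char *data;
	size_t len;
};

/* Append n bytes.  Returns 0 on success, -1 on allocation failure.
 * Buggy form: b->data = realloc(b->data, ...) overwrites the only
 * pointer with NULL on failure and leaks the old block. */
static int buf_append(struct buf *b, const void *p, size_t n)
{
	char *tmp;

	if (n == 0)
		return 0;
	tmp = realloc(b->data, b->len + n);
	if (tmp == NULL)
		return -1; /* b->data is untouched and can still be freed */
	memcpy(tmp + b->len, p, n);
	b->data = tmp;
	b->len += n;
	return 0;
}
```

On failure the caller still owns `b->data` and can free it normally.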

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Mon, 21 Jun 2010 05:15:37 +0000 (14:45 +0930)]
libctdb: fix writerecord() to actually write the record.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 05:43:54 +0000 (15:13 +0930)]
libctdb: ctdb_service() never returns < 0

Found by ctdb-test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 05:45:11 +0000 (15:15 +0930)]
libctdb: check ctdb_request_free & ctdb_cancel used appropriately.

Since I made this mistake myself, we should check for it.

We could have one function that does both, but from a user's point of
view they are very different and it's quite possibly a bug if they
think the request is finished/unfinished when it's not.
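One way such a check could look, as a sketch with a hypothetical `struct request` and a `request_free()`/`request_cancel()` pair. These are not libctdb's actual `ctdb_request_free()`/`ctdb_cancel()`. Here misuse is reported on stderr and the call returns -1. What libctdb itself does on misuse is not described in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical request object. */
struct request {
	bool done; /* set once the reply has arrived */
};

/* Free a finished request.  Freeing one still in flight is probably a
 * caller bug (they should cancel it), so report it and leave it alone. */
static int request_free(struct request *req)
{
	if (!req->done) {
		fprintf(stderr, "request_free: request unfinished, cancel it\n");
		return -1;
	}
	free(req);
	return 0;
}

/* Cancel an unfinished request.  Cancelling a finished one suggests the
 * caller has lost track of its state, so report that too. */
static int request_cancel(struct request *req)
{
	if (req->done) {
		fprintf(stderr, "request_cancel: request finished, free it\n");
		return -1;
	}
	/* A real implementation would also remove it from the I/O queue. */
	free(req);
	return 0;
}
```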

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 05:45:27 +0000 (15:15 +0930)]
libctdb: synchronous should be using ctdb_cancel to kill unfinished requests.

Found by ctdb-test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 06:17:23 +0000 (15:47 +0930)]
libctdb: fix uninitialized field usage on ctdb_attach failure path

Found by ctdb-test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Fri, 18 Jun 2010 06:17:14 +0000 (15:47 +0930)]
libctdb: removed unused lock field from struct ctdb_db

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Ronnie Sahlberg [Thu, 10 Jun 2010 04:30:38 +0000 (14:30 +1000)]
Wrap the IDR early, but not too early.

We don't want it to wrap almost immediately, since then basically every
"ctdb ..." command would log the "Reqid wrap" warning.

Ronnie sahlberg [Thu, 10 Jun 2010 03:33:14 +0000 (13:33 +1000)]
Merge commit 'rusty/idtree'

Rusty Russell [Wed, 9 Jun 2010 23:28:55 +0000 (08:58 +0930)]
Delay reusing ids to make protocol more robust

Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.

The result was that we unlocked the wrong record, leading to the
following:

ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716,  0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.
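The general idea is to hand out ids cyclically instead of returning the lowest free one, so that a just-freed id comes back only after every other id has been used. It can be sketched with a tiny fixed-size table. This is a hypothetical illustration, not the idtree change itself: idtree is a radix tree, and `id_get()`/`id_put()` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8 /* tiny, so the wrap-around is easy to see */

/* Hypothetical cyclic id allocator. */
struct id_alloc {
	bool used[MAX_IDS];
	int next; /* where the next search starts */
};

/* Return a free id, searching forward from just after the last id handed
 * out, so a freed id is not reused until the allocator wraps around.
 * Returns -1 if every id is in use. */
static int id_get(struct id_alloc *a)
{
	for (int i = 0; i < MAX_IDS; i++) {
		int id = (a->next + i) % MAX_IDS;

		if (!a->used[id]) {
			a->used[id] = true;
			a->next = (id + 1) % MAX_IDS;
			return id;
		}
	}
	return -1;
}

static void id_put(struct id_alloc *a, int id)
{
	a->used[id] = false;
}
```

After `id_get()` returns 0 and that id is released, the next call returns 1 rather than 0. A lowest-free allocator would immediately hand out 0 again, which is the quick reuse a late reply can trip over.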

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rusty Russell [Wed, 9 Jun 2010 23:25:56 +0000 (08:55 +0930)]
idtree: fix handling of large ids (eg INT_MAX)

Since idtree assigns sequentially, it rarely reaches high numbers.
But such numbers can be forced with idr_get_new_above(), and that
reveals two bugs:
1) Crash in sub_remove() caused by pa array being too short.
2) Shift by more than 32 in _idr_find(), which is undefined, causing
   the "outside the current tree" optimization to misfire and return NULL.
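Bug 2 is the classic C pitfall: shifting an N-bit value by N or more bits is undefined behaviour, so the result cannot be trusted. A guarded version of such a range check might look like this. `id_in_tree()` and its `bits` parameter are hypothetical; this is not idtree's actual `_idr_find()` code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical "is this id inside a tree covering 'bits' bits?" check.
 * id >> bits is undefined when bits >= the width of unsigned int, so
 * that case is handled before shifting. */
static bool id_in_tree(unsigned int id, unsigned int bits)
{
	if (bits >= sizeof(id) * CHAR_BIT)
		return true; /* tree covers every representable id */
	return (id >> bits) == 0;
}
```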

Signed-off-by: Rusty Russell <rusty@rustorp.com.au>

Ronnie Sahlberg [Wed, 9 Jun 2010 06:22:01 +0000 (16:22 +1000)]
fix a debug message

Ronnie Sahlberg [Wed, 9 Jun 2010 06:12:36 +0000 (16:12 +1000)]
idr can timeout and wrap/be reused quite quickly.

If a remote node hangs for an extended period, we might have a DMASTER
request in flight to that node for record A.
Eventually we will reuse the idr, possibly for a DMASTER request to a
different node for a different record B.

If the first node un-hangs and responds while the request for B is
still in flight, we would receive a dmaster reply for the wrong record.

This would cause a record to become perpetually locked: inside the daemon
we would tdb_chainlock(dmaster_reply->pdu->key), but once the migration
completed we would chainunlock idr->state->call->key, which is a
different key.

Add code to verify that, when we receive a dmaster reply packet, its key
exactly matches the key in the state we hold for the in-flight idr.
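A sketch of the comparison, using a hypothetical `struct key` loosely modelled on tdb's `TDB_DATA` (a data pointer plus a size). It is not the actual ctdb_call.c code. Both the length and the bytes must match before the reply is trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key type: pointer plus length. */
struct key {
	const unsigned char *dptr;
	size_t dsize;
};

/* Accept a dmaster reply only if it is for the record this request id
 * was issued for; a reply arriving on a reused id carries another key. */
static bool reply_matches_request(struct key reply_key, struct key state_key)
{
	if (reply_key.dsize != state_key.dsize)
		return false;
	if (reply_key.dsize == 0)
		return true; /* avoid memcmp() on possibly-NULL pointers */
	return memcmp(reply_key.dptr, state_key.dptr, reply_key.dsize) == 0;
}
```

A reply that fails this check would be dropped instead of being used to lock one record and unlock another.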

Ronnie Sahlberg [Wed, 9 Jun 2010 05:12:26 +0000 (15:12 +1000)]
We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus

(A child process might be holding the lock, but the main daemon cannot be.)

Ronnie Sahlberg [Wed, 9 Jun 2010 04:31:05 +0000 (14:31 +1000)]
add extra logging for failed ctdb_ltdb_unlock() for a few more places
it is called from

Ronnie Sahlberg [Wed, 9 Jun 2010 04:17:35 +0000 (14:17 +1000)]
add additional logging when tdb_chainunlock() fails
so we can see where it was called from when it fails