git.samba.org - metze/ctdb/wip.git/log

commit | commitdiff | tree

Martin Schwenke [Wed, 4 Aug 2010 06:05:39 +0000 (16:05 +1000)]

Merge remote branch 'martins/master'

commit | commitdiff | tree

Martin Schwenke [Wed, 4 Aug 2010 03:16:06 +0000 (13:16 +1000)]

Test suite - try to make addip test more reliable and add some debugging.

This test is failing in some situations. The "ctdb addip" command
works but the IP never appears in the "ctdb ip" output.

Try restricting the last octet to be between 101-199. At the moment
addresses like 10.0.2.1 are being chosen and these are often the
address of the host machine in autocluster configurations... so might
cause weirdness.

Also add some debugging if checking for the IP address times out.

Signed-off-by: Martin Schwenke <martin@meltin.net>
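
The octet restriction described above can be sketched as follows. The real test is a shell script, so this Python version is only an illustration; the function name, the "10.0.2" prefix default, and the seeded generator are assumptions, not part of the test suite.

```python
import random

def pick_test_address(prefix="10.0.2", low=101, high=199, rng=random):
    """Return an address under `prefix` whose last octet lies in [low, high].

    Hypothetical helper: keeping the last octet away from low values such as
    .1 avoids colliding with the host machine's own address.
    """
    return f"{prefix}.{rng.randint(low, high)}"

# A seeded generator keeps the sample reproducible.
rng = random.Random(0)
sample = [pick_test_address(rng=rng) for _ in range(5)]
print(sample)
```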

commit | commitdiff | tree

Martin Schwenke [Tue, 3 Aug 2010 01:51:14 +0000 (11:51 +1000)]

Testing: IP allocation simulation - add option to change odds of a failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 3 Aug 2010 01:41:50 +0000 (11:41 +1000)]

Testing: IP allocation simulation - clean up usage message.

Group options better and make the language consistent between options.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 3 Aug 2010 01:37:34 +0000 (11:37 +1000)]

Testing: IP allocation simulation - print maximum number of unhealthy nodes.

This can imply something about imbalance.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 3 Aug 2010 01:36:33 +0000 (11:36 +1000)]

Testing: IP allocation simulation - improve help for options.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 05:46:23 +0000 (15:46 +1000)]

Testing: IP allocation simulation - make usage/failure more obvious.

Tweak the usage message for -g option.

Print an error if no node groups defined, instead of curious Python
error.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 05:09:13 +0000 (15:09 +1000)]

Testing: IP allocation simulation - rename an example to node_group_extra.py.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 05:07:56 +0000 (15:07 +1000)]

Testing: IP allocation simulation - rename an example to node_group_simple.py.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 05:06:39 +0000 (15:06 +1000)]

Testing: IP allocation simulation - add general node group example.

This allows node pool configuration to be specified on the
command-line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 05:01:47 +0000 (15:01 +1000)]

Testing: IP allocation simulation - update options processing in examples.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 04:58:15 +0000 (14:58 +1000)]

Testing: IP allocation simulation - Update README.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 04:24:00 +0000 (14:24 +1000)]

Testing: IP allocation simulation - fix nondeterminism in do_something_random().

The current code makes random choices from unsorted lists. This
ensures the lists are sorted.

Also, make the code easier to read by doing the random selection from
lists of PNNs rather than lists of Node objects.

Signed-off-by: Martin Schwenke <martin@meltin.net>
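
A minimal sketch of why sorting matters for reproducibility: a seeded generator only gives repeatable choices if the candidate list is always in the same order. The function name choose_pnn and the sample data are hypothetical, not taken from the simulation module.

```python
import random

def choose_pnn(pnns, rng):
    """Pick one PNN reproducibly.

    Sorting first means the result depends only on the seed and the set of
    candidates, not on whatever order the container happened to yield them.
    """
    return rng.choice(sorted(pnns))

# Same candidates in different containers/orders, same seed: same choice.
first = choose_pnn({3, 0, 2, 1}, random.Random(42))
second = choose_pnn([1, 2, 0, 3], random.Random(42))
print(first, second)
```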

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 04:20:12 +0000 (14:20 +1000)]

Testing: IP allocation simulation - Tweak options handling and Cluster.diff().

process_args() must now be called by programs importing this module.
Options are put into global variable "options", which can be
referenced using "ctdb_takeover.options".

Can now pass extra option specifications to process_args().

Remove global variable prev and make it a Cluster object variable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 2 Aug 2010 04:16:02 +0000 (14:16 +1000)]

Testing: IP allocation simulation - update copyright message.

There's a lot of new code here, so let's make the copyright message
make sense.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 1 Aug 2010 01:53:28 +0000 (11:53 +1000)]

Testing: IP allocation simulation - add command line option for random seed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 1 Aug 2010 01:41:52 +0000 (11:41 +1000)]

Testing: IP allocation simulation - save some warnings for verbose mode.

We don't need to see warnings about unallocatable IPs unless we're in
verbose mode. Can now be run with -n (and without -v or -d) to see
just the statistics.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 1 Aug 2010 01:41:02 +0000 (11:41 +1000)]

Testing: IP allocation simulation prints final imbalance in statistics.

This is useful to know. When things get unbalanced they tend to stay
that way.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 1 Aug 2010 01:39:30 +0000 (11:39 +1000)]

Testing: In IP allocation simulation count total number of events.

This starts at -1 because we always have to do the initial allocation.

No longer print event number for each event by default, only when
verbose is enabled.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 1 Aug 2010 01:37:35 +0000 (11:37 +1000)]

Testing: Add imbalance information to IP allocation simulation.

Implement the imbalance calculations.

Also add command-line option to display imbalance for each step.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sat, 31 Jul 2010 10:34:45 +0000 (20:34 +1000)]

Merge branch 'master' of git://git.samba.org/sahlberg/ctdb

commit | commitdiff | tree

Martin Schwenke [Fri, 30 Jul 2010 06:45:36 +0000 (16:45 +1000)]

Testing: Add Python IP allocation simulation.

Includes simulation module and example scenarios. This allows you to
test and perhaps tweak an algorithm that should be the same as the
current CTDB IP reallocation one.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 26 Jul 2010 06:22:59 +0000 (16:22 +1000)]

Optimise 61.nfstickle to write the tickles more efficiently.

Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat. The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.

Signed-off-by: Martin Schwenke <martin@meltin.net>
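
The accumulate-then-write idea can be sketched like this. The actual change is implemented in awk inside 61.nfstickle; this Python version only illustrates the pattern, and the function name, the (dest_ip, source_socket) input shape, and the one-file-per-IP layout are assumptions.

```python
from collections import defaultdict
from pathlib import Path

def write_tickles(connections, outdir):
    """Group source sockets by destination IP, then write each file once.

    `connections` is an iterable of (dest_ip, source_socket) pairs
    (a hypothetical input shape standing in for parsed netstat output).
    """
    by_dest = defaultdict(list)
    for dest_ip, source in connections:
        by_dest[dest_ip].append(source)

    # One write per destination IP instead of one reopen-and-append per socket.
    for dest_ip, sources in by_dest.items():
        Path(outdir, dest_ip).write_text("".join(f"{s}\n" for s in sources))
    return dict(by_dest)
```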

commit | commitdiff | tree

Martin Schwenke [Mon, 7 Jun 2010 02:03:25 +0000 (12:03 +1000)]

Test suite: handle extra lines in statistics output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 7 Jun 2010 02:29:31 +0000 (12:29 +1000)]

Test suite: handle change to disconnected node error message.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 30 Jul 2010 06:45:36 +0000 (16:45 +1000)]

Testing: Add Python IP allocation simulation.

Includes simulation module and example scenarios. This allows you to
test and perhaps tweak an algorithm that should be the same as the
current CTDB IP reallocation one.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 30 Jul 2010 06:37:22 +0000 (16:37 +1000)]

Add a code-style document.

Shamelessly sto^H^H^Hborrowed from samba3.

commit | commitdiff | tree

Stefan Metzmacher [Fri, 30 Jul 2010 06:09:40 +0000 (08:09 +0200)]

events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them

metze

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 30 Jul 2010 06:25:40 +0000 (16:25 +1000)]

Merge commit 'rusty/master'

commit | commitdiff | tree

Evan Kinney [Thu, 29 Jul 2010 02:48:46 +0000 (22:48 -0400)]

ctdb: Fixed use of reserved word "private" in typedefs

In include/ctdb.h, ctdb_callback_t and ctdb_rrl_callback_t were
defined with a void *private variable. The variable name was
changed to void *private_data to avoid issues encountered in
the Samba autoconf script.

Evan Kinney <evan.kinney@sas.com>

commit | commitdiff | tree

Martin Schwenke [Mon, 26 Jul 2010 06:22:59 +0000 (16:22 +1000)]

Optimise 61.nfstickle to write the tickles more efficiently.

Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat. The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 7 Jun 2010 02:03:25 +0000 (12:03 +1000)]

Test suite: handle extra lines in statistics output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 7 Jun 2010 02:29:31 +0000 (12:29 +1000)]

Test suite: handle change to disconnected node error message.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Stefan Metzmacher [Mon, 12 Jul 2010 12:11:41 +0000 (14:11 +0200)]

config/interface_modify.sh: do the echo before running the script

metze
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Stefan Metzmacher [Mon, 12 Jul 2010 12:05:51 +0000 (14:05 +0200)]

config/interface_modify.sh: before calling a script check if it exists and is executable

For non bash shells $_s_script might end with '/*'.

We do the workaround this way, because it makes sense to check
that a script is executable, before trying to execute it.

metze

[ This actually applies to any shell -- Rusty Russell ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 12 Jul 2010 05:41:42 +0000 (15:11 +0930)]

config: wrap iptables in flock to avoid concurrency.

When doing a releaseip event, we do them in parallel for all the separate
IPs. This creates a problem for iptables, which isn't reentrant, giving
the strange message:
iptables encountered unknown error "18446744073709551615" while initializing table "filter"

The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.

The simple workaround is to flock-wrap iptables. Better would be to rework
the code so we didn't need to use iptables in these paths.

CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
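
A rough illustration of serialising a non-reentrant command behind a file lock. The commit itself wraps iptables with the flock utility in shell; this Python sketch uses fcntl.flock to show the same idea, and the function name and lock path are assumptions.

```python
import fcntl
import subprocess

def run_locked(argv, lock_path):
    """Run `argv` while holding an exclusive lock on `lock_path`.

    Concurrent callers using the same lock file run one at a time, so the
    wrapped command never executes in parallel with itself.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.run(argv, check=False).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A caller would pass the real command line, for example the iptables invocation, together with a lock file that every event script agrees on.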

commit | commitdiff | tree

Rusty Russell [Mon, 12 Jul 2010 06:38:37 +0000 (16:08 +0930)]

ctdb: fix crash on "ctdb scriptstatus --events=releaseip"

Martin accidentally typed this instead of "ctdb scriptstatus releaseip"
and it crashes.

CQ:S1018859
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 2 Jul 2010 03:21:08 +0000 (13:21 +1000)]

version: generate RPM version from git

This unifies our RPM version handling, based on tags.
1) Tags are of form ctdb-<version>.
2) The first <version> starts with .1.
3) Devel versions end with .0.<patchnum>.<checksum>.devel to reliably
identify them.

This means that devel versions will correctly supersede releases and earlier
devels, but new releases will correctly supersede older devel RPMs.

Making a new release is as simple as creating a new git tag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
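
One way the tag-based scheme might be derived from "git describe" style output, as a sketch only. It assumes the <checksum> is the abbreviated commit hash and <patchnum> is the number of commits since the tag; it does not model rule 2 above, and the function name and regular expression are hypothetical rather than taken from the packaging scripts.

```python
import re

# Matches "ctdb-<version>" optionally followed by "-<patchnum>-g<checksum>",
# which is the shape "git describe" prints for commits past a tag.
DESCRIBE_RE = re.compile(
    r"^ctdb-(?P<version>.+?)(?:-(?P<patchnum>\d+)-g(?P<checksum>[0-9a-f]+))?$"
)

def rpm_version(describe):
    """Map a describe string to a release or devel version string."""
    m = DESCRIBE_RE.match(describe)
    if m is None:
        raise ValueError(f"not a ctdb-<version> tag: {describe!r}")
    version = m.group("version")
    if m.group("patchnum") is None:
        return version  # exactly on a tag: a release
    # Past a tag: devel version ending in .0.<patchnum>.<checksum>.devel
    return f"{version}.0.{m.group('patchnum')}.{m.group('checksum')}.devel"

print(rpm_version("ctdb-1.0.114"))
print(rpm_version("ctdb-1.0.114-5-g1a2b3c4"))
```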

commit | commitdiff | tree

Rusty Russell [Thu, 1 Jul 2010 13:08:49 +0000 (23:08 +1000)]

Report client for queue errors.

We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Thu, 1 Jul 2010 08:33:18 +0000 (18:33 +1000)]

tdb: improve logging

When tdb throws an error, we didn't report the name of the tdb; we should.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Thu, 1 Jul 2010 11:46:55 +0000 (21:46 +1000)]

ctdb_freeze: extend db priority hack to cover serverid.tdb deadlock.

We discovered that recent smbd locks the serverid tdb while
holding a lock on another tdb (locking.tdb):
  7: POSIX  ADVISORY  WRITE smbd-2224318 locking.tdb.0 10600 10600
  22: -> POSIX  ADVISORY  READ  smbd-2224318 serverid.tdb.0 26580 26580

The result is a deadlock against the ctdb_freeze code called for
recovery.  We extend the "notify" workaround to this case, too.

BZ:65158
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 22 Jun 2010 13:25:20 +0000 (22:55 +0930)]

speed startup: with --sloppy-start, cut initial election timeout to 1/2 second.

Seconds between ctdbd first log message and node healthy:
BEFORE: 4.03
AFTER: 2.02

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 22 Jun 2010 13:22:34 +0000 (22:52 +0930)]

speed startup: add --sloppy-start.

The extra recovery interval wait was introduced in 821333afb458 but no
explanation was provided in that message. Nonetheless, if starting
the entire cluster for the first time, it should be safe to skip this.

We use the commandline arg --sloppy-start which should discourage
people from using it outside testing.

Seconds between ctdbd first log message and node healthy:
BEFORE: 16.10
AFTER: 4.03

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 22 Jun 2010 13:20:45 +0000 (22:50 +0930)]

speed startup: run startup immediately after recovery finished.

Seconds between ctdbd first log message and node healthy:
BEFORE: 17.08
AFTER: 16.10

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 22 Jun 2010 13:20:35 +0000 (22:50 +0930)]

speed startup: don't wait a full recovery interval if we've already waited

We currently sleep for one second, whether or not we've already slept.
Change this to sleep for the remainder of the second, if at all.

Seconds between ctdbd first log message and node healthy:
BEFORE: 18.09
AFTER: 17.08

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
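
The "sleep for the remainder" rule amounts to a small calculation, sketched here with a hypothetical helper name. The daemon itself is written in C and uses its own timer machinery, so this only shows the arithmetic.

```python
def remaining_wait(interval, elapsed):
    """Seconds still to sleep so that the total wait is at least `interval`.

    Returns 0.0 if we have already waited the full interval or longer.
    """
    return max(0.0, interval - elapsed)

print(remaining_wait(1.0, 0.25))  # part of the second already spent
print(remaining_wait(1.0, 3.0))   # already waited longer than the interval
```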

commit | commitdiff | tree

Rusty Russell [Tue, 22 Jun 2010 13:20:07 +0000 (22:50 +0930)]

speed startup: immediately run first monitor event after startup.

Once we've done a startup, we need to run a monitor event successfully
to be marked as healthy. Rather than wait the usual 5 seconds, run it
immediately (which will then reset next_interval to 5 seconds).

Seconds between ctdbd first log message and node healthy:
BEFORE: 23.58
AFTER: 18.09

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 22 Jun 2010 13:20:23 +0000 (22:50 +0930)]

speed startup: alter recovery loop

We do a recovery on startup.  But the code does:
   Sleep for ctdb->tunable.recover_interval.
   Check for recovery.

We want to do it in the other order.  This is best done by extracting
the loop into a separate "main_loop" function.

Seconds between ctdbd first log message and node healthy:
BEFORE: 24.09
AFTER: 23.58

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 06:39:16 +0000 (16:09 +0930)]

libctdb: test: run.sh script

This is a script which starts up a fake ctdbd and runs the libctdb
test suite.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 06:36:00 +0000 (16:06 +0930)]

libctdb: test: add readrecordlock support

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 05:30:46 +0000 (15:00 +0930)]

libctdb: test: add database save and restore

Once we do operations which alter the TDBs, we need to restore them to
pristine state after a failed child dies.

The method used here is a terrible hack: it should at least do a
tdb_lockall() on the database before blatting it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 05:18:54 +0000 (14:48 +0930)]

libctdb: test: --no-failtest

Sometimes you just want to test that the basic test case is sane,
without all the failure paths being tested.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 05:27:11 +0000 (14:57 +0930)]

libctdb: test: improve logging of failure paths

We include the file and line which called the functions, so the printed
failure path now looks like:

[malloc(ctdb.c:144)]:1:S[socket(ctdb.c:168)]:1:S...

The form is:
[ <function> ( <caller> ) ] : <input line> : <result>

<function> is the function which is called (eg. malloc).
<caller> is the file and line number which called <function>.
<input line> is the 1-based line number in the input which we were up to.
<result> is 'S' (success) or 'F' (failure).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
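
To make the format above concrete, here is a small parser sketch. It is not part of ctdb-test; the Step type, the regular expression, and the function name are assumptions. The caller part is treated as optional so that paths without file and line information, such as "[malloc]:1:S", also parse.

```python
import re
from typing import NamedTuple, Optional

class Step(NamedTuple):
    function: str           # e.g. "malloc"
    caller: Optional[str]   # e.g. "ctdb.c:144", or None if absent
    input_line: int         # 1-based line number in the test input
    succeeded: bool         # True for 'S', False for 'F'

STEP_RE = re.compile(r"\[(\w+)(?:\(([^)]*)\))?\]:(\d+):([SF])")

def parse_failpath(path):
    """Split a failure path string into a list of Step records."""
    steps, pos = [], 0
    while pos < len(path):
        m = STEP_RE.match(path, pos)
        if m is None:
            raise ValueError(f"malformed failure path at offset {pos}")
        function, caller, line, result = m.groups()
        steps.append(Step(function, caller, int(line), result == "S"))
        pos = m.end()
    return steps

steps = parse_failpath("[malloc(ctdb.c:144)]:1:S[socket(ctdb.c:168)]:1:F")
print(steps)
```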

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 05:32:05 +0000 (15:02 +0930)]

libctdb: test: logging enhancement

Make children log through a pipe to the parent, which then spits it out
only if the child has a problem.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 16 Jul 2010 04:42:40 +0000 (14:12 +0930)]

libctdb: test infrastructure

This introduces 'ctdb-test', a program for testing libctdb. It takes
commands on standard input (with reduced functionality) or an input file.

It still needs some cleaning up, but you can uncover a bug in libctdb
today simply by running a simple attachdb test:

$ ctdb-test tests/attachdb1.txt

It will print out a crash, and the path of successful and failed
operations which lead to it:

...
Child signalled 11 on failure path: [malloc]:1:S[socket]:1:S[connect]:1:S[malloc]:1:S[malloc]:1:S[malloc]:1:S[malloc]:4:S[malloc]:4:F

Feed that failure path into ctdb-test using --failpath (under a debugger):

gdb --args ctdb-test tests/attachdb1.txt --failpath=[malloc]:1:S[socket]:1:S[connect]:1:S[malloc]:1:S[malloc]:1:S[malloc]:1:S[malloc]:4:S[malloc]:4:F

And you hit the exact error.

It is based on the fork-to-fail model of nfsim. The relevant parts are
from page 154 of the proceedings of 2005 Ottawa Linux Symposium Volume II:
http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf

Or our presentation of same (from slide 21):
http://ozlabs.org/~jk/projects/nfsim/nfsim.sxi

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 05:17:34 +0000 (14:47 +0930)]

libctdb: implement synchronous readrecordlock interface.

Because this doesn't use a generic callback, it's not quite as trivial
as the other sync wrappers.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 06:05:52 +0000 (15:35 +0930)]

libctdb: implement ctdb_disconnect and ctdb_detachdb

These are important for testing, since we can easily tell if we
leak memory if there are outstanding allocations after calling
these.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 06:18:48 +0000 (15:48 +0930)]

libctdb: fix io_elem resource leak on realloc failure.

Found by nfsim.

I knew about this, but as we stop when it happens anyway I didn't fix
it. But it bugs nfsim, so fix it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Mon, 21 Jun 2010 05:15:37 +0000 (14:45 +0930)]

libctdb: fix writerecord() to actually write the record.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 05:43:54 +0000 (15:13 +0930)]

libctdb: ctdb_service() never returns < 0

Found by ctdb-test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 05:45:11 +0000 (15:15 +0930)]

libctdb: check ctdb_request_free & ctdb_cancel used appropriately.

Since I made this mistake myself, we should check for it.

We could have one function that does both, but from a user's point of
view they are very different and it's quite possibly a bug if they
think the request is finished/unfinished when it's not.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 05:45:27 +0000 (15:15 +0930)]

libctdb: synchronous should be using ctdb_cancel to kill unfinished requests.

Found by ctdb-test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 06:17:23 +0000 (15:47 +0930)]

libctdb: fix uninitialized field usage on ctdb_attach failure path

Found by ctdb-test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 18 Jun 2010 06:17:14 +0000 (15:47 +0930)]

libctdb: removed unused lock field from struct ctdb_db

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Ronnie Sahlberg [Thu, 10 Jun 2010 04:30:38 +0000 (14:30 +1000)]

Wrap the IDR early, but not too early.

We don't want it to wrap almost immediately so that basically all "ctdb ..."
commands log the "Reqid wrap" warning.

commit | commitdiff | tree

Ronnie sahlberg [Thu, 10 Jun 2010 03:33:14 +0000 (13:33 +1000)]

Merge commit 'rusty/idtree'

commit | commitdiff | tree

Rusty Russell [Wed, 9 Jun 2010 23:28:55 +0000 (08:58 +0930)]

Delay reusing ids to make protocol more robust

Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.

The result was that we unlocked the wrong record, leading to the
following:

ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found

This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
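
The general idea of delaying id reuse can be sketched with a toy allocator that always hands out the next id after the last one issued, wrapping only when it reaches the maximum. This does not model the idtree code used by ctdbd; the class and method names are hypothetical.

```python
class ReqIdAllocator:
    """Toy request-id allocator that reuses a freed id as late as possible."""

    def __init__(self, max_id):
        self.max_id = max_id
        self.next_id = 0     # where the next search starts
        self.in_use = {}     # reqid -> caller state

    def allocate(self, state):
        # Scan forward from the last issued id, wrapping at max_id, so an id
        # released a moment ago is not handed out again until a full cycle.
        for _ in range(self.max_id + 1):
            candidate = self.next_id
            self.next_id = (self.next_id + 1) % (self.max_id + 1)
            if candidate not in self.in_use:
                self.in_use[candidate] = state
                return candidate
        raise RuntimeError("no free request ids")

    def release(self, reqid):
        del self.in_use[reqid]

alloc = ReqIdAllocator(max_id=3)
first = alloc.allocate("request A")
alloc.release(first)
second = alloc.allocate("request B")   # does not reuse the id just released
print(first, second)
```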

commit | commitdiff | tree

Rusty Russell [Wed, 9 Jun 2010 23:25:56 +0000 (08:55 +0930)]

idtree: fix handling of large ids (eg INT_MAX)

Since idtree assigns sequentially, it rarely reaches high numbers.
But such numbers can be forced with idr_get_new_above(), and that
reveals two bugs:
1) Crash in sub_remove() caused by pa array being too short.
2) Shift by more than 32 in _idr_find(), which is undefined, causing
the "outside the current tree" optimization to misfire and return NULL.

Signed-off-by: Rusty Russell <rusty@rustorp.com.au>

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 06:22:01 +0000 (16:22 +1000)]

fix a debug message

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 06:12:36 +0000 (16:12 +1000)]

idr can timeout and wrap/be reused quite quickly.

If a noremote node hangs for an extended period, it is possible
that we might have a DMASTER request in flight for record A to that node.
Eventually we will reuse the idr, and may reuse it for a DMASTER request to a different node for a different record B.

If, while the request for B is in flight, the first node un-hangs and responds back,
we would receive a dmaster reply for the wrong record.

This would cause a record to become perpetually locked, since inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key) but once the migration would complete we would chainunlock idr->state->call->key

Add code to verify that when we receive a dmaster reply packet, its key exactly matches the key held in the state variable for the idr in flight.
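
The key check described in this commit can be illustrated with a toy table of in-flight requests. The daemon code is C and works on packet structures; the dictionary layout and function name here are assumptions made for the sketch.

```python
def handle_dmaster_reply(in_flight, reqid, reply_key):
    """Accept a reply only if it matches the request currently using `reqid`.

    `in_flight` maps reqid -> {"key": bytes, ...} (hypothetical layout).
    Returns the completed request state, or None if the reply is ignored.
    """
    state = in_flight.get(reqid)
    if state is None:
        return None              # no request with this id is outstanding
    if state["key"] != reply_key:
        return None              # stale reply for an earlier user of this id
    return in_flight.pop(reqid)

# Request id 43 was reused for record B; a late reply for record A arrives.
in_flight = {43: {"key": b"record-B", "node": 2}}
stale = handle_dmaster_reply(in_flight, 43, b"record-A")
fresh = handle_dmaster_reply(in_flight, 43, b"record-B")
print(stale, fresh)
```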

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 05:12:26 +0000 (15:12 +1000)]

We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus

( a child process might be holding the lock, but not the main daemon)

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 04:31:05 +0000 (14:31 +1000)]

add extra logging for failed ctdb_ltdb_unlock() for a few more places
it is called from

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 04:17:35 +0000 (14:17 +1000)]

add additional logging when tdb_chainunlock() fails
so we can see where it was called from when it fails

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 03:54:10 +0000 (13:54 +1000)]

print the db name when a chainunlock fails too

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 9 Jun 2010 03:52:22 +0000 (13:52 +1000)]

when tdb_chainunlock() fails, print the tdb error that occurred

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 8 Jun 2010 23:17:35 +0000 (09:17 +1000)]

Some "ctdb ..." commands can be run without having the main daemon running.

In that case, when the main daemon is not running
the ctdb context will be initialized to NULL, since we can not connect.

Move the calls to read the ctdb socketname and connecting via libctdb to
only happen when we are executing a "ctdb ..." command that requires that we talk to the actual daemon.
Otherwise we will get an ugly SEGV for the "ctdb ..." commandline tool
when trying to run a command that is supposed to work also when the daemon is down.

commit | commitdiff | tree

Rusty Russell [Tue, 8 Jun 2010 08:39:42 +0000 (18:09 +0930)]

libctdb: connect TDB logging to our logging

A simple connector function, made a bit more complex because TDB adds
a '\n' and we don't.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 8 Jun 2010 08:40:36 +0000 (18:10 +0930)]

libctdb: always check header hasn't changed on local tdb

The code on which this is based could alter the header: a normal client
can't. If we use this differently later we can change this. For the
moment it's a nice extra check.

We optimize out the record write altogether when the record hasn't
changed, rather than just suppressing the seqnum update.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 8 Jun 2010 07:41:40 +0000 (17:11 +0930)]

libctdb: more bool conversion, and accompany lock by ctdb_db in API

I missed some int->bool conversions previously, particularly the
return of ctdb_writerecord().

By always handing functions ctdb_connection or ctdb_db, we keep it
consistent with the rest of the API and can do extra lock consistency
checks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 8 Jun 2010 07:23:17 +0000 (16:53 +0930)]

libctdb: clarify logging levels

Now we have more messages, it seems to make sense to document their usage
and make them consistent.

In particular, LOG_CRIT for internal libctdb problems, LOG_ALERT for
API misuse.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Tue, 8 Jun 2010 07:22:23 +0000 (16:52 +0930)]

libctdb: use magic to detect free/invalid locks

Rather than using a binary, we use a magic value for locking. We also
split out the "don't have the lock yet" from the "do have the lock"
paths for clarity and extra checking.

This should detect a superset of the previous case, even if they free
(and reuse) the lock memory.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Ronnie Sahlberg [Tue, 8 Jun 2010 02:09:19 +0000 (12:09 +1000)]

Additional log messages when tdb databases can no longer be chainlocked or chainunlocked

BZ64688

commit | commitdiff | tree

Ronnie Sahlberg [Sat, 5 Jun 2010 05:43:01 +0000 (15:43 +1000)]

In ctdb_writerecord()
Verify that the lock is still held and refuse the write otherwise.

We have to guarantee that we don't write to an unlocked record.

If we write to a record after it has been released, the record may have
already migrated off the node, in which case we get a DMASTER split brain for this record. (These application bugs are incredibly hard to track down)

commit | commitdiff | tree

Ronnie Sahlberg [Sat, 5 Jun 2010 05:38:11 +0000 (15:38 +1000)]

Split ctdb_release_lock() into a function to release the lock and another function to free the data structures.
This allows us to keep the datastructure valid after the lock has been released by the application and we can trap and warn when the application is accessing the lock after it has been released. I.e. application bugs.

commit | commitdiff | tree

Ronnie Sahlberg [Sat, 5 Jun 2010 04:38:01 +0000 (14:38 +1000)]

update "ctdb pnn" to use the new return value for _recv() where
bool false means failure and true means success.

commit | commitdiff | tree

Ronnie Sahlberg [Sat, 5 Jun 2010 04:27:46 +0000 (14:27 +1000)]

Must initialize ctdb->locks or else bad things happen

commit | commitdiff | tree

Ronnie Sahlberg [Sat, 5 Jun 2010 04:21:42 +0000 (14:21 +1000)]

Update the ctdb tool to use the new signature for ctdb_connect()

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 11:00:08 +0000 (20:30 +0930)]

libctdb: documentation

Full documentation for all the functions.

This looks longer than it is, because it sorts them into async and
sync parts, and also renames some formal parameters.

Added TODO to libctdb directory to track our plans.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 10:52:03 +0000 (20:22 +0930)]

libctdb: use values from ctdb_protocol.h, don't re-declare

We're best off including ctdb_protocol.h to get these, even if we
document the important ones in ctdb.h.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 10:49:25 +0000 (20:19 +0930)]

libctdb: use bool in API

Return bool instead of -1/0; that's what the young kids are doing
these days!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 10:11:42 +0000 (19:41 +0930)]

libctdb: track lock for each ctdb_db, complain if they hold too long.

In particular, this stops them grabbing two (with wrappers so we can
enhance this logic once we support threads), and warns them if they
re-enter ctdb_service() holding a lock (you are not supposed to block!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 10:57:06 +0000 (20:27 +0930)]

patch libctdb-use-logging.patch

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 10:57:03 +0000 (20:27 +0930)]

libctdb: add logging infrastructure

This is based on Ronnie's work, merged with mine. That means
errors are all my fault.

Differences from Ronnie's:
1) use syslog's LOG_ levels directly.
2) typesafe arg to log function, and use it (eg stderr) in helper function.
3) store fn in ctdb context, and expose ctdb_log_level directly thru API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 07:24:08 +0000 (16:54 +0930)]

libctdb: add ctdb arg to more functions.

This is going to help for logging, since we want it there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 4 Jun 2010 04:47:06 +0000 (14:47 +1000)]

Readrecordlock changes:

Make the use of ctdb_release_lock() mandatory from the callback.

Split ctdb_release_lock() in two, release the tdb lock in the
ctdb_release_lock() function and move the freeing of the lock structure to ctdb_free_lock() which is private to libctdb.

When the callback returns, verify that the callback has actually released the lock and warn (FIXME) if not.

Update ctdb_writerecord to warn and fail (FIXME) if writing while the lock is not held.

commit | commitdiff | tree

Ronnie Sahlberg [Fri, 4 Jun 2010 04:20:17 +0000 (14:20 +1000)]

remove the global rrl_cb_called from the libctdb example
and pass it through the callback via private_data.

add a comment that the callback may sometimes have already been invoked
when the ctdb_readrecordlock_async() call returns
and that the application can use *private_data IF the application
needs to know if the callback has already triggered or not.

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 04:03:08 +0000 (13:33 +0930)]

libctdb: change callback for ctdb_readrecordlock.

After discussion with Ronnie, we decided to revisit this interface. We use
the name ctdb_readrecordlock_async, as it is *not* always a send, and we
use a specific callback to avoid the "fake request" creation on the fast
path.

The request itself is never exposed: this means it can't be cancelled,
but we can revisit that later if need be.

This makes both use and implementation simpler.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Rusty Russell [Fri, 4 Jun 2010 04:04:06 +0000 (13:34 +0930)]

libctdb: fix wrong argument being handed to callback on attachdb fail

When attachdb failed, we were handing the db, not the user-supplied
arg to the callback.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 2 Jun 2010 07:06:14 +0000 (17:06 +1000)]

When we say "current time of statistics" in the "ctdb statistics" output,
print the current time and not the start time

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 2 Jun 2010 06:49:05 +0000 (16:49 +1000)]

ctdb_req_control contains 4 padding bytes. Create an explicit pad variable here and set it to 0 when creating a control to keep valgrind happy.

PDUs are padded to 8 byte boundary. If padding is used, memset it to 0
to keep valgrind happy.
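
The padding rule can be shown with a short sketch: round the length up to the next multiple of 8 and fill the gap with zero bytes so that no uninitialised memory is sent. The helper name is hypothetical; the real code operates on C structures.

```python
def pad_pdu(payload, alignment=8):
    """Return `payload` extended with zero bytes to a multiple of `alignment`."""
    padding = -len(payload) % alignment   # 0 when already aligned
    return payload + b"\x00" * padding

print(len(pad_pdu(b"abcde")))     # 5 bytes of data plus 3 zero bytes
print(len(pad_pdu(b"12345678")))  # already aligned, unchanged
```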

commit | commitdiff | tree

Ronnie Sahlberg [Wed, 2 Jun 2010 05:13:32 +0000 (15:13 +1000)]

Add the offsetof macro to libctdb

Change all calls to new_ctdb_request() to use the offsetof macro to calculate the correct size (instead of allocating one byte too many and hoping the alignment padding saves us).
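
The difference between sizing a request with offsetof and sizing it with sizeof can be illustrated from Python using ctypes. The Request structure below is hypothetical and only mimics the common C idiom of a header followed by a one-byte placeholder for trailing data; it is not the layout of any libctdb structure.

```python
import ctypes

class Request(ctypes.Structure):
    # Hypothetical header followed by a one-byte placeholder for the payload.
    _fields_ = [
        ("length", ctypes.c_uint32),
        ("opcode", ctypes.c_uint32),
        ("data", ctypes.c_uint8 * 1),
    ]

# Equivalent of offsetof(struct Request, data): where the payload starts.
header_size = Request.data.offset
# sizeof() also counts the placeholder byte and any trailing padding.
padded_size = ctypes.sizeof(Request)

def request_size(datalen):
    """Exact number of bytes needed for a header plus `datalen` payload bytes."""
    return header_size + datalen

print(header_size, padded_size, request_size(16))
```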
