ctdb.git
11 years agodoc: Fix path string of /etc/sysconfig/ctdb file
Amitay Isaacs [Fri, 17 Aug 2012 03:06:12 +0000 (13:06 +1000)]
doc: Fix path string of /etc/sysconfig/ctdb file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agorecoverd: All inactive nodes should yield recovery master role
Martin Schwenke [Fri, 6 Jul 2012 10:43:46 +0000 (20:43 +1000)]
recoverd: All inactive nodes should yield recovery master role

Not just stopped nodes.  In reality, this means that banned nodes will
also yield, since nodes in the other inactive states won't be running
a daemon.

This seems sensible since if another node notices that an inactive
node is the recovery master then it will force an election anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agorecoverd: An inactive node should not force recovery master elections
Martin Schwenke [Fri, 6 Jul 2012 10:36:48 +0000 (20:36 +1000)]
recoverd: An inactive node should not force recovery master elections

An inactive node can't become the recovery master.  So if an inactive
node notices that the recovery master is inactive, it shouldn't force
an election for recovery master and nominate itself as a candidate.
This can cause the recovery master to flip-flop between nodes when all
nodes are inactive.

If there is actually an active node then it will trigger the election.

This is fairly cosmetic but is a step along the way towards ironing
out weirdness when all nodes are stopped.

Also, fix a related comment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agorecoverd: main_loop() should not verify local IPs if node is stopped
Martin Schwenke [Tue, 3 Jul 2012 00:30:29 +0000 (10:30 +1000)]
recoverd: main_loop() should not verify local IPs if node is stopped

Doing these checks is pointless and potentially causes unnecessary log
messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agorecoverd: verify_local_ip_allocation() should dup ifaces before early return
Martin Schwenke [Tue, 3 Jul 2012 00:15:25 +0000 (10:15 +1000)]
recoverd: verify_local_ip_allocation() should dup ifaces before early return

If CTDB starts in STOPPED state then it thinks it is in the middle of
a recovery.  rec->ifaces is also NULL and an early exit further down
(that checks to see if a recovery is in process) means that it stays
that way.

However, each time this function is entered the need for a takeover
run is re-flagged.  The takeover run never happens due to the
early exit, causing a couple of unneeded messages to be logged each
time.

This is avoided by moving the code that sets rec->ifaces so that it is
executed earlier and, in this case, in the middle of a recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agorecoverd: Update a log message that has bit-rotted
Martin Schwenke [Mon, 2 Jul 2012 07:26:04 +0000 (17:26 +1000)]
recoverd: Update a log message that has bit-rotted

This message used to be correct because the ipreallocated event only
handled updating the NAT gateway.  However, that has changed so the
message needs to be updated.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agorecoverd: Fix bogus info in message about changed flags
Martin Schwenke [Fri, 22 Jun 2012 04:01:02 +0000 (14:01 +1000)]
recoverd: Fix bogus info in message about changed flags

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests/eventscripts: Extra cases for policy routing missing config test
Martin Schwenke [Mon, 30 Jul 2012 02:51:43 +0000 (12:51 +1000)]
tests/eventscripts: Extra cases for policy routing missing config test

Test the startup and monitor events too.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: 13.per_ip_routing should always fail if config is missing
Martin Schwenke [Mon, 30 Jul 2012 02:51:12 +0000 (12:51 +1000)]
Eventscripts: 13.per_ip_routing should always fail if config is missing

Currently, if the configuration file is specified by
$CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the
absent) monitor event "succeeds", so the state of a node will
flip-flop.

Instead of this, if the configuration file is missing then fail early
on for all events.
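
The "fail early for all events" idea can be sketched roughly as below. This is only an illustration, not the actual 13.per_ip_routing code: the helper name check_config and the error text are hypothetical.

```shell
# check_config FILE: succeed only if FILE exists (hypothetical helper).
check_config ()
{
    if [ ! -f "$1" ] ; then
        echo "error: configuration file \"$1\" not found" >&2
        return 1
    fi
}

# In an eventscript this check would sit before the event dispatch, so
# that startup, monitor, takeip, etc. all fail the same way, e.g.:
#   check_config "$CTDB_PER_IP_ROUTING_CONF" || exit 1
```

Because the check runs before any event-specific code, no event can "succeed" by default when the file is absent, which is what avoids the flip-flop.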

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoRevert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"
Martin Schwenke [Mon, 30 Jul 2012 01:50:53 +0000 (11:50 +1000)]
Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"

When the configuration file is missing this causes the node to
flip-flop between unhealthy (when takeip fails) and healthy (no monitor
event here).

Will reimplement this properly.

This reverts commit 351ca413eec460330571ca8b01ad269728fe15df.

11 years agoctdb tool: recmaster command might as well be auto-all
Martin Schwenke [Fri, 6 Jul 2012 10:35:23 +0000 (20:35 +1000)]
ctdb tool: recmaster command might as well be auto-all

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agodoc: Document the new onnode -P option
Martin Schwenke [Tue, 17 Jul 2012 06:52:04 +0000 (16:52 +1000)]
doc: Document the new onnode -P option

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotools/onnode: Add -P option to push files to given nodes
Martin Schwenke [Tue, 17 Jul 2012 06:45:55 +0000 (16:45 +1000)]
tools/onnode: Add -P option to push files to given nodes

A list of files is given rather than a command.  These files are
pushed to the specified nodes.

Quoting is fragile/broken so filenames with spaces won't work - you
win some, you lose some.  :-)

All of the other onnode options should work together with this option.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Clean up 11.routing
Martin Schwenke [Tue, 17 Jul 2012 10:13:45 +0000 (20:13 +1000)]
Eventscripts: Clean up 11.routing

The loops can all be done without cat or grep.

The pair of loops in updateip is combined into a single loop.
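
A loop of this kind can usually read and filter a file in the shell itself. The sketch below assumes a file of "interface destination gateway" lines; that format and the function name routes_for_iface are assumptions for illustration, not taken from 11.routing.

```shell
# routes_for_iface FILE IFACE: print "dest gw" for each line of FILE
# whose first field is IFACE.  No cat or grep processes are needed.
routes_for_iface ()
{
    _file="$1"
    _want="$2"
    while read -r _iface _dest _gw ; do
        [ "$_iface" = "$_want" ] || continue
        echo "$_dest $_gw"
    done <"$_file"
}

# Example with made-up routes:
_tmp=$(mktemp)
printf '%s\n' 'eth0 10.1.0.0/16 10.0.0.1' 'eth1 10.2.0.0/16 10.0.1.1' >"$_tmp"
routes_for_iface "$_tmp" eth1    # prints "10.2.0.0/16 10.0.1.1"
rm -f "$_tmp"
```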

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Log a meaningful message if the nodes file/list is empty
Martin Schwenke [Tue, 3 Jul 2012 21:21:01 +0000 (07:21 +1000)]
ctdbd: Log a meaningful message if the nodes file/list is empty

Right now the message says it can't bind to any of the
addresses... even when there aren't any!

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Remove the word "Forced" from message about running eventscripts
Martin Schwenke [Mon, 2 Jul 2012 07:15:42 +0000 (17:15 +1000)]
ctdbd: Remove the word "Forced" from message about running eventscripts

The eventscripts are run after a takeover run and in this case they're
not forced.  The message seems to imply that someone has run "ctdb
eventscript" when that is not necessarily the case.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoctdbd: Fix ctdb_control_release_ip() on local daemons
Martin Schwenke [Mon, 2 Jul 2012 04:09:32 +0000 (14:09 +1000)]
ctdbd: Fix ctdb_control_release_ip() on local daemons

When running on local daemons no IPs are actually assigned to
interfaces.  Commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e broke
ctdb_control_release_ip() for local daemons because it asks the system
which interface the given IP is on, instead of the old behaviour of
trusting CTDB's internal records.

For local daemons (i.e. !ctdb->do_checkpublicip) revert to the old
behaviour of looking up the interface internally.  This is good
enough, given that the tests don't tend to misconfigure the addresses.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoInitscript: clean up drop_all_public_ips()
Martin Schwenke [Tue, 17 Jul 2012 05:45:45 +0000 (15:45 +1000)]
Initscript: clean up drop_all_public_ips()

This makes the case implicit where $CTDB_PUBLIC_ADDRESSES is unset.
This is OK because that's not an interesting code path.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests/tool: Run ctdb_tool_* under $VALGRIND
Martin Schwenke [Fri, 20 Jul 2012 07:00:12 +0000 (17:00 +1000)]
tests/tool: Run ctdb_tool_* under $VALGRIND

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests/eventscripts: Rewrite the testparm stub
Martin Schwenke [Tue, 3 Jul 2012 21:29:18 +0000 (07:29 +1000)]
tests/eventscripts: Rewrite the testparm stub

It currently needs the real testparm command installed even though it
only uses limited features.  It is easy enough to fake up the
functionality that 50.samba uses.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests/complex: Fix broken ctdb_test_check_real_cluster()
Martin Schwenke [Tue, 3 Jul 2012 03:05:58 +0000 (13:05 +1000)]
tests/complex: Fix broken ctdb_test_check_real_cluster()

It doesn't set $h at all...

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests/simple: ctdb stop/continue tests weren't actually checking IPs
Martin Schwenke [Mon, 2 Jul 2012 04:18:51 +0000 (14:18 +1000)]
tests/simple: ctdb stop/continue tests weren't actually checking IPs

The correct variable is $test_node_ips, not $ips.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests: select_test_node_and_ips() should try to avoid failing
Martin Schwenke [Mon, 2 Jul 2012 04:06:35 +0000 (14:06 +1000)]
tests: select_test_node_and_ips() should try to avoid failing

Sometimes "ctdb sync" doesn't do its job, so we end up with unassigned
IPs.

If $test_node isn't set then this is bad.  However, try a few times to
ensure it is set.
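
"Try a few times" can be sketched as a small retry helper. This is a generic illustration rather than the test suite's code: the names retry and flaky are hypothetical, and a real test would probably sleep between attempts.

```shell
# retry N CMD...: run CMD up to N times, stopping at the first success.
retry ()
{
    _tries="$1" ; shift
    _i=0
    while [ "$_i" -lt "$_tries" ] ; do
        "$@" && return 0
        _i=$((_i + 1))
    done
    return 1
}

# Stand-in for a check that only succeeds on the third attempt.
_attempts=0
flaky () { _attempts=$((_attempts + 1)) ; [ "$_attempts" -ge 3 ] ; }

retry 5 flaky && echo "succeeded after $_attempts attempts"
```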

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests: simple tests against local daemons should check $TEST_LOCAL_DAEMONS
Martin Schwenke [Mon, 2 Jul 2012 04:05:21 +0000 (14:05 +1000)]
tests: simple tests against local daemons should check $TEST_LOCAL_DAEMONS

Note the old $CTDB_TEST_REAL_CLUSTER - it doesn't exist anymore...

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests: run_tests should exit with $status with -e option
Martin Schwenke [Wed, 20 Jun 2012 05:57:48 +0000 (15:57 +1000)]
tests: run_tests should exit with $status with -e option

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests/simple: ctdb reloadips test should use $test_ip
Martin Schwenke [Thu, 14 Jun 2012 09:37:39 +0000 (19:37 +1000)]
tests/simple: ctdb reloadips test should use $test_ip

There's no point recalculating this value.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agotests: select_test_node_and_ips() should never select non-node -1
Martin Schwenke [Thu, 14 Jun 2012 09:36:04 +0000 (19:36 +1000)]
tests: select_test_node_and_ips() should never select non-node -1

Instead of selecting the 1st pnn found, select the 1st one that isn't -1.
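
A minimal sketch of that selection, assuming input lines of the form "ip pnn" (the exact output format being parsed is an assumption, and first_assigned is a hypothetical name):

```shell
# first_assigned: read "ip pnn" lines on stdin and print the first one
# whose pnn is not -1.  Returns non-zero if there is no such line.
first_assigned ()
{
    while read -r _ip _pnn ; do
        if [ -n "$_pnn" ] && [ "$_pnn" != "-1" ] ; then
            echo "$_ip $_pnn"
            return 0
        fi
    done
    return 1
}

printf '%s\n' '10.0.0.31 -1' '10.0.0.32 2' '10.0.0.33 0' | first_assigned
# prints "10.0.0.32 2"
```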

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoutil: Do not lock down memory when running with local daemons
Amitay Isaacs [Thu, 26 Jul 2012 12:01:50 +0000 (22:01 +1000)]
util: Do not lock down memory when running with local daemons

Thanks to Ronnie for highlighting the issue of memory lockdown on AIX.
Fix typo, use getuid and not getpid.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agostatd-callout: Fix a bug in the calculations of $STATE
Martin Schwenke [Thu, 5 Jul 2012 06:27:54 +0000 (16:27 +1000)]
statd-callout: Fix a bug in the calculations of $STATE

It is just meant to be even, so divided *and* multiplied by 2.  Use
$(( )) to make it more readable.

While touching this code, make the related calculation a bit more
readable too.
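
The "divide and multiply by 2" trick relies on integer division discarding the remainder. A small sketch (even_state is a hypothetical helper, not the statd-callout code itself):

```shell
# even_state N: round N down to the nearest even number using
# integer division followed by multiplication.
even_state ()
{
    echo $(( $1 / 2 * 2 ))
}

even_state 7     # prints 6
even_state 10    # prints 10
```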

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Default route on NAT gateway should have a metric of 10
Martin Schwenke [Tue, 24 Jul 2012 01:23:09 +0000 (11:23 +1000)]
Eventscripts: Default route on NAT gateway should have a metric of 10

At the moment routes from 11.routing can fail to be added because they
conflict with the default route added by 11.natgw.

NAT gateway is meant to be a last resort, so routes from 11.routing
should override it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Update/remove stale comments in 11.natgw
Martin Schwenke [Tue, 17 Jul 2012 10:10:11 +0000 (20:10 +1000)]
Eventscripts: Update/remove stale comments in 11.natgw

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Retrieve and build NAT gateway details better in 11.natgw
Martin Schwenke [Tue, 17 Jul 2012 05:39:50 +0000 (15:39 +1000)]
Eventscripts: Retrieve and build NAT gateway details better in 11.natgw

* "ctdb natgw" is run twice when it doesn't need to be.

* Tweak the parsing of "ctdb natgw" output so that it is done by the
  shell instead of a bunch of external processes.

* Make default NAT gateway be -1, even on error.  If the process
  failed entirely then it could previously be empty.

* Streamline the error handling using die() for when there is no NAT
  gateway.

* Downcase script-local variable names.
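
The parsing and default-on-error points above might look something like the sketch below. It assumes the command prints the gateway node number and address as the first two words of its output; that format, and the names parse_natgw, fake_ok and fake_fail, are assumptions for illustration only.

```shell
# parse_natgw CMD...: run CMD once and split its output into words using
# the shell itself.  The master defaults to -1 if CMD prints nothing.
# Note: the unquoted substitution is also subject to globbing, which is
# harmless for numeric and dotted-quad output.
parse_natgw ()
{
    set -- $("$@" 2>/dev/null)
    _master="${1:--1}"
    _ip="${2:-}"
    echo "master=$_master ip=$_ip"
}

# Stand-ins for a working and a failing command.
fake_ok ()   { echo "0 192.168.1.1" ; }
fake_fail () { return 1 ; }

parse_natgw fake_ok      # prints "master=0 ip=192.168.1.1"
parse_natgw fake_fail    # prints "master=-1 ip="
```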

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Optimise building the host address in 11.natgw
Martin Schwenke [Tue, 17 Jul 2012 05:37:14 +0000 (15:37 +1000)]
Eventscripts: Optimise building the host address in 11.natgw

It can be built without forking unnecessary processes.

Also downcase variable name because it is local to script.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: Clean up startup sanity check in 11.natgw
Martin Schwenke [Tue, 17 Jul 2012 05:32:38 +0000 (15:32 +1000)]
Eventscripts: Clean up startup sanity check in 11.natgw

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: remove redundant firewall rules from 11.natgw
Martin Schwenke [Tue, 17 Jul 2012 05:26:16 +0000 (15:26 +1000)]
Eventscripts: remove redundant firewall rules from 11.natgw

aeb70c7e7822854eb87873a5c7783e27e6e72318 said it moved these but it
redundantly duplicated them instead.  That commit also fixed the
problem because it moved the rules after delete_all() not out of the
startup event as claimed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoEventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation
Martin Schwenke [Tue, 17 Jul 2012 05:21:10 +0000 (15:21 +1000)]
Eventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation

$CTDB_NATGW_PUBLIC_IP can be split into $_ip and $_maskbits without
forking lots of processes.

Also "local" isn't supported by POSIX.
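
Splitting an "address/maskbits" string without forking can be done with POSIX parameter expansion. A sketch, using a made-up example value:

```shell
# Example value only; a real configuration sets this itself.
CTDB_NATGW_PUBLIC_IP="10.0.0.227/24"

_ip="${CTDB_NATGW_PUBLIC_IP%/*}"          # drop the shortest suffix matching "/*"
_maskbits="${CTDB_NATGW_PUBLIC_IP#*/}"    # drop the shortest prefix matching "*/"

echo "ip=$_ip maskbits=$_maskbits"        # prints "ip=10.0.0.227 maskbits=24"
```

Both expansions are handled by the shell itself, so no external processes such as sed or cut are started.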

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoweb: Add my name to the developer list.
Amitay Isaacs [Tue, 24 Jul 2012 07:27:22 +0000 (17:27 +1000)]
web: Add my name to the developer list.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoRemove tevent_loop_allow_nesting()
Amitay Isaacs [Fri, 15 Jun 2012 01:05:00 +0000 (11:05 +1000)]
Remove tevent_loop_allow_nesting()

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoctdbd: Return explicit boolean values for function returning bool
Amitay Isaacs [Wed, 6 Jun 2012 06:19:10 +0000 (16:19 +1000)]
ctdbd: Return explicit boolean values for function returning bool

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoutil: Do not try to lockdown memory when running in local daemons mode
Amitay Isaacs [Wed, 6 Jun 2012 06:16:15 +0000 (16:16 +1000)]
util: Do not try to lockdown memory when running in local daemons mode

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agoFix compiler warnings.
Amitay Isaacs [Fri, 15 Jun 2012 05:07:04 +0000 (15:07 +1000)]
Fix compiler warnings.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
11 years agorun_tests: improve spacing
Michael Adam [Tue, 3 Jul 2012 09:50:05 +0000 (11:50 +0200)]
run_tests: improve spacing

11 years agorun_tests.sh: fix a comment
Michael Adam [Tue, 3 Jul 2012 09:46:26 +0000 (11:46 +0200)]
run_tests.sh: fix a comment

11 years agoctdb: use correct "persistent" state for ctdb_attach in "ctdb cattdb"
Michael Adam [Tue, 3 Jul 2012 12:28:36 +0000 (14:28 +0200)]
ctdb: use correct "persistent" state for ctdb_attach in "ctdb cattdb"

Originally, "ctdb cattdb" attached explicitly as non-persistent, which
is now forbidden for persistent databases by the server.

Pair-Programmed-With: Gregor Beck <gbeck@sernet.de>

11 years agoctdbd: refuse attaching with "persistent" to a non-persistent db and v.v.
Gregor Beck [Thu, 21 Jun 2012 08:26:03 +0000 (10:26 +0200)]
ctdbd: refuse attaching with "persistent" to a non-persistent db and v.v.

Signed-off-by: Michael Adam <obnox@samba.org>
11 years agoWhen we find an ip we shouldn't host, just release it
Ronnie Sahlberg [Wed, 20 Jun 2012 05:10:05 +0000 (15:10 +1000)]
When we find an ip we shouldn't host, just release it

Don't call a full blown clusterwide ipreallocation, just release it locally

11 years agoWhen we release an ip, get the interface name from the kernel
Ronnie Sahlberg [Wed, 20 Jun 2012 00:08:11 +0000 (10:08 +1000)]
When we release an ip, get the interface name from the kernel

instead of using the interface where ctdb thinks the ip is hosted.
This allows us to handle cases where we want to release an ip but ctdbd does not know which interface the ip is assigned on
(e.g. the user has used 'ip addr add...' and manually assigned an ip to the wrong interface).

11 years agoAdd new command to find which interface an ip is located on
Ronnie Sahlberg [Wed, 20 Jun 2012 03:32:02 +0000 (13:32 +1000)]
Add new command to find which interface an ip is located on

12 years agoSTATISTICS: Add tracking of the 10 hottest keys per database measured in hopcount
Ronnie Sahlberg [Wed, 13 Jun 2012 06:17:18 +0000 (16:17 +1000)]
STATISTICS: Add tracking of the 10 hottest keys per database measured in hopcount

and add mechanisms to dump it using the ctdb dbstatistics command

12 years agoReimplement logging of long running events
Martin Schwenke [Thu, 7 Jun 2012 05:08:15 +0000 (15:08 +1000)]
Reimplement logging of long running events

Reimplement 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00 using tevent
trace points.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotevent: change version to 0.9.16
Stefan Metzmacher [Fri, 8 Jun 2012 10:50:21 +0000 (12:50 +0200)]
tevent: change version to 0.9.16

This adds tevent_*_trace_*() and tevent_context_init_ops()

metze

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Fri Jun  8 20:47:41 CEST 2012 on sn-devel-104

12 years agotevent: expose tevent_context_init_ops
Stefan Metzmacher [Fri, 11 May 2012 13:19:55 +0000 (15:19 +0200)]
tevent: expose tevent_context_init_ops

This can be used to implement wrapper backends,
while passing a private pointer to the backend's init function
via ev->additional_data.

metze

12 years agolib/tevent: Add trace point callback
Martin Schwenke [Tue, 5 Jun 2012 06:00:07 +0000 (16:00 +1000)]
lib/tevent: Add trace point callback

Set/get a single callback function to be invoked at various trace
points.  Define "before wait" and "after wait" trace points - more
trace points can be added later if required.

CTDB wants this to log long waits and events.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
12 years agoRevert "TEVENT: Add back tracking of long running events to the local copy of tevent...
Martin Schwenke [Thu, 7 Jun 2012 04:20:13 +0000 (14:20 +1000)]
Revert "TEVENT: Add back tracking of long running events to the local copy of tevent library"

This reverts commit 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00.

Do this using new tevent trace point callback.

12 years agolib/tevent: In poll_event_context, add a pointer back to the tevent_context
Martin Schwenke [Thu, 7 Jun 2012 02:26:02 +0000 (12:26 +1000)]
lib/tevent: In poll_event_context, add a pointer back to the tevent_context

This makes it consistent with the other backends.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
12 years agolib/tevent/testsuite: no longer use 'compat' symbols
Stefan Metzmacher [Mon, 14 May 2012 09:48:00 +0000 (11:48 +0200)]
lib/tevent/testsuite: no longer use 'compat' symbols

metze

12 years agoRun the shutdown eventscript before we tear down the transport
Ronnie Sahlberg [Wed, 30 May 2012 01:50:13 +0000 (11:50 +1000)]
Run the shutdown eventscript before we tear down the transport

This allows eventscripts to still be able to call and use ctdb during the shutdown phase.

12 years agotests: Increment RSN always in ctdb_update_record_persistent test
Amitay Isaacs [Fri, 25 May 2012 05:57:14 +0000 (15:57 +1000)]
tests: Increment RSN always in ctdb_update_record_persistent test

If the record does not exist in persistent DB, RSN for that record is
considered 0. To write a record, RSN for that record should be set to 1,
otherwise the RSN check would fail.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
12 years agotests: Fix ctdb_fetch test (parse extra lines of output)
Amitay Isaacs [Fri, 25 May 2012 01:40:38 +0000 (11:40 +1000)]
tests: Fix ctdb_fetch test (parse extra lines of output)

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
12 years agotests: Fix flakey behavior of ctdb_fetch test
Amitay Isaacs [Thu, 24 May 2012 06:46:07 +0000 (16:46 +1000)]
tests: Fix flakey behavior of ctdb_fetch test

There were two issues with this test:

1. Since the messages are sent from one node to the next, if a node
   does not register for messages before CTDB on that node receives
   the message, it will never be seen by ctdb_fetch and it would
   block on receive and would not send any messages to next node.
   The crude solution is to sleep just before the messages are sent,
   so that ctdb_fetch on all nodes have registered for the messages.

2. If ctdb_fetch stops sending messages after timelimit expiry, the
   next node will keep waiting to receive messages in event_loop_once().
   The default timeout is 30 seconds for event_loop_once(). Adding a
   timed event will always set the timeout value to the time remaining
   for the timed event to expire.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
12 years agoserver: Replace BOOL datatype with bool, True/False with true/false
Amitay Isaacs [Thu, 17 May 2012 06:08:37 +0000 (16:08 +1000)]
server: Replace BOOL datatype with bool, True/False with true/false

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
12 years agotests/eventscripts: Tweak expected output for lockd:b restart
Martin Schwenke [Fri, 25 May 2012 01:44:56 +0000 (11:44 +1000)]
tests/eventscripts: Tweak expected output for lockd:b restart

Commit 13acd58c41fba1a33894fbd654fed69ea0eac322 made this test fail,
since lockd:b and lockd:bs were incorrectly producing the same output.

12 years agotests: Complex tests must not be run from a cluster node
Martin Schwenke [Wed, 23 May 2012 05:36:01 +0000 (15:36 +1000)]
tests: Complex tests must not be run from a cluster node

Tickle tests fail if run from a node involved in the test.

The condition is actually weaker than this: the test can't be run from
a CTDB node that is hosting public addresses that may be used by the
test.

Rework ctdb_test_check_real_cluster() to support checking this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoEventscripts: Fix deprecated iptables ! usage
Martin Schwenke [Wed, 23 May 2012 04:24:40 +0000 (14:24 +1000)]
Eventscripts: Fix deprecated iptables ! usage

This currently causes warnings in the logs.

This change is not SLES10-compatible but we already have some other
non-SLES10-compatible changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: test_wrap needs to set TEST_BIN_DIR when installed
Martin Schwenke [Tue, 22 May 2012 01:24:05 +0000 (11:24 +1000)]
tests: test_wrap needs to set TEST_BIN_DIR when installed

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agopackaging: make ctdb-tests package depend on nc
Amitay Isaacs [Fri, 18 May 2012 02:59:41 +0000 (12:59 +1000)]
packaging: make ctdb-tests package depend on nc

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
12 years agotests: Use per node log files when running tests with local daemons
Amitay Isaacs [Thu, 10 May 2012 06:59:39 +0000 (16:59 +1000)]
tests: Use per node log files when running tests with local daemons

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
12 years agoRECOVERY: Increase the time we allow before timing out recovery related tasks.
Ronnie Sahlberg [Fri, 25 May 2012 02:31:11 +0000 (12:31 +1000)]
RECOVERY: Increase the time we allow before timing out recovery related tasks.

If the system is temporarily taking unusually long to perform these tasks, it is better to wait a lot longer and allow the tasks to complete than to time out repeatedly and then become banned.

12 years agoRECOVER: When we pull databases during recovery, we used to reallocate the databuffer...
Ronnie Sahlberg [Fri, 25 May 2012 02:27:59 +0000 (12:27 +1000)]
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region.

Change this to instead preallocate, by default, 10MByte chunks to the data buffer.
This significantly reduces the number of potential reallocate and move operations that may be required.

Create a tunable to override/change how much preallocation should be used.

12 years agoDOCS: Document the new tunables to produce warnings if databases grow unexpectedly...
Ronnie Sahlberg [Mon, 21 May 2012 04:01:04 +0000 (14:01 +1000)]
DOCS: Document the new tunables to produce warnings if databases grow unexpectedly big.

12 years agoDEBUG: Add checks for and print debug messages when 1) a database contains very many...
Ronnie Sahlberg [Mon, 21 May 2012 03:11:38 +0000 (13:11 +1000)]
DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big.

Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0

12 years agoTEVENT: Add back tracking of long running events to the local copy of tevent library
Ronnie Sahlberg [Sun, 20 May 2012 23:17:05 +0000 (09:17 +1000)]
TEVENT: Add back tracking of long running events to the local copy of tevent library

12 years agoGANESHA: make the ganesha script executable by default
Ronnie Sahlberg [Thu, 17 May 2012 01:16:57 +0000 (11:16 +1000)]
GANESHA: make the ganesha script executable by default

12 years agoMerge remote branch 'martins/ganesha'
Ronnie Sahlberg [Thu, 17 May 2012 01:48:07 +0000 (11:48 +1000)]
Merge remote branch 'martins/ganesha'

12 years agoDebug: When scripts hang, we may need to collect additional data in order to debug...
Ronnie Sahlberg [Thu, 17 May 2012 00:17:51 +0000 (10:17 +1000)]
Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.

Break this debug and data collection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.

S1037271

12 years agoEventscripts: Modernise 60.ganesha to match 60.nfs
Martin Schwenke [Wed, 16 May 2012 07:24:21 +0000 (17:24 +1000)]
Eventscripts: Modernise 60.ganesha to match 60.nfs

Originally from Srikrishan Malik <srikrishan.malik@in.ibm.com> with
some style changes by me.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoEventscripts: restart lockd in the background when going unhealthy
Martin Schwenke [Wed, 16 May 2012 03:29:58 +0000 (13:29 +1000)]
Eventscripts: restart lockd in the background when going unhealthy

Sometimes the restart can hang when there are I/O problems.  Then the
eventscript times out and gets killed so the node is never marked as
unhealthy.

Restarting in the background avoids this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoEventscript functions: add optional version to nfs_check_rpc_service()
Martin Schwenke [Tue, 8 May 2012 04:53:58 +0000 (14:53 +1000)]
Eventscript functions: add optional version to nfs_check_rpc_service()

This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.
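
Detecting an optional leading argument in this way can be sketched as follows. The function describe_args and its output format are hypothetical and only illustrate the "does it start with '-'?" test, not the real nfs_check_rpc_service() interface.

```shell
# describe_args PROG [VERSION] TRIPLES...: if the argument after PROG
# starts with '-', it is already a test comparison such as -ge, so no
# version was given.  Anything else is taken as the version.
describe_args ()
{
    _prog="$1" ; shift
    _version=""
    case "${1:-}" in
        ""|-*) : ;;
        *)     _version="$1" ; shift ;;
    esac
    printf '%s\n' "prog=$_prog version=${_version:-default} triples=$*"
}

describe_args nfs -ge 6 "verbose unhealthy"
# prints "prog=nfs version=default triples=-ge 6 verbose unhealthy"
describe_args nfs 3 -ge 6 "verbose unhealthy"
# prints "prog=nfs version=3 triples=-ge 6 verbose unhealthy"
```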

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Move the "ctdb reloadips" test from complex/ to simple/
Martin Schwenke [Mon, 14 May 2012 05:11:14 +0000 (15:11 +1000)]
tests: Move the "ctdb reloadips" test from complex/ to simple/

This is made possible by separation of public addresses files for
local daemons and the addition of get_ctdbd_command_line_option().

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Fix a typo in daemons_setup()
Martin Schwenke [Mon, 14 May 2012 05:01:44 +0000 (15:01 +1000)]
tests: Fix a typo in daemons_setup()

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: New function get_ctdbd_command_line_option() for integration testing
Martin Schwenke [Mon, 14 May 2012 05:00:32 +0000 (15:00 +1000)]
tests: New function get_ctdbd_command_line_option() for integration testing

This allows, for example, the public addresses file used by a
particular daemon to be known.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Use per-daemon public_addresses file for local daemons
Martin Schwenke [Mon, 14 May 2012 04:59:22 +0000 (14:59 +1000)]
tests: Use per-daemon public_addresses file for local daemons

This allows a node's public addresses file to be hacked for testing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Restore the old behaviour of "make test" so it uses tests/var
Martin Schwenke [Mon, 14 May 2012 02:47:02 +0000 (12:47 +1000)]
tests: Restore the old behaviour of "make test" so it uses tests/var

This is finally possible, given all the other changes...  :-)

This is a good default because daemons will be left running, test/var
will still exist and test failures can be investigated.

To "automatically" clean up, do:

  ./tests/run_tests.sh -C -V tests/var -- tests/simple/99_daemons_shutdown.sh

... although "killall ctdbd ; rm -rf tests/var" is less keystrokes.  ;-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Fix wrapper scripts to handle options and tests without breakage
Martin Schwenke [Mon, 14 May 2012 01:57:20 +0000 (11:57 +1000)]
tests: Fix wrapper scripts to handle options and tests without breakage

If the -V option is given and no tests are supplied, the "cd" command
in run_tests.sh causes scripts/run_tests to interpret the argument to
-V incorrectly.  Therefore, the wrapper scripts can't use "cd" because
they don't know what the options are doing!

Instead scripts/run_tests searches for each test relative to the
current directory and, if not previously found, then searches relative
to the top-level tests directory.  This is a much better way of doing
things.

Given that run_tests.sh and run_cluster_tests.sh were starting to
contain duplicate complex logic, remove run_cluster_tests.sh and
replace it with a symlink to run_tests.sh.  Run_tests.sh checks $0 to
see what options/defaults to use.  Update INSTALL to deal with this.
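
The lookup order and the $0 check described above can be sketched like this. The function names find_test and mode_for, and the "cluster"/"local" labels, are hypothetical and only illustrate the approach.

```shell
# find_test TEST TOPDIR: print the path to TEST, looking relative to the
# current directory first and then relative to TOPDIR.
find_test ()
{
    _t="$1"
    _top="$2"
    if [ -e "$_t" ] ; then
        echo "$_t"
    elif [ -e "$_top/$_t" ] ; then
        echo "$_top/$_t"
    else
        echo "test \"$_t\" not found" >&2
        return 1
    fi
}

# mode_for NAME: choose defaults from the name the script was invoked as,
# which is how one script can serve behind a symlink with another name.
mode_for ()
{
    case "${1##*/}" in
        run_cluster_tests.sh) echo "cluster" ;;
        *)                    echo "local" ;;
    esac
}

# Typical use inside the script itself: mode_for "$0"
```

Since nothing changes directory, any option arguments that are themselves paths keep their meaning relative to where the user ran the command.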

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Add a test for "ctdb reloadips"
Martin Schwenke [Fri, 11 May 2012 02:13:24 +0000 (12:13 +1000)]
tests: Add a test for "ctdb reloadips"

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: In integration tests, use --node-ip to avoid locking weirdness
Martin Schwenke [Thu, 10 May 2012 06:58:16 +0000 (16:58 +1000)]
tests: In integration tests, use --node-ip to avoid locking weirdness

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Allow run_cluster_tests.sh to take options
Martin Schwenke [Thu, 10 May 2012 06:17:44 +0000 (16:17 +1000)]
tests: Allow run_cluster_tests.sh to take options

However, options must be followed by "--".

This also fixes:

* a bug where specifying tests caused local daemons to be used; and
* an incorrect comment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests: Allow run_tests.sh to take options
Martin Schwenke [Thu, 10 May 2012 04:55:19 +0000 (14:55 +1000)]
tests: Allow run_tests.sh to take options

However, options must be followed by "--".

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests/eventscripts: Fix a policy routing test
Martin Schwenke [Thu, 10 May 2012 04:32:06 +0000 (14:32 +1000)]
tests/eventscripts: Fix a policy routing test

The previous commit 55006ea8999ab3721fcde81b92692661065f0688
highlighted an error in this test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agotests/eventscripts: $CTDB_BASE needs to be in $TEST_VAR_DIR
Martin Schwenke [Thu, 10 May 2012 04:16:45 +0000 (14:16 +1000)]
tests/eventscripts: $CTDB_BASE needs to be in $TEST_VAR_DIR

The policy routing tests write the configuration file into $CTDB_BASE,
as per recommended practice.  Unless this is in $TEST_VAR_DIR this
won't work sensibly when the tests are installed.

Things are done slightly differently than for /etc.  Here we use
symlinks and we want them to be dereferenced.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoPackaging: Improve dependencies
Martin Schwenke [Wed, 9 May 2012 07:20:27 +0000 (17:20 +1000)]
Packaging: Improve dependencies

We don't strictly need gawk (i.e. could probably use nawk), but that
seems to provide /bin/awk on RHEL.

PreReq seems old-school.  We don't have any scriptlets, so nothing
needs to be installed before CTDB.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoPackaging: add options to ctdb.spec.in to force use of bundled libraries
Martin Schwenke [Wed, 9 May 2012 06:03:00 +0000 (16:03 +1000)]
Packaging: add options to ctdb.spec.in to force use of bundled libraries

Ideas borrowed from the Fedora samba4 spec file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoEventscript functions: add optional version to nfs_check_rpc_service()
Martin Schwenke [Tue, 8 May 2012 04:53:58 +0000 (14:53 +1000)]
Eventscript functions: add optional version to nfs_check_rpc_service()

This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.
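
The disambiguation can be sketched like this.  It shows the argument
handling only, under assumptions: the function name
parse_rpc_check_args, the "default" placeholder and the example
arguments are hypothetical, and the real nfs_check_rpc_service() does
considerably more.

```shell
# Hypothetical sketch of the argument handling only.
# Usage: parse_rpc_check_args SERVICE [VERSION] TRIPLE...
parse_rpc_check_args ()
{
    _service="$1" ; shift

    _version=""
    if [ $# -gt 0 ] ; then
        case "$1" in
            -*) : ;;                      # a triple starts here: no version given
            *)  _version="$1" ; shift ;;  # anything else is taken as the version
        esac
    fi

    printf '%s\n' "service=${_service} version=${_version:-default} triples=$*"
}
```

For example, "parse_rpc_check_args nfs 3 -ge 6 verbose" takes 3 as the
version, while "parse_rpc_check_args nfs -ge 6 verbose" leaves the
version unset.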

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoPackaging: devel package fixes
Martin Schwenke [Fri, 11 May 2012 00:32:26 +0000 (10:32 +1000)]
Packaging: devel package fixes

Group was non-existent, typo in summary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoPackaging: generate a ctdb-tests package
Martin Schwenke [Thu, 3 May 2012 02:12:53 +0000 (12:12 +1000)]
Packaging: generate a ctdb-tests package

Signed-off-by: Martin Schwenke <martin@meltin.net>
12 years agoWe don't need to serialize the "probe which address this node is" if we have given...
Ronnie Sahlberg [Thu, 10 May 2012 07:40:22 +0000 (17:40 +1000)]
We don't need to serialize the "probe which address this node is" if we have given an explicit --node-ip on the command line

12 years agoTrack all child processes so we never send a signal to an unrelated process (our child...
Ronnie Sahlberg [Thu, 3 May 2012 01:42:41 +0000 (11:42 +1000)]
Track all child processes so we never send a signal to an unrelated process (our child died and the kernel wrapped the pid-space and reused the pid for a different process)

Wrap all creation of child processes inside ctdb_fork(), which is used to track all processes we have spawned.
Also capture SIGCHLD to track which child processes have terminated.

Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a

12 years agoDOC: document the reloadips command
Ronnie Sahlberg [Thu, 3 May 2012 01:06:55 +0000 (11:06 +1000)]
DOC: document the reloadips command

12 years agoRELOADIPS: simplify the reloadips code a bit
Ronnie Sahlberg [Tue, 1 May 2012 05:27:12 +0000 (15:27 +1000)]
RELOADIPS: simplify the reloadips code a bit
Also update the "read public address file" code to not check whether the
address already exists locally when we read it from the child process, to
stop it from spamming the logs with "We already host ..." messages.

12 years agoRevert "server: locking: Provide a common API for non-blocking locking of TDBs"
Amitay Isaacs [Tue, 1 May 2012 02:09:48 +0000 (12:09 +1000)]
Revert "server: locking: Provide a common API for non-blocking locking of TDBs"

This reverts commit 6a92fc2b8da2bba98dca29b781ab459ba4e879a5.

Reverting incomplete changes to ctdb_lock.c