martins/ctdb.git
10 years agorecoverd: Ignore failed ipreallocated controls to inactive nodes [master]
Martin Schwenke [Tue, 26 Nov 2013 01:35:44 +0000 (12:35 +1100)]
recoverd: Ignore failed ipreallocated controls to inactive nodes

Currently timeouts for controls to inactive nodes can cause banning
credits to be applied.  This should not happen.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoUpdate NEWS [ctdb-2.5.1]
Martin Schwenke [Mon, 25 Nov 2013 08:28:10 +0000 (19:28 +1100)]
Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoscripts: Be careful when generating unique pids for stack traces
Amitay Isaacs [Tue, 26 Nov 2013 04:41:50 +0000 (15:41 +1100)]
scripts: Be careful when generating unique pids for stack traces

sort expects the data to be line based, so make it so.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoconfig: Simplify the default CTDB configuration file
Amitay Isaacs [Tue, 26 Nov 2013 03:38:58 +0000 (14:38 +1100)]
config: Simplify the default CTDB configuration file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

10 years agoscripts: Replace hard-coded /var/ctdb with CTDB_VARDIR
Amitay Isaacs [Tue, 26 Nov 2013 03:29:52 +0000 (14:29 +1100)]
scripts: Replace hard-coded /var/ctdb with CTDB_VARDIR

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoscripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT
Amitay Isaacs [Tue, 26 Nov 2013 02:27:46 +0000 (13:27 +1100)]
scripts: Set defaults for CTDB_DBDIR and CTDB_DBDIR_PERSISTENT

If these configuration variables are not defined, then there should be
a default fallback.  This is a workaround until CTDB compile-time
configuration can be accessed at runtime.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoeventscripts: Perform share check before NFS RPC checks in 60.ganesha
Amitay Isaacs [Tue, 26 Nov 2013 00:39:54 +0000 (11:39 +1100)]
eventscripts: Perform share check before NFS RPC checks in 60.ganesha

If the NFS RPC checks restart Ganesha, then the share check can fail
prematurely.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotools/ctdb: Improve error checking when parsing node string
Martin Schwenke [Fri, 22 Nov 2013 02:57:31 +0000 (13:57 +1100)]
tools/ctdb: Improve error checking when parsing node string

If a node isn't numeric then it is silently converted to 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
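
To illustrate the failure mode described above (this is not the patch itself): an atoi()-style conversion returns 0 for non-numeric input, so a mistyped node would silently address node 0. A minimal sketch of stricter parsing with strtoul() follows. The name parse_node_number is hypothetical, and the sketch handles only a single numeric node, not any other node specification the tool may accept.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not the ctdb tool's actual code: parse a node
 * number strictly, rejecting input that atoi() would map to 0. */
static bool parse_node_number(const char *s, uint32_t *pnn)
{
	char *end = NULL;
	unsigned long n;

	if (s == NULL || *s < '0' || *s > '9') {
		return false;	/* empty, signed, or non-numeric input */
	}

	errno = 0;
	n = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || n > UINT32_MAX) {
		return false;	/* overflow or trailing garbage */
	}

	*pnn = (uint32_t)n;
	return true;
}
```

A caller would report an error and exit when this returns false, instead of carrying on with node 0.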
10 years agorecoverd: Only respond to currently queued ipreallocated requests
Martin Schwenke [Fri, 22 Nov 2013 02:57:03 +0000 (13:57 +1100)]
recoverd: Only respond to currently queued ipreallocated requests

Otherwise new requests can come in during the latter parts of the
takeover run when the IP allocation algorithm has already run, and the
new requests will be dequeued even though they haven't really been
processed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoscripts: Add an early exit to statd-callout's notify case
Martin Schwenke [Tue, 19 Nov 2013 04:40:08 +0000 (15:40 +1100)]
scripts: Add an early exit to statd-callout's notify case

If $statd_state is empty then the loop will run once and print
spurious errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Remove the nfs_statd_update() call from 60.ganesha
Martin Schwenke [Tue, 19 Nov 2013 04:37:58 +0000 (15:37 +1100)]
eventscripts: Remove the nfs_statd_update() call from 60.ganesha

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/integration: Neaten up some of the persistent database tests
Martin Schwenke [Mon, 18 Nov 2013 10:04:49 +0000 (21:04 +1100)]
tests/integration: Neaten up some of the persistent database tests

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotools/ctdb: Fix tstore command to generate ltdb header internally
Amitay Isaacs [Mon, 18 Nov 2013 04:09:27 +0000 (15:09 +1100)]
tools/ctdb: Fix tstore command to generate ltdb header internally

This fixes an alignment discrepancy on 32-bit vs 64-bit platforms.

  sizeof(struct ctdb_ltdb_header) = 20  (32-bit)
                                  = 24  (64-bit)

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
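
The 20 versus 24 byte difference is what tail padding produces when a struct contains a 64-bit member. The sketch below assumes an illustrative field layout (the names and their order are assumptions, not taken from this commit) whose members sum to 20 bytes. Where uint64_t requires 8-byte alignment, the compiler rounds the struct size up to 24.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only; the field names are assumptions and need not
 * match the real struct ctdb_ltdb_header. */
struct example_ltdb_header {
	uint64_t rsn;
	uint32_t dmaster;
	uint16_t laccessor;
	uint16_t lacount;
	uint32_t flags;
};

/* Sum of the member sizes, ignoring any padding: 8 + 4 + 2 + 2 + 4. */
enum { EXAMPLE_HEADER_PAYLOAD = 8 + 4 + 2 + 2 + 4 };

/* Bytes of tail padding the compiler added on this platform. */
static size_t example_header_padding(void)
{
	return sizeof(struct example_ltdb_header) - EXAMPLE_HEADER_PAYLOAD;
}
```

Code that writes such a header into a record by hand has to agree with the daemon about this size, which is presumably why generating the header internally avoids the discrepancy.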
10 years agotests/takeover: Fix bogus test description
Martin Schwenke [Fri, 15 Nov 2013 04:31:03 +0000 (15:31 +1100)]
tests/takeover: Fix bogus test description

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/simple: Use sleep_for() instead of sleep
Martin Schwenke [Fri, 15 Nov 2013 04:23:14 +0000 (15:23 +1100)]
tests/simple: Use sleep_for() instead of sleep

Progress...

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/simple: Update persistent DB tests
Martin Schwenke [Fri, 15 Nov 2013 04:21:58 +0000 (15:21 +1100)]
tests/simple: Update persistent DB tests

* Low level DB checks should ignore the sequence number record.

* A restart is needed after messing with the RecoverPDBBySeqNum
  tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agorecoverd: For persistent databases a sequence number of 0 is valid
Martin Schwenke [Fri, 15 Nov 2013 04:20:40 +0000 (15:20 +1100)]
recoverd: For persistent databases a sequence number of 0 is valid

Otherwise recovery ends up done by RSN when it is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
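
One way to read this fix: 0 cannot double as a "nothing found" marker when it is also a legitimate sequence number. The sketch below is an assumption about the general technique, not the recovery daemon's code. It picks the node with the highest sequence number and uses a separate index of -1 to mean that no node reported one. The function and parameter names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the node with the highest
 * sequence number, or -1 if no node reported one.  reported[i] is
 * non-zero when seqnum[i] holds a valid answer from node i. */
static int pick_seqnum_source(const uint64_t *seqnum, const int *reported,
			      size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!reported[i]) {
			continue;
		}
		/* A sequence number of 0 is a valid candidate. */
		if (best == -1 || seqnum[i] > seqnum[best]) {
			best = (int)i;
		}
	}

	return best;
}
```

With this shape, a cluster where every node reports 0 still yields a source node, so there is no need to fall back to per-record RSN comparison.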

10 years agolocking: Use vfork instead of fork to exec helpers
Amitay Isaacs [Tue, 19 Nov 2013 04:31:39 +0000 (15:31 +1100)]
locking: Use vfork instead of fork to exec helpers

There is a significant overhead in using fork() rather than vfork(),
especially when the child process execs a helper.  The overhead is in
both memory and time.

    # strace -c ./test_fork 1024 200
    count=1024, size=204800, total=200M
    failed fork=0
    time for fork() = 4879.597000 us
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    4.543321        3304      1375       375 clone
      0.00    0.000071           0      1033           mmap
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           write
      0.00    0.000000           0         2           open
      0.00    0.000000           0         2           close
      0.00    0.000000           0         3           fstat
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    4.543392                  2429       376 total

    # strace -c ./test_vfork 1024 200
    count=1024, size=204800, total=200M
    failed fork=0
    time for fork() = 82.041000 us
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     96.47    0.001204           1      1000           vfork
      3.53    0.000044           0      1033           mmap
      0.00    0.000000           0         1           read
      0.00    0.000000           0         3           write
      0.00    0.000000           0         2           open
      0.00    0.000000           0         2           close
      0.00    0.000000           0         3           fstat
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.001248                  2054         1 total

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
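
For readers unfamiliar with the pattern, a minimal sketch of launching a helper with vfork() and execv() follows. It is not the locking code from this commit, and run_helper is a hypothetical name. The important constraint is that the child does nothing except exec or _exit(), because it borrows the parent's address space until then.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run an external program via vfork() + execv() and
 * return its exit status, or -1 on failure. */
static int run_helper(const char *path, char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid == -1) {
		return -1;
	}
	if (pid == 0) {
		/* Child: only exec or _exit here, since the parent's memory
		 * is shared until one of them happens. */
		execv(path, argv);
		_exit(127);
	}

	if (waitpid(pid, &status, 0) == -1) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This sketch waits synchronously for simplicity. A daemon would normally record the child's pid and collect its status later rather than block in waitpid().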
10 years agocommon: Refactor code to keep track of child processes
Amitay Isaacs [Tue, 19 Nov 2013 05:13:20 +0000 (16:13 +1100)]
common: Refactor code to keep track of child processes

This code can then be used to track child processes created with vfork().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoscripts: Run a single instance of debug_locks.sh at a given time
Amitay Isaacs [Fri, 15 Nov 2013 07:59:04 +0000 (18:59 +1100)]
scripts: Run a single instance of debug_locks.sh at a given time

This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.

Also, improve the logging format with separators.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agolocking: Update current lock statistics when lock is scheduled
Amitay Isaacs [Fri, 15 Nov 2013 07:36:09 +0000 (18:36 +1100)]
locking: Update current lock statistics when lock is scheduled

When a child process is created for a lock request, the current lock
statistics should be updated immediately.  This will provide accurate
information on the number of active lock requests.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agolocking: Do not merge multiple lock requests to avoid unfair scheduling
Amitay Isaacs [Mon, 18 Nov 2013 04:48:22 +0000 (15:48 +1100)]
locking: Do not merge multiple lock requests to avoid unfair scheduling

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agolocking: Implement active lock requests limit per database
Amitay Isaacs [Fri, 15 Nov 2013 04:58:59 +0000 (15:58 +1100)]
locking: Implement active lock requests limit per database

This limit was previously global rather than per database.  That
prevented database freeze lock requests from being scheduled once the
global limit was reached.

Only individual record requests should be limited and database freeze
requests should always get scheduled.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
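
The scheduling rule stated above can be sketched as follows. Every name and the limit value here are hypothetical and chosen for illustration only; the real code and its configured limit may differ.

```c
#include <stdbool.h>

/* Hypothetical request types: a single-record lock or a whole-database
 * freeze lock. */
enum example_lock_type {
	EXAMPLE_LOCK_RECORD,
	EXAMPLE_LOCK_DB_FREEZE
};

/* Hypothetical per-database bookkeeping. */
struct example_db {
	unsigned int active_record_locks;
};

/* Illustrative limit only. */
#define EXAMPLE_MAX_RECORD_LOCKS_PER_DB 4

/* Record requests are limited per database; freeze requests are never
 * held back by the limit. */
static bool example_can_schedule(enum example_lock_type type,
				 const struct example_db *db)
{
	if (type == EXAMPLE_LOCK_DB_FREEZE) {
		return true;
	}
	return db->active_record_locks < EXAMPLE_MAX_RECORD_LOCKS_PER_DB;
}
```

Keeping the counter in the per-database structure means a busy database cannot starve lock requests for other databases.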
10 years agoscripts: Rewrite statd-callout to avoid 10 minute lag
Martin Schwenke [Fri, 8 Nov 2013 05:41:11 +0000 (16:41 +1100)]
scripts: Rewrite statd-callout to avoid 10 minute lag

This is naive and assumes no performance problems when updating
persistent DBs.  It also does no error handling.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoclient: Treat empty __db_sequence_number__ record as 0
Amitay Isaacs [Wed, 13 Nov 2013 06:45:25 +0000 (17:45 +1100)]
client: Treat empty __db_sequence_number__ record as 0

This fixes the issue of transaction commit failing due to an empty
__db_sequence_number__ record in persistent database left by previous
cancelled transaction.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
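
A sketch of the decoding rule described above, under the assumption that the record holds a raw 64-bit counter. The struct and function names are hypothetical stand-ins, not the client library's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a TDB-style data blob. */
struct example_blob {
	const unsigned char *dptr;
	size_t dsize;
};

/* Decode a sequence number record: an empty record counts as 0, and any
 * other unexpected size is reported as an error. */
static int example_parse_seqnum(struct example_blob data, uint64_t *seqnum)
{
	if (data.dsize == 0) {
		*seqnum = 0;
		return 0;
	}
	if (data.dptr == NULL || data.dsize != sizeof(*seqnum)) {
		return -1;
	}
	memcpy(seqnum, data.dptr, sizeof(*seqnum));
	return 0;
}
```

Treating only the empty case as 0 keeps a genuinely malformed record visible as an error.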
10 years agodoc: Update ctdb.1 - primarily to add pdelete/pfetch/pstore/ptrans
Martin Schwenke [Wed, 13 Nov 2013 05:19:00 +0000 (16:19 +1100)]
doc: Update ctdb.1 - primarily to add pdelete/pfetch/pstore/ptrans

Also:

* Move <refentryinfo> above <refmeta> to make the XML valid.

* Describe DB argument in introduction and use it for database
  commands.

* Remove unnecessary format="linespecific" from <screen> tags, since
  it will not be allowed in DocBook 5.0.

* Sort the items in "INTERNAL COMMANDS".

* Update/simplify some command descriptions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotools/ctdb: New ptrans command
Martin Schwenke [Wed, 6 Nov 2013 02:43:53 +0000 (13:43 +1100)]
tools/ctdb: New ptrans command

Also add test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoonnode: New -i option to stop stdin from being closed
Martin Schwenke [Wed, 13 Nov 2013 03:04:17 +0000 (14:04 +1100)]
onnode: New -i option to stop stdin from being closed

This can be useful for piping data to onnode in certain circumstances.

There are now also enough command-line options that they should
definitely be alphabetically ordered.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/integration: try_command_on_node() shouldn't lose onnode options
Martin Schwenke [Wed, 13 Nov 2013 03:13:52 +0000 (14:13 +1100)]
tests/integration: try_command_on_node() shouldn't lose onnode options

Currently it only passes the last (non -v) option seen.  It should
pass them all.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agorecoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUN
Martin Schwenke [Tue, 12 Nov 2013 04:16:49 +0000 (15:16 +1100)]
recoverd: Fix backward compatibility for CTDB_SRVID_TAKEOVER_RUN

When running a mixed version cluster, compatibility with older
versions was broken during recent refactoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoscripts: debug_locks.sh should use configuration to find TDB location
Martin Schwenke [Mon, 4 Nov 2013 01:56:39 +0000 (12:56 +1100)]
scripts: debug_locks.sh should use configuration to find TDB location

That is, don't use fixed paths.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agorecoverd: A node refuses to play against itself
Martin Schwenke [Fri, 1 Nov 2013 03:34:20 +0000 (14:34 +1100)]
recoverd: A node refuses to play against itself

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agorecoverd: Remove duplicate code to update flags during recovery
Martin Schwenke [Thu, 14 Nov 2013 03:25:47 +0000 (14:25 +1100)]
recoverd: Remove duplicate code to update flags during recovery

This also happens earlier in do_recovery() and the nodemap is not
updated after that, so this update is redundant.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agobuild: Update to latest upstream config.guess
Martin Schwenke [Thu, 14 Nov 2013 03:14:10 +0000 (14:14 +1100)]
build: Update to latest upstream config.guess

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotools/ctdb: Fix db commands when dbid is given instead of name
Amitay Isaacs [Wed, 13 Nov 2013 04:25:46 +0000 (15:25 +1100)]
tools/ctdb: Fix db commands when dbid is given instead of name

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotests: CTDB tool should always be invoked as $CTDB instead of ctdb
Amitay Isaacs [Wed, 13 Nov 2013 03:33:31 +0000 (14:33 +1100)]
tests: CTDB tool should always be invoked as $CTDB instead of ctdb

$CTDB_TEST_WRAPPER is required only to run test functions or test binaries
on remote nodes.  For running the ctdb command, $CTDB is sufficient.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotests: No need to run onnode in parallel for single node
Amitay Isaacs [Wed, 13 Nov 2013 03:25:59 +0000 (14:25 +1100)]
tests: No need to run onnode in parallel for single node

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotests: Remove -q option to try_command_on_node
Amitay Isaacs [Wed, 13 Nov 2013 03:19:43 +0000 (14:19 +1100)]
tests: Remove -q option to try_command_on_node

This option is always passed to onnode by default.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotests: Coverity fixes
Amitay Isaacs [Mon, 11 Nov 2013 01:41:17 +0000 (12:41 +1100)]
tests: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotcp: Coverity fixes
Amitay Isaacs [Mon, 11 Nov 2013 01:41:00 +0000 (12:41 +1100)]
tcp: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotools/ctdb: Coverity fixes
Amitay Isaacs [Mon, 11 Nov 2013 01:40:44 +0000 (12:40 +1100)]
tools/ctdb: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agocommon: Coverity fixes
Amitay Isaacs [Mon, 11 Nov 2013 01:40:28 +0000 (12:40 +1100)]
common: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoclient: Coverity fixes
Amitay Isaacs [Mon, 11 Nov 2013 01:39:48 +0000 (12:39 +1100)]
client: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoserver: Coverity fixes
Amitay Isaacs [Mon, 11 Nov 2013 01:39:27 +0000 (12:39 +1100)]
server: Coverity fixes

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agotests: Fix calling of ctdb tool from test
Amitay Isaacs [Thu, 7 Nov 2013 05:01:49 +0000 (16:01 +1100)]
tests: Fix calling of ctdb tool from test

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoRevert "tests: If transaction_start fails, try again"
Amitay Isaacs [Thu, 7 Nov 2013 04:54:28 +0000 (15:54 +1100)]
Revert "tests: If transaction_start fails, try again"

This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.

This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoclient: Make g_lock_lock() wait till lock is obtained
Amitay Isaacs [Thu, 7 Nov 2013 04:54:20 +0000 (15:54 +1100)]
client: Make g_lock_lock() wait till lock is obtained

This makes the behaviour of g_lock_lock() similar to that implemented in
Samba.  Now ctdb_transaction_start() will return NULL only when there are
failures and not when another transaction is active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoeventscript: Fix link creation failure if the link already exists but the target path...
Srikrishan Malik [Thu, 31 Oct 2013 06:24:58 +0000 (11:54 +0530)]
eventscript: Fix link creation failure if the link already exists but the target path is missing

Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
10 years agodoc: Update NEWS [ctdb-2.5]
Martin Schwenke [Wed, 16 Oct 2013 00:46:54 +0000 (11:46 +1100)]
doc: Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoweb: Add links to new manpages
Amitay Isaacs [Wed, 30 Oct 2013 02:22:21 +0000 (13:22 +1100)]
web: Add links to new manpages

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agodoc: Major updates to manual pages
Martin Schwenke [Mon, 23 Sep 2013 06:26:16 +0000 (16:26 +1000)]
doc: Major updates to manual pages

This includes new manpages for ctdb.7, ctdb.conf.5 and ctdb-tunables.7.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agotunables: Remove obsolete tunables
Amitay Isaacs [Wed, 30 Oct 2013 01:37:15 +0000 (12:37 +1100)]
tunables: Remove obsolete tunables

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agorecoverd: Rebalancing should be done regardless of tunable
Martin Schwenke [Wed, 30 Oct 2013 01:17:37 +0000 (12:17 +1100)]
recoverd: Rebalancing should be done regardless of tunable

Rebalance target nodes should be set even if a deferred rebalance is
not configured.  The user can explicitly cause a takeover run.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agorecoverd: Improve an error message in the election code
Martin Schwenke [Wed, 30 Oct 2013 00:32:28 +0000 (11:32 +1100)]
recoverd: Improve an error message in the election code

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoRevert "if a new node enters the cluster, that node will already be frozen at start"
Martin Schwenke [Tue, 29 Oct 2013 05:38:42 +0000 (16:38 +1100)]
Revert "if a new node enters the cluster, that node will already be frozen at start"

This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94.
Furthermore, if a node doesn't force an election but wins it then it
can fail to record that it is the new recovery master.  This can lead
to a reverse split brain where there is no recovery master.

This reverts commit c5035657606283d2e35bea40992505e84ca8e7be.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Conflicts:
server/ctdb_recoverd.c

10 years agoctdbd: When a node is connected, log at DEBUG_NOTICE not DEBUG_INFO
Martin Schwenke [Tue, 29 Oct 2013 03:05:41 +0000 (14:05 +1100)]
ctdbd: When a node is connected, log at DEBUG_NOTICE not DEBUG_INFO

This is important enough that we should see it when the log level is
DEBUG_NOTICE.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test
Martin Schwenke [Mon, 28 Oct 2013 05:20:44 +0000 (16:20 +1100)]
tests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test

This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test
Martin Schwenke [Mon, 28 Oct 2013 05:14:40 +0000 (16:14 +1100)]
tests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test

This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.

This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration.  Fixing it is hard and involves adding a more
complex stub for testparm.  We already have that in the eventscript
unit tests above.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Rewrite the smb.conf cache file handling
Martin Schwenke [Mon, 28 Oct 2013 05:00:54 +0000 (16:00 +1100)]
eventscripts: Rewrite the smb.conf cache file handling

The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning.  Instead, put a
timeout on a foreground update.

If the foreground update fails:

* If there's no available cache file then die.

* If there is a previous cache file then use it and log a warning.

* Do a background update at the end of the monitor event.

Also remove commas in the "smb ports" list before use, since (newer?)
versions of testparm seem to insert commas into the default value.
Update the associated test to add a comma.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agotools/ctdb: Fix documentation string for ban command
Martin Schwenke [Fri, 25 Oct 2013 05:25:25 +0000 (16:25 +1100)]
tools/ctdb: Fix documentation string for ban command

Ban time of 0 is not supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoRevert "recoverd: Disable takeover runs on other nodes for 5 minutes"
Martin Schwenke [Thu, 24 Oct 2013 00:13:16 +0000 (11:13 +1100)]
Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"

5 minutes is too long to leave the cluster in limbo if the recovery
daemon dies during a takeover run, even though this is quite unlikely.
We need a new recovery master to be able to do takeover runs fairly
quickly.

This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f.

10 years agotools/onnode: Fix healthy/ok node handling
Martin Schwenke [Thu, 24 Oct 2013 03:15:53 +0000 (14:15 +1100)]
tools/onnode: Fix healthy/ok node handling

This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output.  The fake "ctdb -Y status" output in the
test was never updated to reflect this change.

Instead of making sure that all columns are "0", just check that
they're not "1".  This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.

Also update associated tests.  The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.

This fixes samba bz#8122.

Signed-off-by: Martin Schwenke <martin@meltin.net>
onnode test fixup

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agodaemon: Change the default recovery method for persistent databases
Amitay Isaacs [Mon, 28 Oct 2013 07:49:51 +0000 (18:49 +1100)]
daemon: Change the default recovery method for persistent databases

Use sequence numbers to do recovery for persistent databases instead of
RSNs.  This fixes the problem of registry corruption during recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agopackaging: Create runtime directories for CTDB
Amitay Isaacs [Wed, 23 Oct 2013 04:37:41 +0000 (15:37 +1100)]
packaging: Create runtime directories for CTDB

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoinitscript: Update systemd configuration to put PID file in /run/ctdb
Martin Schwenke [Wed, 23 Oct 2013 00:28:26 +0000 (11:28 +1100)]
initscript: Update systemd configuration to put PID file in /run/ctdb

Elsewhere we're moving the socket to /var/run/ctdb.  We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agobuild: Move the default CTDB socket from /tmp to /var/run/ctdb
Amitay Isaacs [Thu, 3 Oct 2013 05:19:05 +0000 (15:19 +1000)]
build: Move the default CTDB socket from /tmp to /var/run/ctdb

Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.

The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location.  This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

10 years agopackaging: Move ctdb/ directory from /var to /var/lib
Amitay Isaacs [Thu, 3 Oct 2013 05:47:30 +0000 (15:47 +1000)]
packaging: Move ctdb/ directory from /var to /var/lib

Introduce CTDB_VARDIR variable that points to /var/lib/ctdb by default.
This makes CTDB_VARDIR consistent across C code and scripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoctdbd: Simplify database directory setting logic
Martin Schwenke [Mon, 21 Oct 2013 08:36:36 +0000 (19:36 +1100)]
ctdbd: Simplify database directory setting logic

No need to check if the options are set.  The options are always set
via static defaults.

No need to talloc_strdup() the values via wrapper functions.  The
options aren't going away.  Remove now unused ctdb_set_tdb_dir() and
similar functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoctdbd: Remove duplicate database directory setting logic
Martin Schwenke [Mon, 21 Oct 2013 08:36:36 +0000 (19:36 +1100)]
ctdbd: Remove duplicate database directory setting logic

Defaults for ctdb->db_directory and similar variables are currently
set in 2 places.

Change this to set them in only 1 place and make the directories at
initialisation time instead of waiting until later.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agocommon: New function ctdb_mkdir_p_or_die()
Martin Schwenke [Mon, 21 Oct 2013 08:29:39 +0000 (19:29 +1100)]
common: New function ctdb_mkdir_p_or_die()

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agocommon: New function mkdir_p()
Martin Schwenke [Mon, 21 Oct 2013 08:08:52 +0000 (19:08 +1100)]
common: New function mkdir_p()

Behaves like mkdir -p.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agotcp: Create socket lock in /var/run/ctdb instead of /tmp
Amitay Isaacs [Thu, 3 Oct 2013 05:13:41 +0000 (15:13 +1000)]
tcp: Create socket lock in /var/run/ctdb instead of /tmp

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

10 years agodoc/examples: Add CTDB configuration examples
Amitay Isaacs [Thu, 24 Oct 2013 03:26:12 +0000 (14:26 +1100)]
doc/examples: Add CTDB configuration examples

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoAdd missing $remote_fs LSB dependency
Mathieu Parent [Thu, 29 Aug 2013 06:20:05 +0000 (08:20 +0200)]
Add missing $remote_fs LSB dependency

10 years agoImproved check_ctdb
Mathieu Parent [Thu, 29 Aug 2013 05:42:12 +0000 (07:42 +0200)]
Improved check_ctdb

- increase verbosity with "-v"
- concatenate error messages (if there are several)
- handle 255 return code as a warning (it is the return code when any of the nodes is missing)
- read /etc/ctdb/nodes remotely (check_ctdb can be run on a non-ctdb host)

10 years agoAdd missing events.d/99.timeout
Mathieu Parent [Thu, 15 Aug 2013 18:23:57 +0000 (20:23 +0200)]
Add missing events.d/99.timeout

10 years agoeventscripts: Instead of listing all tunables, query EventScriptTimeout
Amitay Isaacs [Thu, 24 Oct 2013 03:37:41 +0000 (14:37 +1100)]
eventscripts: Instead of listing all tunables, query EventScriptTimeout

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
10 years agoctdb_client.h: fix build on AIX by removing C++-style comments
Michael Adam [Tue, 22 Oct 2013 22:46:34 +0000 (00:46 +0200)]
ctdb_client.h: fix build on AIX by removing C++-style comments

Reported by John P Janosik <jpjanosi@us.ibm.com>

Signed-off-by: Michael Adam <obnox@samba.org>
10 years agoctdbd: Pass the public address file location in ctdb context
Martin Schwenke [Mon, 21 Oct 2013 08:52:01 +0000 (19:52 +1100)]
ctdbd: Pass the public address file location in ctdb context

No need to pass it as an extra argument to ctdb_start_daemon.

Also ensure options.public_address_list gets a nice static default.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoctdbd: Debug locks by default with override from environment variable
Martin Schwenke [Tue, 1 Oct 2013 05:13:29 +0000 (15:13 +1000)]
ctdbd: Debug locks by default with override from environment variable

Default is debug_locks.sh, relative to CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoctdbd: Default for event_script_dir should use CTDB_BASE
Martin Schwenke [Tue, 15 Oct 2013 03:10:58 +0000 (14:10 +1100)]
ctdbd: Default for event_script_dir should use CTDB_BASE

Also get rid of ctdb_set_event_script_dir().  It creates an
unnecessary copy of something that will be around for the lifetime of
the process.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoctdbd: Add nodes_file member to struct ctdb_context
Martin Schwenke [Mon, 21 Oct 2013 08:33:10 +0000 (19:33 +1100)]
ctdbd: Add nodes_file member to struct ctdb_context

This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.

Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded.  It is now set from CTDB_BASE instead of ETCDIR, so setting
CTDB_BASE also needs to be done earlier.

Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agotools/ctdb: CTDB_BASE is the default location of configuration files
Martin Schwenke [Mon, 21 Oct 2013 08:43:47 +0000 (19:43 +1100)]
tools/ctdb: CTDB_BASE is the default location of configuration files

Ensure that environment variable CTDB_BASE is set.

Update defaults for nodes and natgw_nodes to use CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoctdbd: Don't check CTDB_BASE before setting it, just don't override
Martin Schwenke [Tue, 15 Oct 2013 03:02:31 +0000 (14:02 +1100)]
ctdbd: Don't check CTDB_BASE before setting it, just don't override

That's what the 3rd argument to setenv(3) is for...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
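
The point about the third argument can be shown in a few lines: passing 0 as the overwrite flag makes setenv(3) leave an existing value alone, so no getenv() check is needed first. The default path below is an assumption for illustration, and the function name is hypothetical.

```c
#include <stdlib.h>

/* Assumed default; a real build would take this from its configured
 * sysconfdir. */
#define EXAMPLE_CTDB_BASE "/etc/ctdb"

/* Set CTDB_BASE only if the environment does not already define it. */
static int set_default_ctdb_base(void)
{
	/* overwrite == 0: keep any value already in the environment. */
	return setenv("CTDB_BASE", EXAMPLE_CTDB_BASE, 0);
}
```

After this call, getenv("CTDB_BASE") returns either the caller's own setting or the default, never NULL (barring an out-of-memory failure in setenv()).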
10 years agotests/integration: Pass --valgrinding option when running under valgrind
Martin Schwenke [Tue, 22 Oct 2013 04:36:30 +0000 (15:36 +1100)]
tests/integration: Pass --valgrinding option when running under valgrind

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoctdbd: Fix some errors in the popt configuration
Martin Schwenke [Mon, 21 Oct 2013 08:42:32 +0000 (19:42 +1100)]
ctdbd: Fix some errors in the popt configuration

That 4th argument isn't a default or similar, so consistently make it 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

10 years agoinitscript: New configuration variable CTDB_DBDIR_STATE
Martin Schwenke [Fri, 18 Oct 2013 05:43:26 +0000 (16:43 +1100)]
initscript: New configuration variable CTDB_DBDIR_STATE

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoscripts: Make detect_init_style() more readable
Martin Schwenke [Fri, 18 Oct 2013 02:24:03 +0000 (13:24 +1100)]
scripts: Make detect_init_style() more readable

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Rework the iSCSI eventscript
Martin Schwenke [Thu, 17 Oct 2013 05:44:24 +0000 (16:44 +1100)]
eventscripts: Rework the iSCSI eventscript

* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Don't update static routes on "recovered" event
Martin Schwenke [Thu, 17 Oct 2013 05:20:18 +0000 (16:20 +1100)]
eventscripts: Don't update static routes on "recovered" event

Routes only need to be updated when IPs have moved.  IP takeover runs
will generate "ipreallocated", which is enough.  "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: NAT gateway script doesn't need to handle "recovered" event
Martin Schwenke [Thu, 17 Oct 2013 05:17:26 +0000 (16:17 +1100)]
eventscripts: NAT gateway script doesn't need to handle "recovered" event

Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event.  The
"recovered" event always happens straight after a takeover run, so
handling it as well means updating the NAT gateway twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Delete placeholder "recovered" and "shutdown" events
Martin Schwenke [Thu, 17 Oct 2013 05:14:14 +0000 (16:14 +1100)]
eventscripts: Delete placeholder "recovered" and "shutdown" events

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Clean up comment at the top of 00.ctdb
Martin Schwenke [Thu, 17 Oct 2013 05:13:21 +0000 (16:13 +1100)]
eventscripts: Clean up comment at the top of 00.ctdb

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Remove reconfigure check from samba and winbind eventscripts
Martin Schwenke [Thu, 17 Oct 2013 05:00:39 +0000 (16:00 +1100)]
eventscripts: Remove reconfigure check from samba and winbind eventscripts

There is no reconfigure code for these scripts so no need to check for
reconfiguration.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Remove reconfigure code from httpd eventscript
Martin Schwenke [Thu, 17 Oct 2013 04:58:25 +0000 (15:58 +1100)]
eventscripts: Remove reconfigure code from httpd eventscript

Nothing sets (or has ever set) the "needs reconfigure" flag, so this
code is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
Martin Schwenke [Thu, 17 Oct 2013 04:23:35 +0000 (15:23 +1100)]
eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()

A generic framework is no longer needed now that the "ctdb" checker is
the only one left.  Simplify the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoeventscripts: Remove TCP port checks other than the built-in CTDB one
Martin Schwenke [Thu, 17 Oct 2013 00:02:54 +0000 (11:02 +1100)]
eventscripts: Remove TCP port checks other than the built-in CTDB one

"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.

Remove tests related to the removed checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agoscripts: Remove setting of PATH from functions file
Martin Schwenke [Wed, 16 Oct 2013 23:52:00 +0000 (10:52 +1100)]
scripts: Remove setting of PATH from functions file

The current setting is inconsistent with settings on most systems,
putting /bin before /sbin.  Use of /usr/local/bin, which may be
required on some systems, is also overridden.  This can make it
difficult to do interactive debugging of script problems.

Rely on the system PATH instead.

If system-specific changes need to be made then this can be done in a
configuration file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/eventscripts: Run scripts under sh by default
Martin Schwenke [Wed, 16 Oct 2013 23:39:09 +0000 (10:39 +1100)]
tests/eventscripts: Run scripts under sh by default

Some scripts are disabled by default so are not executable.  Explicitly
running them under sh allows them to be run without having to mess
around and make them executable or similar.

Signed-off-by: Martin Schwenke <martin@meltin.net>
10 years agotests/eventscripts: New tests for 20.multipathd
Martin Schwenke [Tue, 15 Oct 2013 05:44:45 +0000 (16:44 +1100)]
tests/eventscripts: New tests for 20.multipathd

Signed-off-by: Martin Schwenke <martin@meltin.net>