git.samba.org - amitay/ctdb.git/log

commit | commitdiff | tree

Martin Schwenke [Tue, 16 Jul 2013 09:57:18 +0000 (19:57 +1000)]

tests/eventscripts: Add tests for monitoring of missing interfaces

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 12 Jul 2013 02:48:34 +0000 (12:48 +1000)]

eventscripts: A missing interface should cause monitoring to fail

A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.

This couldn't be done previously because orphaned interfaces used to
be listed for monitoring. This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.

While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch. ;-)

This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 12 Jul 2013 02:33:36 +0000 (12:33 +1000)]

eventscripts: Get list of configured interfaces using "ctdb ifaces"

This was previously changed because ctdbd didn't garbage collect
orphaned interfaces. This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 24 Jun 2013 05:49:48 +0000 (15:49 +1000)]

ctdbd: Allow extra recovery to repair persistent DBs during first recovery

Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
regression because a node may not have completed the "recovered" event
(so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
becomes healthy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Tue, 16 Jul 2013 02:53:16 +0000 (12:53 +1000)]

packaging: Bundle debug_locks.sh script in RPM

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Tue, 16 Jul 2013 02:52:00 +0000 (12:52 +1000)]

packaging: No need to check for existence of scripts, they always exist

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Thu, 11 Jul 2013 04:26:38 +0000 (14:26 +1000)]

scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used

It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Mathieu Parent [Fri, 7 Jun 2013 17:01:06 +0000 (19:01 +0200)]

Update Nagios check to work with ctdb versions past 30 Aug 2011

Because of commit a779d83a6213e2ba

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Thu, 11 Jul 2013 03:01:13 +0000 (13:01 +1000)]

recoverd: Really fix bogus info in message about changed flags

Commit 9119a568c2b4601318f7751f537dca2f92a7230b attempted to fix this.
However, this was wrong because old_flags and new_flags were confused.
The latter has since been fixed in commit
7eb2f89979360b6cc98ca9b17c48310277fa89fc so this can now be fixed
properly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Wed, 10 Jul 2013 04:44:56 +0000 (14:44 +1000)]

doc: Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Sumit Bose [Mon, 19 Nov 2012 17:45:37 +0000 (18:45 +0100)]

Print deleted nodes as well

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Sumit Bose [Thu, 1 Sep 2011 13:18:46 +0000 (15:18 +0200)]

IPv6 neighbor solicit cleanup

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Sumit Bose [Mon, 19 Nov 2012 10:13:03 +0000 (11:13 +0100)]

Fix memory leak in ctdb_send_message()

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Sumit Bose [Wed, 10 Aug 2011 15:53:56 +0000 (17:53 +0200)]

Fixes for various issues found by Coverity

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Sumit Bose [Mon, 19 Nov 2012 10:20:31 +0000 (11:20 +0100)]

Check return value of tdb_delete()

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 11 Jul 2013 03:46:18 +0000 (13:46 +1000)]

web: Update webpages

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 11 Jul 2013 01:34:46 +0000 (11:34 +1000)]

Tests: Correct the arguments to memset

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Wed, 10 Jul 2013 04:44:56 +0000 (14:44 +1000)]

doc: Update NEWS

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Wed, 10 Jul 2013 07:19:55 +0000 (17:19 +1000)]

packaging: Add systemd support

Based on an original patch by Sumit Bose <sbose@redhat.com>.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Wed, 10 Jul 2013 06:35:53 +0000 (16:35 +1000)]

build: Turn off all deprecation warnings

The "‘tevent_loop_allow_nesting’ is deprecated" warnings will be
around for a while and are annoying.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Wed, 10 Jul 2013 06:30:29 +0000 (16:30 +1000)]

build: Remove -DTEVENT_DEPRECATED_QUIET=1 from CFLAGS

This reverts the last part of 788cdbddbc902a5b076d23473450065b551d274d
- the rest of this has been implicitly reverted via tevent syncs.
This is just leftover noise.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Tue, 9 Jul 2013 05:22:07 +0000 (15:22 +1000)]

initscript: Simplify initscript and control CTDB via new ctdbd_wrapper

Currently the initscript is very complex. This makes it hard to read
and hard to add support for new init systems, such as systemd.

Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd. It
constructs the ctdbd options and waits until ctdbd is properly
initialised before it exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Mon, 8 Jul 2013 02:45:31 +0000 (12:45 +1000)]

recoverd: Recovery daemon should use ctdb_get_pnn, which can't fail

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Wed, 10 Jul 2013 02:23:30 +0000 (12:23 +1000)]

ctdbd: Print tdb flags when logging attached to database message

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Tue, 9 Jul 2013 02:32:53 +0000 (12:32 +1000)]

ctdbd: Set process names for child processes

This helps distinguish processes in process list in top, perf, etc.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
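
One common way to rename a process on Linux is prctl(PR_SET_NAME). The
sketch below is only an illustration of that mechanism: the helper names
set_process_name() and process_name_is() are hypothetical, and nothing
here is taken from the actual ctdbd implementation.

```c
#include <string.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: set the name shown for the calling thread in
 * tools such as top.  The kernel keeps at most 15 bytes plus a NUL,
 * so longer names are silently truncated.  Returns 0 on success.
 */
int set_process_name(const char *name)
{
	return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}

/*
 * Hypothetical helper: return 1 if the current name equals expected,
 * 0 if it differs, -1 if the name could not be read.
 */
int process_name_is(const char *expected)
{
	char buf[16] = "";	/* PR_GET_NAME needs room for 16 bytes */

	if (prctl(PR_GET_NAME, (unsigned long)buf, 0, 0, 0) != 0) {
		return -1;
	}
	return strcmp(buf, expected) == 0;
}
```

A child would call set_process_name() right after fork(). Because of
the 15-byte limit, short distinctive names work best.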

commit | commitdiff | tree

Amitay Isaacs [Tue, 9 Jul 2013 02:24:59 +0000 (12:24 +1000)]

common/system: Add ctdb_set_process_name() function

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 6 Jun 2013 06:29:04 +0000 (16:29 +1000)]

traverse: Remove unused start_time field

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 6 Jun 2013 06:26:25 +0000 (16:26 +1000)]

traverse: Send records directly from traverse child to srcnode

Currently CTDB daemon reads records from a child process and then sends them to
srcnode via TRAVERSE_DATA control. This ties up main CTDB daemon and also
requires an extra copy of the record in the CTDB daemon. Instead send records
directly from traverse child process.

The control from child process still goes via local CTDB daemon as there
is no infrastructure currently to open a TCP socket to the srcnode.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 6 Jun 2013 06:12:07 +0000 (16:12 +1000)]

traverse: Pass reqid and srcnode information to local database traverse

So that traverse child process can directly send the TRAVERSE_DATA control to
the srcnode without first sending it to local node.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 8 Jul 2013 06:14:59 +0000 (16:14 +1000)]

packaging: When building with system libraries, add dependency for them

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 8 Jul 2013 05:49:58 +0000 (15:49 +1000)]

ctdbd: No need for DeadlockTimeout tunable

The code for deadlock detection and killing smbd process causing deadlock
has been removed and replaced with external debug script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 8 Jul 2013 05:57:22 +0000 (15:57 +1000)]

initscript: Export CTDB_DEBUG_LOCKS variable

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 8 Jul 2013 05:56:30 +0000 (15:56 +1000)]

scripts: Add an example debug_locks.sh script to debug locking issue

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 8 Jul 2013 05:46:53 +0000 (15:46 +1000)]

locking: Use external script to debug locking issues

Use an external script to parse /proc/locks and log useful debugging
information about locks rather than doing that in C code.

To use this feature, add the following configuration variable to /etc/sysconfig/ctdb:

CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
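
A minimal sketch of how a daemon might hand this work to an external
script, assuming the script path arrives in the CTDB_DEBUG_LOCKS
environment variable. The function names are hypothetical, and a real
daemon would likely pass arguments and avoid blocking in waitpid(); this
only illustrates the fork/exec pattern.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run an external debug script and wait for it.
 * Returns the script's exit status, 127 if it could not be executed,
 * or -1 if no script was given or the child could not be created.
 */
int run_debug_script(const char *script)
{
	pid_t pid;
	int status;

	if (script == NULL) {
		return -1;	/* feature not configured */
	}

	pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		execl(script, script, (char *)NULL);
		_exit(127);	/* only reached if execl() failed */
	}

	if (waitpid(pid, &status, 0) < 0) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical wrapper: look the script up in the environment. */
int run_configured_debug_script(void)
{
	return run_debug_script(getenv("CTDB_DEBUG_LOCKS"));
}
```

Keeping the /proc/locks parsing in a script means it can be changed on
a running system without rebuilding the daemon.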

commit | commitdiff | tree

Amitay Isaacs [Wed, 3 Jul 2013 01:01:21 +0000 (11:01 +1000)]

locking: Update locking bucket intervals

0   < 1 ms
1   < 10 ms
2   < 100 ms
3   < 1 s
4   < 2 s
5   < 4 s
6   < 8 s
7   < 16 s
8   < 32 s
9   < 64 s
10   >= 64 s

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
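
The bucket table above can be read as a lookup from latency to bucket
index. The following is a sketch of that mapping only; latency_bucket()
is a hypothetical name and the latency is assumed to be in seconds.

```c
#include <stddef.h>

/* Exclusive upper bounds, in seconds, of buckets 0..9 from the list. */
static const double bucket_upper[] = {
	0.001, 0.01, 0.1, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0
};

/* Hypothetical helper: return the bucket index (0..10) for a latency. */
int latency_bucket(double latency)
{
	size_t n = sizeof(bucket_upper) / sizeof(bucket_upper[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (latency < bucket_upper[i]) {
			return (int)i;
		}
	}
	return (int)n;	/* bucket 10: >= 64 s */
}
```

For example, a lock that took 0.5 s falls into bucket 3 and one that
took exactly 64 s falls into bucket 10.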

commit | commitdiff | tree

Amitay Isaacs [Wed, 3 Jul 2013 01:46:53 +0000 (11:46 +1000)]

locking: Update locks latency in CTDB statistics only for RECORD or DB locks

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Tue, 25 Jun 2013 05:36:13 +0000 (15:36 +1000)]

tools/ctdb: Fix the format of DB statistics output

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Tue, 25 Jun 2013 05:25:16 +0000 (15:25 +1000)]

ctdbd: Remove incomplete ctdb_db_statistics_wire structure

Send the ctdb_db_statistics directly instead of first copying it to
duplicate ctdb_db_statistics_wire structure. This simplifies the
implementation of the control to get database statistics.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Wed, 3 Jul 2013 23:04:49 +0000 (09:04 +1000)]

ctdbd: Update debug messages for setting readonly property on database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Fri, 5 Jul 2013 04:04:20 +0000 (14:04 +1000)]

recoverd: Fix buffer overflow error in reloadips

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Thu, 4 Jul 2013 10:02:29 +0000 (20:02 +1000)]

tests/eventscripts: Add some rudimentary tests for 60.ganesha

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Thu, 4 Jul 2013 06:05:01 +0000 (16:05 +1000)]

eventscripts: New configuration variable $CTDB_SKIP_GANESHA_NFSD_CHECK

This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Thu, 4 Jul 2013 06:00:33 +0000 (16:00 +1000)]

eventscript: Move Ganesha nfsd monitoring to a function

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Thu, 4 Jul 2013 05:11:54 +0000 (15:11 +1000)]

eventscripts: Drop RPC service version from nfs_check_rpc_service() calls

Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Tue, 2 Jul 2013 04:43:17 +0000 (14:43 +1000)]

ctdbd: Log something when releasing all IPs

At the moment this is silent and it can be confusing to see IPs just
disappear.

Also, this message:

Been in recovery mode for too long. Dropping all IPS

can cause anxiety when all IPs should already have been dropped.
Adding a comforting message saying that 0 IPs were dropped relieves
such anxiety. :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 09:00:36 +0000 (19:00 +1000)]

recoverd: Minor style improvements for ctdb_reload_remote_public_ips()

* Add a variable to the loop to make the code more readable and have
it generally fit into 80 columns.

* Improve comments.

* Improve log messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 08:45:46 +0000 (18:45 +1000)]

recoverd: Clean up log messages in remote IP verification

The log messages in verify_remote_ip_allocation() are confusing
because they don't include the PNN of the problem node, because it is
not known in this function.

Add the PNN of the node being verified as a function argument and then
shuffle the log messages around to make them clearer.

Also fold 3 nested if statements into just one.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:57:33 +0000 (17:57 +1000)]

recoverd: Fix an unclear log message - "Restart recovery process"

When the recovery master notices a node in recovery mode it starts the
recovery process, it doesn't restart it.

Update documentation to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:53:37 +0000 (17:53 +1000)]

recoverd: Fix an incorrect comment

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:48:01 +0000 (17:48 +1000)]

ctdbd: Use ctdb_die() on "setup" event failure

This is slightly easier to read because it all fits on 1 line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:43:52 +0000 (17:43 +1000)]

ctdbd: Avoid a core dump when "init" event fails

The "init" event only really fails in the scripts, which should log
something useful on failure. Therefore, a core dump isn't terribly
useful and sometimes attracts unwanted attention.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:42:11 +0000 (17:42 +1000)]

util: New function ctdb_die()

This is like ctdb_fatal() but exits cleanly without dumping core or
generating a backtrace.

Signed-off-by: Martin Schwenke <martin@meltin.net>
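
The difference between the two styles of termination can be sketched
with plain libc calls. demo_fatal(), demo_die() and killed_by_signal()
are hypothetical stand-ins and not the CTDB functions; the sketch
assumes a fatal handler ends in abort() and a clean one in exit().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a fatal handler: abort() raises SIGABRT, which may
 * leave a core dump depending on system limits. */
void demo_fatal(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	abort();
}

/* Stand-in for a clean exit: log the message and exit non-zero. */
void demo_die(const char *msg)
{
	fprintf(stderr, "die: %s\n", msg);
	exit(1);
}

/*
 * Run fn(msg) in a child process.  Returns 1 if the child was killed
 * by a signal, 0 if it exited normally, -1 on error.
 */
int killed_by_signal(void (*fn)(const char *), const char *msg)
{
	pid_t pid;
	int status;

	fflush(NULL);	/* avoid duplicating buffered output in the child */
	pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		fn(msg);
		_exit(0);	/* not reached: fn() never returns */
	}
	if (waitpid(pid, &status, 0) < 0) {
		return -1;
	}
	return WIFSIGNALED(status) ? 1 : 0;
}
```

A supervisor sees the first case as a crash (terminated by SIGABRT) and
the second as an ordinary non-zero exit.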

commit | commitdiff | tree

Martin Schwenke [Mon, 24 Jun 2013 09:03:26 +0000 (19:03 +1000)]

eventscripts: When replaying monitor status, don't log empty output

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Mon, 24 Jun 2013 06:05:03 +0000 (16:05 +1000)]

ctdbd: Release IP callback should fail if the IP is still hosted

At the moment there are (at least) 2 bugs that cause rogue IPs:

* A race where release_ip_callback() runs after a "subsequent" take IP
  has completed.  The IP is back on an interface but we unset
  vnn->iface in the callback.

* A "releaseip" eventscript times out.  We ignore the timeout and call
  it success, deleting the VNN even if the IP is still hosted.

  We could decide not to ignore the timeout and ban the node, but
  killing TCP connections can take a long time and that might result
  in a lot of banning.  We probably won't reinstate banning on
  "releaseip" until killing TCP connections has been optimised.

In both cases, a rogue IP can be avoided by leaving vnn->iface set and
simply failing the control.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Mon, 24 Jun 2013 05:49:48 +0000 (15:49 +1000)]

ctdbd: Log warnings in release IP when unexpected interface is encountered

Previous code changes work around a potential problem but do not
provide useful information when a problem occurs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 07:37:05 +0000 (17:37 +1000)]

ping_pong: Validate num_locks argument > 0

This fixes the floating point error if num_locks = 0.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
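
The "floating point error" is presumably the SIGFPE that Linux raises
for an integer division or modulo by zero. A sketch of the kind of
validation described, with hypothetical helper names that are not taken
from ping_pong itself:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a lock count from a command-line
 * argument.  Returns the count if it is a whole number > 0,
 * otherwise -1.
 */
int parse_num_locks(const char *arg)
{
	char *end = NULL;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0') {
		return -1;	/* not a number, or trailing junk */
	}
	if (v <= 0 || v > INT_MAX) {
		return -1;	/* zero would later be used as a divisor */
	}
	return (int)v;
}

/* Index of the lock after i, wrapping around.  Only safe when
 * num_locks > 0, which parse_num_locks() guarantees. */
int next_lock(int i, int num_locks)
{
	return (i + 1) % num_locks;
}
```

Rejecting the argument up front gives the user a usage error instead
of a signal.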

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 07:27:00 +0000 (17:27 +1000)]

tests: If connection to ctdb daemon fails, exit

This fixes the segmentation error if any of the test code fails to
connect to CTDB daemon.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 07:00:23 +0000 (17:00 +1000)]

build: Fix compiler warnings for uninitialized variables

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 05:36:29 +0000 (15:36 +1000)]

recoverd: Send the result from child process only once

The result has already been sent before the child starts waiting for
the parent ctdbd process.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 05:31:52 +0000 (15:31 +1000)]

packaging: Enable compiler optimizations

This reverts d09570c70551aa40390ce9ceffe7bc234e1afafe.

... hoping the segv has been found in last 6 years. :-)

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 05:14:10 +0000 (15:14 +1000)]

packaging: Allow building RPMs with system tdb/talloc/tevent

To build CTDB RPMs with system installed libraries, use the following command:

  ./packaging/RPM/makerpms.sh \
    --with system_talloc \
    --with system_tdb \
    --with system_tevent

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 04:29:09 +0000 (14:29 +1000)]

packaging: Do not mark /etc/ctdb/functions as configuration file

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 03:19:56 +0000 (13:19 +1000)]

packaging: Install README.notify.d using %doc directive

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 02:45:32 +0000 (12:45 +1000)]

packaging: Install docs using %doc directive

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Thu, 4 Jul 2013 01:33:38 +0000 (11:33 +1000)]

packaging: Remove ctdb_transaction from docdir

It's bundled in ctdb-tests package.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:23:08 +0000 (17:23 +1000)]

doc: Add a disclaimer for the EnableBans tunable

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Sun, 30 Jun 2013 07:22:06 +0000 (17:22 +1000)]

doc: Add banning bug fixes to NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Tue, 2 Jul 2013 02:40:37 +0000 (12:40 +1000)]

ctdbd: Don't ban self if init or shutdown event fails

There is no point in banning the node if init or shutdown event times
out since it's going to quit anyway.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 27 Jun 2013 07:46:43 +0000 (17:46 +1000)]

doc: The second half of monitoring is only for recovery master

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Michael Adam [Wed, 26 Jun 2013 07:23:22 +0000 (09:23 +0200)]

recoverd: when the recmaster is banned, use that information when forcing an election

When we trigger an election because the recmaster considers itself inactive,
update our local nodemap with the recmaster's flags before calling
force_election(). This way, we don't send the inactive node freeze commands
(e.g.) that may fail and then lead to ourselves getting banned.

The theory is that this should help avoid banning loops.

Signed-off-by: Michael Adam <obnox@samba.org>

commit | commitdiff | tree

Michael Adam [Wed, 26 Jun 2013 05:11:51 +0000 (07:11 +0200)]

recoverd: fix a comment typo

Signed-off-by: Michael Adam <obnox@samba.org>

commit | commitdiff | tree

Michael Adam [Fri, 21 Jun 2013 15:57:37 +0000 (17:57 +0200)]

recoverd: fix a comment in main_loop

Signed-off-by: Michael Adam <obnox@samba.org>

commit | commitdiff | tree

Michael Adam [Fri, 21 Jun 2013 12:06:22 +0000 (14:06 +0200)]

recoverd: eliminate some trailing spaces from ctdb_election_win()

Signed-off-by: Michael Adam <obnox@samba.org>

commit | commitdiff | tree

Martin Schwenke [Fri, 28 Jun 2013 06:31:07 +0000 (16:31 +1000)]

recoverd: Don't continue if the current node gets banned

A banned node cannot continue with recovery or cluster monitoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Fri, 28 Jun 2013 04:31:02 +0000 (14:31 +1000)]

recoverd: Refactor code to ban misbehaving nodes

Since we have nodemap information, there is no need to hardcode the
limit of 20.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Thu, 27 Jun 2013 06:01:16 +0000 (16:01 +1000)]

recoverd: Move code to ban other nodes after we get local node flags

If a node gets banned first, then it should not ban other nodes.

This code was moved up in main_loop to avoid waiting for nodemap
from other nodes (commit 83b0261f2cb453195b86f547d360400103a8b795).

To prevent a banned node from banning other nodes, we need to first get
nodemap information from local node, so trying to ban other nodes can
fail if we are already banned.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 27 Jun 2013 05:44:27 +0000 (15:44 +1000)]

recoverd: Delay the initial election if node is started in stopped state

Since there is an early exit if a node is stopped or banned, we can wait till
the node becomes active to start initial election.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 27 Jun 2013 05:33:49 +0000 (15:33 +1000)]

recoverd: Update capabilities only if the current node is active

Since we do an early return if a node is stopped or banned, move update
capabilities code below the early return and just before we check the
capabilities of current recovery master.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 27 Jun 2013 05:46:04 +0000 (15:46 +1000)]

recoverd: No need to check if node is recovery master when inactive

If a node is stopped or banned, it will cause early return from the
main_loop, so this check is redundant. The election will be called by
an active node.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Thu, 27 Jun 2013 05:39:15 +0000 (15:39 +1000)]

recoverd: Always do an early exit from main_loop if node is stopped or banned

A stopped or banned node cannot do anything useful. So do not participate
in any cluster activity and do not cause any unnecessary network traffic.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Fri, 28 Jun 2013 04:10:47 +0000 (14:10 +1000)]

recoverd: Do not set banning credits on a node if current node is inactive

If the current node is banned or stopped, then it should not assign banning
credits to other nodes since the current node will not have up-to-date flags
of other nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 1 Jul 2013 07:40:36 +0000 (17:40 +1000)]

banning: Do not come out of ban if databases are not frozen

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 24 Jun 2013 04:33:32 +0000 (14:33 +1000)]

banning: No need to check if banned pnn is for local node

If the banned pnn is not the local node, the function returns early.
So no need for additional check.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Fri, 28 Jun 2013 04:04:18 +0000 (14:04 +1000)]

banning: Make ctdb_local_node_got_banned() a void function

When this function is called, we are already committed to banning
and there is no point in failing this function. If freezing of
databases fails, it will be fixed by the recovery daemon.

commit | commitdiff | tree

Amitay Isaacs [Fri, 28 Jun 2013 04:02:44 +0000 (14:02 +1000)]

recoverd: Also check if current node is in recovery when it is banned

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Fri, 28 Jun 2013 04:09:35 +0000 (14:09 +1000)]

recoverd: Set node_flags information as soon as we get nodemap

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Wed, 26 Jun 2013 06:02:23 +0000 (16:02 +1000)]

recoverd: Remove old comment as the code corresponding to that has gone away

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 24 Jun 2013 04:31:50 +0000 (14:31 +1000)]

banning: Log ban state changes for other nodes at higher debug level

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 1 Jul 2013 06:28:04 +0000 (16:28 +1000)]

freeze: Make ctdb_start_freeze() a void function

If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 1 Jul 2013 06:21:00 +0000 (16:21 +1000)]

freeze: If priority is invalid here, it's time to abort

ctdb_start_freeze() is called from ctdb_control_freeze(), which fixes the
priority if it's 0 and returns an error if it's invalid. Other callers of
ctdb_start_freeze() are internal to CTDB. So if priority is invalid in
ctdb_start_freeze(), definitely something is seriously wrong.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
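
The split between the two layers can be sketched as follows. The valid
range of 1..3 and the default of 1 for a priority of 0 are assumptions
made for illustration, and the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEMO_NUM_DB_PRIORITIES 3	/* assumed valid range: 1..3 */

/*
 * Control-side check (hypothetical): untrusted input is repaired or
 * rejected.  Returns the usable priority, or -1 if it is invalid.
 */
int demo_control_priority(int priority)
{
	if (priority == 0) {
		priority = 1;	/* assumed default */
	}
	if (priority < 1 || priority > DEMO_NUM_DB_PRIORITIES) {
		return -1;
	}
	return priority;
}

/*
 * Internal entry point (hypothetical): every caller is trusted, so an
 * invalid priority here is a programming error and the process aborts.
 */
void demo_start_freeze(int priority)
{
	if (priority < 1 || priority > DEMO_NUM_DB_PRIORITIES) {
		fprintf(stderr, "invalid db priority %d\n", priority);
		abort();
	}
	/* freezing of databases at this priority would start here */
}
```

Errors from outside are reported to the caller, while errors that can
only come from inside the daemon stop it immediately.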

commit | commitdiff | tree

Amitay Isaacs [Mon, 1 Jul 2013 03:26:33 +0000 (13:26 +1000)]

freeze: Log message from ctdb_start_freeze() and ctdb_control_freeze()

This ensures that whenever databases are frozen either via sending
control or by calling ctdb_start_freeze(), the action is logged.
Since ctdb_control_freeze() calls ctdb_start_freeze(), move logging of
message in early return condition if databases are already frozen.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Mon, 24 Jun 2013 04:18:58 +0000 (14:18 +1000)]

recoverd: Print banning message only after verifying pnn

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Amitay Isaacs [Wed, 26 Jun 2013 05:22:46 +0000 (15:22 +1000)]

recoverd: When updating flags on nodes, send updated flags and not old flags

This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa.
Instead of a SRVID_SET_NODE_FLAGS message to recovery daemon, a control
was sent to the local daemon which in turn informed the recovery daemon.
While making this change, the old flags were sent via CONTROL_MODIFY_FLAGS.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Wed, 26 Jun 2013 04:34:47 +0000 (14:34 +1000)]

tools/ctdb: Add "force" option to "recover" command

At the moment there is no easy way to force a recovery when attempting
to reproduce certain classes of bugs. This option is added without
documentation because it is dangerous until the bugs are fixed! :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Amitay Isaacs [Mon, 24 Jun 2013 07:37:15 +0000 (17:37 +1000)]

client: Exit with non-zero status when unix socket is closed

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

commit | commitdiff | tree

Martin Schwenke [Fri, 21 Jun 2013 04:49:20 +0000 (14:49 +1000)]

doc: Fix ctdb ping entry in manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 21 Jun 2013 04:47:20 +0000 (14:47 +1000)]

doc: Fix documentation for NoIPTakeover in ctdbd manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 21 Jun 2013 04:33:12 +0000 (14:33 +1000)]

doc: Update notification script section in ctdbd manpage

The example notification script is now much more useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 21 Jun 2013 04:32:50 +0000 (14:32 +1000)]

doc: Add nodestatus command to the ctdb manpage

Signed-off-by: Martin Schwenke <martin@meltin.net>

commit | commitdiff | tree

Martin Schwenke [Fri, 21 Jun 2013 00:52:05 +0000 (10:52 +1000)]

doc: Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>

Amitay's CTDB development