Martin Schwenke [Mon, 8 Apr 2013 04:37:08 +0000 (14:37 +1000)]
tests/takeover: Allow takeover runs with differing IP allocations per node
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
954ae6f84cb06a8dcbc12456d4752280072be5bf)
Amitay Isaacs [Fri, 24 May 2013 08:07:39 +0000 (18:07 +1000)]
vacuum: Reduce the priority of non-critical error
Since the complete database is not locked when the receive_records
control is received, we may not be able to obtain a lock on a chain.
We will try again to store this record.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)
Michael Adam [Fri, 17 May 2013 09:05:44 +0000 (11:05 +0200)]
ctdbd: fix comment explaining CTDB_REQ_CALL redirection.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
b697625b184227dad1be31a41b7a3fd9bd312e29)
Michael Adam [Fri, 17 May 2013 09:01:31 +0000 (11:01 +0200)]
ctdbd: remove a nonempty blank line
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
d9e24782a90d9ce29c0e6584b75d2b186142174d)
Michael Adam [Fri, 17 May 2013 09:00:32 +0000 (11:00 +0200)]
ctdbd: update comment describing ctdb_call_send_redirect()
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
9a21d417c51fb9cad8f2e87e00ca54d379aef860)
Martin Schwenke [Mon, 6 May 2013 10:31:08 +0000 (20:31 +1000)]
tests/takeover: New tests to check runstate handling
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c57430998a3bdedc8a904eb3a9cdfde1421aff50)
Martin Schwenke [Mon, 6 May 2013 05:36:29 +0000 (15:36 +1000)]
recoverd: Nodes can only takeover IPs if they are in runstate RUNNING
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined. Both of
these events can (re)start services.
This stops IPs being hosted before the "startup" event has completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f15dd562fd8c08cafd957ce9509102db7eb49668)
Martin Schwenke [Thu, 23 May 2013 09:03:11 +0000 (19:03 +1000)]
recoverd: Handle errors carefully when fetching tunables
If a tunable is not implemented on a remote node then this should not
be fatal. In this case the takeover run can continue using benign
defaults for the tunables.
However, timeouts and any unexpected errors should be fatal. These
should abort the takeover run because they can lead to unexpected IP
movements.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c0c27762ea728ed86405b29c642ba9e43200f4ae)
Martin Schwenke [Thu, 23 May 2013 09:01:01 +0000 (19:01 +1000)]
recoverd: Set explicit default value when getting tunable from nodes
Both of the current defaults are implicitly 0. It is better to make
the defaults obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
1190bb0d9c14dc5889c2df56f6c8986db23d81a1)
Martin Schwenke [Thu, 23 May 2013 06:09:38 +0000 (16:09 +1000)]
client: async_callback() sets result to -ETIME if a control times out
Otherwise there is no way of treating a timeout differently to a
general failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
40e34773b8063196457746ffe7a048eb87d96d61)
Martin Schwenke [Tue, 21 May 2013 05:41:56 +0000 (15:41 +1000)]
ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
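Taken together, the four tunable-related commits above describe an error policy: an unknown tunable (-EINVAL) falls back to an explicit default, while a timeout (-ETIME) or any other error aborts the takeover run. A minimal Python sketch of that policy follows. The function name and the (value, fatal) return shape are hypothetical and chosen for illustration; this is not the recovery daemon's C code.

```python
import errno


def classify_tunable_result(result, value, default):
    """Decide what to do with one node's reply to a get-tunable request.

    Returns (value_to_use, fatal). Hypothetical helper for illustration.
    """
    if result == 0:
        return value, False
    if result == -errno.EINVAL:
        # The remote node does not implement this tunable: not fatal,
        # continue with the explicit benign default.
        return default, False
    # Timeouts (-ETIME) and any unexpected error abort the takeover run,
    # because guessing here could cause unexpected IP movements.
    return None, True
```

A caller would stop the takeover run as soon as any node's reply is classified as fatal.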
Martin Schwenke [Wed, 22 May 2013 07:19:34 +0000 (17:19 +1000)]
recoverd: Whitespace improvements
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
473cfcb019f0cb4a094bf10397f7414f7923ee57)
Martin Schwenke [Wed, 22 May 2013 10:56:03 +0000 (20:56 +1000)]
recoverd: Use talloc_array_length() for simpler code
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
f6792f478197774d2f3b2258c969b67c83e017ab)
Martin Schwenke [Fri, 11 Jan 2013 07:02:51 +0000 (18:02 +1100)]
ctdbd: When the "setup" event fails log an error and exit, don't abort
The "setup" event can fail when one of the eventscripts fails to run
its "setup" event. If this occurs then the eventscript should log an
error. The stack trace and core file generated when we abort provide
no useful information.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c50eca6fbf49a6c7bf50905334704f8d2d3237d7)
Martin Schwenke [Fri, 11 Jan 2013 05:02:31 +0000 (16:02 +1100)]
eventscripts: 11.natgw should not call ctdb tool in "init" event
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.
Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
39a43feae7c7de07ddaf2d6cb962f923d47d0c19)
Martin Schwenke [Thu, 18 Apr 2013 10:30:14 +0000 (20:30 +1000)]
ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
Martin Schwenke [Fri, 11 Jan 2013 03:09:14 +0000 (14:09 +1100)]
tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.
At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately. This behaviour isn't very
friendly.
The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.
The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
4a2effcc455be67ff4a779a59ca81ba584312cd6)
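The waiting behaviour described above can be sketched as a polling loop that succeeds once the daemon reports one of the expected run states. This is a Python illustration only: get_runstate stands in for running "ctdb runstate", and the timeout and interval values are assumptions, not the initscript's real settings.

```python
import time


def wait_for_runstate(get_runstate, expected, timeout=10, interval=1,
                      sleep=time.sleep):
    """Poll until get_runstate() returns one of the expected states.

    get_runstate is a hypothetical callable standing in for the
    "ctdb runstate" command. Returns True on success, False on timeout.
    """
    waited = 0
    while True:
        if get_runstate() in expected:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

For the initscript case the expected set would be {"startup", "running"}; for the "setup" event code in 00.ctdb it would be {"setup"}.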
Martin Schwenke [Fri, 11 Jan 2013 03:07:12 +0000 (14:07 +1100)]
tools/ctdb: New command runstate to print current runstate
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
bf20c3ab090f75f59097b36186347cedb1c445d4)
Martin Schwenke [Tue, 21 May 2013 06:18:28 +0000 (16:18 +1000)]
ctdbd: New control CTDB_CONTROL_GET_RUNSTATE
Also new client function ctdb_ctrl_get_runstate().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1)
Martin Schwenke [Thu, 10 Jan 2013 05:48:39 +0000 (16:48 +1100)]
ctdbd: Start logging process earlier
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f43fe3a560d5915c1a9893256f4e7bfe3d7e290a)
Martin Schwenke [Thu, 10 Jan 2013 05:33:36 +0000 (16:33 +1100)]
ctdbd: Only start recovery daemon and timed events after setup event
This deconstructs ctdb_start_transport(), which did much more than
starting the transport.
This removes a very unlikely race and adds some clarity. The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
c31feb27dcdb748b5333321c85fe54852dfa1bcf)
Martin Schwenke [Thu, 10 Jan 2013 05:06:25 +0000 (16:06 +1100)]
ctdbd: Replace ctdb->done_startup with ctdb->runstate
This allows states, including startup and shutdown states, to be
clearly tracked. This doesn't include regular runtime "states", which
are handled by node flags.
Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
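A rough Python sketch of what replacing a boolean with a run state plus string conversion helpers might look like. The state names are taken from those mentioned in this log ("setup", "first_recovery", "startup", "running", and shutdown); INIT and UNKNOWN, the numeric values, the upper-case output and the case-insensitive lookup are all assumptions.

```python
from enum import Enum


class Runstate(Enum):
    # Names and values are illustrative assumptions, not ctdb's definitions.
    UNKNOWN = 0
    INIT = 1
    SETUP = 2
    FIRST_RECOVERY = 3  # added by a later commit in this log
    STARTUP = 4
    RUNNING = 5
    SHUTDOWN = 6


def runstate_to_string(state):
    """Return a printable name for a run state."""
    return state.name


def runstate_from_string(label):
    """Parse a run state name, ignoring case; unknown names map to UNKNOWN."""
    try:
        return Runstate[label.upper()]
    except KeyError:
        return Runstate.UNKNOWN
```

Unlike a single done_startup flag, an ordered set of states lets callers ask exactly which phase of startup or shutdown the daemon is in.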
Martin Schwenke [Thu, 23 May 2013 06:06:47 +0000 (16:06 +1000)]
tools/ctdb: Remove duplicate command definition for "sync"
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
9e7b7cd04adc5e66e2ffa4edf463a682aaea379b)
Amitay Isaacs [Wed, 8 May 2013 13:29:55 +0000 (23:29 +1000)]
logging: Make sure ringbuffer messages are terminated with a newline
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
dbb7c550133c92292a7212bdcaaa79f399b0919b)
Amitay Isaacs [Wed, 8 May 2013 06:25:30 +0000 (16:25 +1000)]
tests: Fix output of run_tests usage
(This used to be ctdb commit
29911fa44a480c17c701528ef46919b2a962a366)
Amitay Isaacs [Wed, 8 May 2013 03:45:55 +0000 (13:45 +1000)]
locking: Set lock helper path once
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
80fbe9364350d42658f7f8af250ac87eb1afbc21)
Amitay Isaacs [Wed, 8 May 2013 00:42:08 +0000 (10:42 +1000)]
locking: Remove functions that are not used anymore
These functions were used in the locking child process to do the locking.
With the locking helper, they are no longer required.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
c660f33c3eaa1b4a2c4e951c1982979e57374ed4)
Amitay Isaacs [Tue, 30 Apr 2013 05:13:44 +0000 (15:13 +1000)]
locking: Remove functions that are not used anymore
These functions were used in the locking child process to do the locking.
With the locking helper, they are no longer required.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
6ea3212a7b177c6c06b1484cf9e8b2f4036653d9)
Amitay Isaacs [Tue, 30 Apr 2013 05:07:49 +0000 (15:07 +1000)]
locking: Use separate locking helper binary for locking
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7cde53a6cbe74b1e46f7e1bca298df82c08de866)
Amitay Isaacs [Tue, 30 Apr 2013 04:32:46 +0000 (14:32 +1000)]
locking: Create commandline arguments for locking helper
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f665e3d540c90579952e590caa5828acb581ae61)
Amitay Isaacs [Mon, 22 Apr 2013 05:36:27 +0000 (15:36 +1000)]
locking: Add a standalone helper to lock record/db
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a08b6ac19506160f3fb5925ea025027dce07781d)
Amitay Isaacs [Tue, 30 Apr 2013 04:14:16 +0000 (14:14 +1000)]
locking: Use database iterator for unmarking databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7630ca4116b476636c27407748088ea335f1a06c)
Amitay Isaacs [Tue, 30 Apr 2013 04:16:07 +0000 (14:16 +1000)]
locking: Add handler function for unmarking a database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
adc113055de98fae276f9b501aff5c03cd25ddc8)
Amitay Isaacs [Tue, 30 Apr 2013 04:12:40 +0000 (14:12 +1000)]
locking: Use database iterator for marking databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e8ea65b2713417db4a618a9f4633991cfaa93fe6)
Amitay Isaacs [Tue, 30 Apr 2013 04:07:11 +0000 (14:07 +1000)]
locking: Add handler function for marking a database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f120e40533780e02ff1cdc41cc6d3af1c4c83258)
Amitay Isaacs [Tue, 30 Apr 2013 04:10:06 +0000 (14:10 +1000)]
locking: Use database iterator for unlocking databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
187ed83f9701c7fa8d3cc476d47c5d2a87d5c308)
Amitay Isaacs [Tue, 30 Apr 2013 04:06:46 +0000 (14:06 +1000)]
locking: Add handler function for unlocking a database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
725239535f40ca2cca445bb5bf2e181351b330e9)
Amitay Isaacs [Tue, 30 Apr 2013 04:08:51 +0000 (14:08 +1000)]
locking: Use database iterator for locking databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
d2634d72d9ca0ceeb72cbb1adc95017a234480fd)
Amitay Isaacs [Tue, 30 Apr 2013 04:06:27 +0000 (14:06 +1000)]
locking: Add handler function for locking a database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
2a1c933ef7c78ee071e2a640ea10941f1c12e32a)
Amitay Isaacs [Tue, 30 Apr 2013 03:23:59 +0000 (13:23 +1000)]
locking: Refactor code to iterate over databases based on priority
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a3275854812aca86032704134fdf6a129069c86a)
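The iterator-plus-handler pattern that this refactoring introduces, and that the lock, unlock, mark and unmark commits above build on, might look roughly like this in Python. The dictionary layout and the 0 / -1 return convention are assumptions made for the sketch; the real code is C.

```python
def db_iterator(databases, priority, handler):
    """Call handler(db) for every database with the given priority.

    Stops at the first handler failure. Returns 0 on success, -1 otherwise.
    The "priority" key is a hypothetical field for illustration.
    """
    for db in databases:
        if db["priority"] != priority:
            continue
        if handler(db) != 0:
            return -1
    return 0
```

With this shape, locking, unlocking, marking and unmarking differ only in the handler passed in, which is why each operation gets a small handler commit followed by a commit switching it to the iterator.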
Amitay Isaacs [Wed, 1 May 2013 02:55:22 +0000 (12:55 +1000)]
locking: Add newline to debug logs
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
d98a861716d5f8c1f4387d21666396d3164551b3)
Amitay Isaacs [Thu, 23 May 2013 03:04:06 +0000 (13:04 +1000)]
tools/ctdb: Fix racy ipreallocate code
This code tried to find the recovery master and send an ipreallocate
request to that node. When a node is stopped, this code asked the
stopped node for the recovery master. A stopped node does not have
up-to-date information on the current recovery master, so ipreallocate
requests were sent to the wrong node and ignored by that node because
it was not the recovery master.
Send the ipreallocate request to all active nodes. That way we guarantee
that the current recovery master will see it and respond to it.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
0577ce3c68e4febf49a1ef5093e918db9d5ec636)
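The fix amounts to selecting targets by activity rather than by asking one node who the recovery master is. A small Python illustration, assuming each node is represented as a hypothetical (pnn, active) pair:

```python
def ipreallocate_targets(nodes):
    """Return the PNNs of all active nodes.

    nodes is a list of (pnn, active) pairs; this representation is an
    assumption for the sketch. The recovery master is always an active
    node, so it is guaranteed to be among the targets.
    """
    return [pnn for pnn, active in nodes if active]
```

Nodes that are not the recovery master simply ignore the request, so broadcasting is harmless.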
Amitay Isaacs [Wed, 22 May 2013 05:37:46 +0000 (15:37 +1000)]
ctdbd: Print version string in the daemon startup
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
9d4524d13cbba21bfaf61bd35667984359b379b3)
Amitay Isaacs [Wed, 22 May 2013 04:23:17 +0000 (14:23 +1000)]
build: Rename version.h to ctdb_version.h
This avoids clash with version.h from Samba tree.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
d18fcfff674e876abde8d51afec92d9c4a090d2f)
Amitay Isaacs [Thu, 9 May 2013 05:43:10 +0000 (15:43 +1000)]
logging: Fix a bug in ringbuffer
When the ringbuffer is full, it does not return any entries. Simplify
the ringbuffer logic by keeping track of the number of log entries
rather than the last entry.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
939d12b96a0cbebbe6269fa2b14f584058dd6174)
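Tracking a count instead of a last-entry index avoids the ambiguity between "empty" and "full" that arises when the first and last indices coincide. A minimal Python sketch of the idea follows; the class and method names are hypothetical and the real ringbuffer is C code.

```python
class RingBuffer:
    """Fixed-size log buffer that tracks the number of stored entries."""

    def __init__(self, size):
        self.slots = [None] * size
        self.first = 0  # index of the oldest entry
        self.count = 0  # number of valid entries

    def add(self, entry):
        size = len(self.slots)
        self.slots[(self.first + self.count) % size] = entry
        if self.count < size:
            self.count += 1
        else:
            # Buffer was full: the oldest entry has just been overwritten.
            self.first = (self.first + 1) % size

    def entries(self):
        """Return stored entries, oldest first, even when the buffer is full."""
        size = len(self.slots)
        return [self.slots[(self.first + i) % size] for i in range(self.count)]
```

Because count ranges from 0 to size inclusive, a full buffer and an empty buffer are never confused.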
Martin Schwenke [Mon, 13 May 2013 05:27:04 +0000 (15:27 +1000)]
recoverd: takeover_run_core() should not use modified node flags
Modifying the node flags with IP-allocation-only flags is not
necessary. It causes breakage if the flags are not cleared after use.
ctdb_takeover_run() no longer needs the general node flags - it only
needs the IP flags.
Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags. As well as being safer, this makes the IP allocation code more
self contained and a little bit clearer.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446)
Martin Schwenke [Mon, 20 May 2013 00:47:07 +0000 (10:47 +1000)]
ctdbd: Update confusing log message
Inactive can also mean stopped. To add information, just print the
flags instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a8605f7e06076e7edf84e0cc160fd3d9ab5c4b64)
Martin Schwenke [Fri, 17 May 2013 06:46:41 +0000 (16:46 +1000)]
Packaging: maketarball.sh should be a bash script due to pushd use
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3105f9e291d0792199ac9e689f6d0e0a47ee4b0d)
Martin Schwenke [Fri, 17 May 2013 06:42:25 +0000 (16:42 +1000)]
scripts: Rework notify.sh to use notify.d/ directory
This makes it easier to add notification handlers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
d29e9a420b133088bf23a847c8d1dbce56c25eb0)
Martin Schwenke [Tue, 14 May 2013 06:20:32 +0000 (16:20 +1000)]
ctdbd: Log a message when recovery master changes
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1f96ea08f9a39dfe537c9b957ac512c84dc76f91)
Martin Schwenke [Tue, 14 May 2013 05:38:08 +0000 (15:38 +1000)]
ctdbd: Log add and delete of IPs
At the moment, when someone deletes all the IPs on a node, all we see
are the release IP messages and we have to guess why.
Some would argue that add/delete are more significant than
take/release so they should be logged.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3c3df1d6afec7e3e721f9bcd4e8b8e008fd6e50b)
Martin Schwenke [Tue, 14 May 2013 05:30:53 +0000 (15:30 +1000)]
ctdbd: Removed bogus comment in ctdb_find_iface()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
4a8d90d0812a3242f58a2a0e2aa0f528f60f7013)
Martin Schwenke [Tue, 14 May 2013 04:56:26 +0000 (14:56 +1000)]
eventscripts: Fix regression in _loadconfig()
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
f1619a36c1beba11533052dc5728fa3adaa08870)
Martin Schwenke [Thu, 9 May 2013 10:44:11 +0000 (20:44 +1000)]
initscript: If CTDB doesn't become ready, print a message before killing
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e6b6b793f61556c21e8daf34abf89ee7b388ecfb)
Christian Ambach [Wed, 8 May 2013 06:45:09 +0000 (08:45 +0200)]
build: Create sudoers.d dir during make install
Otherwise make install into a non-standard prefix will fail.
Signed-off-by: Christian Ambach <ambi@samba.org>
(This used to be ctdb commit
0c0752515b66661ffae24be5f138bd2fab4dec5c)
Amitay Isaacs [Tue, 14 May 2013 13:18:32 +0000 (23:18 +1000)]
eventscripts: Do not use bashism for string comparison
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
b0cae7d5a00ef3764bae187affc8e9a252f4b329)
Martin Schwenke [Thu, 9 May 2013 02:53:48 +0000 (12:53 +1000)]
recoverd: Move IP flags into ctdb_takeover.c
These should never be seen outside the IP allocation code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e143abd16ccde2e0edfe103673d31a5fb06b6aef)
Martin Schwenke [Thu, 9 May 2013 02:51:57 +0000 (12:51 +1000)]
recoverd: Clear IP flags after IP allocation algorithm has run
If these flags are left set they will confuse other recovery daemon
code.
Factor the clearing code into new function clear_ipflags().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
45c776958017ea7001f061842c9e0f60e4a25f23)
Martin Schwenke [Fri, 3 May 2013 10:46:15 +0000 (20:46 +1000)]
recoverd: Remove unused mask argument and initial mask calculation
This has been replaced by set_ipflags() and associated functionality.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
d0a3822573db296e73cc897835f783c8abc084b3)
Martin Schwenke [Fri, 3 May 2013 10:41:32 +0000 (20:41 +1000)]
recoverd: When calculating rebalance candidates don't consider flags
This is really a check to see if a node is already hosting IPs. If
so, we assume it was previously healthy so it isn't considered as a
rebalance candidate. There's no need to limit this to healthy nodes,
since this is checked elsewhere.
Due to this the variable newly_healthy is renamed everywhere to
rebalance_candidates.
The mask argument is now completely unused.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
65e0ea6c2c0629e19349ba4b9affa221fde2b070)
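The rule described above, that a node is a rebalance candidate exactly when it is not already hosting any IP, can be illustrated in Python. The data layout (a mapping from IP to hosting PNN, with -1 for unassigned) is an assumption for the sketch.

```python
def rebalance_candidates(num_nodes, assignments):
    """Return one boolean per node: True if the node hosts no IPs.

    assignments maps each public IP to the PNN hosting it, or -1 if
    unassigned. No node flags are consulted here.
    """
    hosting = {pnn for pnn in assignments.values() if pnn != -1}
    return [pnn not in hosting for pnn in range(num_nodes)]
```

Whether a candidate is actually allowed to host IPs is decided elsewhere, which is why flags do not appear in this check.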
Martin Schwenke [Fri, 3 May 2013 10:13:40 +0000 (20:13 +1000)]
recoverd: Remove unused mask argument from IP allocation functions
This is a no-op and is in a separate commit to make the previous
commit less cumbersome.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
107e656bbe24f9d21fbaf886a3e9417da4effe5a)
Martin Schwenke [Fri, 3 May 2013 05:57:21 +0000 (15:57 +1000)]
tests/takeover: Add takeover tests, mostly for NoIPHostOnAllDisabled
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7cf63722873a6a7baafd77aa3d8a1989b221dee9)
Martin Schwenke [Fri, 3 May 2013 06:59:20 +0000 (16:59 +1000)]
recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to set up
NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_servce_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1308a51f73f2e29ba4dbebb6111d9309a89732cc)
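One possible reading of the flag setup and the two predicates is sketched below in Python. The bit values, the Node fields and the exact rule for disabled nodes (a disabled node may keep hosting only when every node is disabled and its NoIPHostOnAllDisabled tunable is off) are assumptions inferred from the description above, not taken from the C source.

```python
from dataclasses import dataclass

NOIPTAKEOVER = 0x1  # hypothetical bit values
NOIPHOST = 0x2


@dataclass
class Node:
    inactive: bool = False
    disabled: bool = False
    no_ip_takeover: bool = False              # per-node NoIPTakeover
    no_ip_host_on_all_disabled: bool = False  # per-node NoIPHostOnAllDisabled


def compute_ipflags(nodes):
    """Build a per-node IP flags list without touching general node flags."""
    all_disabled = all(n.inactive or n.disabled for n in nodes)
    flags = []
    for n in nodes:
        f = 0
        if n.no_ip_takeover:
            f |= NOIPTAKEOVER
        if n.inactive:
            f |= NOIPHOST
        elif n.disabled and (not all_disabled or n.no_ip_host_on_all_disabled):
            f |= NOIPHOST
        flags.append(f)
    return flags


def can_node_host_ip(ipflags):
    return not (ipflags & NOIPHOST)


def can_node_takeover_ip(ipflags):
    return not (ipflags & (NOIPHOST | NOIPTAKEOVER))
```

Once the flags are computed up front, the allocation code only needs these two predicates and never inspects other flags.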
Martin Schwenke [Fri, 3 May 2013 06:56:24 +0000 (16:56 +1000)]
recoverd: Factor out new function all_nodes_are_disabled()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
12aef10e9889760d98f58c8d916f19d069fa381a)
Martin Schwenke [Fri, 3 May 2013 05:55:01 +0000 (15:55 +1000)]
tests/takeover: Allow per-node tunable settings
Implemented for CTDB_SET_NoIPTakeover.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a1addd89fd9c0390912604097acd028cc24d3483)
Martin Schwenke [Fri, 3 May 2013 06:21:16 +0000 (16:21 +1000)]
recoverd: Refactor code to get NoIPTakeover tunable from all nodes
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1fb5352d2b6918fcc6f630db49275d25a3eebe8d)
Martin Schwenke [Fri, 3 May 2013 05:53:13 +0000 (15:53 +1000)]
tests: Unit test diff output should use filtered output
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
9721aae001b3023e9c8b4af2b143c0db3442d623)
Martin Schwenke [Fri, 3 May 2013 05:41:26 +0000 (15:41 +1000)]
recoverd: Add debug message when dropping IPs in IP allocation
Update tests accordingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
91405282ba4abad4ad8e8c5f7ee4c83c75f38280)
Martin Schwenke [Tue, 23 Apr 2013 02:30:33 +0000 (12:30 +1000)]
eventscripts: NFS RPC checks no longer support "knfsd"
No longer used, support removed from test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
0eb351ff4c7ee096de7c5e0a59561067091fa32e)
Martin Schwenke [Tue, 23 Apr 2013 02:17:31 +0000 (12:17 +1000)]
eventscripts: 60.nfs uses nfs_check_rpc_services() to check NFS RPC services
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs
* Installation and packaging additions to handle nfs-rpc-checks.d/
* Unit test updates, including deleting 1 test that sanity checked
test infrastructure
* Test infrastructure changes to use nfs-rpc-checks.d/
Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs. To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
7e792d6768d9ca420ce3713cb122e63afd594b15)
Martin Schwenke [Tue, 23 Apr 2013 01:14:48 +0000 (11:14 +1000)]
eventscripts: NFS RPC checks allow "nfsd" in addition to "knfsd"
Want nfs_check_rpc_services() to support filenames without the 'k'.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
Martin Schwenke [Mon, 22 Apr 2013 20:42:54 +0000 (06:42 +1000)]
eventscripts: New function nfs_check_rpc_services()
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.
nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator. The files have one limit check and
a set of actions per line. The program name is extracted from the
file name.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
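To make the file-per-program idea concrete, here is a hedged Python sketch of extracting the program name from a check file name and splitting one line into a limit check and its actions. The NN.<program>.check naming is inferred from the "20.nfsd.check" file mentioned elsewhere in this log. The "limit : actions" line syntax, the "#" comment handling and the action words in the usage note are assumptions for illustration only. The real function is shell code in the eventscript functions file.

```python
import os


def program_from_check_file(path):
    """Extract the program name, e.g. '20.nfsd.check' -> 'nfsd'."""
    parts = os.path.basename(path).split(".")
    if len(parts) < 3 or parts[-1] != "check":
        raise ValueError("unexpected check file name: %s" % path)
    return ".".join(parts[1:-1])


def parse_check_line(line):
    """Split one line into (limit_check, actions); None for blanks/comments.

    Assumes a hypothetical '<limit check> : <action> <action> ...' syntax.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    limit, _, actions = line.partition(":")
    return limit.strip(), actions.split()
```

Under these assumptions, a line such as "-ge 6 : verbose unhealthy" in 20.nfsd.check would pair the limit check "-ge 6" with two actions for the program nfsd.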
Martin Schwenke [Mon, 22 Apr 2013 20:28:27 +0000 (06:28 +1000)]
eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
5a717fd495ba5a2bfd481d69f38b68fa4576716f)
Martin Schwenke [Mon, 22 Apr 2013 20:27:02 +0000 (06:27 +1000)]
eventscripts: Factor out common code from nfs_check_rpc_service()
This creates new function _nfs_check_rpc_common().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
cc3bb42e48bbdabd19187c231846b98589b4f4f3)
Martin Schwenke [Mon, 22 Apr 2013 20:17:15 +0000 (06:17 +1000)]
eventscripts: Remove ganesha support from nfs_check_rpc_service()
This is unused so doesn't need to be maintained. An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
887733dd7be53158bfe07b30ef31b611d0f8122f)
Martin Schwenke [Mon, 22 Apr 2013 20:14:43 +0000 (06:14 +1000)]
Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"
This reverts commit
92f74fd589467b46c758e116e97417edfe8773d7.
This change is unused and is just complicating the function.
Conflicts:
config/functions
(This used to be ctdb commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f)
Martin Schwenke [Mon, 22 Apr 2013 19:54:12 +0000 (05:54 +1000)]
eventscripts: Move rpc.statd existence check into nfs_check_rpc_service()
The code in 60.nfs is going to be genericised, so make all the checks
look the same.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
Martin Schwenke [Mon, 22 Apr 2013 05:45:13 +0000 (15:45 +1000)]
eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3)
Martin Schwenke [Tue, 30 Apr 2013 05:33:12 +0000 (15:33 +1000)]
eventscripts: Remove unused function ctdb_check_counter_limit()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a8ef00608e48a551a334aded206146807aeb4c5a)
Martin Schwenke [Tue, 30 Apr 2013 05:23:20 +0000 (15:23 +1000)]
eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit()
ctdb_check_counter_limit() can soon be removed...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1)
Martin Schwenke [Tue, 30 Apr 2013 05:19:52 +0000 (15:19 +1000)]
eventscripts: Might as well try to stat the reclock file first
It is in the background but it still might cause the counter to be
reset before it is checked.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
ef2cf75e95ff382c65524a4d77eb00ab8411d2fc)
Martin Schwenke [Tue, 30 Apr 2013 05:16:44 +0000 (15:16 +1000)]
eventscripts: Make the early exit in 01.reclock earlier
That way we don't even check the counter...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
136abd4604dc68f7c696704bac708bae53cf1940)
Martin Schwenke [Mon, 6 May 2013 06:23:25 +0000 (16:23 +1000)]
eventscripts: Minor cleanups for killtcp/tickle functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
25ef4f655f1efc833deb5e244f9fff461e92f439)
Martin Schwenke [Tue, 30 Apr 2013 01:39:46 +0000 (11:39 +1000)]
eventscripts: Tweak the timeout check in kill_tcp_connections()
This has 2 advantages:
1. It uses get_tcp_connections_for_ip() to check for leftover
connections, instead of custom code.
2. It checks for the timeout condition before sleeping. The current
code sleeps and then checks, so wastes a second.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
60a08eb96e1d97aab31e9bd4af01683c650541c2)
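The second advantage, checking for the timeout before sleeping, is easiest to see in a loop. The Python sketch below is an illustration only: count_connections stands in for counting the output of get_tcp_connections_for_ip(), and max_tries is a hypothetical limit.

```python
def wait_for_connections_killed(count_connections, max_tries, sleep):
    """Wait until no connections remain, giving up after max_tries checks.

    The timeout is tested before sleeping, so a failing run does not
    waste a final one-second sleep.
    """
    tries = 0
    while True:
        if count_connections() == 0:
            return True
        tries += 1
        if tries >= max_tries:
            return False
        sleep(1)
```

With max_tries checks, at most max_tries - 1 sleeps happen; a sleep-then-check loop would perform one more.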
Martin Schwenke [Mon, 29 Apr 2013 20:31:30 +0000 (06:31 +1000)]
eventscripts: In killtcp/tickle functions, $_failed should be boolean
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
319c1b68d5aa78f82a68febcad233a7c78afc887)
Martin Schwenke [Mon, 29 Apr 2013 20:27:58 +0000 (06:27 +1000)]
eventscripts: Remove unused $_killcount from tickle_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
8514ca56830b30e7f0eb5018632640daaf8ff65d)
Martin Schwenke [Mon, 29 Apr 2013 20:25:26 +0000 (06:25 +1000)]
eventscripts: Refactor connection listing in killtcp and tickle functions
Uses new function get_tcp_connections_for_ip(). This avoids using a
temporary file and running netstat twice.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a621622903c7ef17764b15293d6ea8df5a53c7e1)
Martin Schwenke [Mon, 29 Apr 2013 20:19:18 +0000 (06:19 +1000)]
eventscripts: Reimplement kill_tcp_connections_local_only()
... using kill_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
10e4db8f796d1e3259733180494db3b4bbad291a)
Martin Schwenke [Mon, 29 Apr 2013 20:14:01 +0000 (06:14 +1000)]
eventscripts: Change handling of one-way kills in kill_tcp_connections()
This change is a no-op. However, in a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
Martin Schwenke [Mon, 29 Apr 2013 20:05:52 +0000 (06:05 +1000)]
eventscripts: Remove unnecessary variables from killtcp/tickle functions
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3eae161472e6352f7f656851c73dc056f95113eb)
Martin Schwenke [Mon, 29 Apr 2013 17:54:17 +0000 (03:54 +1000)]
eventscripts: Clean up ctdb_check_command()
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
9e25fb261447a196de05937052779b36e75e7215)
Martin Schwenke [Mon, 29 Apr 2013 17:48:51 +0000 (03:48 +1000)]
eventscripts: Clean up ctdb_check_directories()
The documentation comments are wrong... and remove optional
$service_name argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
d9e6cb945c5edac9ca6405c9228bf647fab814f5)
Martin Schwenke [Mon, 29 Apr 2013 17:45:21 +0000 (03:45 +1000)]
eventscripts: Assert that $service_name is set in a few key places
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3d0a7d83ddc824961d876fc9afba829c90aef3e7)
Martin Schwenke [Tue, 30 Apr 2013 05:31:27 +0000 (15:31 +1000)]
eventscripts: counters default to $script_name if $service_name not set
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
fff88940f71058e4eefd65f50a6701389c005c17)
Martin Schwenke [Mon, 29 Apr 2013 17:32:29 +0000 (03:32 +1000)]
eventscripts: Simplify handling of $service_name in "managed" functions
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().
Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
27aab8783898a50da8c4bc887b512d8f0c0d842c)
Martin Schwenke [Mon, 29 Apr 2013 17:18:01 +0000 (03:18 +1000)]
eventscripts: Simplify handling of $service_name in start/stop functions
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
Martin Schwenke [Mon, 29 Apr 2013 17:13:36 +0000 (03:13 +1000)]
eventscripts: Simplify handling of $service_name in service_management
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e24baac0d2952e86d5ff31235901f06e2f2b2449)
Martin Schwenke [Mon, 29 Apr 2013 16:59:41 +0000 (02:59 +1000)]
eventscripts: Simplify handling of $service_name in reconfigure functions
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c2ea72ff565222f9edab408638bd45dbba6e8ff7)
Martin Schwenke [Wed, 24 Apr 2013 07:14:32 +0000 (17:14 +1000)]
eventscripts: Remove unused function ctdb_check_counter_equal()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
fd536a26b310b5bf9628da62cca0b425f4a54030)
Martin Schwenke [Tue, 23 Apr 2013 03:56:15 +0000 (13:56 +1000)]
scripts: Fix script_log() regression
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.
Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
9dee4c84273633b9ad82e94dabbf0e6f86edbcef)