Ronnie Sahlberg [Thu, 1 Sep 2011 01:40:51 +0000 (11:40 +1000)]
ReadOnly: update the documentation about readonly locks
Ronnie Sahlberg [Thu, 1 Sep 2011 01:08:18 +0000 (11:08 +1000)]
ReadOnly: add a new control to activate readonly lock capability for a database.
let all databases default to not support this until enabled through this control
Ronnie Sahlberg [Thu, 1 Sep 2011 00:28:15 +0000 (10:28 +1000)]
ReadOnly: add a readonly flag to the getdbmap control and show the readonly setting in ctdb getdbmap output
Ronnie Sahlberg [Thu, 1 Sep 2011 00:21:55 +0000 (10:21 +1000)]
ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for
the persistent flag.
This is the same size as the original boolean but allows ut to add additional flags for the database
Ronnie Sahlberg [Tue, 23 Aug 2011 06:15:34 +0000 (16:15 +1000)]
LibCTDB : initialize ctdb->pnn to -1 when we create a new context
but before we learn the pnn of the local node
Ronnie Sahlberg [Tue, 23 Aug 2011 05:13:40 +0000 (15:13 +1000)]
LibCTDB : change the ctdb_fetch_lock_once test tool to use libctdb instead of the old client
Ronnie Sahlberg [Tue, 23 Aug 2011 05:00:27 +0000 (15:00 +1000)]
LibCTDB : add support for getrecmode
Ronnie Sahlberg [Tue, 23 Aug 2011 00:41:52 +0000 (10:41 +1000)]
ReadOnly: Check the readonly flag instead of whether the tdb pointer is NULL or not
Ronnie Sahlberg [Tue, 23 Aug 2011 00:37:20 +0000 (10:37 +1000)]
ReadOnly: add description of readonly records
Ronnie Sahlberg [Wed, 17 Aug 2011 06:14:57 +0000 (16:14 +1000)]
ReadOnly: clear out the tracking record once a revoke is completed
Ronnie Sahlberg [Thu, 21 Jul 2011 05:59:37 +0000 (15:59 +1000)]
ReadOnly: When the client wants a readwrite lock but the local node is the dmaster and also have delegations active we must send a CALL to the local daemon to trigger it to revoke the delegations
Ronnie Sahlberg [Thu, 21 Jul 2011 05:58:56 +0000 (15:58 +1000)]
ReadOnly: Change the update_record test tool to use the new fetchlock routine that can do either normal or readonly fetchlock
Ronnie Sahlberg [Wed, 20 Jul 2011 05:47:15 +0000 (15:47 +1000)]
ReadOnly: Add a test tool that requests a readonly delegation in a loop
Ronnie Sahlberg [Wed, 20 Jul 2011 05:43:55 +0000 (15:43 +1000)]
ReadOnly: Add a test tool to fetch a record, requesting a readonly delegation and lock the record once
Ronnie Sahlberg [Wed, 20 Jul 2011 05:37:37 +0000 (15:37 +1000)]
ReadOnly: Add clientside code to fetch readonly records
Ronnie Sahlberg [Wed, 20 Jul 2011 05:31:44 +0000 (15:31 +1000)]
ReadOnly: Add a ctdb_ltdb_fetch_readonly() helper function
Ronnie Sahlberg [Wed, 20 Jul 2011 05:17:29 +0000 (15:17 +1000)]
ReadOnly: Add handlign of readonly requests readwrite requests, delegations and revoking of delegation to the processing loop for CALL requests coming in from a local client via domain socket
Ronnie Sahlberg [Wed, 20 Jul 2011 05:13:47 +0000 (15:13 +1000)]
ReadOnly: Add processing for ReadOnly delegation requests and revoke requests to the processing loop for CALL packets we receive from different nodes.
This implements the ReadOnly and ReadWrite request processing, delegation and revoking of delegations for all requests coming in across the network from a remote node.
Ronnie Sahlberg [Wed, 20 Jul 2011 04:25:29 +0000 (14:25 +1000)]
ReadOnly: Once recovery has finished, make sure to free all revoke child processes and trigger the destructors for all deferred calls to re-queue the original packets to the input packet processing function
Ronnie Sahlberg [Wed, 20 Jul 2011 04:23:05 +0000 (14:23 +1000)]
ReadOnly: When releasing all deferred calls that blocked during revoke of all previous delegations, add a 1 second grace/delay for any new readonly delegation requests so that the read-write fetch-lock porcess has a chance to make progress
Ronnie Sahlberg [Wed, 20 Jul 2011 04:21:04 +0000 (14:21 +1000)]
ReadOnly: Add a new flag to call request packet to indicate that the client wants a readonly delegation
Ronnie Sahlberg [Tue, 23 Aug 2011 00:27:31 +0000 (10:27 +1000)]
ReadOnly: Add a function to start a revoke of all delegations for a record.
This triggers a child process to be created to perform the actual potentially blocking calls that are required.
Ronnie Sahlberg [Wed, 20 Jul 2011 03:49:17 +0000 (13:49 +1000)]
ReadOnly: Add functions to register CALLs to a context used to handle deferal of processing of CALL commands.
Once the contexts are freed, the deferred calls are re-issued to the input packet processing functions again.
This is needed when/if a CALL can not currently be processed by the main engine due to the record being locked down for revoking of all delegations.
The data is passed through several layers of callbacks, and finally a timed event callback to ensure that the processing of the packet will be restarted again at the topmost eventloop, avoinding event loop nesting.
Ronnie Sahlberg [Wed, 20 Jul 2011 03:30:12 +0000 (13:30 +1000)]
ReadOnly: Add an extra flag to ctdb_call_local to specify whether we want to write the record and header back to the tdb (for example we do when performing dmaster migrations)
Ronnie Sahlberg [Wed, 20 Jul 2011 03:20:32 +0000 (13:20 +1000)]
ReadOnly: After recovering all databases, make sure to clear out the tracking database used to track delegations and revoke. This is because the recovery will implicitely result in a revoke of all delegations.
Ronnie Sahlberg [Wed, 20 Jul 2011 03:15:48 +0000 (13:15 +1000)]
ReadOnly: Add "readonly" flag to the ctdb_db_context to indicate if this database supports readonly operations or not. Add a private lock-less tdb file to the ctdb_db_context to use for tracking delegarions for records
Assume all databases will support readonly mode for now and se thte flag for all databases. At later stage we will add support to control on a per database level whether delegations will be supported or not.
Ronnie Sahlberg [Wed, 20 Jul 2011 03:08:21 +0000 (13:08 +1000)]
ReadOnly: After performing a recovery, clear out all flags related to readonly delegations and revoke
Ronnie Sahlberg [Tue, 23 Aug 2011 00:23:18 +0000 (10:23 +1000)]
Add the missing "persistent" argument to db_exist()
The API for this function has changed since the 1.2 branch where readonly locks are being merged from
Ronnie Sahlberg [Wed, 20 Jul 2011 02:30:33 +0000 (12:30 +1000)]
ReadOnly: Add a new command 'ctdb cattdb'. This fucntion differs from 'ctdb catdb' in that 'cattdb' will always traverse the local tdb file only, while 'catdb' does a cluster traverse.
Since some record flags may differ between nodes in the cluster when read only delegations are in use, cattdb is needed when you need to know the exact flag settings on the current node itself.
Ronnie Sahlberg [Wed, 20 Jul 2011 02:21:33 +0000 (12:21 +1000)]
ReadOnly: Add printing of the record flags when we are traversing a database to print its content.
Ronnie Sahlberg [Wed, 20 Jul 2011 02:17:27 +0000 (12:17 +1000)]
ReadOnly: Add 4 new record flags to handle read only delegation and revoking of delegations
Ronnie Sahlberg [Wed, 20 Jul 2011 02:13:53 +0000 (12:13 +1000)]
ReadOnly: add a new test tool that does a fetchlock on a record, then bunps the RSN by 10 and writes the new content to the record as sprintf("%d", rsn)
Ronnie Sahlberg [Wed, 20 Jul 2011 02:06:37 +0000 (12:06 +1000)]
ReadOnly: Add clientside functions to send the UPDATE_RECORD control
Ronnie Sahlberg [Wed, 20 Jul 2011 01:50:14 +0000 (11:50 +1000)]
ReadOnly: Add test tool to validate the functions to manipulate and enumerate the bitmap of nodes to where we have readonly delegations
Ronnie Sahlberg [Wed, 20 Jul 2011 01:39:50 +0000 (11:39 +1000)]
ReadOnly: Add helper functions to manipulate a TDB_DATA as a bitmap for nodes that we are tracking as having a readonly delegation
Ronnie Sahlberg [Wed, 20 Jul 2011 01:27:05 +0000 (11:27 +1000)]
ReadOnly records: Add a new RPC function FETCH_WITH_HEADER.
This function differs from the old FETCH in that this function will also fetch the record header and not just the record data
Volker Lendecke [Mon, 22 Aug 2011 14:40:58 +0000 (16:40 +0200)]
Fix a const warning
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Mon, 22 Aug 2011 14:39:32 +0000 (16:39 +0200)]
Remove an unused variable
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 15:05:36 +0000 (17:05 +0200)]
libctdb: "unpack_reply_control" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 15:05:36 +0000 (17:05 +0200)]
libctdb: "unpack_reply_call" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 15:05:36 +0000 (17:05 +0200)]
libctdb: "ctdb_request_free" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Fri, 19 Aug 2011 14:36:20 +0000 (16:36 +0200)]
libctdb: Make sure ctdb_request->ctdb is filled correctly
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 12:47:09 +0000 (14:47 +0200)]
libctdb: Ensure 0-termination of sun_path
Rusty, please check!
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:59:48 +0000 (13:59 +0200)]
libctdb: Fix a few format warnings
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:57:58 +0000 (13:57 +0200)]
libctdb: Add license header to messages.c
Rusty, please check!
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:37:23 +0000 (13:37 +0200)]
libctdb: Reorder attachdb
No code change, this is for easier reading the sequence of what happens
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:55:24 +0000 (13:55 +0200)]
libctdb: Reorder set_message_handler
No code change, this is for better readability
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Thu, 18 Aug 2011 11:54:36 +0000 (13:54 +0200)]
libctdb: Correct
4bfdfda, stddef.h is needed by libctdb_private.h
Signed-off-by: Michael Adam <obnox@samba.org>
Volker Lendecke [Wed, 17 Aug 2011 12:46:43 +0000 (14:46 +0200)]
Add missing #include to libctdb/ctdb.c
We need that to have the "offsetof" macro, thus we don't need to redeclare it
in libctdb_private.h
Signed-off-by: Michael Adam <obnox@samba.org>
Ronnie Sahlberg [Wed, 17 Aug 2011 04:10:04 +0000 (14:10 +1000)]
Merge remote branch 'martins/eventscripts'
Martin Schwenke [Wed, 17 Aug 2011 04:02:45 +0000 (14:02 +1000)]
Eventscripts - new default TCP port checker using "ctdb checktcpport"
New function ctdb_check_tcp_ports_ctdb(). This should be fast... and
is now the default checker. If it fails in an unexpected way we fall
back to the nmap and netstat checkers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 17 Aug 2011 02:12:20 +0000 (12:12 +1000)]
Eventscripts - generalise TCP port checking plus new nmap-based checker
Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().
Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.
ctdb_check_ctdb_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try. Default
value is currently "nmap netstat". If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with message saying the nmap checker failed. This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 17 Aug 2011 00:27:01 +0000 (10:27 +1000)]
Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging
Use the new debug function to conditionally print the netstat output.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 5 Aug 2011 06:39:57 +0000 (16:39 +1000)]
Eventscripts - weaken TCP port check message if CTDB has just been started.
Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started. The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.
This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service. When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message. This means that until the node actually becomes
healthy we see more friendly messages.
The subtext is that we're hearing false positive reports "recreates"
of CQ S1024874 (samba stopped responding on port 445) quite often when
ctdbd is started. This reduces the chances of people reporting such
false recreates...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 Jul 2011 01:32:06 +0000 (11:32 +1000)]
Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port. There are 2 problems with this:
* Netstat is run on each loop iteration when it need only be run once.
* The -a option is used to list all connections but the function only
cares about the listening ports. There may be many thousands of
non-listening ports to grep through.
This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option. It also only runs netstat once before the
main loop.
When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 23:44:11 +0000 (09:44 +1000)]
Eventscripts: add a debug() function and call ctdb_set_current_debuglevel()
The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG). If no args are given
then use stdin - this allows the function to be used with here
documents.
To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Wed, 17 Aug 2011 00:16:35 +0000 (10:16 +1000)]
Add a new command 'ctdb checktcpport <port>'
that tries to bind to the specified port on INADDR_ANY.
This can be used for testing if a service is listening to that port or not.
Errors are printed to stdout and the returned status code is either 0 : if we managed to bind to the port (in which case the service is NOT listening on that bort) or the value of errno that stopped us from binding to a port.
errno for EADDRINUSE is 98 so a script using this command should check the status code against the value 98.
If this command returns 98 it means the service is listening to the specified port.
Ronnie Sahlberg [Tue, 16 Aug 2011 23:59:42 +0000 (09:59 +1000)]
dont use a too big persistence timeout value
Martin Schwenke [Tue, 16 Aug 2011 23:14:23 +0000 (09:14 +1000)]
Eventscripts - conditionally inherit ctdbd debug level in each monitor event
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 23:00:46 +0000 (09:00 +1000)]
Eventscripts - new function ctdb_set_current_debuglevel()
This function ensures that CTDB_CURRENT_DEBUGLEVEL is set. It works
like this:
1. If it is already set then do nothing, since it might have been set
some other way.
The recommended "other way" would be to add a file in rc.local.d/.
2. If it is not set then set it by sourcing
/var/ctdb/eventscript_debuglevel.
3. If this file does not exist then create it using output from "ctdb
getdebug".
If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.
If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 03:28:40 +0000 (13:28 +1000)]
Eventscripts - ensure the statd update-trigger file always exists.
See the comment in the code for details.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Aug 2011 03:18:40 +0000 (13:18 +1000)]
Eventscripts: remove "return 0" from 50.samba service_stop().
This potentially masks errors and was basically included by accident.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Aug 2011 05:53:04 +0000 (15:53 +1000)]
Change the errors for 10.interface to clearly state ERROR: for error messages
Update the tests system to catch the new error strings generated by this change
Ronnie Sahlberg [Mon, 15 Aug 2011 05:43:15 +0000 (15:43 +1000)]
Merge remote branch 'martins/eventscript_tests'
Martin Schwenke [Mon, 15 Aug 2011 05:40:35 +0000 (15:40 +1000)]
Tests - exportfs stub needs to print out export options.
This is needed due to
bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Aug 2011 05:27:50 +0000 (15:27 +1000)]
Merge remote branch 'martins/eventscript.10.interface'
Ronnie Sahlberg [Mon, 15 Aug 2011 05:22:20 +0000 (15:22 +1000)]
Merge remote branch 'martins/60_nfs_regression'
Ronnie Sahlberg [Mon, 15 Aug 2011 05:20:18 +0000 (15:20 +1000)]
Merge remote branch 'martins/eventscript.60.nfs.rpc'
Ronnie Sahlberg [Mon, 15 Aug 2011 05:16:06 +0000 (15:16 +1000)]
Merge remote branch 'martins/test_suite'
Ronnie Sahlberg [Mon, 15 Aug 2011 05:15:12 +0000 (15:15 +1000)]
Merge remote branch 'martins/eventscript_tests'
Martin Schwenke [Mon, 15 Aug 2011 03:53:39 +0000 (13:53 +1000)]
Tests - ctdb listvars test should allow alphanumericals in tunable names.
This matches the new "LCP2PublicIPs" tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Aug 2011 00:23:50 +0000 (10:23 +1000)]
Change the default for ip failover to be LCP2 and not DeterministicIPs
Martin Schwenke [Tue, 5 Jul 2011 07:21:57 +0000 (17:21 +1000)]
Eventscripts: 10.interfaces - make startup event actually mark interfaces up!
The startup event intends to mark interfaces up. However, it doesn't
actually do that because $INTERFACES is empty.
This uses the function get_all_interfaces() to list the
interfaces... and then mark them up.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 Jul 2011 07:20:09 +0000 (17:20 +1000)]
Eventscripts: 10.interfaces - startup comment says assume all interfaces good.
Interfaces are currently marked down. Mark them up instead, as per
the comment... and discussion with Ronnie.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 5 Jul 2011 07:18:30 +0000 (17:18 +1000)]
Eventscripts: 10.interfaces - new function get_all_interfaces().
Move existing interface listing code to new function in preparation
for using it in startup event.
While we're here change the "sort | uniq" into "sort -u" and save some
complexity.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 07:07:39 +0000 (17:07 +1000)]
Eventscripts: 10.interface clean-ups - minor tweaks and new comments.
* sed can read files, it doesn't need a file piped to it
* use $() subshells instead of `` - they seem to quote better in dash
* tweak the uniquifying code so that it is easier to read
* add comments
* remove some extraneous semicolons at ends of lines
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Aug 2011 06:30:54 +0000 (16:30 +1000)]
Tests: re-enable the NFS eventscript tests - they work again.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 12 Aug 2011 06:28:09 +0000 (16:28 +1000)]
Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd.
This effectively reverts
953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 06:50:47 +0000 (16:50 +1000)]
Eventscripts: 10.interface clean-ups - variable name fix-ups.
Change most of the uppercase variable names to lowercase for
consistency with other variables, readability and so they can be
easily distinguished from environment/configuration variables. Change
the name of 2 of the variabless to add some clarity. Changes are as
follows:
INTERFACES -> all_interfaces
IFACES -> ctdb_interfaces
IFACE -> iface
I -> i
REALIFACE -> realiface
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 06:27:01 +0000 (16:27 +1000)]
Eventscripts: 10.interfaces clean-ups - push logic into monitor_interfaces().
The logic in the monitor event itself is very complex. Nearly all of
it can go away by adding a single check of
$CTDB_PARTIALLY_ONLINE_INTERFACES to the return logic of
monitor_interfaces() and reversing the sense of the corresponding
check.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 06:10:23 +0000 (16:10 +1000)]
Eventscripts: 10.interfaces clean-up - use more descriptive variable names.
The name of variable $ok gives no clue to its meaning/use so this
changes that variable to be named $up_interfaces_found.
The return logic relating to $ok and $fail is difficult to read, so
these variables are given true/fale values, allowing the return logic
to be simplified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 05:53:54 +0000 (15:53 +1000)]
Eventscripts: 10.interfaces cleanup - new functions mark_up(), mark_down().
The same few lines of logic are used every time an interface up or down.
This encapsulates those few lines in 2 new functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jan 2011 22:40:11 +0000 (09:40 +1100)]
Eventscripts: change failure counts and behaviour for statd and nfsd.
We reduce the number of failures before attempting a restart.
However, after 6 failures we mark the cluster unhealthy and no longer
try to restart. If the previous 2 attempts didn't work then there
isn't any use in bogging the system down with an attempted restart on
every monitor event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 17 Dec 2010 05:25:04 +0000 (16:25 +1100)]
Eventscripts: clean up 60.nfs monitor event.
This adds a helper function called nfs_check_rpc_service() and uses it
to make the monitor event much more readable. An example of usage is
as follows:
nfs_check_rpc_service "mountd" \
-ge 10 "verbose restart:b unhealthy" \
-eq 5 "restart:b"
The first argument to nfs_check_rpc_service() is the name of the RPC
service to be checked. The RPC service corresponding to this command
is checked for availability using the rpcinfo command. If the service
is available then the function succeeds and subsequent arguments are
ignored.
If the rpcinfo check fails then a failure counter for that particular
RPC service is incremented and subsequent arguments are processed in
groups of 3:
1. An integer comparison operator supported by test.
2. An integer failure limit.
3. An action string.
The value of the failure counter is checked using (1) and (2) above.
The first check that succeeds has its action string processed - note
that this explains the somewhat curious reverse ordering of checks.
It the example above:
* If the counter is >= 10 then a verbose message is printed
describing the failure, the service is restarted in the background
and the node is marked as unhealthy (via an "exit 1" from the
function).
* If the counter is == 5 then the service us restarted in the
background.
For more action options please see the code.
This also changes the ctdb_check_rpc() function so that it no longer
takes a program number to check. It now just takes a real RPC program
name that rpcinfo can resolve via /etc/rpc.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 11 Aug 2011 05:33:46 +0000 (15:33 +1000)]
Tests: Re-enable the Samba eventscript tests.
They work again.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 11 Aug 2011 05:32:28 +0000 (15:32 +1000)]
Revert "Tests: tweak some samba tests to cope with debug from ctdb_check_tcp_ports()."
This reverts commit
557ac30e60516742da10b83bfbbbb41430c977a2.
Martin Schwenke [Wed, 13 Apr 2011 02:37:42 +0000 (12:37 +1000)]
Eventscripts: fix regression in 60.nfs export checking.
Commit
35a60a63a9b5c7d98dde514ae552239506b691c9 introduced a
regression, reported by "Jonathan Buzzard" <J.Buzzard@dundee.ac.uk>,
as follows:
Basically the use of sed in the following code snippet does not work
for long exports where exportfs wraps the host or network onto the
next line.
exportfs | grep -v '^#' | grep '^/' |
sed -e 's/[[:space:]]*[^[:space:]]*$//' |
ctdb_check_directories
The result is that the you get lots of blank lines being sent to
ctdb_check_directories which causes the host to be marked as
unhealthy and then thrashing sets in of the managed IP's making the
whole cluster unusable.
This tightens up the sed expression so that it is less likely to
produce a spurious empty line. It also removes an unnecessary "grep -v".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 11 Aug 2011 04:15:22 +0000 (14:15 +1000)]
Merge remote branch 'martins/eventscript.10.interface'
Ronnie Sahlberg [Thu, 11 Aug 2011 04:01:02 +0000 (14:01 +1000)]
Merge remote branch 'martins/eventscript_infrastructure'
Martin Schwenke [Mon, 23 May 2011 06:00:05 +0000 (16:00 +1000)]
Eventscripts: in 60.nfs move statd-notify code to service_reconfigure().
This means that it now occurs on every reconfigure event. As a result
the ipreallocated event is removed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 11 Aug 2011 03:55:02 +0000 (13:55 +1000)]
Eventscripts - 60.nfs should define service_reconfigure().
Not $service_reconfigure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Thu, 11 Aug 2011 01:45:59 +0000 (11:45 +1000)]
When starting and stopping ctdb through the init-script, make sure we first clear all public ips bvefore we start the daemon, in case they are still hanging around since a previous kill -9 and also make sure we drop them after we have stopped the deamon when shutting down
CQ S1027550
Martin Schwenke [Thu, 13 Jan 2011 22:31:56 +0000 (09:31 +1100)]
Evenscripts: improvements to ctdb_service_check_reconfigure().
* Make this function applicable to "ipreallocated" event too.
* Monitor event should not always succeed just because we reconfigure.
If the service was unhealthy before the reconfigure and we end the
reconfigure with "exit 0" then we can cause the node's health status
to flip-flop.
To avoid this we return the status of the service from the previous
monitor event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 27 May 2011 04:37:37 +0000 (14:37 +1000)]
Eventscripts: 50.samba - only start/stop nmbd if $CTDB_SERVICE_NMB set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Mon, 23 May 2011 05:37:09 +0000 (15:37 +1000)]
Eventscripts: 50.samba needs null service_reconfigure() function.
Samba doesn't need to do anything for configuration changes. It will
notice configuration changes and reload automatically.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jan 2011 22:42:18 +0000 (09:42 +1100)]
Eventscripts: 40.vsftpd service_stop() no longer /dev/null's output.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jan 2011 22:43:01 +0000 (09:43 +1100)]
Eventscripts: improvements to 41.httpd.
* Reduce the failure counts so that restart attempts happen sooner.
* Use service_start() and service_stop() for the restart.
ctdb_service_start() resets the failure count, which isn't very
useful in this context.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 17 Dec 2010 05:10:56 +0000 (16:10 +1100)]
Eventscript functions: new function ctdb_check_counter().
This should eventually be able to replace ctdb_check_counter_limit()
and ctdb_check_counter_equal(), although it doesn't issue warnings
like the former.
It takes 4 optional arguments:
1. _msg - If "error" then over limit causes an error message and and
exit 1. Anything else fails silently but the function returns 1.
Default is "error".
2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt).
Default is -ge.
3. _limit - Limit for the counter to be used in comparison. Default is
$service_fail_limit.
4. _service_name - Used to identify the counter. Default is
$service_name.
For example:
ctdb_check_counter error -ge 5 foo
will print a message and exit 1 if the counter for foo is >= 5,
whereas
ctdb_check_counter check -ge 5 foo
will just return 1 if the counter for foo is >= 5, and
ctdb_counter_check
with print a message and exit 1 if the counter for $service_name is >=
$service_fail_limit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 28 Jun 2011 04:57:11 +0000 (14:57 +1000)]
Eventscripts: remove unused remove_ip() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 13 Jan 2011 22:31:05 +0000 (09:31 +1100)]
Eventscripts: startstop_nfs stop no longer redirects output to /dev/null.
When stopping (as opposed to restarting) it is useful to see this
information.
Signed-off-by: Martin Schwenke <martin@meltin.net>