vlendec/samba-autobuild/.git
14 years agofor debugging
Ronnie Sahlberg [Tue, 27 Oct 2009 02:18:52 +0000 (13:18 +1100)]
for debugging

add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon

(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
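
A minimal sketch of the idea, not the actual ctdb code: the main daemon records its own pid in a global, and later checks compare getpid() against it so that forked children skip the time warning. The names main_daemon_pid, remember_main_daemon() and in_main_daemon() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical global: 0 until the main daemon has recorded its pid. */
pid_t main_daemon_pid = 0;

/* Called once by the main daemon during startup. */
void remember_main_daemon(void)
{
	assert(main_daemon_pid == 0);	/* only expected to be set once */
	main_daemon_pid = getpid();
}

/* True only in the process that called remember_main_daemon().
 * A forked child inherits the global but has a different getpid(). */
bool in_main_daemon(void)
{
	return main_daemon_pid != 0 && getpid() == main_daemon_pid;
}
```

The event loop would then wrap its clock check in "if (in_main_daemon())".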

14 years agoctdb_diagnostics: don't use hardcoded path to iptables
Stefan Metzmacher [Tue, 6 Oct 2009 14:16:13 +0000 (16:16 +0200)]
ctdb_diagnostics: don't use hardcoded path to iptables

All event scripts use only the relative path, so we should
here.

Also PATH includes /sbin and /usr/sbin...

metze

(This used to be ctdb commit 20678e1506db1f96b58c326ee91339e797c07c22)

14 years agoctdb_client: fix DEBUG statement in ctdb_ctrl_modflags()
Stefan Metzmacher [Fri, 9 Oct 2009 13:47:06 +0000 (15:47 +0200)]
ctdb_client: fix DEBUG statement in ctdb_ctrl_modflags()

metze

(This used to be ctdb commit a244b75ee49556b0ff51e254cc812594ee3b23a7)

14 years agoserver: if takeover runs when the recovery master becomes unhealthy
Stefan Metzmacher [Fri, 9 Oct 2009 13:47:49 +0000 (15:47 +0200)]
server: if takeover runs when the recovery master becomes unhealthy

The problem was this:

When the monitor event fails, the node->flags get updated,
and an update (containing the old and new flags) is sent to
the recovery master.

If the recovery master sends the update to itself (the same process),
it was comparing the node->flags variable with the received new flags.
This check always found both flag values to be equal
and never set the rec->need_takeover_run variable to true.

There were two problems. First, the push_flags_handler() function
didn't pass the received old flags.

Second, the ctdb_control_modflags() function ignored the received old flags.

metze

(This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)

14 years agoserver: print out the full 64-bit srvid on 32-bit hosts
Stefan Metzmacher [Fri, 9 Oct 2009 13:50:59 +0000 (15:50 +0200)]
server: print out the full 64-bit srvid on 32-bit hosts

metze

(This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949)
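
The commit message does not say how the value was printed before. One portable way to print a full uint64_t on both 32-bit and 64-bit hosts is the PRIx64 macro from <inttypes.h>; the helper name format_srvid() below is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write srvid as 16 hex digits into buf and return the snprintf() result.
 * PRIx64 expands to the correct conversion for uint64_t on the host,
 * whereas "%lx" only covers 32 bits where long is 32 bits wide. */
int format_srvid(char *buf, size_t buflen, uint64_t srvid)
{
	assert(buf != NULL && buflen >= 17);	/* 16 digits plus NUL */
	return snprintf(buf, buflen, "%016" PRIx64, srvid);
}
```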

14 years agotcp: don't log an error when we successfully bind to the desired address
Stefan Metzmacher [Wed, 21 Oct 2009 15:06:48 +0000 (17:06 +0200)]
tcp: don't log an error when we successfully bind to the desired address

metze

(This used to be ctdb commit 752a9c81de97be509de7e7feddde749cc5ee22a8)

14 years agopatch the event loop so we read the current time every iteration.
Ronnie Sahlberg [Mon, 26 Oct 2009 02:20:35 +0000 (13:20 +1100)]
patch the event loop so we read the current time every iteration.

log an error if the clock jumps backwards
also log an error if the clock jumps >5 seconds forward (we assume here we will get at least one event every 5 seconds)

(This used to be ctdb commit 11193e1e192bee6f579bdf1303153571a82711d7)
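
The two checks above could look roughly like this. It is a sketch only: check_clock(), the enum and MAX_FORWARD_JUMP are hypothetical names, and the real event loop logs a message where this example returns a code. The 5-second limit is the one stated in the commit message.

```c
#include <assert.h>
#include <time.h>

enum clock_check { CLOCK_OK, CLOCK_JUMPED_BACK, CLOCK_JUMPED_FORWARD };

/* Largest forward step tolerated between two loop iterations, in seconds. */
#define MAX_FORWARD_JUMP 5

/* Compare the time read in this iteration (now) with the time read in
 * the previous iteration (prev). */
enum clock_check check_clock(time_t prev, time_t now)
{
	/* (time_t)-1 is the failure value of time(); callers filter it out. */
	assert(prev != (time_t)-1 && now != (time_t)-1);
	if (now < prev)
		return CLOCK_JUMPED_BACK;
	if (now - prev > MAX_FORWARD_JUMP)
		return CLOCK_JUMPED_FORWARD;
	return CLOCK_OK;
}
```

A loop would call time(NULL) once per iteration, pass the previous and current values to check_clock(), and then store the current value for the next round.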

14 years agoSuggestion from Volker,
Ronnie Sahlberg [Mon, 26 Oct 2009 01:20:52 +0000 (12:20 +1100)]
Suggestion from Volker,

make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time.

(This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)
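
A minimal sketch of the counter approach, using a made-up singly linked queue rather than ctdb's own queue structures: every push and pop adjusts a length field, so asking for the length no longer walks the list.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
};

struct queue {
	struct pkt *head, *tail;
	unsigned int len;	/* kept in step with every push and pop */
};

/* Append p to the tail and bump the counter. */
void queue_push(struct queue *q, struct pkt *p)
{
	assert(q != NULL && p != NULL);
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

/* Remove and return the head, or NULL if the queue is empty. */
struct pkt *queue_pop(struct queue *q)
{
	assert(q != NULL);
	struct pkt *p = q->head;
	if (p == NULL)
		return NULL;
	q->head = p->next;
	if (q->head == NULL)
		q->tail = NULL;
	q->len--;
	return p;
}

/* Constant time: just read the counter. */
unsigned int queue_length(const struct queue *q)
{
	assert(q != NULL);
	return q->len;
}
```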

14 years agodisable the multipath eventscript by default
Ronnie Sahlberg [Sun, 25 Oct 2009 23:22:00 +0000 (10:22 +1100)]
disable the multipath eventscript by default

(This used to be ctdb commit e79c3bcead7bd4bfb74d0aec81908da71551c107)

14 years agoupdate the manpage for ctdb setreclock
Ronnie Sahlberg [Sun, 25 Oct 2009 23:11:00 +0000 (10:11 +1100)]
update the manpage for ctdb setreclock

(This used to be ctdb commit ab4a6a58fb002ec29c19d167800e47987b023fe4)

14 years agoautomatically re-activate the reclock file check if we set the reclock file to something
Ronnie Sahlberg [Sun, 25 Oct 2009 23:13:20 +0000 (10:13 +1100)]
automatically re-activate the reclock file check if we set the reclock file to something

(This used to be ctdb commit db250cad7c92c1cc0a690725a4e39531a2e1b7fd)

14 years agolower the log level of a debug message
Ronnie Sahlberg [Sun, 25 Oct 2009 22:35:18 +0000 (09:35 +1100)]
lower the log level of a debug message

(This used to be ctdb commit 496dc2e80b714811c6e69dc928deaad61cf603b1)

14 years agoAdd a mechanism where we can register notifications to be sent out to a SRVID when...
Ronnie Sahlberg [Fri, 23 Oct 2009 04:24:51 +0000 (15:24 +1100)]
Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects.

The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
   A special prefix for the srvid space has been set aside for samba :
   Only samba is allowed to use srvid's with the top 32 bits set like this.
   The lower 32 bits are for samba to use internally.

2, register a "notification" using the new control :
                    CTDB_CONTROL_REGISTER_NOTIFY         = 114,
   This control takes as indata a structure like this :
struct ctdb_client_notify_register {
        uint64_t srvid;
        uint32_t len;
        uint8_t notify_data[1];
};

srvid is the srvid used in the space set aside above.
len and notify_data describe an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.

If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.

A client can register itself with as many different srvids as it wants, but this is handled through a linked list from the client structure, so it is mainly designed for "few notifications per client".

3, a client that no longer wants to have a notification set up can deregister using control
                    CTDB_CONTROL_DEREGISTER_NOTIFY       = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
        uint64_t srvid;
};

When a client deregisters, ctdbd will no longer send a message to all other clients when this client disconnects.

(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
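
As an illustration of step 2 above, the sketch below builds the variable-length indata blob for the register control. The struct is copied from the commit message; the helper notify_register_new() is hypothetical, and the assumption that the blob size is the header up to notify_data plus len bytes is mine, not something the message states. Sending the control to ctdbd is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout as given in the commit message. */
struct ctdb_client_notify_register {
	uint64_t srvid;
	uint32_t len;
	uint8_t notify_data[1];
};

/* Allocate and fill a register blob. Returns NULL on allocation failure;
 * the caller frees the result. *size receives the assumed number of
 * bytes to pass as indata. */
struct ctdb_client_notify_register *
notify_register_new(uint64_t srvid, const void *data, uint32_t len, size_t *size)
{
	assert(size != NULL && (data != NULL || len == 0));

	/* Fixed header, then len payload bytes starting at notify_data. */
	size_t hdr = offsetof(struct ctdb_client_notify_register, notify_data);
	size_t alloc = hdr + len;
	if (alloc < sizeof(struct ctdb_client_notify_register))
		alloc = sizeof(struct ctdb_client_notify_register);

	struct ctdb_client_notify_register *reg = calloc(1, alloc);
	if (reg == NULL)
		return NULL;

	reg->srvid = srvid;
	reg->len = len;
	if (len > 0)
		memcpy((uint8_t *)reg + hdr, data, len);
	*size = hdr + len;
	return reg;
}
```

The payload is copied through a byte pointer computed with offsetof() so that the one-element notify_data array is never indexed past its declared bound.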

14 years agowhen scripts timeout, log pstree to a file in /tmp and just log the filename in the...
Ronnie Sahlberg [Fri, 23 Oct 2009 02:55:21 +0000 (13:55 +1100)]
when scripts timeout, log pstree to a file in /tmp and just log the filename in the messages file

(This used to be ctdb commit 0785afba8e5cd501b9e0ecb4a6a44edf43b57ab0)

14 years agoset the eventscripts to timeout after 20 seconds
Ronnie Sahlberg [Fri, 23 Oct 2009 02:54:45 +0000 (13:54 +1100)]
set the eventscripts to timeout after 20 seconds
change the ban count to 10 failures before we ban by default

(This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)

14 years agoMerge commit 'martins/master'
Ronnie Sahlberg [Thu, 22 Oct 2009 23:43:13 +0000 (10:43 +1100)]
Merge commit 'martins/master'

(This used to be ctdb commit 514a60c57557042e463efeff53dd11b9fec40561)

14 years agonew version 1.0.99
Ronnie Sahlberg [Thu, 22 Oct 2009 07:16:33 +0000 (18:16 +1100)]
new version 1.0.99

(This used to be ctdb commit 14fca8383b6b1da49278a9181a975543b956161b)

14 years agoMerge commit 'origin/master'
Martin Schwenke [Thu, 22 Oct 2009 06:48:09 +0000 (17:48 +1100)]
Merge commit 'origin/master'

(This used to be ctdb commit f3e09f2cfd33e79e69fc8c84ce4781a31a7a0437)

14 years agoDocument onnode -n and -f options.
Martin Schwenke [Thu, 22 Oct 2009 06:47:10 +0000 (17:47 +1100)]
Document onnode -n and -f options.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 431f79f7c9038ebd95d27c2465207ca40b8f4f23)

14 years agoif a lock wait child died/finished, we could have released the lockwait handle and...
Ronnie Sahlberg [Thu, 22 Oct 2009 02:41:28 +0000 (13:41 +1100)]
if a lock wait child died/finished, we could have released the lockwait handle and set it to NULL before we call the destructors for releasing the waiters.

The waiters reference the lockwait handle in order to remove themselves from the
linked list, which caused a SEGV.

We don't actually need to remove ourselves from this list here, since
if the parent freeze_handle holding the list is freed, then all waiters are
released as well. The only place we actually need to relink the waiter is in
ctdb_freeze_lock_handler, where we want to respond back to the clients and
release the waiters but still keep the freeze_handle hanging around.

(This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)

14 years agoFrom Volker L
Ronnie Sahlberg [Thu, 22 Oct 2009 01:19:40 +0000 (12:19 +1100)]
From Volker L
Fix some warnings  and an incorrect check for a talloc failure

(This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)

14 years agoFrom Wolfgang M.
Ronnie Sahlberg [Wed, 21 Oct 2009 20:58:44 +0000 (07:58 +1100)]
From Wolfgang M.

With the new vacuuming code, don't treat an invalid dmaster as fatal. Let it update to the new value instead.

(This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)

14 years agoMerge commit 'origin/master'
Martin Schwenke [Wed, 21 Oct 2009 10:48:15 +0000 (21:48 +1100)]
Merge commit 'origin/master'

(This used to be ctdb commit 61282d4a9be9e544aaa86f3cffc5b58e417f5ab1)

14 years agoTest suite: Remove the disable/enable monitor tests - they are useless.
Martin Schwenke [Wed, 21 Oct 2009 10:47:06 +0000 (21:47 +1100)]
Test suite: Remove the disable/enable monitor tests - they are useless.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8264c42969d4be7fc6c5b4d56f8b5ef7c62b3bfb)

14 years agoTest suite: Fix the timeouts on the skip share check tests.
Martin Schwenke [Wed, 21 Oct 2009 10:36:39 +0000 (21:36 +1100)]
Test suite: Fix the timeouts on the skip share check tests.

The timeout for waiting for state changes isn't very predictable.  It
is "about" MonitorInterval seconds...  but can be longer given the
duration of eventscript runs and other things.  So, we change the
timeout to MonitorInterval + EventScriptTimeout, hoping it never takes
that long.

Move the eventscript installation/removal from the old fake-tests into
a function in the functions file.  Implement supporting functions to
create/remove/check-for various files that it handles.  Also add a
function that uses all of this that waits for the next monitor event
(but only if all other monitor events pass).

The final check in the skip share check tests uses the above and waits
for a monitor event, and then checks that the node is still healthy.

Also enhance the wait_until function to handle a command starting with
'!' (as a separate word) to make it easy to wait for a file not to
exist.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 25e82a8a667a54c6921ef076c63fdd738dd75d19)

14 years agoDuring tests it is common to add/delete test eventscripts at runtime.
Ronnie Sahlberg [Wed, 21 Oct 2009 05:50:39 +0000 (16:50 +1100)]
During tests it is common to add/delete test eventscripts at runtime.
This can race with the eventscript handling, which does:

list all scripts,   sort them,  then execute them

So trap status code 127, which means the script could not be executed (or /bin/sh does not exist), and do not let it cause the node to become unhealthy.

(This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)
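
The handling of status 127 can be sketched like this. It is an assumption-laden example, not the ctdb eventscript runner: script_failed() and run_script_failed() are hypothetical names, and system() stands in for whatever fork/exec path the daemon really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Exit code the shell reports when a command cannot be found or run. */
#define EXIT_NOT_EXECUTABLE 127

/* Decide whether a script's wait status should count against node health.
 * wait_status is the value filled in by waitpid() or returned by system(). */
bool script_failed(int wait_status)
{
	if (!WIFEXITED(wait_status))
		return true;	/* killed by a signal, or could not be started */
	int code = WEXITSTATUS(wait_status);
	if (code == EXIT_NOT_EXECUTABLE)
		return false;	/* script vanished between listing and running */
	return code != 0;
}

/* Run cmd through the shell and classify the result. */
bool run_script_failed(const char *cmd)
{
	assert(cmd != NULL);
	return script_failed(system(cmd));
}
```

One consequence of this choice is that a script which itself exits with 127 for another reason is also ignored.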

14 years agolower the debug levels for the "create FD messages" so we dont fill up the logs.
Ronnie Sahlberg [Wed, 21 Oct 2009 04:26:24 +0000 (15:26 +1100)]
lower the debug levels for the "create FD messages" so we dont fill up the logs.

(This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)

14 years ago When clients have blocked, perhaps because the node is banned or stopped and...
Ronnie Sahlberg [Wed, 21 Oct 2009 04:20:55 +0000 (15:20 +1100)]
When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we dont queue up too many REQ_MESSAGES.

    Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.

    This avoids queueing up a very large number of the MESSAGES that samba daemons
    send to each other when the receiving nodes are blocked/banned/stopped for
    extended periods.

(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)

14 years agodont restart ctdb when installing the rpm
Ronnie Sahlberg [Wed, 21 Oct 2009 02:54:02 +0000 (13:54 +1100)]
dont restart ctdb when installing the rpm

(This used to be ctdb commit ead97cabeb1e0b73bff9d45f8aec8b226769ee9f)

14 years agoIn ctdb_ltdb_store(), add a missing transaction_cancel when local store failed.
Michael Adam [Tue, 20 Oct 2009 14:57:23 +0000 (16:57 +0200)]
In ctdb_ltdb_store(), add a missing transaction_cancel when local store failed.

Spotted by Volker.

Michael

(This used to be ctdb commit 0a4d409baabf242a87c06293789d589c896b104c)

14 years agoimprove the log message when we skip the ip allocation check from the recovery daemon.
Ronnie Sahlberg [Wed, 21 Oct 2009 00:51:30 +0000 (11:51 +1100)]
improve the log message when we skip the ip allocation check from the recovery daemon.

we also skip this check if we are already in the process of performing an ip reallocation and not only when we are performing a full recovery.

(This used to be ctdb commit 1a09b02767f3928d3c5db0e0afc59bb938e4a445)

14 years agotreat interfaces with the name ethX* as bond devices
Ronnie Sahlberg [Wed, 21 Oct 2009 00:34:17 +0000 (11:34 +1100)]
treat interfaces with the name ethX* as bond devices

(This used to be ctdb commit 3997d7e5471810e9a2f145ce2e795073dfc5eded)

14 years agoTest suite: A timeout of MonitorInterval seconds sometimes isn't enough.
Martin Schwenke [Tue, 20 Oct 2009 06:11:01 +0000 (17:11 +1100)]
Test suite: A timeout of MonitorInterval seconds sometimes isn't enough.

Monitor events sometimes happen a little bit more than MonitorInterval
seconds apart.  This changes some timeouts to MonitorInterval + 1
seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 6ef4364b3349145b2fec23e0431cd6df6dcadd41)

14 years agoMerge commit 'origin/master'
Martin Schwenke [Tue, 20 Oct 2009 05:53:04 +0000 (16:53 +1100)]
Merge commit 'origin/master'

(This used to be ctdb commit a4aac7312947aa3b26bc26993f04b586c64f18cb)

14 years agoTest suite: New tests for validating SKIP_SHARE_CHECK options.
Martin Schwenke [Tue, 20 Oct 2009 05:52:22 +0000 (16:52 +1100)]
Test suite: New tests for validating SKIP_SHARE_CHECK options.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f50d64a8ac91415ca297216d2103ff940076f02b)

14 years agoTest suite: Update 99_ctdb_uninstall_eventscript.sh to use ctdb_init().
Martin Schwenke [Tue, 20 Oct 2009 05:51:06 +0000 (16:51 +1100)]
Test suite: Update 99_ctdb_uninstall_eventscript.sh to use ctdb_init().

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2b478b0f5f09dd06626592573f053706ac637edd)

14 years agoTest suite: Fix bug in node_has_status().
Martin Schwenke [Tue, 20 Oct 2009 05:45:29 +0000 (16:45 +1100)]
Test suite: Fix bug in node_has_status().

This function has been broken since it was updated to work with the
"stopped" state (probably commit
67c5bfb5f02c9d45a32d976021ede4fb2174dfe9).  Although ${var#:*:0}
removes the shortest matching prefix of $var, '*' can match substrings
that include ':' if '0' isn't where you expect.  So we were making
unexpected matches and incorrectly returning true for some cases.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11137bc2d492a62a26ec9f9f62ff362e81643f66)

14 years agoTest suite: add -x option to ctdb_init() function.
Martin Schwenke [Tue, 20 Oct 2009 05:44:44 +0000 (16:44 +1100)]
Test suite: add -x option to ctdb_init() function.

This facilitates tracing of tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1f906bd3476e7cebf217e35b5477d6a7bb615a0c)

14 years agoversion 1.0.98
Ronnie Sahlberg [Tue, 20 Oct 2009 04:36:35 +0000 (15:36 +1100)]
version 1.0.98

(This used to be ctdb commit 02862c086d045497f49f3c060700419815d607e7)

14 years agoFrom Wolfgang Mueller
Ronnie Sahlberg [Tue, 20 Oct 2009 02:01:15 +0000 (13:01 +1100)]
From Wolfgang Mueller

make sure to always create the vactun database and get rid of some annoying log messages

(This used to be ctdb commit 54f9c314a0354f1039208fe6ac7dc159b6db8750)

14 years agoFrom wolfgang Mueller
Ronnie Sahlberg [Tue, 20 Oct 2009 01:59:48 +0000 (12:59 +1100)]
From wolfgang Mueller

Add a tuneable so that when scripts start to hang/timeout, we can make the node unhealthy instead of banned

(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)

14 years agoMerge commit 'origin/master'
Martin Schwenke [Mon, 19 Oct 2009 05:46:45 +0000 (16:46 +1100)]
Merge commit 'origin/master'

(This used to be ctdb commit b3ae2b753261443dca317803752a9d61285a3270)

14 years agoadd a directory where multiple local scripts can be added to run when executing event...
Ronnie Sahlberg [Mon, 19 Oct 2009 05:22:15 +0000 (16:22 +1100)]
add a directory where multiple local scripts can be added to run when executing eventscripts

(This used to be ctdb commit 27d152a918680a59c7412aec7e1772f25b72d469)

14 years agowait a bit longer before shutting down when the reclock file is missing
Ronnie Sahlberg [Mon, 19 Oct 2009 04:33:20 +0000 (15:33 +1100)]
wait a bit longer before shutting down when the reclock file is missing

print the filename of the missing file when we turn unhealthy, and also
the output of 'df'

(This used to be ctdb commit 97ded8a629ec762f71bad28515e4fbc810790b1d)

14 years agoRevert "dont shutdown a node when the reclock file is temporarily unavailable."
Ronnie Sahlberg [Mon, 19 Oct 2009 04:30:44 +0000 (15:30 +1100)]
Revert "dont shutdown a node when the reclock file is temporarily unavailable."

This reverts commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50.

(This used to be ctdb commit 02f68dc60e0b7bf26d631850b12834d5c71a88f2)

14 years agoMerge branch 'onnode_options'
Martin Schwenke [Fri, 16 Oct 2009 05:39:46 +0000 (16:39 +1100)]
Merge branch 'onnode_options'

(This used to be ctdb commit 454125ccfda04aa6b4e14f5c05164d29f41a0ead)

14 years agoMerge commit 'origin/master'
Martin Schwenke [Fri, 16 Oct 2009 05:36:48 +0000 (16:36 +1100)]
Merge commit 'origin/master'

(This used to be ctdb commit 5ad283458e59ea8232e01f34be007901c10c8a2e)

14 years agoinitscript: when stopping on Red Hat use the success/failure functions.
Martin Schwenke [Fri, 16 Oct 2009 05:35:56 +0000 (16:35 +1100)]
initscript: when stopping on Red Hat use the success/failure functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bf5402b41282da94fee1ab3e4546ec089ff12f37)

14 years agoDont run eventscript monitor when the databases are frozen.
Ronnie Sahlberg [Thu, 15 Oct 2009 05:03:43 +0000 (16:03 +1100)]
Dont run eventscript monitor when the databases are frozen.
The databases can become frozen a while before we do the actual recovery
since we have the re-recovery timeout.

There is no point in doing much monitoring if we are waiting for a recovery,
or if we are banned.
This will eliminate some annoying log entries where certain tests will fail if the databases are locked.

(This used to be ctdb commit ff824676fab94168707aada7423ae766bc0f711c)

14 years agodont shutdown a node when the reclock file is temporarily unavailable.
Ronnie Sahlberg [Thu, 15 Oct 2009 02:19:10 +0000 (13:19 +1100)]
dont shutdown a node when the reclock file is temporarily unavailable.
Leave the node as UNHEALTHY this stops clients from accessing the node until
the reclock file can be accessed again

(This used to be ctdb commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50)

14 years agoadd logging everytime we create a filedescriptor in the main ctdb daemon
Ronnie Sahlberg [Thu, 15 Oct 2009 00:24:54 +0000 (11:24 +1100)]
add logging everytime we create a filedescriptor in the main ctdb daemon
so we can spot if there are leaks.

plug two leaks for filedescriptors related to when sending ARP fail
and one leak when we can not parse the local address during tcp connection establish

(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)

14 years agonew version 1.0.97
Ronnie Sahlberg [Wed, 14 Oct 2009 20:41:56 +0000 (07:41 +1100)]
new version 1.0.97

(This used to be ctdb commit ef992a64d2376b621d4d2973ae22e567158aee12)

14 years agoMerge commit 'martins/onnode_options'
Ronnie Sahlberg [Wed, 14 Oct 2009 04:51:57 +0000 (15:51 +1100)]
Merge commit 'martins/onnode_options'

(This used to be ctdb commit 82fad66123c1b8c5d4ed3b19c39acf6f367b3f37)

14 years agoversion 1.0.96
Ronnie Sahlberg [Wed, 14 Oct 2009 03:52:24 +0000 (14:52 +1100)]
version 1.0.96

(This used to be ctdb commit 536229fd120bc3fdc2419e22d3bd6ab243dd6667)

14 years agoadd more debugging output to eventscripts and when a script has timed out,
Ronnie Sahlberg [Wed, 14 Oct 2009 03:14:28 +0000 (14:14 +1100)]
add more debugging output to eventscripts and when a script has timed out,
print a full "pstree -p" to the log.

Example :
        |-ctdbd(29826)-+-ctdbd(29862)
        |              `-ctdbd(31897)-+-00.ctdb(31898)---sleep(31908)

change the default timeout to 60 seconds for eventscripts

(This used to be ctdb commit a3406c10d70f89d332eab25d481083142dff987d)

14 years agoMerge commit 'origin/master' into onnode_options
Martin Schwenke [Wed, 14 Oct 2009 02:49:30 +0000 (13:49 +1100)]
Merge commit 'origin/master' into onnode_options

(This used to be ctdb commit e62928f56ce8927b1d8686db2c31538c86462d1a)

14 years agoNew onnode options: -f to specify nodes file, -n to allow use of hostnames.
Martin Schwenke [Wed, 14 Oct 2009 02:44:57 +0000 (13:44 +1100)]
New onnode options: -f to specify nodes file, -n to allow use of hostnames.

The -f option allows an alternate nodes file to be specified,
overriding the CTDB_NODES_FILE environment variable.

The -n option allows hostnames to be used instead of node numbers.
Using a range of hostnames is invalid, so hostnames can't contain
hyphens ('-') - sorry!  You can use this option without a nodes file
by specifying "-f /dev/null".

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 46474e5f21fd97dd765c616647ff46055a9970e7)

14 years agomove the logging of the warning "No reclock file used" to the startup case so we...
Ronnie Sahlberg [Wed, 14 Oct 2009 01:12:04 +0000 (12:12 +1100)]
move the logging of the warning "No reclock file used" to the startup case so we only print this warning on "service ctdb start" and not for "service ctdb *"

(This used to be ctdb commit eb854f65f978f24583e221138eb4f9b917b89285)

14 years agowhen we change state between healthy/unhealthy, make sure we ask the recovery
Ronnie Sahlberg [Wed, 14 Oct 2009 00:59:16 +0000 (11:59 +1100)]
when we change state between healthy/unhealthy, make sure we ask the recovery
master to perform an explicit ip reallocation.

This is more reliable and faster than having the recovery daemon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.

(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)

14 years agoallow a pre .95 version of a recovery master to freeze databases on a post .95 node...
Ronnie Sahlberg [Tue, 13 Oct 2009 23:14:03 +0000 (10:14 +1100)]
allow a pre .95 version of a recovery master to freeze databases on a post .95 node by remapping priority numbers and log this to log.ctdb

(This used to be ctdb commit 343c005367789e108c0320e95d7a264535d68dd8)

14 years agoalways create the nfs state directories during the monitor event.
Ronnie Sahlberg [Tue, 13 Oct 2009 22:15:24 +0000 (09:15 +1100)]
always create the nfs state directories during the monitor event.
this allows us to configure and enable nfs at runtime without having to restart ctdbd

(This used to be ctdb commit f6e39d35713475defaa08a623e194f3f2f8f7d53)

14 years agoPort Volkers deadlock avoidance patch to HEAD.
Ronnie Sahlberg [Tue, 13 Oct 2009 21:17:49 +0000 (08:17 +1100)]
Port Volkers deadlock avoidance patch to HEAD.
This patch ensures that we lock all non-notify related databases first and
then the notify databases to avoid a deadlock where samba needs to lock records on two databases at once (with notify being the second database).

Newer versions of samba would instead use the set-db-prio control to set this explicitly on a per-database basis instead of relying on hardcoded database names. This patch will be reverted in the future when all updated versions of samba have been pushed out.

(This used to be ctdb commit 70e7781df1f118a0e2632a9c634f3fd388fa6c8c)

14 years agowe must break the loop as soon as we find a suitable recmaster does exist
Ronnie Sahlberg [Mon, 12 Oct 2009 22:49:05 +0000 (09:49 +1100)]
we must break the loop as soon as we find a suitable recmaster does exist
otherwise "ctdb ipreallocate" will silently fail to update the addresses.

(This used to be ctdb commit 346fa055f4106497b87df97da5ebd6e51fa1ef8c)

14 years agonew version 1.0.95
Ronnie Sahlberg [Mon, 12 Oct 2009 07:53:20 +0000 (18:53 +1100)]
new version 1.0.95

(This used to be ctdb commit 3501d6b70bd905d6fdc4e74fe2cedc3ba77e4b86)

14 years agouse the correct expected size for the _cancel control
Ronnie Sahlberg [Mon, 12 Oct 2009 07:41:57 +0000 (18:41 +1100)]
use the correct expected size for the _cancel control

(This used to be ctdb commit 5974b5f7998ef96aeadb7377f32ef1ab85bb5943)

14 years agoadd a dispatch to the recovery transaction cancel call
Ronnie Sahlberg [Mon, 12 Oct 2009 07:31:59 +0000 (18:31 +1100)]
add a dispatch to the recovery transaction cancel call

(This used to be ctdb commit c1d7c11978d27d2ee41a2129b31d9ab61a43f8da)

14 years agoMerge commit 'martins/master'
Ronnie Sahlberg [Mon, 12 Oct 2009 05:51:36 +0000 (16:51 +1100)]
Merge commit 'martins/master'

(This used to be ctdb commit 5f14874c5c705dd637f88a77f30c930fea1201d2)

14 years agoadd a new control for explicitly cancelling recovery transactions, i.e. the
Ronnie Sahlberg [Mon, 12 Oct 2009 05:48:05 +0000 (16:48 +1100)]
add a new control for explicitly cancelling recovery transactions, i.e. the
transactions we start across all tdb databases during the recovery.

this allows us to properly clean up and delete these tdb transactions on a
recovery failure.

(This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d)

14 years agoClean up ctdb_check_directories* eventscript functions.
Martin Schwenke [Mon, 12 Oct 2009 05:32:49 +0000 (16:32 +1100)]
Clean up ctdb_check_directories* eventscript functions.

There are 2 problems with this code:

* The loop in ctdb_check_directories_probe() breaks on filenames
  containing whitespace.

  The fix to protect them is to pass "$@" to this function and have it
  operate on "$@".

  Note that there's still a problem with whitespace in filenames in
  the 50.samba eventscript.  To fix this ctdb_check_directories_probe
  should read the filenames from stdin.  Another time...

* The check for '%' in filenames in ctdb_check_directories_probe()
  ends up involving several forks.  On a modern machine this can cost
  a couple of minutes when checking a large number of directories.

  The fix is to use a case statement.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit eb1fecaef9aa5cb85dff7d4f7af8a9878deabed8)

14 years ago40.vsftpd: reset the fail counter in the "recovered" event.
Martin Schwenke [Mon, 12 Oct 2009 05:17:37 +0000 (16:17 +1100)]
40.vsftpd: reset the fail counter in the "recovered" event.

Each recovery that involves IP reassignments results in a restart of
vsftpd in the "recovered" event.  Currently, we can have several
recoveries in quick succession and the "monitor" event following each
can fail because vsftpd isn't ready yet.  This results in cumulative
failures, so the node is marked unhealthy, even though vsftpd has
never had a proper opportunity to become ready.

This resets the fail count after each recovery.

While we're here, also move the delete of the restart flag file into
the body of the conditional.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 318abeb4b913a8d846e7eaf4cf5c2a67b61ce974)

14 years agoallow setting the recmode even when not completely frozen.
Ronnie Sahlberg [Mon, 12 Oct 2009 02:06:16 +0000 (13:06 +1100)]
allow setting the recmode even when not completely frozen.
we sometimes have to do this when we want to trigger a recovery

(This used to be ctdb commit 46194e87e189521375b39b4ef33da2b493429fd8)

14 years agoinitial attempt at freezing databases in priority order
Ronnie Sahlberg [Mon, 12 Oct 2009 01:08:39 +0000 (12:08 +1100)]
initial attempt at freezing databases in priority order

(This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)

14 years agoupdate the freeze/thaw commands to be able to send the requested database priority...
Ronnie Sahlberg [Sun, 11 Oct 2009 22:22:17 +0000 (09:22 +1100)]
update the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon.

this is encoded in the srvid field of the request header

(This used to be ctdb commit 0cb3d33caa42ed783e03bc825b181dde4cf63616)

14 years agoduring recovery, update all remote nodes so they use the same priorities
Ronnie Sahlberg [Sat, 10 Oct 2009 05:28:20 +0000 (16:28 +1100)]
during recovery, update all remote nodes so they use the same priorities
for the databases as this node.

(This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4)

14 years agoadd a control to read the db priority from a database
Ronnie Sahlberg [Sat, 10 Oct 2009 04:04:18 +0000 (15:04 +1100)]
add a control to read the db priority from a database

(This used to be ctdb commit ca6d045e419f308f57e74d4c978907afb05ddb85)

14 years agoadd a control to set a database priority. Let newly created databases default to...
Ronnie Sahlberg [Sat, 10 Oct 2009 03:26:09 +0000 (14:26 +1100)]
add a control to set a database priority. Let newly created databases default to priority 1.

database priorities will be used to control the order in which databases are locked during recovery.

(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
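
A rough sketch of locking in priority order: sort the database list by priority and then lock front to back. The struct db_entry and sort_for_locking() names are hypothetical, and the assumption that lower priority numbers are locked first is mine; the commit message only states that new databases default to priority 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct db_entry {
	const char *name;
	uint32_t priority;	/* 1 is the default for new databases */
};

/* qsort() comparator: ascending priority, written to avoid overflow. */
static int by_priority(const void *a, const void *b)
{
	const struct db_entry *x = a, *y = b;
	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Reorder dbs in place so that locking them front to back follows the
 * assumed order (lowest priority number first). */
void sort_for_locking(struct db_entry *dbs, size_t n)
{
	assert(dbs != NULL || n == 0);
	if (n == 0)
		return;
	qsort(dbs, n, sizeof(dbs[0]), by_priority);
}
```

qsort() is not guaranteed to be stable, so databases that share a priority may end up in any relative order.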

14 years agoverify the DISABLED flag and compare with the previous flag we have registered for...
Ronnie Sahlberg [Sat, 10 Oct 2009 02:55:11 +0000 (13:55 +1100)]
verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference.

this prevents a situation where the remote node may cause spurious ip reallocations.

(This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35)

14 years agoFix bug spotted by Metze,
Ronnie Sahlberg [Fri, 9 Oct 2009 11:22:11 +0000 (22:22 +1100)]
Fix bug spotted by Metze,

the argument to ctdb_control_event_Script_disabled() is a string not a uint32

(This used to be ctdb commit 687535b51622d1fac7ccb38fa640bf1febd69fd8)

14 years agoversion 1.0.94
Ronnie Sahlberg [Thu, 8 Oct 2009 08:17:57 +0000 (19:17 +1100)]
version 1.0.94

(This used to be ctdb commit 5cb4d63bf6887d15aba37fafc3f6b6ba38027f13)

14 years agoif a node fails to become frozen during recovery, mark it as a culprit so...
Ronnie Sahlberg [Thu, 8 Oct 2009 05:45:25 +0000 (16:45 +1100)]
if a node fails to become frozen during recovery, mark it as a culprit so it will soon get banned

(This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9)

14 years agoversion 1.0.93
Ronnie Sahlberg [Tue, 6 Oct 2009 06:05:14 +0000 (17:05 +1100)]
version 1.0.93

(This used to be ctdb commit e77bf5708df6782b4516f698b9981a1d27e2f10b)

14 years agoupdate natgw eventscript to allow you to force it to update and / or to remove the...
Ronnie Sahlberg [Tue, 6 Oct 2009 05:09:24 +0000 (16:09 +1100)]
update natgw eventscript to allow you to force it to update and / or to remove the configuration at runtime

(This used to be ctdb commit deed52b7e4aac94b4d11a8d89d08739e1dfd4ed7)

14 years agoMerge commit 'origin/master'
Martin Schwenke [Tue, 6 Oct 2009 02:39:31 +0000 (13:39 +1100)]
Merge commit 'origin/master'

(This used to be ctdb commit 7d91de8a837a12082c343980428153720dcad741)

14 years agoDocument CTDB_NODES_FILE environment variable used by onnode.
Martin Schwenke [Tue, 6 Oct 2009 02:38:00 +0000 (13:38 +1100)]
Document CTDB_NODES_FILE environment variable used by onnode.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 22f0065cd6b66fa0f623f465aaca98883955ac79)

14 years agoalways send the release/take ip controls to make sure all nodes are updated
Ronnie Sahlberg [Tue, 6 Oct 2009 01:25:44 +0000 (12:25 +1100)]
always send the release/take ip controls to make sure all nodes are updated

(This used to be ctdb commit 789703ea684717781c176fd3a2a24d96abde220b)

14 years agoadd a new message to ask the recovery daemon to temporarily disable checking ip addre...
Ronnie Sahlberg [Tue, 6 Oct 2009 01:11:32 +0000 (12:11 +1100)]
add a new message to ask the recovery daemon to temporarily disable checking ip address consistency.

This is useful when we are moving addresses using moveip in the cluster, since otherwise, if we collide with the recovery daemon's own check, we could cause a recovery.

(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)

14 years agoupdate addip/moveip/delip to make it less likely to trigger an accidental recovery
Ronnie Sahlberg [Tue, 6 Oct 2009 00:41:18 +0000 (11:41 +1100)]
update addip/moveip/delip to make it less likely to trigger an accidental recovery

(This used to be ctdb commit 3befe5526e147d49451fddc930aaafc3dbe2e9c1)

14 years agochange some loglevels and also print the pnn of the ip for takeip/releaseip logging
Ronnie Sahlberg [Tue, 6 Oct 2009 00:40:38 +0000 (11:40 +1100)]
change some loglevels and also print the pnn of the ip for takeip/releaseip logging

(This used to be ctdb commit 9d95dfbd12898975ba0d8560d95a974210d3de7c)

14 years agoadd a new function to collect a list of all active nodes EXCEPT a certain node
Ronnie Sahlberg [Mon, 5 Oct 2009 23:52:31 +0000 (10:52 +1100)]
add a new function to collect a list of all active nodes EXCEPT a certain node

(This used to be ctdb commit be52954d921e7d443304cf49fbd488c619a9c4ec)

14 years agoallocate takeoverip state as a child of vnn and also make the takeoverip context...
Ronnie Sahlberg [Mon, 5 Oct 2009 22:35:15 +0000 (09:35 +1100)]
allocate takeoverip state as a child of vnn and also make the takeoverip context a child of vnn

(This used to be ctdb commit 804e5905be51f43c8a338bfbe216fd8d5718850f)

14 years agoWhen adding a public ip to a node, make sure to push the assignment of ip addresses...
Ronnie Sahlberg [Mon, 5 Oct 2009 21:19:25 +0000 (08:19 +1100)]
When adding a public ip to a node, make sure to push the assignment of ip addresses out to all nodes so all nodes become aware who currently holds the ip.

(This used to be ctdb commit e8df6fc301fb7faf72c72eb39ea68d44d1526b00)

14 years agoversion 1.0.92
Ronnie Sahlberg [Fri, 2 Oct 2009 04:38:16 +0000 (14:38 +1000)]
version 1.0.92

(This used to be ctdb commit 9ffb0d08d34cbafed0e49350a3a72b15d92c8ea7)

14 years agowe should close this file on exec
Ronnie Sahlberg [Fri, 2 Oct 2009 03:41:54 +0000 (13:41 +1000)]
we should close this file on exec

(This used to be ctdb commit c1c0ebb8da9a6c29ee83868a311f07f30cb4ed16)
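
For readers unfamiliar with the term, "close on exec" refers to the FD_CLOEXEC descriptor flag, which makes the kernel close a file descriptor automatically when the process calls exec(), so that child programs such as event scripts do not inherit it. The following is a minimal sketch of setting that flag with the POSIX fcntl() call. The helper name set_close_on_exec() is hypothetical and is not taken from the ctdb source; the commit itself does not say which file or which call was changed.

```c
#include <fcntl.h>

/*
 * Hypothetical helper, for illustration only (not the ctdb code).
 * Marks fd so that it is closed automatically across exec().
 * Returns 0 on success and -1 on failure (errno is set by fcntl).
 */
int set_close_on_exec(int fd)
{
	/* Read the current descriptor flags so other flags are preserved. */
	int flags = fcntl(fd, F_GETFD);
	if (flags == -1) {
		return -1;
	}

	/* Add FD_CLOEXEC to whatever flags were already set. */
	if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
		return -1;
	}

	return 0;
}
```

A caller would typically invoke this right after open() or socket() returns a valid descriptor, before any fork()/exec() can take place.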

14 years agoMerge commit 'martins/master'
Ronnie Sahlberg [Thu, 1 Oct 2009 05:46:01 +0000 (15:46 +1000)]
Merge commit 'martins/master'

(This used to be ctdb commit 9b206d96da3341836cc25aee5693f551f6f3a80e)

14 years agoTest suite: The ctdb ping test should allow time to go backwards.
Martin Schwenke [Thu, 1 Oct 2009 05:39:09 +0000 (15:39 +1000)]
Test suite: The ctdb ping test should allow time to go backwards.

Time can actually go backwards during this test if ntpd happens to
adjust it a little bit.  So we should cope...

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 23ae9e9863ea90c6fb3f105403fd098041fa73f4)

14 years agodon't exit on a commit failure
Ronnie Sahlberg [Thu, 1 Oct 2009 04:53:35 +0000 (14:53 +1000)]
don't exit on a commit failure

(This used to be ctdb commit 4e9a3a5dc232bac12ab387ea0cf4f1b279bed5c1)

14 years agoRevert "Revert "allow the transaction commit to fail""
Ronnie Sahlberg [Thu, 1 Oct 2009 04:51:32 +0000 (14:51 +1000)]
Revert "Revert "allow the transaction commit to fail""

This reverts commit 74e416108df6934f45ca646d709785dd76ab3c35.

(This used to be ctdb commit d1d370033d5007ad1c2c34cd9eeac53001f4b13e)

14 years agodocument how to use the notification script
Ronnie Sahlberg [Thu, 1 Oct 2009 04:31:55 +0000 (14:31 +1000)]
document how to use the notification script

(This used to be ctdb commit b77e4698e7f83443243965f93b84237f2903cd46)

14 years agoadd a new notification to trigger on when ctdb has started
Ronnie Sahlberg [Thu, 1 Oct 2009 04:05:30 +0000 (14:05 +1000)]
add a new notification to trigger on when ctdb has started

(This used to be ctdb commit b1fe04f2e9447f762a0b805763deb29296585ff8)

14 years agoMinor fixes to 01.reclock eventscript.
Martin Schwenke [Wed, 30 Sep 2009 11:21:56 +0000 (21:21 +1000)]
Minor fixes to 01.reclock eventscript.

test -z really needs its argument to be quoted.  Simplified a status
test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit fe26da7780545b1ecc0a7da5bc1cf8beaeea94cc)