Martin Schwenke [Fri, 9 Aug 2013 05:41:37 +0000 (15:41 +1000)]
tools/ctdb: Factor, simplify and improve robustness of ipreallocate code
Having other functions call control_ipreallocate() suggests that
it might look at the argc/argv arguments that are passed. This is not
the case. Change the callers so they call the new ipreallocate()
function instead.
Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes. Inactive
nodes will ignore it. This is safe since we only want 1 reply. If we
didn't get a response, we don't actually care if there's no active
recovery master - just fire, wait, retry, ...
Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
4bf0b1c9d21986eecb7682f935bd6154c65533cc)
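Note: the "fire, wait, retry" behaviour described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch only, not the C code in tools/ctdb; the function name, the two callables and the retry parameters are hypothetical.

```python
def ipreallocate_sketch(send_takeover_run, wait_for_reply, retries=5, wait_secs=1.0):
    """Fire, wait, retry: return True as soon as a single reply arrives.

    send_takeover_run and wait_for_reply are hypothetical callables standing
    in for the broadcast and for waiting on a response.
    """
    for _ in range(retries):
        try:
            send_takeover_run()        # broadcast; inactive nodes ignore it
        except OSError:
            continue                   # possibly transient, so just retry
        if wait_for_reply(wait_secs):  # one reply is all that is needed
            return True
    return False
```

The caller does not need to know whether a recovery master is currently active: no reply simply means another attempt.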
Martin Schwenke [Wed, 14 Aug 2013 18:38:02 +0000 (04:38 +1000)]
tools/ctdb: Use ctdb_get_pnn() to get PNN of the current node
This has already been stored at connect time and can't fail.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
d8eb2e7fdd7645719370dad4f2faa5c3fffa8249)
Michael Adam [Mon, 19 Aug 2013 14:54:06 +0000 (16:54 +0200)]
util: In passing the code, fix a space vs. tab in set_close_on_exec().
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
f9556a6f1fe0046308c8b363e6dcaf3f7ce6f2b7)
Michael Adam [Mon, 19 Aug 2013 15:07:19 +0000 (17:07 +0200)]
server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
00d3bf092e2f72eda330978c75ec85f17e870553)
Michael Adam [Tue, 13 Aug 2013 08:17:45 +0000 (10:17 +0200)]
server: fix wording and punctuation in comment block for ctdb_reply_dmaster().
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit
cb3a1c5af3b796dba30cae07118670d3c9e57df7)
Amitay Isaacs [Wed, 14 Aug 2013 01:44:12 +0000 (11:44 +1000)]
recoverd: Improve log message when nodes disagree on recmaster
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7b7aa7b599536cd60ebb84d363607bb4e953248a)
Amitay Isaacs [Fri, 2 Aug 2013 01:05:08 +0000 (11:05 +1000)]
common: Null terminate process name string so valgrind doesn't complain
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1c9025fdd08d1cea342af7487d0123015e08831b)
Amitay Isaacs [Mon, 12 Aug 2013 05:50:30 +0000 (15:50 +1000)]
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f0853013655ac3bedf1b793de128fb679c6db6c6)
Amitay Isaacs [Mon, 12 Aug 2013 05:51:00 +0000 (15:51 +1000)]
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a610bc351f0754c84c78c27d02f9a695e60c5b0f)
Amitay Isaacs [Tue, 6 Aug 2013 04:37:13 +0000 (14:37 +1000)]
db_wrap: Make sure tdb messages are logged correctly
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
60cb40d090e45ff6134c098a238fac7ad854f134)
Martin Schwenke [Mon, 12 Aug 2013 01:36:25 +0000 (11:36 +1000)]
eventscripts: Become unhealthy faster on nfsd failure
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.
Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.
Update unit tests to match.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
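Note: the failure policy in the commit above (unhealthy after 2 failures, restart on every 10th) can be illustrated with a small Python sketch. It is not the eventscript code; the function name and action labels are hypothetical, and it assumes the node is marked unhealthy from the second consecutive failure onward.

```python
def nfsd_failure_actions(failcount):
    """Return the actions taken on the Nth consecutive nfsd RPC check failure."""
    actions = []
    if failcount >= 1 and failcount % 10 == 0:
        actions.append("restart")      # every 10th failure: try a restart
    if failcount >= 2:
        actions.append("unhealthy")    # fail quickly instead of waiting for 6
    return actions
```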
Martin Schwenke [Fri, 9 Aug 2013 01:56:29 +0000 (11:56 +1000)]
tools/ctdb: Increase default control timeout to 10 seconds
The current 3 second timeout is arbitrary and users trip over it
sometimes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
b49c4f39666d5b1596213bf41bcdc47ed3c327ae)
Martin Schwenke [Thu, 8 Aug 2013 06:02:44 +0000 (16:02 +1000)]
eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
ff5f0d1e29af2b293e30cdc54bed03a644be7038)
Martin Schwenke [Tue, 6 Aug 2013 02:42:13 +0000 (12:42 +1000)]
eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
11fbf4789d783dd0bac22754b374dd9ea4b03bad)
Martin Schwenke [Mon, 5 Aug 2013 05:12:14 +0000 (15:12 +1000)]
eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
Martin Schwenke [Fri, 2 Aug 2013 05:18:47 +0000 (15:18 +1000)]
eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
Martin Schwenke [Fri, 2 Aug 2013 06:05:46 +0000 (16:05 +1000)]
eventscripts: Separate out RPC service restart code
While doing this:
* Explicitly assign RPC program and version information in
_nfs_check_rpc_common(). This is more lines of code but is easier
to read.
* Don't print the options when starting a service. Trying to print it
makes the code messy for little benefit.
Update the eventscript unit testing code and a Ganesha test to
reflect this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e8b531405665885196c95fe1608db33a255bf761)
Martin Schwenke [Fri, 2 Aug 2013 06:03:42 +0000 (16:03 +1000)]
tests/eventscripts: Override background_with_logging(), just prepend "&"
That is, output that goes through background_with_logging() just gets
"&" prepended to each line. This is cleaner than having the tests
grovel through logs.
Update some 49.winbind/50.samba tests to deal with this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3ba933d806106d12bc48b83b22d0f314d9d1e5e5)
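Note: the test override described in the commit above amounts to prefixing every output line with "&". A minimal Python sketch of that transformation (the real override is a shell function in the test infrastructure; the name below is hypothetical):

```python
def background_with_logging_stub(output):
    """Prepend "&" to each line of output, keeping line endings intact."""
    return "".join("&" + line for line in output.splitlines(keepends=True))
```

Tests can then compare against the expected "&"-prefixed text directly rather than searching log files.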
Martin Schwenke [Tue, 30 Jul 2013 06:24:24 +0000 (16:24 +1000)]
eventscripts: Remove support for RPC service 'q' and 's' restart flags
They're hard to maintain and provide very little benefit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
Martin Schwenke [Tue, 30 Jul 2013 06:21:36 +0000 (16:21 +1000)]
eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output. This is consistent with the way
CTDB generally deals with the output when stopping a service.
It also makes updating the eventscript unit tests easier.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c7332526b1b488abefeb4be78a7cd3f2f9abc451)
Martin Schwenke [Mon, 29 Jul 2013 05:27:24 +0000 (15:27 +1000)]
tests/simple: Unreachable node test should wait for recovery to complete
This should minimise the chances of a control timing out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
63be516673c5d9c0d543617bf1bb8bca919956a8)
Martin Schwenke [Mon, 29 Jul 2013 05:09:23 +0000 (15:09 +1000)]
tests/simple: Fix the missing IP test
Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.
Also, do not disable 10.interface until it matters. Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
4e3bd06916bd3adac213fb18c7c2a24854b02d45)
Amitay Isaacs [Tue, 13 Aug 2013 04:02:46 +0000 (14:02 +1000)]
recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases
When creating missing databases either locally or remotely, recovery
master calls ctdb_ctrl_createdb(). Recovery master always passes 0
for tdb_flags. For volatile databases, if TDB_INCOMPATIBLE_HASH is not
specified, then they will be attached without using jenkins hash causing
database corruption.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
2fc6b6403707a292d134140fc0b9145b454992c5)
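Note: the fix in the commit above boils down to OR-ing TDB_INCOMPATIBLE_HASH into the flags for volatile databases instead of always passing 0. A Python sketch under stated assumptions: the numeric value below is a placeholder for illustration and the helper name is hypothetical.

```python
TDB_INCOMPATIBLE_HASH = 0x800  # placeholder value, for illustration only

def createdb_flags_sketch(base_flags, persistent):
    """Return the tdb flags to use when creating a missing database."""
    if persistent:
        return base_flags                      # persistent: flags unchanged
    return base_flags | TDB_INCOMPATIBLE_HASH  # volatile: request jenkins hash
```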
Amitay Isaacs [Tue, 13 Aug 2013 03:55:47 +0000 (13:55 +1000)]
Revert "recoverd: Use correct tdb flags when creating missing databases"
This reverts commit
10a057d8e15c8c18e540598a940d3548c731b0b4.
This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)
Amitay Isaacs [Mon, 5 Aug 2013 07:28:47 +0000 (17:28 +1000)]
common/io: Keep queue buffer size multiple of 4K
Currently queue buffer size is realloc'd every time we need to extend the
buffer. Small increments can cause memory fragmentation. Instead always
extend buffer in multiples of 4K. This should reduce multiple talloc_realloc
calls when there are lots of packets in the socket buffer.
Also, if queue buffer has grown larger than 64K, throw away the buffer once
all the requests in the queue have been processed. That way queue does not
hold on to large buffers.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
5e9b1a7e24d058ff88aaa0563db36a804e866fa9)
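Note: the buffer sizing rule in the commit above can be written as two small helpers. This is a Python sketch of the arithmetic only, not the talloc-based C code; the names are hypothetical.

```python
QUEUE_BUFFER_UNIT = 4 * 1024    # grow in multiples of 4K
QUEUE_BUFFER_LIMIT = 64 * 1024  # buffers larger than this are not kept

def rounded_buffer_size(needed):
    """Round the required size up to the next multiple of 4K."""
    units = (needed + QUEUE_BUFFER_UNIT - 1) // QUEUE_BUFFER_UNIT
    return units * QUEUE_BUFFER_UNIT

def should_discard_buffer(buffer_size, pending_requests):
    """Drop an oversized buffer once the queue has been fully processed."""
    return pending_requests == 0 and buffer_size > QUEUE_BUFFER_LIMIT
```

For example, a request for 4097 bytes yields an 8192-byte buffer, so several further small packets fit without another reallocation.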
Martin Schwenke [Fri, 26 Jul 2013 03:57:03 +0000 (13:57 +1000)]
packaging: Allow setting custom release number in RPM spec file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
867afb247bd8cc86c8d738f051a44cc534cafacf)
Amitay Isaacs [Wed, 31 Jul 2013 05:59:11 +0000 (15:59 +1000)]
ctdbd: When a record is made sticky, log only once
Instead of logging from ctdb_request_call(), log the message from
ctdb_make_record_sticky(). That way if the record is already sticky, the
message is not repeated unnecessarily.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
44a64d1c388bfe3c3388b191edfaedecfb7bb831)
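Note: moving the log call into the function that performs the state change means it fires only on the transition. A Python sketch of the idea (hypothetical names and message text; the real code is C in ctdbd):

```python
def make_record_sticky_sketch(sticky_records, key, log):
    """Mark key as sticky; log only when the state actually changes."""
    if key in sticky_records:
        return False               # already sticky: nothing to do or log
    sticky_records.add(key)
    log(f"record {key!r} made sticky")
    return True
```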
Amitay Isaacs [Mon, 15 Jul 2013 07:34:31 +0000 (17:34 +1000)]
ctdbd: Improve high hopcount log messages when request is redirected
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e)
Martin Schwenke [Tue, 6 Aug 2013 06:11:40 +0000 (16:11 +1000)]
scripts: Do not run ctdb tool commands when debugging hung "init" event
CTDB daemon is not ready to accept clients in INIT runstate (init event).
CTDB daemon will start accepting connections in SETUP runstate (setup event)
and later.
Also, minor log formatting changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
81d7ce03b28d592a1337639e14d9ea141e20bfff)
Amitay Isaacs [Mon, 5 Aug 2013 07:38:42 +0000 (17:38 +1000)]
ctdbd: Avoid leaking file descriptor if talloc fails
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe)
Amitay Isaacs [Mon, 5 Aug 2013 04:08:28 +0000 (14:08 +1000)]
eventscript: Wait for debug hung script to finish or timeout before continuing
Currently, if the debug hung script takes a long time to finish, the subsequent
monitor event can collide with the previous event which is not yet finished.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
9e99e0eb072e2b845914ee3896acbc66b96138d7)
Amitay Isaacs [Fri, 2 Aug 2013 05:49:06 +0000 (15:49 +1000)]
eventscripts: Use configured RECLOCK file instead of asking CTDB
On a cluster where the recovery lock file is not being used, asking the CTDB
daemon is unnecessary overhead. And if CTDB is using a recovery lock file, then
changing configuration without restarting is *stupid*.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
44eb86e6042adb6efe75d2a5528b82a0f21d496d)
Amitay Isaacs [Fri, 2 Aug 2013 00:54:38 +0000 (10:54 +1000)]
locking: Do not create multiple lock processes for the same key
If there are multiple lock helper processes waiting for the same record, then
it will cause a thundering herd when that record has been unlocked. So avoid
scheduling lock contexts for the same record. This will also mean that
multiple requests will get queued up behind the same lock context and can be
processed quickly once the lock has been obtained.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9)
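Note: the scheduling rule in the commit above (one lock helper per key, later requests queue behind it) can be sketched as follows. This is an illustrative Python model, not the C implementation in the locking code; the class and method names are hypothetical.

```python
class LockSchedulerSketch:
    """Allow at most one active lock context (helper) per record key."""

    def __init__(self):
        self.active = {}           # key -> requests waiting on that key
        self.helpers_started = 0

    def request(self, key, req):
        """Return True if a new helper was started for this key."""
        if key in self.active:
            self.active[key].append(req)  # queue behind the existing context
            return False
        self.active[key] = [req]
        self.helpers_started += 1
        return True

    def lock_obtained(self, key):
        """Lock acquired: hand back every request queued for the key."""
        return self.active.pop(key, [])
```

With a single helper per key, unlocking a contended record wakes one process rather than a herd, and all queued requests are served together.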
Amitay Isaacs [Fri, 2 Aug 2013 00:51:45 +0000 (10:51 +1000)]
locking: Move function find_lock_context() before ctdb_lock_schedule()
So that ctdb_lock_schedule() can call this function without requiring extra
prototype declaration.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
68af5405acc123b5a90decd2123e2a02961a8fcf)
Amitay Isaacs [Tue, 30 Jul 2013 04:17:55 +0000 (14:17 +1000)]
ctdbd: Print set db sticky message after it's set
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
824dcec35ec461d78e22b2ea109473b32bfe3972)
Amitay Isaacs [Tue, 4 Dec 2012 07:27:10 +0000 (18:27 +1100)]
tests: Add a test program to hold a lock on a database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
f6b066a23610fb0092298861c21a9b354b91e2f1)
Amitay Isaacs [Tue, 30 Jul 2013 02:45:01 +0000 (12:45 +1000)]
recoverd: Use correct tdb flags when creating missing databases
When creating missing databases either locally or remotely, make sure
to use the correct tdb flags from other nodes. Without this, volatile
databases can get attached without TDB_INCOMPATIBLE_HASH flag.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
10a057d8e15c8c18e540598a940d3548c731b0b4)
Amitay Isaacs [Thu, 1 Aug 2013 01:07:59 +0000 (11:07 +1000)]
client: Always use jenkins hash when attaching volatile databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
7e7e59c4047c78159387089eca65d90037bcf722)
Amitay Isaacs [Mon, 29 Jul 2013 03:50:44 +0000 (13:50 +1000)]
recoverd: Make sure to use jenkins hash for recovery databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
32c83e209823e9a4d6306bb7fd63d4500f3e2668)
Amitay Isaacs [Mon, 22 Jul 2013 07:26:28 +0000 (17:26 +1000)]
recoverd: Assemble up-to-date node flags information from remote nodes
Currently nodemap used by recovery master is the one obtained from the local
node. This information may have been updated while processing main loop.
Before comparing node flags on all the nodes, create up-to-date node flags
information based on the information received from all the nodes.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
fcf77dec5af973a0e32f3999bc012053a6f47a96)
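Note: one way to picture "assembling up-to-date node flags" is to overwrite each entry of the local nodemap with what that node reports about itself. The Python sketch below assumes exactly that rule, which the commit message does not spell out; the data layout and function name are hypothetical.

```python
def assemble_node_flags(local_nodemap, remote_nodemaps):
    """Merge flags, preferring each node's own view of itself.

    local_nodemap:   {pnn: flags} as seen by the local node
    remote_nodemaps: {pnn: {pnn: flags}} as fetched from each remote node
    """
    merged = dict(local_nodemap)
    for pnn, nodemap in remote_nodemaps.items():
        if pnn in nodemap:
            merged[pnn] = nodemap[pnn]  # assumption: a node knows itself best
    return merged
```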
Amitay Isaacs [Mon, 15 Jul 2013 06:35:30 +0000 (16:35 +1000)]
tools/ctdb: Only print the hot records with non-zero hopcount
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
049d9beb3783482490e6273a434ccbad23f85f0a)
Amitay Isaacs [Mon, 15 Jul 2013 06:32:40 +0000 (16:32 +1000)]
ctdbd: Don't consider a hot record if the hopcount is zero
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
ab35773518ad15588013f4d859f7bee790437450)
Amitay Isaacs [Fri, 12 Jul 2013 07:33:13 +0000 (17:33 +1000)]
ctdbd: Fix updating of hot keys in database statistics
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
fde4b4db5a57f75c5efa5647c309f33e0d5a68f3)
Amitay Isaacs [Mon, 15 Jul 2013 05:24:11 +0000 (15:24 +1000)]
ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Instead of maintaining another structure, add an element as place holder for
marshall buffer of hot keys. This avoids duplication of the structure.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e73b2e12adc9db1dedb48d32bba3a8406a80f4cd)
Amitay Isaacs [Mon, 15 Jul 2013 04:52:07 +0000 (14:52 +1000)]
Revert "ctdbd: Remove incomplete ctdb_db_statistics_wire structure"
The structure cannot be removed without adding support for marshalling keys
for hot records.
This reverts commit
26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
023ca2e84f5ed064a288526b9c2bc7e06674dd81)
Martin Schwenke [Fri, 26 Jul 2013 05:09:24 +0000 (15:09 +1000)]
doc: Update XML files to use standard DocBook DTD
This simplifies building since we don't use any of the Samba
extensions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
57aa2dffea60abd73a95233f8b761cc676adebb6)
Martin Schwenke [Fri, 26 Jul 2013 01:20:47 +0000 (11:20 +1000)]
initscript: The wrapper script should export CTDB_SOCKET
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value. This at least ensures that ctdbd will be
started.
If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
37ccc7c6cc43a80aaa92291aea7a438f4225488a)
Martin Schwenke [Thu, 25 Jul 2013 06:17:07 +0000 (16:17 +1000)]
ctdbd: Kill client process without checking for tracked child
Commit
f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
to ensure that CTDB never kills unrelated processes. However, client
processes are unrelated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
782814288bb560099ee44b607bf35f3eddf37f82)
Martin Schwenke [Thu, 25 Jul 2013 03:40:43 +0000 (13:40 +1000)]
eventscripts: kill_tcp_connections() should send connections to stdin
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection. This will considerably reduce the
time when there is a large number of tcp connections. This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.
Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
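Note: the batching described in the commit above replaces one "ctdb killtcp" call per connection with at most one call fed via standard input. A Python sketch of the planning step; the "<src> <dst>" line format and the function name are assumptions made for illustration, and nothing is actually executed.

```python
def killtcp_batch(connections):
    """Return a list of (argv, stdin_text) invocations: zero or one entry."""
    if not connections:
        return []                  # no connections: do not run the tool at all
    # Hypothetical input format: one "<src> <dst>" pair per line.
    stdin_text = "".join(f"{src} {dst}\n" for src, dst in connections)
    return [(["ctdb", "killtcp"], stdin_text)]
```

The cost of setting up a client connection is then paid once, however many tcp connections need to be killed.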
Martin Schwenke [Thu, 25 Jul 2013 03:28:26 +0000 (13:28 +1000)]
tools/ctdb: Allow killtcp to read connections from standard input
This allows eventscripts to send information about multiple tcp
connections to a single "ctdb killtcp" command, saving the overhead of
setting up a client connection per tcp connection.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
af5aa369c266430fe912df0c26116b68bac3572e)
Martin Schwenke [Mon, 22 Jul 2013 10:11:58 +0000 (20:11 +1000)]
tests: Always tally the number of passed/failed tests
Regardless of whether a summary is being printed!
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a69e03a5e4671e998d45b4fef8611a421bbdb3e1)
Martin Schwenke [Mon, 22 Jul 2013 06:39:46 +0000 (16:39 +1000)]
recoverd: Call takeover fail callback only once per node
Currently the fail callback is called once per (takeip/releaseip) control
failure. This is overkill and can get a node banned much too quickly.
Instead, keep track of control failures per node and only call fail
callback once per failed node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)
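Note: the change in the commit above can be pictured as collecting failed nodes in a set and invoking the callback once per member. A Python sketch with hypothetical names (the real code is C in the recovery daemon):

```python
def call_fail_once_per_node(results, fail_callback):
    """results: iterable of (node, ok), one per takeip/releaseip control."""
    failed_nodes = set()
    for node, ok in results:
        if not ok:
            failed_nodes.add(node)  # remember the node, not each failure
    for node in sorted(failed_nodes):
        fail_callback(node)         # exactly one callback per failed node
    return failed_nodes
```

A node with many failed controls is therefore penalised once, not once per control.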
Martin Schwenke [Mon, 22 Jul 2013 05:08:32 +0000 (15:08 +1000)]
scripts: Run scriptstatus for hung event
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.
Since there is now quite a bit of output, serialise the calls to this
script using flock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)
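Note: serialising a script with flock means taking an exclusive lock on a well-known file before producing output. The debug script itself is shell; the Python sketch below shows the same pattern with fcntl.flock, and the lock file path is a hypothetical placeholder.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location, for illustration only.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "debug-hung-script.lock")

def run_serialised(action, lock_path=LOCK_PATH):
    """Run action() while holding an exclusive lock on lock_path."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Concurrent invocations then emit their output one after another instead of interleaving it in the log.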
Martin Schwenke [Mon, 22 Jul 2013 05:06:52 +0000 (15:06 +1000)]
ctdbd: Pass event name to hung script debugger
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e0f3fa1020e13b84bdd672538168d148f1847d57)
Martin Schwenke [Mon, 22 Jul 2013 04:32:13 +0000 (14:32 +1000)]
tests/complex: Fix NFS tests to work with root_squash
Refactor the NFS test setup/cleanup code into new common functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
29e98017221326bdc9b1c4f7c05b3b495c1de29b)
Martin Schwenke [Fri, 19 Jul 2013 09:59:43 +0000 (19:59 +1000)]
tests: Fix exit status of run_tests when a single test is run with -H
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
9d6e1c147bd036d832b98c155f405ee2a5d6f57f)
Martin Schwenke [Fri, 19 Jul 2013 05:33:38 +0000 (15:33 +1000)]
tests/simple: Add -p in onnode test to help show groups of connections
Change the command from "true" to "hostname" since the former won't
produce any output when used in combination with "onnode -p". This
could just be changed to "echo" but the hostname might actually be
useful.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
ae3c03d80264e997b7da9f3279d7810e18b8a1df)
Martin Schwenke [Wed, 17 Jul 2013 01:14:37 +0000 (11:14 +1000)]
ctdbd: Sleep at exit to allow time for log messages to flush
Register print_exit_message() earlier so that it covers most of the
early exits.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
90d792cf28d6a823141e4c417b6978f02a9cf596)
Martin Schwenke [Fri, 19 Jul 2013 05:36:29 +0000 (15:36 +1000)]
ctdbd: Exit if something is already listening on CTDB socket
Don't blindly remove the socket.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
3dd5b925dcf0e9a5b877638e471c5ecf36b46c58)
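Note: a common way to check whether something is already listening on a Unix domain socket is to attempt a connection before touching the socket file. The Python sketch below shows that check; it is an assumption about the general technique, not a transcription of the ctdbd C code, and the function name is hypothetical.

```python
import socket

def something_is_listening(path):
    """Return True if a connection to the Unix socket at path succeeds."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return True                # a live listener: do not remove the socket
    except OSError:
        return False               # missing or stale socket file
    finally:
        probe.close()
```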
Martin Schwenke [Tue, 16 Jul 2013 09:57:18 +0000 (19:57 +1000)]
tests/eventscripts: Add tests for monitoring of missing interfaces
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
53e4eca74429f76adc81d98e3d11d1bd61194d71)
Martin Schwenke [Fri, 12 Jul 2013 02:48:34 +0000 (12:48 +1000)]
eventscripts: A missing interface should cause monitoring to fail
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.
This couldn't be done previously because orphaned interfaces used to
be listed for monitoring. This was worked around in 10.interface in
commit
49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.
While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch. ;-)
This effectively reverts
d67955b42f7627be9dae995230c8fcbb8a948ec2.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
501f19b16fd6d67fbb754248868c38ee5bcf79ef)
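Note: the monitoring rule in the commit above treats a missing interface like one whose link is down. A Python sketch of that decision, following only what the commit message states; the state labels and function name are hypothetical, and the real logic lives in the 10.interface shell eventscript.

```python
def monitor_interfaces_sketch(states, partially_online=False):
    """states: {iface: "up" | "down" | "missing"}.

    Returns (interfaces marked down, whether monitoring fails).
    """
    marked_down = {iface for iface, state in states.items() if state != "up"}
    # With CTDB_PARTIALLY_ONLINE_INTERFACES="yes" the interface is still
    # marked down, but the monitor event itself does not fail.
    monitor_failed = bool(marked_down) and not partially_online
    return marked_down, monitor_failed
```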
Martin Schwenke [Fri, 12 Jul 2013 02:33:36 +0000 (12:33 +1000)]
eventscripts: Get list of configured interfaces using "ctdb ifaces"
This was previously changed because ctdbd didn't garbage collect
orphaned interfaces. This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)
Martin Schwenke [Mon, 24 Jun 2013 05:49:48 +0000 (15:49 +1000)]
ctdbd: Allow extra recovery to repair persistent DBs during first recovery
Commit
8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
regression because a node may not have completed the "recovered" event
(so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
becomes healthy.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
57ef5d3827ea3417a32703e259a53ce6fd10ac45)
Amitay Isaacs [Tue, 16 Jul 2013 02:53:16 +0000 (12:53 +1000)]
packaging: Bundle debug_locks.sh script in RPM
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
5740155cc5de1a223412e8529aa1a383a5412514)
Amitay Isaacs [Tue, 16 Jul 2013 02:52:00 +0000 (12:52 +1000)]
packaging: No need to check for existence of scripts, they always exist
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
67c227a5d30cb8487b20b19b20bdfa4613906609)
Martin Schwenke [Thu, 11 Jul 2013 04:26:38 +0000 (14:26 +1000)]
scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
412bc0e20bef694d4e911dc9c984fd7716231f1f)
Mathieu Parent [Fri, 7 Jun 2013 17:01:06 +0000 (19:01 +0200)]
Update Nagios check to work with ctdb versions past 30 Aug 2011
Because of commit
a779d83a6213e2ba
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
a4afe7af9c9391048d6f80135bbd5e15367770c7)
Martin Schwenke [Thu, 11 Jul 2013 03:01:13 +0000 (13:01 +1000)]
recoverd: Really fix bogus info in message about changed flags
Commit
9119a568c2b4601318f7751f537dca2f92a7230b attempted to fix this.
However, this was wrong because old_flags and new_flags were confused.
The latter has since been fixed in commit
7eb2f89979360b6cc98ca9b17c48310277fa89fc so this can now be fixed
properly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
40f2825d6e818dc8c745b6385a545969dfb45fbc)
Martin Schwenke [Wed, 10 Jul 2013 04:44:56 +0000 (14:44 +1000)]
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
76703514040b804b880cab909f6ff52576f80f89)
Sumit Bose [Mon, 19 Nov 2012 17:45:37 +0000 (18:45 +0100)]
Print deleted nodes as well
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
0930a3b806977555509c3228726e2250aef1f971)
Sumit Bose [Thu, 1 Sep 2011 13:18:46 +0000 (15:18 +0200)]
IPv6 neighbor solicit cleanup
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a81edf7eb908659a379f0cb55fd5d04551dc2c37)
Sumit Bose [Mon, 19 Nov 2012 10:13:03 +0000 (11:13 +0100)]
Fix memory leak in ctdb_send_message()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
da87395d29f5d11ecfedaf36b53fa060a9140bfd)
Sumit Bose [Wed, 10 Aug 2011 15:53:56 +0000 (17:53 +0200)]
Fixes for various issues found by Coverity
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
05bfdbbd0d4abdfbcf28e3930086723508b35952)
Sumit Bose [Mon, 19 Nov 2012 10:20:31 +0000 (11:20 +0100)]
Check return value of tdb_delete()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
5cdcc3d45d358ddbcd7e864898eed9cbd9935429)
Amitay Isaacs [Thu, 11 Jul 2013 03:46:18 +0000 (13:46 +1000)]
web: Update webpages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
ed9ba1d3dcfcb51aa69bf4d7a74b95063743d8d9)
Amitay Isaacs [Thu, 11 Jul 2013 01:34:46 +0000 (11:34 +1000)]
Tests: Correct the arguments to memset
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
9ffcd6a91287d86bae7b0c73aa129c81126e08e7)
Amitay Isaacs [Wed, 10 Jul 2013 04:44:56 +0000 (14:44 +1000)]
doc: Update NEWS
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
14141b02b61d2783b750ee5b30f9520253e88f09)
Martin Schwenke [Wed, 10 Jul 2013 07:19:55 +0000 (17:19 +1000)]
packaging: Add systemd support
Based on an original patch by Sumit Bose <sbose@redhat.com>.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e43a4b7b69a21c4cec2453dcac436b64bf5d7f06)
Martin Schwenke [Wed, 10 Jul 2013 06:35:53 +0000 (16:35 +1000)]
build: Turn off all deprecation warnings
The "‘tevent_loop_allow_nesting’ is deprecated" warnings will be
around for a while and are annoying.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
30a0040fbb7c4d97d107f0e55c600295c2603a68)
Martin Schwenke [Wed, 10 Jul 2013 06:30:29 +0000 (16:30 +1000)]
build: Remove -DTEVENT_DEPRECATED_QUIET=1 from CFLAGS
This reverts the last part of
788cdbddbc902a5b076d23473450065b551d274d
- the rest of this has been implicitly reverted via tevent syncs.
This is just leftover noise.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
b6bbfb4c464c39e322830cbbebcc51c225508584)
Martin Schwenke [Tue, 9 Jul 2013 05:22:07 +0000 (15:22 +1000)]
initscript: Simplify initscript and control CTDB via new ctdbd_wrapper
Currently the initscript is very complex. This makes it hard to read
and hard to add support for new init systems, such as systemd.
Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd. It constructs
the ctdbd options and waits until ctdbd is properly initialised
before it exits.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
e3abc7eebab5cceddc4ce7817890dd5db9be3450)
Martin Schwenke [Mon, 8 Jul 2013 02:45:31 +0000 (12:45 +1000)]
recoverd: Recovery daemon should use ctdb_get_pnn, which can't fail
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
c6fded59fa4da67f738a90fdacb51900e41801f9)
Amitay Isaacs [Wed, 10 Jul 2013 02:23:30 +0000 (12:23 +1000)]
ctdbd: Print tdb flags when logging attached to database message
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
846109169ee5e3d03135156e45c8dac93aa2e95b)
Amitay Isaacs [Tue, 9 Jul 2013 02:32:53 +0000 (12:32 +1000)]
ctdbd: Set process names for child processes
This helps distinguish processes in process list in top, perf, etc.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
2493f57ce268d6fe7e4c40a87852c347fd60d29e)
Amitay Isaacs [Tue, 9 Jul 2013 02:24:59 +0000 (12:24 +1000)]
common/system: Add ctdb_set_process_name() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
fc3689c977f48d7988eed0654fb8e5ce4b8bfc8b)
Amitay Isaacs [Thu, 6 Jun 2013 06:29:04 +0000 (16:29 +1000)]
traverse: Remove unused start_time field
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
dc834d5e78c3fb97ae15cddf1139b3c4a4051a7c)
Amitay Isaacs [Thu, 6 Jun 2013 06:26:25 +0000 (16:26 +1000)]
traverse: Send records directly from traverse child to srcnode
Currently the CTDB daemon reads records from a child process and then sends
them to srcnode via the TRAVERSE_DATA control. This ties up the main CTDB
daemon and also requires an extra copy of the record in the CTDB daemon.
Instead, send records directly from the traverse child process.
The control from the child process still goes via the local CTDB daemon, as
there is currently no infrastructure to open a TCP socket to the srcnode.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
1a74192aa7d51ed99553e7292860027f06b6ef37)
Amitay Isaacs [Thu, 6 Jun 2013 06:12:07 +0000 (16:12 +1000)]
traverse: Pass reqid and srcnode information to local database traverse
So that the traverse child process can directly send the TRAVERSE_DATA control
to the srcnode without first sending it to the local node.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
faabce1b99fb3de9ff03bf54d303e7656538fee3)
Amitay Isaacs [Mon, 8 Jul 2013 06:14:59 +0000 (16:14 +1000)]
packaging: When building with system libraries, add dependency for them
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
8225b3e77e140db34b52571a95d553d1e59e3f1e)
Amitay Isaacs [Mon, 8 Jul 2013 05:49:58 +0000 (15:49 +1000)]
ctdbd: No need for DeadlockTimeout tunable
The code for deadlock detection and for killing the smbd process causing the
deadlock has been removed and replaced with an external debug script.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
2211cd94bea266547d3e6f167d3160a6b23bec88)
Amitay Isaacs [Mon, 8 Jul 2013 05:57:22 +0000 (15:57 +1000)]
initscript: Export CTDB_DEBUG_LOCKS variable
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
a415a1986900135f889efc25ecaf2761b1dae81a)
Amitay Isaacs [Mon, 8 Jul 2013 05:56:30 +0000 (15:56 +1000)]
scripts: Add an example debug_locks.sh script to debug locking issues
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
c711ff4702c5f95b75e4bf030665fc2afffc2f9e)
Amitay Isaacs [Mon, 8 Jul 2013 05:46:53 +0000 (15:46 +1000)]
locking: Use external script to debug locking issues
Use an external script to parse /proc/locks and log useful debugging
information about locks rather than doing that in C code.
To use this feature, add the following configuration variable to /etc/sysconfig/ctdb:
CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
2bfb8499366d530f16515b08928056bbda40f781)
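The general pattern of handing lock debugging to an external script can be
sketched as follows. This is an assumption-laden illustration, not the ctdb
code: it assumes the daemon finds the script through the exported
CTDB_DEBUG_LOCKS environment variable, runs it with no arguments, and waits
for it to finish. The function name run_debug_locks_sketch is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the script named by CTDB_DEBUG_LOCKS.
 * Returns the script's exit status, or -1 if the variable is unset,
 * the child could not be created, or the script did not exit normally. */
static int run_debug_locks_sketch(void)
{
	const char *script = getenv("CTDB_DEBUG_LOCKS");
	int status;
	pid_t child;

	if (script == NULL || script[0] == '\0') {
		return -1;
	}

	child = fork();
	if (child == -1) {
		return -1;
	}
	if (child == 0) {
		/* A value containing '/' is used as a path as-is;
		 * a bare name is looked up in PATH. */
		execlp(script, script, (char *)NULL);
		_exit(127);	/* exec failed */
	}

	/* Retry if the wait is interrupted by a signal. */
	while (waitpid(child, &status, 0) == -1) {
		if (errno != EINTR) {
			return -1;
		}
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A long-running daemon would more likely start the script asynchronously than
block in waitpid(); the blocking wait is used here only to keep the sketch
short.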
Amitay Isaacs [Wed, 3 Jul 2013 01:01:21 +0000 (11:01 +1000)]
locking: Update locking bucket intervals
0 < 1 ms
1 < 10 ms
2 < 100 ms
3 < 1 s
4 < 2 s
5 < 4 s
6 < 8 s
7 < 16 s
8 < 32 s
9 < 64 s
10 >= 64 s
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
6fc36a7036933237d09151a0baf4d8ccd2bc2c99)
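The bucket boundaries listed in this commit can be expressed as a small lookup
function. The sketch below is illustrative only: the function name, the table
name and the choice of integer microseconds as the unit are assumptions made
here, not details taken from the ctdb source.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bounds (exclusive) of buckets 0..9 in microseconds:
 * 1 ms, 10 ms, 100 ms, 1 s, 2 s, 4 s, 8 s, 16 s, 32 s, 64 s. */
static const uint64_t lock_bucket_limits_usec[] = {
	1000, 10000, 100000, 1000000, 2000000,
	4000000, 8000000, 16000000, 32000000, 64000000,
};

/* Hypothetical helper: map a lock latency to its bucket index (0..10). */
static int lock_latency_bucket_sketch(uint64_t latency_usec)
{
	size_t n = sizeof(lock_bucket_limits_usec) /
		   sizeof(lock_bucket_limits_usec[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (latency_usec < lock_bucket_limits_usec[i]) {
			return (int)i;
		}
	}
	/* Bucket 10 collects everything at or above 64 s. */
	return (int)n;
}
```

Integer microseconds keep the boundary comparisons exact; a latency of exactly
1 ms falls into bucket 1, and exactly 64 s into bucket 10.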
Amitay Isaacs [Wed, 3 Jul 2013 01:46:53 +0000 (11:46 +1000)]
locking: Update lock latency in CTDB statistics only for RECORD or DB locks
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
dcc42a75b4638b3aa40c44ed9e0aaae26483e2b0)
Amitay Isaacs [Tue, 25 Jun 2013 05:36:13 +0000 (15:36 +1000)]
tools/ctdb: Fix the format of DB statistics output
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
594c421f90ce132c75fbd985872114e4967f92b5)
Amitay Isaacs [Tue, 25 Jun 2013 05:25:16 +0000 (15:25 +1000)]
ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Send the ctdb_db_statistics directly instead of first copying it to
duplicate ctdb_db_statistics_wire structure. This simplifies the
implementation of the control to get database statistics.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5)
Amitay Isaacs [Wed, 3 Jul 2013 23:04:49 +0000 (09:04 +1000)]
ctdbd: Update debug messages for setting readonly property on database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit
545a46437dfb2b755bb2fddb11dea8c4ccce3ed7)
Amitay Isaacs [Fri, 5 Jul 2013 04:04:20 +0000 (14:04 +1000)]
recoverd: Fix buffer overflow error in reloadips
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
41182623891d74a7e9e9c453183411a161201e67)
Martin Schwenke [Thu, 4 Jul 2013 10:02:29 +0000 (20:02 +1000)]
tests/eventscripts: Add some rudimentary tests for 60.ganesha
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit
e1cf1f728236d808bb41265e74bc65f54bf1c133)