git.samba.org - metze/samba/wip.git/log

recoverd: Remove an orphaned comment

This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)

recoverd: Update a comment to use current terminology

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)

client: Remove unused function list_of_active_nodes_except_pnn()

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d8a76cf79f07dfb5a93c6c9a13f16e3268c7dd57)

tools/ctdb: list_of_active_nodes_except_pnn() -> list_of_nodes()

list_of_active_nodes_except_pnn() is only used here and can be removed
if we remove this call. Less is more...

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d4e206fb818048b7fab4797c877b854bdbb1ab70)

tools/ctdb: Fix a memory leak in parse_nodestring()

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8753a094b97340deb26dd44f6ea345ca0a642a95)

tests/eventscripts: Tests for memory checking in 00.ctdb

... plus updates to test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7)

eventscripts: Clean up monitoring of system memory in 00.ctdb

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d)

server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..

This was the comment block I was touching and meant to adapt in
commit 00d3bf092e2f72eda330978c75ec85f17e870553.
My search was apparently not unique...

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 09940255011b119dc6af3304f5d3e9568e6006fd)

doc: Update NEWS

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c446579fc442955ecc74f5566eaa0635c3171498)

build: Fix build dependencies for ctdb_lock_tdb

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit eb8575718400c45626cd1b2e0fd247bc3ebff655)

tests/simple: Minimise the chance of a monitor event being cancelled

A monitor event following a "ctdb delip" might reconfigure services.
If the monitor event is cancelled then a service might be stopped but
not yet restarted and this could result in the subsequent monitor
events failing.

This obviously needs to be fixed in CTDB itself. This will happen by
making "ctdb reloadips" the supported way of reconfiguring IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 618ea3660e36e7bd92b686e1ca8728cf63c3c068)

packaging: Remove pushd/popd from maketarball.sh, don't need bash

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3ffca990a18cbd31c8bd3ae01c6671d60da58f58)

tools/ctdb_diagnostics: Add output of "ctdb getdbmap"

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f0d69a9079b7aecc68f1d2d8510702046b618b19)

tools/ctdb_diagnostics: Safer temporary file creation

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 406e1cb1fdd17ddd239774d0228e3657b73ae68f)

eventscripts: Avoid using a temporary file in 62.cnfs

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81833052d7ee8f76b1e98376a0273448640cfa8e)

scripts: Remove gdb_backtrace

This uses potentially insecure temporary files and is not referenced
anywhere else.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)

tools/ctdb: Make most non-auto-all commands abort if run with -n all

Or if run with -n A,B,...

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b1d8732b5da18ae80aea1df0e66b0b5cdcd919bc)

tools/ctdb: Remove more non-essential fetching of PNN from daemon

The useful cases are either CTDB_CURRENT_NODE, in which case
ctdb_get_pnn() does the job, or a PNN, which is... ummm... a PNN! :-)

This works because parse_nodestring() validates PNNs.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7b3f7eea2465efb099a2faf3e42174bc97b13a16)

tools/ctdb: Improve auto-all settings for some commands

* ipreallocate is cluster-wide so should not be auto-all

* enablescript, disablescript, getreclock, setreclock, natgwlist can
all be auto-all without issues

* xpnn, ipiface are local-only so don't work with -n, so might as well
not be auto-all

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 123a4677528cb46bee1c6dad8a5162eba9880bc1)

recoverd: Remove an unused temporary talloc context

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit da22d5e60dc023009854025cc9e6bc4b0a84c60e)

recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c

This is an internal structure. It was moved into ctdb_private.h a
long time ago to allow unit testing. Unit test compilation was
changed shortly afterwards to make this unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)

recoverd: Log more information when interfaces change

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b)

traverse: Log when database traverse is started

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 256b157232c60bc432c94e54b1fae9699f737557)

ctdbd: Finish eventscript callback processing before debugging hung script

This ensures that the eventscript result is updated and the callback is
processed before the hung script is debugged, so that "ctdb scriptstatus"
output is useful when run from the debug hung script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e)

ctdbd: Make sure call data is freed if doing an early return

This should avoid memory bloat when a request bounces between nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7677fb263f06a97398e2c546e32273fb96edca69)

common/io: Limit the queue buffer size for fair scheduling via tevent

If we process all the data available in a socket buffer, CTDB can stay busy
processing lots of packets via immediate event mechanism in tevent.  After
processing an immediate event, tevent returns without epoll_wait.  So as long
as there are immediate events, tevent will never poll other FDs.  CTDB will
report this as an "Event handling took xx seconds" warning.  This is misleading
since CTDB is very busy processing packets, but never gets to the point of
polling FDs.

The improvement in socket handling made it worse when handling traverse
control.  There were lots of packets filled in the socket buffer quickly and
CTDB stayed busy processing those packets and not polling other FDs and timer
events.  This can lead to controls timing out and, in the worst case, other
nodes marking the busy node as disconnected.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 92939c1178d04116d842708bc2d6a9c2950e36cc)

Revert "common/io: Keep queue buffer size multiple of 4K"

This reverts commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9.

This is not the best approach. Allowing queue buffer size to grow
indefinitely causes a large number of CTDB packets to be queued up very
quickly which when processed via immediate events will block CTDB from
processing events from other FDs. If there are immediate events queued
up, tevent will never process any of the FDs till all immediate events
are processed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d8b094e804efc53fae9f44c6ef961b7b5797d290)

Revert "LACOUNT:  Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"

This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.

This is a premature optimization.  Record can bounce between nodes
very quickly if it is a contended record.  There is no need to hold a
record on a node unnecessarily.  In case record contention becomes bad,
enabling sticky records on a database is a better idea.

Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)

ctdbd: Print a log message when a key becomes hot

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 48f40985f4592c28402303ccbb458756f4914f75)

ctdbd: For volatile databases, write an empty record with rsn=0 only on dmaster

Empty record with rsn=0 should not be written on any node other than
dmaster. This is however not true for persistent databases. So currently
apply the check only for volatile databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit df83ae7a047dab4803e0d94b1c11df48ae17ca96)

tools/ctdb: Fix message in showban when node is banned

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5cdad2b8ebd71a5e458c301d00eac00a211feeb3)

tools/ctdb: Reimplement ban/unban using update_flags_wait_and_ipreallocate()

This has the side effect of making these commands more resilient to
control timeouts.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0fe79662e20e347d9e1cb12a42cd356e33572402)

tools/ctdb: Factor out common pattern used in disable/enable/stop/continue

Now we will only have one set of bugs. :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 444521c852749558f39dc6131acce9e47eefd489)

tools/ctdb: Factor, simplify and improve robustness of ipreallocate code

Having other functions call control_ipreallocate() suggests that it
might look at the argc/argv arguments that are passed.  This is not
the case.  Change the callers so they call the new ipreallocate()
function instead.

Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes.  Inactive
nodes will ignore it.  This is safe since we only want 1 reply.  If we
didn't get a response, we don't actually care if there's no active
recovery master - just fire, wait, retry, ...

Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4bf0b1c9d21986eecb7682f935bd6154c65533cc)

tools/ctdb: Use ctdb_get_pnn() to get PNN of the current node

This has already been stored at connect time and can't fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d8eb2e7fdd7645719370dad4f2faa5c3fffa8249)

util: In passing the code, fix a space vs. tab in set_close_on_exec().

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit f9556a6f1fe0046308c8b363e6dcaf3f7ce6f2b7)

server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 00d3bf092e2f72eda330978c75ec85f17e870553)

server: fix wording and punctuation in comment block for ctdb_reply_dmaster().

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit cb3a1c5af3b796dba30cae07118670d3c9e57df7)

recoverd: Improve log message when nodes disagree on recmaster

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7b7aa7b599536cd60ebb84d363607bb4e953248a)

common: Null terminate process name string so valgrind doesn't complain

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1c9025fdd08d1cea342af7487d0123015e08831b)

vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2)

This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)

vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1)

This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)

db_wrap: Make sure tdb messages are logged correctly

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 60cb40d090e45ff6134c098a238fac7ad854f134)

eventscripts: Become unhealthy faster on nfsd failure

Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problem. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
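
The thresholds in the nfsd commit above can be modelled with a short
sketch. This is an illustration in Python, not the shell code from the
eventscript. The function name nfsd_failure_actions() and the way the
two rules are combined are assumptions; only the numbers 2 and 10 come
from the commit message.

```python
def nfsd_failure_actions(failcount, unhealthy_after=2, restart_every=10):
    """Return (unhealthy, restart) for a run of consecutive failures.

    Hypothetical model: the thresholds come from the commit message,
    everything else is an assumption made for illustration.
    """
    # The node is considered unhealthy once the failure count reaches
    # the (lowered) threshold.
    unhealthy = failcount >= unhealthy_after
    # A restart is only attempted on every Nth consecutive failure.
    restart = failcount > 0 and failcount % restart_every == 0
    return unhealthy, restart


if __name__ == "__main__":
    for n in (1, 2, 9, 10, 20):
        print(n, nfsd_failure_actions(n))
```

The modulo test is the same kind of check that the "%" operator added
to ctdb_check_counter() makes possible in the eventscripts.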

tools/ctdb: Increase default control timeout to 10 seconds

The current 3 second timeout is arbitrary and users trip over it
sometimes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae)

eventscripts: Improve message logged when a counter hits a limit

It should print the actual number of consecutive failures rather than
the limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)

eventscripts: Print a message when waiting for TCP connections to be killed

This makes the gaps in the logs more obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)

eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST

Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)

eventscripts: Add modulo (%) operator to ctdb_check_counter()

Also add it to the corresponding eventscript unit test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)

eventscripts: Separate out RPC service restart code

While doing this:

* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common().  This is more lines of code but is easier
  to read.

* Don't print the options when starting a service.  Trying to print them
  makes the code messy for little benefit.

  Update the eventscript unit testing code and a Ganesha test to
  reflect this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)

tests/eventscripts: Override background_with_logging(), just prepend "&"

That is, output that goes through background_with_logging() just gets
"&" prepended to each line. This is cleaner than having the tests
grovel through logs.

Update some 49.winbind/50.samba tests to deal with this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5)

eventscripts: Remove support for RPC service 'q' and 's' restart flags

They're hard to maintain and provide very little benefit.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)

eventscripts: When restarting the nfslock service only show output of start

That is, /dev/null the "stop" output. This is consistent with the way
CTDB generally deals with the output when stopping a service.

It also makes updating the eventscript unit tests easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)

tests/simple: Unreachable node test should wait for recovery to complete

This should minimise the chances of a control timing out.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 63be516673c5d9c0d543617bf1bb8bca919956a8)

tests/simple: Fix the missing IP test

Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.

Also, do not disable 10.interface until it matters. Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45)

recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases

When creating missing databases either locally or remotely, recovery
master calls ctdb_ctrl_createdb(). Recovery master always passes 0
for tdb_flags. For volatile databases, if TDB_INCOMPATIBLE_HASH is not
specified, then they will be attached without using jenkins hash causing
database corruption.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2fc6b6403707a292d134140fc0b9145b454992c5)

Revert "recoverd: Use correct tdb flags when creating missing databases"

This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4.

This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)

common/io: Keep queue buffer size multiple of 4K

Currently queue buffer size is realloc'd every time we need to extend the
buffer.  Small increments can cause memory fragmentation.  Instead always
extend buffer in multiples of 4K.  This should reduce multiple talloc_realloc
calls when there are lots of packets in the socket buffer.

Also, if queue buffer has grown larger than 64K, throw away the buffer once
all the requests in the queue have been processed.  That way queue does not
hold on to large buffers.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9)
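
The sizing policy described in the queue buffer commit above can be
sketched as follows. This is a Python model, not the C code in
common/io. The names CHUNK, SHRINK_LIMIT, rounded_size() and
should_discard() are hypothetical; the 4K and 64K figures are taken
from the commit message.

```python
CHUNK = 4 * 1024          # growth granularity from the commit message
SHRINK_LIMIT = 64 * 1024  # buffers larger than this are dropped when idle


def rounded_size(needed):
    """Round a requested size up to the next multiple of CHUNK."""
    return ((needed + CHUNK - 1) // CHUNK) * CHUNK


def should_discard(buffer_size, pending_requests):
    """Drop an oversized buffer once the queue has been drained."""
    return pending_requests == 0 and buffer_size > SHRINK_LIMIT


if __name__ == "__main__":
    for needed in (1, 4096, 4097, 70000):
        size = rounded_size(needed)
        print(needed, size, should_discard(size, 0))
```

Growing in fixed-size steps means many small packets trigger one
reallocation per 4K rather than one per packet.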

packaging: Allow setting custom release number in RPM spec file

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 867afb247bd8cc86c8d738f051a44cc534cafacf)

ctdbd: When a record is made sticky, log only once

Instead of logging from ctdb_request_call(), log the message from
ctdb_make_record_sticky(). That way if the record is already sticky, the
message is not repeated unnecessarily.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831)

ctdbd: Improve high hopcount log messages when request is redirected

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e)

scripts: Do not run ctdb tool commands when debugging hung "init" event

CTDB daemon is not ready to accept clients in INIT runstate (init event).
CTDB daemon will start accepting connections in SETUP runstate (setup event)
and later.

Also, minor log formatting changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)

ctdbd: Avoid leaking file descriptor if talloc fails

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe)

eventscript: Wait for debug hung script to finish or timeout before continuing

Currently if the debug hung script takes a long time to finish, the subsequent
monitor event can collide with the previous event which is not yet finished.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 9e99e0eb072e2b845914ee3896acbc66b96138d7)

eventscripts: Use configured RECLOCK file instead of asking CTDB

On a cluster where the recovery lock file is not being used, asking the CTDB
daemon is unnecessary overhead. And if CTDB is using a recovery lock file, then
changing the configuration without restarting is *stupid*.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)

locking: Do not create multiple lock processes for the same key

If there are multiple lock helper processes waiting for the same record, then
it will cause a thundering herd when that record has been unlocked. So avoid
scheduling lock contexts for the same record. This will also mean that
multiple requests will get queued up behind the same lock context and can be
processed quickly once the lock has been obtained.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9)
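
The idea in the locking commit above, one lock helper per key with
later requests queued behind it, can be illustrated with a toy model.
This is a Python sketch under stated assumptions, not the scheduler in
ctdbd: the class LockScheduler and its methods are hypothetical, and
"starting a helper" is reduced to incrementing a counter.

```python
class LockScheduler:
    """Toy model: at most one lock helper per key (all names hypothetical)."""

    def __init__(self):
        self.waiting = {}         # key -> callbacks queued behind one helper
        self.helpers_started = 0  # stands in for forking a helper process

    def request(self, key, callback):
        if key not in self.waiting:
            # First request for this key: start a single helper.
            self.waiting[key] = []
            self.helpers_started += 1
        # Later requests for the same key just queue up.
        self.waiting[key].append(callback)

    def lock_obtained(self, key):
        # Run every queued request once the lock is held.
        for callback in self.waiting.pop(key, []):
            callback(key)


if __name__ == "__main__":
    sched = LockScheduler()
    done = []
    for _ in range(3):
        sched.request(b"record", done.append)
    sched.lock_obtained(b"record")
    print(sched.helpers_started, len(done))
```

With three requests for the same key only one helper is started, so
nothing else is woken when the record is unlocked.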

locking: Move function find_lock_context() before ctdb_lock_schedule()

So that ctdb_lock_schedule() can call this function without requiring extra
prototype declaration.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 68af5405acc123b5a90decd2123e2a02961a8fcf)

ctdbd: Print set db sticky message after it's set

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 824dcec35ec461d78e22b2ea109473b32bfe3972)

tests: Add a test program to hold a lock on a database

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f6b066a23610fb0092298861c21a9b354b91e2f1)

recoverd: Use correct tdb flags when creating missing databases

When creating missing databases either locally or remotely, make sure
to use the correct tdb flags from other nodes. Without this, volatile
databases can get attached without TDB_INCOMPATIBLE_HASH flag.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 10a057d8e15c8c18e540598a940d3548c731b0b4)

client: Always use jenkins hash when attaching volatile databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7e7e59c4047c78159387089eca65d90037bcf722)

recoverd: Make sure to use jenkins hash for recovery databases

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 32c83e209823e9a4d6306bb7fd63d4500f3e2668)

recoverd: Assemble up-to-date node flags information from remote nodes

Currently nodemap used by recovery master is the one obtained from the local
node. This information may have been updated while processing main loop.
Before comparing node flags on all the nodes, create up-to-date node flags
information based on the information received from all the nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit fcf77dec5af973a0e32f3999bc012053a6f47a96)

tools/ctdb: Only print the hot records with non-zero hopcount

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 049d9beb3783482490e6273a434ccbad23f85f0a)

ctdbd: Don't consider a hot record if the hopcount is zero

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ab35773518ad15588013f4d859f7bee790437450)

ctdbd: Fix updating of hot keys in database statistics

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit fde4b4db5a57f75c5efa5647c309f33e0d5a68f3)

ctdbd: Remove incomplete ctdb_db_statistics_wire structure

Instead of maintaining another structure, add an element as place holder for
marshall buffer of hot keys. This avoids duplication of the structure.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e73b2e12adc9db1dedb48d32bba3a8406a80f4cd)

Revert "ctdbd: Remove incomplete ctdb_db_statistics_wire structure"

The structure cannot be removed without adding support for marshalling keys
for hot records.

This reverts commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 023ca2e84f5ed064a288526b9c2bc7e06674dd81)

doc: Update XML files to use standard DocBook DTD

This simplifies building since we don't use any of the Samba
extensions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 57aa2dffea60abd73a95233f8b761cc676adebb6)

initscript: The wrapper script should export CTDB_SOCKET

This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value. This at least ensures that ctdbd will be
started.

If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)

ctdbd: Kill client process without checking for tracked child

Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
to ensure that CTDB never kills unrelated processes. However, client
processes are unrelated.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 782814288bb560099ee44b607bf35f3eddf37f82)

eventscripts: kill_tcp_connections() should send connections to stdin

This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection. This will considerably reduce the
time when there is a large number of tcp connections. This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.

Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)

tools/ctdb: Allow killtcp to read connections from standard input

This allows eventscripts to send information about multiple tcp
connections to a single "ctdb killtcp" command, saving the overhead of
setting up a client connection per tcp connection.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit af5aa369c266430fe912df0c26116b68bac3572e)

tests: Always tally the number of passed/failed tests

Regardless of whether a summary is being printed!

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1)

recoverd: Call takeover fail callback only once per node

Currently the fail callback is called once per (takeip/releaseip) control
failure. This is overkill and can get a node banned much too quickly.

Instead, keep track of control failures per node and only call fail
callback once per failed node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)
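
The change described in the takeover commit above amounts to
de-duplicating failures by node before calling the fail callback. A
minimal Python sketch, assuming control results arrive as (pnn, ok)
pairs; that data shape and the name report_failures() are hypothetical
and are not taken from the recovery daemon.

```python
def report_failures(control_results, fail_callback):
    """Call fail_callback once per node that had any failed control.

    control_results: iterable of (pnn, ok) pairs, one per takeip/releaseip
    control. Hypothetical data shape, chosen for illustration.
    """
    failed_nodes = []
    for pnn, ok in control_results:
        # Remember each failing node only once, in first-seen order.
        if not ok and pnn not in failed_nodes:
            failed_nodes.append(pnn)
    for pnn in failed_nodes:
        fail_callback(pnn)
    return failed_nodes


if __name__ == "__main__":
    results = [(0, False), (0, False), (1, True), (2, False)]
    report_failures(results, lambda pnn: print("node failed:", pnn))
```

Node 0 fails two controls here but is reported once, so a burst of
failures on one node counts as a single strike against it.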

scripts: Run scriptstatus for hung event

The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.

Since there is now quite a bit of output, serialise the calls to this
script using flock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)

ctdbd: Pass event name to hung script debugger

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e0f3fa1020e13b84bdd672538168d148f1847d57)

tests/complex: Fix NFS tests to work with root_squash

Refactor the NFS test setup/cleanup code into new common functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b)

tests: Fix exit status of run_tests when a single test is run with -H

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9d6e1c147bd036d832b98c155f405ee2a5d6f57f)

tests/simple: Add -p in onnode test to help show groups of connections

Change the command from "true" to "hostname" since the former won't
produce any output when used in combination with "onnode -p". This
could just be changed to "echo" but the hostname might actually be
useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ae3c03d80264e997b7da9f3279d7810e18b8a1df)

ctdbd: Sleep at exit to allow time for log messages to flush

Register print_exit_message() earlier so that it covers most of the
early exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 90d792cf28d6a823141e4c417b6978f02a9cf596)

ctdbd: Exit if something is already listening on CTDB socket

Don't blindly remove the socket.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58)

tests/eventscripts: Add tests for monitoring of missing interfaces

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 53e4eca74429f76adc81d98e3d11d1bd61194d71)

eventscripts: A missing interface should cause monitoring to fail

A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.

This couldn't be done previously because orphaned interfaces used to
be listed for monitoring. This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.

While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch. ;-)

This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef)

eventscripts: Get list of configured interfaces using "ctdb ifaces"

This was previously changed because ctdbd didn't garbage collect
orphaned interfaces. This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)

ctdbd: Allow extra recovery to repair persistent DBs during first recovery

Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
regression because a node may not have completed the "recovered" event
(so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
becomes healthy.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 57ef5d3827ea3417a32703e259a53ce6fd10ac45)

packaging: Bundle debug_locks.sh script in RPM

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 5740155cc5de1a223412e8529aa1a383a5412514)

packaging: No need to check for existence of scripts, they always exist

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 67c227a5d30cb8487b20b19b20bdfa4613906609)

scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used

It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 412bc0e20bef694d4e911dc9c984fd7716231f1f)

Update Nagios check to work with ctdb versions past 30 Aug 2011

Because of commit a779d83a6213e2ba

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a4afe7af9c9391048d6f80135bbd5e15367770c7)