9 years agoEventscripts - 10.interfaces should not check orphaned interfaces.
Martin Schwenke [Mon, 1 Aug 2011 03:37:06 +0000 (13:37 +1000)]
Eventscripts - 10.interfaces should not check orphaned interfaces.

If the last IP address on an interface is removed then that
interface should no longer be checked by 10.interfaces.  However,
"ctdb ifaces" still lists such interfaces so they are currently

The problem really needs to be addressed in ctdbd but a neat quick
eventscript fix will be minimally invasive...

This changes the code to use "ctdb -Y ip -v" instead of "ctdb -Y
ifaces".  The former includes details of all public addresses and
associated interfaces, so when an address is removed there is no
output for it.  This prevents orphaned interfaces from being listed.

The logic is also slightly improved so that $IFACES includes just a
(non-uniquified) list of interfaces, allowing an existing loop to be

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443)

9 years agoMerge branch 'master' of
Ronnie Sahlberg [Thu, 28 Jul 2011 23:04:01 +0000 (09:04 +1000)]
Merge branch 'master' of

(This used to be ctdb commit 518945e59e2e48f07fcc0955f3aa81cd0d946aea)

9 years agoTests: Initial test code for LCP2 IP allocation algorithm.
Martin Schwenke [Thu, 28 Jul 2011 05:22:42 +0000 (15:22 +1000)]
Tests: Initial test code for LCP2 IP allocation algorithm.

Move struct ctdb_public_ip_list to ctdb_private.h and put some
definitions for some functions from ctdb_takeover.c there.  This
allows those functions to be called from unit tests.

Add ctdb_takeover_tests.c and the Makefile support to build it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)

9 years agoIP allocation - add LCP2 algorithm.
Martin Schwenke [Thu, 28 Jul 2011 05:16:46 +0000 (15:16 +1000)]
IP allocation - add LCP2 algorithm.

The current non-deterministic IP allocation algorithm balances IPs
across the whole cluster.  It does not consider different
interfaces/VLANs/subnets, so these different groups of IPs aren't
generally well balanced.

This adds the LCP2 algorithm for IP allocation and allows it to be
enabled by setting the "LCP2PublicIPs" tunable to 1.

The LCP2 algorithm calculates the imbalance of a node by totalling the
squares of the distances between each IP on the node.  The IP distance
is defined as the length of the longest common prefix (LCP) of bits
that is found when comparing 2 IPs.  The imbalance of a cluster is the
maximum imbalance for any node.  At each step the algorithm selects
the IP/node combination whose allocation best reduces the imbalance of
the cluster.
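
The imbalance calculation described above can be sketched roughly as follows. This is an illustrative Python sketch, not the ctdb C implementation: the names ip_distance, node_imbalance and cluster_imbalance are hypothetical, and it assumes that each unordered pair of IPs contributes its squared distance once and that IPs of different address families share no prefix.

```python
import ipaddress
from itertools import combinations


def ip_distance(a, b):
    """Length, in bits, of the longest common prefix of two IPs."""
    ip_a, ip_b = ipaddress.ip_address(a), ipaddress.ip_address(b)
    if ip_a.version != ip_b.version:
        return 0  # assumption: different address families share no prefix
    diff = int(ip_a) ^ int(ip_b)
    # Leading zero bits of the XOR are the bits the two addresses share.
    return ip_a.max_prefixlen - diff.bit_length()


def node_imbalance(ips):
    """Sum of squared distances over every pair of IPs on one node."""
    return sum(ip_distance(a, b) ** 2 for a, b in combinations(ips, 2))


def cluster_imbalance(allocation):
    """Maximum node imbalance; allocation maps node -> list of IPs."""
    return max((node_imbalance(ips) for ips in allocation.values()), default=0)
```

Two IPs on the same subnet share a long prefix, so hosting both on one node yields a large imbalance; minimising it tends to spread each subnet across the nodes.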

The implementation splits out the IP allocation part of
ctdb_takeover_run() into new function ctdb_takeover_run_core(), and
then extracts out the basic IP assignment code into new functions
basic_allocate_unassigned() and basic_failback().  3 new functions
lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement
the LCP2 algorithm, and are hooked into ctdb_takeover_run_core().

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)

9 years agoMerge branch 'master' of
Ronnie Sahlberg [Thu, 28 Jul 2011 22:53:43 +0000 (08:53 +1000)]
Merge branch 'master' of

(This used to be ctdb commit 0e60a738f9a6275ed45abc3d933f872d93132d92)

9 years agoUpdate the delip command
Ronnie Sahlberg [Thu, 28 Jul 2011 22:41:35 +0000 (08:41 +1000)]
Update the delip command
Don't talloc_free(vnn) immediately but postpone it until later when
the eventscript callback has completed.

CQ S1026664

(This used to be ctdb commit 0a99e8742a261b1d3a2c8830f5c19ea6c2c47cad)

9 years agoeventscript: fix callback after free
Rusty Russell [Mon, 25 Jul 2011 08:26:06 +0000 (17:56 +0930)]
eventscript: fix callback after free

ctdb_event_script_callback() takes a mem_ctx arg which it doesn't use, but
the implication is pretty clear, that when that mem_ctx is freed, the callback
shouldn't happen.  Indeed, Ronnie reproduced a case where that callback
refers to freed memory, in the ip reallocation code under stress.

So attach the callback to the mem_ctx they give us, and remove it from the
script state structure when that's freed.  It's a bit weird, but it works.
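
The idea of tying the callback's lifetime to the caller's memory context can be sketched as below. This is a toy Python model, not talloc or the ctdb code: MemCtx, ScriptState and their methods are hypothetical stand-ins for a context with destructors and for the script state structure.

```python
class MemCtx:
    """Toy stand-in for a memory context that runs destructors when freed."""

    def __init__(self):
        self._destructors = []

    def add_destructor(self, fn):
        self._destructors.append(fn)

    def free(self):
        for fn in self._destructors:
            fn()
        self._destructors.clear()


class ScriptState:
    """Holds the completion callback for a running event script."""

    def __init__(self, mem_ctx, callback):
        self.callback = callback
        # When the caller's context is freed, forget the callback.
        mem_ctx.add_destructor(self._drop_callback)

    def _drop_callback(self):
        self.callback = None

    def finished(self, status):
        # Only call back if the caller's context is still alive.
        if self.callback is not None:
            self.callback(status)
```

After the context is freed, a late script completion is silently ignored instead of touching freed state.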

CQ: S1026179
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6fcd867cc835ef1ffc1c50964f135c346503d40c)

9 years agopackaging: honour rpm build target options handed in to makerpms.sh
Michael Adam [Fri, 22 Jul 2011 08:27:40 +0000 (10:27 +0200)]
packaging: honour rpm build target options handed in to makerpms.sh

This allows calling e.g. "makerpms.sh -bs" to build only the source RPM.

(This used to be ctdb commit c6bfba2bb66962b7b05d708f0747002700991472)

9 years agoMerge branch 'master' of ssh://git.samba.org/data/git/ctdb
Ronnie Sahlberg [Wed, 20 Jul 2011 05:53:11 +0000 (15:53 +1000)]
Merge branch 'master' of ssh://git.samba.org/data/git/ctdb

(This used to be ctdb commit a1b3661973489f0111e7975fec422fb99a25f0c8)

9 years agoAdd a text about "ban" "unban" not being permanent and that recovery daemon can auto...
Ronnie Sahlberg [Fri, 8 Jul 2011 21:14:32 +0000 (07:14 +1000)]
Add a text about "ban" "unban" not being permanent and that recovery daemon can auto unban nodes. Suggest using "stop" / "continue" instead.

(This used to be ctdb commit 8e30dffad5b1385818b2d7350d6c3767a220d745)

9 years agoweb: correctly terminate list items <li> with </li> instead of with <br>
Michael Adam [Fri, 8 Jul 2011 08:07:42 +0000 (10:07 +0200)]
web: correctly terminate list items <li> with </li> instead of with <br>

(This used to be ctdb commit 3f698e69a56305c5ec27b8d119bf2d57d5cd2ec6)

9 years agoweb: add Stefan Metzmacher to the list of CTDB developers.
Michael Adam [Fri, 8 Jul 2011 08:06:14 +0000 (10:06 +0200)]
web: add Stefan Metzmacher to the list of CTDB developers.

(This used to be ctdb commit 912a33cebe7c51b33cda2e6d5f2b3a481fa7fd49)

9 years agoWhen trying to re-balance the ip assignment and shuffle ips from
Ronnie Sahlberg [Tue, 3 Aug 2010 03:34:27 +0000 (13:34 +1000)]
When trying to re-balance the ip assignment and shuffle ips from
nodes with many addresses to nodes with few addresses,
loop up to num_ips+5 times instead of only 5 times.

When we have very many public ips per node, we might need to loop more than
5 times or else we will exit without reaching optimal balance.
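
A bounded rebalancing loop of this kind might look like the sketch below. It is a simplified Python illustration, not the ctdb code: the rebalance function is hypothetical, it assumes any IP may be hosted on any node, and it moves one IP per pass.

```python
def rebalance(allocation):
    """Move IPs from the fullest node to the emptiest, one per pass.

    allocation maps node -> list of IPs and is modified in place.
    Returns the number of passes that moved an IP.
    """
    if not allocation:
        return 0
    num_ips = sum(len(ips) for ips in allocation.values())
    max_passes = num_ips + 5  # bound scales with the number of IPs
    moves = 0
    for _ in range(max_passes):
        fullest = max(allocation, key=lambda n: len(allocation[n]))
        emptiest = min(allocation, key=lambda n: len(allocation[n]))
        # Balanced once no node holds two or more IPs above another.
        if len(allocation[fullest]) - len(allocation[emptiest]) < 2:
            break
        allocation[emptiest].append(allocation[fullest].pop())
        moves += 1
    return moves
```

With 20 addresses initially on one of three nodes, this sketch needs 13 passes to settle, which a fixed limit of 5 would cut short.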

(This used to be ctdb commit aa8114a625a637277561a66c80bdece3c27e9e20)

9 years agoAdd log output to wipedb and backupdb
Ronnie Sahlberg [Mon, 4 Jul 2011 20:29:00 +0000 (06:29 +1000)]
Add log output to wipedb and backupdb
CQ S1025379

(This used to be ctdb commit 6f51d4a75f8a9f2cdb8ecde946ed31809ab5a415)

9 years agochange the name for the key for the record where we store the public address config...
Ronnie Sahlberg [Tue, 28 Jun 2011 05:39:38 +0000 (15:39 +1000)]
change the name for the key for the record where we store the public address config from public-addresses... to public_addresses...


(This used to be ctdb commit 114d5034ff4880848588caf493382a537a1469ae)

9 years agoclient: handle transient connection errors
David Disseldorp [Tue, 5 Apr 2011 11:26:29 +0000 (13:26 +0200)]
client: handle transient connection errors

Client connections to the ctdbd unix domain socket may fail
intermittently while the server is under heavy load. This change
introduces a client connect retry loop.

During failure the client will retry for a maximum of 64 seconds; the
ctdb --timelimit option can be used to cap client runtime.
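
A connect retry loop with an overall time budget can be sketched as follows. This Python sketch is illustrative only: connect_with_retry is a hypothetical name, the 64 second budget comes from the text above, and the 1 second interval between attempts is an assumption.

```python
import time


def connect_with_retry(connect, max_wait=64.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds or max_wait seconds have passed."""
    deadline = clock() + max_wait
    while True:
        try:
            return connect()
        except OSError:
            # Treat the failure as transient until the time budget is spent.
            if clock() + interval > deadline:
                raise
            sleep(interval)
```

The clock and sleep parameters exist only so the loop can be exercised without real delays.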

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit dc0c58547cd4b20a8e2cd21f3c8363f34fd03e75)

9 years agoManpage for ping_pong
Mathieu Parent [Sat, 26 Mar 2011 10:55:30 +0000 (11:55 +0100)]
Manpage for ping_pong

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit af75d3e37412e03d3978073edbe6dee78f265c3c)

9 years agoonnode: fix natgwlist nodespec
Martin Schwenke [Mon, 23 May 2011 05:33:12 +0000 (15:33 +1000)]
onnode: fix natgwlist nodespec

This hasn't worked for a while if ever.

We treat this case specially because the output has 2 words on the 1st
line.  We also handle the error case where /etc/ctdb_natgw_nodes
exists but none of the other $NATGW_* configuration is done.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 66e89797c7866d207a5bbf1836f52d70dba7cea6)

9 years agoonnode: fix get_nodes_with_status()
Martin Schwenke [Mon, 23 May 2011 05:24:52 +0000 (15:24 +1000)]
onnode: fix get_nodes_with_status()

Setting IFS and looping through items with colons in them doesn't work.
Change this to read through the output line by line.  The header line
needs to be thrown away by throwing away everything up to the 1st

Keep stderr from the "ctdb status" command, otherwise debugging is

On error, append any output from ctdb to onnode's error message.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d60592cf99999f10344a05ef0571fb300bb9d97c)

9 years agoonnode: Remove an unnecessary comment.
Martin Schwenke [Tue, 17 May 2011 04:26:55 +0000 (14:26 +1000)]
onnode: Remove an unnecessary comment.

The comment about $CTDB_NODES_SOCKETS is meaningless.  The code it
refers to works just fine with $CTDB_NODES_SOCKETS.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 74e69a564bac653dadfffe8b08145b9b3be16e61)

9 years agoonnode: Future-proof get_nodes_with_status().
Martin Schwenke [Tue, 17 May 2011 04:24:30 +0000 (14:24 +1000)]
onnode: Future-proof get_nodes_with_status().

The current code requires knowledge of the number of status bits
output by "ctdb status -Y".

This changes the code to be completely general.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e1788f25fde3d1f26bf4831a331741aa280f6fbc)

9 years agoonnode: Exit with error for unknown command-line flags.
Martin Schwenke [Tue, 17 May 2011 03:25:08 +0000 (13:25 +1000)]
onnode: Exit with error for unknown command-line flags.

Use of "local" was masking errors in command-line processing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ca80adda7517b43147ef30156ae34c66b29fa2bd)

9 years agoonnode: Be defensive when listing IPs of nodes with designated status.
Martin Schwenke [Tue, 17 May 2011 03:20:51 +0000 (13:20 +1000)]
onnode: Be defensive when listing IPs of nodes with designated status.

The current version gives the last item left after stripping the known
fields.  If an insufficient number of status fields is stripped then
this would return a residual status field value, which turned out to
be a valid IP address for localhost...  so no error occurs.

This change means that the node number is stripped and any residual
status field value will stay appended, causing an error the first time
this command is tested.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 74715e6ec7b67c6f0e863aa51c87279758d6bf91)

9 years agoonnode - Fix long standing bug in onnode healthy/ok/connected/con.
Martin Schwenke [Tue, 17 May 2011 03:18:11 +0000 (13:18 +1000)]
onnode - Fix long standing bug in onnode healthy/ok/connected/con.

When the output of "ctdb status -Y" changed to add an extra status
column we didn't fix onnode.

This adds a match for the extra column.

Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 793febaebd3d484ddfbbcb47aaa0cdf3cfc1a00d)

9 years agoFix bashism
Mathieu Parent [Wed, 23 Mar 2011 21:16:18 +0000 (22:16 +0100)]
Fix bashism

... again ;-)

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 2266586c1839af032622be54dc7f71e39d2bd9ef)

9 years agoMerge branch 'master' of ssh://git.samba.org/data/git/ctdb
Ronnie Sahlberg [Thu, 12 May 2011 08:58:07 +0000 (18:58 +1000)]
Merge branch 'master' of ssh://git.samba.org/data/git/ctdb

(This used to be ctdb commit 307e915459c26a728a1ec16bd735d983d493df53)

9 years agoWhen using multiple VLANs, some funky stuff can sometimes happen when
Ronnie Sahlberg [Thu, 12 May 2011 00:24:46 +0000 (10:24 +1000)]
When using multiple VLANs, some funky stuff can sometimes happen when
adding/removing IP addresses, causing routes to be dropped by the system.

The easiest workaround for this is to unconditionally try to reapply
all static routes for all interfaces once ipreallocation has finished,
not just adding them back on the affected interface.

This works around a funky issue in
CQ S1023538

(This used to be ctdb commit 84600d1f53632d5fe76c308727f31f61b5ec1010)

9 years agodoc: regenerate ctdb docs
Michael Adam [Wed, 11 May 2011 14:13:47 +0000 (16:13 +0200)]
doc: regenerate ctdb docs

(This used to be ctdb commit 2d67186e5acd5aa8cb3eb1f4fbd4a41153c52e96)

9 years agodoc/ctdb.1.xml: update listvars documentation
Luk Claes [Wed, 11 May 2011 15:53:59 +0000 (17:53 +0200)]
doc/ctdb.1.xml: update listvars documentation

Signed-off-by: Luk Claes <luk@debian.org>
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit afd96d5990815019b1f9ddc8b78a05f86eca0421)

9 years agodoc: regenerate ctdb docs
Luk Claes [Wed, 11 May 2011 14:13:47 +0000 (16:13 +0200)]
doc: regenerate ctdb docs

Signed-off-by: Luk Claes <luk@debian.org>
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 39f595cad5321c64e2b1e72fe7b4bbb720f4b906)

9 years agodoc/ctdb.1.xml: Fix typo
Luk Claes [Wed, 11 May 2011 14:11:40 +0000 (16:11 +0200)]
doc/ctdb.1.xml: Fix typo


Bug 8124

Signed-off-by: Luk Claes <luk@debian.org>
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit a6d2f1bd552dba33640acb7a0b8110534debd4ce)

9 years agoRemove all checking of GPFS from ctdb_diagnostics
Ronnie Sahlberg [Wed, 11 May 2011 09:50:09 +0000 (19:50 +1000)]
Remove all checking of GPFS from ctdb_diagnostics

CQ S1023524

(This used to be ctdb commit 4cddba08b46db0a56a86b32403a41b89cd097317)

9 years agoIf samba fails to start for some reason, make this cause the startup event to fail...
Ronnie Sahlberg [Mon, 9 May 2011 22:25:27 +0000 (08:25 +1000)]
If samba fails to start for some reason, make this cause the startup event to fail too, so that ctdbd will re-try the startup event later.
Otherwise this will leave samba not running.

CQ S1023394

(This used to be ctdb commit f90485b08d32cbe56050718a3b28ca0fe1d64e0f)

9 years agoDont exit from checking interfaces once we have found one interface that is not
Ronnie Sahlberg [Mon, 9 May 2011 20:19:34 +0000 (06:19 +1000)]
Don't exit from checking interfaces once we have found one interface that is not
in use by public addresses. This can happen when we have removed existing interfaces/IP addresses and prevents us from verifying the status of other interfaces.

(This used to be ctdb commit d67955b42f7627be9dae995230c8fcbb8a948ec2)

9 years agoRemove logging of spam/errors from the 10.interface
Ronnie Sahlberg [Sun, 8 May 2011 20:35:33 +0000 (06:35 +1000)]
Remove logging of spam/errors from the 10.interface
script if/when we have for example NATGW configured but no public addresses defined on that interface

CQ S1023378

(This used to be ctdb commit 8837daa424732aeb5a20814b1709c345a97a0e09)

9 years agopackaging: add ltdbtool and its manpage to the RPM
Michael Adam [Wed, 4 May 2011 12:28:26 +0000 (14:28 +0200)]
packaging: add ltdbtool and its manpage to the RPM

(This used to be ctdb commit ce6409dc7d059701f0fe4b57e7c05c38c66629c5)

9 years agoinstall the ltdbtool manpage with "make install"
Michael Adam [Wed, 4 May 2011 12:25:48 +0000 (14:25 +0200)]
install the ltdbtool manpage with "make install"

(This used to be ctdb commit ffbff1affed8301831387e23b4f8f824d9f78e20)

9 years agoinstall ltdbtool with "make install"
Michael Adam [Wed, 4 May 2011 11:44:59 +0000 (13:44 +0200)]
install ltdbtool with "make install"

(This used to be ctdb commit 991ea66e5ed0eb7ab256dc8e3118dc78462d4752)

9 years agobuild "ltdbtool" in "make all"
Michael Adam [Wed, 4 May 2011 11:44:10 +0000 (13:44 +0200)]
build "ltdbtool" in "make all"

(This used to be ctdb commit d91e80c698a7706460e9ee74bd4f5a9ab0a7b9b1)

9 years agoltdbtool: add manpage html + roff
Gregor Beck [Wed, 4 May 2011 12:17:04 +0000 (14:17 +0200)]
ltdbtool: add manpage html + roff

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 992baa4215bfc1b29fd153ccb7c42bb0cb66fa4f)

9 years agoltdbtool: add manpage
Gregor Beck [Wed, 4 May 2011 12:14:54 +0000 (14:14 +0200)]
ltdbtool: add manpage

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 2ed3603274cd38dde4ae98eef653e9a9de631eb5)

9 years agoadd ltdbtool - a standalone ltdb tool
Gregor Beck [Thu, 14 Apr 2011 10:51:59 +0000 (12:51 +0200)]
add ltdbtool - a standalone ltdb tool

This is a tool to handle (dump and convert) ctdb's local tdb
copies (ltdbs) without connecting to a ctdb daemon.

It can be used to

* dump the contents of a ltdb, printing
  the ctdb record header information

* dump a non-clustered tdb database (like tdbdump)

* convert between an ltdb and a non-clustered tdb
  (adding or removing ctdb headers)

* convert between 64 and 32 bit ltdbs
  (the ctdb record headers differ by 4 bytes of padding)

usage: bin/ltdbtool dump [-p] [-s{0|32|64}] <idb>
       bin/ltdbtool convert [-s{0|32|64}] [-o{0|32|64}] <idb> <odb>

Pair-Programmed-With: Michael Adam <obnox@samba.org>

(This used to be ctdb commit efcf2815711cd5371633614fb91273bd0a786da0)

9 years agoctdb catdb: fix escaping of '"' and '\'
Gregor Beck [Thu, 14 Apr 2011 10:55:57 +0000 (12:55 +0200)]
ctdb catdb: fix escaping of '"' and '\'

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 2b5cb0841fd813cd54be170c305a828885e0f038)

9 years agoDont call the UPDATE event if both old and new interface is the same.
Ronnie Sahlberg [Wed, 4 May 2011 01:34:17 +0000 (11:34 +1000)]
Dont call the UPDATE event if both old and new interface is the same.

CQ S1018175

(This used to be ctdb commit 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7)

9 years agoCleanup of logging messages/spamming
Ronnie Sahlberg [Tue, 3 May 2011 22:54:02 +0000 (08:54 +1000)]
Cleanup of logging messages/spamming

Reduce an informational message about not performing ip reallocation
from NOTICE (the default) to INFO.
These messages are normal during startup or when stopped/banned when
we will be in recovery mode for a while.

Remove a message in the loop waiting for initial startup to complete about
the generation being invalid. It is always invalid at this stage before we have
finished initial recovery.

Rate-limit the informational messages for CTDB_WAIT_UNTIL_RECOVERED
so that we only print them once per second for the first 60 seconds and
after that only once per 10 minutes.
These messages are normal during startup, but we should not be logging
them every second in cases where we remain in recovery mode during
startup for an extended period of time, such as when suspended or
permabanned.
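
The rate limit described above can be sketched as a small decision function. This is an illustrative Python sketch, not the ctdb code: should_log and its parameters are hypothetical names, and time is tracked as whole seconds since the wait began.

```python
def should_log(elapsed, last_logged):
    """Decide whether to emit the 'waiting for recovery' message.

    elapsed:     seconds since the wait began
    last_logged: value of elapsed at the previous emission, or None
    """
    if last_logged is None:
        return True
    # Once per second for the first minute, then once per 10 minutes.
    interval = 1 if elapsed < 60 else 600
    return elapsed - last_logged >= interval
```

Called once per second, this logs on every tick during the first minute and then falls back to one message every 600 seconds.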

CQ S1023302

(This used to be ctdb commit 3a0af8780dc595acbed880f288fcbc4f62c862fb)

9 years agobonding mode 4 monitoring:
Ronnie Sahlberg [Tue, 12 Apr 2011 21:51:36 +0000 (07:51 +1000)]
bonding mode 4 monitoring:
We cannot just check if MII Status is up for bonding mode 4, since the kernel will always report the bond device as UP
even if all cables are disconnected.

For mode 4, ignore the status of the bond device and instead check if at least one slave interface is up
when determining if the device is good or bad.
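
The mode 4 check can be sketched as a parser over the text of /proc/net/bonding/<bond>. This Python sketch is not the eventscript itself: bond_is_up is a hypothetical name, and the field names ("Bonding Mode", "Slave Interface", "MII Status") and the "802.3ad" mode string are assumed from the usual output of the Linux bonding driver.

```python
def bond_is_up(proc_text):
    """Judge bond health from the text of /proc/net/bonding/<bond>."""
    mode = ""
    bond_status = None
    slave_statuses = []
    in_slave = False
    for line in proc_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            in_slave = True  # later MII Status lines belong to slaves
        elif key == "MII Status":
            if in_slave:
                slave_statuses.append(value)
            elif bond_status is None:
                bond_status = value
    if "802.3ad" in mode:
        # Mode 4: ignore the bond's own status, require one slave up.
        return "up" in slave_statuses
    return bond_status == "up"
```

For other bonding modes the sketch keeps trusting the bond device's own MII Status.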

(This used to be ctdb commit a6930cec6d9503dba18b9d4839d87a1c1a8ddba2)

9 years agoIf the eventscript is finished but state->ctdb is NULL,
Ronnie Sahlberg [Mon, 11 Apr 2011 19:24:43 +0000 (05:24 +1000)]
If the eventscript is finished but state->ctdb is NULL,
log an error and return.

(Need to find the root cause for this too.)

(This used to be ctdb commit 2e80d53b73fcba58ed5a72bab66c051691ccf719)

9 years agoIFACE handling. Assume links are always good on startup (they almost always
Ronnie Sahlberg [Sun, 10 Apr 2011 19:56:14 +0000 (05:56 +1000)]
IFACE handling. Assume links are always good on startup (they almost always

Simplify the handling of setting the links in the 10.interface eventscript
and remove the optimization to only call setifacelink on state change
to make the code simpler to read.

If a take ip event fails, flag the node as unhealthy.

Add a check to the interface script to check if the interface exists
or if it has been deleted, so that we can capture this and become
UNHEALTHY if someone deletes an interface we are using to host public
addresses.
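
One way to test whether an interface still exists on Linux is to look for its entry under /sys/class/net. The Python sketch below illustrates that idea only; it is not how the eventscript is written, and interface_exists and missing_interfaces are hypothetical names.

```python
import os


def interface_exists(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel currently exposes the named interface."""
    return os.path.isdir(os.path.join(sysfs_root, iface))


def missing_interfaces(ifaces, sysfs_root="/sys/class/net"):
    """List the interfaces that no longer exist; non-empty means unhealthy."""
    return [i for i in ifaces if not interface_exists(i, sysfs_root)]
```

The sysfs_root parameter is there so the check can be pointed at a scratch directory for testing.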

(This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)

9 years agoweb: use the new git repository url on the download page
David Disseldorp [Tue, 29 Mar 2011 09:08:39 +0000 (11:08 +0200)]
web: use the new git repository url on the download page

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit b36818888fac7ebbed26fcdd2dd1d426e3d2f8f0)

9 years agoNATGW: dont set arp_ignore in 11.natgw anymore since we no longer
Ronnie Sahlberg [Wed, 6 Apr 2011 00:26:27 +0000 (10:26 +1000)]
NATGW: dont set arp_ignore in 11.natgw anymore since we no longer
need this for the natgw functionality

(This used to be ctdb commit bf3bf2967e3781c918e33b3a210e68e0ccca0c51)

9 years agopackaging: remove the dependency to tdbtool and tdbdump from the spec file
Michael Adam [Tue, 5 Apr 2011 11:58:09 +0000 (13:58 +0200)]
packaging: remove the dependency to tdbtool and tdbdump from the spec file

The init script now checks for the availability of tdbdump
and "tdbtool check" and issues warnings if they are not available.
This can remove a dependency loop with building samba RPMs.

(This used to be ctdb commit c7652c4038e012b7ef9bc1da352dd2c02d60dc29)

9 years agoctdb.init: print a warning when tdbdump is found but tdbtool or "tdbtool check" is...
Michael Adam [Tue, 5 Apr 2011 11:50:00 +0000 (13:50 +0200)]
ctdb.init: print a warning when tdbdump is found but tdbtool or "tdbtool check" is not available

(This used to be ctdb commit afb26e38b617b85cdac14a7cd6dd3c85b8fddbc4)

9 years agoctdb.init: check for availability of "tdbtool check" and "tdbdump"
Michael Adam [Tue, 5 Apr 2011 11:43:56 +0000 (13:43 +0200)]
ctdb.init: check for availability of "tdbtool check" and "tdbdump"

Print a warning if neither is available.

(This used to be ctdb commit 4137d2a7d31cdce22847cebfc0239cfe2d8e937c)

9 years agoCorrection of spelling errors
Mathieu Parent [Tue, 22 Mar 2011 23:16:27 +0000 (00:16 +0100)]
Correction of spelling errors

* continous -> continuous
* activete  -> activate

(thanks to lintian)

See https://bugzilla.samba.org/show_bug.cgi?id=6935

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit fb6987c2f747d6dbf9bb3899a480124d1c242a90)

9 years agoThis needs more testing first
Ronnie Sahlberg [Mon, 21 Mar 2011 03:25:53 +0000 (14:25 +1100)]
This needs more testing first

Revert "ctdbd: call tdb_reopen_all() in freeze child."

This reverts commit 3d9828861c771a060923f3181fa8224e0122bffc.

(This used to be ctdb commit 55c3446c9ba82d24b1d7db92bc3611fd8027b7fb)

9 years agoctdbd: call tdb_reopen_all() in freeze child.
Rusty Russell [Mon, 21 Mar 2011 02:37:17 +0000 (13:07 +1030)]
ctdbd: call tdb_reopen_all() in freeze child.

In theory, the ctdbd parent shouldn't be holding any locks, but it's a good
idea to always call tdb_reopen_all() after a fork().

(This used to be ctdb commit 3d9828861c771a060923f3181fa8224e0122bffc)

9 years agoctdbd: fix lock held on error ("ctdb_req_dmaster from non-master.")
Rusty Russell [Mon, 21 Mar 2011 02:33:01 +0000 (13:03 +1030)]
ctdbd: fix lock held on error ("ctdb_req_dmaster from non-master.")

We should release the lock on the record before returning; otherwise the
recovery (which tries to freeze the database) will fail.  Symptoms are as

ctdbd: pnn 15 dmaster request for new-dmaster 19 from non-master 1 real-dmaster=5 key f049c3c8 dbid 0x6cf2837d gen=1148812532 curgen=1148812532 c->rsn=2 header.rsn=15 reqid=2147483585 keyval=0x4f464e49
ctdbd: ctdb_req_dmaster from non-master. Force a recovery.
ctdbd: freeze_lock-1:server/ctdb_freeze.c:55 Failed to lock database registry.tdb


(This used to be ctdb commit 38b2dbe0605816742e74e2b8a811eaba99c7e12d)

9 years agoDeferred attach: create the timed event as a child context of the da context we want...
Ronnie Sahlberg [Wed, 16 Mar 2011 03:55:58 +0000 (14:55 +1100)]
Deferred attach: create the timed event as a child context of the da context we want to delete.
Otherwise the da context can be timed out and talloc_free()d
but the event for this already freed object will still trigger,
causing a talloc error and shutdown.

CQ S1022515

(This used to be ctdb commit 2fd27bdedb1e0d6558c07e1b74fc8e70ddf593dc)

9 years agoIP reallocation. If a public address is already hosted on the node when we startup...
Ronnie Sahlberg [Sun, 13 Mar 2011 22:55:28 +0000 (09:55 +1100)]
IP reallocation. If a public address is already hosted on the node when we startup, log a warning message but do not cause the recovery to fail.

CQ S1022356

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 89f8169c24da96c1fdd0ac19b8a1e0e1df01a72a)

9 years agoVacuuming: initialize a variable to avoid a harmless valgrind hit
Ronnie Sahlberg [Sun, 13 Mar 2011 00:30:52 +0000 (11:30 +1100)]
Vacuuming: initialize a variable to avoid a harmless valgrind hit

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit ad709e99bcad7a4884f2336663d161ba61307ae5)

9 years agoDon't allow clients to connect to databases until we are well past and through
Ronnie Sahlberg [Fri, 11 Mar 2011 22:42:07 +0000 (09:42 +1100)]
Don't allow clients to connect to databases until we are well past and through
the initial recovery phase

CQ S1022412

Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit e02bbd915b7151c615ff64f09ad9abc9720bef7d)

9 years agovacuum: fix a comment typo
Michael Adam [Fri, 11 Mar 2011 15:05:44 +0000 (16:05 +0100)]
vacuum: fix a comment typo

(This used to be ctdb commit a16dc65b4602da5ce2c16578bec2e7882aff240d)

9 years agovacuum: use insert_record_into_delete_queue in ctdb_local_schedule_for_deletion.
Michael Adam [Fri, 11 Mar 2011 14:57:45 +0000 (15:57 +0100)]
vacuum: use insert_record_into_delete_queue in ctdb_local_schedule_for_deletion.

This is to take advantage of the hash collision handling and logging
also in ctdb_local_schedule_for_deletion.

(This used to be ctdb commit 52193b6692091e341ed7a81dbd9a61ae49a8aac5)

9 years agovacuum: refactor insert_record_into_delete_queue out of ctdb_control_schedule_for_del...
Michael Adam [Fri, 11 Mar 2011 14:55:52 +0000 (15:55 +0100)]
vacuum: refactor insert_record_into_delete_queue out of ctdb_control_schedule_for_deletion

(This used to be ctdb commit be4b63ee18933524f780df5c313447e5ef0786d1)

9 years agovacuum: raise a debug level from INFO to DEBUG
Michael Adam [Fri, 11 Mar 2011 13:57:15 +0000 (14:57 +0100)]
vacuum: raise a debug level from INFO to DEBUG

when overwriting an existing entry in the delete_queue.

(This used to be ctdb commit f28e636cc4a04ef982672d5f569ad6b6b963db1f)

9 years agoctdb_ltdb_store_server: honour the AUTOMATIC record flag
Michael Adam [Thu, 3 Feb 2011 15:32:23 +0000 (16:32 +0100)]
ctdb_ltdb_store_server: honour the AUTOMATIC record flag

Do not delete empty records that carry this flag but store
them and schedule them for deletion. Do not store the flag
in the ltdb though, since this is internal only and should not
be visible to the client.
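
The handling of the flag on store can be sketched as below. This Python sketch covers only the AUTOMATIC case and is not the ctdb code: store_record is a hypothetical name, the ltdb is modelled as a dict and the delete queue as a set, and the bit value chosen for the flag is made up for illustration.

```python
REC_FLAG_AUTOMATIC = 0x1  # hypothetical bit value, for illustration only


def store_record(ltdb, delete_queue, key, flags, data):
    """Store a record, keeping the internal AUTOMATIC flag out of the ltdb."""
    stored_flags = flags & ~REC_FLAG_AUTOMATIC
    ltdb[key] = (stored_flags, data)
    if flags & REC_FLAG_AUTOMATIC and not data:
        # Empty automatic records are kept but queued for later deletion.
        delete_queue.add(key)
```

Masking the flag before the write is what keeps it internal: a later reader of the stored record never sees it.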

(This used to be ctdb commit f898ff21fa338358179e79381215b13a6bc77c53)

9 years agoltdb: add the CTDB_REC_FLAG_AUTOMATIC to the initial header in ctdb_ltdb_fetch()
Michael Adam [Thu, 3 Feb 2011 15:30:52 +0000 (16:30 +0100)]
ltdb: add the CTDB_REC_FLAG_AUTOMATIC to the initial header in ctdb_ltdb_fetch()

Signals that this record was not created by a client level store.

(This used to be ctdb commit 69d34983a37b0324ff7610b8dfdcd8d13bf81c54)

9 years agoctdb_private.h: add record flag CTDB_REC_FLAG_AUTOMATIC
Michael Adam [Thu, 3 Feb 2011 15:27:42 +0000 (16:27 +0100)]
ctdb_private.h: add record flag CTDB_REC_FLAG_AUTOMATIC

This is a flag that shall signal that a record has been automatically generated by ctdb
and not by an explicit client store operation. This will be used in the ctdb_ltdb_fetch
operation which stores an empty record with default initial header before trying to
migrate the record from the dmaster when the record does not exist in the local tdb.

(This used to be ctdb commit 46381a3cb58ccc11422af8f7798c80ea8d72294f)

9 years agoctdb_ltdb_store_server: add ability to send SCHEDULE_FOR_DELETION control to ctdb_ltd...
Michael Adam [Tue, 28 Dec 2010 12:19:22 +0000 (13:19 +0100)]
ctdb_ltdb_store_server: add ability to send SCHEDULE_FOR_DELETION control to ctdb_ltdb_store.

(This used to be ctdb commit ab2711701999a5ecc23a36b3d9ba8e94f92e4c87)

9 years agoctdb_ltdb_store_server: Improve debug message in ctdb_ltdb_store when store or delete...
Michael Adam [Tue, 21 Dec 2010 17:08:11 +0000 (18:08 +0100)]
ctdb_ltdb_store_server: Improve debug message in ctdb_ltdb_store when store or delete fails.

(This used to be ctdb commit 2559b2a45eb11834da3b0e0963e24351c8b7477f)

9 years agoctdb_ltdb_store_server: always store the data when ctdb_ltdb_store() is called from...
Michael Adam [Tue, 21 Dec 2010 16:50:52 +0000 (17:50 +0100)]
ctdb_ltdb_store_server: always store the data when ctdb_ltdb_store() is called from the client

This also fixes a segfault since ctdb_lmaster uses the vnn_map.

(This used to be ctdb commit e58c8f51f27e468897af5210b80e5f5f45c3c4bb)

9 years agoctdb_ltdb_store_server: implement fastpath vacuuming deletion based on VACUUM_MIGRATE...
Michael Adam [Fri, 10 Dec 2010 13:13:50 +0000 (14:13 +0100)]
ctdb_ltdb_store_server: implement fastpath vacuuming deletion based on VACUUM_MIGRATED flag.

When the record has been obtained by the lmaster as part of the vacuuming-fetch
handler and it is empty and has never been migrated with data, then such records
are deleted instead of being stored. These records have automatically been
deleted when leaving the former dmaster, so that they vanish for good when
hitting the lmaster in this way. This will reduce the load on traditional

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit c9b65f3602f51bcbf0e6d82c12076c31e4aebe38)

9 years agoctdb_ltdb_store_server: delete an empty record that is safe to delete instead of...
Michael Adam [Fri, 3 Dec 2010 14:29:21 +0000 (15:29 +0100)]
ctdb_ltdb_store_server: delete an empty record that is safe to delete instead of storing locally.

When storing a record that is being migrated off to another node
and has never been migrated with data, then we can safely delete it
from the local tdb instead of storing the record with empty data.

Note: This record is not deleted if we are its lmaster or dmaster.
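
The conditions above can be collected into a single predicate, sketched here in Python. This is an illustration of the stated rules rather than the ctdb code: the Record type, its field names and can_delete_instead_of_store are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    data: bytes
    migrated_with_data: bool  # hypothetical name: has it ever carried data?
    dmaster: int              # node that owns the record after this store


def can_delete_instead_of_store(rec, this_node, lmaster):
    """True if an empty record being migrated away can simply be dropped."""
    return (
        len(rec.data) == 0
        and not rec.migrated_with_data
        and rec.dmaster != this_node   # it is being handed to another node
        and lmaster != this_node       # we are not its location master
    )
```

If any one of the four conditions fails, the record is stored as before.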

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 3cca0d4b48325d86de2cb0b44bb7811a30701352)

9 years agoserver: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs
Michael Adam [Thu, 30 Dec 2010 17:19:32 +0000 (18:19 +0100)]
server: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs

This is realized by adding a ctdb_ltdb_store_fn function pointer to the db
context and filling it in the attach procedure for non-persistent dbs.

(This used to be ctdb commit df49ec44de80affa5ccc637dec12a20a26e8706e)

9 years agoserver: create a server variant ctdb_ltdb_store_server() of ctdb_ltdb_store().
Michael Adam [Thu, 30 Dec 2010 16:44:51 +0000 (17:44 +0100)]
server: create a server variant ctdb_ltdb_store_server() of ctdb_ltdb_store().

This is supposed to contain logic for deleting records that are safe
to delete and scheduling records for deletion. It will be called in
server context for non-persistent databases instead of the standard
ctdb_ltdb_store() function.

(This used to be ctdb commit 23631ffc152486aed9ce5b69a391e52bc4947833)

9 years agodaemon: fill ctdb->ctdbd_pid early
Michael Adam [Tue, 28 Dec 2010 12:14:23 +0000 (13:14 +0100)]
daemon: fill ctdb->ctdbd_pid early

(This used to be ctdb commit 3da1e2e30bf34622f08e6ecd5b8fe55684e5007a)

9 years agotest: send SCHEDULE_FOR_DELETION control from randrec test.
Michael Adam [Tue, 21 Dec 2010 14:29:46 +0000 (15:29 +0100)]
test: send SCHEDULE_FOR_DELETION control from randrec test.

(This used to be ctdb commit 30aa55b3efc6fbd4078f93da386b6aeb337c1a0c)

9 years agoclient: add accessor function ctdb_header_from_record_handle().
Michael Adam [Tue, 21 Dec 2010 14:29:23 +0000 (15:29 +0100)]
client: add accessor function ctdb_header_from_record_handle().

(This used to be ctdb commit cf57efd440ccc3db381386f4749bfcbf8ac5ecae)

9 years agovacuum: add ctdb_local_schedule_for_deletion()
Michael Adam [Tue, 28 Dec 2010 12:13:34 +0000 (13:13 +0100)]
vacuum: add ctdb_local_schedule_for_deletion()

(This used to be ctdb commit b70bc141d84f7355d2c6c901961b7366db566980)

9 years agoserver: implement a new control SCHEDULE_FOR_DELETION to fill the delete_queue.
Michael Adam [Tue, 21 Dec 2010 13:25:48 +0000 (14:25 +0100)]
server: implement a new control SCHEDULE_FOR_DELETION to fill the delete_queue.

(This used to be ctdb commit 680223074e992b32ccf6f42cb80c3fa93074fee7)

9 years agocontrol: add a new control opcode CTDB_CONTROL_SCHEDULE_FOR_DELETION
Michael Adam [Tue, 8 Mar 2011 23:57:55 +0000 (00:57 +0100)]
control: add a new control opcode CTDB_CONTROL_SCHEDULE_FOR_DELETION

(This used to be ctdb commit 4cebfa33db3c7effa087f753530c52b2dd8550e6)

9 years agocontrol: add macro CHECK_CONTROL_MIN_DATA_SIZE.
Michael Adam [Tue, 8 Mar 2011 23:56:25 +0000 (00:56 +0100)]
control: add macro CHECK_CONTROL_MIN_DATA_SIZE.

This is for the control dispatcher to check whether the input data has
a required minimum size.
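The idea behind such a check can be sketched as follows. This is a Python illustration, not the C macro itself: the function names, the return convention (0 for success, -1 for failure) and the 8-byte minimum are assumptions made for this example.

```python
def check_control_min_data_size(data: bytes, min_size: int) -> int:
    """Return 0 if data holds at least min_size bytes, -1 otherwise."""
    if len(data) < min_size:
        return -1
    return 0


# Hypothetical minimum payload size for an example control.
EXAMPLE_MIN_SIZE = 8


def dispatch_example_control(data: bytes) -> int:
    """Reject short input before the control handler touches it."""
    if check_control_min_data_size(data, EXAMPLE_MIN_SIZE) != 0:
        return -1
    # A real handler would parse and act on the payload here.
    return 0
```

Checking a minimum rather than an exact size suits controls whose payload is a fixed header followed by variable-length data.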

(This used to be ctdb commit 2038e745db33cc5c3b4e2db8a00a57ede03906a2)

9 years agovacuum: lower level of hash collision debug message to INFO
Michael Adam [Thu, 23 Dec 2010 10:54:09 +0000 (11:54 +0100)]
vacuum: lower level of hash collision debug message to INFO

(This used to be ctdb commit b9bdef46fedfbc543263b67cfee3e896773cd8e8)

9 years agovacuum: add statistics output to the fast and full traverse runs.
Michael Adam [Wed, 22 Dec 2010 23:27:27 +0000 (00:27 +0100)]
vacuum: add statistics output to the fast and full traverse runs.

(This used to be ctdb commit 3addd28aa73883b3b05888e309d19db0eb67eab9)

9 years agovacuum: refactor insert_delete_record_data_into_tree() out of add_record_to_delete_tree()
Michael Adam [Tue, 21 Dec 2010 13:19:00 +0000 (14:19 +0100)]
vacuum: refactor insert_delete_record_data_into_tree() out of add_record_to_delete_tree()

for reuse in filling the delete_queue.

(This used to be ctdb commit 7bbb12695c24da25671f1c39a411295d35870d2c)

9 years agovacuum: change all Vacuum*Interval tunables to default to 10
Michael Adam [Mon, 20 Dec 2010 20:43:41 +0000 (21:43 +0100)]
vacuum: change all Vacuum*Interval tunables to default to 10

So, by default we have a fast-path vacuuming run every 10 seconds and a
full-blown db-traverse vacuuming run once every 10 minutes.

(This used to be ctdb commit 4f0ace982dbb5b4f9c035dbf4cb0ae74cd18d81b)

9 years agovacuum: disable full db-traverse vacuuming runs when VacuumFastPathCount == 0
Michael Adam [Mon, 20 Dec 2010 20:30:39 +0000 (21:30 +0100)]
vacuum: disable full db-traverse vacuuming runs when VacuumFastPathCount == 0

(This used to be ctdb commit 571683e7c48aeed8ce41c584d016ced7ff0d2e2d)

9 years agovacuum: Only run full vacuuming (db traverse) every VacuumFastPathCount times.
Michael Adam [Mon, 20 Dec 2010 17:03:38 +0000 (18:03 +0100)]
vacuum: Only run full vacuuming (db traverse) every VacuumFastPathCount times.

(This used to be ctdb commit 23b8c8c5fc8604ee0bd6da1f4b5152277eb5f1c0)

9 years agovacuum: reset the fast path count in the event handle if it exceeds the limit.
Michael Adam [Mon, 20 Dec 2010 16:54:04 +0000 (17:54 +0100)]
vacuum: reset the fast path count in the event handle if it exceeds the limit.

(This used to be ctdb commit 91e6d36a190b1c9e4c8b18f7833e51c5c9a67574)

9 years agovacuum: bump the number of fast-path runs in the vacuum child destructor
Michael Adam [Mon, 20 Dec 2010 16:49:29 +0000 (17:49 +0100)]
vacuum: bump the number of fast-path runs in the vacuum child destructor

(This used to be ctdb commit c0668bfe0bb4e69988ae34d875568d08539e6fb9)

9 years agovacuum: add a fast_path_count to the vacuum_handle.
Michael Adam [Mon, 20 Dec 2010 16:44:02 +0000 (17:44 +0100)]
vacuum: add a fast_path_count to the vacuum_handle.

(This used to be ctdb commit 53a39d0cc5ea251c2189ec8178ccb769fa046c43)

9 years agoAdd a tunable VacuumFastPathCount.
Michael Adam [Mon, 20 Dec 2010 16:42:25 +0000 (17:42 +0100)]
Add a tunable VacuumFastPathCount.

This will control how many fast-path vacuuming runs will have to
be done before a full vacuuming will be triggered, i.e. one with
a db-traversal.
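How such a counter could drive the choice between fast-path and full runs is sketched below in Python. This is a simplified model, not the ctdb implementation: `VacuumHandle`, `limit` and `next_run()` are hypothetical names, and the exact point at which ctdb compares, bumps and resets the counter may differ.

```python
class VacuumHandle:
    def __init__(self, limit):
        self.limit = limit            # stands in for VacuumFastPathCount
        self.fast_path_count = 0      # runs since the last reset


def next_run(handle):
    """Decide the kind of the next vacuuming run and bump the counter."""
    # Reset the counter once it has exceeded the limit.
    if handle.fast_path_count > handle.limit:
        handle.fast_path_count = 0
    # A limit of 0 disables full db-traverse runs entirely.
    full = handle.limit > 0 and handle.fast_path_count == handle.limit
    # Count this run (the commits do this when the vacuum child finishes).
    handle.fast_path_count += 1
    return "full" if full else "fast"
```

With `limit` set to 3, this model yields three fast-path runs followed by one full run, repeating; with `limit` set to 0 every run is a fast-path run.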

(This used to be ctdb commit 0d997ec7e61a7bee2cb05456f9c7d5e6f7a44797)

9 years agovacuum: traverse the delete_queue before traversing the database.
Michael Adam [Mon, 20 Dec 2010 16:25:35 +0000 (17:25 +0100)]
vacuum: traverse the delete_queue before traversing the database.

(This used to be ctdb commit 04c335f9195a5fd83c91a57d06b1e4eaa511844e)

9 years agovacuum: add delete_queue_traverse() for traversal of the delete_queue.
Michael Adam [Mon, 20 Dec 2010 16:24:32 +0000 (17:24 +0100)]
vacuum: add delete_queue_traverse() for traversal of the delete_queue.

(This used to be ctdb commit 5eee05c4d256c08f4ee60a1a69efda6844e39729)

9 years agovacuum: reduce indentation in add_record_to_delete_tree()
Michael Adam [Tue, 21 Dec 2010 10:22:50 +0000 (11:22 +0100)]
vacuum: reduce indentation in add_record_to_delete_tree()

This simplifies the logical structure a bit by using early return.

(This used to be ctdb commit 4d32908fdcec120426536a761e1d0be60f076198)

9 years agovacuum: refactor new add_record_to_delete_tree() out of vacuum_traverse().
Michael Adam [Mon, 20 Dec 2010 16:11:27 +0000 (17:11 +0100)]
vacuum: refactor new add_record_to_delete_tree() out of vacuum_traverse().

This will be reused by the traversal of the delete_queue list.

(This used to be ctdb commit 4407e5a7fb045ce56b6d902f7116de663ea648cb)

9 years agovacuum: skip adding records to list of records to send to lmaster on lmaster
Michael Adam [Mon, 20 Dec 2010 15:41:13 +0000 (16:41 +0100)]
vacuum: skip adding records to list of records to send to lmaster on lmaster

This list is skipped afterwards when the lists are processed.

(This used to be ctdb commit e99834c1a2eea60f7f974c0689ae0a65cfe178ff)

9 years agovacuum: refactor new add_record_to_vacuum_fetch_list() out of vacuum_traverse().
Michael Adam [Mon, 20 Dec 2010 15:31:27 +0000 (16:31 +0100)]
vacuum: refactor new add_record_to_vacuum_fetch_list() out of vacuum_traverse().

This is the function that fills the list of records to send to each lmaster
with the VACUUM_FETCH message.

This function will be reused in the traverse function for the delete_queue.
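The grouping this function performs can be pictured roughly as below. The Python sketch is illustrative only: `build_vacuum_fetch_lists()` and the `lmaster_of` callback are hypothetical, and skipping records whose lmaster is the local node follows the related "skip adding records ... on lmaster" commit.

```python
from collections import defaultdict


def build_vacuum_fetch_lists(keys, lmaster_of, this_node):
    """Group record keys by lmaster, one list per remote lmaster."""
    lists = defaultdict(list)
    for key in keys:
        lmaster = lmaster_of(key)
        if lmaster == this_node:
            # Nothing to send: this node already is the lmaster.
            continue
        lists[lmaster].append(key)
    return dict(lists)
```

Each resulting list would then be sent to its lmaster in a single VACUUM_FETCH message.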

(This used to be ctdb commit d4ab790c1f679e833eb97816762fcfcee15ccb10)

9 years agoserver: rename ctdb_repack_db() to ctdb_vacuum_and_repack_db()
Michael Adam [Mon, 20 Dec 2010 09:55:53 +0000 (10:55 +0100)]
server: rename ctdb_repack_db() to ctdb_vacuum_and_repack_db()

(This used to be ctdb commit 6c603f85726d2efac9710af7c4875ded2ca7230e)

9 years agoWhen wiping a database, clear the delete_queue.
Michael Adam [Fri, 17 Dec 2010 01:22:02 +0000 (02:22 +0100)]
When wiping a database, clear the delete_queue.

(This used to be ctdb commit 731a6011ce4a1301f86eacb039955745f2b5d866)