Rusty Russell [Thu, 22 Apr 2010 04:23:51 +0000 (13:53 +0930)]
tdb: cleanup: rename GLOBAL_LOCK to OPEN_LOCK.
The word global is overloaded in tdb. The GLOBAL_LOCK offset is used at
open time to serialize initialization (and by the transaction code to block
open).
Rename it to OPEN_LOCK.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Imported from commit
7ab422d6fbd4f8be02838089a41f872d538ee7a7)
Rusty Russell [Thu, 22 Apr 2010 04:23:42 +0000 (13:53 +0930)]
tdb: make _tdb_transaction_cancel static.
Now that tdb_open() calls tdb_transaction_cancel() instead of
_tdb_transaction_cancel(), we can make it static.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Imported from commit
a6e0ef87d25734760fe77b87a9fd11db56760955)
Rusty Russell [Thu, 22 Apr 2010 04:23:42 +0000 (13:53 +0930)]
tdb: cleanup: split brlock and brunlock methods.
This is taken from the CCAN code base: rather than using tdb_brlock for
locking and unlocking, we split it into brlock and brunlock functions.
For extra debugging information, brunlock says what kind of lock it is
unlocking (even though fcntl locks don't need this). This requires an
extra argument to tdb_transaction_unlock() so we know whether the
lock was upgraded to a write lock or not.
We also use a "flags" argument to tdb_brlock:
1) TDB_LOCK_NOWAIT replaces lck_type = F_SETLK (vs F_SETLKW).
2) TDB_LOCK_MARK_ONLY replaces setting TDB_MARK_LOCK bit in ltype.
3) TDB_LOCK_PROBE replaces the "probe" argument.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Imported from commit
452b4a5a6efeecfb5c83475f1375ddc25bcddfbe)
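The flag mapping described in this commit can be sketched roughly as follows. This is an illustration only, not the tdb code: the names sketch_brlock and sketch_brunlock and the LOCK_* values are hypothetical, and only the fcntl() calls themselves are standard POSIX.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flag values; the real tdb definitions may differ. */
#define LOCK_NOWAIT    1  /* use F_SETLK instead of F_SETLKW */
#define LOCK_MARK_ONLY 2  /* note the lock without touching the file */
#define LOCK_PROBE     4  /* failure is expected, so do not log it */

/* Take a byte-range lock; rw_type is F_RDLCK or F_WRLCK. */
static int sketch_brlock(int fd, int rw_type, off_t offset, off_t len, int flags)
{
    struct flock fl = {0};

    if (flags & LOCK_MARK_ONLY)
        return 0;  /* bookkeeping only, no fcntl() call */

    fl.l_type = rw_type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;

    if (fcntl(fd, (flags & LOCK_NOWAIT) ? F_SETLK : F_SETLKW, &fl) == -1) {
        if (!(flags & LOCK_PROBE))
            perror("sketch_brlock");
        return -1;
    }
    return 0;
}

/* Release a byte-range lock; rw_type only feeds the debug message. */
static int sketch_brunlock(int fd, int rw_type, off_t offset, off_t len)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;

    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        fprintf(stderr, "failed to release %s lock\n",
                rw_type == F_WRLCK ? "write" : "read");
        return -1;
    }
    return 0;
}
```

Splitting the two paths this way keeps the unlock call free of the wait/probe flags, which only make sense when acquiring a lock.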
Brad Hards [Thu, 22 Apr 2010 04:23:42 +0000 (13:53 +0930)]
Spelling fixes for tdb.
Signed-off-by: Matthias Dieter Wallnöfer <mwallnoefer@yahoo.de>
(Imported from commit
09e756b1d651caef203a4b7e02234f6dea374b08)
Andrew Tridgell [Thu, 22 Apr 2010 04:23:42 +0000 (13:53 +0930)]
tdb: use fdatasync() instead of fsync() in transactions
This might help on some filesystems
(Imported from commit
1373e748aa53fbd3afe4d2377208257d42628d86)
Volker Lendecke [Thu, 22 Apr 2010 04:23:42 +0000 (13:53 +0930)]
tdb: Apply some const, just for clarity
(Imported from commit
6824c6f46ba7c15e8af91d5aa8b21a946b63107b)
Rusty Russell [Thu, 22 Apr 2010 04:23:41 +0000 (13:53 +0930)]
tdb: fix recovery reuse after crash
If a process (or the machine) dies just after writing the
recovery head (pointing at the end of file), the recovery record will be filled
with 0x42. This will not invoke a recovery on open, since rec.magic
!= TDB_RECOVERY_MAGIC.
Unfortunately, the first transaction commit will happily reuse that
area: tdb_recovery_allocate() doesn't check the magic. The recovery
record has length 0x42424242, and it writes that back into the
now-valid-looking transaction header for the next comer (which
happens to be tdb_wipe_all in my tests).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Imported from commit
b37b452cb8c1f56b37b04abe7bffdede371ca361)
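One way such a magic check could look is sketched below. The log does not say what form the actual fix takes, so everything here is an assumption: the struct layout, the function name and the value of SKETCH_RECOVERY_MAGIC are placeholders, and only the invalid-area value 0 is taken from the neighbouring commit.

```c
#include <stdint.h>

/* Placeholder values for illustration; not the real tdb constants. */
#define SKETCH_RECOVERY_MAGIC         0x12345678u
#define SKETCH_RECOVERY_INVALID_MAGIC 0u

/* Hypothetical, simplified recovery record header. */
struct sketch_recovery_rec {
    uint32_t magic;
    uint32_t rec_len;
};

/*
 * Return how many bytes of an existing recovery area may be reused,
 * or 0 if a fresh area has to be allocated.
 */
static uint32_t sketch_reusable_len(uint32_t recovery_head,
                                    const struct sketch_recovery_rec *rec)
{
    if (recovery_head == 0)
        return 0;  /* no recovery area has been set up yet */

    /* A record left over from a crash (e.g. 0x42 fill) has neither magic. */
    if (rec->magic != SKETCH_RECOVERY_MAGIC &&
        rec->magic != SKETCH_RECOVERY_INVALID_MAGIC)
        return 0;

    return rec->rec_len;
}
```

With a check like this, a record filled with 0x42 is never trusted for its length, so the bogus 0x42424242 cannot be written back into the header.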
Rusty Russell [Thu, 22 Apr 2010 04:23:26 +0000 (13:53 +0930)]
tdb: give a name to the invalid recovery area constant (0)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Imported from commit
6269cdcd1538e2e3cead9e0f3c156b0363d607a0)
Simo Sorce [Thu, 22 Apr 2010 04:23:21 +0000 (13:53 +0930)]
release-scripts: parametrize scripts
This should make it easier to keep all release scripts aligned, as it will reduce
the difference between them to ideally a few variables.
Also moves the tdb script into the scripts directory.
(Imported from commit
6339de7f4fef46fb3ad32d1ecf9379f5b5d24ccb)
Simo Sorce [Thu, 22 Apr 2010 04:15:58 +0000 (13:45 +0930)]
tdb: raise version to 1.2.1
After recent fixes we need to raise the version to 1.2.1 so that
we can also require the right patched version.
(Imported from commit
70534adee10fc6f5bba2d9304668dc6508e5de5a)
Martin Schwenke [Tue, 20 Apr 2010 00:52:31 +0000 (10:52 +1000)]
Fix a thinko in
2ea0a9f1a93781a0d036feb9fcc0d120b182922f.
If the driver is virtio_net then we assume that the link is up rather
than ignoring the check altogether.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ralph Wuerthner [Thu, 15 Apr 2010 06:38:19 +0000 (16:38 +1000)]
ethtool does not support virtio_net devices.
Skip the link test for this type of device.
Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Thu, 15 Apr 2010 03:45:50 +0000 (13:45 +1000)]
Merge branch 'master' of git://git.samba.org/sahlberg/ctdb
Ronnie Sahlberg [Thu, 8 Apr 2010 04:30:01 +0000 (14:30 +1000)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Ronnie Sahlberg [Thu, 8 Apr 2010 04:28:52 +0000 (14:28 +1000)]
Fix a compiler warning
Ronnie Sahlberg [Thu, 8 Apr 2010 04:07:57 +0000 (14:07 +1000)]
In the recovery daemon, keep track of which node we have assigned public ip
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.
If a remote node has an inconsistent view of addresses vis-à-vis the recovery
master this will trigger a full ip reallocation.
Ronnie Sahlberg [Wed, 7 Apr 2010 00:45:27 +0000 (10:45 +1000)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Ronnie Sahlberg [Wed, 7 Apr 2010 00:42:51 +0000 (10:42 +1000)]
Lower the loglevel for "Recovery lock successfully taken"
from ERR to NOTICE
BZ62086
Martin Schwenke [Wed, 31 Mar 2010 06:52:42 +0000 (17:52 +1100)]
Merge commit 'origin/master'
Ronnie Sahlberg [Tue, 30 Mar 2010 01:50:19 +0000 (12:50 +1100)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Ronnie Sahlberg [Tue, 30 Mar 2010 01:47:54 +0000 (12:47 +1100)]
When we forcefully abort a running eventscript, don't log this as if
the script timed out.
Instead send a different signal (SIGABRT) to the child process to silently
kill the process group for the script and its children without logging
anything.
We abort any running "monitor" script anytime any other event is generated
either by ctdbd itself or by "ctdb eventscript ..."
BZ61043
Ronnie Sahlberg [Tue, 30 Mar 2010 00:58:37 +0000 (11:58 +1100)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Ronnie Sahlberg [Tue, 30 Mar 2010 00:57:25 +0000 (11:57 +1100)]
Reduce the loglevel for two log messages for Registering and Deregistering server ids.
BZ61890
Ronnie Sahlberg [Mon, 29 Mar 2010 06:06:50 +0000 (17:06 +1100)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Volker Lendecke [Wed, 24 Mar 2010 09:35:10 +0000 (10:35 +0100)]
In ctdb catdb, print the payload data length without the ctdb header length
Volker Lendecke [Mon, 22 Feb 2010 14:04:16 +0000 (15:04 +0100)]
Fix a typo in run_startrecovery_eventscript
Michael Adam [Fri, 26 Mar 2010 16:33:51 +0000 (17:33 +0100)]
events:50.samba: wipe the local part of the serverid db before starting winbind/smbd/nmbd
This is necessary for the new serverid approach.
Michael
Ronnie Sahlberg [Wed, 24 Mar 2010 06:21:10 +0000 (17:21 +1100)]
new version 1.0.114
Stefan Metzmacher [Fri, 26 Feb 2010 11:41:21 +0000 (12:41 +0100)]
config: let 13.per_ip_routing use a flock for generate_auto_link_local()
metze
Ronnie Sahlberg [Thu, 11 Mar 2010 07:34:32 +0000 (18:34 +1100)]
Merge commit 'obnox/master-rebase'
Ronnie Sahlberg [Thu, 11 Mar 2010 07:15:41 +0000 (18:15 +1100)]
Merge root@10.1.1.27:/shared/ctdb/ctdb-git
Christian Ambach [Wed, 10 Mar 2010 17:46:15 +0000 (18:46 +0100)]
adjust a vacuum log level
made the severity of the decreasing interval log level the same as for the increasing,
they are both just info logs because they don't report errors
Wolfgang Mueller-Friedt [Wed, 10 Mar 2010 09:39:31 +0000 (10:39 +0100)]
ctdb_setstatus in /etc/ctdb/functions was not working correctly because it was called with a wrong parameter list
Michael Adam [Wed, 24 Feb 2010 13:52:55 +0000 (14:52 +0100)]
packaging: add tdbtool and tdbdump as dependencies to the RPM
The init script relies on their existence.
This should fix bug #6773 on bugzilla.samba.org:
https://bugzilla.samba.org/show_bug.cgi?id=6773
Michael
Michael Adam [Wed, 24 Feb 2010 13:52:04 +0000 (14:52 +0100)]
doc: regenerate ctdb(1) manpages after xml change
Michael Adam [Wed, 24 Feb 2010 13:50:37 +0000 (14:50 +0100)]
doc: fix a linebreak in the example output of "ctdb getdbmap" in ctdb(1)
Mathieu Parent [Thu, 4 Mar 2010 15:06:11 +0000 (16:06 +0100)]
Fix some more bashisms
Mathieu Parent [Mon, 8 Mar 2010 20:19:35 +0000 (21:19 +0100)]
Correct nice_service()
nice takes a binary as argument and not a function or builtin command
Michael Adam [Wed, 24 Feb 2010 11:58:57 +0000 (12:58 +0100)]
doc: regenerate ctdb and ctdbd manpages after xml changes
Michael
Michael Adam [Wed, 24 Feb 2010 11:53:21 +0000 (12:53 +0100)]
doc: add metainfo "manual" and "source" in the ctdbd manual page
Michael Adam [Wed, 24 Feb 2010 11:52:30 +0000 (12:52 +0100)]
doc: fill metainfo "manual" and "source" in the ctdb manual page
Mathieu Parent [Tue, 5 Jan 2010 10:04:24 +0000 (11:04 +0100)]
Correction of spelling errors.
* interupted -> interrupted
* dont -> don't
(thanks to lintian)
See https://bugzilla.samba.org/show_bug.cgi?id=6935
Mathieu Parent [Tue, 5 Jan 2010 09:59:44 +0000 (10:59 +0100)]
Correction of spelling errors in manpages
thanks to lintian
See https://bugzilla.samba.org/show_bug.cgi?id=6935
Michael Adam [Tue, 23 Feb 2010 10:00:23 +0000 (11:00 +0100)]
fix bug #7152: check NFS-Shares, fails with too long path-names
Thanks to Thomas Sesselmann <t.sesselmann@dkfz.de> .
Michael
Michael Adam [Wed, 6 Jan 2010 13:59:23 +0000 (14:59 +0100)]
server:ctdb_send_dmaster_reply: fix a message typo.
Michael
Stefan Metzmacher [Tue, 23 Feb 2010 09:29:27 +0000 (10:29 +0100)]
doc: regenerate ctdb.1*
metze
Stefan Metzmacher [Tue, 23 Feb 2010 09:36:46 +0000 (10:36 +0100)]
doc/ctdb.1.xml: document "ctdb setifacelink <iface> <status>"
metze
Stefan Metzmacher [Tue, 23 Feb 2010 09:04:51 +0000 (10:04 +0100)]
doc/ctdb.1.xml: document "ctdb ipinfo <ip>"
metze
Stefan Metzmacher [Tue, 23 Feb 2010 09:03:00 +0000 (10:03 +0100)]
doc/ctdb.1.xml: update "ctdb ip" documentation
metze
Stefan Metzmacher [Tue, 23 Feb 2010 09:01:50 +0000 (10:01 +0100)]
doc/ctdb.1.xml: document "ctdb ifaces"
metze
Stefan Metzmacher [Tue, 23 Feb 2010 07:35:08 +0000 (08:35 +0100)]
doc/ctdb.1.xml: document PARTIALLYONLINE status
metze
Stefan Metzmacher [Fri, 12 Feb 2010 08:54:46 +0000 (09:54 +0100)]
config/13.per_ip_routing: fix typo in error message
metze
Stefan Metzmacher [Fri, 12 Feb 2010 13:06:40 +0000 (14:06 +0100)]
config/13.per_ip_routing: use better names for release_script and setup_script
As the basename of the script will be used for the readd script
from setup_iface_ip_readd_script, it's now easier to identify
what script is called by delete_ip_from_iface() while readding
ips to the interface.
metze
Stefan Metzmacher [Fri, 12 Feb 2010 08:52:09 +0000 (09:52 +0100)]
config/13.per_ip_routing: register the setup script with setup_iface_ip_readd_script()
This is needed because we need to resetup the routing table when
the delete_ip_from_iface() function readds the ip to the interface.
metze
Stefan Metzmacher [Tue, 9 Feb 2010 15:34:59 +0000 (16:34 +0100)]
config/13.per_ip_routing: add a setup_per_ip_routing() function
This combines the logic into a shell function which can be used by the
"takeip" and "updateip" hooks.
We check the return values of the "ip" commands now
instead of ignoring them.
We now create a setup_script.sh similar to the release_script.sh
which makes it easier to analyze problems.
metze
Stefan Metzmacher [Fri, 12 Feb 2010 10:24:08 +0000 (11:24 +0100)]
server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.
metze
Stefan Metzmacher [Fri, 12 Feb 2010 10:25:26 +0000 (11:25 +0100)]
config/10.interface: use delete_ip_from_iface also in the "init" event
metze
Stefan Metzmacher [Fri, 12 Feb 2010 09:33:54 +0000 (10:33 +0100)]
config/11.natgw: use delete_ip_from_iface() instead of remove_ip()
This also initializes the variables correctly for the
shutdown|removenatgw code path to delete_all.
metze
Stefan Metzmacher [Fri, 12 Feb 2010 09:24:44 +0000 (10:24 +0100)]
config: make remove_ip() a wrapper of delete_ip_from_iface()
metze
Stefan Metzmacher [Fri, 12 Feb 2010 09:23:17 +0000 (10:23 +0100)]
config: interface_modify states in a $CTDB_BASE/state/interface_modify directory
metze
Stefan Metzmacher [Fri, 12 Feb 2010 08:48:01 +0000 (09:48 +0100)]
config: add setup_iface_ip_readd_script() helper function
This adds a generic infrastructure to register scripts which will
be called when the delete_ip_from_iface() function needs to readd
secondary ips to an interface.
metze
Stefan Metzmacher [Fri, 12 Feb 2010 08:55:28 +0000 (09:55 +0100)]
config: readd ips with a broadcast address in delete_ip_from_iface()
metze
Ronnie Sahlberg [Tue, 23 Feb 2010 01:43:49 +0000 (12:43 +1100)]
In ctdb_control_end_recovery,
We used to talloc_steal c (the command packet) and make it a child of the
"event script state context".
If we failed to create an eventscript child context for some reason,
this would have talloc freed state, but at the same time it would also
implicitly have freed c.
Once ctdb_control_end_recovery() returns the error back to the caller,
the caller would dereference both c, and also outdata which is a child of c
and we would either read garbage data or segv.
Change the ordering so we only talloc_steal c as a child of state IFF
we have successfully created a child context for the script.
BZ61068
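The reordering described above can be illustrated without talloc. The following is a simplified model in plain C, assuming a parent object that frees whatever it owns; the sketch_* names are hypothetical and sketch_start_script merely stands in for creating the eventscript child context.

```c
#include <stdlib.h>

struct sketch_packet { int opcode; };

struct sketch_state {
    struct sketch_packet *owned;  /* freed together with the state */
};

/* Freeing the parent also frees anything it has taken ownership of. */
static void sketch_state_free(struct sketch_state *s)
{
    free(s->owned);
    free(s);
}

/* Stand-in for starting the event script; fails when ok == 0. */
static int sketch_start_script(int ok)
{
    return ok ? 0 : -1;
}

/*
 * Returns 0 on success (state now owns c and is returned in *out),
 * or -1 on failure, in which case c still belongs to the caller.
 */
static int sketch_end_recovery(struct sketch_packet *c, int script_ok,
                               struct sketch_state **out)
{
    struct sketch_state *state = calloc(1, sizeof(*state));
    if (state == NULL)
        return -1;

    if (sketch_start_script(script_ok) != 0) {
        sketch_state_free(state);  /* c was never attached, so it survives */
        return -1;
    }

    state->owned = c;  /* transfer ownership only after success */
    *out = state;
    return 0;
}
```

On the failure path the caller can still read c safely, which is the property the commit restores.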
Ronnie Sahlberg [Mon, 22 Feb 2010 23:14:51 +0000 (10:14 +1100)]
Make sure that the natgw eventscript also triggers on the "stopped" event
to remove the natgw configuration and ip assignments used.
BZ61036
Ronnie Sahlberg [Mon, 22 Feb 2010 04:34:26 +0000 (15:34 +1100)]
ctdb regsrvids is much more useful for testing if it sleeps once it has registered its srvid.
Otherwise, as soon as it terminates, ctdbd will deregister the id automatically.
Ronnie Sahlberg [Mon, 22 Feb 2010 03:06:52 +0000 (14:06 +1100)]
From Sumit Bose <sbose@redhat.com>
Fixes for init script to meet guidelines
Ronnie Sahlberg [Mon, 22 Feb 2010 03:00:33 +0000 (14:00 +1100)]
From Elia Pinto <gitter.spiros@gmail.com>
We don't need to include getopt.h under AIX
Ronnie Sahlberg [Tue, 16 Feb 2010 00:18:43 +0000 (11:18 +1100)]
Ignore any scripts that time out for most events, except startup.
Always treat hung scripts (except startup) as successful.
Ronnie Sahlberg [Fri, 12 Feb 2010 02:19:57 +0000 (13:19 +1100)]
try to restart rpc-rquotad if it is not running
bz60317
Rusty Russell [Fri, 12 Feb 2010 06:32:56 +0000 (17:02 +1030)]
Leave sequence number alone when merely migrating records.
(Based on earlier version from Ronnie which modified tdb; this one
is standalone).
When storing records in a tdb that has "automatic seqnum updates"
also check if the actual data for the record has changed or not.
If it has not changed at all, except for possibly the header,
this is likely just a dmaster migration operation in which case
we want to write the record to the tdb but we do not want the tdb
sequence number to be increased.
This resolves the problem of notify.tdb being thrashed under load:
the heuristic in smbd to only reread this when the sequence number
increases (rarely) breaks down.
Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like:
512 1496 118.33 MB/sec execute 60 sec latency 0.00 msec
And turning on latency tracking, this was typical in the logs:
ctdbd: High latency
9380914.000000s for operation lockwait on database notify.tdb
After this commit:
512 2451 143.85 MB/sec execute 60 sec latency 0.00 msec
And no more latency messages...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
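The decision described in this commit amounts to comparing the record payload while ignoring the header. A minimal sketch of that comparison follows. It assumes the header occupies the first header_len bytes of the stored value; the function name and parameters are hypothetical and this is not the ctdb code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the sequence number should be bumped, i.e. the payload
 * after the header differs (or there was no old record), 0 otherwise.
 */
static int sketch_data_changed(const unsigned char *old_rec, size_t old_len,
                               const unsigned char *new_rec, size_t new_len,
                               size_t header_len)
{
    if (old_rec == NULL)
        return 1;  /* new record: always counts as a change */
    if (old_len != new_len)
        return 1;  /* size changed, so the data changed */
    if (new_len <= header_len)
        return 0;  /* header only, no payload to compare */

    /* Skip the header: a pure dmaster migration only touches that part. */
    return memcmp(old_rec + header_len, new_rec + header_len,
                  new_len - header_len) != 0;
}
```

A caller would write the record in either case and increase the sequence number only when this returns 1.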
Ronnie Sahlberg [Thu, 11 Feb 2010 01:00:43 +0000 (12:00 +1100)]
Reduce loglevel for two eventscript related debug messages
Ronnie Sahlberg [Thu, 11 Feb 2010 00:54:46 +0000 (11:54 +1100)]
Reducing the log level for a debug message
DEBUG(DEBUG_DEBUG,("pnn %u starting migration of %08x t\
Ronnie Sahlberg [Thu, 11 Feb 2010 00:49:48 +0000 (11:49 +1100)]
Reduce the log level for two debug messages
DEBUG(DEBUG_DEBUG,("pnn %u dmaster response %08x\n", ctdb->pnn, ctdb_has
DEBUG(DEBUG_DEBUG,("pnn %u dmaster request on %08x for %u from %u\n",
Ronnie Sahlberg [Thu, 11 Feb 2010 00:32:22 +0000 (11:32 +1100)]
Add a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
to control whether or not to check if we are swapping, and produce
useful output into the logfile if we are.
For production systems with dedicated nas-heads we should never swap.
But for developer/test systems we often use smaller nondedicated systems where
we can no longer guarantee that we will not be using swap.
Ronnie Sahlberg [Thu, 11 Feb 2010 00:19:08 +0000 (11:19 +1100)]
lower the loglevel for a debug message for redundant releases of public ips
Ronnie Sahlberg [Thu, 11 Feb 2010 00:09:39 +0000 (11:09 +1100)]
Add a new variable : CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
when set to "yes" this will skip checking if knfsd has hung or not.
bz59626
Andrew Tridgell [Fri, 5 Feb 2010 06:11:29 +0000 (17:11 +1100)]
fixed printing of high latency
Ronnie Sahlberg [Thu, 11 Feb 2010 03:08:41 +0000 (14:08 +1100)]
Merge commit 'martins/master'
Martin Schwenke [Wed, 10 Feb 2010 09:27:53 +0000 (20:27 +1100)]
Test suite: Make "ctdb ip" test backward compatible with older ctdb versions.
Recent updates to the test meant that it only worked with the latest
ctdb versions. This changes things so that we never bother matching
the machine readable header, just the actual data in the output. It
also takes a slightly more liberal approach in massaging the human
readable output to ensure it matches the machine readable output.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 10 Feb 2010 09:24:28 +0000 (20:24 +1100)]
Merge commit 'origin/master'
Ronnie Sahlberg [Tue, 9 Feb 2010 07:34:47 +0000 (18:34 +1100)]
commands that relate to manual failover of ip addresses (moveip)
can sometimes take long so allow for a longer timeout for the controls used.
Ronnie Sahlberg [Tue, 9 Feb 2010 03:35:10 +0000 (14:35 +1100)]
don't just exit(0) upon successful completion of waiting for an ipreallocate to finish.
return success back to the caller instead.
otherwise things like 'ctdb enable -n all' will just finish after the first disabled node has become enabled.
Rusty Russell [Tue, 9 Feb 2010 02:16:35 +0000 (12:46 +1030)]
event scripts: add logging for low memory conditions
We should never enter swap; if we do, show the memory state of the machine and the process list. This will help us diagnose what caused the condition before it's too late and the box starts OOM-killing processes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Andrew Tridgell [Sun, 7 Feb 2010 08:02:06 +0000 (19:02 +1100)]
ctdb: migrate to new dlinklist.h from Samba
Martin Schwenke [Fri, 5 Feb 2010 04:30:39 +0000 (15:30 +1100)]
onnode documentation - update documentation to reflect recent onnode changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 5 Feb 2010 03:00:23 +0000 (14:00 +1100)]
Merge branch 'master' of git://git.samba.org/sahlberg/ctdb
Andrew Tridgell [Thu, 4 Feb 2010 03:36:14 +0000 (14:36 +1100)]
ctdb: when we fill the client packet queue we need to drop the client
We can't just drop packets to the list, as those packets could be part
of the core protocol the client is using. This happens (for example)
when Samba is doing a traverse. If we drop a traverse packet then
Samba hangs indefinitely. We are better off dropping the ctdb socket
to Samba.
Andrew Tridgell [Thu, 4 Feb 2010 03:14:18 +0000 (14:14 +1100)]
ctdb: move ctdb_io.c to use TLIST_*() macros
This will make large packet queues much more efficient
Andrew Tridgell [Thu, 4 Feb 2010 03:13:49 +0000 (14:13 +1100)]
util: added TLIST_*() macros
The TLIST_*() macros are like the DLIST_*() macros, but take both a
head and tail pointer for the list. This means that adding an element
to the end of the list is efficient (it doesn't need to walk the
list).
We should move all uses of the DLIST_*() macros which use
DLIST_ADD_END() to use the TLIST_*() macros instead.
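The idea behind keeping both a head and a tail pointer can be sketched with plain functions. This is not the TLIST_*() macro set itself, whose exact signatures are not shown in this log; the struct and function names below are hypothetical.

```c
#include <stddef.h>

struct sketch_node {
    struct sketch_node *next, *prev;
    int value;
};

/* The list keeps both ends, so appending never walks the list. */
struct sketch_list {
    struct sketch_node *head, *tail;
};

/* Append n in constant time using the tail pointer. */
static void sketch_add_end(struct sketch_list *l, struct sketch_node *n)
{
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;  /* list was empty */
    l->tail = n;
}

/* Unlink n, fixing up head and tail when n sits at either end. */
static void sketch_remove(struct sketch_list *l, struct sketch_node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    n->next = n->prev = NULL;
}
```

With only a head pointer, each append has to walk to the last element, so a queue of N packets costs on the order of N steps per append; the tail pointer removes that walk.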
Ronnie Sahlberg [Wed, 3 Feb 2010 23:03:21 +0000 (10:03 +1100)]
When trying to enable/disable a node.
Check if the node is already enabled/disabled and log an information
message if so.
Ronnie Sahlberg [Wed, 3 Feb 2010 22:54:06 +0000 (09:54 +1100)]
We only queued up to 1000 packets per queue before we started dropping
packets, to keep the queue from growing excessively if smbd has blocked.
This could cause traverse packets to be discarded if the main
smbd daemon does a traverse of a database while there is a recovery
(which sends a reconfigured message to smbd, causing an avalanche of unlock
messages to be sent across the cluster).
This avalanche of messages could also cause the traversal message to be
discarded, causing the main smbd process to hang indefinitely waiting
for a traversal message that will never arrive.
Bump the maximum queue length before starting to discard messages from
1000 to 1000000 and at the same time rework the queueing slightly so we
can append messages cheaply to the queue instead of walking the list
from head to tail every time.
Ronnie Sahlberg [Wed, 3 Feb 2010 22:45:32 +0000 (09:45 +1100)]
add two new debug controls to send and receive messages
ctdb msglisten and msgsend
Ronnie Sahlberg [Wed, 3 Feb 2010 19:37:41 +0000 (06:37 +1100)]
Drop the debug level for logging fd creation to DEBUG_DEBUG
Volker Lendecke [Fri, 29 Jan 2010 17:21:09 +0000 (18:21 +0100)]
tdb: fix an early release of the global lock that can cause data corruption
There was a bug in tdb where the
tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW, 0, 1);
(ending the transaction-"mutex") was done before the
/* remove the recovery marker */
This means that when a transaction is committed there is a window where another
opener of the file sees the transaction marker while the transaction committer
is still fully functional and working on it. This led to the transaction being
rolled back by that second opener of the file while transaction_commit() gave
no error to the caller.
This patch moves the F_UNLCK to after the recovery marker was removed, closing
this window.
Martin Schwenke [Fri, 22 Jan 2010 06:19:12 +0000 (17:19 +1100)]
eventscripts: stop loadconfig function from loading ctdb config file twice.
If "$1" was empty then loadconfig would load the ctdb config twice.
This stops that from happening.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 22 Jan 2010 06:14:50 +0000 (17:14 +1100)]
eventscript: Use of $NFS_TICKLE_SHARED_DIRECTORY must be after loadconfig.
Proper fix for
085d1bea78fabf754ef6dd6d323f74a1d361e45c's workaround.
$NFS_TICKLE_SHARED_DIRECTORY was being used before it is set via
loadconfig.
Ronnie actually spotted this one. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>