tridge/ctdb.git
14 years agoctdb: when we fill the client packet queue we need to drop the client
Andrew Tridgell [Thu, 4 Feb 2010 03:36:14 +0000 (14:36 +1100)]
ctdb: when we fill the client packet queue we need to drop the client

We can't just drop packets when the queue is full, as those packets could
be part of the core protocol the client is using. This happens (for example)
when Samba is doing a traverse. If we drop a traverse packet then
Samba hangs indefinitely. We are better off dropping the ctdb socket
to Samba.
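A minimal sketch of the policy described in this commit message, not the actual ctdb_io.c code. All names here (client_queue, enqueue_packet, QUEUE_LIMIT) are hypothetical and the limit is deliberately tiny. The point it illustrates is that a full queue marks the whole client as dropped instead of silently discarding one packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the ctdb_io.c code. */
#define QUEUE_LIMIT 4   /* tiny limit so the behaviour is easy to see */

struct client_queue {
    size_t length;      /* packets currently waiting to be written */
    bool   dead;        /* set once the client has been dropped */
};

enum enqueue_result { ENQUEUE_OK, ENQUEUE_CLIENT_DROPPED };

/*
 * Queue one packet for a client. If the queue is already full we do not
 * discard the packet and carry on; we mark the whole client as dropped so
 * that the caller can close its socket.
 */
enum enqueue_result enqueue_packet(struct client_queue *q)
{
    assert(q != NULL);
    if (q->dead || q->length >= QUEUE_LIMIT) {
        q->dead = true;
        q->length = 0;  /* pending packets die with the connection */
        return ENQUEUE_CLIENT_DROPPED;
    }
    q->length++;
    return ENQUEUE_OK;
}
```

A caller would close the client socket as soon as it sees ENQUEUE_CLIENT_DROPPED, so the client notices the failure instead of waiting for a packet that was thrown away.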

14 years agoctdb: move ctdb_io.c to use TLIST_*() macros
Andrew Tridgell [Thu, 4 Feb 2010 03:14:18 +0000 (14:14 +1100)]
ctdb: move ctdb_io.c to use TLIST_*() macros

This will make large packet queues much more efficient

14 years agoutil: added TLIST_*() macros
Andrew Tridgell [Thu, 4 Feb 2010 03:13:49 +0000 (14:13 +1100)]
util: added TLIST_*() macros

The TLIST_*() macros are like the DLIST_*() macros, but take both a
head and tail pointer for the list. This means that adding an element
to the end of the list is efficient (it doesn't need to walk the
list).

We should move all uses of the DLIST_*() macros which use
DLIST_ADD_END() to use the TLIST_*() macros instead.
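To illustrate why a tail pointer helps, here is a small sketch of a doubly linked list that tracks both ends. It is not the definition of the TLIST_*() macros; the struct and function names (pkt, pkt_list, list_add_end, list_remove) are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names; these are not the TLIST_*() macros. */
struct pkt {
    int id;
    struct pkt *prev, *next;
};

struct pkt_list {
    struct pkt *head;
    struct pkt *tail;
};

/* Append in constant time by following the tail pointer. */
void list_add_end(struct pkt_list *l, struct pkt *p)
{
    assert(l != NULL && p != NULL);
    p->next = NULL;
    p->prev = l->tail;
    if (l->tail != NULL) {
        l->tail->next = p;
    } else {
        l->head = p;    /* list was empty */
    }
    l->tail = p;
}

/* Remove an element, keeping both head and tail up to date. */
void list_remove(struct pkt_list *l, struct pkt *p)
{
    assert(l != NULL && p != NULL);
    if (p->prev != NULL) {
        p->prev->next = p->next;
    } else {
        l->head = p->next;
    }
    if (p->next != NULL) {
        p->next->prev = p->prev;
    } else {
        l->tail = p->prev;
    }
    p->prev = p->next = NULL;
}
```

With only a head pointer, appending has to walk every element first, which is what makes very long packet queues expensive.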

14 years agoWhen trying to enable/disable a node.
Ronnie Sahlberg [Wed, 3 Feb 2010 23:03:21 +0000 (10:03 +1100)]
When trying to enable/disable a node.
Check if the node is already enabled/disabled and log an informational
message if so.

14 years agoWe only queued up to 1000 packets per queue before we start dropping
Ronnie Sahlberg [Wed, 3 Feb 2010 22:54:06 +0000 (09:54 +1100)]
We only queued up to 1000 packets per queue before we start dropping
packets, to keep the queue from growing excessively if smbd has blocked.

This could cause traverse packets to be discarded if the main
smbd daemon does a traverse of a database while there is a recovery
(sending a reconfigured message to smbd, causing an avalanche of unlock
messages to be sent across the cluster).

This avalanche of messages could also cause the traversal message to be
discarded, causing the main smbd process to hang indefinitely waiting
for the traversal message that will never arrive.

Bump the maximum queue length before starting to discard messages from
1000 to 1000000 and at the same time rework the queueing slightly so we
can append messages cheaply to the queue instead of walking the list
from head to tail every time.

14 years agoadd two new debug controls to send and receive messages
Ronnie Sahlberg [Wed, 3 Feb 2010 22:45:32 +0000 (09:45 +1100)]
add two new debug controls to send and receive messages

ctdb msglisten and msgsend

14 years agoDrop the debug level for logging fd creation to DEBUG_DEBUG
Ronnie Sahlberg [Wed, 3 Feb 2010 19:37:41 +0000 (06:37 +1100)]
Drop the debug level for logging fd creation to DEBUG_DEBUG

14 years agotdb: fix an early release of the global lock that can cause data corruption
Volker Lendecke [Fri, 29 Jan 2010 17:21:09 +0000 (18:21 +0100)]
tdb: fix an early release of the global lock that can cause data corruption

There was a bug in tdb where the

                tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW, 0, 1);

(ending the transaction-"mutex") was done before the

                        /* remove the recovery marker */

This means that when a transaction is committed there is a window where another
opener of the file sees the transaction marker while the transaction committer
is still fully functional and working on it. This led to the transaction being
rolled back by that second opener of the file while transaction_commit() gave
no error to the caller.

This patch moves the F_UNLCK to after the recovery marker was removed, closing
this window.
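The ordering can be sketched with plain POSIX byte-range locks. This is not the tdb code: the file layout (a lock byte at offset 0 and a one-byte marker at offset 1) and the names lock_byte() and commit_with_marker() are assumptions made for illustration. What matters is that the marker is cleared before the lock is released.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical layout: byte 0 is the lock byte, byte 1 holds the marker. */
#define LOCK_OFFSET   0
#define MARKER_OFFSET 1

/* Take (F_WRLCK) or release (F_UNLCK) the lock byte, blocking if needed. */
static int lock_byte(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = LOCK_OFFSET;
    fl.l_len = 1;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Returns 0 on success, -1 on failure. */
int commit_with_marker(int fd)
{
    const char set = 1, clear = 0;

    assert(fd >= 0);
    if (lock_byte(fd, F_WRLCK) != 0) {
        return -1;
    }
    if (pwrite(fd, &set, 1, MARKER_OFFSET) != 1) {
        goto fail;
    }
    /* ... the real commit work would happen here ... */
    if (pwrite(fd, &clear, 1, MARKER_OFFSET) != 1) {
        goto fail;
    }
    /* Only now, with the marker gone, release the lock. */
    return lock_byte(fd, F_UNLCK);

fail:
    /* On failure the marker may still be set; we still release the lock. */
    lock_byte(fd, F_UNLCK);
    return -1;
}
```

If the unlock came before the marker was cleared, another process could take the lock, see the marker and start a rollback while the committer was still finishing.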

14 years agoinitscript: handle spaces in option values inserted into $CTDB_OPTIONS.
Martin Schwenke [Fri, 22 Jan 2010 02:19:00 +0000 (13:19 +1100)]
initscript: handle spaces in option values inserted into $CTDB_OPTIONS.

This puts single quotes around everything and uses eval on the
command-lines that actually start ctdbd.  The eval causes the single
quotes to be interpreted.

The "redhat" init style no longer uses the Red Hat daemon function.
It loses the quoting and re-splits on spaces.  Instead we add an extra
line that uses the success/failure functions to keep things pretty.
Note that this means that we don't respect daemon's
$DAEMON_COREFILE_LIMIT variable but we do our own core file handling
with $CTDB_SUPPRESS_COREFILE anyway.  daemon's core file handling was
probably overriding what we were doing anyway, so this can be regarded
as a bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>

14 years agoonnode: update algorithm for finding nodes file.
Martin Schwenke [Thu, 21 Jan 2010 02:40:03 +0000 (13:40 +1100)]
onnode: update algorithm for finding nodes file.

2 changes:

* If a relative nodes file is specified via -f or $CTDB_NODES_FILE but
  this file does not exist then try looking for the file in /etc/ctdb
  (or $CTDB_BASE if set).

* If a nodes file is specified via -f or $CTDB_NODES_FILE but this
  file does not exist (even when checked as per above) then do not
  fall back to /etc/ctdb/nodes (or $CTDB_BASE if set).  The old
  behaviour was surprising and hid errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>

14 years agoonnode - respect $CTDB_BASE rather than hard-coding /etc/ctdb.
Martin Schwenke [Thu, 21 Jan 2010 02:16:18 +0000 (13:16 +1100)]
onnode - respect $CTDB_BASE rather than hard-coding /etc/ctdb.

Signed-off-by: Martin Schwenke <martin@meltin.net>

14 years agoconfig: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path
Stefan Metzmacher [Mon, 18 Jan 2010 12:05:54 +0000 (13:05 +0100)]
config: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path

This is very useful for testing. I use a script like this:

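cat ~/bin/ethtool
 #!/bin/sh
 # Test wrapper: take the link down on selected interfaces, then
 # hand over to the real ethtool.

 IFACE=$1

 case "$IFACE" in
        Neth2|Neth3|Neth4|Neth5)
                # continue to "ip link set down" below
                ;;
        *)
                exec /usr/sbin/ethtool "$@"
                ;;
 esac

 ip link set down "$IFACE"

 exec /usr/sbin/ethtool "$@"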

metze

14 years agoserver: reload the public addresses before doing a takeover run
Stefan Metzmacher [Tue, 19 Jan 2010 07:42:48 +0000 (08:42 +0100)]
server: reload the public addresses before doing a takeover run

metze

14 years agoserver: ban ourself if the ctdb and kernel knowledge of a public ip differs
Stefan Metzmacher [Mon, 18 Jan 2010 14:04:32 +0000 (15:04 +0100)]
server: ban ourself if the ctdb and kernel knowledge of a public ip differs

metze

14 years agoserver: give an error if we're getting a takeover_ip event with a wrong pnn
Stefan Metzmacher [Mon, 18 Jan 2010 14:38:01 +0000 (15:38 +0100)]
server: give an error if we're getting a takeover_ip event with a wrong pnn

metze

14 years agoserver: return an error if we get a takeover ip event and we cannot serve the ip
Stefan Metzmacher [Mon, 18 Jan 2010 14:08:15 +0000 (15:08 +0100)]
server: return an error if we get a takeover ip event and we cannot serve the ip

metze

14 years agoserver: print node number as signed integer on release ip event
Stefan Metzmacher [Mon, 18 Jan 2010 14:12:46 +0000 (15:12 +0100)]
server: print node number as signed integer on release ip event

metze

14 years agoserver: debug redundant takeover ip events with level INFO
Stefan Metzmacher [Mon, 18 Jan 2010 14:22:16 +0000 (15:22 +0100)]
server: debug redundant takeover ip events with level INFO

metze

14 years agoserver: be less verbose on redundant release_ip events
Stefan Metzmacher [Mon, 18 Jan 2010 14:04:32 +0000 (15:04 +0100)]
server: be less verbose on redundant release_ip events

metze

14 years agoserver: add a ctdb_do_updateip()
Stefan Metzmacher [Sat, 16 Jan 2010 14:01:17 +0000 (15:01 +0100)]
server: add a ctdb_do_updateip()

metze

14 years agoserver: split out a ctdb_do_takeover_ip() function
Stefan Metzmacher [Sat, 16 Jan 2010 12:30:58 +0000 (13:30 +0100)]
server: split out a ctdb_do_takeover_ip() function

metze

14 years agoserver: split out a ctdb_announce_vnn_iface() function
Stefan Metzmacher [Sat, 16 Jan 2010 12:20:45 +0000 (13:20 +0100)]
server: split out a ctdb_announce_vnn_iface() function

metze

14 years agoevents: add updateip event to 13.per_ip_routing
Stefan Metzmacher [Mon, 21 Dec 2009 07:45:19 +0000 (08:45 +0100)]
events: add updateip event to 13.per_ip_routing

metze

14 years agoevents: 10.interface handle updateip event
Stefan Metzmacher [Mon, 21 Dec 2009 07:40:50 +0000 (08:40 +0100)]
events: 10.interface handle updateip event

metze

14 years agoserver: add updateip event
Stefan Metzmacher [Mon, 21 Dec 2009 07:33:55 +0000 (08:33 +0100)]
server: add updateip event

metze

14 years agoconfig: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig
Stefan Metzmacher [Mon, 21 Dec 2009 13:02:03 +0000 (14:02 +0100)]
config: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig

With this option set to "yes", we don't become unhealthy
as long as at least one interface is still available.

metze

14 years agoserver: start with disabled interfaces and let the event scripts enable the interface...
Stefan Metzmacher [Mon, 21 Dec 2009 18:18:10 +0000 (19:18 +0100)]
server: start with disabled interfaces and let the event scripts enable the interfaces explicitly

This makes sure that we don't get public addresses assigned during the
initial recovery and remove them again in the startup event.

metze

14 years agoconfig: 10.interfaces call monitor_interfaces on startup
Stefan Metzmacher [Tue, 22 Dec 2009 14:25:30 +0000 (15:25 +0100)]
config: 10.interfaces call monitor_interfaces on startup

metze

14 years agoconfig: 10.interfaces call ctdb ifaces and ctdb setifacelink for monitoring
Stefan Metzmacher [Tue, 22 Dec 2009 14:25:30 +0000 (15:25 +0100)]
config: 10.interfaces call ctdb ifaces and ctdb setifacelink for monitoring

metze

14 years agoevents: splitout a monitor_interfaces function in 10.interface
Stefan Metzmacher [Mon, 14 Dec 2009 10:59:45 +0000 (11:59 +0100)]
events: splitout a monitor_interfaces function in 10.interface

metze

14 years agoserver: monitor interfaces in verify_ip_allocation()
Stefan Metzmacher [Tue, 22 Dec 2009 14:21:08 +0000 (15:21 +0100)]
server: monitor interfaces in verify_ip_allocation()

metze

14 years agoserver: only trigger one takeover run in verify_ip_allocation()
Stefan Metzmacher [Tue, 22 Dec 2009 14:21:08 +0000 (15:21 +0100)]
server: only trigger one takeover run in verify_ip_allocation()

metze

14 years agotools/ctdb: add PartiallyOnline state for "ctdb status" and "ctdb status -Y"
Stefan Metzmacher [Mon, 21 Dec 2009 12:30:45 +0000 (13:30 +0100)]
tools/ctdb: add PartiallyOnline state for "ctdb status" and "ctdb status -Y"

This is based on the GET_IFACES control against each node.

metze

14 years agotools/ctdb: display interfaces in "ctdb ip" and "ctdb ip -Y" outputs
Stefan Metzmacher [Sat, 16 Jan 2010 09:36:35 +0000 (10:36 +0100)]
tools/ctdb: display interfaces in "ctdb ip" and "ctdb ip -Y" outputs

metze

14 years agotests: add an all_ips_on_node() helper function that wraps ctdb ip -Y
Stefan Metzmacher [Sat, 16 Jan 2010 09:35:41 +0000 (10:35 +0100)]
tests: add an all_ips_on_node() helper function that wraps ctdb ip -Y

metze

14 years agotests/simple/11_ctdb_ip.sh: be more strict in checking ctdb ip -Y output
Stefan Metzmacher [Fri, 15 Jan 2010 09:53:14 +0000 (10:53 +0100)]
tests/simple/11_ctdb_ip.sh: be more strict in checking ctdb ip -Y output

metze

14 years agotools/ctdb: add "ctdb ipinfo <ip>"
Stefan Metzmacher [Thu, 17 Dec 2009 10:23:59 +0000 (11:23 +0100)]
tools/ctdb: add "ctdb ipinfo <ip>"

metze

14 years agotools/ctdb: add "ctdb setifacelink <iface> <status>"
Stefan Metzmacher [Wed, 16 Dec 2009 16:02:23 +0000 (17:02 +0100)]
tools/ctdb: add "ctdb setifacelink <iface> <status>"

metze

14 years agotools/ctdb: add "ctdb ifaces"
Stefan Metzmacher [Wed, 16 Dec 2009 15:50:23 +0000 (16:50 +0100)]
tools/ctdb: add "ctdb ifaces"

metze

14 years agoserver: implement ctdb_control_set_iface_link()
Stefan Metzmacher [Thu, 17 Dec 2009 09:30:36 +0000 (10:30 +0100)]
server: implement ctdb_control_set_iface_link()

This only marks the interface status and doesn't
generate any directly triggered action.

The action is taken later by the recovery process
in verify_ip_allocation.

metze

14 years agoserver: implement ctdb_control_get_ifaces()
Stefan Metzmacher [Wed, 16 Dec 2009 10:14:44 +0000 (11:14 +0100)]
server: implement ctdb_control_get_ifaces()

metze

14 years agoserver: implement ctdb_control_get_public_ip_info()
Stefan Metzmacher [Wed, 16 Dec 2009 10:20:28 +0000 (11:20 +0100)]
server: implement ctdb_control_get_public_ip_info()

metze

14 years agoclient: implement ctdb_ctrl_set_iface_link()
Stefan Metzmacher [Wed, 16 Dec 2009 15:18:36 +0000 (16:18 +0100)]
client: implement ctdb_ctrl_set_iface_link()

metze

14 years agoclient: implement ctdb_ctrl_get_ifaces()
Stefan Metzmacher [Wed, 16 Dec 2009 14:30:07 +0000 (15:30 +0100)]
client: implement ctdb_ctrl_get_ifaces()

metze

14 years agoclient: implement ctdb_ctrl_get_public_ip_info()
Stefan Metzmacher [Wed, 16 Dec 2009 15:23:08 +0000 (16:23 +0100)]
client: implement ctdb_ctrl_get_public_ip_info()

metze

14 years agocontrols: add stubs for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE
Stefan Metzmacher [Wed, 16 Dec 2009 13:40:21 +0000 (14:40 +0100)]
controls: add stubs for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE

metze

14 years agoserver: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run
Stefan Metzmacher [Wed, 16 Dec 2009 15:09:40 +0000 (16:09 +0100)]
server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run

We now ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows" but doesn't serve, and a node only gets a TAKE_IP event
for "available" interfaces.

metze

14 years agoserver: implement CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE behavior
Stefan Metzmacher [Wed, 16 Dec 2009 15:08:45 +0000 (16:08 +0100)]
server: implement CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE behavior

metze

14 years agoclient: add CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE ctdb_ctrl_get_public_ips_flags()
Stefan Metzmacher [Wed, 16 Dec 2009 14:50:06 +0000 (15:50 +0100)]
client: add CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE ctdb_ctrl_get_public_ips_flags()

metze

14 years agoreserve upper bits in ctdb_control->flags for opcode specific flags
Stefan Metzmacher [Mon, 21 Dec 2009 11:10:18 +0000 (12:10 +0100)]
reserve upper bits in ctdb_control->flags for opcode specific flags

metze

14 years agoserver: keep the interface information in a list of ctdb_iface structures
Stefan Metzmacher [Wed, 16 Dec 2009 09:39:40 +0000 (10:39 +0100)]
server: keep the interface information in a list of ctdb_iface structures

metze

14 years agoserver: we don't need to copy strings we pass as talloc_asprintf() arguments
Stefan Metzmacher [Wed, 16 Dec 2009 08:48:21 +0000 (09:48 +0100)]
server: we don't need to copy strings we pass as talloc_asprintf() arguments

metze

14 years agoevents: 10.interfaces allow multiple interfaces per public address
Stefan Metzmacher [Mon, 21 Dec 2009 07:39:21 +0000 (08:39 +0100)]
events: 10.interfaces allow multiple interfaces per public address

metze

14 years agoserver: allow multiple interfaces comma separated in public_addresses
Stefan Metzmacher [Mon, 14 Dec 2009 17:52:06 +0000 (18:52 +0100)]
server: allow multiple interfaces comma separated in public_addresses

metze

14 years agoserver: add a ctdb_vnn_iface_string() helper function to access vnn->iface
Stefan Metzmacher [Wed, 16 Dec 2009 07:54:02 +0000 (08:54 +0100)]
server: add a ctdb_vnn_iface_string() helper function to access vnn->iface

metze

14 years agoserver: add a ctdb_set_single_public_ip() helper function
Stefan Metzmacher [Mon, 14 Dec 2009 18:33:35 +0000 (19:33 +0100)]
server: add a ctdb_set_single_public_ip() helper function

metze

14 years agoconfig: add 13.per_ip_routing event script
Stefan Metzmacher [Sat, 19 Dec 2009 17:26:01 +0000 (18:26 +0100)]
config: add 13.per_ip_routing event script

With this script it's possible to generate routing tables
per public ip address.

metze

14 years agoconfig: add some ipv4 helper shell functions
Stefan Metzmacher [Fri, 11 Dec 2009 18:56:36 +0000 (19:56 +0100)]
config: add some ipv4 helper shell functions

Many thanks to Michael Adam <obnox@samba.org>
for the basic work.

metze

14 years agoconfig: add interface_modify.sh and call it under flock to make modification on inter...
Stefan Metzmacher [Wed, 20 Jan 2010 10:10:48 +0000 (11:10 +0100)]
config: add interface_modify.sh and call it under flock to make modification on interfaces atomic

When two releaseip events run in parallel it's possible that the 2nd script
re-adds a secondary ip that was removed by the 1st script.

metze

14 years agoevents/10.interfaces: move some parts to helper functions
Stefan Metzmacher [Fri, 18 Dec 2009 10:08:22 +0000 (11:08 +0100)]
events/10.interfaces: move some parts to helper functions

metze

14 years agoconfig/functions: add tickle_tcp_connections()
Stefan Metzmacher [Fri, 18 Dec 2009 08:43:20 +0000 (09:43 +0100)]
config/functions: add tickle_tcp_connections()

metze

14 years agoserver: add "init" event
Stefan Metzmacher [Tue, 19 Jan 2010 09:07:14 +0000 (10:07 +0100)]
server: add "init" event

This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

14 years agoserver: setup fault handler to get the built-in backtrace support
Stefan Metzmacher [Thu, 7 Jan 2010 08:21:56 +0000 (09:21 +0100)]
server: setup fault handler to get the built-in backtrace support

The panic action feature will be added later.

metze

14 years agolib/util: add pre and post panic action hooks
Stefan Metzmacher [Tue, 12 Jan 2010 11:17:00 +0000 (12:17 +0100)]
lib/util: add pre and post panic action hooks

metze

14 years agolib/util: import fault/backtrace handling from samba.
Stefan Metzmacher [Fri, 18 Dec 2009 11:32:38 +0000 (12:32 +0100)]
lib/util: import fault/backtrace handling from samba.

metze

14 years agoconfigure: don't overwrite AC_CHECK_FUNC_EXT and AC_CHECK_LIB_EXT
Stefan Metzmacher [Fri, 18 Dec 2009 11:14:28 +0000 (12:14 +0100)]
configure: don't overwrite AC_CHECK_FUNC_EXT and AC_CHECK_LIB_EXT

This currently has no effect on the generated configure and config.h.in files.

metze

14 years agomove DEBUG* macros to one place
Stefan Metzmacher [Sat, 19 Dec 2009 10:40:06 +0000 (11:40 +0100)]
move DEBUG* macros to one place

metze

14 years agotools/ctdb: display INACTIVE status in "ctdb status" and "ctdb status -Y"
Stefan Metzmacher [Mon, 21 Dec 2009 12:34:21 +0000 (13:34 +0100)]
tools/ctdb: display INACTIVE status in "ctdb status" and "ctdb status -Y"

metze

14 years agoserver: add missing goto again after do_recovery()
Stefan Metzmacher [Tue, 19 Jan 2010 07:38:53 +0000 (08:38 +0100)]
server: add missing goto again after do_recovery()

metze

14 years agolib/events: finish "Run only one event for each epoll_wait/select call"
Stefan Metzmacher [Mon, 18 Jan 2010 12:19:29 +0000 (13:19 +0100)]
lib/events: finish "Run only one event for each epoll_wait/select call"

This finishes commit a78b8ea7168e5fdb2d62379ad3112008b2748576.

The logic was missing in events_standard (the one that's used by default).

metze

14 years agosource the nfs sysconfig file from the 61.nfstickles script
Ronnie Sahlberg [Tue, 19 Jan 2010 23:35:02 +0000 (10:35 +1100)]
source the nfs sysconfig file from the 61.nfstickles script

14 years agodocument the in-memory ringbuffer for logging and the commands
Ronnie Sahlberg [Fri, 15 Jan 2010 05:01:51 +0000 (16:01 +1100)]
document the in-memory ringbuffer for logging and the commands
used to set it up and manage it.

14 years agoMake the size of the in memory ringbuffer for keeping the recent log messages
Ronnie Sahlberg [Fri, 15 Jan 2010 04:38:56 +0000 (15:38 +1100)]
Make the size of the in memory ringbuffer for keeping the recent log messages
configurable using --log-ringbuf-size=<num-entries>.

Add an entry in the sysconfig file to set this persistently.

14 years agonew version 1.0.113
Ronnie Sahlberg [Tue, 12 Jan 2010 20:12:08 +0000 (07:12 +1100)]
new version 1.0.113

14 years agoMerge commit 'metze/master-for-ronnie'
Ronnie Sahlberg [Tue, 12 Jan 2010 20:01:40 +0000 (07:01 +1100)]
Merge commit 'metze/master-for-ronnie'

14 years agoserver: call event_add_fd at the end of ctdb_set_child_logging()
Stefan Metzmacher [Thu, 7 Jan 2010 12:29:09 +0000 (13:29 +0100)]
server: call event_add_fd at the end of ctdb_set_child_logging()

metze

14 years agoctdb_logging: simplify ctdb_fork_with_logging a lot and reduce the syscall usage
Stefan Metzmacher [Thu, 7 Jan 2010 12:47:46 +0000 (13:47 +0100)]
ctdb_logging: simplify ctdb_fork_with_logging a lot and reduce the syscall usage

metze

14 years agoNew version 1.0.112.
Martin Schwenke [Tue, 12 Jan 2010 10:07:45 +0000 (21:07 +1100)]
New version 1.0.112.

Signed-off-by: Martin Schwenke <martin@meltin.net>

14 years agoRevert "Use wbinfo --ping-dc isntead of wbingo -p sicne this is a more reliable way...
Martin Schwenke [Tue, 12 Jan 2010 10:02:44 +0000 (21:02 +1100)]
Revert "Use wbinfo --ping-dc isntead of wbingo -p sicne this is a more reliable way to determine if winbindd is in a useful state."

This reverts commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6.

wbinfo --ping-dc is proving too unreliable.

14 years agoRevert "events/50.samba: only use wbinfo --ping-dc if available"
Martin Schwenke [Tue, 12 Jan 2010 10:02:11 +0000 (21:02 +1100)]
Revert "events/50.samba: only use wbinfo --ping-dc if available"

This reverts commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b.

wbinfo --ping-dc is proving too unreliable.

14 years agoMerge commit 'origin/master'
Martin Schwenke [Thu, 7 Jan 2010 01:46:26 +0000 (12:46 +1100)]
Merge commit 'origin/master'

14 years agoNew version 1.0.111
Ronnie Sahlberg [Fri, 18 Dec 2009 04:16:04 +0000 (15:16 +1100)]
New version 1.0.111

14 years agoeventscript: fix bug when script is aborted
Rusty Russell [Fri, 18 Dec 2009 03:43:09 +0000 (14:13 +1030)]
eventscript: fix bug when script is aborted

Another corner case when we terminate running monitor scripts to run
something else: logging can flush the output and we write to a NULL
pointer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agoeventscript: remove cb_status, fix uninitialized bug when monitoring aborted
Rusty Russell [Fri, 18 Dec 2009 03:24:40 +0000 (13:54 +1030)]
eventscript: remove cb_status, fix uninitialized bug when monitoring aborted

(Reapplied with merge after accidental revert)

Previously we updated cb_status as each script finished.  Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.

In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized.  In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't.  So we use the last status (normally,
this will be the just-saved current status).

In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.
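
As a rough sketch of the idea, and not the eventscript code itself, the overall status can be derived from the stored per-script results on demand. The struct script_result type, the overall_status() function and the explicit aborted flag are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record; the real ctdb structures differ. */
struct script_result {
    bool finished;  /* true once the script has run to completion */
    int  status;    /* its exit status, valid only if finished */
};

/*
 * Derive the overall status from the stored per-script results.
 * If the run was aborted we must still hand something to the callback,
 * so we return the previous overall status and never invent a 0.
 * Otherwise the first non-zero status of a finished script wins.
 */
int overall_status(const struct script_result *scripts, size_t n,
                   bool aborted, int last_status)
{
    assert(scripts != NULL || n == 0);
    if (aborted) {
        return last_status;
    }
    for (size_t i = 0; i < n; i++) {
        if (scripts[i].finished && scripts[i].status != 0) {
            return scripts[i].status;
        }
    }
    return 0;
}
```

Because the value is computed from the array in one place, every code path (normal completion, failure, abort) reports its status the same way.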

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agoMerge commit 'origin/master'
Martin Schwenke [Fri, 18 Dec 2009 03:44:25 +0000 (14:44 +1100)]
Merge commit 'origin/master'

14 years agoTest suite: Add an optimisation in the getvar test.
Martin Schwenke [Fri, 18 Dec 2009 03:43:45 +0000 (14:43 +1100)]
Test suite: Add an optimisation in the getvar test.

Signed-off-by: Martin Schwenke <martin@meltin.net>

14 years agoTest suite: allow setting of timeout triggers for all events not just monitor.
Martin Schwenke [Fri, 18 Dec 2009 03:42:58 +0000 (14:42 +1100)]
Test suite: allow setting of timeout triggers for all events not just monitor.

Signed-off-by: Martin Schwenke <martin@meltin.net>

14 years agoVersion 1.0.110
Ronnie Sahlberg [Fri, 18 Dec 2009 01:32:58 +0000 (12:32 +1100)]
Version 1.0.110

14 years agoeventscript: fix cleanup path when setting up script list
Rusty Russell [Fri, 18 Dec 2009 01:24:24 +0000 (11:54 +1030)]
eventscript: fix cleanup path when setting up script list

We shouldn't set ctdb->current_monitor until we set destructor: that's
what cleans it up.

Also, free state->scripts on no-scripts exit path: it's not a child of
state because we need it in the destructor.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agoserver: add set_close_on_exec() on more fds
Stefan Metzmacher [Thu, 17 Dec 2009 12:04:27 +0000 (13:04 +0100)]
server: add set_close_on_exec() on more fds

metze

14 years agoserver: fix fd leaks in the new logging code
Stefan Metzmacher [Thu, 17 Dec 2009 12:03:42 +0000 (13:03 +0100)]
server: fix fd leaks in the new logging code

metze

14 years agoversion 1.0.109
Ronnie Sahlberg [Thu, 17 Dec 2009 04:49:01 +0000 (15:49 +1100)]
version 1.0.109

14 years agoeventscript: remove cb_status, fix uninitialized bug when monitoring aborted
Rusty Russell [Thu, 17 Dec 2009 04:08:15 +0000 (14:38 +1030)]
eventscript: remove cb_status, fix uninitialized bug when monitoring aborted

Previously we updated cb_status as each script finished.  Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.

In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized.  In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't.  So we use the last status (normally,
this will be the just-saved current status).

In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agofix a conflict in the merge from rusty
Ronnie Sahlberg [Wed, 16 Dec 2009 21:18:04 +0000 (08:18 +1100)]
fix a conflict in the merge from rusty

Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

server/ctdb_vacuum.c

14 years agoctdb: use mlockall, cautiously
Rusty Russell [Wed, 16 Dec 2009 10:27:20 +0000 (20:57 +1030)]
ctdb: use mlockall, cautiously

We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays.  But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.

This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB.  We warn, but don't exit, if it fails.
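
A minimal sketch of the cautious approach, assuming a POSIX system. The function names and the way the stack is pre-touched are assumptions for illustration, not the ctdb implementation. It locks only the pages that exist right now (MCL_CURRENT, not MCL_FUTURE) and warns instead of exiting on failure.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_RESERVE (10 * 1024)   /* stack to touch before locking */

/* Touch some stack so those pages are mapped before we lock. */
static void reserve_stack(void)
{
    volatile char buf[STACK_RESERVE];
    for (size_t i = 0; i < sizeof(buf); i++) {
        buf[i] = 0;
    }
}

/*
 * Lock the pages that are mapped right now. MCL_FUTURE is deliberately
 * not used, so later mmap() or malloc() calls are not made more likely
 * to fail. Returns 0 on success; on failure it warns and returns -1,
 * leaving the decision to the caller.
 */
int lock_current_memory(void)
{
    reserve_stack();
    if (mlockall(MCL_CURRENT) != 0) {
        fprintf(stderr, "warning: mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling lock_current_memory() again after mapping a new database would pick up the newly mapped pages, which is the "relock" step described above. The call may fail under a low RLIMIT_MEMLOCK, which is why the sketch only warns.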

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agoRemove RT priority, use niceness.
Rusty Russell [Wed, 16 Dec 2009 08:56:22 +0000 (19:26 +1030)]
Remove RT priority, use niceness.

1) It's buggy.  Code needs to be carefully written (ie. no busy
   loops) to handle running with it, and we fork and run scripts.[1]

2) It makes debugging harder.  If ctdbd loops (as has happened recently)
   it can be extremely hard to get in and see what's happening.  We've already
   seen the valgrind hacks.

3) We have seen recent scheduler problems.  Perhaps they are unrelated,
   but removing this very unusual setup is unlikely to hurt.

4) It doesn't make anything faster.  Under all but the most perverse of
   circumstances, 99% of the cpu gives the same performance as 100%, and
   we will always preempt normal processes anyway.

[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
    each script" by removing the switch_from_server_to_client() which
    restored it, but even that was only for monitor scripts.  Others were
    run with RT priority.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agoAdd --valgrinding flag instead of --nosetsched
Rusty Russell [Wed, 16 Dec 2009 10:29:15 +0000 (20:59 +1030)]
Add --valgrinding flag instead of --nosetsched

The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

14 years agofix conflict in merge from metze
Ronnie Sahlberg [Wed, 16 Dec 2009 07:34:40 +0000 (18:34 +1100)]
fix conflict in merge from metze

Merge commit 'metze/master-tdb-check'

Conflicts:

server/ctdb_vacuum.c

14 years agoctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls
Stefan Metzmacher [Fri, 20 Nov 2009 20:17:59 +0000 (21:17 +0100)]
ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls

metze

Signed-off-by: Stefan Metzmacher <metze@samba.org>

14 years agodoc: regenerate manpages
Stefan Metzmacher [Mon, 7 Dec 2009 12:02:59 +0000 (13:02 +0100)]
doc: regenerate manpages

metze