Martin Schwenke [Thu, 18 Nov 2010 00:04:52 +0000 (11:04 +1100)]
50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes (or
corresponding changes are made to $CTDB_MANAGED_VERSIONS), the
associated service should be started or stopped as necessary.
This adds calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.
An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().
To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Wed, 17 Nov 2010 02:50:56 +0000 (13:50 +1100)]
add a new support function ctdb_check_counter_equal()
update nfs to try to restart the service after 10 consecutive failures
and to flag the node unhealthy after 15
add similar function to mountd
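The thresholds above can be sketched as a small consecutive-failure counter. This is only an illustration in Python, not the eventscript code: the class FailCounter, the function action_for() and the exact comparison rules (restart when the count equals 10, unhealthy from 15 onwards) are assumptions made for the example.

```python
RESTART_AT = 10     # restart the service on the 10th consecutive failure
UNHEALTHY_AT = 15   # flag the node unhealthy from the 15th failure onwards


class FailCounter:
    """Hypothetical helper that tracks consecutive failures per service."""

    def __init__(self):
        self.counts = {}

    def record(self, service, ok):
        # A success resets the streak; a failure extends it.
        if ok:
            self.counts[service] = 0
        else:
            self.counts[service] = self.counts.get(service, 0) + 1
        return self.counts[service]


def action_for(count):
    """Map a failure count to an action (assumed policy, see lead-in)."""
    if count >= UNHEALTHY_AT:
        return "unhealthy"
    if count == RESTART_AT:
        return "restart"
    return "ok"
```

Comparing for equality at the restart threshold means the restart is attempted once per failure streak rather than on every monitor run after the tenth failure.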
Martin Schwenke [Tue, 31 Aug 2010 07:40:40 +0000 (17:40 +1000)]
Eventscripts: make loadconfig() function hookable by the test suite.
Rename loadconfig() to _loadconfig(). Add a new loadconfig() that
simply calls _loadconfig().
This makes it easy for the test suite to override loadconfig().
Signed-off-by: Martin Schwenke <martin@meltin.net>
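The same wrapper pattern, sketched in Python rather than shell for illustration. The names mirror the commit, but the function bodies, start_service() and run_with_fake_config() are hypothetical stand-ins and not the eventscript code.

```python
def _loadconfig(name):
    """Stand-in for the real loader; returns a dict of settings."""
    return {"service": name, "source": "real"}


def loadconfig(name):
    """Thin wrapper: callers use this name, and tests replace only this name."""
    return _loadconfig(name)


def start_service(name):
    # loadconfig is looked up at call time, so an override takes effect here.
    cfg = loadconfig(name)
    return f"{cfg['service']} configured from {cfg['source']}"


def run_with_fake_config():
    """What a test suite might do: swap the wrapper, run, then restore it."""
    global loadconfig
    original = loadconfig
    loadconfig = lambda name: {"service": name, "source": "test"}
    try:
        return start_service("samba")
    finally:
        loadconfig = original
```

Because the real work stays in _loadconfig(), an override can still call it if it only wants to adjust the result.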
Martin Schwenke [Tue, 16 Nov 2010 08:42:31 +0000 (19:42 +1100)]
Make a time comparison in 60.nfs eventscript more readable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Tue, 16 Nov 2010 08:31:18 +0000 (19:31 +1100)]
60.nfs only fails or warns after 10 consecutive nfsd/statd failures.
These failures are sometimes the result of slow restarts so we want to
avoid dirtying the logs or marking a node unhealthy because of them,
unless they are excessive.
For these 2 cases we use the existing fail counting code but hack a
temporary service_name in a subshell to allow separate fail counts.
We also update ctdb_check_rpc() so that it captures the error output
from rpcinfo and we add a message including the service name to the
beginning. The error is printed to stdout but is also stored in
ctdb_check_rpc_out to allow it to be conditionally used by the caller.
This function also now returns non-zero rather than exiting on
failure.
Other direct rpcinfo calls are replaced by calls to ctdb_check_rpc()
for consistency.
Option handling code for service restarts is cleaned up so that it fits
in 80 columns. A more informative restart message is now used in all
cases, printing the exact command being used to start a service.
Signed-off-by: Martin Schwenke <martin@meltin.net>
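A rough Python sketch of the "return a status and keep the error text" behaviour described for ctdb_check_rpc(). The function check_rpc(), the message format and the FAILING_CHECK command are assumptions for illustration; the real function is shell and runs rpcinfo.

```python
import subprocess
import sys

# Stand-in for a failing rpcinfo call, so the sketch runs anywhere.
FAILING_CHECK = [sys.executable, "-c",
                 "import sys; sys.stderr.write('boom'); sys.exit(1)"]


def check_rpc(service, argv):
    """Run a check command; return (ok, message) instead of exiting."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, ""
    # Put the service name first, then whatever the command printed.
    message = f"ERROR: {service} failed RPC check:\n{proc.stdout}{proc.stderr}"
    return False, message
```

Returning the message lets the caller decide whether to log it, for example only once the failure count becomes excessive.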
Martin Schwenke [Tue, 12 Oct 2010 00:10:38 +0000 (11:10 +1100)]
Test suite: fix typo in ctdb ping test grep pattern.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Wed, 6 Oct 2010 05:32:22 +0000 (16:32 +1100)]
Test suite: match changed output for ctdb ping to disconnected node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Martin Schwenke [Fri, 15 Oct 2010 04:09:08 +0000 (15:09 +1100)]
Test suite: make statistics test cope with changes to statistics output.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Ronnie Sahlberg [Mon, 15 Nov 2010 05:47:22 +0000 (16:47 +1100)]
New version 1.2.10
* Mon Nov 15 2010 : Version 1.2.10
- Make sure to initialize the statistics start time to current time
instead of leaving it to point to start of epoch.
CQ : S1020838
- Create a new tunable DisableIPFailover that is used to tell ctdb
to not check any ip allocation at all and never do any failover
This can be used to stop/restart individual nodes without causing
any ip failovers to happen.
Ronnie Sahlberg [Mon, 15 Nov 2010 05:30:44 +0000 (16:30 +1100)]
initialize the statistics to the current time, not start of epoch
this makes "ctdb statistics" show correct "start of starts collection"
Ronnie Sahlberg [Wed, 10 Nov 2010 03:47:28 +0000 (14:47 +1100)]
Don't exit the update ip function if the old and new interfaces are the same,
since if they are the same for whatever reason this triggers the system
to go into an infinite loop and is not robust.
The scripts have been changed instead to be able to cope with this
situation for enhanced robustness.
During takeover_run and when merging all ip allocations across the cluster,
try to keep track of when and which node currently hosts an ip address
so that we avoid extra ip failovers between nodes.
Ronnie Sahlberg [Wed, 10 Nov 2010 03:46:45 +0000 (14:46 +1100)]
change the takeover script timeout to 9 seconds from 5
Ronnie Sahlberg [Wed, 10 Nov 2010 03:46:05 +0000 (14:46 +1100)]
Don't check remote ip allocation if public ip mgmt is disabled
Ronnie Sahlberg [Wed, 10 Nov 2010 03:45:43 +0000 (14:45 +1100)]
this stuff is just so fragile that it will enter infinite recovery and fail loops
on any kind of tiny unexpected error
unconditionally try to remove ip addresses from both old and new interface
before trying to add it to the new interface to make it less
fragile
Ronnie Sahlberg [Wed, 10 Nov 2010 03:40:43 +0000 (14:40 +1100)]
delete from old interface before adding to new interface
this stops the script from failing with an error if
both interfaces are specified as the same, which otherwise breaks and leads to an infinite recovery loop
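The "remove from both, then add" ordering can be sketched with an in-memory model. This is an illustration only: move_ip() and the dict of interface name to address set are hypothetical stand-ins for what the script does with real interfaces.

```python
def move_ip(addrs, ip, old_iface, new_iface):
    """Move ip to new_iface; addrs maps interface name -> set of addresses."""
    # Unconditionally remove from both interfaces. discard() does not fail
    # if the address is absent, so unexpected state is tolerated.
    for iface in (old_iface, new_iface):
        addrs.setdefault(iface, set()).discard(ip)
    # Only then add to the new interface.
    addrs[new_iface].add(ip)
    return addrs
```

With this ordering the case where old and new interface are the same becomes a plain remove followed by an add, rather than an error.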
Ronnie Sahlberg [Wed, 10 Nov 2010 01:59:25 +0000 (12:59 +1100)]
delay loading the public ip address file until after we have started the transport and discovered our own pnn number
Ronnie Sahlberg [Wed, 10 Nov 2010 01:11:11 +0000 (12:11 +1100)]
when we load the public address file, at the same time check if we are already hosting the public address, if so, set ourselves up as the pnn for that address
Ronnie Sahlberg [Wed, 10 Nov 2010 01:06:05 +0000 (12:06 +1100)]
don't check the public ip assignment, or even whether we are hosting ips we shouldn't,
when public ips have been disabled
Ronnie Sahlberg [Tue, 9 Nov 2010 04:19:06 +0000 (15:19 +1100)]
Add a new tunable : DisableIPFailover that when set to non 0
will stop any ip reallocations at all from happening.
Ronnie Sahlberg [Tue, 9 Nov 2010 01:59:05 +0000 (12:59 +1100)]
change the default for how long to wait before dropping all ips to 120 seconds
Ronnie Sahlberg [Tue, 9 Nov 2010 01:56:02 +0000 (12:56 +1100)]
don't delete all ips from the system during the initial "init" event
leave any ips as they are and let the recovery daemon remove them as required
Ronnie Sahlberg [Tue, 9 Nov 2010 01:55:20 +0000 (12:55 +1100)]
when creating/adding a public ip, set the initial interface to be the first interface specified
Ronnie Sahlberg [Tue, 2 Nov 2010 09:11:09 +0000 (20:11 +1100)]
New version 1.2.9
* Tue Nov 2 2010 : Version 1.2.9
- Drop loglevels on several items and remove spam from the messages file
- Both nfs and nfslock can fail so restart both if there is a problem
Ronnie Sahlberg [Thu, 28 Oct 2010 02:43:57 +0000 (13:43 +1100)]
Both nfs and nfslock scripts can fail under redhat in very rare situations.
Ctdb can also be configured to ignore checking for knfsd and if it is alive.
In that situation, no attempt will be made to restart nfs, and since nfs is not running, lockd cannot be restarted either.
To work around this, every time we try to restart the lockmanager, also try to restart nfsd
Ronnie Sahlberg [Thu, 28 Oct 2010 02:38:34 +0000 (13:38 +1100)]
during shutdown there is a window after we have stopped TCP and disconnected from all other nodes but before we have stopped all processing.
During this window we may still hit asynchronous events that will fail because we can not send/receive packets from other nodes.
These messages are logged as "... Transport is DOWN" to help indicate that they are benign messages related to the process of shutting down.
These messages spam the syslog during normal shutdown, so this patch will drop the loglevel of these messages to DEBUG, so that they will not appear in or spam the syslog.
Ronnie Sahlberg [Thu, 28 Oct 2010 02:36:24 +0000 (13:36 +1100)]
When shutting down, we always unconditionally try to remove the natgw address
even if we are not currently the natgw master.
This adds extra reliability in case we have stopped previously without removing it properly,
but does add spam messages to syslog every time we shut down.
Stop these spam messages from polluting the syslog upon normal shutdown
Ronnie Sahlberg [Thu, 28 Oct 2010 02:34:33 +0000 (13:34 +1100)]
Redirect the output from 00.ctdb pfetch to stdout.
Normally, the config.tdb database would not exist, so we do not need
to spam syslog with a "config.tdb does not exist" message every time we start ctdb
Ronnie Sahlberg [Thu, 28 Oct 2010 02:32:29 +0000 (13:32 +1100)]
Drop the loglevel of the "reqid wrap" developer debug message to DEBUG
so that we don't spam the logs with this normal benign message.
Ronnie Sahlberg [Mon, 25 Oct 2010 08:49:19 +0000 (19:49 +1100)]
new version 1.2.8
Ronnie Sahlberg [Mon, 25 Oct 2010 00:31:12 +0000 (11:31 +1100)]
Add support to create TDB databases using the new jenkins hash.
SRVID for the control to attach to a database is used to pass
tdb flags from samba to ctdb when samba attached to a database.
This has been used earlier for TDB_NOSYNC flag.
Add TDB_INCOMPATIBLE_HASH as a supported tdb flag to store in the
SRVID field when attaching to a database.
This allows samba to control if ctdb should create databases using the
new jenkins hash, or using the old hash.
This only affects new databases when they are initially created.
Existing databases remain using the old hash when attached to.
Ronnie Sahlberg [Mon, 18 Oct 2010 04:58:03 +0000 (15:58 +1100)]
New version 1.2.7
- Don't monitor GPFS filesystems in 62.cnfs
- If tdb_open() fails, print errno to make troubleshooting easier
- Try restarting RPC.LOCKD if it failed to start
- Remove a debug message
- Make sure the statd state directory exists before trying to touch files in it
Ronnie Sahlberg [Mon, 18 Oct 2010 00:57:38 +0000 (11:57 +1100)]
remove checking for filesystems and filesystem health from the cnfs script.
remove the gpfsmount and gpfsumount entry points
Ronnie Sahlberg [Wed, 13 Oct 2010 22:49:23 +0000 (09:49 +1100)]
If tdb_open() fails when trying to open the vacuuming database,
print errno so we get some idea of why this failed.
Ronnie Sahlberg [Wed, 13 Oct 2010 21:12:41 +0000 (08:12 +1100)]
try to restart NFS LOCKD if it failed to start
Ronnie Sahlberg [Tue, 12 Oct 2010 22:21:09 +0000 (09:21 +1100)]
Remove a debug message "Timed out waiting ..."
from the ctdb command.
This is a debugging message and is normal to trigger on a busy system.
It should not be logged as ERROR.
Ronnie Sahlberg [Mon, 11 Oct 2010 21:02:18 +0000 (08:02 +1100)]
Make sure the statd directory exists before trying to access the
"update trigger" file.
CQ 1020344
Ronnie Sahlberg [Mon, 11 Oct 2010 15:54:12 +0000 (02:54 +1100)]
New version 1.2.6
* Tue Oct 12 2010 : Version 1.2.6
- Move config.tdb handling into a function in 00.ctdb
- Latency counters min/max/avg for all latency statistics
- Update default hash size to be 100001
- Check all bond devices, don't exit after the first one
- Change to using the Jenkins hash for LMASTER selection
- Sync with TDB from upstream samba
- idtree fix for AIX
- remove some log messages
- add rolling statistics
- libctdb updates
Ronnie Sahlberg [Mon, 11 Oct 2010 15:49:11 +0000 (02:49 +1100)]
move extracting the config from config.tdb for public addresses
into its own function
Ronnie Sahlberg [Mon, 11 Oct 2010 04:11:18 +0000 (15:11 +1100)]
Update latency counters to show min/max and average
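A minimal sketch of a latency counter that keeps min, max and average. The class LatencyCounter and its field names are hypothetical; this shows the bookkeeping idea only and is not the ctdb structure.

```python
class LatencyCounter:
    """Hypothetical counter tracking min/max/average of observed latencies."""

    def __init__(self):
        self.num = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def update(self, seconds):
        # Count the sample and fold it into the running total and extremes.
        self.num += 1
        self.total += seconds
        self.min = seconds if self.min is None else min(self.min, seconds)
        self.max = seconds if self.max is None else max(self.max, seconds)

    @property
    def avg(self):
        # Average is derived on demand; zero samples reports 0.0.
        return self.total / self.num if self.num else 0.0
```

Storing the count and total rather than the average itself keeps each update cheap and avoids accumulating rounding error in the average.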
Ronnie Sahlberg [Sun, 10 Oct 2010 20:09:18 +0000 (07:09 +1100)]
Update the default hash size to be 100001 instead of 10000
This can sometimes improve performance for environments where very many
files are touched in rapid succession
Ronnie Sahlberg [Fri, 8 Oct 2010 23:54:12 +0000 (10:54 +1100)]
don't stop checking interfaces after the first bond device
continue the loop to process all other interfaces too
Ronnie Sahlberg [Fri, 8 Oct 2010 04:51:44 +0000 (15:51 +1100)]
Spotted by rusty.
Add a missing $
so we delete $_ip and not _ip
Ronnie Sahlberg [Fri, 8 Oct 2010 02:14:14 +0000 (13:14 +1100)]
change the hash function to use the much better Jenkins hash
from the tdb library
cq S1020233
Jelmer Vernooij [Mon, 4 Oct 2010 11:17:25 +0000 (13:17 +0200)]
pytdb: Add __version__ attribute.
Jelmer Vernooij [Sat, 2 Oct 2010 21:40:19 +0000 (23:40 +0200)]
pytdb: Include Python.h first to prevent warning.
Kirill Smelkov [Sat, 2 Oct 2010 13:43:50 +0000 (17:43 +0400)]
pytdb: Check errors after PyObject_New() calls
The call could fail with e.g. MemoryError, and we'll dereference NULL
pointer without checking.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sat, 2 Oct 2010 13:43:46 +0000 (17:43 +0400)]
pytdb: Add support for tdb_repack()
Cc: 597386@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sat, 2 Oct 2010 13:43:40 +0000 (17:43 +0400)]
pytdb: Add TDB_INCOMPATIBLE_HASH open flag
In 2dcf76 Rusty added TDB_INCOMPATIBLE_HASH open flag which selects
Jenkins lookup3 hash for new databases.
Expose this flag to python users too.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Rusty Russell [Mon, 27 Sep 2010 01:36:51 +0000 (11:06 +0930)]
tdb: fix non-WAF build, commit 1.2.6 ABI file.
Sorry Jeremy.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 24 Sep 2010 06:15:11 +0000 (15:45 +0930)]
tdb: TDB_INCOMPATIBLE_HASH, to allow safe changing of default hash.
This flag to tdb_open/tdb_open_ex affects creation of a new database:
1) Uses the Jenkins lookup3 hash instead of the old gdbm hash if none is
specified,
2) Places a non-zero field in header->rwlocks, so older versions of TDB will
refuse to open it.
This means that the caller (ie Samba) can set this flag to safely
change the hash function. Versions of TDB from this one on will either
use the correct hash or refuse to open (if a different hash is specified).
Older TDB versions will see the nonzero rwlocks field and refuse to open
it under any conditions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Fri, 24 Sep 2010 06:09:43 +0000 (15:39 +0930)]
tdb: automatically identify Jenkins hash tdbs
If the caller to tdb_open_ex() doesn't specify a hash, and tdb_old_hash
doesn't match, try tdb_jenkins_hash.
This was Metze's idea: it makes life simpler, especially with the upcoming
TDB_INCOMPATIBLE_HASH flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
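The fallback can be sketched as follows, assuming the database header records the hash of a known probe value that candidates can be checked against. Everything here is illustrative: old_hash() and jenkins_hash() are stand-ins built on zlib checksums (not the real tdb hashes), and PROBE, make_header() and pick_hash() are hypothetical names.

```python
import zlib


def old_hash(data):
    return zlib.adler32(data)   # stand-in, NOT the real tdb_old_hash


def jenkins_hash(data):
    return zlib.crc32(data)     # stand-in, NOT the real tdb_jenkins_hash


PROBE = b"example-key"          # hypothetical known value hashed into the header


def make_header(hash_fn):
    """Record which hash created the database by hashing the probe value."""
    return {"probe_hash": hash_fn(PROBE)}


def pick_hash(header, requested=None):
    """Choose the hash function to use when opening an existing database."""
    if requested is not None:
        # An explicitly requested hash must match, otherwise refuse to open.
        if requested(PROBE) != header["probe_hash"]:
            raise ValueError("wrong hash function for this database")
        return requested
    # No hash specified: try the old default first, then the Jenkins hash.
    for candidate in (old_hash, jenkins_hash):
        if candidate(PROBE) == header["probe_hash"]:
            return candidate
    raise ValueError("unrecognised hash function")
```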
Rusty Russell [Fri, 24 Sep 2010 06:04:06 +0000 (15:34 +0930)]
tdb: add Bob Jenkins lookup3 hash as helper hash.
This is a better hash than the default: shipping it with tdb makes it easy
for callers to use it as the hash by passing it to tdb_open_ex().
This version taken from CCAN and modified, which took it from
http://www.burtleburtle.net/bob/c/lookup3.c.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Volker Lendecke [Sat, 18 Sep 2010 06:56:10 +0000 (10:56 +0400)]
tdb: add restore
Based on an idea by Simon McVittie, largely rewritten
Günther Deschner [Mon, 20 Sep 2010 23:01:51 +0000 (16:01 -0700)]
lib/tdb: fix c++ build warning in tdb_header_hash().
Guenther
Jelmer Vernooij [Sun, 19 Sep 2010 17:42:29 +0000 (10:42 -0700)]
pytdb: Make filename argument optional.
Kirill Smelkov [Sun, 19 Sep 2010 09:53:29 +0000 (13:53 +0400)]
pytdb: Add support for tdb_freelist_size()
Cc: 597386@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sun, 19 Sep 2010 09:53:32 +0000 (13:53 +0400)]
pytdb: Add support for tdb_transaction_prepare_commit()
Cc: 597386@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sun, 19 Sep 2010 16:34:33 +0000 (09:34 -0700)]
pytdb: Add support for tdb_enable_seqnum, tdb_get_seqnum and tdb_increment_seqnum_nonblock
Cc: 597386@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sun, 19 Sep 2010 09:53:19 +0000 (13:53 +0400)]
pytdb: Update open flags to match those for tdb_open() in tdb.h
Namely TDB_NOSYNC, TDB_SEQNUM, TDB_VOLATILE, TDB_ALLOW_NESTING and
TDB_DISALLOW_NESTING were missing.
Cc: 597386@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sun, 19 Sep 2010 09:53:21 +0000 (13:53 +0400)]
pytdb: Fix repr segfault for internal db
The problem was tdb->name is NULL for TDB_INTERNAL databases, and
so it was crashing ...
#0 0xb76944f3 in strlen () from /lib/i686/cmov/libc.so.6
#1 0x0809862b in PyString_FromFormatV (format=0xb72b6a26 "Tdb('%s')", vargs=0xbfc26a94 "")
at ../Objects/stringobject.c:211
#2 0x08098888 in PyString_FromFormat (format=0xb72b6a26 "Tdb('%s')") at ../Objects/stringobject.c:358
#3 0xb72b65f2 in tdb_object_repr (self=0xb759e060) at ./pytdb.c:439
Cc: 597089@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Kirill Smelkov [Sun, 19 Sep 2010 09:53:20 +0000 (13:53 +0400)]
pytdb: Add support for tdb_add_flags() & tdb_remove_flags()
Note, unlike tdb_open where flags is `int', tdb_{add,remove}_flags want
flags as `unsigned', so instead of "i" I used "I" in PyArg_ParseTuple.
Cc: 597386@bugs.debian.org
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Jelmer Vernooij <jelmer@samba.org>
Andrew Tridgell [Thu, 16 Sep 2010 10:06:44 +0000 (20:06 +1000)]
tdb: added TDB_NO_FSYNC env variable
this might help reduce test times and load on test machines
Rusty Russell [Thu, 7 Oct 2010 04:37:22 +0000 (15:07 +1030)]
tdb: increment version to 1.2.4
Rusty Russell [Mon, 13 Sep 2010 10:35:59 +0000 (20:05 +0930)]
tdb: put example hashes into header, so we notice incorrect hash_fn.
This is Stefan Metzmacher <metze@samba.org>'s patch with minor changes:
1) Use the TDB_MAGIC constant so both hashes aren't of strings.
2) Check the hash in tdb_check (paranoia, really).
3) Additional check in the (unlikely!) case where both examples hash to 0.
4) Cosmetic changes to var names and complaint message.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 13 Sep 2010 10:29:18 +0000 (19:59 +0930)]
tdb: fix tdb_check() on other-endian tdbs.
We must not endian-convert the magic string, just the rest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 13 Sep 2010 10:28:23 +0000 (19:58 +0930)]
tdb: fix tdb_check() on read-only TDBs to actually work.
Commit
bc1c82ea137 "Fix tdb_check() to work with read-only tdb databases."
claimed to do this, but tdb_lockall_read() fails on read-only databases.
Also make sure we can still do tdb_check() inside a transaction (weird,
but we previously allowed it so don't break the API).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Mon, 13 Sep 2010 10:25:26 +0000 (19:55 +0930)]
tdb: make check more robust against recovery failures.
We can end up with dead areas when we die during transaction commit;
tdb_check() fails on such a (valid) database.
This is particularly noticeable now we no longer truncate on recovery;
if the recovery area was at the end of the file we used to remove it
that way.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 5 Oct 2010 02:36:19 +0000 (13:06 +1030)]
idtree: fix right shift of signed ints, crash on large ids on AIX
Right-shifting signed integers is undefined; indeed it seems that on
AIX with their compiler, doing a 30-bit shift on (INT_MAX-200) gives
0, not 1 as we might expect.
The obvious fix is to make id and oid unsigned: l (level count) is also
logically unsigned.
(Note: Samba doesn't generally get to ids > 1 billion, but ctdb does)
Reported-by: Chris Cowan <cc@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-User: Rusty Russell <rusty@samba.org>
Autobuild-Date: Wed Oct 6 08:31:09 UTC 2010 on sn-devel-104
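The expected arithmetic from the idtree commit can be checked in Python, whose integers are unbounded. The helper shift_unsigned32() is a hypothetical name; it reinterprets a value as an unsigned 32-bit quantity before shifting, which is the behaviour the fix aims for by making the ids unsigned.

```python
INT_MAX = 2**31 - 1


def shift_unsigned32(value, bits):
    """Right-shift after reinterpreting value as an unsigned 32-bit quantity."""
    return (value & 0xFFFFFFFF) >> bits


# INT_MAX - 200 = 2147483447 and 2**30 = 1073741824, so the shift gives 1.
large_id = INT_MAX - 200
top_bits = shift_unsigned32(large_id, 30)
```

For comparison, Python's own >> on a negative number is an arithmetic shift (-1 >> 30 is -1), while shift_unsigned32(-1, 30) is 3.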
Ronnie Sahlberg [Thu, 7 Oct 2010 05:18:27 +0000 (16:18 +1100)]
get rid of the "ctdb setflags" command since
1, we don't need it
2, it uses the ugly "modify flags" control that should die
Ronnie Sahlberg [Thu, 7 Oct 2010 03:39:07 +0000 (14:39 +1100)]
Don't log a normal vacuuming message about a missing record and using default vacuuming intervals as an error.
This is normal for a new system until the vacuuming has been initialized.
Ronnie Sahlberg [Thu, 30 Sep 2010 05:07:30 +0000 (15:07 +1000)]
when printing machinereadable statistics only print the header with the fieldnames once
Ronnie Sahlberg [Thu, 30 Sep 2010 04:59:59 +0000 (14:59 +1000)]
add a machinereadable version of ctdb stats/statistics
Ronnie Sahlberg [Thu, 30 Sep 2010 04:39:54 +0000 (14:39 +1000)]
Create a tunable for how often to collect rolling statistics and initialize it to 1 second
Ronnie Sahlberg [Wed, 29 Sep 2010 02:13:05 +0000 (12:13 +1000)]
Add rolling statistics that are collected across 10 second intervals.
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.
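Rolling statistics of this kind can be sketched with a bounded deque. The class RollingStats, its method names and the number of intervals kept are assumptions for illustration; the sketch does not reflect the actual ctdb data structures or timing.

```python
from collections import deque


class RollingStats:
    """Hypothetical rolling statistics: counters collected per interval."""

    def __init__(self, keep=10):
        self.current = {}                  # counters for the running interval
        self.history = deque(maxlen=keep)  # most recent interval first

    def bump(self, name, n=1):
        self.current[name] = self.current.get(name, 0) + n

    def roll(self):
        # Called at the end of each interval: archive and start afresh.
        # With maxlen set, the oldest interval drops off the far end.
        self.history.appendleft(self.current)
        self.current = {}

    def recent(self, num):
        """Return the num most recent completed intervals, newest first."""
        return list(self.history)[:num]
```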
Ronnie Sahlberg [Wed, 29 Sep 2010 00:58:18 +0000 (10:58 +1000)]
Add a new statistics structure to keep the current running statistics
Ronnie Sahlberg [Wed, 29 Sep 2010 00:38:41 +0000 (10:38 +1000)]
Create macros to update the statistics counters and use these macros
everywhere instead of manipulating the counters directly.
Ronnie Sahlberg [Mon, 27 Sep 2010 22:58:03 +0000 (08:58 +1000)]
Add back monitoring for time skips, forward as well as backward.
This serviceability tool was lost during the migration from the old eventsystem to the tevent system.
Ronnie Sahlberg [Mon, 27 Sep 2010 22:46:12 +0000 (08:46 +1000)]
update/improve the log message related to rerecovery timeouts
Ronnie Sahlberg [Wed, 22 Sep 2010 00:59:01 +0000 (10:59 +1000)]
set up a handler to catch and log debug messages from the tevent layer
Ronnie Sahlberg [Wed, 15 Sep 2010 04:56:57 +0000 (14:56 +1000)]
add a GETPUBLICIPS control to libctdb and use this in the test example
enhance the test example to show the new releaseip/takeip messages
Ronnie Sahlberg [Mon, 13 Sep 2010 05:42:00 +0000 (15:42 +1000)]
add a new serverid to send a message every time an ip address is taken on the local node
Ronnie Sahlberg [Mon, 13 Sep 2010 05:08:36 +0000 (15:08 +1000)]
Update the comment for the range reserved for SAMBA and
define a new symbol to represent this range similarly to NFSD and ISCSID
Keep the old symbol name to be backward compatible with software using
these headers.
Ronnie Sahlberg [Mon, 13 Sep 2010 05:06:43 +0000 (15:06 +1000)]
define and reserve a range of ctdb message ports for use by nfs and iscsi servers
Ronnie Sahlberg [Mon, 13 Sep 2010 05:01:47 +0000 (15:01 +1000)]
Add two new server types to the server_id structure.
NFSD and ISCSID for now.
Ronnie Sahlberg [Mon, 13 Sep 2010 04:28:11 +0000 (14:28 +1000)]
Implement a new function GETNODEMAP in libctdb.
This function returns a pointer to a nodemap structure.
The returned structure must later be freed by calling ctdb_free_nodemap().
Move the definition of ctdb_sock_addr from ctdb_client.h to ctdb_protocol.h
Move the definition of the node flags, ctdb_node_and_flags and ctdb_node_map from ctdb_private.h to ctdb_protocol.h
Add both sync and async example for ctdb_getnodemap to the test application libctdb/tst.c
Ronnie Sahlberg [Mon, 13 Sep 2010 03:12:41 +0000 (13:12 +1000)]
remove an unused variable
Ronnie Sahlberg [Thu, 9 Sep 2010 06:34:00 +0000 (16:34 +1000)]
new version 1.2.5
- Suppress some VSFTPD warnings
- Make sure all STATD directories exist before we dereference them
- AIX socket fix
- Fix for a crash when we write a debug message after a memory allocation
fail. Fix the message and call ctdb_fatal() properly
- Move the state directory off /etc/ctdb and to /var/ctdb
- Natgw changes to allow "slave only" natgw members
- Fix "ctdb listnodes" so it works again.
Ronnie Sahlberg [Wed, 8 Sep 2010 21:35:10 +0000 (07:35 +1000)]
Don't try to read the nodemap from the daemon for "ctdb listnodes"
Always read it from the /etc/ctdb/nodes file
Ronnie Sahlberg [Tue, 7 Sep 2010 23:16:42 +0000 (09:16 +1000)]
Change how NATGW is configured to allow special nodes that do not have
network connectivity outside of the cluster to still be able to
participate in a natgw group.
These nodes can not become natgw master since they lack external network
connectivity.
These nodes are configured just the same way as for any other node with
NATGW, with the following two exceptions :
* we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes.
since these nodes lack external network we should not check the interface
for link.
* we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that
can not become natgw master.
Ronnie Sahlberg [Fri, 3 Sep 2010 02:35:25 +0000 (12:35 +1000)]
Don't store temporary runtime data in $CTDB_BASE/state
since that will usually be /etc/ctdb/state and storing this under /etc is just
wrong.
Add a new variable CTDB_VARDIR that defaults to /var/ctdb and store the data there instead.
Ronnie Sahlberg [Fri, 3 Sep 2010 01:58:27 +0000 (11:58 +1000)]
When memory allocations for recovery fail,
don't dereference a null pointer while trying to print the log message for the failure.
Also shut down ctdb with ctdb_fatal()
Harald Klatte [Mon, 30 Aug 2010 08:40:43 +0000 (10:40 +0200)]
AIX bind wants the correct addrsize
Ronnie Sahlberg [Wed, 1 Sep 2010 05:48:55 +0000 (15:48 +1000)]
make sure all statd state directories exist before we try to reference them
or else tar and friends will throw an error in the log
Ronnie Sahlberg [Wed, 1 Sep 2010 03:28:25 +0000 (13:28 +1000)]
don't print a lot of log information about shutting down vsftpd
Ronnie Sahlberg [Mon, 30 Aug 2010 09:50:15 +0000 (19:50 +1000)]
new version 1.2.4
Ronnie Sahlberg [Mon, 30 Aug 2010 09:47:50 +0000 (19:47 +1000)]
ouch, remove a dummy debug printout that snuck in there somehow
Ronnie Sahlberg [Mon, 30 Aug 2010 09:42:30 +0000 (19:42 +1000)]
ouch, the ordering of the constants and the strings must be kept in sync
manually and there is no check for errors. should fix this later
Ronnie Sahlberg [Mon, 30 Aug 2010 08:37:54 +0000 (18:37 +1000)]
new version 1.2.3
Ronnie Sahlberg [Mon, 30 Aug 2010 08:29:18 +0000 (18:29 +1000)]
remove 61.nfstickles from the makefile
Ronnie Sahlberg [Mon, 30 Aug 2010 08:23:19 +0000 (18:23 +1000)]
we no longer have a 61.nfstickle script