Michael Adam [Fri, 30 Jan 2009 17:14:41 +0000 (18:14 +0100)]
ctdb_check_tcp_ports: correctly detect listeners on ipv6 :::<port> w/out netcat
The netstat test only grepped for the ipv4 wildcard address.
Now the ipv6 wildcard listener is correctly detected as well.
Michael
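As an illustration of the fix above, here is a minimal sketch of a listener check that accepts both wildcard forms. The function name port_is_listening and the regular expression are assumptions made for illustration, not the code in ctdb_check_tcp_ports. It reads netstat-style output on stdin so it can be tried without a live system.

```shell
# Hypothetical helper (not the actual ctdb code): succeed if the
# netstat-style output on stdin shows a listener on the given port
# bound to the ipv4 wildcard (0.0.0.0) or the ipv6 wildcard (::).
port_is_listening() {
    port="$1"
    # ipv4 wildcard listener looks like "0.0.0.0:<port>",
    # ipv6 wildcard listener looks like ":::<port>".
    grep -E -q "[[:space:]](0\.0\.0\.0|::):${port}[[:space:]].*LISTEN"
}

# Possible usage on a live system:
#   netstat -a -t -n | port_is_listening 445 || echo "port 445 is down"
```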
Michael Adam [Fri, 30 Jan 2009 15:41:37 +0000 (16:41 +0100)]
ctdb_check_tcp_ports: fail the check if neither netstat nor netcat/nc is found
Michael
Michael Adam [Fri, 30 Jan 2009 15:10:05 +0000 (16:10 +0100)]
ctdb_check_tcp_ports: cope with multiple locations of netcat or nc
This fixes tcp port monitor events on systems where netcat or nc
is not found in /usr/bin/ (Debian, for instance).
The patch also separates the process of finding the binaries and
calling them, moving the detection outside of the loop over the
ports list.
Michael
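The idea of locating the binary once, before looping over the ports, could be sketched roughly as follows. The helper name find_first_executable and the candidate paths in the usage comment are hypothetical; the real script may search differently.

```shell
# Hypothetical helper: print the first argument that is an executable
# file and return 0; return 1 if none of the candidates is usable.
find_first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Possible usage: detect once, then reuse the result for every port.
#   nc_cmd=$(find_first_executable /usr/bin/netcat /bin/netcat \
#                                  /usr/bin/nc /bin/nc) || nc_cmd=""
#   for port in $ports; do ... "$nc_cmd" ... ; done
```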
Michael Adam [Thu, 29 Jan 2009 12:22:02 +0000 (13:22 +0100)]
remove include <netinet/in.h> from public ctdb.h
This is not portable.
The ctdb build includes the necessary headers from includes.h.
And users of ctdb should cope with including the necessary
prerequisite headers themselves.
Michael
Michael Adam [Thu, 29 Jan 2009 10:46:04 +0000 (11:46 +0100)]
packaging: add a maketarball script
The script extracts the version number from the spec file.
It takes an extra argument that can be appended to the
version in the tarball name and directory prefix.
Michael
Michael Adam [Wed, 28 Jan 2009 16:40:24 +0000 (17:40 +0100)]
Fix the build on AIX: sys/socket.h needs to be included before ctdb.h
(for struct sockaddr to be defined)
Thanks to William Jojo <w.jojo@hvcc.edu> for reporting.
Michael
Michael Adam [Thu, 29 Jan 2009 09:22:02 +0000 (10:22 +0100)]
autoconf: Make sure the result of the mkdir_has_mode test gets cached.
This fixes the autoconf 2.63 warning
"suspicious cache-id, must contain _cv_ to be cached".
Thanks to William Jojo <w.jojo@hvcc.edu> for reporting.
Michael
Michael Adam [Tue, 27 Jan 2009 16:17:58 +0000 (17:17 +0100)]
events.d/41.httpd: fix a comment typo
Michael
Michael Adam [Mon, 19 Jan 2009 14:33:24 +0000 (15:33 +0100)]
Fix treatment of link local ipv6 addresses: set the scope id.
metze / Michael
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Mon, 19 Jan 2009 13:14:07 +0000 (14:14 +0100)]
ctdb_util: use the parse_ip() function - avoid code duplication
Michael
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Mon, 19 Jan 2009 18:08:37 +0000 (19:08 +0100)]
ctdb_sys_have_ip: fix ipv6 support for aix, too.
Michael
Signed-off-by: Michael Adam <obnox@samba.org>
Stefan Metzmacher [Mon, 19 Jan 2009 12:24:09 +0000 (13:24 +0100)]
ctdb_sys_have_ip: don't overwrite input data (setting port to 0)
metze
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Mon, 19 Jan 2009 11:02:18 +0000 (12:02 +0100)]
Fix verification of IP allocation with ipv6 addresses on Linux.
Set sin_port or sin6_port to 0, depending on sa_family.
Michael
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Mon, 19 Jan 2009 20:22:58 +0000 (21:22 +0100)]
events 50.samba: fix control of nmbd without separate nmb service script.
protect all potentially empty $CTDB_SERVICE_* script names
Michael
Michael Adam [Mon, 19 Jan 2009 13:46:30 +0000 (14:46 +0100)]
packaging(RPM): detect and use ccache if available
Michael
Michael Adam [Mon, 19 Jan 2009 08:42:48 +0000 (09:42 +0100)]
Makefile: remove extra "/" in paths
Michael
Michael Adam [Sat, 17 Jan 2009 15:18:02 +0000 (16:18 +0100)]
makerpms: fix detection of support for --rsyncable flag in gzip.
Michael
Michael Adam [Fri, 16 Jan 2009 13:01:37 +0000 (14:01 +0100)]
ctdb.init: fix typo
Michael
Michael Adam [Fri, 16 Jan 2009 12:33:13 +0000 (13:33 +0100)]
events 50.samba: also support suse and ubuntu/debian systems
for managing samba and winbind
This uses CTDB_INIT_STYLE as exported by ctdb.init.
suse systems usually have separate init scripts for
smb for smbd and nmb for nmbd, and the ubuntu/debian
start script for smbd and nmbd is called samba instead
of smb (on redhat).
Michael
Michael Adam [Fri, 16 Jan 2009 12:31:02 +0000 (13:31 +0100)]
functions: make (nice_)service a noop for empty service name
Michael
Michael Adam [Fri, 16 Jan 2009 12:28:19 +0000 (13:28 +0100)]
ctdb.init: use detect_init_style() in the init script
and export CTDB_INIT_STYLE, so that event scripts
as called by ctdbd can use it.
Michael
Michael Adam [Fri, 16 Jan 2009 12:26:57 +0000 (13:26 +0100)]
functions: add detect_init_style().
Michael
root [Thu, 15 Jan 2009 23:13:53 +0000 (10:13 +1100)]
new version
Michael Adam [Fri, 19 Dec 2008 10:50:06 +0000 (11:50 +0100)]
ctdb.init: add $network to RequiredStop to match RequiredStart.
This is to make rpm checks (e.g. for SuSE systems) survive.
Michael
Andreas Schneider [Wed, 29 Oct 2008 13:12:04 +0000 (14:12 +0100)]
Fix circular dependency error with autoconf 2.63.
Signed-off-by: Andreas Schneider <anschneider@suse.de>
(cherry picked from commit
b39611c36bb904774fd4032bf2f8003fbdeb5d34)
Signed-off-by: Michael Adam <obnox@samba.org>
Michael Adam [Wed, 17 Dec 2008 15:01:49 +0000 (16:01 +0100)]
makerpms: fix creation of tarball when gzip does not know "--rsyncable"
--rsyncable is a patch to gzip that not all distributors / packagers
add to gzip. (It has just bitten me on openSUSE.) This patch first detects
whether gzip knows about --rsyncable and then calls gzip accordingly.
Michael
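One way such a detection could look is sketched below. The function name gzip_rsyncable_flag and the probing approach (compressing a dummy input and checking the exit status) are assumptions, not necessarily what makerpms.sh does.

```shell
# Hypothetical helper: print "--rsyncable" if the given gzip command
# accepts that option, and print nothing otherwise.
gzip_rsyncable_flag() {
    gzip_cmd="${1:-gzip}"
    # Probe by compressing a dummy input; an unknown option makes
    # gzip exit with a non-zero status.
    if echo test | "$gzip_cmd" --rsyncable -c >/dev/null 2>&1; then
        echo "--rsyncable"
    fi
}

# Possible usage:
#   GZIP_FLAGS="-9 $(gzip_rsyncable_flag)"
#   gzip $GZIP_FLAGS some-tarball.tar
```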
Michael Adam [Wed, 17 Dec 2008 11:18:09 +0000 (12:18 +0100)]
ctdb.spec: fix version and (RPM-)release number.
Originally, 1.0 was the version in the spec file and 68 was the release.
But in fact everyone talked about ctdb version 1.0.68.
This puts this straight...
Michael
Michael Adam [Wed, 17 Dec 2008 11:17:15 +0000 (12:17 +0100)]
makerpms: confess
Michael
Michael Adam [Wed, 17 Dec 2008 11:15:34 +0000 (12:15 +0100)]
makerpms: don't hard-code the version number but extract it from ctdb.spec
Michael
Michael Adam [Wed, 17 Dec 2008 11:13:42 +0000 (12:13 +0100)]
makerpms: remove the need of calling makerpms.sh from the top level directory
Instead, extract needed information from the dirname of the invoked name.
Michael
Michael Adam [Wed, 17 Dec 2008 11:12:17 +0000 (12:12 +0100)]
makerpms: don't cd to $SPECDIR but rpmbuild -ba $SPECDIR/$SPECFILE instead
Michael
Michael Adam [Wed, 17 Dec 2008 11:09:13 +0000 (12:09 +0100)]
makerpms: catch error of git archive correctly (echo resets $?)
Michael
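The pitfall named in the subject line is that any command run after the one being checked, including echo, overwrites $?. A minimal sketch of the usual remedy follows; the wrapper name run_and_report is hypothetical.

```shell
# Hypothetical wrapper: run a command, print a message, and still
# return the command's own exit status.
run_and_report() {
    "$@"
    ret=$?                      # save the status immediately
    echo "finished: $1"         # this echo sets $? to 0
    return "$ret"               # report the saved status, not echo's
}

# Possible usage:
#   run_and_report git archive ... || { echo "git archive failed"; exit 1; }
```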
Michael Adam [Wed, 17 Dec 2008 11:06:25 +0000 (12:06 +0100)]
makerpms: move comment to appropriate place
Michael
Michael Adam [Wed, 17 Dec 2008 11:05:05 +0000 (12:05 +0100)]
makerpms: use variable (SPECFILE) that is available instead of hard coded file name
Michael
Michael Adam [Mon, 15 Dec 2008 23:30:55 +0000 (00:30 +0100)]
doc: join broken lines in excerpt from log.ctdb
Michael
Michael Adam [Mon, 15 Dec 2008 23:17:04 +0000 (00:17 +0100)]
ctdb.samba.org: fix instruction for turning off samba service autostart
Extend to show valid commands on Redhat and SuSE Linux.
Michael
Stefan Metzmacher [Thu, 15 Jan 2009 12:20:33 +0000 (13:20 +0100)]
Fix segfault in ip takeover fallback code.
metze
Signed-off-by: Michael Adam <obnox@samba.org>
root [Tue, 13 Jan 2009 05:17:20 +0000 (16:17 +1100)]
finish the ipv6 support.
allow clients to register either ipv4 or ipv6 client connections to the tickles list
Ronnie Sahlberg [Thu, 18 Dec 2008 03:31:28 +0000 (14:31 +1100)]
new version 1.0.69
root [Wed, 17 Dec 2008 03:26:01 +0000 (14:26 +1100)]
add better error checking that the nodes we try to talk to using the "ctdb" tool actually exist and are connected.
two new dedicated ctdb error codes
21: node does not exist
22: node is disconnected
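For scripts that call the ctdb tool, these codes could be mapped to messages along the following lines. The function name describe_ctdb_status is hypothetical. The values 21 and 22 come from the commit above, and 20 is the timeout status mentioned in the CTDB_TIMEOUT commit; treating every other non-zero status as a generic error is an assumption.

```shell
# Hypothetical helper: translate an exit status of the ctdb tool
# into a short human-readable description.
describe_ctdb_status() {
    case "$1" in
        0)  echo "ok" ;;
        20) echo "timed out" ;;
        21) echo "node does not exist" ;;
        22) echo "node is disconnected" ;;
        *)  echo "error" ;;
    esac
}

# Possible usage:
#   ctdb status; describe_ctdb_status $?
```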
Ronnie Sahlberg [Wed, 17 Dec 2008 01:01:40 +0000 (12:01 +1100)]
don't call ctdb_fatal() just because we are asked to restart a connection
to a remote node and ctdb->methods is NULL.
This can happen when we are in the middle of a normal shutdown of the
daemon and we have already shut down the transport layer (thus setting
ctdb->methods == NULL in the transport layer destructor)
and there is some unprocessed data related to a remote node.
This prevents an ugly race condition where ctdb might sometimes (rarely)
cause a core dump during "ctdb shutdown".
Michael Adam [Mon, 15 Dec 2008 17:21:37 +0000 (18:21 +0100)]
skip directories containing macros (%) in ctdb_check_directories_probe
This prevents the monitor action of 50.samba from failing
on e.g. a typical [homes] service with "path = /home/%S" .
Michael
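The skipping logic can be sketched as a small filter. The function name filter_probe_dirs is hypothetical and this is not the body of ctdb_check_directories_probe; it only illustrates dropping any path that contains a "%" macro.

```shell
# Hypothetical filter: read directory paths from stdin, one per line,
# and print only those that contain no "%" macro.
filter_probe_dirs() {
    while IFS= read -r dir; do
        case "$dir" in
            *%*) continue ;;        # e.g. /home/%S cannot be checked
            *)   echo "$dir" ;;
        esac
    done
}

# Possible usage:
#   printf '%s\n' /home/%S /srv/share | filter_probe_dirs
```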
Michael Adam [Sat, 5 Jul 2008 12:28:27 +0000 (14:28 +0200)]
ctdb.init: add Default-Start to init script to enable autostart.
Michael
Michael Adam [Fri, 12 Dec 2008 15:57:58 +0000 (16:57 +0100)]
ctdb.init: check availability of ctdb (with ping) before calling ctdb status
Michael
Michael Adam [Fri, 12 Dec 2008 15:00:07 +0000 (16:00 +0100)]
ctdb.init: behave correctly when calling "service ctdb stop" on stopped service
When "service ctdb stop" is called and the ctdbd is not running,
don't print the "Failed to connect to daemon" error messages.
But print a warning and exit with status success instead.
Michael
Michael Adam [Fri, 12 Dec 2008 15:05:04 +0000 (16:05 +0100)]
ctdb.init: fix return code of "service ctdb stop" on non-redhat systems
Michael
Michael Adam [Fri, 12 Dec 2008 15:04:29 +0000 (16:04 +0100)]
ctdb.init: fix status message of "service ctdb stop" on suse systems
Michael
Michael Adam [Sat, 5 Jul 2008 12:42:46 +0000 (14:42 +0200)]
packaging: set docdir in calls to make (to get it right on e.g. SuSE systems).
Currently docdir = /usr/share/doc is hardcoded in the Makefile.in.
Some systems use a different doc dir (SuSE uses /usr/share/doc/packages).
And not all versions of autoconf provide the --docdir parameter
(2.61 does, while 2.59 does not). So we use the quick solution
to specify "docdir=%{_docdir}" in the make calls in the spec file.
Michael
Ronnie Sahlberg [Thu, 11 Dec 2008 22:39:55 +0000 (09:39 +1100)]
New version 1.0.68
Michael Adam [Wed, 10 Dec 2008 21:27:36 +0000 (22:27 +0100)]
Improve the monitor event test for ethernet interfaces (link detection).
On some systems, the ethtool link detection is not successful when a
cable is plugged but the interface has not been brought up previously.
This improves the test by bringing the interface up (without checking
for success here) and trying the ethtool test again afterwards.
Michael
Michael Adam [Wed, 10 Dec 2008 21:19:31 +0000 (22:19 +0100)]
Use "grep -q" instead of "grep ... > /dev/null" in events.d/10.interfaces
This enhances readability.
Michael
root [Wed, 10 Dec 2008 01:06:51 +0000 (12:06 +1100)]
update the "ctdb recover" command.
block and wait until the cluster has completed the recovery before returning.
this makes it easier to script since it avoids the common need for
ctdb recover
... complex loop to wait for recovery to complete ...
script continues
root [Wed, 10 Dec 2008 01:01:19 +0000 (12:01 +1100)]
add a CTDB_TIMEOUT variable for the ctdb tool.
If set, this specifies the maximum runtime for the ctdb tool before it will terminate with status == 20
Just like the -T ... option would.
root [Wed, 10 Dec 2008 00:49:51 +0000 (11:49 +1100)]
make sure we return an error code when the ctdb command has hung and is timed out by the -T <timeout> setting
root [Tue, 9 Dec 2008 01:03:42 +0000 (12:03 +1100)]
add a helper that waits until the cluster is no longer in recovery mode and returns the generation number.
change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.
also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns. this makes it much easier to script things ...
root [Mon, 8 Dec 2008 23:45:14 +0000 (10:45 +1100)]
update to the flags handling
make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node
root [Mon, 8 Dec 2008 06:29:17 +0000 (17:29 +1100)]
If ctdbd was started with the --socket option then we also set the CTDB_SOCKET variable so that the eventscripts can pick up the name proper
root [Mon, 8 Dec 2008 01:57:40 +0000 (12:57 +1100)]
return -1 if ctdb ping failed
root [Fri, 5 Dec 2008 05:32:30 +0000 (16:32 +1100)]
redo and update how we synchronize flags across the cluster.
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.
root [Thu, 4 Dec 2008 23:33:38 +0000 (10:33 +1100)]
some platforms are very picky about the third argument passed to bind().
and would complain if sa.family is AF_INET and the third argument is not exactly the size of a sockaddr_in.
We used to pass a union containing both a sockaddr_in and a sockaddr_in6 which would mean that on those platforms bind() would fail since the passed structure for AF_INET would be too big.
Thus we need to set and pass the appropriate size to bind. At the same time for those platforms we can also set sin[6]_size to the expected size.
(bind() on those platforms was, surprisingly, perfectly ok when sin_len was "too big")
Ronnie Sahlberg [Thu, 4 Dec 2008 04:25:03 +0000 (15:25 +1100)]
new version 1.0.67
root [Thu, 4 Dec 2008 04:03:40 +0000 (15:03 +1100)]
fix an incorrect path
Ronnie Sahlberg [Thu, 4 Dec 2008 03:35:00 +0000 (14:35 +1100)]
add a description of the recovery-process
Ronnie Sahlberg [Tue, 2 Dec 2008 03:08:10 +0000 (14:08 +1100)]
print the list of valid debug level literals when an invalid debug level
is specified in 'ctdb setdebug'
Ronnie Sahlberg [Tue, 2 Dec 2008 02:26:30 +0000 (13:26 +1100)]
redesign how reloadnodes is implemented.
modify the transport methods to allow to restart individual connections
and set up destructors properly.
only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.
make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded
root [Fri, 28 Nov 2008 00:29:43 +0000 (11:29 +1100)]
debuglevel is a signed int, not unsigned.
Ronnie Sahlberg [Thu, 27 Nov 2008 22:52:26 +0000 (09:52 +1100)]
make it possible to delete an ip from all nodes at once using
"ctdb delip x.x.x.x -n all"
This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.
Thus we first delete the ip from all connected nodes which are not
currently hosting it.
After this we delete the ip from the node which is hosting it.
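The ordering described above can be sketched as a small helper that prints the nodes in the order they should be processed. The function name delip_order is hypothetical, and the sketch assumes the hosting node is among the connected nodes passed in.

```shell
# Hypothetical helper: given the hosting node followed by the list of
# connected nodes, print the nodes in deletion order, with every
# non-hosting node first and the hosting node last.
delip_order() {
    hosting="$1"
    shift
    for node in "$@"; do
        [ "$node" = "$hosting" ] || echo "$node"
    done
    # The node currently hosting the ip is handled last, so the ip
    # has nowhere left to fail over to while it is being removed.
    echo "$hosting"
}

# Possible usage (node 1 hosts the ip; nodes 0, 1, 2 are connected):
#   for n in $(delip_order 1 0 1 2); do ctdb delip x.x.x.x -n "$n"; done
```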
Ronnie Sahlberg [Mon, 24 Nov 2008 08:06:02 +0000 (19:06 +1100)]
new version 1.0.66
Ronnie Sahlberg [Fri, 21 Nov 2008 05:24:12 +0000 (16:24 +1100)]
allow changing the recmaster even if the database is not frozen
Ronnie Sahlberg [Fri, 21 Nov 2008 00:30:32 +0000 (11:30 +1100)]
remove two variables no longer used from the example sysconfig file
Andrew Tridgell [Thu, 20 Nov 2008 21:05:59 +0000 (08:05 +1100)]
fixed problem with looping ctdb recoveries
After a node failure, GPFS can get into a state where non-blocking
fcntl() locks can take a long time. This leads to the ctdb set_recmode
test timing out, which leads to a recovery failure, and a new
recovery. The recovery loop can last a long time.
The fix is to consider a fcntl timeout as a success of this test. The
test is to see that we can't lock the shared reclock file, so a
timeout is fine for a success.
Andrew Tridgell [Thu, 20 Nov 2008 10:23:26 +0000 (21:23 +1100)]
Merge commit 'ronnie/master'
Ronnie Sahlberg [Thu, 20 Nov 2008 05:39:56 +0000 (16:39 +1100)]
don't override/change CTDB_BASE if it is already set by the shell
Ronnie Sahlberg [Thu, 20 Nov 2008 02:35:08 +0000 (13:35 +1100)]
Keepalive packets were only sent every KeepaliveInterval if the socket
had been completely idle during that interval.
If we had been sending other packets such as Messages, Calls or Controls
there wouldn't be any need for an explicit keepalive and thus we didn't
send one.
This does make it somewhat awkward when analyzing traces since it is
non-intuitive when keepalives are sent and when they are not sent.
Change the keepalive logic to always send a keepalive regardless of
whether the link is idle or not.
Ronnie Sahlberg [Wed, 19 Nov 2008 03:43:46 +0000 (14:43 +1100)]
rewrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.
Ronnie Sahlberg [Wed, 12 Nov 2008 23:55:20 +0000 (10:55 +1100)]
new version 1.0.65
update the example sysconfig file. the default log level is 2, not 0
Ronnie Sahlberg [Tue, 11 Nov 2008 03:49:30 +0000 (14:49 +1100)]
add a CTDB_SOCKET variable that can be used to override the default
/tmp/ctdb.socket
Ronnie Sahlberg [Mon, 3 Nov 2008 10:54:52 +0000 (21:54 +1100)]
we actually need a ctdb_db variable
Ronnie Sahlberg [Thu, 30 Oct 2008 02:34:10 +0000 (13:34 +1100)]
latency is measured in us, not ms
use an explicit ctdb_db variable instead of dereferencing state
Ronnie Sahlberg [Thu, 30 Oct 2008 01:49:53 +0000 (12:49 +1100)]
add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a threshold
Ronnie Sahlberg [Wed, 22 Oct 2008 00:06:18 +0000 (11:06 +1100)]
new version 1.0.64
Ronnie Sahlberg [Wed, 22 Oct 2008 00:04:41 +0000 (11:04 +1100)]
add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses
Ronnie Sahlberg [Sun, 19 Oct 2008 22:47:54 +0000 (09:47 +1100)]
new version 1.0.63
Ronnie Sahlberg [Sun, 19 Oct 2008 22:45:15 +0000 (09:45 +1100)]
don't log "running periodic cleanup" ...
Ronnie Sahlberg [Fri, 17 Oct 2008 10:38:42 +0000 (21:38 +1100)]
null out the pointer before we reload the nodes file
Ronnie Sahlberg [Fri, 17 Oct 2008 10:18:06 +0000 (21:18 +1100)]
when we reload the nodes file, we may need to reload the nodes file
inside the recovery daemon as well.
Ronnie Sahlberg [Thu, 16 Oct 2008 22:02:03 +0000 (09:02 +1100)]
make it possible to set the script log level in CTDB sysconfig
Ronnie Sahlberg [Thu, 16 Oct 2008 20:56:12 +0000 (07:56 +1100)]
specify a "script log level" on the commandline to set under which log
level any/all output from eventscripts will be logged as
Ronnie Sahlberg [Thu, 16 Oct 2008 06:59:55 +0000 (17:59 +1100)]
new version 1.0.62
Ronnie Sahlberg [Thu, 16 Oct 2008 06:57:50 +0000 (17:57 +1100)]
allow multiple eventscripts using the same prefix.
this eases the pain for users that use out of tree eventscripts
Andrew Tridgell [Thu, 16 Oct 2008 01:58:25 +0000 (12:58 +1100)]
Merge commit 'ronnie/master'
Ronnie Sahlberg [Wed, 15 Oct 2008 05:40:44 +0000 (16:40 +1100)]
new version 1.0.61
Ronnie Sahlberg [Wed, 15 Oct 2008 05:29:09 +0000 (16:29 +1100)]
install the new multipath monitoring event script
Ronnie Sahlberg [Wed, 15 Oct 2008 05:27:33 +0000 (16:27 +1100)]
add an eventscript to monitor that the multipath devices are healthy
Ronnie Sahlberg [Tue, 14 Oct 2008 21:33:37 +0000 (08:33 +1100)]
we must also check the status returned from the get tickles control to
determine whether it was successful or not
Ronnie Sahlberg [Tue, 14 Oct 2008 16:02:09 +0000 (03:02 +1100)]
lower the loglevel for the informational message that a TCP_ADD operation
described an ip address not known to be a public address.
This could happen if someone for genuine reasons accesses a share
through a static ip address.
It can also happen if non homogenous public address configurations are
used and when a tcp description is pushed out to a different node that
does not serve/know the specific ip address.
Ronnie Sahlberg [Tue, 14 Oct 2008 14:49:19 +0000 (01:49 +1100)]
change ip route add to route add -net since this works more reliably
update the makefile and rpm to install 99.routing
Ronnie Sahlberg [Tue, 14 Oct 2008 14:32:46 +0000 (01:32 +1100)]
new version 1.0.60
Ronnie Sahlberg [Tue, 14 Oct 2008 14:23:57 +0000 (01:23 +1100)]
verify that the nodes we try to ban/unban are operational and print an
error to the user otherwise.
Ronnie Sahlberg [Tue, 14 Oct 2008 14:08:29 +0000 (01:08 +1100)]
Revert "from Mathieu Parent <math.parent@gmail.com>"
This reverts commit
dc9cd4779db4a89697731e4cf415be51067a07c1.
Conflicts: