git.samba.org - martins/ctdb.git/log

Ronnie Sahlberg [Wed, 17 Oct 2007 05:03:58 +0000 (15:03 +1000)]

use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
about this packet any more and just forget it ever saw it

Ronnie Sahlberg [Wed, 17 Oct 2007 03:42:42 +0000 (13:42 +1000)]

reverse the order in which public ips are listed so it matches the order
of the public_addresses file

Ronnie Sahlberg [Wed, 17 Oct 2007 00:10:52 +0000 (10:10 +1000)]

merge from tridge

Andrew Tridgell [Tue, 16 Oct 2007 10:14:04 +0000 (20:14 +1000)]

increase release number

Andrew Tridgell [Tue, 16 Oct 2007 10:13:28 +0000 (20:13 +1000)]

more detail on multipath config

Ronnie Sahlberg [Tue, 16 Oct 2007 05:27:07 +0000 (15:27 +1000)]

add back the test inside the daemon: if someone asks us to drop
recovery mode back to NORMAL, check that we can not lock the reclock
file, since at this stage it MUST be locked by the recovery daemon.

to prevent a non-blocking fcntl() lock from blocking and causing
"issues", we move the 'test that we can not lock reclock file' into a
child process.
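
A minimal sketch of the idea, in Python rather than the daemon's C, assuming a hypothetical helper name `reclock_is_held`: the non-blocking lock attempt runs in a forked child, and the child reports the outcome through its exit status, so a stalled fcntl() call would stall only the child.

```python
import fcntl
import os


def reclock_is_held(path):
    """Return True if some other process holds a lock on `path`.

    Hypothetical helper, not ctdb code. The probe runs in a forked
    child and the result comes back as the child's exit status.
    """
    pid = os.fork()
    if pid == 0:
        # Child: try a non-blocking exclusive lock.
        status = 2  # 2 = could not even open the file
        try:
            fd = os.open(path, os.O_RDWR)
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = 1  # we got the lock, so nobody else holds it
            except OSError:
                status = 0  # lock refused, so someone else holds it
        finally:
            os._exit(status)
    # Parent: this sketch simply waits; a real daemon would collect
    # the child's status from its event loop instead of blocking here.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Exit status 0 means the expected case (the file is locked by someone else), mirroring the 'test that we can not lock reclock file' wording above.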

Ronnie Sahlberg [Tue, 16 Oct 2007 02:15:02 +0000 (12:15 +1000)]

add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb

When this is set, the first entry in /etc/ctdb/public_addresses will
always be hosted by node 0 when that node is available, the second
entry by node 1, and so on.

This tunable allows the allocation of addresses to become very
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses is
identical on all the nodes in the cluster.
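
A rough sketch of the mapping described above, not the daemon's code. The function name is hypothetical, and two points are assumptions the message does not state: addresses beyond the node count wrap around (index modulo node count), and an address whose node is unavailable is simply left unassigned here (`None`) rather than being reallocated.

```python
def deterministic_assignment(addresses, num_nodes, available):
    """Map each public address to a node by its position in the file.

    addresses: list of addresses in public_addresses file order
    num_nodes: total number of nodes in the cluster
    available: set of node numbers that are currently available
    """
    assignment = {}
    for index, addr in enumerate(addresses):
        node = index % num_nodes  # assumption: wrap around
        # Assumption: no fallback is modelled for unavailable nodes.
        assignment[addr] = node if node in available else None
    return assignment
```

Because the result depends only on the order of entries in the file, every node must read the same file to compute the same answer, which is why the file has to be identical across the cluster.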

Ronnie Sahlberg [Tue, 16 Oct 2007 01:29:33 +0000 (11:29 +1000)]

include system/network.h so we get the prototype for inet_aton()

Ronnie Sahlberg [Tue, 16 Oct 2007 01:26:22 +0000 (11:26 +1000)]

merge from tridge

Ronnie Sahlberg [Mon, 15 Oct 2007 23:50:31 +0000 (09:50 +1000)]

don't try to lock the file from inside the ctdb daemon.
even though we don't want a blocking lock, it does appear that the fcntl()
call can block for a while if gpfs is in the process of rebuilding
itself after a node joins or leaves the cluster

Andrew Tridgell [Mon, 15 Oct 2007 04:44:06 +0000 (14:44 +1000)]

only link to -lipq if needed

Andrew Tridgell [Mon, 15 Oct 2007 04:37:54 +0000 (14:37 +1000)]

improved handling of systems without libipq.h

Andrew Tridgell [Mon, 15 Oct 2007 04:29:47 +0000 (14:29 +1000)]

disable ipmux code until we have a configure test

Andrew Tridgell [Mon, 15 Oct 2007 04:28:51 +0000 (14:28 +1000)]

sync flags between nodes in monitor loop in recmaster

Andrew Tridgell [Mon, 15 Oct 2007 04:17:49 +0000 (14:17 +1000)]

merge from ronnie

Andrew Tridgell [Mon, 15 Oct 2007 03:31:09 +0000 (13:31 +1000)]

disable optimisation for now, until we find an occasional segv

Andrew Tridgell [Mon, 15 Oct 2007 03:22:58 +0000 (13:22 +1000)]

add config option for disabling bans

Ronnie Sahlberg [Wed, 10 Oct 2007 21:51:57 +0000 (07:51 +1000)]

use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb

Ronnie Sahlberg [Wed, 10 Oct 2007 21:30:10 +0000 (07:30 +1000)]

use kill_tcp_connections() to kill off all tcp connections to the
"single public ip" address when we do a recovery

Ronnie Sahlberg [Wed, 10 Oct 2007 21:27:38 +0000 (07:27 +1000)]

move the kill_tcp_connections() function from 10.interfaces to functions

Ronnie Sahlberg [Wed, 10 Oct 2007 21:10:17 +0000 (07:10 +1000)]

first check that recovery master is connected (we know this from our own
flags)

then pull the flags off recovery master before checking if it is banned

Ronnie Sahlberg [Wed, 10 Oct 2007 20:16:36 +0000 (06:16 +1000)]

simplify election handling

make sure we read and update the flags from all remote nodes before we
reach the first codepath that can call do_recovery()
since during do_recovery() we need to know what the flags are.

Ronnie Sahlberg [Wed, 10 Oct 2007 00:49:55 +0000 (10:49 +1000)]

merge from tridge

Andrew Tridgell [Wed, 10 Oct 2007 00:45:22 +0000 (10:45 +1000)]

make sure reconnected nodes start off as unhealthy so they don't get a public IP

Ronnie Sahlberg [Tue, 9 Oct 2007 23:42:32 +0000 (09:42 +1000)]

add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip
address

update the killtcp control in the daemon so that if a socketpair that is
to be killed does not match a normal public address, it checks whether
the destination address matches the single public ip address and, if so,
uses that vnn structure from the ctdb context

this allows killtcp to also kill connections to the single public ip
instead of only normal public addresses
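
The lookup order described above can be sketched as follows. This is an illustration only: the function name and the dict-based stand-in for the vnn structure are hypothetical, not the daemon's data types.

```python
def find_vnn_for_kill(dst_ip, public_vnns, single_public_vnn=None):
    """Pick the vnn entry that owns dst_ip, or None if nothing matches.

    public_vnns: dict mapping normal public addresses to their vnn entry
    single_public_vnn: vnn entry for the single public ip, if configured
    """
    # First choice: a normal public address.
    vnn = public_vnns.get(dst_ip)
    if vnn is not None:
        return vnn
    # Fallback: the single public ip address kept in the ctdb context.
    if single_public_vnn is not None and single_public_vnn["ip"] == dst_ip:
        return single_public_vnn
    return None
```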

Ronnie Sahlberg [Tue, 9 Oct 2007 03:45:42 +0000 (13:45 +1000)]

remove some debug outputs

Ronnie Sahlberg [Tue, 9 Oct 2007 02:00:12 +0000 (12:00 +1000)]

send out gratuitous arps when we are starting up serving the "single
public ip" but before we start the ipmux tool

Ronnie Sahlberg [Tue, 9 Oct 2007 01:56:09 +0000 (11:56 +1000)]

add a control to send gratuitous arps from the ctdb daemon

Ronnie Sahlberg [Mon, 8 Oct 2007 04:05:22 +0000 (14:05 +1000)]

add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.

this ip address is attached to lo on all nodes but only the recmaster
will respond to arp requests for this address.
the recmaster then runs an ipmux process that will pass any incoming
packets for this ip address on to the other nodes in the cluster based on
the ip address of the client host

to use this feature one must
1, have one fixed ip address in the customer's network permanently
   attached to an interface
2, set CTDB_PUBLIC_INTERFACE=
   to specify on which interface the clients attach to the node
3, set CTDB_SINGLE_PUBLIC_IP=ip-address
   to specify which ip address should be the "single public ip address"

to test with only one single client, attach several ip addresses to
the client and ping the public address from the client with different -I
options. look in a network trace to see which node the packet is
passed on to.
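
To illustrate what "based on the ip address of the client host" could look like, here is a small sketch. The message does not say how the client address is mapped to a node, so the rule used here (integer value of the IPv4 address modulo the number of nodes) and the name `pick_node` are assumptions for illustration only.

```python
import ipaddress


def pick_node(client_ip, nodes):
    """Choose a node for a client, depending only on the client's address.

    Assumed rule: IPv4 address as an integer, modulo the node count.
    """
    value = int(ipaddress.IPv4Address(client_ip))
    return nodes[value % len(nodes)]
```

With any rule of this kind the same client address always lands on the same node, which is why the test described above uses several source addresses on one client to exercise different nodes.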

Ronnie Sahlberg [Sun, 7 Oct 2007 23:47:20 +0000 (09:47 +1000)]

add a function in the ctdb tool to determine whether the local node is
the recmaster or not.

return 0 if the node is the recmaster and 1 (true) if it is not or if
we could not communicate with the ctdb daemon.

call it 'isnotrecmaster' because if the tool can not bind to the socket
to talk to the daemon, it automatically returns an error and exit code 1.
thus the tool will only return 0 if it could talk successfully to the
local daemon and the local daemon confirms this node is the recmaster
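
The exit-code convention above is easy to get backwards, so here it is as a tiny truth table in code. The function name and its two boolean parameters are hypothetical; only the 0/1 outcomes come from the message.

```python
def isnotrecmaster_exit_code(daemon_reachable, local_is_recmaster):
    """Exit code as described above: 0 only for a confirmed recmaster."""
    if not daemon_reachable:
        # Could not talk to the daemon: same code as "not recmaster".
        return 1
    return 0 if local_is_recmaster else 1
```

A script that should act only on the recmaster can therefore treat any non-zero status, including a communication failure, as "do nothing".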

Ronnie Sahlberg [Fri, 5 Oct 2007 22:11:24 +0000 (08:11 +1000)]

merge from tridge

Andrew Tridgell [Fri, 5 Oct 2007 03:51:31 +0000 (13:51 +1000)]

fixed several places where we set the recovery culprit incorrectly

Andrew Tridgell [Fri, 5 Oct 2007 03:28:21 +0000 (13:28 +1000)]

- catch ESTALE in the recovery lock by trying a read()
- prioritise nodes that are unbanned and healthy in the election
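
A minimal sketch of the first point, assuming a hypothetical helper name: attempt a small read on the already-open lock file descriptor and treat ESTALE as "the handle is stale". The sketch uses `os.pread` so the file offset is left alone; the message itself only says read().

```python
import errno
import os


def lock_file_is_stale(fd):
    """Return True if reading from fd fails with ESTALE."""
    try:
        os.pread(fd, 1, 0)  # read at most one byte from offset 0
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise  # any other error is not handled by this sketch
    return False
```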

Andrew Tridgell [Fri, 5 Oct 2007 02:01:40 +0000 (12:01 +1000)]

we are the culprit if we can't get the reclock

Ronnie Sahlberg [Wed, 26 Sep 2007 04:25:32 +0000 (14:25 +1000)]

change async.private to async.private_data since private is a reserved
word in C++

Ronnie Sahlberg [Tue, 25 Sep 2007 01:43:42 +0000 (11:43 +1000)]

merge from tridge

Andrew Tridgell [Mon, 24 Sep 2007 05:27:01 +0000 (15:27 +1000)]

upped version number

Andrew Tridgell [Mon, 24 Sep 2007 03:52:35 +0000 (13:52 +1000)]

merge from ronnie

Ronnie Sahlberg [Mon, 24 Sep 2007 00:52:26 +0000 (10:52 +1000)]

when we have a public ip address mismatch (i.e. we hold addresses we
shouldn't or we are not holding addresses we should)
we must first freeze the local node before we set the recovery mode

Ronnie Sahlberg [Mon, 24 Sep 2007 00:27:48 +0000 (10:27 +1000)]

merge from tridge

Andrew Tridgell [Mon, 24 Sep 2007 00:19:07 +0000 (10:19 +1000)]

fixed a fd leak on the recovery lock

Andrew Tridgell [Mon, 24 Sep 2007 00:12:18 +0000 (10:12 +1000)]

run monitoring more quickly when unhealthy and at startup

Andrew Tridgell [Mon, 24 Sep 2007 00:00:14 +0000 (10:00 +1000)]

no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen

Andrew Tridgell [Sun, 23 Sep 2007 23:57:14 +0000 (09:57 +1000)]

fixed a valgrind error, and some warnings

Andrew Tridgell [Fri, 21 Sep 2007 06:12:04 +0000 (16:12 +1000)]

make the persistent dbdir configurable

Ronnie Sahlberg [Fri, 21 Sep 2007 05:45:48 +0000 (15:45 +1000)]

merge from tridge

Andrew Tridgell [Fri, 21 Sep 2007 05:44:13 +0000 (15:44 +1000)]

avoid using connected nodes that aren't in the vnn map yet

Andrew Tridgell [Fri, 21 Sep 2007 05:32:11 +0000 (15:32 +1000)]

merge bugfix from ronnie

Ronnie Sahlberg [Fri, 21 Sep 2007 05:19:33 +0000 (15:19 +1000)]

in ctdb_control_persistent_store() we must talloc_steal() the pointer to
c to prevent it from being immediately freed (and our persistent store
state with it) if we need to wait asynchronously for other nodes before
we can reply back to the client

Andrew Tridgell [Fri, 21 Sep 2007 04:47:32 +0000 (14:47 +1000)]

merge from ronnie

Ronnie Sahlberg [Fri, 21 Sep 2007 03:47:40 +0000 (13:47 +1000)]

when ctdb attaches to a database it broadcasts the attach to all other
nodes so that the db is created on them as well

when we send this broadcast we must use the correct control and not
assume all databases created are of the temporary kind

Ronnie Sahlberg [Fri, 21 Sep 2007 03:20:29 +0000 (13:20 +1000)]

merge from tridge

Andrew Tridgell [Fri, 21 Sep 2007 02:24:02 +0000 (12:24 +1000)]

added support for persistent databases in ctdbd

Ronnie Sahlberg [Wed, 19 Sep 2007 01:54:45 +0000 (11:54 +1000)]

merge from tridge

Ronnie Sahlberg [Wed, 19 Sep 2007 01:53:48 +0000 (11:53 +1000)]

one more command to run to enable winbind for vsftpd

Andrew Tridgell [Wed, 19 Sep 2007 01:46:37 +0000 (11:46 +1000)]

make sure we set close on exec on any possibly inherited fds
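
For readers unfamiliar with the flag, a short sketch of what setting close-on-exec on a descriptor involves, written in Python with a hypothetical helper name. It uses the standard F_GETFD/F_SETFD pair so that any other descriptor flags are preserved.

```python
import fcntl


def set_close_on_exec(fd):
    """Mark fd so it is closed automatically across exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```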

Andrew Tridgell [Wed, 19 Sep 2007 01:46:11 +0000 (11:46 +1000)]

separate out the various fs display ops

Andrew Tridgell [Mon, 17 Sep 2007 05:31:33 +0000 (15:31 +1000)]

expanded ctdb_diagnostics a bit

Ronnie Sahlberg [Mon, 17 Sep 2007 03:01:16 +0000 (13:01 +1000)]

add documentation of additional requirements for FTP so that users can
log in and access files using the AD username/password

Ronnie Sahlberg [Sun, 16 Sep 2007 21:43:15 +0000 (07:43 +1000)]

merge from tridge

Andrew Tridgell [Fri, 14 Sep 2007 09:27:11 +0000 (19:27 +1000)]

increase release number

Andrew Tridgell [Fri, 14 Sep 2007 05:23:23 +0000 (15:23 +1000)]

merge from ronnie

Ronnie Sahlberg [Fri, 14 Sep 2007 05:19:44 +0000 (15:19 +1000)]

let ctdb ip only print the ip addresses known to the specified node
and not the entire cluster

Ronnie Sahlberg [Fri, 14 Sep 2007 04:24:53 +0000 (14:24 +1000)]

update vnn -> pnn in documentation

Ronnie Sahlberg [Fri, 14 Sep 2007 04:19:12 +0000 (14:19 +1000)]

documentation updates

it is --event-script-dir not --event-script

add explanation of the public_addresses file

Andrew Tridgell [Fri, 14 Sep 2007 04:14:03 +0000 (14:14 +1000)]

cope with non-standard install dirs in event scripts

Ronnie Sahlberg [Fri, 14 Sep 2007 02:18:34 +0000 (12:18 +1000)]

merge from tridge

Andrew Tridgell [Fri, 14 Sep 2007 01:59:04 +0000 (11:59 +1000)]

fix pkill args

Andrew Tridgell [Fri, 14 Sep 2007 01:56:40 +0000 (11:56 +1000)]

make sure all public IPs are removed at startup

Ronnie Sahlberg [Fri, 14 Sep 2007 00:37:10 +0000 (10:37 +1000)]

during startup make sure to delete any public addresses from any
interface

Ronnie Sahlberg [Fri, 14 Sep 2007 00:16:36 +0000 (10:16 +1000)]

let each node verify that it has a correct assignment of public ip
addresses (i.e. it holds those it should hold and it doesn't hold
any of those it shouldn't hold)

if an inconsistency is found, mark the local node as recovery mode
active and wait for the recovery master to trigger a full blown recovery
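
The consistency check described above amounts to comparing two sets. A minimal sketch, with hypothetical function names and plain address strings standing in for whatever the daemon actually compares:

```python
def check_public_ips(expected, held):
    """Return (missing, unexpected) public addresses for the local node.

    expected: addresses this node should be holding
    held: addresses actually configured on this node
    """
    expected, held = set(expected), set(held)
    missing = expected - held      # should hold, but do not
    unexpected = held - expected   # hold, but should not
    return missing, unexpected


def needs_recovery(expected, held):
    """True if the local assignment is inconsistent in either direction."""
    missing, unexpected = check_public_ips(expected, held)
    return bool(missing or unexpected)
```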

Andrew Tridgell [Thu, 13 Sep 2007 23:49:12 +0000 (09:49 +1000)]

- merge from ronnie
- add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring

Andrew Tridgell [Thu, 13 Sep 2007 23:25:11 +0000 (09:25 +1000)]

wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done

Andrew Tridgell [Thu, 13 Sep 2007 23:24:34 +0000 (09:24 +1000)]

nicer use of testparm

Ronnie Sahlberg [Thu, 13 Sep 2007 22:56:27 +0000 (08:56 +1000)]

update the section about event scripts

Ronnie Sahlberg [Thu, 13 Sep 2007 22:15:24 +0000 (08:15 +1000)]

disable nfsv4 in etc/sysconfig/nfs

Ronnie Sahlberg [Thu, 13 Sep 2007 04:51:37 +0000 (14:51 +1000)]

when a ctdb_takeover_run has failed we must make sure that
need_takeover_run is set to true, or else we might forget to rerun it
during the next recovery

otherwise, need_takeover_run is only set to true if the node flags for
a remote node and the local node differ.
It is possible that a takeover run fails, and thus the reassignment of
ip addresses is incomplete, but that before we get back to the test in
monitor_cluster() the node flags of all nodes have converged
and match each other again, thus causing
monitor_cluster() to fail to realize that a takeover run is needed.
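
The race described above can be modelled with a small state holder. This is a simplified sketch, not the recovery daemon's code: the class and method names are hypothetical, and only the `need_takeover_run` flag name comes from the message.

```python
class TakeoverTracker:
    """Tracks whether a takeover run is still owed."""

    def __init__(self):
        self.need_takeover_run = False

    def observe_flags(self, flags_differ):
        # Differing node flags request a takeover run; matching
        # flags must not clear a request that is still pending.
        if flags_differ:
            self.need_takeover_run = True

    def record_takeover_result(self, succeeded):
        # The fix: a failed run leaves the flag set, so the run is
        # retried even if the node flags have converged meanwhile.
        self.need_takeover_run = not succeeded
```

In this model a failed run followed by converged flags still leaves `need_takeover_run` true, which is the behaviour the message asks for.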

Andrew Tridgell [Thu, 13 Sep 2007 04:36:23 +0000 (14:36 +1000)]

ensure smbd and winbindd do die in 50.samba

Ronnie Sahlberg [Thu, 13 Sep 2007 04:28:18 +0000 (14:28 +1000)]

merge from tridge

Andrew Tridgell [Thu, 13 Sep 2007 04:08:18 +0000 (14:08 +1000)]

prevent recursion in the calling of ctdb_takeover_run

Andrew Tridgell [Thu, 13 Sep 2007 01:57:42 +0000 (11:57 +1000)]

more shell scripting fixes in 10.interface

Andrew Tridgell [Thu, 13 Sep 2007 01:19:49 +0000 (11:19 +1000)]

force recovery if unable to tell a node to release an IP

Andrew Tridgell [Thu, 13 Sep 2007 01:19:30 +0000 (11:19 +1000)]

fixed script errors in 10.interface

Andrew Tridgell [Thu, 13 Sep 2007 00:45:06 +0000 (10:45 +1000)]

we don't need the is_loopback logic in ctdb any more

Andrew Tridgell [Thu, 13 Sep 2007 00:39:05 +0000 (10:39 +1000)]

remove more cruft from the logs

Andrew Tridgell [Thu, 13 Sep 2007 00:24:48 +0000 (10:24 +1000)]

new approach for killing TCP connections on IP release

Andrew Tridgell [Thu, 13 Sep 2007 00:03:18 +0000 (10:03 +1000)]

remove clutter from ctdb log file

Andrew Tridgell [Thu, 13 Sep 2007 00:02:56 +0000 (10:02 +1000)]

fixed return code

Andrew Tridgell [Wed, 12 Sep 2007 03:26:24 +0000 (13:26 +1000)]

handle hung or slow ctdb daemons on shutdown

Andrew Tridgell [Wed, 12 Sep 2007 03:23:36 +0000 (13:23 +1000)]

- set arp_ignore to prevent replying to arp requests for addresses on loopback
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call

Andrew Tridgell [Wed, 12 Sep 2007 03:22:31 +0000 (13:22 +1000)]

- don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier

Andrew Tridgell [Wed, 12 Sep 2007 03:21:19 +0000 (13:21 +1000)]

changed some debug levels

Ronnie Sahlberg [Tue, 11 Sep 2007 21:28:24 +0000 (07:28 +1000)]

use the public addresses variable instead of hardcoding the path

Ronnie Sahlberg [Tue, 11 Sep 2007 21:26:30 +0000 (07:26 +1000)]

move all ip addresses onto loopback when we startup ctdb