'\" t
.\" Title: ctdb
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.1
.\" Date: 12/09/2009
.\" Manual:
.\" Source:
.\" Language: English
.\"
.TH "CTDB" "1" "12/09/2009" "" ""
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
ctdb \- clustered tdb database management utility
.SH "SYNOPSIS"
.HP \w'\fBctdb\ [\ OPTIONS\ ]\ COMMAND\ \&.\&.\&.\fR\ 'u
\fBctdb [ OPTIONS ] COMMAND \&.\&.\&.\fR
.HP \w'\fBctdb\fR\ 'u
\fBctdb\fR [\-n\ ] [\-Y] [\-t\ ] [\-T\ ] [\-?\ \-\-help] [\-\-usage] [\-d\ \-\-debug=] [\-\-socket=]
.SH "DESCRIPTION"
.PP
ctdb is a utility to view and manage a ctdb cluster\&.
.SH "OPTIONS"
.PP
\-n
.RS 4
This specifies the physical node number on which to execute the command\&. The default is to run the command on the daemon running on the local host\&.
.sp
The physical node number is an integer that identifies the node in the cluster\&. The first node has physical node number 0\&.
.RE
.PP
\-Y
.RS 4
Produce output in machine\-readable form for easier parsing by scripts\&. Not all commands support this option\&.
.RE
.PP
\-t
.RS 4
How long ctdb should wait for the local ctdb daemon to respond to a command before timing out\&. Default is 3 seconds\&.
.RE
.PP
\-T
.RS 4
A limit on how long the ctdb command may run before it is aborted\&. When this time limit has been exceeded the ctdb command will terminate\&.
.RE
.PP
\-? \-\-help
.RS 4
Print some help text to the screen\&.
.RE
.PP
\-\-usage
.RS 4
Print usage information to the screen\&.
.RE
.PP
\-d \-\-debug=
.RS 4
Change the debug level for the command\&.
Default is 0\&.
.RE
.PP
\-\-socket=
.RS 4
Specify the socket name to use when connecting to the local ctdb daemon\&. The default is /tmp/ctdb\&.socket \&.
.sp
You only need to specify this parameter if you run multiple ctdb daemons on the same physical host and thus cannot use the default name for the domain socket\&.
.RE
.SH "ADMINISTRATIVE COMMANDS"
.PP
These are commands used to monitor and administer a CTDB cluster\&.
.SS "pnn"
.PP
This command displays the pnn of the current node\&.
.SS "status"
.PP
This command shows the current status of the ctdb node\&.
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBnode status\fR
.RS 4
.PP
Node status reflects the current status of the node\&. There are six possible states:
.PP
OK \- This node is fully functional\&.
.PP
DISCONNECTED \- This node could not be reached over the network and is currently not participating in the cluster\&. If there is a public IP address associated with this node it should have been taken over by a different node\&. No services are running on this node\&.
.PP
DISABLED \- This node has been administratively disabled\&. This node is still functional and participates in the CTDB cluster but its IP addresses have been taken over by a different node and no services are currently being hosted\&.
.PP
UNHEALTHY \- A service provided by this node is malfunctioning and should be investigated\&. The CTDB daemon itself is operational and participates in the cluster\&. Its public IP address has been taken over by a different node and no services are currently being hosted\&. All unhealthy nodes should be investigated and require an administrative action to rectify\&.
.PP
BANNED \- This node failed too many recovery attempts and has been banned from participating in the cluster for a period of RecoveryBanPeriod seconds\&. Any public IP address has been taken over by other nodes\&. This node does not provide any services\&.
All banned nodes should be investigated and require an administrative action to rectify\&. This node does not participate in the CTDB cluster but can still be communicated with, i\&.e\&. ctdb commands can be sent to it\&.
.PP
STOPPED \- A node that is stopped does not host any public IP addresses, nor is it part of the VNNMAP\&. A stopped node cannot become LVSMASTER, RECMASTER or NATGW\&. This node does not participate in the CTDB cluster but can still be communicated with, i\&.e\&. ctdb commands can be sent to it\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBgeneration\fR
.RS 4
.PP
The generation id is a number that indicates the current generation of a cluster instance\&. Each time a cluster goes through a reconfiguration or a recovery its generation id will be changed\&.
.PP
This number does not have any particular meaning other than to keep track of when a cluster has gone through a recovery\&. It is a random number that represents the current instance of a ctdb cluster and its databases\&. CTDBD uses this number internally to tell when commands that operate on the cluster and the databases were issued in a different generation of the cluster, to ensure that commands that operate on the databases will not survive across a cluster database recovery\&. After a recovery, all old outstanding commands will automatically become invalid\&.
.PP
Sometimes this number will be shown as "INVALID"\&. This only means that the ctdbd daemon has started but has not yet merged with the cluster through a recovery\&. All nodes start with generation "INVALID" and are not assigned a real generation id until they have successfully been merged with a cluster through a recovery\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBVNNMAP\fR
.RS 4
.PP
The list of Virtual Node Numbers\&.
This is a list of all nodes that actively participate in the cluster and that share the workload of hosting the Clustered TDB database records\&. Only nodes that are participating in the vnnmap can become lmaster or dmaster for a database record\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBRecovery mode\fR
.RS 4
.PP
This is the current recovery mode of the cluster\&. There are two possible modes:
.PP
NORMAL \- The cluster is fully operational\&.
.PP
RECOVERY \- The cluster databases have all been frozen, pausing all services while the cluster awaits a recovery process to complete\&. A recovery process should finish within seconds\&. If a cluster is stuck in the RECOVERY state this would indicate a cluster malfunction which needs to be investigated\&.
.PP
Once the recovery master detects an inconsistency, for example a node becoming disconnected or connected, the recovery daemon will trigger a cluster recovery process, where all databases are remerged across the cluster\&. When this process starts, the recovery master will first "freeze" all databases to prevent applications such as samba from accessing the databases and it will also mark the recovery mode as RECOVERY\&.
.PP
When CTDBD starts up, it will start in RECOVERY mode\&. Once the node has been merged into a cluster and all databases have been recovered, the node mode will change into NORMAL mode and the databases will be "thawed", allowing samba to access the databases again\&.
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBRecovery master\fR
.RS 4
.PP
This is the cluster node that is currently designated as the recovery master\&. This node is responsible for monitoring the consistency of the cluster and for performing the actual recovery process when required\&.
.PP
Only one node at a time can be the designated recovery master\&.
Which node is designated the recovery master is decided by an election process in the recovery daemons running on each node\&.
.RE
.PP
Example: ctdb status
.PP
Example output:
.sp
.if n \{\
.RS 4
.\}
.nf
Number of nodes:4
pnn:0 11\&.1\&.2\&.200       OK (THIS NODE)
pnn:1 11\&.1\&.2\&.201       OK
pnn:2 11\&.1\&.2\&.202       OK
pnn:3 11\&.1\&.2\&.203       OK
Generation:1362079228
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:0
.fi
.if n \{\
.RE
.\}
.SS "recmaster"
.PP
This command shows the pnn of the node which is currently the recmaster\&.
.SS "uptime"
.PP
This command shows the uptime of the ctdb daemon, when the last recovery or IP failover completed, and how long it took\&. If the "duration" is shown as a negative number, this indicates that there is a recovery/failover in progress and it started that many seconds ago\&.
.PP
Example: ctdb uptime
.PP
Example output:
.sp
.if n \{\
.RS 4
.\}
.nf
Current time of node : Thu Oct 29 10:38:54 2009
Ctdbd start time : (000 16:54:28) Wed Oct 28 17:44:26 2009
Time of last recovery/failover: (000 16:53:31) Wed Oct 28 17:45:23 2009
Duration of last recovery/failover: 2\&.248552 seconds
.fi
.if n \{\
.RE
.\}
.SS "listnodes"
.PP
This command lists the IP addresses of all the nodes in the cluster\&.
.PP
Example: ctdb listnodes
.PP
Example output:
.sp
.if n \{\
.RS 4
.\}
.nf
10\&.0\&.0\&.71
10\&.0\&.0\&.72
10\&.0\&.0\&.73
10\&.0\&.0\&.74
.fi
.if n \{\
.RE
.\}
.SS "ping"
.PP
This command will "ping" all CTDB daemons in the cluster to verify that they are processing commands correctly\&.
.PP
Example: ctdb ping
.PP
Example output:
.sp
.if n \{\
.RS 4
.\}
.nf
response from 0 time=0\&.000054 sec (3 clients)
response from 1 time=0\&.000144 sec (2 clients)
response from 2 time=0\&.000105 sec (2 clients)
response from 3 time=0\&.000114 sec (2 clients)
.fi
.if n \{\
.RE
.\}
.SS "ip"
.PP
This command displays the list of public addresses that are provided by the cluster and which physical node is currently serving each address\&. By default this command will ONLY show those public addresses that are known to the node itself\&. To see the full list of all public IPs across the cluster you must use "ctdb ip \-n all"\&.
.PP
Example: ctdb ip
.PP
Example output:
.sp
.if n \{\
.RS 4
.\}
.nf
Number of addresses:4
12\&.1\&.1\&.1 0
12\&.1\&.1\&.2 1
12\&.1\&.1\&.3 2
12\&.1\&.1\&.4 3
.fi
.if n \{\
.RE
.\}
.SS "scriptstatus"
.PP
This command displays which scripts were run in the previous monitoring cycle and the result of each script\&. If a script failed with an error, causing the node to become unhealthy, the output from that script is also shown\&.
.PP
Example: ctdb scriptstatus
.PP
Example output:
.sp
.if n \{\
.RS 4
.\}
.nf
7 scripts were executed last monitoring cycle
00\&.ctdb Status:OK Duration:0\&.056 Tue Mar 24 18:56:57 2009
10\&.interface Status:OK Duration:0\&.077 Tue Mar 24 18:56:57 2009
11\&.natgw Status:OK Duration:0\&.039 Tue Mar 24 18:56:57 2009
20\&.multipathd Status:OK Duration:0\&.038 Tue Mar 24 18:56:57 2009
31\&.clamd Status:DISABLED
40\&.vsftpd Status:OK Duration:0\&.045 Tue Mar 24 18:56:57 2009
41\&.httpd Status:OK Duration:0\&.039 Tue Mar 24 18:56:57 2009
50\&.samba Status:ERROR Duration:0\&.082 Tue Mar 24 18:56:57 2009
   OUTPUT:ERROR: Samba tcp port 445 is not responding
.fi
.if n \{\
.RE
.\}
.SS "disablescript