ctdb-daemon: Fix tickle updates to recently started nodes
author Martin Schwenke <martin@meltin.net>
Thu, 13 Mar 2014 05:53:15 +0000 (16:53 +1100)
committer Amitay Isaacs <amitay@samba.org>
Sun, 23 Mar 2014 03:20:14 +0000 (04:20 +0100)
Commit 0723fedcedd4a97870f7b1224945f1587363c9bf added a cheap
implementation of ctdb_control_startup() that simply flags the recipient
node as needing to send updates for each IP when the tickle update
loop next fires.  Commit 026996550d726836091ff5ebd1ebf925bf237bb0
ensures that a node sends tickle updates only once it has been flagged
to do so.
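A minimal sketch of the flag-then-send pattern described above, assuming a simplified `struct vnn` that carries only the `tcp_update_needed` flag seen in the diff below. The helpers `mark_all_for_update()` (standing in for the effect of ctdb_control_startup()) and `tickle_update_pass()` (standing in for one pass of the tickle update loop) are hypothetical, as is clearing the flag after each send; this is an illustration, not CTDB's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for CTDB's per-IP state. */
struct vnn {
	bool tcp_update_needed;   /* flag name taken from the diff */
	int updates_sent;         /* counter for illustration only */
};

/* Models the cheap ctdb_control_startup(): flag every public IP so the
 * next pass of the tickle update loop sends its tickle list. */
static void mark_all_for_update(struct vnn *vnns, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		vnns[i].tcp_update_needed = true;
	}
}

/* Models one pass of the tickle update loop: unflagged IPs are skipped,
 * flagged IPs are "sent" and (by assumption) have their flag cleared.
 * Returns the number of updates sent in this pass. */
static size_t tickle_update_pass(struct vnn *vnns, size_t n)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (!vnns[i].tcp_update_needed) {
			continue;
		}
		vnns[i].updates_sent++;
		vnns[i].tcp_update_needed = false;
		sent++;
	}
	assert(sent <= n);
	return sent;
}
```

With this model, a pass before any startup flagging sends nothing, the first pass after mark_all_for_update() sends one update per IP, and the following pass sends nothing again.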

CTDB_CONTROL_STARTUP is broadcast to all nodes, so this is a good
start.  However, the tickle updates are only broadcast to connected
nodes.  A recently started node may not yet be considered connected,
because the keepalive monitoring loop may not yet have marked it as
connected.  This means that the tickle update loop races with the
keepalive monitoring loop.  If the tickle update loop wins, then
updates will not be sent to the recently started node.
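The race can be modelled in a few lines. This is a hypothetical sketch: `struct node_view`, `broadcast_connected()` and `new_node_gets_tickles()` are invented names, and the model only captures the ordering of the two loops on the sending node, not real CTDB transport behaviour. `broadcast_connected()` mimics the destination filtering of CTDB_BROADCAST_CONNECTED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_NODES 3

/* Hypothetical view of each peer held by the sending daemon. */
struct node_view {
	bool connected;     /* set by the keepalive monitoring loop */
	bool got_tickles;   /* did this peer receive the tickle update? */
};

/* Delivers the update only to peers currently marked connected,
 * mimicking CTDB_BROADCAST_CONNECTED. */
static void broadcast_connected(struct node_view *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].connected) {
			nodes[i].got_tickles = true;
		}
	}
}

/* Runs the keepalive loop and the tickle update loop in the chosen
 * order for a freshly started peer `newnode`, and reports whether
 * that peer received the tickle update. */
static bool new_node_gets_tickles(bool keepalive_first, size_t newnode)
{
	struct node_view nodes[NUM_NODES];

	assert(newnode < NUM_NODES);
	for (size_t i = 0; i < NUM_NODES; i++) {
		nodes[i].connected = (i != newnode); /* new peer not yet seen */
		nodes[i].got_tickles = false;
	}

	if (keepalive_first) {
		nodes[newnode].connected = true;       /* keepalive loop fires */
		broadcast_connected(nodes, NUM_NODES); /* then tickle loop */
	} else {
		broadcast_connected(nodes, NUM_NODES); /* tickle loop wins */
		nodes[newnode].connected = true;       /* keepalive too late */
	}
	return nodes[newnode].got_tickles;
}
```

Only the ordering where the keepalive loop runs first delivers the update to the new peer; when the tickle update loop wins, the new peer misses it.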

The simplest improvement is to stop the tickle update from depending
on whether a node is connected or not.  So instead of broadcasting
tickle updates only to connected nodes, broadcast them to all nodes.
Since no reply is expected (the control is sent with
CTDB_CTRL_FLAG_NOREPLY), this should work just fine.
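The effect of switching the destination can be sketched as a recipient filter. The enum values below are hypothetical stand-ins for CTDB_BROADCAST_CONNECTED and CTDB_BROADCAST_ALL, and `broadcast_noreply()` is an invented helper; it only models which peers a fire-and-forget control is addressed to, not how CTDB's transport handles a peer it cannot reach yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CTDB_BROADCAST_CONNECTED / _ALL. */
enum bcast_mode { BCAST_CONNECTED, BCAST_ALL };

struct node_view {
	bool connected;   /* keepalive loop's current opinion */
	bool received;    /* was the NOREPLY control addressed to this peer? */
};

/* Fire-and-forget broadcast: because no reply is awaited, addressing a
 * peer whose connected flag is still false never leaves the sender
 * waiting. Returns the number of peers the control was addressed to. */
static size_t broadcast_noreply(struct node_view *nodes, size_t n,
				enum bcast_mode mode)
{
	size_t sent = 0;

	for (size_t i = 0; i < n; i++) {
		if (mode == BCAST_CONNECTED && !nodes[i].connected) {
			continue;   /* old behaviour: skip unconfirmed peers */
		}
		nodes[i].received = true;
		sent++;
	}
	return sent;
}
```

With BCAST_ALL the outcome no longer depends on whether the keepalive loop has already marked the new peer as connected, which is the point of the change.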

While looking at this code, we noticed that ctdb_ctrl_set_tcp_tickles()
is named like a client function, but it isn't one.  Also, 2 of its
arguments (timeout and destnode) are ignored.  So rename this function
to ctdb_send_set_tcp_tickles_for_ip() and remove the ignored arguments.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

ctdb/server/ctdb_takeover.c

index 34b210ee24be5e46b1a7abc16cd37ed96515c74d..865357c6e5bb28af41e4108bfc14fbd7a53d5021 100644
@@ -3959,10 +3959,9 @@ int32_t ctdb_control_get_tcp_tickle_list(struct ctdb_context *ctdb, TDB_DATA ind
 /*
   set the list of all tcp tickles for a public address
  */
-static int ctdb_ctrl_set_tcp_tickles(struct ctdb_context *ctdb, 
-                             struct timeval timeout, uint32_t destnode, 
-                             ctdb_sock_addr *addr,
-                             struct ctdb_tcp_array *tcparray)
+static int ctdb_send_set_tcp_tickles_for_ip(struct ctdb_context *ctdb,
+                                           ctdb_sock_addr *addr,
+                                           struct ctdb_tcp_array *tcparray)
 {
        int ret, num;
        TDB_DATA data;
@@ -3987,7 +3986,7 @@ static int ctdb_ctrl_set_tcp_tickles(struct ctdb_context *ctdb,
                memcpy(&list->tickles.connections[0], tcparray->connections, sizeof(struct ctdb_tcp_connection) * num);
        }
 
-       ret = ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_CONNECTED, 0, 
+       ret = ctdb_daemon_send_control(ctdb, CTDB_BROADCAST_ALL, 0,
                                       CTDB_CONTROL_SET_TCP_TICKLE_LIST,
                                       0, CTDB_CTRL_FLAG_NOREPLY, data, NULL, NULL);
        if (ret != 0) {
@@ -4023,11 +4022,9 @@ static void ctdb_update_tcp_tickles(struct event_context *ev,
                if (!vnn->tcp_update_needed) {
                        continue;
                }
-               ret = ctdb_ctrl_set_tcp_tickles(ctdb, 
-                               TAKEOVER_TIMEOUT(),
-                               CTDB_BROADCAST_CONNECTED,
-                               &vnn->public_address,
-                               vnn->tcp_array);
+               ret = ctdb_send_set_tcp_tickles_for_ip(ctdb,
+                                                      &vnn->public_address,
+                                                      vnn->tcp_array);
                if (ret != 0) {
                        DEBUG(DEBUG_ERR,("Failed to send the tickle update for public address %s\n",
                                ctdb_addr_to_str(&vnn->public_address)));