"monitor" events can be cancelled. If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped. In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).
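
The failure mode above can be sketched in shell. This is a minimal
simulation, not CTDB code: restart_service, the state file, and the use
of kill to stand in for event cancellation are all assumptions made for
illustration.

```shell
#!/bin/sh
# Hypothetical simulation: a state file stands in for a real service.
state_file=$(mktemp)
echo "running" >"$state_file"

# Hypothetical restart: stop, pause, start.  The pause is the window
# in which a cancelled event leaves the service down.
restart_service ()
{
    echo "stopped" >"$state_file"
    sleep 3
    echo "running" >"$state_file"
}

# Run the simulated "monitor" event in the background, then cancel it
# partway through the restart.
restart_service &
event_pid=$!
sleep 1
kill "$event_pid"
wait "$event_pid" 2>/dev/null

# The start step never ran, so the recorded state is still "stopped".
service_state=$(cat "$state_file")
echo "service is $service_state"
rm -f "$state_file"
```

Under these assumptions the script should report the service as
stopped, because the cancelled event never reaches the start step.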

A long time ago we did service reconfiguration in "monitor" events
following failovers. Service reconfiguration was then moved to the
"ipreallocated" event. However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocated" event does
not occur. The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated". Without this fallback, IPs can
therefore be deleted without running the required service
reconfiguration.

The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocated" event.

This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.

Also update the associated tests. Make the first test confirm that
the monitor event no longer does reconfiguration. Change the others
to test that monitor status is correctly replayed when something else
is doing a reconfigure and currently holds the reconfigure lock.
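
The behaviour the updated tests check can be sketched as follows. This
is an illustrative stand-in, not the CTDB implementation: try_lock,
release_lock, handle_monitor, and the mkdir-based lock are hypothetical
names and mechanisms chosen only to show the control flow.

```shell
#!/bin/sh
# Illustrative stand-ins only; none of this is the CTDB implementation.
lock_dir="${TMPDIR:-/tmp}/reconfigure-sketch.$$"

# Try to take the lock; mkdir is atomic, so it fails if the lock is held.
try_lock ()
{
    mkdir "$lock_dir" 2>/dev/null
}

release_lock ()
{
    rmdir "$lock_dir"
}

# Hypothetical monitor handler: it never reconfigures.  If the lock is
# held by someone else, it reports a replay of the previous status.
handle_monitor ()
{
    if try_lock ; then
        release_lock
        echo "monitoring normally"
    else
        echo "replaying previous status"
    fi
}

unlocked_result=$(handle_monitor)

try_lock                      # simulate a reconfigure in progress
locked_result=$(handle_monitor)
release_lock

echo "lock free: $unlocked_result"
echo "lock held: $locked_result"
```

With the lock free the handler only monitors; with the lock held it
falls back to replaying, which mirrors what the tests do by calling
ctdb_reconfigure_try_lock before the "monitor" event.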
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
ctdb_service_reconfigure
fi
;;
- monitor)
- if ctdb_service_needs_reconfigure ; then
- ctdb_service_reconfigure
- # Given that the reconfigure might not have
- # resulted in the service being stable yet, we
- # replay the previous status since that's the best
- # information we have.
- ctdb_replay_monitor_status
- fi
- ;;
esac
else
# Somebody else is running an event we don't want to collide
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure"
+define_test "takeip, monitor -> no reconfigure"
simple_test_event "takeip" $public_address
-# This currently assumes that ctdb scriptstatus will always return a
-# good status (when replaying). That should change and we will need
-# to split this into 2 tests.
-ok <<EOF
-Reconfiguring service "nfs"...
-Replaying previous status for this script due to reconfigure...
-EOF
simple_test_event "monitor"
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure, replay error"
+define_test "takeip, take reconfigure lock, monitor -> replay error"
ctdb_fake_scriptstatus 1 "ERROR" "$err"
+eventscript_call ctdb_reconfigure_try_lock
+
-Reconfiguring service "nfs"...
Replaying previous status for this script due to reconfigure...
$err
EOF
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure, replay timedout"
+define_test "takeip, take reconfigure lock, monitor -> reconfigure, replay timedout"
ctdb_fake_scriptstatus -62 "TIMEDOUT" "$err"
+eventscript_call ctdb_reconfigure_try_lock
+
-Reconfiguring service "nfs"...
Replaying previous status for this script due to reconfigure...
[Replay of TIMEDOUT scriptstatus - note incorrect return code.] $err
EOF
. "${TEST_SCRIPTS_DIR}/unit.sh"
-define_test "takeip, monitor -> reconfigure, replay disabled"
+define_test "takeip, take reconfigure lock, monitor -> reconfigure, replay disabled"
ctdb_fake_scriptstatus -8 "DISABLED" "$err"
+eventscript_call ctdb_reconfigure_try_lock
+
-Reconfiguring service "nfs"...
Replaying previous status for this script due to reconfigure...
[Replay of DISABLED scriptstatus - note incorrect return code.] $err
EOF