eventscripts: Become unhealthy faster on nfsd failure
author Martin Schwenke <martin@meltin.net>
Mon, 12 Aug 2013 01:36:25 +0000 (11:36 +1000)
committer Amitay Isaacs <amitay@gmail.com>
Wed, 14 Aug 2013 06:10:30 +0000 (16:10 +1000)
commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef
tree 2e11a6d3c2b039c95c8f294dd6ae296ca939d3fd
parent b49c4f39666d5b1596213bf41bcdc47ed3c327ae

Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems.  Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures.  Restart on every 10th failure to try to bring the node back
to good health.
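
The thresholds above can be sketched as a small shell function.  This is
only an illustration of the policy, not the contents of
config/nfs-rpc-checks.d/20.nfsd.check: the function name
nfsd_failure_actions and the variables unhealthy_after and restart_every
are hypothetical, and the sketch assumes the node stays unhealthy for
every consecutive failure from the second onward.

```shell
#!/bin/sh
# Hypothetical sketch of the failure policy described in this commit.
# The names below are illustrative and are not taken from CTDB.

unhealthy_after=2   # become unhealthy at this many consecutive failures
restart_every=10    # attempt a restart on every Nth consecutive failure

# Print the actions for a given consecutive failure count:
# "none", "unhealthy", or "restart unhealthy".
nfsd_failure_actions ()
{
    _count="$1"
    _actions=""

    # Restart only on the 10th, 20th, ... failure.
    if [ "$_count" -gt 0 ] && [ $(( $_count % $restart_every )) -eq 0 ] ; then
        _actions="restart"
    fi

    # Assumption: unhealthy on the 2nd failure and on every one after it.
    if [ "$_count" -ge "$unhealthy_after" ] ; then
        _actions="${_actions:+$_actions }unhealthy"
    fi

    echo "${_actions:-none}"
}

# Usage: show the actions for the first 10 consecutive failures.
for i in 1 2 3 4 5 6 7 8 9 10 ; do
    echo "failure $i: $(nfsd_failure_actions "$i")"
done
```

Under these assumptions the node is marked unhealthy quickly, while the
restart, which tends to hang, is attempted only occasionally.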

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>
config/nfs-rpc-checks.d/20.nfsd.check
tests/eventscripts/60.nfs.monitor.112.sh
tests/eventscripts/60.nfs.monitor.113.sh
tests/eventscripts/60.nfs.monitor.114.sh