amitay/ctdb.git
11 years agoNew version 1.2.27-204.1 ctdb-1.2.27-204.1
Martin Schwenke [Thu, 16 Aug 2012 02:32:08 +0000 (12:32 +1000)]
New version 1.2.27-204.1

Signed-off-by: Martin Schwenke <martin@meltin.net>
11 years agoDon't call the UPDATE event if the old and new interfaces are the same.
Ronnie Sahlberg [Wed, 4 May 2011 01:34:17 +0000 (11:34 +1000)]
Don't call the UPDATE event if the old and new interfaces are the same.

CQ S1018175

12 years agonew version 1.2.27-204
Ronnie Sahlberg [Thu, 28 Jul 2011 22:43:49 +0000 (08:43 +1000)]
new version 1.2.27-204

12 years agoUpdate the delip command
Ronnie Sahlberg [Thu, 28 Jul 2011 22:41:35 +0000 (08:41 +1000)]
Update the delip command
Don't talloc_free(vnn) immediately, but postpone it until the
eventscript callback has completed.

CQ S1026664

12 years agoNew version 1.2.27-203
Ronnie Sahlberg [Mon, 25 Jul 2011 11:18:31 +0000 (21:18 +1000)]
New version 1.2.27-203

12 years agoeventscript: fix callback after free
Rusty Russell [Mon, 25 Jul 2011 08:26:06 +0000 (17:56 +0930)]
eventscript: fix callback after free

ctdb_event_script_callback() takes a mem_ctx arg which it doesn't use, but
the implication is pretty clear, that when that mem_ctx is freed, the callback
shouldn't happen.  Indeed, Ronnie reproduced a case where that callback
refers to freed memory, in the ip reallocation code under stress.

So attach the callback to the mem_ctx they give us, and remove it from the
script state structure when that's freed.  It's a bit weird, but it works.

CQ: S1026179
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
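
The idea in the eventscript fix above (tie the callback to the caller's memory context and detach it when that context is freed) can be sketched in plain C. This is only an illustration under stated assumptions: `script_state`, `callback_handle`, `callback_handle_destroy`, `script_finished` and `count_calls` are hypothetical names, and the explicit destroy function stands in for a talloc destructor; it is not the ctdb code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the ctdb or talloc API. */
typedef void (*script_callback_fn)(int status, void *private_data);

/* State owned by the script runner; it may outlive the caller. */
struct script_state {
    script_callback_fn callback;  /* NULL once the caller's context is gone */
    void *private_data;
};

/* Small handle that would be allocated on the caller's mem_ctx. */
struct callback_handle {
    struct script_state *state;
};

/* Plays the role of a destructor run when the caller's context is freed. */
static void callback_handle_destroy(struct callback_handle *h)
{
    assert(h != NULL);
    if (h->state != NULL) {
        h->state->callback = NULL;
        h->state->private_data = NULL;
        h->state = NULL;
    }
}

/* Called when the script completes; returns 1 if the callback was invoked. */
static int script_finished(struct script_state *state, int status)
{
    assert(state != NULL);
    if (state->callback == NULL) {
        return 0;  /* the caller went away, nothing to notify */
    }
    state->callback(status, state->private_data);
    return 1;
}

/* Example callback: counts how often it was called. */
static void count_calls(int status, void *private_data)
{
    (void)status;
    ++*(int *)private_data;
}
```

Once the handle is destroyed, a late script completion no longer touches the caller's (freed) data.
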
12 years agoRemove logging of spam/errors from the 10.interface
Ronnie Sahlberg [Sun, 8 May 2011 20:35:33 +0000 (06:35 +1000)]
Remove logging of spam/errors from the 10.interface
script if/when we have, for example, NATGW configured but no public addresses defined on that interface

CQ S1023378

12 years agoNew version 1.2.27-202
Ronnie Sahlberg [Mon, 27 Jun 2011 05:30:24 +0000 (15:30 +1000)]
New version 1.2.27-202

12 years agoRemove a benign but annoying log message that will be logged after an interface that...
Ronnie Sahlberg [Sat, 18 Jun 2011 00:47:25 +0000 (10:47 +1000)]
Remove a benign but annoying log message that is logged after an interface that was in use has been removed and is no longer referenced by any public addresses.

CQ S1024495

12 years agonew version 1.2.27-201
Ronnie Sahlberg [Mon, 30 May 2011 04:25:42 +0000 (14:25 +1000)]
new version 1.2.27-201

12 years agoRemove all checking of GPFS from ctdb_diagnostics
Ronnie Sahlberg [Wed, 11 May 2011 09:50:09 +0000 (19:50 +1000)]
Remove all checking of GPFS from ctdb_diagnostics

CQ S1023524

12 years agoWhen using multiple VLANs, some funky stuff can sometimes happen when
Ronnie Sahlberg [Thu, 12 May 2011 00:24:46 +0000 (10:24 +1000)]
When using multiple VLANs, some funky stuff can sometimes happen when
adding/removing IP addresses, causing routes to be dropped by the system.

The easiest workaround for this is to unconditionally try to reapply
all static routes for all interfaces once ipreallocation has finished,
not just adding them back on the affected interface.

This works around a funky issue in
CQ S1023538

12 years agoVerify that state is not NULL before we dereference it in
Ronnie Sahlberg [Mon, 23 May 2011 02:19:36 +0000 (12:19 +1000)]
Verify that state is not NULL before we dereference it in
ctdb_event_script_handler().

This should not be possible, since the handler is called through an event that is a child of state itself.

CQ1023707

12 years agorename to version 1.2.27-200 to leave some space so we don't collide with
Ronnie Sahlberg [Mon, 16 May 2011 21:50:18 +0000 (07:50 +1000)]
rename to version 1.2.27-200 to leave some space so we don't collide with
the ptf1 branch

12 years agoNew version 1.2.27-3
Ronnie Sahlberg [Tue, 10 May 2011 04:53:53 +0000 (14:53 +1000)]
New version 1.2.27-3

12 years agoIf samba fails to start for some reason, make this cause the startup event to fail...
Ronnie Sahlberg [Mon, 9 May 2011 22:25:27 +0000 (08:25 +1000)]
If samba fails to start for some reason, make this cause the startup event to fail too, so that ctdbd will retry the startup event later.
Otherwise samba would be left not running.

CQ S1023394

12 years agoDon't exit from checking interfaces once we have found one interface that is not
Ronnie Sahlberg [Mon, 9 May 2011 20:19:34 +0000 (06:19 +1000)]
Don't exit from checking interfaces once we have found one interface that is not
in use by public addresses. This can happen when we have removed existing interfaces/IP addresses, and it prevents us from verifying the status of other interfaces.

13 years agobonding mode 4 monitoring:
Ronnie Sahlberg [Tue, 12 Apr 2011 21:51:36 +0000 (07:51 +1000)]
bonding mode 4 monitoring:
We cannot just check whether MII Status is up for bonding mode 4, since the kernel will always report the bond device as UP,
even if all cables are disconnected.

For mode 4, ignore the status of the bond device and instead check whether at least one slave interface is up
when determining if the device is good or bad.
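
A minimal sketch of the mode 4 rule described above, written in C for illustration (the actual check lives in an eventscript). It assumes the bonding status text uses `Slave Interface:` headers followed by `MII Status:` lines; `bond_has_active_slave` is a hypothetical helper name, not something taken from ctdb.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if at least one slave section in the bonding status text
 * reports "MII Status: up", and 0 otherwise. The bond-level status that
 * precedes the first slave section is deliberately ignored.
 */
static int bond_has_active_slave(const char *text)
{
    const char *p;

    assert(text != NULL);
    p = strstr(text, "Slave Interface:");
    while (p != NULL) {
        const char *next = strstr(p + 1, "Slave Interface:");
        const char *up = strstr(p, "MII Status: up");

        /* An "up" line counts only if it belongs to this slave section. */
        if (up != NULL && (next == NULL || up < next)) {
            return 1;
        }
        p = next;
    }
    return 0;
}
```

With this rule, a bond that reports itself as up while every slave is down is treated as bad.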

13 years agoNew version 1.2.27-2
Ronnie Sahlberg [Sun, 10 Apr 2011 21:42:44 +0000 (07:42 +1000)]
New version 1.2.27-2

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
13 years agoIFACE handling. Assume links are always good on startup (they almost always
Ronnie Sahlberg [Sun, 10 Apr 2011 19:56:14 +0000 (05:56 +1000)]
IFACE handling. Assume links are always good on startup (they almost always

Simplify the handling of setting the links in the 10.interface eventscript
and remove the optimization to only call setifacelink on state change
to make the code simpler to read.

If a take ip event fails, flag the node as unhealthy.

Add a check to the interface script to check if the interface exists
or if it has been deleted,
so that we can detect this and become UNHEALTHY if someone deletes an interface
we are using to host public addresses.

13 years agoThis needs more testing first
Ronnie Sahlberg [Mon, 21 Mar 2011 03:28:22 +0000 (14:28 +1100)]
This needs more testing first

Revert "ctdbd: call tdb_reopen_all() in freeze child."

This reverts commit 1e30004f0c63572d721a2c2f53d8a6bccdb5ec45.

13 years agoNew version 1.2.27
Ronnie Sahlberg [Mon, 21 Mar 2011 02:02:31 +0000 (13:02 +1100)]
New version 1.2.27

13 years agoctdbd: call tdb_reopen_all() in freeze child.
Rusty Russell [Mon, 21 Mar 2011 02:37:17 +0000 (13:07 +1030)]
ctdbd: call tdb_reopen_all() in freeze child.

In theory, the ctdbd parent shouldn't be holding any locks, but it's a good
idea to always call tdb_reopen_all() after a fork().

13 years agoctdbd: fix lock held on error ("ctdb_req_dmaster from non-master.")
Rusty Russell [Mon, 21 Mar 2011 02:33:01 +0000 (13:03 +1030)]
ctdbd: fix lock held on error ("ctdb_req_dmaster from non-master.")

We should release the lock on the record before returning; otherwise the
recovery (which tries to freeze the database) will fail.  Symptoms are as
follows:

ctdbd: pnn 15 dmaster request for new-dmaster 19 from non-master 1 real-dmaster=5 key f049c3c8 dbid 0x6cf2837d gen=1148812532 curgen=1148812532 c->rsn=2 header.rsn=15 reqid=2147483585 keyval=0x4f464e49
ctdbd: ctdb_req_dmaster from non-master. Force a recovery.
...
ctdbd: freeze_lock-1:server/ctdb_freeze.c:55 Failed to lock database registry.tdb

CQ:1022545
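
The pattern behind the fix above (every return path releases the record lock) can be sketched as follows. This is a simplified illustration, not the ctdbd code: `record_lock`, `lock_record`, `unlock_record` and `handle_dmaster_request` are hypothetical names, and the lock is modelled as a plain flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record lock, modelled as a flag for illustration. */
struct record_lock {
    int held;
};

static void lock_record(struct record_lock *l)   { l->held = 1; }
static void unlock_record(struct record_lock *l) { l->held = 0; }

/*
 * Returns 0 on success and -1 if the request came from a non-master.
 * Both paths leave through the same exit so the lock is always dropped.
 */
static int handle_dmaster_request(struct record_lock *l, int from_master)
{
    int ret = 0;

    assert(l != NULL);
    lock_record(l);

    if (!from_master) {
        ret = -1;  /* error: the caller would force a recovery here */
        goto done;
    }

    /* ... normal processing of the request would happen here ... */

done:
    unlock_record(l);
    return ret;
}
```

Routing the error path through the common exit is what keeps a later database freeze from blocking on a stale lock.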

13 years agonew version 1.2.26
Ronnie Sahlberg [Sun, 20 Mar 2011 21:51:20 +0000 (08:51 +1100)]
new version 1.2.26

13 years agoDeferred attach: create the timed event as a child context of the da context we want...
Ronnie Sahlberg [Wed, 16 Mar 2011 03:55:58 +0000 (14:55 +1100)]
Deferred attach: create the timed event as a child context of the da context we want to delete.
Otherwise the da context can be timed out and talloc_free()d,
but the event for this already freed object will still trigger,
causing a talloc error and shutdown.

CQ S1022515

13 years agoNew version 1.2.25
Ronnie Sahlberg [Sun, 13 Mar 2011 23:12:37 +0000 (10:12 +1100)]
New version 1.2.25

13 years agoIP reallocation. If a public address is already hosted on the node when we startup...
Ronnie Sahlberg [Sun, 13 Mar 2011 22:55:28 +0000 (09:55 +1100)]
IP reallocation. If a public address is already hosted on the node when we startup, log a warning message but do not cause the recovery to fail.

CQ S1022356

13 years agoVacuuming: initialize a variable to avoid a harmless valgrind hit
Ronnie Sahlberg [Sun, 13 Mar 2011 00:30:52 +0000 (11:30 +1100)]
Vacuuming: initialize a variable to avoid a harmless valgrind hit

13 years agoDon't allow clients to connect to databases until we are well past and through
Ronnie Sahlberg [Fri, 11 Mar 2011 22:42:07 +0000 (09:42 +1100)]
Don't allow clients to connect to databases until we are well past and through
the initial recovery phase

CQ S1022412

13 years agoNew version 1.2.24.
Michael Adam [Mon, 21 Feb 2011 04:55:16 +0000 (15:55 +1100)]
New version 1.2.24.

13 years agovacuum: fix a comment typo
Michael Adam [Fri, 11 Mar 2011 15:05:44 +0000 (16:05 +0100)]
vacuum: fix a comment typo

13 years agovacuum: use insert_record_into_delete_queue in ctdb_local_schedule_for_deletion.
Michael Adam [Fri, 11 Mar 2011 14:57:45 +0000 (15:57 +0100)]
vacuum: use insert_record_into_delete_queue in ctdb_local_schedule_for_deletion.

This is to take advantage of the hash collision handling and logging
also in ctdb_local_schedule_for_deletion.

13 years agovacuum: refactor insert_record_into_delete_queue out of ctdb_control_schedule_for_del...
Michael Adam [Fri, 11 Mar 2011 14:55:52 +0000 (15:55 +0100)]
vacuum: refactor insert_record_into_delete_queue out of ctdb_control_schedule_for_deletion

13 years agovacuum: raise a debug level from INFO to DEBUG
Michael Adam [Fri, 11 Mar 2011 13:57:15 +0000 (14:57 +0100)]
vacuum: raise a debug level from INFO to DEBUG

when overwriting an existing entry in the delete_queue.

13 years agoserver: add a comment explaining the call redirect logic in ctdb_call_send_redirect().
Michael Adam [Wed, 24 Nov 2010 07:01:01 +0000 (08:01 +0100)]
server: add a comment explaining the call redirect logic in ctdb_call_send_redirect().

13 years agoctdb_ltdb_store_server: honour the AUTOMATIC record flag
Michael Adam [Thu, 3 Feb 2011 15:32:23 +0000 (16:32 +0100)]
ctdb_ltdb_store_server: honour the AUTOMATIC record flag

Do not delete empty records that carry this flag but store
them and schedule them for deletion. Do not store the flag
in the ltdb though, since this is internal only and should not
be visible to the client.

13 years agoltdb: add the CTDB_REC_FLAG_AUTOMATIC to the initial header in ctdb_ltdb_fetch()
Michael Adam [Thu, 3 Feb 2011 15:30:52 +0000 (16:30 +0100)]
ltdb: add the CTDB_REC_FLAG_AUTOMATIC to the initial header in ctdb_ltdb_fetch()

Signals that this record was not created by a client level store.

13 years agoctdb_private.h: add record flag CTDB_REC_FLAG_AUTOMATIC
Michael Adam [Thu, 3 Feb 2011 15:27:42 +0000 (16:27 +0100)]
ctdb_private.h: add record flag CTDB_REC_FLAG_AUTOMATIC

This is a flag that shall signal that a record has been automatically generated by ctdb
and not by an explicit client store operation. This will be used in the ctdb_ltdb_fetch
operation which stores an empty record with default initial header before trying to
migrate the record from the dmaster when the record does not exist in the local tdb.

13 years agoctdb_ltdb_store_server: add ability to send SCHEDULE_FOR_DELETION control to ctdb_ltd...
Michael Adam [Tue, 28 Dec 2010 12:19:22 +0000 (13:19 +0100)]
ctdb_ltdb_store_server: add ability to send SCHEDULE_FOR_DELETION control to ctdb_ltdb_store.

13 years agoctdb_ltdb_store_server: Improve debug message in ctdb_ltdb_store when store or delete...
Michael Adam [Tue, 21 Dec 2010 17:08:11 +0000 (18:08 +0100)]
ctdb_ltdb_store_server: Improve debug message in ctdb_ltdb_store when store or delete fails.

13 years agoctdb_ltdb_store_server: always store the data when ctdb_ltdb_store() is called from...
Michael Adam [Tue, 21 Dec 2010 16:50:52 +0000 (17:50 +0100)]
ctdb_ltdb_store_server: always store the data when ctdb_ltdb_store() is called from the client

This also fixes a segfault since ctdb_lmaster uses the vnn_map.

13 years agoctdb_ltdb_store_server: implement fastpath vacuuming deletion based on VACUUM_MIGRATE...
Michael Adam [Fri, 10 Dec 2010 13:13:50 +0000 (14:13 +0100)]
ctdb_ltdb_store_server: implement fastpath vacuuming deletion based on VACUUM_MIGRATED flag.

When the record has been obtained by the lmaster as part of the vacuuming-fetch
handler and it is empty and has never been migrated with data, then such records
are deleted instead of being stored. These records have automatically been
deleted when leaving the former dmaster, so that they vanish for good when
hitting the lmaster in this way. This reduces the load on traditional
vacuuming.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

13 years agoctdb_ltdb_store_server: delete an empty record that is safe to delete instead of...
Michael Adam [Fri, 3 Dec 2010 14:29:21 +0000 (15:29 +0100)]
ctdb_ltdb_store_server: delete an empty record that is safe to delete instead of storing locally.

When storing a record that is being migrated off to another node
and has never been migrated with data, then we can safely delete it
from the local tdb instead of storing the record with empty data.

Note: This record is not deleted if we are its lmaster or dmaster.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
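
The conditions named in the commit above (empty data, never migrated with data, and this node being neither lmaster nor dmaster) can be written as a single predicate. The sketch below is an assumption-laden illustration: `record_info`, `safe_to_delete_locally` and the flag value `REC_FLAG_MIGRATED_WITH_DATA` are hypothetical and do not mirror the ctdb record header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value chosen for this sketch. */
#define REC_FLAG_MIGRATED_WITH_DATA 0x1u

/* Hypothetical, simplified view of a record being stored. */
struct record_info {
    size_t   data_len;  /* length of the record payload */
    uint32_t flags;     /* record flags */
    uint32_t lmaster;   /* node number of the location master */
    uint32_t dmaster;   /* node number of the data master */
};

/* Returns 1 if the record may be deleted locally instead of stored. */
static int safe_to_delete_locally(const struct record_info *r,
                                  uint32_t this_node)
{
    assert(r != NULL);
    if (r->data_len != 0) {
        return 0;  /* only empty records qualify */
    }
    if (r->flags & REC_FLAG_MIGRATED_WITH_DATA) {
        return 0;  /* other nodes may still hold copies with data */
    }
    if (r->lmaster == this_node || r->dmaster == this_node) {
        return 0;  /* lmaster and dmaster must keep the record */
    }
    return 1;
}
```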

13 years agoserver: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs
Michael Adam [Thu, 30 Dec 2010 17:19:32 +0000 (18:19 +0100)]
server: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs

This is realized by adding a ctdb_ltdb_store_fn function pointer to the db
context and filling it in the attach procedure for non-persistent dbs.
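
A rough sketch of the function-pointer approach described above: the database context carries a store function, and the attach step picks the server variant for non-persistent databases. All names here (`db_context`, `store_fn`, `store_plain`, `store_server`, `attach_db`) are hypothetical, and the two store functions are stubs that only record which path ran.

```c
#include <assert.h>
#include <stddef.h>

struct db_context;

/* Hypothetical signature for a local store routine. */
typedef int (*store_fn)(struct db_context *db, const char *key,
                        const char *data);

struct db_context {
    int persistent;   /* non-zero for persistent databases */
    store_fn store;   /* chosen once, at attach time */
    int last_path;    /* 1 = plain store, 2 = server store (demo only) */
};

/* Stub for the standard store routine. */
static int store_plain(struct db_context *db, const char *key,
                       const char *data)
{
    (void)key;
    (void)data;
    db->last_path = 1;
    return 0;
}

/* Stub for the server variant with the extra deletion logic. */
static int store_server(struct db_context *db, const char *key,
                        const char *data)
{
    (void)key;
    (void)data;
    db->last_path = 2;
    return 0;
}

/* Attach step: non-persistent databases get the server variant. */
static void attach_db(struct db_context *db, int persistent)
{
    assert(db != NULL);
    db->persistent = persistent;
    db->store = persistent ? store_plain : store_server;
    db->last_path = 0;
}
```

Callers then go through `db->store(...)` without needing to know which kind of database they hold.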

13 years agoserver: create a server variant ctdb_ltdb_store_server() of ctdb_ltdb_store().
Michael Adam [Thu, 30 Dec 2010 16:44:51 +0000 (17:44 +0100)]
server: create a server variant ctdb_ltdb_store_server() of ctdb_ltdb_store().

This is supposed to contain logic for deleting records that are safe
to delete and scheduling records for deletion. It will be called in
server context for non-persistent databases instead of the standard
ctdb_ltdb_store() function.

13 years agodaemon: fill ctdb->ctdbd_pid early
Michael Adam [Tue, 28 Dec 2010 12:14:23 +0000 (13:14 +0100)]
daemon: fill ctdb->ctdbd_pid early

13 years agotest: send SCHEDULE_FOR_DELETION control from randrec test.
Michael Adam [Tue, 21 Dec 2010 14:29:46 +0000 (15:29 +0100)]
test: send SCHEDULE_FOR_DELETION control from randrec test.

13 years agoclient: add accessor function ctdb_header_from_record_handle().
Michael Adam [Tue, 21 Dec 2010 14:29:23 +0000 (15:29 +0100)]
client: add accessor function ctdb_header_from_record_handle().

13 years agovacuum: add ctdb_local_schedule_for_deletion()
Michael Adam [Tue, 28 Dec 2010 12:13:34 +0000 (13:13 +0100)]
vacuum: add ctdb_local_schedule_for_deletion()

13 years agoserver: implement a new control SCHEDULE_FOR_DELETION to fill the delete_queue.
Michael Adam [Tue, 21 Dec 2010 13:25:48 +0000 (14:25 +0100)]
server: implement a new control SCHEDULE_FOR_DELETION to fill the delete_queue.

13 years agocontrol: add a new control opcode CTDB_CONTROL_SCHEDULE_FOR_DELETION
Michael Adam [Tue, 8 Mar 2011 23:57:55 +0000 (00:57 +0100)]
control: add a new control opcode CTDB_CONTROL_SCHEDULE_FOR_DELETION

13 years agocontrol: add macro CHECK_CONTROL_MIN_DATA_SIZE.
Michael Adam [Tue, 8 Mar 2011 23:56:25 +0000 (00:56 +0100)]
control: add macro CHECK_CONTROL_MIN_DATA_SIZE.

This is for the control dispatcher to check whether the input data has
a required minimum size.
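
For illustration, a minimum-size check of this kind could look like the sketch below. The macro body, the `blob` struct, the 8-byte minimum and `dispatch_control` are assumptions made for the example; the real CHECK_CONTROL_MIN_DATA_SIZE definition is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input buffer descriptor. */
struct blob {
    const void *dptr;
    size_t dsize;
};

/*
 * Hypothetical minimum-size check: return -1 from the enclosing function
 * when the input is shorter than the required minimum.
 */
#define CHECK_MIN_DATA_SIZE(indata, min) do { \
    if ((indata).dsize < (size_t)(min)) {     \
        return -1;                            \
    }                                         \
} while (0)

/* Toy dispatcher for one control that needs at least 8 bytes of input. */
static int dispatch_control(struct blob indata)
{
    CHECK_MIN_DATA_SIZE(indata, 8);
    /* ... the control handler would be called here ... */
    return 0;
}
```

A minimum check, unlike an exact-size check, lets a control carry a fixed header followed by variable-length data.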

13 years agovacuum: lower level of hash collision debug message to INFO
Michael Adam [Thu, 23 Dec 2010 10:54:09 +0000 (11:54 +0100)]
vacuum: lower level of hash collision debug message to INFO

13 years agovacuum: add statistics output to the fast and full traverse runs.
Michael Adam [Wed, 22 Dec 2010 23:27:27 +0000 (00:27 +0100)]
vacuum: add statistics output to the fast and full traverse runs.

13 years agovacuum: refactor insert_delete_record_data_into_tree() out of add_record_to_delete_tree()
Michael Adam [Tue, 21 Dec 2010 13:19:00 +0000 (14:19 +0100)]
vacuum: refactor insert_delete_record_data_into_tree() out of add_record_to_delete_tree()

for reuse in filling the delete_queue.

13 years agovacuum: change all Vacuum*Interval tunables to default to 10
Michael Adam [Mon, 20 Dec 2010 20:43:41 +0000 (21:43 +0100)]
vacuum: change all Vacuum*Interval tunables to default to 10

So, by default we have a fastpath vacuuming every 10 seconds and
full blown db-traverse vacuuming once every 10 minutes.

13 years agovacuum: disable full db-traverse vacuuming runs when VacuumFastPathCount == 0
Michael Adam [Mon, 20 Dec 2010 20:30:39 +0000 (21:30 +0100)]
vacuum: disable full db-traverse vacuuming runs when VacuumFastPathCount == 0

13 years agovacuum: Only run full vacuuming (db traverse) every VacuumFastPathCount times.
Michael Adam [Mon, 20 Dec 2010 17:03:38 +0000 (18:03 +0100)]
vacuum: Only run full vacuuming (db traverse) every VacuumFastPathCount times.

13 years agovacuum: reset the fast path count in the event handle if it exceeds the limit.
Michael Adam [Mon, 20 Dec 2010 16:54:04 +0000 (17:54 +0100)]
vacuum: reset the fast path count in the event handle if it exceeds the limit.

13 years agovacuum: bump the number of fast-path runs in the vacuum child destructor
Michael Adam [Mon, 20 Dec 2010 16:49:29 +0000 (17:49 +0100)]
vacuum: bump the number of fast-path runs in the vacuum child destructor

13 years agovacuum: add a fast_path_count to the vacuum_handle.
Michael Adam [Mon, 20 Dec 2010 16:44:02 +0000 (17:44 +0100)]
vacuum: add a fast_path_count to the vacuum_handle.

13 years agoAdd a tunable VacuumFastPathCount.
Michael Adam [Mon, 20 Dec 2010 16:42:25 +0000 (17:42 +0100)]
Add a tunable VacuumFastPathCount.

This will control how many fast-path vacuuming runs will have to
be done, before a full vacuuming will be triggered, i.e. one with
a db-traversal.
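
One way to picture the counting scheme is the sketch below: a counter on the vacuum handle is bumped for each fast-path run, and a full run is triggered once the limit is reached. This is an assumption about the bookkeeping, not the ctdb implementation; `vacuum_handle` is reduced to one field and `vacuum_run_is_full` is a hypothetical helper. A limit of 0 disables full runs, matching the VacuumFastPathCount == 0 commit in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced vacuum handle. */
struct vacuum_handle {
    unsigned fast_path_count;  /* fast-path runs since the last full run */
};

/*
 * Decide whether the next vacuuming run should do a full database
 * traverse. Returns 1 for a full run and 0 for a fast-path run.
 * A limit of 0 means full runs are disabled.
 */
static int vacuum_run_is_full(struct vacuum_handle *vh,
                              unsigned fast_path_limit)
{
    assert(vh != NULL);
    if (fast_path_limit != 0 && vh->fast_path_count >= fast_path_limit) {
        vh->fast_path_count = 0;  /* start counting again */
        return 1;
    }
    vh->fast_path_count++;
    return 0;
}
```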

13 years agovacuum: traverse the delete_queue before traversing the database.
Michael Adam [Mon, 20 Dec 2010 16:25:35 +0000 (17:25 +0100)]
vacuum: traverse the delete_queue before traversing the database.

13 years agovacuum: add delete_queue_traverse() for traversal of the delete_queue.
Michael Adam [Mon, 20 Dec 2010 16:24:32 +0000 (17:24 +0100)]
vacuum: add delete_queue_traverse() for traversal of the delete_queue.

13 years agovacuum: reduce indentation in add_record_to_delete_tree()
Michael Adam [Tue, 21 Dec 2010 10:22:50 +0000 (11:22 +0100)]
vacuum: reduce indentation in add_record_to_delete_tree()

This simplifies the logical structure a bit by using early return.

13 years agovacuum: refactor new add_record_to_delete_tree() out of vacuum_traverse().
Michael Adam [Mon, 20 Dec 2010 16:11:27 +0000 (17:11 +0100)]
vacuum: refactor new add_record_to_delete_tree() out of vacuum_traverse().

This will be reused by the traversal of the delete_queue list.

13 years agovacuum: skip adding records to list of records to send to lmaster on lmaster
Michael Adam [Mon, 20 Dec 2010 15:41:13 +0000 (16:41 +0100)]
vacuum: skip adding records to list of records to send to lmaster on lmaster

This list is skipped afterwards when the lists are processed.

13 years agovacuum: refactor new add_record_to_vacuum_fetch_list() out of vacuum_traverse().
Michael Adam [Mon, 20 Dec 2010 15:31:27 +0000 (16:31 +0100)]
vacuum: refactor new add_record_to_vacuum_fetch_list() out of vacuum_traverse().

This is the function that fills the list of records to send to each lmaster
with the VACUUM_FETCH message.

This function will be reused in the traverse function for the delete_queue.

13 years agoserver: rename ctdb_repack_db() to ctdb_vacuum_and_repack_db()
Michael Adam [Mon, 20 Dec 2010 09:55:53 +0000 (10:55 +0100)]
server: rename ctdb_repack_db() to ctdb_vacuum_and_repack_db()

13 years agoWhen wiping a database, clear the delete_queue.
Michael Adam [Fri, 17 Dec 2010 01:22:02 +0000 (02:22 +0100)]
When wiping a database, clear the delete_queue.

13 years agovacuum: clear the fast-path vacuuming delete_queue after creating the vacuuming child.
Michael Adam [Fri, 17 Dec 2010 00:53:25 +0000 (01:53 +0100)]
vacuum: clear the fast-path vacuuming delete_queue after creating the vacuuming child.

Maybe we should keep a copy for the case that the vacuuming fails?

13 years agoWhen attaching to a non-persistent DB, initialize the delete_queue.
Michael Adam [Fri, 17 Dec 2010 00:38:09 +0000 (01:38 +0100)]
When attaching to a non-persistent DB, initialize the delete_queue.

13 years agoAdd a delete_queue to the ctdb database context struct.
Michael Adam [Wed, 22 Dec 2010 13:50:53 +0000 (14:50 +0100)]
Add a delete_queue to the ctdb database context struct.

This list will be filled by the client using a new
delete control. The list will then be used to implement
a fast-path vacuuming that will traverse this list instead
of traversing the database.

13 years agocall: becoming dmaster in VACUUM_MIGRATION, set the VACUUM_MIGRATED record flag
Michael Adam [Fri, 10 Dec 2010 13:11:38 +0000 (14:11 +0100)]
call: becoming dmaster in VACUUM_MIGRATION, set the VACUUM_MIGRATED record flag

This temporary flag is used for the local record storage function to
decide whether to delete an empty record which has never been migrated
with data as part of the fast-path vacuuming process, or to store
the record.

13 years agocall: hand the submitted record_flags to local record storage function.
Michael Adam [Fri, 10 Dec 2010 13:07:21 +0000 (14:07 +0100)]
call: hand the submitted record_flags to local record storage function.

13 years agocall: transfer the record flags in the ctdb call packets.
Michael Adam [Fri, 10 Dec 2010 13:02:33 +0000 (14:02 +0100)]
call: transfer the record flags in the ctdb call packets.

This way, the MIGRATED_WITH_DATA information can be transported
along with the records. This is important for vacuuming to function
properly.

The record flags are appended to the data section of the ctdb_req_dmaster
and ctdb_reply_dmaster structs.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

13 years agoserver: in the VACUUM_FETCH handler, add the VACUUM_MIGRATION to the call flags
Michael Adam [Fri, 10 Dec 2010 12:59:37 +0000 (13:59 +0100)]
server: in the VACUUM_FETCH handler, add the VACUUM_MIGRATION to the call flags

This way, the records coming in via this handler can be treated appropriately.
Namely, they can be deleted instead of being stored when they meet the fast-path
vacuuming criteria (empty, never migrated with data...)

13 years agoadd a new record flag CTDB_REC_FLAG_VACUUM_MIGRATED.
Michael Adam [Fri, 10 Dec 2010 12:57:01 +0000 (13:57 +0100)]
add a new record flag CTDB_REC_FLAG_VACUUM_MIGRATED.

This is to be used internally. The purpose is to flag a record
as having been migrated by a VACUUM_MIGRATION, which is triggered by
a VACUUM_FETCH message as part of the vacuuming. The local store
routine will base its decision whether to delete or to store
the record (among other things) upon the value of this flag.

This flag should never be stored in the local database copies.
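
Keeping an internal-only flag out of the stored copy usually comes down to masking it before the header is written. The sketch below shows that step under assumptions: the flag values and the helper `flags_for_local_store` are hypothetical and are not taken from ctdb_private.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values chosen for this sketch. */
#define REC_FLAG_MIGRATED_WITH_DATA 0x1u
#define REC_FLAG_VACUUM_MIGRATED    0x2u

/* Flags that are meaningful only in transit and must not hit the disk. */
#define REC_FLAGS_INTERNAL_ONLY (REC_FLAG_VACUUM_MIGRATED)

/* Returns the flags that may be written to the local database copy. */
static uint32_t flags_for_local_store(uint32_t flags)
{
    return flags & ~(uint32_t)REC_FLAGS_INTERNAL_ONLY;
}
```

The store routine can still inspect the unmasked flags to decide between deleting and storing, and only the masked value is persisted.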

13 years agocall: Move definition of call flags down to the definition of the flags field.
Michael Adam [Fri, 10 Dec 2010 13:22:55 +0000 (14:22 +0100)]
call: Move definition of call flags down to the definition of the flags field.

13 years agocall: add new call flag CTDB_CALL_FLAG_VACUUM_MIGRATION
Michael Adam [Fri, 10 Dec 2010 13:24:40 +0000 (14:24 +0100)]
call: add new call flag CTDB_CALL_FLAG_VACUUM_MIGRATION

This is to be used when the CTDB_SRVID_VACUUM_FETCH message
triggers the migration of deleted records to the lmaster.
The lmaster can then delete records that have not been
migrated with data instead of storing them.

13 years agorecoverd: in a recovery, set the MIGRATED_WITH_DATA flag on all records
Michael Adam [Fri, 3 Dec 2010 14:24:06 +0000 (15:24 +0100)]
recoverd: in a recovery, set the MIGRATED_WITH_DATA flag on all records

Those records that are kept after recovery, are non-empty, and
stored identically on all nodes. So this is as if they had been
migrated with data.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

13 years agoserver: when we migrate off a record with data, set the MIGRATED_WITH_DATA flag
Michael Adam [Fri, 3 Dec 2010 14:21:51 +0000 (15:21 +0100)]
server: when we migrate off a record with data, set the MIGRATED_WITH_DATA flag

13 years agovacuum: check lmaster against num_nodes instead of vnn_map->size
Michael Adam [Thu, 3 Feb 2011 11:15:41 +0000 (12:15 +0100)]
vacuum: check lmaster against num_nodes instead of vnn_map->size

When lmaster is bigger than the biggest recorded node number,
then exit the traverse with error.

13 years agovacuum: reduce indentation of the loop sending VACUUM_FETCH controls
Michael Adam [Thu, 3 Feb 2011 16:47:36 +0000 (17:47 +0100)]
vacuum: reduce indentation of the loop sending VACUUM_FETCH controls

This slightly improves the code structure in that loop.

13 years agovacuum: correctly send TRY_DELETE_RECORDS ctrl to all active nodes
Michael Adam [Thu, 3 Feb 2011 11:26:45 +0000 (12:26 +0100)]
vacuum: correctly send TRY_DELETE_RECORDS ctrl to all active nodes

Originally, the control was sent to all nodes in the vnn_map, but
there was something still missing here:
When a node can not become lmaster (via CTDB_CAPABILITY_LMASTER=no)
then it will not be part of the vnn_map. So such a node would
be active but never receive the TRY_DELETE_RECORDS control from a
vacuuming run.

This is fixed in this change by correctly building the list of
active nodes first in the same way that the recovery process does it.
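
The difference between "nodes in the vnn_map" and "active nodes" can be shown with a small sketch. The `node_info` struct and `list_active_nodes` are hypothetical; the point is only that the target list is built from node activity, so an active node that cannot be lmaster is still included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-node information. */
struct node_info {
    uint32_t pnn;         /* node number */
    int active;           /* non-zero if the node is active */
    int lmaster_capable;  /* non-zero if the node may appear in the vnn_map */
};

/*
 * Collect the node numbers of all active nodes into out[], which must
 * have room for num_nodes entries. Returns the number of entries written.
 * lmaster capability is deliberately not consulted.
 */
static size_t list_active_nodes(const struct node_info *nodes,
                                size_t num_nodes, uint32_t *out)
{
    size_t n = 0;

    assert(nodes != NULL && out != NULL);
    for (size_t i = 0; i < num_nodes; i++) {
        if (nodes[i].active) {
            out[n++] = nodes[i].pnn;
        }
    }
    return n;
}
```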

13 years agovacuum: in ctdb_vacuum_db, fix the length of the array of vacuum fetch lists
Michael Adam [Thu, 3 Feb 2011 11:18:58 +0000 (12:18 +0100)]
vacuum: in ctdb_vacuum_db, fix the length of the array of vacuum fetch lists

This patch fixes segfaults in the vacuum child when at least one
node has been stopped or removed from the cluster:

The size of the vnn_map is only the number of active nodes
(that can be lmaster). But the node numbers that are referenced
by the vnn_map spread over all configured nodes.

Since the array of vacuum fetch lists is referenced by the
key's lmaster's node number later on, the array needs to
be of size num_nodes instead of vnn_map->size.
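
The sizing problem is easy to reproduce in miniature: with four configured nodes of which only nodes 0 and 3 are in the vnn_map, the map size is 2 but the largest index used is 3. The sketch below uses hypothetical helpers (`alloc_fetch_counts`, `count_record_for_lmaster`) and a plain counter array in place of the real fetch lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate one counter per configured node. Sizing by the number of
 * entries in the vnn_map would be too small whenever a node is stopped
 * or removed, because the array is indexed by node number.
 */
static int *alloc_fetch_counts(uint32_t num_nodes)
{
    return calloc(num_nodes, sizeof(int));
}

/*
 * Account one record for the given lmaster. Returns -1 if the lmaster
 * is not a valid node number, and 0 otherwise.
 */
static int count_record_for_lmaster(int *counts, uint32_t num_nodes,
                                    uint32_t lmaster)
{
    assert(counts != NULL);
    if (lmaster >= num_nodes) {
        return -1;  /* would index past the end of the array */
    }
    counts[lmaster]++;
    return 0;
}
```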

13 years agoFix typos in a comment in vacuum_traverse.
Michael Adam [Mon, 20 Dec 2010 15:26:50 +0000 (16:26 +0100)]
Fix typos in a comment in vacuum_traverse.

13 years agotests: fix segfault in store test when connection to ctdbd failed.
Michael Adam [Tue, 21 Dec 2010 16:18:03 +0000 (17:18 +0100)]
tests: fix segfault in store test when connection to ctdbd failed.

13 years agotests: fix segfault in fetch_one test when connection to ctdbd fails
Michael Adam [Tue, 21 Dec 2010 16:15:41 +0000 (17:15 +0100)]
tests: fix segfault in fetch_one test when connection to ctdbd fails

13 years agotests: fix segfault in fetch test when connection to ctdb failed.
Michael Adam [Tue, 21 Dec 2010 16:14:33 +0000 (17:14 +0100)]
tests: fix segfault in fetch test when connection to ctdb failed.

13 years agotests: fix segfault in randrec test when connection to daemon fails.
Michael Adam [Tue, 21 Dec 2010 16:11:26 +0000 (17:11 +0100)]
tests: fix segfault in randrec test when connection to daemon fails.

13 years agogitignore: add tags file
Michael Adam [Fri, 3 Dec 2010 14:39:44 +0000 (15:39 +0100)]
gitignore: add tags file

13 years agogitignore: add vi swap files
Michael Adam [Fri, 3 Dec 2010 14:39:26 +0000 (15:39 +0100)]
gitignore: add vi swap files

13 years agoRestart recovery daemon if it looks like it hung.
Ronnie Sahlberg [Thu, 3 Mar 2011 19:55:24 +0000 (06:55 +1100)]
Restart recovery daemon if it looks like it hung.
Don't shut down ctdbd completely; that only makes the problem worse.

13 years agoIf/when the recovery daemon terminates unexpectedly, try to restart it again from...
Ronnie Sahlberg [Tue, 1 Mar 2011 01:09:42 +0000 (12:09 +1100)]
If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main daemon too.

While it does not address the reason for recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust.

13 years agonew version 1.2.23
Ronnie Sahlberg [Thu, 24 Feb 2011 23:46:16 +0000 (10:46 +1100)]
new version 1.2.23

13 years agoATTACH_DB: simplify the code slightly and change the semantics to only
Ronnie Sahlberg [Thu, 24 Feb 2011 23:33:12 +0000 (10:33 +1100)]
ATTACH_DB: simplify the code slightly and change the semantics to only
refuse a db attach during recovery IF we can associate the request from a
genuine real client instead of deciding this on whether client_id is zero or

This will suppress/avoid messages like these :
DB Attach to database %s refused. Can not match clientid...

13 years agoNew version 1.2.22.
Michael Adam [Mon, 21 Feb 2011 04:55:16 +0000 (15:55 +1100)]
New version 1.2.22.

13 years agorecover: finish pending trans3 commits when a recovery is finished.
Michael Adam [Wed, 23 Feb 2011 16:39:57 +0000 (17:39 +0100)]
recover: finish pending trans3 commits when a recovery is finished.

When the end_recovery control is received, pending trans3 commits are
finished. During the recovery, all the actions like persistent_callback
and persistent_store_timeout had been disabled to let the recovery do
its job. After the recovery is completed, send the reply to the waiting
clients.