tdb: Do not return errors from tdb_repack() in the tail of tdb_transaction_commit()
authorAndrew Bartlett <abartlet@samba.org>
Thu, 16 May 2019 04:14:13 +0000 (16:14 +1200)
committerJeremy Allison <jra@samba.org>
Fri, 17 May 2019 06:48:10 +0000 (06:48 +0000)
The call to tdb_repack() inside tdb_transaction_commit()
is an optimization, not part of the transaction itself,
so failing due to lock or other errors isn't a fatal error
that should cause the caller to think the transaction was
a failure by returning -1.

The tdb transaction itself has finished and been committed
onto stable storage via fsync and all locks released at the
point tdb_repack() is called.

tdb_repack() is only called here as it's a convenient point
to attempt to reduce tdb fragmentation without having to add
a timer call to repack in all users of tdb.

This causes lock ordering issues in Samba, showing up as:

ldb: ltdb: tdb(../private/sam.ldb.d/DC=SAMBA2008R2,DC=EXAMPLE,DC=COM.ldb): tdb_transaction_prepare_commit: failed to upgrade hash locks: Locking error

This is because Samba has multiple tdb databases open, and the lock order between them
is important.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13952

Signed-off-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
lib/tdb/common/transaction.c

index 73d02b6..b67f84f 100644 (file)
@@ -1209,7 +1209,29 @@ _PUBLIC_ int tdb_transaction_commit(struct tdb_context *tdb)
        _tdb_transaction_cancel(tdb);
 
        if (need_repack) {
-               return tdb_repack(tdb);
+               int ret = tdb_repack(tdb);
+               if (ret != 0) {
+                       TDB_LOG((tdb, TDB_DEBUG_FATAL,
+                                __location__ " Failed to repack database (not fatal)\n"));
+               }
+               /*
+                * Ignore the error.
+                *
+                * Why?
+                *
+                * We just committed to the DB above, so anything
+                * written during the transaction is committed, the
+                * caller needs to know that the long-term state was
+                * successfully modified.
+                *
+                * tdb_repack is an optimization that can fail for
+                * reasons like lock ordering and we cannot recover
+                * the transaction lock at this point, having released
+                * it above.
+                *
+                * If we return a failure the caller thinks the
+                * transaction was rolled back.
+                */
        }
 
        return 0;