btrfs: fix possible free space tree corruption with online conversion
author: Josef Bacik <josef@toxicpanda.com>
Fri, 15 Jan 2021 21:26:17 +0000 (16:26 -0500)
committer: David Sterba <dsterba@suse.com>
Mon, 25 Jan 2021 17:44:37 +0000 (18:44 +0100)

While running btrfs/011 in a loop I would often ASSERT() while trying to
add a new free space entry that already existed, or get an EEXIST while
adding a new block to the extent tree, which is another indication of
double allocation.

This occurs because when we do the free space tree population, we create
the new root and then populate the tree and commit the transaction.
The problem is when you create a new root, the root node and commit root
node are the same.  During this initial transaction commit we will run
all of the delayed refs that were paused during the free space tree
generation, and thus begin to cache block groups.  While caching block
groups, the caching thread reads from the main root of the free space
tree, so any allocations we make change the tree underneath it.  That
can cause us to add the same range twice, which results in either the
ASSERT(ret != -EEXIST) in __btrfs_add_free_space, or in a variety of
different errors when running delayed refs because of a double
allocation.

Fix this by marking the fs_info as unsafe for loading the free space
tree and falling back on the old slow method.  We could be smarter than this,
for example caching the block group while we're populating the free
space tree, but since this is a serious problem I've opted for the
simplest solution.

CC: stable@vger.kernel.org # 4.9+
Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
fs/btrfs/block-group.c
fs/btrfs/ctree.h
fs/btrfs/free-space-tree.c

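diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 0886e81e554020daf145ba6a0b0044e6aeb5fc54..48ebc106a606c73582ef009ca6c9bcb19ef5a50b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c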
@@ -673,7 +673,15 @@ static noinline void caching_thread(struct btrfs_work *work)
                wake_up(&caching_ctl->wait);
        }
 
-       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
+       /*
+        * If we are in the transaction that populated the free space tree we
+        * can't actually cache from the free space tree as our commit root and
+        * real root are the same, so we could change the contents of the blocks
+        * while caching.  Instead do the slow caching in this case, and after
+        * the transaction has committed we will be safe.
+        */
+       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
+           !(test_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags)))
                ret = load_free_space_tree(caching_ctl);
        else
                ret = load_extent_tree_free(caching_ctl);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0225c5208f44c83d20b928b5ff4636dbb236c590..47ca8edafb5e6e1b95a8064d79f597aca6dd8e80 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -564,6 +564,9 @@ enum {
 
        /* Indicate that we need to cleanup space cache v1 */
        BTRFS_FS_CLEANUP_SPACE_CACHE_V1,
+
+       /* Indicate that we can't trust the free space tree for caching yet */
+       BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
 };
 
 /*
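diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index e33a65bd9a0c281a83bc824dcd5160209583c3f2..a33bca94d133ecb6022df27da5e2655f58c93fc3 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c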
@@ -1150,6 +1150,7 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
                return PTR_ERR(trans);
 
        set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+       set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
        free_space_root = btrfs_create_tree(trans,
                                            BTRFS_FREE_SPACE_TREE_OBJECTID);
        if (IS_ERR(free_space_root)) {
@@ -1171,11 +1172,18 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
        btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE);
        btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID);
        clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+       ret = btrfs_commit_transaction(trans);
 
-       return btrfs_commit_transaction(trans);
+       /*
+        * Now that we've committed the transaction any reading of our commit
+        * root will be safe, so we can cache from the free space tree now.
+        */
+       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
+       return ret;
 
 abort:
        clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
        btrfs_abort_transaction(trans, ret);
        btrfs_end_transaction(trans);
        return ret;
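
The flag lifecycle introduced by this patch can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_fs, enum cache_source, pick_cache_source() and model_create_free_space_tree() are hypothetical names invented for illustration, and plain booleans stand in for the FREE_SPACE_TREE compat_ro bit and the BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED flag. It assumes a caching thread makes its choice while the populating transaction is still committing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are not kernel types or APIs. */
enum cache_source { FROM_EXTENT_TREE, FROM_FREE_SPACE_TREE };

struct model_fs {
	bool has_free_space_tree;  /* stands in for the FREE_SPACE_TREE compat_ro bit */
	bool fst_untrusted;        /* stands in for BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED */
	enum cache_source picked_during_commit; /* choice made mid-commit */
};

/* Mirrors the patched condition in caching_thread(). */
static enum cache_source pick_cache_source(const struct model_fs *fs)
{
	if (fs->has_free_space_tree && !fs->fst_untrusted)
		return FROM_FREE_SPACE_TREE;
	return FROM_EXTENT_TREE;
}

/*
 * Mirrors the flag lifecycle in btrfs_create_free_space_tree().
 * commit_ret is a stand-in for the result of the transaction commit.
 */
static int model_create_free_space_tree(struct model_fs *fs, int commit_ret)
{
	/* Set before the tree exists, so no reader can trust it early. */
	fs->fst_untrusted = true;
	fs->has_free_space_tree = true;

	/* A caching thread that runs during the commit sees the flag set. */
	fs->picked_during_commit = pick_cache_source(fs);

	/* After the commit the commit root is stable, so trust the tree. */
	fs->fst_untrusted = false;
	return commit_ret;
}
```

In this model, a caller that starts from a zeroed struct model_fs and calls model_create_free_space_tree() would see the slow extent-tree path chosen during the commit and the free space tree chosen on any later call to pick_cache_source(). The model clears the flag whether or not the commit succeeds, matching the unconditional clear_bit() after btrfs_commit_transaction() in the patch.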