sfrench/cifs-2.6.git
6 years ago btrfs: remove unused setup_root_args()
Misono, Tomohiro [Thu, 14 Dec 2017 08:25:54 +0000 (17:25 +0900)]
btrfs: remove unused setup_root_args()

Since setup_root_args() is not used anymore, just remove it.

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: split parse_early_options() in two
Misono, Tomohiro [Thu, 14 Dec 2017 08:25:28 +0000 (17:25 +0900)]
btrfs: split parse_early_options() in two

Now parse_early_options() is used by both btrfs_mount() and
btrfs_mount_root(). However, the former only needs the subvol related
part and the latter needs everything else.

Therefore extract the subvol related parts from parse_early_options() and
move them to a new parse function (parse_subvol_options()).
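
The split described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct subvol_opts` and `parse_subvol_opts()` are hypothetical names, and the kernel's own option parser works differently. The point is that the subvolume parser picks out `subvol=` and `subvolid=` and skips everything else.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the subvolume-related options only. */
struct subvol_opts {
	char name[256];               /* value of "subvol=", empty if absent */
	unsigned long long objectid;  /* value of "subvolid=", 0 if absent */
};

/*
 * Scan a comma-separated option string and pick out only "subvol=" and
 * "subvolid=". Every other option is skipped; a separate parser would
 * handle those. Returns 0 on success, -1 on a malformed or oversized value.
 */
int parse_subvol_opts(const char *options, struct subvol_opts *out)
{
	const char *p = options;

	out->name[0] = '\0';
	out->objectid = 0;

	while (p && *p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len > 9 && strncmp(p, "subvolid=", 9) == 0) {
			char buf[32];
			char *stop;
			size_t vlen = len - 9;

			if (vlen >= sizeof(buf))
				return -1;
			memcpy(buf, p + 9, vlen);
			buf[vlen] = '\0';
			out->objectid = strtoull(buf, &stop, 10);
			if (*stop != '\0')
				return -1;
		} else if (len > 7 && strncmp(p, "subvol=", 7) == 0) {
			size_t vlen = len - 7;

			if (vlen >= sizeof(out->name))
				return -1;
			memcpy(out->name, p + 7, vlen);
			out->name[vlen] = '\0';
		}
		/* Advance to the next option, or stop at the end. */
		p = end ? end + 1 : NULL;
	}
	return 0;
}
```

For example, calling `parse_subvol_opts("noatime,subvolid=256", &o)` would leave `o.objectid` at 256 and `o.name` empty, while `noatime` is left for the other parser.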

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: cleanup btrfs_mount() using btrfs_mount_root()
Misono, Tomohiro [Thu, 14 Dec 2017 08:25:01 +0000 (17:25 +0900)]
btrfs: cleanup btrfs_mount() using btrfs_mount_root()

Cleanup btrfs_mount() by using btrfs_mount_root(). This avoids getting
btrfs_mount() called twice in mount path.

Old btrfs_mount() will do:
0. VFS layer calls vfs_kern_mount() with registered file_system_type
   (for btrfs, btrfs_fs_type). btrfs_mount() is called on the way.
1. btrfs_parse_early_options() parses the "subvolid=" mount option and sets
   the value to subvol_objectid. Otherwise, subvol_objectid has the initial
   value of 0
2. check whether subvol_objectid is 5 or not. Assume this time the id is
   not 5, then btrfs_mount() returns by calling mount_subvol()
3. In mount_subvol(), the original mount options are modified to contain
   "subvolid=0" in setup_root_args(). Then, vfs_kern_mount() is called with
   btrfs_fs_type and the new options
4. btrfs_mount() is called again
5. btrfs_parse_early_options() parses "subvolid=0" and sets 5 (instead of 0)
   to subvol_objectid
6. check whether subvol_objectid is 5 or not. This time the id is 5 and
   mount_subvol() is not called. btrfs_mount() finishes mounting a root
7. (in mount_subvol()) using the return value of vfs_kern_mount(), it
   calls mount_subtree()
8. return the subvolume's dentry

Reusing the same file_system_type (and btrfs_mount()) for vfs_kern_mount()
is the cause of the complication.

Instead, new btrfs_mount() will do:
1. parse subvol id related options for later use in mount_subvol()
2. mount device's root by calling vfs_kern_mount() with
   btrfs_root_fs_type, which is not registered to VFS by
   register_filesystem(). As a result, btrfs_mount_root() is called
3. return by calling mount_subvol()

The code of 2. is moved from the first part of mount_subvol().

The semantics of device holder changes from btrfs_fs_type to
btrfs_root_fs_type and has to be used in all contexts. Otherwise we'd
get wrong results when mount and dev scan would not check the same
thing. (this has been found independently and the fix is folded into this
patch)

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ fold the btrfs_control_ioctl fixup, extend the comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: add btrfs_mount_root() and new file_system_type
Misono, Tomohiro [Thu, 14 Dec 2017 08:24:30 +0000 (17:24 +0900)]
btrfs: add btrfs_mount_root() and new file_system_type

Add btrfs_mount_root() and new file_system_type for preparation of cleanup
of btrfs_mount(). Code path is not changed yet.

btrfs_mount_root() is almost the same as the current btrfs_mount(), but
doesn't have the subvolume related part.

Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: unify extent_page_data type passed as void
David Sterba [Thu, 30 Nov 2017 17:00:02 +0000 (18:00 +0100)]
btrfs: unify extent_page_data type passed as void

Functions called from extent_write_cache_pages used void* as generic
callback data, but all of them convert it to extent_page_data, or use it
directly.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink writepage parameter to extent_write_cache_pages
David Sterba [Fri, 23 Jun 2017 02:30:28 +0000 (04:30 +0200)]
btrfs: sink writepage parameter to extent_write_cache_pages

The function extent_write_cache_pages is modelled after
write_cache_pages which is a generic interface and the writepage
parameter makes sense there. In btrfs we know exactly which callback
we're going to use, so we can pass it directly.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink flush_fn to extent_write_cache_pages
David Sterba [Fri, 23 Jun 2017 02:30:28 +0000 (04:30 +0200)]
btrfs: sink flush_fn to extent_write_cache_pages

All callers pass the same value flush_write_bio.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: merge two flush_write_bio helpers
David Sterba [Fri, 23 Jun 2017 02:16:17 +0000 (04:16 +0200)]
btrfs: merge two flush_write_bio helpers

flush_epd_write_bio is the same as flush_write_bio, no point having two such
functions. Merge them to flush_write_bio. The 'noinline' attribute is
removed as it does not have any meaning.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: Rename bin_search -> btrfs_bin_search
Nikolay Borisov [Fri, 8 Dec 2017 14:27:43 +0000 (16:27 +0200)]
btrfs: Rename bin_search -> btrfs_bin_search

Currently there are 2 functions doing binary search on btrfs nodes:
bin_search and btrfs_bin_search, the latter being a simple wrapper for
the former. So eliminate the wrapper and just rename bin_search to
btrfs_bin_search. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink extent_write_full_page tree argument
Nikolay Borisov [Fri, 8 Dec 2017 13:55:59 +0000 (15:55 +0200)]
btrfs: sink extent_write_full_page tree argument

The tree argument passed to extent_write_full_page is referenced from
the page being passed to the same function. Since we already have
enough information to get the reference, remove the function parameter.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink extent_write_locked_range tree parameter
Nikolay Borisov [Fri, 8 Dec 2017 13:55:58 +0000 (15:55 +0200)]
btrfs: sink extent_write_locked_range tree parameter

This function is called only from submit_compressed_extents and the
io tree being passed is always that of the inode. But we are also
passing the inode, so just move getting the io tree pointer in
extent_write_locked_range to simplify the signature.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: Remove pair of bio_get/put in btrfs_schedule_bio
Nikolay Borisov [Mon, 11 Dec 2017 14:38:48 +0000 (16:38 +0200)]
btrfs: Remove pair of bio_get/put in btrfs_schedule_bio

This code was added in 492bb6deee34 ("Btrfs: Hold a reference on bios
during submit_bio, add some extra bio checks"). However, holding a
reference on a bio is necessary only if it's going to be referenced
after the submit_bio returns and the bio is completed. In this
particular instance this is not the case so there is no need to hold
an extra reference since we directly return.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: Fix out of bounds access in btrfs_search_slot
Nikolay Borisov [Tue, 12 Dec 2017 09:14:49 +0000 (11:14 +0200)]
btrfs: Fix out of bounds access in btrfs_search_slot

When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then
the level variable is going to be 7 (this is the max height of the
tree). On the other hand btrfs_cow_block is always called with
"level + 1" as an index into the nodes and slots arrays. This leads to
an out of bounds access. Admittedly this will be benign since an OOB
access of the nodes array will likely read the 0th element from the
slots array, which in this case is going to be 0 (since we start CoW at
the top of the tree). The OOB access into the slots array in turn will
read the 0th and 1st values of the locks array, which would both be 0
at the time. However, this benign behavior relies on the fact that the
path being passed hasn't been initialised; if it has already been used to
query a btree then it could potentially have populated the nodes/slots arrays.

Fix it by explicitly checking if we are at level 7 (the maximum allowed
index in nodes/slots arrays) and explicitly call the CoW routine with
NULL for parent's node/slot.
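
The bounds check described above can be illustrated with a small userspace sketch. The names `MAX_LEVEL`, `struct path` and `parent_of()` are hypothetical stand-ins, not the kernel's definitions; the only point shown is that `level + 1` must not be used as an index when `level` is already the last valid one.

```c
#include <stddef.h>

#define MAX_LEVEL 8   /* stand-in for BTRFS_MAX_LEVEL: valid indices are 0..7 */

/* Hypothetical, simplified stand-in for a btree search path. */
struct path {
	void *nodes[MAX_LEVEL];
	int slots[MAX_LEVEL];
};

/*
 * Look up the parent node and slot for 'level' without ever indexing
 * past the end of the arrays. At the top level there is no parent, so
 * report NULL and slot 0 instead of reading nodes[level + 1].
 */
void parent_of(const struct path *p, int level, void **parent, int *parent_slot)
{
	if (level + 1 < MAX_LEVEL) {
		*parent = p->nodes[level + 1];
		*parent_slot = p->slots[level + 1];
	} else {
		*parent = NULL;
		*parent_slot = 0;
	}
}
```

With this shape, a caller at level 7 gets a NULL parent regardless of what a previously used path left in neighbouring memory.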

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Fixes-coverity-id: 711515
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: remove duplicate includes
Pravin Shedge [Wed, 6 Dec 2017 16:44:31 +0000 (22:14 +0530)]
btrfs: remove duplicate includes

These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker
Nikolay Borisov [Tue, 5 Dec 2017 07:29:19 +0000 (09:29 +0200)]
btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker

This function was introduced by 247e743cbe6e ("Btrfs: Use async helpers
to deal with pages that have been improperly dirtied") and it didn't do
any error handling then. This function might very well fail in an ENOMEM
situation, yet the failure is not handled, which could lead to an
inconsistent state. So let's handle the failure by setting the mapping
error bit.

Cc: stable@vger.kernel.org
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: put btrfs_ioctl_vol_args_v2 related defines together
Anand Jain [Wed, 6 Dec 2017 03:40:10 +0000 (11:40 +0800)]
btrfs: put btrfs_ioctl_vol_args_v2 related defines together

Just a code spatial rearrangement, no functional change.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: show options: use helper to convert compression type string
David Sterba [Tue, 31 Oct 2017 17:06:34 +0000 (18:06 +0100)]
btrfs: show options: use helper to convert compression type string

Use the helper. If the COMPRESS option is set, the result is always
defined and not empty.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: prop: use common helper for type to string conversion
David Sterba [Tue, 31 Oct 2017 16:55:14 +0000 (17:55 +0100)]
btrfs: prop: use common helper for type to string conversion

Use the helper for conversion, keep the semantics.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: SETFLAGS ioctl: use helper for compression type conversion
David Sterba [Tue, 31 Oct 2017 16:32:41 +0000 (17:32 +0100)]
btrfs: SETFLAGS ioctl: use helper for compression type conversion

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: compression: add helper for type to string conversion
David Sterba [Tue, 31 Oct 2017 16:24:26 +0000 (17:24 +0100)]
btrfs: compression: add helper for type to string conversion

There are several places open-coding this conversion, add a helper now
that we have 3 compression algorithms.
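
A helper of this kind might look roughly like the userspace sketch below. The enum and the function name `compress_type2str()` are illustrative assumptions, not the kernel's definitions; the three algorithm names (zlib, lzo, zstd) are the ones btrfs supports.

```c
#include <stddef.h>

/* Hypothetical userspace mirror of the compression type values. */
enum compress_type {
	COMPRESS_NONE = 0,
	COMPRESS_ZLIB = 1,
	COMPRESS_LZO  = 2,
	COMPRESS_ZSTD = 3,
};

/* Return the algorithm name, or NULL for "none" and unknown values. */
const char *compress_type2str(enum compress_type type)
{
	switch (type) {
	case COMPRESS_ZLIB:
		return "zlib";
	case COMPRESS_LZO:
		return "lzo";
	case COMPRESS_ZSTD:
		return "zstd";
	case COMPRESS_NONE:
		break;
	}
	return NULL;
}
```

Keeping the mapping in one switch means a new algorithm only has to be added in a single place, which is the motivation for replacing the open-coded conversions.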

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: remove redundant check in btrfs_get_extent_fiemap
Nikolay Borisov [Fri, 1 Dec 2017 09:19:43 +0000 (11:19 +0200)]
btrfs: remove redundant check in btrfs_get_extent_fiemap

Before returning hole_em in btrfs_get_extent_fiemap we check if it's different
from NULL. However, by the time this NULL check is reached we already know
hole_em is not NULL because it points to the em we found and it
has already been dereferenced.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: Remove unused variable in btrfs_get_extent
Nikolay Borisov [Fri, 1 Dec 2017 09:19:40 +0000 (11:19 +0200)]
btrfs: Remove unused variable in btrfs_get_extent

trans was statically assigned to NULL and this never changed over the
course of btrfs_get_extent. So remove any code which checks whether
trans != NULL and just hardcode the fact trans is always NULL.

Resolves-coverity-id: 112806
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: tree-checker: use %zu format string for size_t
Arnd Bergmann [Wed, 6 Dec 2017 14:18:14 +0000 (15:18 +0100)]
btrfs: tree-checker: use %zu format string for size_t

The return value of sizeof() is of type size_t, so we must print it
using the %z format modifier rather than %l to avoid this warning
on some architectures:

fs/btrfs/tree-checker.c: In function 'check_dir_item':
fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} [-Werror=format=]
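
As a minimal userspace illustration of the rule (the function name `format_size()` is made up for this sketch), a `sizeof` result is formatted with `%zu` so that the code is correct on both 32-bit and 64-bit targets:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a sizeof() result portably. sizeof yields size_t, whose width
 * differs between 32-bit and 64-bit targets, so %lu is only correct on
 * some of them, while %zu always matches size_t.
 */
int format_size(char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "%zu", sizeof(uint64_t));
}
```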

Fixes: 005887f2e3e0 ("btrfs: tree-checker: Add checker for dir item")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago Btrfs: use struct completion in scrub_submit_raid56_bio_wait
Liu Bo [Fri, 1 Dec 2017 00:26:39 +0000 (17:26 -0700)]
Btrfs: use struct completion in scrub_submit_raid56_bio_wait

Use struct completion directly and remove 'struct scrub_bio_ret' along
with the code using it.

The struct is used to get the return value from the bio, but the caller
can access the bio directly to get it and holds a reference on the bio,
so it won't go away underneath us. The struct can therefore be removed
safely.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago Btrfs: remove unused variable wait in lock_stripe_add
Liu Bo [Tue, 5 Dec 2017 01:09:42 +0000 (18:09 -0700)]
Btrfs: remove unused variable wait in lock_stripe_add

The defined wait is not used anywhere.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago Btrfs: compress_file_range() change page dirty status once
Timofey Titovets [Mon, 23 Oct 2017 22:29:48 +0000 (01:29 +0300)]
Btrfs: compress_file_range() change page dirty status once

We need to call extent_range_clear_dirty_for_io()
on the compression range to prevent applications from changing
page content while the pages are being compressed.

extent_range_clear_dirty_for_io() runs on each loop iteration, and
"(end - start)" can be much (up to 1024 times) bigger
than the compression range (BTRFS_MAX_UNCOMPRESSED).

The start pointer is advanced each time we manage to compress part of
the range. The end pointer does not change so we could redirty the
remaining parts repeatedly.

Fix that behaviour by calling extent_range_clear_dirty_for_io()
only once, the first time it happens.

This is the safest but probably not the best behaviour. Previous
iterations of the patch tried to redirty only the range that we were not
able to compress. This has been refused by David for safety reasons, the
writeout callchain is complex and there could be some path that relies
on redirtying the entire unwritten range.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog, the history and safety concerns, add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago Btrfs: compression heuristic: replace heap sort with radix sort
Timofey Titovets [Sun, 3 Dec 2017 21:30:33 +0000 (00:30 +0300)]
Btrfs: compression heuristic: replace heap sort with radix sort

The slowest part of the heuristic for now is the kernel heap sort().
It can take up to 55% of the runtime on sorting bucket items.

As sorting is always called on most data sets to get a correct
byte_core_set_size, the only way to speed up the heuristic is to
speed up the sort on the bucket.

Add a general radix_sort function.
Radix sort requires 2 buffers, one the full size of the input array
and one for storing counters (jump addresses).

That increases the usage per heuristic workspace by 1KiB:
8KiB + 1KiB -> 8KiB + 2KiB

This is an LSD radix sort; I use 4 bits as the base for calculating,
to keep the counters array acceptably small (16 elements * 8 bytes).

This radix sort implementation has several points to adjust.
I added it in a way that makes radix sort generally usable in the kernel,
like heap sort, if needed.
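
The technique described above, an LSD radix sort with a 4-bit base, a full-size scratch buffer and a 16-entry counters array, can be sketched in userspace as follows. This is not the code from the patch: it sorts plain `uint32_t` values in ascending order, and the name `radix_sort_u32()` is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RADIX_BITS 4
#define RADIX_SIZE (1u << RADIX_BITS)   /* 16 counters */
#define RADIX_MASK (RADIX_SIZE - 1)

/*
 * Sort 'arr' in ascending order. 'tmp' is a scratch buffer with room for
 * 'n' elements. Each pass is a stable counting sort on one 4-bit digit,
 * starting from the least significant digit.
 */
void radix_sort_u32(uint32_t *arr, uint32_t *tmp, size_t n)
{
	uint32_t *src = arr, *dst = tmp, *swap;
	unsigned int shift;

	for (shift = 0; shift < 32; shift += RADIX_BITS) {
		size_t counters[RADIX_SIZE] = { 0 };
		size_t i, pos = 0;

		/* Histogram of the current digit. */
		for (i = 0; i < n; i++)
			counters[(src[i] >> shift) & RADIX_MASK]++;

		/* Turn counts into start offsets (exclusive prefix sum). */
		for (i = 0; i < RADIX_SIZE; i++) {
			size_t count = counters[i];

			counters[i] = pos;
			pos += count;
		}

		/* Scatter in input order, which keeps the pass stable. */
		for (i = 0; i < n; i++)
			dst[counters[(src[i] >> shift) & RADIX_MASK]++] = src[i];

		/* Swap buffer roles for the next digit. */
		swap = src;
		src = dst;
		dst = swap;
	}

	/* Eight passes leave the result in 'arr'; copy back only if not. */
	if (src != arr)
		memcpy(arr, src, n * sizeof(*arr));
}
```

A smaller base keeps the counters array tiny at the cost of more passes; with 4 bits, a 32-bit key needs eight passes over the data.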

Performance tested in userspace copy of heuristic code,
throughput:
    - average <-> random data: ~3500 MiB/s - heap  sort
    - average <-> random data: ~6000 MiB/s - radix sort

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
[ coding style fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: cleanup device states define BTRFS_DEV_STATE_FLUSH_SENT
Anand Jain [Mon, 4 Dec 2017 04:54:56 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_FLUSH_SENT

Currently device state is being managed by each individual int
variable such as struct btrfs_device::is_tgtdev_for_dev_replace.
Instead of that declare btrfs_device::dev_state
BTRFS_DEV_STATE_FLUSH_SENT and use the bit operations.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: cleanup device states define BTRFS_DEV_STATE_REPLACE_TGT
Anand Jain [Mon, 4 Dec 2017 04:54:55 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_REPLACE_TGT

Currently device state is being managed by each individual int
variable such as struct btrfs_device::is_tgtdev_for_dev_replace.
Instead of that declare btrfs_device::dev_state
BTRFS_DEV_STATE_REPLACE_TGT and use the bit operations.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: cleanup device states define BTRFS_DEV_STATE_MISSING
Anand Jain [Mon, 4 Dec 2017 04:54:54 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_MISSING

Currently device state is being managed by each individual int
variable such as struct btrfs_device::missing. Instead of that
declare btrfs_device::dev_state BTRFS_DEV_STATE_MISSING and use
the bit operations.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA
Anand Jain [Mon, 4 Dec 2017 04:54:53 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA

Currently device state is being managed by each individual int
variable such as struct btrfs_device::in_fs_metadata. Instead of
that declare device state BTRFS_DEV_STATE_IN_FS_METADATA and use
the bit operations.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: cleanup device states define BTRFS_DEV_STATE_WRITEABLE
Anand Jain [Mon, 4 Dec 2017 04:54:52 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_WRITEABLE

Currently device state is being managed by each individual int
variable such as struct btrfs_device::writeable. Instead of that
declare device state BTRFS_DEV_STATE_WRITEABLE and use the
bit operations.
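
The conversion from one int per state to bits in a single dev_state word can be sketched in userspace like this. The struct, the enum and the helper names are hypothetical, and the helpers are plain non-atomic operations, whereas kernel code would normally use the atomic set_bit(), clear_bit() and test_bit() helpers.

```c
#include <stdbool.h>

/* Illustrative bit numbers, named after the states in this series. */
enum dev_state_bit {
	DEV_STATE_WRITEABLE,
	DEV_STATE_IN_FS_METADATA,
	DEV_STATE_MISSING,
	DEV_STATE_REPLACE_TGT,
	DEV_STATE_FLUSH_SENT,
};

/* Hypothetical device: one word of state bits replaces one int per state. */
struct device {
	unsigned long dev_state;
};

/* Non-atomic stand-ins for the kernel's set_bit/clear_bit/test_bit. */
void dev_set(struct device *d, enum dev_state_bit bit)
{
	d->dev_state |= 1UL << bit;
}

void dev_clear(struct device *d, enum dev_state_bit bit)
{
	d->dev_state &= ~(1UL << bit);
}

bool dev_test(const struct device *d, enum dev_state_bit bit)
{
	return (d->dev_state >> bit) & 1UL;
}
```

Besides saving space, a single state word makes it possible to add further states without growing the structure.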

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: add helper for device path or missing
Anand Jain [Tue, 28 Nov 2017 02:43:10 +0000 (10:43 +0800)]
btrfs: add helper for device path or missing

This patch creates a helper function to get either the rcu device path
or missing.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ rename to btrfs_dev_name, switch to if/else ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: drop btrfs_device::can_discard to query directly
Anand Jain [Wed, 29 Nov 2017 10:53:43 +0000 (18:53 +0800)]
btrfs: drop btrfs_device::can_discard to query directly

We can query the bdev directly when needed at btrfs_discard_extent()
so drop btrfs_device::can_discard.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: make function update_share_count static
Colin Ian King [Thu, 30 Nov 2017 12:14:47 +0000 (12:14 +0000)]
btrfs: make function update_share_count static

The function update_share_count is local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
fs/btrfs/backref.c:219:6: warning: symbol 'update_share_count' was not
declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: Remove redundant FLAG_VACANCY
Nikolay Borisov [Thu, 23 Nov 2017 08:51:43 +0000 (10:51 +0200)]
btrfs: Remove redundant FLAG_VACANCY

Commit 9036c10208e1 ("Btrfs: update hole handling v2") added the
FLAG_VACANCY to denote holes, however there was already a consistent way
of flagging extents which represent holes: ->block_start =
EXTENT_MAP_HOLE. Also, the only place where this flag is checked is
in the fiemap code, but the block_start value is also checked and every
other place in the filesystem detects holes by using the block_start
value. So remove the extra flag. This survived a full xfstests run.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: extent-tree: Make btrfs_inode_rsv_refill function static
Qu Wenruo [Fri, 17 Nov 2017 07:14:19 +0000 (15:14 +0800)]
btrfs: extent-tree: Make btrfs_inode_rsv_refill function static

This function is no longer used outside of extent-tree.c.
Make it static.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: move some zstd work data from stack to workspace
David Sterba [Wed, 15 Nov 2017 17:27:39 +0000 (18:27 +0100)]
btrfs: move some zstd work data from stack to workspace

* ZSTD_inBuffer in_buf
* ZSTD_outBuffer out_buf

are used in all functions to pass the compression parameters and the
local variables consume some space. We can move them to the workspace
and reduce the stack consumption:

zstd.c:zstd_decompress                        -24 (136 -> 112)
zstd.c:zstd_decompress_bio                    -24 (144 -> 120)
zstd.c:zstd_compress_pages                    -24 (264 -> 240)

Signed-off-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: reorder btrfs_transaction members for better packing
David Sterba [Wed, 8 Nov 2017 00:54:33 +0000 (01:54 +0100)]
btrfs: reorder btrfs_transaction members for better packing

There are now 20 bytes of holes, we can reduce that to 4 by minor
changes. Moving 'aborted' to the status and flags is also more logical,
similar for num_dirty_bgs. The size goes from 432 to 416.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: use narrower type for btrfs_transaction::num_dirty_bgs
David Sterba [Wed, 8 Nov 2017 01:12:57 +0000 (02:12 +0100)]
btrfs: use narrower type for btrfs_transaction::num_dirty_bgs

A u64 is overkill here, we could not possibly create that many
blockgroups in one transaction.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: reorder btrfs_trans_handle members for better packing
David Sterba [Wed, 8 Nov 2017 00:54:33 +0000 (01:54 +0100)]
btrfs: reorder btrfs_trans_handle members for better packing

Recent updates to the structure left some holes, reorder the types so
the packing is tight. The size goes from 112 to 104 on 64bit.
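
How member order creates or removes holes can be shown with two made-up structures (these are not the btrfs ones, and the helper `saved_bytes()` exists only for the example). On common 64-bit ABIs where `uint64_t` is 8-byte aligned, the first layout occupies 32 bytes and the second 24.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: small members between 8-byte ones. */
struct handle_loose {
	uint64_t a;
	uint8_t  flag1;   /* padding follows so that 'b' is aligned */
	uint64_t b;
	uint8_t  flag2;   /* tail padding follows */
};

/* Same members, with the small ones grouped together at the end. */
struct handle_packed {
	uint64_t a;
	uint64_t b;
	uint8_t  flag1;
	uint8_t  flag2;
};

/* Bytes saved by the reordering on the current ABI. */
size_t saved_bytes(void)
{
	return sizeof(struct handle_loose) - sizeof(struct handle_packed);
}
```

The compiler is not allowed to reorder struct members in C, so this kind of saving only comes from changing the declaration order by hand.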

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: switch to refcount_t type for btrfs_trans_handle::use_count
David Sterba [Wed, 8 Nov 2017 00:39:58 +0000 (01:39 +0100)]
btrfs: switch to refcount_t type for btrfs_trans_handle::use_count

The use_count is a reference counter, we can use the refcount_t type,
though we don't use the atomicity. This is not performance critical
code and we could catch the underflows. The type is changed from long,
but the number of references will fit an int.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: remove unused member of btrfs_trans_handle
David Sterba [Wed, 8 Nov 2017 00:32:48 +0000 (01:32 +0100)]
btrfs: remove unused member of btrfs_trans_handle

Last user was removed in a monster commit a22285a6a32390195235171
("Btrfs: Integrate metadata reservation with start_transaction") in
2010.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: switch btrfs_trans_handle::adding_csums to bool
David Sterba [Wed, 8 Nov 2017 00:07:43 +0000 (01:07 +0100)]
btrfs: switch btrfs_trans_handle::adding_csums to bool

The semantics of adding_csums matches bool, 'short' was most likely used
to save space in a698d0755adb6f2 ("Btrfs: add a type field for the
transaction handle").

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: remove dead code from btrfs_get_extent
Edmund Nadolski [Mon, 20 Nov 2017 20:24:49 +0000 (13:24 -0700)]
btrfs: remove dead code from btrfs_get_extent

Due to new_inline logic, the create == 0 is always true at this
point in the code, so the create != 0 branch can be removed.

Signed-off-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: btrfs_inode_log_parent should use defined inode_only values.
Edmund Nadolski [Mon, 20 Nov 2017 20:24:47 +0000 (13:24 -0700)]
btrfs: btrfs_inode_log_parent should use defined inode_only values.

Replace hardcoded numeric argument values for inode_only with the
constants defined for that use.

Signed-off-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: switch to on-stack csum buffer in csum_tree_block
David Sterba [Mon, 6 Nov 2017 18:23:00 +0000 (19:23 +0100)]
btrfs: switch to on-stack csum buffer in csum_tree_block

The maximum size of a checksum buffer is known, BTRFS_CSUM_SIZE, and we
don't have to allocate it dynamically. This code path is not used at all
as we have only the crc32c and use an on-stack buffer already.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago Btrfs: set plug for fsync
Liu Bo [Wed, 15 Nov 2017 23:10:28 +0000 (16:10 -0700)]
Btrfs: set plug for fsync

Setting a plug can merge adjacent IOs before dispatching them to the disk
driver.

Without a plug, it would not be a problem for single disk use cases, but
for multiple disks using a raid profile, a large IO can be split into
several IOs of stripe length, and a plug can help bring them together
for each disk so that we can save several disk accesses.

Moreover, fsync issues synchronous writes, so the plug can really take
effect.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: factor __btrfs_open_devices() to create btrfs_open_one_device()
Anand Jain [Thu, 9 Nov 2017 15:45:24 +0000 (23:45 +0800)]
btrfs: factor __btrfs_open_devices() to create btrfs_open_one_device()

No functional changes, create btrfs_open_one_device() from
__btrfs_open_devices(). This is preparatory work to add dynamic
device scan.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ minor whitespace fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: move check for device generation to the last
Anand Jain [Thu, 9 Nov 2017 15:45:25 +0000 (23:45 +0800)]
btrfs: move check for device generation to the last

No functional changes. This helps to move the entire section into
a new function.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: set fs_devices->seed directly
Anand Jain [Thu, 9 Nov 2017 15:45:23 +0000 (23:45 +0800)]
btrfs: set fs_devices->seed directly

This is in preparation for moving a section of code in __btrfs_open_devices()
into a new function so that it can be reused. As we set seeding if any of
the devices has the SB flag BTRFS_SUPER_FLAG_SEEDING, do it in the
device list loop itself. No functional changes.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: ref-verify: Remove unused parameter from walk_up_tree() to kill warning
Geert Uytterhoeven [Wed, 15 Nov 2017 15:04:40 +0000 (16:04 +0100)]
btrfs: ref-verify: Remove unused parameter from walk_up_tree() to kill warning

With gcc-4.1.2:

    fs/btrfs/ref-verify.c: In function ‘btrfs_build_ref_tree’:
    fs/btrfs/ref-verify.c:1017: warning: ‘root’ is used uninitialized in this function

The variable is indeed passed uninitialized, but it is never used by the
callee.  However, not all versions of gcc are smart enough to notice.

Hence remove the unused parameter from walk_up_tree() to silence the
compiler warning.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink get_extent parameter to read_extent_buffer_pages
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to read_extent_buffer_pages

All callers pass btree_get_extent, which needs to be exported.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink get_extent parameter to __do_contiguous_readpages
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to __do_contiguous_readpages

All callers pass btrfs_get_extent.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink get_extent parameter to __extent_readpages
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to __extent_readpages

All callers pass btrfs_get_extent.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink get_extent parameter to extent_readpages
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to extent_readpages

There's only one caller that passes btrfs_get_extent.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink get_extent parameter to get_extent_skip_holes
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to get_extent_skip_holes

All callers pass btrfs_get_extent_fiemap and get_extent_skip_holes
itself is used only as a fiemap helper.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: sink get_extent parameter to extent_fiemap
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to extent_fiemap

All callers pass btrfs_get_extent_fiemap and we don't expect anything
else in the context of extent_fiemap.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years ago btrfs: drop get_extent from extent_page_data
David Sterba [Fri, 23 Jun 2017 02:01:08 +0000 (04:01 +0200)]
btrfs: drop get_extent from extent_page_data

Previous patches cleaned up all places where
extent_page_data::get_extent was set and it was btrfs_get_extent all the
time, so we can simply call that instead.

This also reduces the size of extent_page_data by 8 bytes, which has a
positive effect on stack consumption in various functions on the writeout path.
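
As a rough userspace illustration of the change described above (not the actual kernel code), the sketch below compares a structure that carries a get_extent callback with one that does not. The struct layouts, the stub function and the helper names are simplified assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified callback type; the kernel's get_extent_t takes more arguments. */
typedef int (get_extent_t)(unsigned long start);

/* Stand-in for btrfs_get_extent: returns a dummy value derived from the offset. */
static int btrfs_get_extent_stub(unsigned long start)
{
	return (int)(start & 0xfff);
}

/* Before: every user of the structure carries the callback pointer. */
struct epd_before {
	void *bio;
	void *tree;
	get_extent_t *get_extent;
	unsigned long bio_flags;
	unsigned int extent_locked:1;
	unsigned int sync_io:1;
};

/* After: the callback is gone because it was always the same function. */
struct epd_after {
	void *bio;
	void *tree;
	unsigned long bio_flags;
	unsigned int extent_locked:1;
	unsigned int sync_io:1;
};

/* Before: indirect call through the stored pointer. */
static int lookup_before(struct epd_before *epd, unsigned long start)
{
	return epd->get_extent(start);
}

/* After: direct call, no pointer needed. */
static int lookup_after(struct epd_after *epd, unsigned long start)
{
	(void)epd;
	return btrfs_get_extent_stub(start);
}
```

On a typical 64-bit ABI the two structures differ by the size of one pointer, which matches the 8 bytes mentioned above.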

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: sink get_extent parameter to extent_write_full_page
David Sterba [Fri, 23 Jun 2017 01:47:28 +0000 (03:47 +0200)]
btrfs: sink get_extent parameter to extent_write_full_page

There's only one caller.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: sink get_extent parameter to extent_write_locked_range
David Sterba [Fri, 23 Jun 2017 01:47:28 +0000 (03:47 +0200)]
btrfs: sink get_extent parameter to extent_write_locked_range

There's only one caller.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: sink get_extent parameter to extent_writepages
David Sterba [Fri, 23 Jun 2017 01:46:07 +0000 (03:46 +0200)]
btrfs: sink get_extent parameter to extent_writepages

There's only one caller.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Cleanup existing name_len checks
Qu Wenruo [Wed, 8 Nov 2017 00:54:26 +0000 (08:54 +0800)]
btrfs: Cleanup existing name_len checks

Since tree-checker has verified leaf when reading from disk, we don't
need the existing verify_dir_item() or btrfs_is_name_len_valid() checks.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: tree-checker: Add checker for dir item
Qu Wenruo [Wed, 8 Nov 2017 00:54:25 +0000 (08:54 +0800)]
btrfs: tree-checker: Add checker for dir item

Add checker for dir item, for key types DIR_ITEM, DIR_INDEX and
XATTR_ITEM.

This checker does comprehensive checks for:

1) dir_item header and its data size
   Against item boundary and maximum name/xattr length.
   This part is mostly the same as old verify_dir_item().

2) dir_type
   Against maximum file types, and against key type.
   Since XATTR key should only have FT_XATTR dir item, and normal dir
   item type should not have XATTR key.

   The check between key->type and dir_type is newly introduced by this
   patch.

3) name hash
   For XATTR and DIR_ITEM key, key->offset is name hash (crc32c).
   Check the hash of the name against the key to ensure it's correct.

   The name hash check is only found in btrfs-progs before this patch.
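
The name hash check in 3) can be sketched in userspace C as follows. This is an illustration only, not the tree-checker code: the bitwise crc32c_raw() helper, the dir_item_hash_ok() function and the seed value are assumptions made for this example (the seed is believed to match what btrfs uses, but verify it against the kernel source before relying on it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * No initial or final inversion is applied; the caller supplies the seed.
 */
static uint32_t crc32c_raw(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Assumed seed for the name hash; labeled as an assumption, see the note above. */
#define NAME_HASH_SEED ((uint32_t)~1u)

/* Hash of a directory entry or xattr name. */
static uint64_t name_hash(const char *name, size_t len)
{
	return crc32c_raw(NAME_HASH_SEED, name, len);
}

/* Hypothetical check: the key offset must equal the hash of the stored name. */
static int dir_item_hash_ok(uint64_t key_offset, const char *name, size_t len)
{
	return key_offset == name_hash(name, len);
}
```

A checker built this way would reject an item whose stored name does not hash to its key offset, which is the kind of corruption the message describes.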

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: use GFP_KERNEL in btrfs_alloc_inode
David Sterba [Tue, 31 Oct 2017 16:08:27 +0000 (17:08 +0100)]
btrfs: use GFP_KERNEL in btrfs_alloc_inode

This callback is called directly from VFS, no locks are held at the
allocation time.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: sink gfp parameter to clear_extent_uptodate
David Sterba [Tue, 31 Oct 2017 16:02:39 +0000 (17:02 +0100)]
btrfs: sink gfp parameter to clear_extent_uptodate

There's only one callsite with GFP_NOFS.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: sink gfp parameter to clear_extent_bit
David Sterba [Tue, 31 Oct 2017 15:37:52 +0000 (16:37 +0100)]
btrfs: sink gfp parameter to clear_extent_bit

All callers use GFP_NOFS, we don't have to pass it as an argument. The
built-in tests pass GFP_KERNEL, but they run only at module load time
and NOFS works there as well.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: prepare to drop gfp mask parameter from clear_extent_bit
David Sterba [Tue, 31 Oct 2017 15:30:47 +0000 (16:30 +0100)]
btrfs: prepare to drop gfp mask parameter from clear_extent_bit

Use __clear_extent_bit directly in case we want to pass unknown
gfp flags. Otherwise all clear_extent_bit callers use GFP_NOFS, so we
can sink them to the function and reduce argument count, at the cost
that __clear_extent_bit has to be exported.
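
The wrapper pattern described above might look roughly like this in userspace C. The gfp_t type, the flag values and do_clear_extent_bit() (standing in for __clear_extent_bit) are placeholders invented for this sketch; only the shape of the change is taken from the message.

```c
#include <assert.h>

/* Placeholder type and flag values; the real ones live in the kernel headers. */
typedef unsigned int gfp_t;
#define GFP_NOFS   0x1u
#define GFP_KERNEL 0x2u

/* Records the mask of the last call so the sketch can be checked. */
static gfp_t last_mask;

/* Stand-in for __clear_extent_bit: still takes an explicit gfp mask. */
static int do_clear_extent_bit(unsigned long start, unsigned long end,
			       unsigned int bits, gfp_t mask)
{
	last_mask = mask;
	return (start <= end && bits != 0) ? 0 : -1;
}

/* Stand-in for clear_extent_bit after the change: the mask is fixed inside. */
static int clear_extent_bit(unsigned long start, unsigned long end,
			    unsigned int bits)
{
	return do_clear_extent_bit(start, end, bits, GFP_NOFS);
}
```

Callers that need a different mask call the lower-level function directly, which is why it has to be visible outside its file.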

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: use non-RCU list traversal in write_all_supers callees
David Sterba [Thu, 15 Jun 2017 22:28:47 +0000 (00:28 +0200)]
btrfs: use non-RCU list traversal in write_all_supers callees

We take the fs_devices::device_list_mutex mutex in write_all_supers
which will prevent any add/del changes to the device list. Therefore we
don't need to use the RCU variant list_for_each_entry_rcu in any of the
called functions.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info
David Sterba [Thu, 15 Jun 2017 22:09:21 +0000 (00:09 +0200)]
btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info

We don't need to use the mutex as we do not modify the devices nor the
list itself and just read information about device counts.
Move copying fsid out of the protected section, not applicable to RCU
same as the rest of the retrieved information.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info
David Sterba [Thu, 15 Jun 2017 22:09:21 +0000 (00:09 +0200)]
btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info

We don't need to use the mutex as we do not modify the devices nor the
list itself and just read some information:

does not change during device lifetime:
- devid
- uuid
- name (ie. the path)

may change in parallel to the ioctl call, but can lead only to reporting
inaccuracy:
- bytes_used
- total_bytes

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: simplify btrfs_close_bdev
David Sterba [Mon, 19 Jun 2017 14:55:35 +0000 (16:55 +0200)]
btrfs: simplify btrfs_close_bdev

Split the conditions a bit.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: document device locking
David Sterba [Fri, 16 Jun 2017 20:30:00 +0000 (22:30 +0200)]
btrfs: document device locking

Overview of the main locks protecting various device-related structures.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: simplify exit paths in btrfs_init_new_device
David Sterba [Mon, 30 Oct 2017 18:29:46 +0000 (19:29 +0100)]
btrfs: simplify exit paths in btrfs_init_new_device

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: use free_device where opencoded
David Sterba [Mon, 30 Oct 2017 17:55:47 +0000 (18:55 +0100)]
btrfs: use free_device where opencoded

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: introduce free_device helper
David Sterba [Mon, 30 Oct 2017 17:10:25 +0000 (18:10 +0100)]
btrfs: introduce free_device helper

A helper to free a device and all its dynamically allocated members,
like the rcu_string name or flush_bio. This is going to replace all
open coded places.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: rename device free rcu helper to free_device_rcu
David Sterba [Tue, 6 Jun 2017 15:08:23 +0000 (17:08 +0200)]
btrfs: rename device free rcu helper to free_device_rcu

Make it clear that it is an RCU helper, we want to use the name
free_device for a wrapper freeing all device members.

Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoBtrfs: document rules about bio async submit
Liu Bo [Wed, 1 Nov 2017 23:19:27 +0000 (17:19 -0600)]
Btrfs: document rules about bio async submit

These rules have been hidden in several if-else branches and are not
straightforward to follow. For example, the dio submit hook's nocsum case
had a bug, i.e. doing async submit instead of sync submit, which has
been fixed recently.

This is documenting the rules for reference.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Reduce scope of delayed_rsv->lock in may_commit_trans
Nikolay Borisov [Tue, 7 Nov 2017 09:22:54 +0000 (11:22 +0200)]
btrfs: Reduce scope of delayed_rsv->lock in may_commit_trans

After commit 996478ca9c460886ac1 ("btrfs: change how we decide to commit
transactions during flushing") there is no need to hold the delayed_rsv
during the percpu_counter_compare call since we get the byte's snapshot
earlier. So hold the lock only while reading delayed_rsv.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoBtrfs: add __init macro to btrfs init functions
Liu Bo [Thu, 2 Nov 2017 23:21:50 +0000 (17:21 -0600)]
Btrfs: add __init macro to btrfs init functions

Adding __init macro gives kernel a hint that this function is only used
during the initialization phase and its memory resources can be freed up
after.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: rename btrfs_add_device to btrfs_add_dev_item
Anand Jain [Mon, 6 Nov 2017 08:36:15 +0000 (16:36 +0800)]
btrfs: rename btrfs_add_device to btrfs_add_dev_item

Function btrfs_add_device() is adding the device item so rename to
reflect that in the function. Similarly we have btrfs_rm_dev_item().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Don't generate UUID for non-fs tree
Qu Wenruo [Tue, 31 Oct 2017 06:08:16 +0000 (14:08 +0800)]
btrfs: Don't generate UUID for non-fs tree

btrfs_create_tree() will unconditionally generate UUID for any root.
So for quota tree and data reloc tree created by kernel, they will have
unique UUIDs.

However UUID in root item is only referred by UUID tree, which only
records UUID for fs trees.  This makes unique UUIDs for quota/data reloc
tree meaningless.

Leave the UUID as zero for non-fs tree, making btrfs-debug-tree output
less confusing.

Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: move volume_mutex into the btrfs_rm_device()
Anand Jain [Mon, 6 Nov 2017 02:28:00 +0000 (10:28 +0800)]
btrfs: move volume_mutex into the btrfs_rm_device()

A cleanup patch with no functional change: we hold volume_mutex before
calling btrfs_rm_device, so move it into the function itself.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Use locked_end rather than open coding it
Nikolay Borisov [Wed, 1 Nov 2017 09:36:05 +0000 (11:36 +0200)]
btrfs: Use locked_end rather than open coding it

Right before we go into this loop locked_end is set to alloc_end - 1 and
is being used in nearby functions, no need to have exceptions. This just
makes the code consistent, no functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Move loop termination condition in while()
Nikolay Borisov [Wed, 1 Nov 2017 09:32:18 +0000 (11:32 +0200)]
btrfs: Move loop termination condition in while()

Fallocating a file in btrfs goes through several stages. The one before
actually inserting the fallocated extents is to create a qgroup
reservation, covering the desired range. To this end there is a loop in
btrfs_fallocate which checks to see if there are holes in the fallocated
range or !PREALLOC extents past EOF and if so create qgroup reservations
for them. Unfortunately, the main condition of the loop is buried right
at the end of its body rather than in the actual while statement which
makes it non-obvious. Fix this by moving the condition in the while
statement where it belongs. No functional changes.
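
The shape of the refactoring can be sketched as below. This is not the btrfs_fallocate code: the function names, the counter and the step parameter are hypothetical, and the two forms are equivalent only under the assumption that cur < end on entry and step > 0.

```c
#include <assert.h>

/* Before: the real loop condition hides at the bottom of the body. */
static int count_chunks_before(unsigned long cur, unsigned long end,
			       unsigned long step)
{
	int n = 0;

	while (1) {
		n++;		/* stand-in for the per-extent work */
		cur += step;
		if (cur >= end)
			break;
	}
	return n;
}

/* After: the condition is stated in the while statement itself. */
static int count_chunks_after(unsigned long cur, unsigned long end,
			      unsigned long step)
{
	int n = 0;

	while (cur < end) {
		n++;		/* stand-in for the per-extent work */
		cur += step;
	}
	return n;
}
```

The two versions differ only if the range is empty on entry, in which case the first form would still run the body once.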

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoBtrfs: remove rcu_barrier in btrfs_close_devices
Liu Bo [Tue, 10 Oct 2017 21:51:02 +0000 (15:51 -0600)]
Btrfs: remove rcu_barrier in btrfs_close_devices

It was introduced because btrfs used to do blkdev_put in a deferred
work. Now that btrfs does blkdev_put in place, this rcu_barrier can be
removed.

modprobe -r btrfs will call btrfs_cleanup_fs_uuids(), which cleans up
every %fs_devices on the list. By the time we do btrfs_close_devices(), we
have replaced the devices on the list with dummy ones which only have
the same name and uuid, so modprobe -r btrfs will free those instead of
the ones we were using. This change won't cause a problem for it.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copied 2nd paragraph from mailinglist discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items
Nikolay Borisov [Mon, 23 Oct 2017 10:51:49 +0000 (13:51 +0300)]
btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items

btrfs_balance_delayed_items is the sole caller of
btrfs_wq_run_delayed_node and already includes one of the checks whether
the delayed inodes should be run. On the other hand
btrfs_wq_run_delayed_node duplicates that check and performs an
additional one for wq congestion.

Let's remove the duplicate check and move the congestion one in
btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only
care about setting up the wq run. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels
Nikolay Borisov [Mon, 23 Oct 2017 10:51:48 +0000 (13:51 +0300)]
btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels

Currently btrfs_async_run_delayed_root's implementation uses 3 goto
labels to mimic the functionality of a simple do {} while loop. Refactor
the function to use a do {} while construct, making intention clear and
code easier to follow. No functional changes.
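
A generic sketch of this kind of refactoring is shown below. It is not the btrfs_async_run_delayed_root code: the queue counter, the budget parameter and both function names are hypothetical, and fewer labels are used than in the original function.

```c
#include <assert.h>

/* Before: labels and gotos emulate a loop. */
static int drain_with_goto(int *queue_len, int budget)
{
	int done = 0;

again:
	if (*queue_len == 0)
		goto out;
	(*queue_len)--;		/* stand-in for processing one item */
	done++;
	if (done < budget)
		goto again;
out:
	return done;
}

/* After: the same control flow as a do {} while loop. */
static int drain_with_loop(int *queue_len, int budget)
{
	int done = 0;

	do {
		if (*queue_len == 0)
			break;
		(*queue_len)--;	/* stand-in for processing one item */
		done++;
	} while (done < budget);

	return done;
}
```

Both functions process at most budget items and stop early when the queue is empty; the loop form makes that visible at a glance.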

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove redundant mirror_num arg
Nikolay Borisov [Tue, 24 Oct 2017 08:50:39 +0000 (11:50 +0300)]
btrfs: Remove redundant mirror_num arg

The following callpath is always invoked with mirror_num set to 0, so
let's remove it as an argument and directly pass 0 to __do_readpage. No
functional change.

extent_readpages
  __extent_readpages
    __do_contiguous_readpages
      __do_readpage

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove unused function
Nikolay Borisov [Fri, 20 Oct 2017 15:10:59 +0000 (18:10 +0300)]
btrfs: Remove unused function

Its sole callsite was removed in a previous patch so just nuke it for good.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Remove redundant memory barrier in dev stats
Nikolay Borisov [Fri, 20 Oct 2017 15:10:58 +0000 (18:10 +0300)]
btrfs: Remove redundant memory barrier in dev stats

As per atomic_t.txt documentation:
 - RMW operations that have a return value are fully ordered;

atomic_xchg is one such operation, so it already provides the required
memory ordering. Remove the redundant barrier and add a comment to be
more explicit about that.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: Fix memory barriers usage with device stats counters
Nikolay Borisov [Tue, 24 Oct 2017 10:47:37 +0000 (13:47 +0300)]
btrfs: Fix memory barriers usage with device stats counters

Commit addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev
stats is cleared") reworked the way device stats changes are tracked. A
new atomic dev_stats_ccnt counter was introduced which is incremented
every time any of the device stats counters are changed. This serves as
a flag whether there are any pending stats changes. However, this patch
only partially implemented the correct memory barriers necessary:

- It only ordered the stores to the counters but not the reads e.g.
  btrfs_run_dev_stats
- It completely omitted any comments documenting the intended design and
  how the memory barriers pair with each other

This patch adds the necessary comments as well as a missing
smp_rmb in btrfs_run_dev_stats. Furthermore, since dev_stats_ccnt is only
a snapshot at best, there was no point in reading the counter twice:
once in btrfs_dev_stats_dirty and then again when assigning stats_cnt.
Just collapse both reads into one.
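
The barrier pairing can be approximated in userspace with C11 atomics, as in the sketch below. This is an analogy under stated assumptions rather than the kernel code: the struct, the function names and the use of release/acquire fences in place of the kernel's barriers are all choices made for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-device statistics. */
struct dev_stats {
	atomic_int values[2];	/* individual stat counters */
	atomic_int ccnt;	/* number of pending changes, used as a dirty flag */
};

/* Writer side: update a counter, then publish the change via ccnt. */
static void stat_inc(struct dev_stats *d, int idx)
{
	atomic_fetch_add_explicit(&d->values[idx], 1, memory_order_relaxed);
	/* Order the counter update before the ccnt update; pairs with the reader's fence. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&d->ccnt, 1, memory_order_relaxed);
}

/* Reader side: read ccnt once, then read the counters it covers. */
static int run_stats(struct dev_stats *d, int out[2])
{
	/* Single snapshot of the change count, used for both the check and the subtraction. */
	int cnt = atomic_load_explicit(&d->ccnt, memory_order_relaxed);

	if (cnt == 0)
		return 0;

	/* Order the ccnt read before the counter reads; pairs with the writer's fence. */
	atomic_thread_fence(memory_order_acquire);
	for (int i = 0; i < 2; i++)
		out[i] = atomic_load_explicit(&d->values[i], memory_order_relaxed);

	/* Subtract only what was observed; later increments stay pending. */
	atomic_fetch_sub_explicit(&d->ccnt, cnt, memory_order_relaxed);
	return cnt;
}
```

Subtracting the snapshot instead of resetting the counter to zero means an increment that races with the reader is not lost.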

Fixes: addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agobtrfs: clean up btrfs_dev_stat_inc usage
Anand Jain [Fri, 20 Oct 2017 17:45:33 +0000 (01:45 +0800)]
btrfs: clean up btrfs_dev_stat_inc usage

btrfs_end_bio() calls btrfs_dev_stat_inc() and then
btrfs_dev_stat_print_on_error() separately; use
btrfs_dev_stat_inc_and_print() directly instead.

Currently no bio in btrfs is both a non-empty write and has the
REQ_PREFLUSH flag set, so the condition

   if (bio->bi_opf & REQ_PREFLUSH)

is never true in btrfs_end_bio(). Calling btrfs_dev_stat_inc_and_print()
separately, once for write and once for flush, therefore will not produce
redundant error logs.

This consolidation will make it easier to add critical device error
handling to btrfs_dev_stat_inc_and_print(), which can be renamed as
needed.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoBtrfs: free btrfs_device in place
Liu Bo [Tue, 24 Oct 2017 05:02:54 +0000 (23:02 -0600)]
Btrfs: free btrfs_device in place

It's pointless to defer it to a kthread helper as we're not under a
special context.

For reference, commit 1f78160ce1b1 ("Btrfs: using rcu lock in the reader
side of devices list") introduced RCU freeing for device structures.

Originally the blkdev_put was called from free_device and rcu_barrier had
to be called. This is no longer required, bdev and our device structures
are now freed separately.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoBtrfs: remove redundant btrfs_balance_delayed_items
Liu Bo [Fri, 20 Oct 2017 23:53:41 +0000 (17:53 -0600)]
Btrfs: remove redundant btrfs_balance_delayed_items

In functions like btrfs_create(), we run both
btrfs_balance_delayed_items() and btrfs_btree_balance_dirty() after
the operation, but btrfs_btree_balance_dirty() is surely going to run
btrfs_balance_delayed_items().

This keeps only btrfs_btree_balance_dirty().

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
6 years agoLinux 4.15-rc9 v4.15-rc9
Linus Torvalds [Sun, 21 Jan 2018 21:51:26 +0000 (13:51 -0800)]
Linux 4.15-rc9

6 years agoMerge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Jan 2018 18:48:35 +0000 (10:48 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 pti fixes from Thomas Gleixner:
 "A small set of fixes for the meltdown/spectre mitigations:

   - Make kprobes aware of retpolines to prevent probes in the retpoline
     thunks.

   - Make the machine check exception speculation protected. MCE used to
     issue an indirect call directly from the ASM entry code. Convert
     that to a direct call into a C-function and issue the indirect call
     from there so the compiler can add the retpoline protection,

   - Make the vmexit_fill_RSB() assembly less stupid

   - Fix a typo in the PTI documentation"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
  x86/pti: Document fix wrong index
  kprobes/x86: Disable optimizing on the function jumps to indirect thunk
  kprobes/x86: Blacklist indirect thunk functions for kprobes
  retpoline: Introduce start/end markers of indirect thunk
  x86/mce: Make machine check speculation protected

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Jan 2018 18:41:48 +0000 (10:41 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 kexec fix from Thomas Gleixner:
 "A single fix for the WBINVD issue introduced by the SME support which
  causes kexec to fail on non AMD/SME capable CPUs. Issue WBINVD only when
  the CPU has SME and avoid doing so in a loop"

[ Side note: this patch fixes the problem, but it isn't entirely clear
  why it is required. The wbinvd should just work regardless, but there
  seems to be some system - as opposed to CPU - issue, since the wbinvd
  causes more problems later in the shutdown sequence, but wbinvd
  instructions while the system is still active are not problematic.

  Possibly some SMI or pending machine check issue on the affected system ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()

6 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Jan 2018 18:39:58 +0000 (10:39 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single fix for the new matrix allocator to prevent vector exhaustion
  by certain network drivers which allocate gazillions of unused vectors
  which cannot be put into reservation mode due to MSI and the lack of
  MSI entry masking.

  The fix/workaround is to spread the vectors across CPUs by searching
  the supplied target CPU mask for the CPU with the smallest number of
  allocated vectors"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq/matrix: Spread interrupts on allocation

6 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88...
Linus Torvalds [Sun, 21 Jan 2018 04:12:47 +0000 (20:12 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mattst88/alpha

Pull alpha fixes from Matt Turner:
 "A build fix and a regression fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha/PCI: Fix noname IRQ level detection
  alpha: extend memset16 to EV6 optimised routines