md raid10: fix NULL dereference in handle_write_completed()
author Yufen Yu <yuyufen@huawei.com>
Tue, 6 Feb 2018 09:39:15 +0000 (17:39 +0800)
committer Shaohua Li <sh.li@alibaba-inc.com>
Mon, 19 Feb 2018 17:40:36 +0000 (09:40 -0800)
During 'recover', an r10bio with R10BIO_WriteError and
R10BIO_IsRecover set is processed by handle_write_completed().
This function traverses all r10bio->devs[copies].
If devs[m].repl_bio != NULL, it assumes that
conf->mirrors[dev].replacement is also not NULL. However, this is
not always true.

When any rdev of a raid10 array has a replacement, every r10bio in
conf->r10buf_pool has r10bio->devs[m].repl_bio != NULL. However,
'recover' does not clear r10bio->devs[m].repl_bio even when the
corresponding replacement is NULL, resulting in a NULL dereference
of the replacement.

This bug was introduced when replacement support for raid10 was
added in Linux 3.3.

As NeilBrown suggested:
Elsewhere the determination of "is this device part of the
resync/recovery" is made by testing bio->bi_end_io.
If this is end_sync_write, then we tried to write here.
If it is NULL, then we didn't try to write.
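
The check described above can be sketched outside the kernel as follows.
This is only an illustration under stated assumptions: every fake_* type
and function name is hypothetical and merely stands in for the bio, rdev
and devs[] slot used in raid10.c, and the sketch assumes that a slot with
a non-NULL bi_end_io always has a non-NULL replacement.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct fake_bio {
	void (*bi_end_io)(struct fake_bio *bio); /* NULL: no write was issued */
	int bi_status;                           /* 0: the write succeeded */
};

struct fake_rdev {
	int cleared_badblocks; /* counts successful completions handled */
};

struct fake_slot {
	struct fake_bio *repl_bio;     /* preallocated; may be non-NULL but unused */
	struct fake_rdev *replacement; /* may be NULL during 'recover' */
};

/* Stand-in for end_sync_write(); only its address matters in this sketch. */
static void fake_end_sync_write(struct fake_bio *bio)
{
	(void)bio;
}

/* A write was attempted only if the bio exists and has a completion handler. */
static bool replacement_write_attempted(const struct fake_slot *slot)
{
	return slot->repl_bio != NULL && slot->repl_bio->bi_end_io != NULL;
}

/*
 * Walk all slots and touch ->replacement only where a write was attempted.
 * Assumption: such slots always have a non-NULL replacement.
 * Returns the number of slots that were handled.
 */
static int handle_slots(struct fake_slot *slots, int n)
{
	int handled = 0;

	for (int i = 0; i < n; i++) {
		if (!replacement_write_attempted(&slots[i]))
			continue; /* stale repl_bio: skip, do not dereference */
		if (slots[i].repl_bio->bi_status == 0)
			slots[i].replacement->cleared_badblocks++;
		handled++;
	}
	return handled;
}
```

A slot whose repl_bio is allocated but whose bi_end_io is NULL is skipped,
so its NULL replacement is never dereferenced; a slot whose bi_end_io is
set to fake_end_sync_write is handled as before.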

Fixes: 9ad1aefc8ae8 ("md/raid10:  Handle replacement devices during resync.")
Cc: stable (V3.3+)
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
drivers/md/raid10.c

index 8d7ddc947d9d7ec6500bf7fd19e3077d3099b790..9e9441fde8b39ce7978c06f9e2cc72f803c45e47 100644
@@ -2655,7 +2655,8 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
                for (m = 0; m < conf->copies; m++) {
                        int dev = r10_bio->devs[m].devnum;
                        rdev = conf->mirrors[dev].rdev;
-                       if (r10_bio->devs[m].bio == NULL)
+                       if (r10_bio->devs[m].bio == NULL ||
+                               r10_bio->devs[m].bio->bi_end_io == NULL)
                                continue;
                        if (!r10_bio->devs[m].bio->bi_status) {
                                rdev_clear_badblocks(
@@ -2670,7 +2671,8 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
                                        md_error(conf->mddev, rdev);
                        }
                        rdev = conf->mirrors[dev].replacement;
-                       if (r10_bio->devs[m].repl_bio == NULL)
+                       if (r10_bio->devs[m].repl_bio == NULL ||
+                               r10_bio->devs[m].repl_bio->bi_end_io == NULL)
                                continue;
 
                        if (!r10_bio->devs[m].repl_bio->bi_status) {