x86/mm: set fields in deferred pages
authorPavel Tatashin <pasha.tatashin@oracle.com>
Thu, 16 Nov 2017 01:36:14 +0000 (17:36 -0800)
committerLinus Torvalds <torvalds@linux-foundation.org>
Thu, 16 Nov 2017 02:21:05 +0000 (18:21 -0800)
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to
first initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled, however, we set fields in
register_page_bootmem_info that are subsequently clobbered right after
in free_all_bootmem:

        mem_init() {
                register_page_bootmem_info();
                free_all_bootmem();
                ...
        }

When register_page_bootmem_info() is called only non-deferred struct
pages are initialized.  But, this function goes through some reserved
pages which might be part of the deferred, and thus are not yet
initialized.

  mem_init
   register_page_bootmem_info
    register_page_bootmem_info_node
     get_page_bootmem
      .. setting fields here ..
      such as: page->freelist = (void *)type;

  free_all_bootmem()
   free_low_memory_core_early()
    for_each_reserved_mem_region()
     reserve_bootmem_region()
      init_reserved_page() <- Only if this is deferred reserved page
       __init_single_pfn()
        __init_single_page()
            memset(0) <-- Loose the set fields here

We end up with issue where, currently we do not observe problem as
memory is explicitly zeroed.  But, if flag asserts are changed we can
start hitting issues.

Also, because in this patch series we will stop zeroing struct page
memory during allocation, we must make sure that struct pages are
properly initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Link: http://lkml.kernel.org/r/20171013173214.27300-3-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/mm/init_64.c

index 5fa3a58b5d7865e50329b8c1e93573a6f7fc1db0..c3fc544b50d278b6fa64c8f3c9eb6afcb7d8bb0c 100644 (file)
@@ -1173,12 +1173,18 @@ void __init mem_init(void)
 
        /* clear_bss() already clear the empty_zero_page */
 
 
        /* clear_bss() already clear the empty_zero_page */
 
-       register_page_bootmem_info();
-
        /* this will put all memory onto the freelists */
        free_all_bootmem();
        after_bootmem = 1;
 
        /* this will put all memory onto the freelists */
        free_all_bootmem();
        after_bootmem = 1;
 
+       /*
+        * Must be done after boot memory is put on freelist, because here we
+        * might set fields in deferred struct pages that have not yet been
+        * initialized, and free_all_bootmem() initializes all the reserved
+        * deferred pages for us.
+        */
+       register_page_bootmem_info();
+
        /* Register memory areas for /proc/kcore */
        kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR,
                         PAGE_SIZE, KCORE_OTHER);
        /* Register memory areas for /proc/kcore */
        kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR,
                         PAGE_SIZE, KCORE_OTHER);