memcg: fix VM_BUG_ON from page migration
author Hugh Dickins <>
Tue, 4 Mar 2008 22:29:06 +0000 (14:29 -0800)
committer Linus Torvalds <>
Wed, 5 Mar 2008 00:35:14 +0000 (16:35 -0800)
Page migration gave me free_hot_cold_page's VM_BUG_ON page->page_cgroup.
remove_migration_pte was calling mem_cgroup_charge on the new page whenever it
found a swap pte, before it had determined it to be a migration entry.  That
left a surplus reference count on the page_cgroup, so it was still attached
when the page was later freed.

Move that mem_cgroup_charge down to where we're sure it's a migration entry.
We were already under i_mmap_lock or anon_vma->lock, so its GFP_KERNEL was
already inappropriate: change that to GFP_ATOMIC.
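
The ordering problem can be illustrated with a small user-space model.  This
is only a sketch: struct model_page, model_charge and the two remove_pte_*
functions are hypothetical stand-ins, not kernel code, and the charge is
assumed never to fail here.  It shows how charging before the migration-entry
check leaves a surplus reference on the early-return path, while charging
after the check does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with its page_cgroup reference count. */
struct model_page {
	int cgroup_refs;
};

/* Hypothetical classification of the pte that was found. */
enum model_pte_kind { MODEL_PTE_SWAP, MODEL_PTE_MIGRATION };

/* Models a charge that takes one reference and always succeeds. */
static int model_charge(struct model_page *page)
{
	page->cgroup_refs++;
	return 0;
}

/* Old ordering: charge first, then find out what kind of pte it is. */
static bool remove_pte_buggy(struct model_page *new_page,
			     enum model_pte_kind kind)
{
	if (model_charge(new_page))
		return false;
	if (kind != MODEL_PTE_MIGRATION)
		return false;	/* returns with the reference still held */
	return true;
}

/* New ordering: charge only once the pte is known to be a migration entry. */
static bool remove_pte_fixed(struct model_page *new_page,
			     enum model_pte_kind kind)
{
	if (kind != MODEL_PTE_MIGRATION)
		return false;	/* nothing was charged, nothing leaks */
	(void)model_charge(new_page);
	return true;
}
```

In this model, calling remove_pte_buggy on an ordinary swap pte leaves
cgroup_refs at 1 although nothing was mapped; remove_pte_fixed leaves it at 0.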

It's essential that remove_migration_pte removes all the migration entries:
other crashes follow if not.  So proceed even when the charge fails: normally
it cannot, but after a mem_cgroup_force_empty it might (see the comment in
the code).
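
A rough user-space sketch of why the return value is ignored.  All names here
(model_pte, model_charge_atomic, restore_all) are hypothetical, and each loop
iteration merely stands in for one call of remove_migration_pte.  If a failed
charge caused a bail-out, migration entries would be left behind; ignoring the
failure restores every one of them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one pte slot. */
struct model_pte {
	bool is_migration;	/* slot holds a migration entry */
	bool restored;		/* slot has been pointed at the new page */
};

/* Models a non-sleeping charge that can be forced to fail. */
static int model_charge_atomic(bool force_fail)
{
	return force_fail ? -1 : 0;
}

/*
 * Walks the slots and restores the migration entries.
 * Returns how many migration entries were left unrestored.
 */
static int restore_all(struct model_pte *ptes, int n,
		       bool charge_fails, bool bail_on_failure)
{
	int left = 0;

	for (int i = 0; i < n; i++) {
		if (!ptes[i].is_migration)
			continue;
		if (model_charge_atomic(charge_fails) && bail_on_failure) {
			left++;		/* entry stays behind */
			continue;
		}
		ptes[i].restored = true;	/* proceed regardless */
	}
	return left;
}
```

With a failing charge, bail_on_failure set leaves every migration entry in
place; with it clear, none remain.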

Signed-off-by: Hugh Dickins <>
Cc: David Rientjes <>
Cc: Balbir Singh <>
Acked-by: KAMEZAWA Hiroyuki <>
Cc: Hirokazu Takahashi <>
Cc: YAMAMOTO Takashi <>
Cc: Paul Menage <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>

index a73504ff5ab982007b2e0355e49f0dedcac65899..4e0eccca5e265ac19bc507a171a2f720d27f8c21 100644
@@ -153,11 +153,6 @@ static void remove_migration_pte(struct vm_area_struct *vma,
-       if (mem_cgroup_charge(new, mm, GFP_KERNEL)) {
-               pte_unmap(ptep);
-               return;
-       }
        ptl = pte_lockptr(mm, pmd);
        pte = *ptep;
@@ -169,6 +164,20 @@ static void remove_migration_pte(struct vm_area_struct *vma,
        if (!is_migration_entry(entry) || migration_entry_to_page(entry) != old)
                goto out;
+       /*
+        * Yes, ignore the return value from a GFP_ATOMIC mem_cgroup_charge.
+        * Failure is not an option here: we're now expected to remove every
+        * migration pte, and will cause crashes otherwise.  Normally this
+        * is not an issue: mem_cgroup_prepare_migration bumped up the old
+        * page_cgroup count for safety, that's now attached to the new page,
+        * so this charge should just be another incrementation of the count,
+        * to keep in balance with rmap.c's mem_cgroup_uncharging.  But if
+        * there's been a force_empty, those reference counts may no longer
+        * be reliable, and this charge can actually fail: oh well, we don't
+        * make the situation any worse by proceeding as if it had succeeded.
+        */
+       mem_cgroup_charge(new, mm, GFP_ATOMIC);
        pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
        if (is_write_migration_entry(entry))