sfrench/cifs-2.6.git
5 years agomm: mmu_notifier fix for tlb_end_vma
Nicholas Piggin [Thu, 23 Aug 2018 08:47:09 +0000 (18:47 +1000)]
mm: mmu_notifier fix for tlb_end_vma

The generic tlb_end_vma does not call invalidate_range mmu notifier, and
it resets resets the mmu_gather range, which means the notifier won't be
called on part of the range in case of an unmap that spans multiple
vmas.

ARM64 seems to be the only arch I could see that has notifiers and uses
the generic tlb_end_vma.  I have not actually tested it.

[ Catalin and Will point out that ARM64 currently only uses the
  notifiers for KVM, which doesn't use the ->invalidate_range()
  callback right now, so it's a bug, but one that happens to
  not affect them.  So not necessary for stable.  - Linus ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
Peter Zijlstra [Wed, 22 Aug 2018 15:30:15 +0000 (17:30 +0200)]
mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE

Jann reported that x86 was missing required TLB invalidates when he
hit the !*batch slow path in tlb_remove_table().

This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache)
invalidates, the PowerPC-hash where this code originated and the
Sparc-hash where this was subsequently used did not need that. ARM
which later used this put an explicit TLB invalidate in their
__p*_free_tlb() functions, and PowerPC-radix followed that example.

But when we hooked up x86 we failed to consider this. Fix this by
(optionally) hooking tlb_remove_table() into the TLB invalidate code.

NOTE: s390 was also needing something like this and might now
      be able to use the generic code again.

[ Modified to be on top of Nick's cleanups, which simplified this patch
  now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]

Fixes: 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm/tlb: Remove tlb_remove_table() non-concurrent condition
Peter Zijlstra [Wed, 22 Aug 2018 15:30:14 +0000 (17:30 +0200)]
mm/tlb: Remove tlb_remove_table() non-concurrent condition

Will noted that only checking mm_users is incorrect; we should also
check mm_count in order to cover CPUs that have a lazy reference to
this mm (and could do speculative TLB operations).

If removing this turns out to be a performance issue, we can
re-instate a more complete check, but in tlb_table_flush() eliding the
call_rcu_sched().

Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agomm: move tlb_table_flush to tlb_flush_mmu_free
Nicholas Piggin [Thu, 23 Aug 2018 08:47:08 +0000 (18:47 +1000)]
mm: move tlb_table_flush to tlb_flush_mmu_free

There is no need to call this from tlb_flush_mmu_tlbonly, it logically
belongs with tlb_flush_mmu_free.  This makes future fixes simpler.

[ This was originally done to allow code consolidation for the
  mmu_notifier fix, but it also ends up helping simplify the
  HAVE_RCU_TABLE_INVALIDATE fix.    - Linus ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agogetxattr: use correct xattr length
Christian Brauner [Thu, 7 Jun 2018 11:43:48 +0000 (13:43 +0200)]
getxattr: use correct xattr length

When running in a container with a user namespace, if you call getxattr
with name = "system.posix_acl_access" and size % 8 != 4, then getxattr
silently skips the user namespace fixup that it normally does resulting in
un-fixed-up data being returned.
This is caused by posix_acl_fix_xattr_to_user() being passed the total
buffer size and not the actual size of the xattr as returned by
vfs_getxattr().
This commit passes the actual length of the xattr as returned by
vfs_getxattr() down.

A reproducer for the issue is:

  touch acl_posix

  setfacl -m user:0:rwx acl_posix

and the compile:

  #define _GNU_SOURCE
  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/types.h>
  #include <unistd.h>
  #include <attr/xattr.h>

  /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */
  int main(int argc, void **argv)
  {
          ssize_t ret1, ret2;
          char buf1[128], buf2[132];
          int fret = EXIT_SUCCESS;
          char *file;

          if (argc < 2) {
                  fprintf(stderr,
                          "Please specify a file with "
                          "\"system.posix_acl_access\" permissions set\n");
                  _exit(EXIT_FAILURE);
          }
          file = argv[1];

          ret1 = getxattr(file, "system.posix_acl_access",
                          buf1, sizeof(buf1));
          if (ret1 < 0) {
                  fprintf(stderr, "%s - Failed to retrieve "
                                  "\"system.posix_acl_access\" "
                                  "from \"%s\"\n", strerror(errno), file);
                  _exit(EXIT_FAILURE);
          }

          ret2 = getxattr(file, "system.posix_acl_access",
                          buf2, sizeof(buf2));
          if (ret2 < 0) {
                  fprintf(stderr, "%s - Failed to retrieve "
                                  "\"system.posix_acl_access\" "
                                  "from \"%s\"\n", strerror(errno), file);
                  _exit(EXIT_FAILURE);
          }

          if (ret1 != ret2) {
                  fprintf(stderr, "The value of \"system.posix_acl_"
                                  "access\" for file \"%s\" changed "
                                  "between two successive calls\n", file);
                  _exit(EXIT_FAILURE);
          }

          for (ssize_t i = 0; i < ret2; i++) {
                  if (buf1[i] == buf2[i])
                          continue;

                  fprintf(stderr,
                          "Unexpected different in byte %zd: "
                          "%02x != %02x\n", i, buf1[i], buf2[i]);
                  fret = EXIT_FAILURE;
          }

          if (fret == EXIT_SUCCESS)
                  fprintf(stderr, "Test passed\n");
          else
                  fprintf(stderr, "Test failed\n");

          _exit(fret);
  }
and run:

  ./tester acl_posix

On a non-fixed up kernel this should return something like:

  root@c1:/# ./t
  Unexpected different in byte 16: ffffffa0 != 00
  Unexpected different in byte 17: ffffff86 != 00
  Unexpected different in byte 18: 01 != 00

and on a fixed kernel:

  root@c1:~# ./t
  Test passed

Cc: stable@vger.kernel.org
Fixes: 2f6f0654ab61 ("userns: Convert vfs posix_acl support to use kuids and kgids")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945
Reported-by: Colin Watson <cjwatson@ubuntu.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
5 years agogcc-plugins: Disable when building under Clang
Kees Cook [Thu, 23 Aug 2018 06:02:31 +0000 (23:02 -0700)]
gcc-plugins: Disable when building under Clang

Prior to doing compiler feature detection in Kconfig, attempts to build
GCC plugins with Clang would fail the build, much in the same way missing
GCC plugin headers would fail the build. However, now that this logic
has been lifted into Kconfig, add an explicit test for GCC (instead of
duplicating it in the feature-test script).

Reported-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
5 years agoblk-wbt: don't maintain inflight counts if disabled
Jens Axboe [Thu, 23 Aug 2018 15:34:46 +0000 (09:34 -0600)]
blk-wbt: don't maintain inflight counts if disabled

A previous commit removed the ability to have per-rq flags. We used
those flags to maintain inflight counts. Since we don't have those
anymore, we have to always maintain inflight counts, even if wbt is
disabled. This is clearly suboptimal.

Add a queue quiesce around changing the wbt latency settings from sysfs
to work around this. With that, we can reliably put the enabled check in
our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be
consistent for the lifetime of the request.

Fixes: c1c80384c8f ("block: remove external dependency on wbt_flags")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agopowerpc/mce: Fix SLB rebolting during MCE recovery path.
Mahesh Salgaonkar [Thu, 23 Aug 2018 04:56:08 +0000 (10:26 +0530)]
powerpc/mce: Fix SLB rebolting during MCE recovery path.

The commit e7e81847478 ("powerpc/64s: move machine check SLB flushing
to mm/slb.c") introduced a bug in reloading bolted SLB entries. Unused
bolted entries are stored with .esid=0 in the slb_shadow area, and
that value is now used directly as the RB input to slbmte, which means
the RB[52:63] index field is set to 0, which causes SLB entry 0 to be
cleared.

Fix this by storing the index bits in the unused bolted entries, which
directs the slbmte to the right place.

The SLB shadow area is also used by the hypervisor, but PAPR is okay
with that, from LoPAPR v1.1, 14.11.1.3 SLB Shadow Buffer:

  Note: SLB is filled sequentially starting at index 0
  from the shadow buffer ignoring the contents of
  RB field bits 52-63

Fixes: e7e81847478b ("powerpc/64s: move machine check SLB flushing to mm/slb.c")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoKVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages
Paul Mackerras [Thu, 23 Aug 2018 00:08:58 +0000 (10:08 +1000)]
KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages

Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in
the pinned physical page", 2018-07-17) added some checks to ensure
that guest DMA mappings don't attempt to map more than the guest is
entitled to access. However, errors in the logic mean that legitimate
guest requests to map pages for DMA are being denied in some
situations. Specifically, if the first page of the range passed to
mm_iommu_get() is mapped with a normal page, and subsequent pages are
mapped with transparent huge pages, we end up with mem->pageshift ==
0. That means that the page size checks in mm_iommu_ua_to_hpa() and
mm_iommu_up_to_hpa_rm() will always fail for every page in that
region, and thus the guest can never map any memory in that region for
DMA, typically leading to a flood of error messages like this:

  qemu-system-ppc64: VFIO_MAP_DMA: -22
  qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)

The logic errors in mm_iommu_get() are:

  (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
      call (meaning that find_linux_pte() returns the pte for the
      first address in the range, not the address we are currently up
      to);
  (b) use of 'pageshift' as the variable to receive the hugepage shift
      returned by find_linux_pte() - for a normal page this gets set
      to 0, leading to us setting mem->pageshift to 0 when we conclude
      that the pte returned by find_linux_pte() didn't match the page
      we were looking at;
  (c) comparing 'compshift', which is a page order, i.e. log base 2 of
      the number of pages, with 'pageshift', which is a log base 2 of
      the number of bytes.

To fix these problems, this patch introduces 'cur_ua' to hold the
current user address and uses that in the find_linux_pte() call;
introduces 'pteshift' to hold the hugepage shift found by
find_linux_pte(); and compares 'pteshift' with 'compshift +
PAGE_SHIFT' rather than 'compshift'.

The patch also moves the local_irq_restore to the point after the PTE
pointer returned by find_linux_pte() has been dereferenced because
otherwise the PTE could change underneath us, and adds a check to
avoid doing the find_linux_pte() call once mem->pageshift has been
reduced to PAGE_SHIFT, as an optimization.

Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition
Aneesh Kumar K.V [Wed, 22 Aug 2018 17:16:05 +0000 (22:46 +0530)]
powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition

The Nest MMU workaround is only needed for RW upgrades. Avoid doing
that for other PTE updates.

We also avoid clearing the PTE while marking it invalid. This is
because other page table walkers will find this PTE none and can
result in unexpected behaviour due to that. Instead we clear
_PAGE_PRESENT and set the software PTE bit _PAGE_INVALID.
pte_present() is already updated to check for both bits. This makes
sure page table walkers will find the PTE present and things like
pte_pfn(pte) returns the right value.

Based on an original patch from Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoMerge tag 'perf-core-for-mingo-4.19-20180820' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Thu, 23 Aug 2018 08:29:19 +0000 (10:29 +0200)]
Merge tag 'perf-core-for-mingo-4.19-20180820' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

LLVM/clang/eBPF: (Arnaldo Carvalho de Melo)

 - Allow passing options to llc in addition to to clang.

Hardware tracing: (Jack Henschel)

 - Improve error message for PMU address filters, clarifying availability of
   that feature in hardware having hardware tracing such as Intel PT.

Python interface: (Jiri Olsa)

 - Fix read_on_cpu() interface.

ELF/DWARF libraries: (Jiri Olsa)

 - Fix handling of the combo compressed module file + decompressed associated
   debuginfo file.

Build (Rasmus Villemoes)

 - Disable parallelism for 'make clean', avoiding multiple submakes deleting
   the same files and causing the build to fail on systems such as Yocto.

Kernel ABI copies: (Arnaldo Carvalho de Melo)

 - Update tools's copy of x86's cpufeatures.h.

 - Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'.

Miscellaneous: (Steven Rostedt)

 - Change libtraceevent to SPDX License format.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
5 years agodrm/edid: Add 6 bpc quirk for SDC panel in Lenovo B50-80
Kai-Heng Feng [Thu, 23 Aug 2018 05:53:32 +0000 (05:53 +0000)]
drm/edid: Add 6 bpc quirk for SDC panel in Lenovo B50-80

Another panel that reports "DFP 1.x compliant TMDS" but it supports 6bpc
instead of 8 bpc.

Apply 6 bpc quirk for the panel to fix it.

BugLink: https://bugs.launchpad.net/bugs/1788308
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180823055332.7723-1-kai.heng.feng@canonical.com
5 years agoACPI: fix menuconfig presentation of ACPI submenu
Arnd Bergmann [Tue, 21 Aug 2018 20:37:33 +0000 (22:37 +0200)]
ACPI: fix menuconfig presentation of ACPI submenu

My fix for a recursive Kconfig dependency caused another issue where the
ACPI specific options end up in the top-level menu in 'menuconfig'. This
was an unintended side-effect of having a silent option between
'menuconfig ACPI' and 'if ACPI'.

Moving the ARCH_SUPPORTS_ACPI symbol ahead of the ACPI menu solves that
problem and restores the previous presentation.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 2c870e61132c (arm64: fix ACPI dependencies)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agopowerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.
Aneesh Kumar K.V [Wed, 22 Aug 2018 17:16:04 +0000 (22:46 +0530)]
powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.

When splitting a huge pmd pte, we need to mark the pmd entry invalid. We
can do that by clearing _PAGE_PRESENT bit. But then that will be taken as a
swap pte. In order to differentiate between the two use a software pte bit
when invalidating.

For regular pte, due to bd5050e38aec ("powerpc/mm/radix: Change pte relax
sequence to handle nest MMU hang") we need to mark the pte entry invalid when
relaxing access permission. Instead of marking pte_none which can result in
different page table walk routines possibly skipping this pte entry, invalidate
it but still keep it marked present.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/nohash: fix pte_access_permitted()
Christophe Leroy [Tue, 21 Aug 2018 13:03:23 +0000 (13:03 +0000)]
powerpc/nohash: fix pte_access_permitted()

Commit 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper
for other platforms") replaced generic pte_access_permitted() by an
arch specific one.

The generic one is defined as
(pte_present(pte) && (!(write) || pte_write(pte)))

The arch specific one is open coded checking that _PAGE_USER and
_PAGE_WRITE (_PAGE_RW) flags are set, but lacking to check that
_PAGE_RO and _PAGE_PRIVILEGED are unset, leading to a useless test
on targets like the 8xx which defines _PAGE_RW and _PAGE_USER as 0.

Commit 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted
for hugetlb access check") replaced some tests performed with
pte helpers by a call to pte_access_permitted(), leading to the same
issue.

This patch rewrites powerpc/nohash pte_access_permitted()
using pte helpers.

Fixes: 5769beaf180a8 ("powerpc/mm: Add proper pte access check helper for other platforms")
Fixes: 5fa5b16be5b31 ("powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoapparmor: remove no-op permission check in policy_unpack
John Johansen [Wed, 22 Aug 2018 00:19:53 +0000 (17:19 -0700)]
apparmor: remove no-op permission check in policy_unpack

The patch 736ec752d95e: "AppArmor: policy routines for loading and
unpacking policy" from Jul 29, 2010, leads to the following static
checker warning:

    security/apparmor/policy_unpack.c:410 verify_accept()
    warn: bitwise AND condition is false here

    security/apparmor/policy_unpack.c:413 verify_accept()
    warn: bitwise AND condition is false here

security/apparmor/policy_unpack.c
   392  #define DFA_VALID_PERM_MASK             0xffffffff
   393  #define DFA_VALID_PERM2_MASK            0xffffffff
   394
   395  /**
   396   * verify_accept - verify the accept tables of a dfa
   397   * @dfa: dfa to verify accept tables of (NOT NULL)
   398   * @flags: flags governing dfa
   399   *
   400   * Returns: 1 if valid accept tables else 0 if error
   401   */
   402  static bool verify_accept(struct aa_dfa *dfa, int flags)
   403  {
   404          int i;
   405
   406          /* verify accept permissions */
   407          for (i = 0; i < dfa->tables[YYTD_ID_ACCEPT]->td_lolen; i++) {
   408                  int mode = ACCEPT_TABLE(dfa)[i];
   409
   410                  if (mode & ~DFA_VALID_PERM_MASK)
   411                          return 0;
   412
   413                  if (ACCEPT_TABLE2(dfa)[i] & ~DFA_VALID_PERM2_MASK)
   414                          return 0;

fixes: 736ec752d95e ("AppArmor: policy routines for loading and unpacking policy")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
5 years agoMerge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Thu, 23 Aug 2018 01:24:45 +0000 (11:24 +1000)]
Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-next

Fixes for 4.19:
- Fix build when KCOV is enabled
- Misc display fixes
- A couple of SR-IOV fixes
- Fence fixes for eviction handling for KFD
- Misc other fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180822203813.2733-1-alexander.deucher@amd.com
5 years agoMerge tag 'drm-misc-next-fixes-2018-08-22' of git://anongit.freedesktop.org/drm/drm...
Dave Airlie [Thu, 23 Aug 2018 01:23:31 +0000 (11:23 +1000)]
Merge tag 'drm-misc-next-fixes-2018-08-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

- Add an unprepare delay to the tv123wam panel (Sean)
- Update seanpaul's email in MAINTAINERS (Sean)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20180822193850.GA214158@art_vandelay
5 years agox86/mm/tlb: Revert the recent lazy TLB patches
Peter Zijlstra [Wed, 22 Aug 2018 15:30:13 +0000 (17:30 +0200)]
x86/mm/tlb: Revert the recent lazy TLB patches

Revert commits:

  95b0e6357d3e x86/mm/tlb: Always use lazy TLB mode
  64482aafe55f x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
  ac0315896970 x86/mm/tlb: Make lazy TLB mode lazier
  61d0beb5796a x86/mm/tlb: Restructure switch_mm_irqs_off()
  2ff6ddf19c0e x86/mm/tlb: Leave lazy TLB mode at page table free time

In order to simplify the TLB invalidate fixes for x86 and unify the
parts that need backporting.  We'll try again later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoinclude/linux/compiler*.h: make compiler-*.h mutually exclusive
Nick Desaulniers [Wed, 22 Aug 2018 23:37:24 +0000 (16:37 -0700)]
include/linux/compiler*.h: make compiler-*.h mutually exclusive

Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
recently exposed a brittle part of the build for supporting non-gcc
compilers.

Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and
__GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't
added compiler specific checks for __clang__ or __INTEL_COMPILER.

This is brittle, as they happened to get compatibility by posing as a
certain version of GCC.  This broke when upgrading the minimal version
of GCC required to build the kernel, to a version above what ICC and
Clang claim to be.

Rather than always including compiler-gcc.h then undefining or
redefining macros in compiler-intel.h or compiler-clang.h, let's
separate out the compiler specific macro definitions into mutually
exclusive headers, do more proper compiler detection, and keep shared
definitions in compiler_types.h.

Fixes: cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Suggested-by: Eli Friedman <efriedma@codeaurora.org>
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosunrpc: Add comment defining gssd upcall API keywords
Chuck Lever [Mon, 20 Aug 2018 14:39:16 +0000 (10:39 -0400)]
sunrpc: Add comment defining gssd upcall API keywords

During review, it was found that the target, service, and srchost
keywords are easily conflated. Add an explainer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agonfsd: Remove callback_cred
Chuck Lever [Thu, 16 Aug 2018 16:06:09 +0000 (12:06 -0400)]
nfsd: Remove callback_cred

Clean up: The global callback_cred is no longer used, so it can be
removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agonfsd: Use correct credential for NFSv4.0 callback with GSS
Chuck Lever [Thu, 16 Aug 2018 16:06:04 +0000 (12:06 -0400)]
nfsd: Use correct credential for NFSv4.0 callback with GSS

I've had trouble when operating a multi-homed Linux NFS server with
Kerberos using NFSv4.0. Lately, I've seen my clients reporting
this (and then hanging):

May  9 11:43:26 manet kernel: NFS: NFSv4 callback contains invalid cred

The client-side commit f11b2a1cfbf5 ("nfs4: copy acceptor name from
context to nfs_client") appears to be related, but I suspect this
problem has been going on for some time before that.

RFC 7530 Section 3.3.3 says:
> For Kerberos V5, nfs/hostname would be a server principal in the
> Kerberos Key Distribution Center database.  This is the same
> principal the client acquired a GSS-API context for when it issued
> the SETCLIENTID operation ...

In other words, an NFSv4.0 client expects that the server will use
the same GSS principal for callback that the client used to
establish its lease. For example, if the client used the service
principal "nfs@server.domain" to establish its lease, the server
is required to use "nfs@server.domain" when performing NFSv4.0
callback operations.

The Linux NFS server currently does not. It uses a common service
principal for all callback connections. Sometimes this works as
expected, and other times -- for example, when the server is
accessible via multiple hostnames -- it won't work at all.

This patch scrapes the target name from the client credential,
and uses that for the NFSv4.0 callback credential. That should
be correct much more often.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agosunrpc: Extract target name into svc_cred
Chuck Lever [Thu, 16 Aug 2018 16:05:59 +0000 (12:05 -0400)]
sunrpc: Extract target name into svc_cred

NFSv4.0 callback needs to know the GSS target name the client used
when it established its lease. That information is available from
the GSS context created by gssproxy. Make it available in each
svc_cred.

Note this will also give us access to the real target service
principal name (which is typically "nfs", but spec does not require
that).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agosunrpc: Enable the kernel to specify the hostname part of service principals
Chuck Lever [Thu, 16 Aug 2018 16:05:54 +0000 (12:05 -0400)]
sunrpc: Enable the kernel to specify the hostname part of service principals

A multi-homed NFS server may have more than one "nfs" key in its
keytab. Enable the kernel to pick the key it wants as a machine
credential when establishing a GSS context.

This is useful for GSS-protected NFSv4.0 callbacks, which are
required by RFC 7530 S3.3.3 to use the same principal as the service
principal the client used when establishing its lease.

A complementary modification to rpc.gssd is required to fully enable
this feature.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agosunrpc: Don't use stack buffer with scatterlist
Laura Abbott [Fri, 17 Aug 2018 21:43:54 +0000 (14:43 -0700)]
sunrpc: Don't use stack buffer with scatterlist

Fedora got a bug report from NFS:

kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:sg_init_one+0x7d/0x90
..
  make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
  gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
  gss_marshal+0x126/0x1a0 [auth_rpcgss]
  ? __local_bh_enable_ip+0x80/0xe0
  ? call_transmit_status+0x1d0/0x1d0 [sunrpc]
  call_transmit+0x137/0x230 [sunrpc]
  __rpc_execute+0x9b/0x490 [sunrpc]
  rpc_run_task+0x119/0x150 [sunrpc]
  nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
  _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
  nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
  nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
  nfs4_init_client+0x16e/0x240 [nfsv4]
  ? nfs_get_client+0x4c9/0x5d0 [nfs]
  ? _raw_spin_unlock+0x24/0x30
  ? nfs_get_client+0x4c9/0x5d0 [nfs]
  nfs4_set_client+0xb2/0x100 [nfsv4]
  nfs4_create_server+0xff/0x290 [nfsv4]
  nfs4_remote_mount+0x28/0x50 [nfsv4]
  mount_fs+0x3b/0x16a
  vfs_kern_mount.part.35+0x54/0x160
  nfs_do_root_mount+0x7f/0xc0 [nfsv4]
  nfs4_try_mount+0x43/0x70 [nfsv4]
  ? get_nfs_version+0x21/0x80 [nfs]
  nfs_fs_mount+0x789/0xbf0 [nfs]
  ? pcpu_alloc+0x6ca/0x7e0
  ? nfs_clone_super+0x70/0x70 [nfs]
  ? nfs_parse_mount_options+0xb40/0xb40 [nfs]
  mount_fs+0x3b/0x16a
  vfs_kern_mount.part.35+0x54/0x160
  do_mount+0x1fd/0xd50
  ksys_mount+0xba/0xd0
  __x64_sys_mount+0x21/0x30
  do_syscall_64+0x60/0x1f0
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack
allocated buffer with a scatterlist. Convert the buffer for
rc4salt to be dynamically allocated instead.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5 years agoMerge tag 'platform-drivers-x86-v4.19-1' of git://git.infradead.org/linux-platform...
Linus Torvalds [Wed, 22 Aug 2018 21:14:15 +0000 (14:14 -0700)]
Merge tag 'platform-drivers-x86-v4.19-1' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver updates from Andy Shevchenko:

 - The driver for Silead touchscreen configurations has been renamed
   from silead_dmi to touchscreen_dmi since it starts supporting other
   touchscreens which require some DMI quirks

   It also gets expanded to cover cases for Chuwi Vi10, ONDA V891W,
   Connect Tablet 9, Onda V820w, and Cube KNote i1101 tablets.

 - Another bunch of changes is related to Mellanox platform code to
   allow user space to communicate with Mellanox for system control and
   monitoring purposes. The driver notifies user on hotplug device
   signal receiving.

 - ASUS WMI drivers recognize lid flip action on UX360, and correctly
   toggles airplane mode LED. In addition the keyboard backlight toggle
   gets support.

 - ThinkPad ACPI driver enables support for calculator key (on at least
   P52). It also has been fixed to support three characters model
   designators, which are used for modern laptops. Earlier the battery,
   marked as BAT1, on ThinkPad laptops has not been configured properly,
   which is fixed. On the opposite the multi-battery configurations now
   probed correctly.

 - Dell SMBIOS driver starts working on some Dell servers which do not
   support token interface. The regression with backlight detection has
   also been fixed. In order to support dock mode on some laptops, Intel
   virtual button driver has been fixed. The last but not least is the
   fix to Intel HID driver due to changes in Dell systems that prevented
   to use power button.

* tag 'platform-drivers-x86-v4.19-1' of git://git.infradead.org/linux-platform-drivers-x86: (47 commits)
  platform/x86: acer-wmi: Silence "unsupported" message a bit
  platform/x86: intel_punit_ipc: fix build errors
  platform/x86: ideapad: Add Y520-15IKBM and Y720-15IKBM to no_hw_rfkill
  platform/x86: asus-nb-wmi: Add keymap entry for lid flip action on UX360
  platform/x86: acer-wmi: refactor function has_cap
  platform/x86: thinkpad_acpi: Fix multi-battery bug
  platform/x86: thinkpad_acpi: extend battery quirk coverage
  platform/x86: touchscreen_dmi: Add info for the Cube KNote i1101 tablet
  platform/x86: mlx-platform: Fix copy-paste error in mlxplat_init()
  platform/x86: mlx-platform: Remove unused define
  platform/x86: mlx-platform: Change mlxreg-io configuration for MSN274x systems
  Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces
  platform/x86: mlx-platform: Allow mlxreg-io driver activation for more systems
  platform/x86: mlx-platform: Add ASIC hotplug device configuration
  platform/mellanox: mlxreg-hotplug: Add hotplug hwmon uevent notification
  platform/mellanox: mlxreg-hotplug: Improve mechanism of ASIC health discovery
  platform/x86: mlx-platform: Add mlxreg-fan platform driver activation
  platform/x86: dell-laptop: Fix backlight detection
  platform/x86: toshiba_acpi: Fix defined but not used build warnings
  platform/x86: thinkpad_acpi: Support battery quirk
  ...

5 years agoia64: Fix allnoconfig section mismatch for ioc_init/ioc_iommu_info
Tony Luck [Wed, 22 Aug 2018 20:39:21 +0000 (13:39 -0700)]
ia64: Fix allnoconfig section mismatch for ioc_init/ioc_iommu_info

This has been broken for an embarassingly long time (since v4.4).

Just needs a couple of __init tags on functions to make the sections
match up.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoblk-wbt: fix has-sleeper queueing check
Jens Axboe [Mon, 20 Aug 2018 19:22:27 +0000 (13:22 -0600)]
blk-wbt: fix has-sleeper queueing check

We need to do this inside the loop as well, or we can allow new
IO to supersede previous IO.

Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-wbt: use wq_has_sleeper() for wq active check
Jens Axboe [Mon, 20 Aug 2018 19:20:50 +0000 (13:20 -0600)]
blk-wbt: use wq_has_sleeper() for wq active check

We need the memory barrier before checking the list head,
use the appropriate helper for this. The matching queue
side memory barrier is provided by set_current_state().

Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoblk-wbt: move disable check into get_limit()
Jens Axboe [Mon, 20 Aug 2018 19:24:25 +0000 (13:24 -0600)]
blk-wbt: move disable check into get_limit()

Check it in one place, instead of in multiple places.

Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge branch 'parisc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Wed, 22 Aug 2018 21:06:37 +0000 (14:06 -0700)]
Merge branch 'parisc-4.19-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull more parisc updates from Helge Deller:

 - fix boot failure of 64-bit kernel. It got broken by the unwind
   optimization commit in merge window.

 - fix 64-bit userspace support (static 64-bit applications only, e.g.
   we don't yet have 64-bit userspace support in glibc).

 - consolidate unwind initialization code.

 - add machine model description to stack trace.

* 'parisc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Add hardware description to stack traces
  parisc: Fix boot failure of 64-bit kernel
  parisc: Consolidate unwind initialization calls
  parisc: Update comments in syscall.S regarding wide userland
  parisc: Fix ptraced 64-bit applications to call 64-bit syscalls
  parisc: Restore possibility to execute 64-bit applications

5 years agobcache: release dc->writeback_lock properly in bch_writeback_thread()
Shan Hai [Wed, 22 Aug 2018 18:02:56 +0000 (02:02 +0800)]
bcache: release dc->writeback_lock properly in bch_writeback_thread()

The writeback thread would exit with a lock held when the cache device
is detached via sysfs interface, fix it by releasing the held lock
before exiting the while-loop.

Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set)
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Coly Li <colyli@suse.de>
Tested-by: Shenghui Wang <shhuiw@foxmail.com>
Cc: stable@vger.kernel.org #4.17+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 years agoMerge tag 'xtensa-20180820' of git://github.com/jcmvbkbc/linux-xtensa
Linus Torvalds [Wed, 22 Aug 2018 21:04:41 +0000 (14:04 -0700)]
Merge tag 'xtensa-20180820' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa updates from Max Filippov:

 - switch xtensa arch to the generic noncoherent direct mapping
   operations

 - add support for DMA_ATTR_NO_KERNEL_MAPPING attribute

 - clean up users of platform/hardware.h in generic Xtensa code

 - fix assembly cache maintenance code for long cache lines

 - rework noMMU cache attributes initialization

 - add big-endian HiFi2 test_kc705_be CPU variant

* tag 'xtensa-20180820' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: add test_kc705_be variant
  xtensa: clean up boot-elf/bootstrap.S
  xtensa: make bootparam parsing optional
  xtensa: drop variant IRQ support
  xtensa: drop unneeded platform/hardware.h headers
  xtensa: move PLATFORM_NR_IRQS to Kconfig
  xtensa: rework {CONFIG,PLATFORM}_DEFAULT_MEM_START
  xtensa: drop unused {CONFIG,PLATFORM}_DEFAULT_MEM_SIZE
  xtensa: rework noMMU cache attributes initialization
  xtensa: increase ranges in ___invalidate_{i,d}cache_all
  xtensa: limit offsets in __loop_cache_{all,page}
  xtensa: platform-specific handling of coherent memory
  xtensa: support DMA_ATTR_NO_KERNEL_MAPPING attribute
  xtensa: use generic dma_noncoherent_ops

5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 22 Aug 2018 20:52:44 +0000 (13:52 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull second set of KVM updates from Paolo Bonzini:
 "ARM:
   - Support for Group0 interrupts in guests
   - Cache management optimizations for ARMv8.4 systems
   - Userspace interface for RAS
   - Fault path optimization
   - Emulated physical timer fixes
   - Random cleanups

  x86:
   - fixes for L1TF
   - a new test case
   - non-support for SGX (inject the right exception in the guest)
   - fix lockdep false positive"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits)
  KVM: VMX: fixes for vmentry_l1d_flush module parameter
  kvm: selftest: add dirty logging test
  kvm: selftest: pass in extra memory when create vm
  kvm: selftest: include the tools headers
  kvm: selftest: unify the guest port macros
  tools: introduce test_and_clear_bit
  KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
  KVM: vmx: Inject #UD for SGX ENCLS instruction in guest
  KVM: vmx: Add defines for SGX ENCLS exiting
  x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush()
  x86: kvm: avoid unused variable warning
  KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESR
  KVM: arm/arm64: Skip updating PTE entry if no change
  KVM: arm/arm64: Skip updating PMD entry if no change
  KVM: arm: Use true and false for boolean values
  KVM: arm/arm64: vgic: Do not use spin_lock_irqsave/restore with irq disabled
  KVM: arm/arm64: vgic: Move DEBUG_SPINLOCK_BUG_ON to vgic.h
  KVM: arm: vgic-v3: Add support for ICC_SGI0R and ICC_ASGI1R accesses
  KVM: arm64: vgic-v3: Add support for ICC_SGI0R_EL1 and ICC_ASGI1R_EL1 accesses
  KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIs
  ...

5 years agoMerge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block
Linus Torvalds [Wed, 22 Aug 2018 20:38:05 +0000 (13:38 -0700)]
Merge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:

 - Set of bcache fixes and changes (Coly)

 - The flush warn fix (me)

 - Small series of BFQ fixes (Paolo)

 - wbt hang fix (Ming)

 - blktrace fix (Steven)

 - blk-mq hardware queue count update fix (Jianchao)

 - Various little fixes

* tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits)
  block/DAC960.c: make some arrays static const, shrinks object size
  blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
  blk-mq: init hctx sched after update ctx and hctx mapping
  block: remove duplicate initialization
  tracing/blktrace: Fix to allow setting same value
  pktcdvd: fix setting of 'ret' error return for a few cases
  block: change return type to bool
  block, bfq: return nbytes and not zero from struct cftype .write() method
  block, bfq: improve code of bfq_bfqq_charge_time
  block, bfq: reduce write overcharge
  block, bfq: always update the budget of an entity when needed
  block, bfq: readd missing reset of parent-entity service
  blk-wbt: fix IO hang in wbt_wait()
  block: don't warn for flush on read-only device
  bcache: add the missing comments for smp_mb()/smp_wmb()
  bcache: remove unnecessary space before ioctl function pointer arguments
  bcache: add missing SPDX header
  bcache: move open brace at end of function definitions to next line
  bcache: add static const prefix to char * array declarations
  bcache: fix code comments style
  ...

5 years agoMerge tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk...
Linus Torvalds [Wed, 22 Aug 2018 20:29:39 +0000 (13:29 -0700)]
Merge tag 'f2fs-for-4.19' of git://git./linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've tuned f2fs to improve general performance by
  serializing block allocation and enhancing discard flows like fstrim
  which avoids user IO contention. And we've added fsync_mode=nobarrier
  which gives an option to user where it skips issuing cache_flush
  commands to underlying flash storage. And there are many bug fixes
  related to fuzzed images, revoked atomic writes, quota ops, and minor
  direct IO.

  Enhancements:
   - add fsync_mode=nobarrier which bypasses cache_flush command
   - enhance the discarding flow which avoids user IOs and issues in
     LBA order
   - readahead some encrypted blocks during GC
   - enable in-memory inode checksum to verify the blocks if
     F2FS_CHECK_FS is set
   - enhance nat_bits behavior
   - set -o discard by default
   - set REQ_RAHEAD to bio in ->readpages

  Bug fixes:
   - fix a corner case to corrupt atomic_writes revoking flow
   - revisit i_gc_rwsem to fix race conditions
   - fix some dio behaviors captured by xfstests
   - correct handling errors given by quota-related failures
   - add many sanity check flows to avoid fuzz test failures
   - add more error number propagation to their callers
   - fix several corner cases to continue fault injection w/ shutdown
     loop"

* tag 'f2fs-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (89 commits)
  f2fs: readahead encrypted block during GC
  f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
  f2fs: fix performance issue observed with multi-thread sequential read
  f2fs: fix to skip verifying block address for non-regular inode
  f2fs: rework fault injection handling to avoid a warning
  f2fs: support fault_type mount option
  f2fs: fix to return success when trimming meta area
  f2fs: fix use-after-free of dicard command entry
  f2fs: support discard submission error injection
  f2fs: split discard command in prior to block layer
  f2fs: wake up gc thread immediately when gc_urgent is set
  f2fs: fix incorrect range->len in f2fs_trim_fs()
  f2fs: refresh recent accessed nat entry in lru list
  f2fs: fix avoid race between truncate and background GC
  f2fs: avoid race between zero_range and background GC
  f2fs: fix to do sanity check with block address in main area v2
  f2fs: fix to do sanity check with inline flags
  f2fs: fix to reset i_gc_failures correctly
  f2fs: fix invalid memory access
  f2fs: fix to avoid broken of dnode block list
  ...

5 years agoovl: set I_CREATING on inode being created
Miklos Szeredi [Wed, 22 Aug 2018 08:55:22 +0000 (10:55 +0200)]
ovl: set I_CREATING on inode being created

...otherwise there will be list corruption due to inode_sb_list_add() being
called for inode already on the sb list.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: e950564b97fd ("vfs: don't evict uninitialized inode")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 22 Aug 2018 19:34:08 +0000 (12:34 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - the rest of MM

 - procfs updates

 - various misc things

 - more y2038 fixes

 - get_maintainer updates

 - lib/ updates

 - checkpatch updates

 - various epoll updates

 - autofs updates

 - hfsplus

 - some reiserfs work

 - fatfs updates

 - signal.c cleanups

 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits)
  ipc/util.c: update return value of ipc_getref from int to bool
  ipc/util.c: further variable name cleanups
  ipc: simplify ipc initialization
  ipc: get rid of ids->tables_initialized hack
  lib/rhashtable: guarantee initial hashtable allocation
  lib/rhashtable: simplify bucket_table_alloc()
  ipc: drop ipc_lock()
  ipc/util.c: correct comment in ipc_obtain_object_check
  ipc: rename ipcctl_pre_down_nolock()
  ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
  ipc: reorganize initialization of kern_ipc_perm.seq
  ipc: compute kern_ipc_perm.id under the ipc lock
  init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
  fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
  adfs: use timespec64 for time conversion
  kernel/sysctl.c: fix typos in comments
  drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
  fork: don't copy inconsistent signal handler state to child
  signal: make get_signal() return bool
  signal: make sigkill_pending() return bool
  ...

5 years agoipc/util.c: update return value of ipc_getref from int to bool
Manfred Spraul [Wed, 22 Aug 2018 05:02:04 +0000 (22:02 -0700)]
ipc/util.c: update return value of ipc_getref from int to bool

ipc_getref has still a return value of type "int", matching the atomic_t
interface of atomic_inc_not_zero()/atomic_add_unless().

ipc_getref now uses refcount_inc_not_zero, which has a return value of
type "bool".

Therefore, update the return code to avoid implicit conversions.

Link: http://lkml.kernel.org/r/20180712185241.4017-13-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc/util.c: further variable name cleanups
Manfred Spraul [Wed, 22 Aug 2018 05:02:00 +0000 (22:02 -0700)]
ipc/util.c: further variable name cleanups

The varable names got a mess, thus standardize them again:

id: user space id. Called semid, shmid, msgid if the type is known.
    Most functions use "id" already.
idx: "index" for the idr lookup
    Right now, some functions use lid, ipc_addid() already uses idx as
    the variable name.
seq: sequence number, to avoid quick collisions of the user space id
key: user space key, used for the rhash tree

Link: http://lkml.kernel.org/r/20180712185241.4017-12-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc: simplify ipc initialization
Davidlohr Bueso [Wed, 22 Aug 2018 05:01:56 +0000 (22:01 -0700)]
ipc: simplify ipc initialization

Now that we know that rhashtable_init() will not fail, we can get rid of a
lot of the unnecessary cleanup paths when the call errored out.

[manfred@colorfullife.com: variable name added to util.h to resolve checkpatch warning]
Link: http://lkml.kernel.org/r/20180712185241.4017-11-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc: get rid of ids->tables_initialized hack
Davidlohr Bueso [Wed, 22 Aug 2018 05:01:52 +0000 (22:01 -0700)]
ipc: get rid of ids->tables_initialized hack

In sysvipc we have an ids->tables_initialized regarding the rhashtable,
introduced in 0cfb6aee70bd ("ipc: optimize semget/shmget/msgget for lots
of keys")

It's there, specifically, to prevent nil pointer dereferences, from using
an uninitialized api.  Considering how rhashtable_init() can fail
(probably due to ENOMEM, if anything), this made the overall ipc
initialization capable of failure as well.  That alone is ugly, but fine,
however I've spotted a few issues regarding the semantics of
tables_initialized (however unlikely they may be):

- There is inconsistency in what we return to userspace: ipc_addid()
  returns ENOSPC which is certainly _wrong_, while ipc_obtain_object_idr()
  returns EINVAL.

- After we started using rhashtables, ipc_findkey() can return nil upon
  !tables_initialized, but the caller expects nil for when the ipc
  structure isn't found, and can therefore call into ipcget() callbacks.

Now that rhashtable initialization cannot fail, we can properly get rid of
the hack altogether.

[manfred@colorfullife.com: commit id extended to 12 digits]
Link: http://lkml.kernel.org/r/20180712185241.4017-10-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agolib/rhashtable: guarantee initial hashtable allocation
Davidlohr Bueso [Wed, 22 Aug 2018 05:01:48 +0000 (22:01 -0700)]
lib/rhashtable: guarantee initial hashtable allocation

rhashtable_init() may fail due to -ENOMEM, thus making the entire api
unusable.  This patch removes this scenario, however unlikely.  In order
to guarantee memory allocation, this patch always ends up doing
GFP_KERNEL|__GFP_NOFAIL for both the tbl as well as
alloc_bucket_spinlocks().

Upon the first table allocation failure, we shrink the size to the
smallest value that makes sense and retry with __GFP_NOFAIL semantics.
With the defaults, this means that from 64 buckets, we retry with only 4.
Any later issues regarding performance due to collisions or larger table
resizing (when more memory becomes available) is the least of our
problems.

Link: http://lkml.kernel.org/r/20180712185241.4017-9-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agolib/rhashtable: simplify bucket_table_alloc()
Davidlohr Bueso [Wed, 22 Aug 2018 05:01:45 +0000 (22:01 -0700)]
lib/rhashtable: simplify bucket_table_alloc()

As of ce91f6ee5b3b ("mm: kvmalloc does not fallback to vmalloc for
incompatible gfp flags") we can simplify the caller and trust kvzalloc()
to just do the right thing.  For the case of the GFP_ATOMIC context, we
can drop the __GFP_NORETRY flag for obvious reasons, and for the
__GFP_NOWARN case, however, it is changed such that the caller passes the
flag instead of making bucket_table_alloc() handle it.

This slightly changes the gfp flags passed on to nested_table_alloc() as
it will now also use GFP_ATOMIC | __GFP_NOWARN.  However, I consider this
a positive consequence as for the same reasons we want nowarn semantics in
bucket_table_alloc().

[manfred@colorfullife.com: commit id extended to 12 digits, line wraps updated]
Link: http://lkml.kernel.org/r/20180712185241.4017-8-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc: drop ipc_lock()
Davidlohr Bueso [Wed, 22 Aug 2018 05:01:41 +0000 (22:01 -0700)]
ipc: drop ipc_lock()

ipc/util.c contains multiple functions to get the ipc object pointer given
an id number.

There are two sets of function: One set verifies the sequence counter part
of the id number, other functions do not check the sequence counter.

The standard for function names in ipc/util.c is
- ..._check() functions verify the sequence counter
- ..._idr() functions do not verify the sequence counter

ipc_lock() is an exception: It does not verify the sequence counter value,
but this is not obvious from the function name.

Furthermore, shm.c is the only user of this helper.  Thus, we can simply
move the logic into shm_lock() and get rid of the function altogether.

[manfred@colorfullife.com: most of changelog]
Link: http://lkml.kernel.org/r/20180712185241.4017-7-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc/util.c: correct comment in ipc_obtain_object_check
Manfred Spraul [Wed, 22 Aug 2018 05:01:37 +0000 (22:01 -0700)]
ipc/util.c: correct comment in ipc_obtain_object_check

The comment that explains ipc_obtain_object_check is wrong: The function
checks the sequence number, not the reference counter.

Note that checking the reference counter would be meaningless: The
reference counter is decreased without holding any locks, thus an object
with kern_ipc_perm.deleted=true may disappear at the end of the next rcu
grace period.

Link: http://lkml.kernel.org/r/20180712185241.4017-6-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc: rename ipcctl_pre_down_nolock()
Manfred Spraul [Wed, 22 Aug 2018 05:01:34 +0000 (22:01 -0700)]
ipc: rename ipcctl_pre_down_nolock()

Both the comment and the name of ipcctl_pre_down_nolock() are misleading:
The function must be called while holdling the rw semaphore.

Therefore the patch renames the function to ipcctl_obtain_check(): This
name matches the other names used in util.c:

- "obtain" function look up a pointer in the idr, without
  acquiring the object lock.
- The caller is responsible for locking.
- _check means that the sequence number is checked.

Link: http://lkml.kernel.org/r/20180712185241.4017-5-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()
Manfred Spraul [Wed, 22 Aug 2018 05:01:29 +0000 (22:01 -0700)]
ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid()

ipc_addid() is impossible to use:
- for certain failures, the caller must not use ipc_rcu_putref(),
  because the reference counter is not yet initialized.
- for other failures, the caller must use ipc_rcu_putref(),
  because parallel operations could be ongoing already.

The patch cleans that up, by initializing the refcount early, and by
modifying all callers.

The issues is related to the finding of
syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
issue with reading kern_ipc_perm.seq, here both read and write to already
released memory could happen.

Link: http://lkml.kernel.org/r/20180712185241.4017-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc: reorganize initialization of kern_ipc_perm.seq
Manfred Spraul [Wed, 22 Aug 2018 05:01:25 +0000 (22:01 -0700)]
ipc: reorganize initialization of kern_ipc_perm.seq

ipc_addid() initializes kern_ipc_perm.seq after having called idr_alloc()
(within ipc_idr_alloc()).

Thus a parallel semop() or msgrcv() that uses ipc_obtain_object_check()
may see an uninitialized value.

The patch moves the initialization of kern_ipc_perm.seq before the calls
of idr_alloc().

Notes:
1) This patch has a user space visible side effect:
If /proc/sys/kernel/*_next_id is used (i.e.: checkpoint/restore) and
if semget()/msgget()/shmget() fails in the final step of adding the id
to the rhash tree, then .._next_id is cleared. Before the patch, is
remained unmodified.

There is no change of the behavior after a successful ..get() call: It
always clears .._next_id, there is no impact to non checkpoint/restore
code as that code does not use .._next_id.

2) The patch correctly documents that after a call to ipc_idr_alloc(),
the full tear-down sequence must be used. The callers of ipc_addid()
do not fullfill that, i.e. more bugfixes are required.

The patch is a squash of a patch from Dmitry and my own changes.

Link: http://lkml.kernel.org/r/20180712185241.4017-3-manfred@colorfullife.com
Reported-by: syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoipc: compute kern_ipc_perm.id under the ipc lock
Manfred Spraul [Wed, 22 Aug 2018 05:01:21 +0000 (22:01 -0700)]
ipc: compute kern_ipc_perm.id under the ipc lock

ipc_addid() initializes kern_ipc_perm.id after having called
ipc_idr_alloc().

Thus a parallel semctl() or msgctl() that uses e.g.  MSG_STAT may use this
unitialized value as the return code.

The patch moves all accesses to kern_ipc_perm.id under the spin_lock().

The issues is related to the finding of
syzbot+2827ef6b3385deb07eaf@syzkaller.appspotmail.com: syzbot found an
issue with kern_ipc_perm.seq

Link: http://lkml.kernel.org/r/20180712185241.4017-2-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoinit/Kconfig: remove EXPERT from CHECKPOINT_RESTORE
Adrian Reber [Wed, 22 Aug 2018 05:01:17 +0000 (22:01 -0700)]
init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE

The CHECKPOINT_RESTORE configuration option was introduced in 2012 and
combined with EXPERT.  CHECKPOINT_RESTORE is already enabled in many
distribution kernels and also part of the defconfigs of various
architectures.

To make it easier for distributions to enable CHECKPOINT_RESTORE this
removes EXPERT and moves the configuration option out of the EXPERT block.

Link: http://lkml.kernel.org/r/20180712130733.11510-1-adrian@lisas.de
Signed-off-by: Adrian Reber <adrian@lisas.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp
Arnd Bergmann [Wed, 22 Aug 2018 05:01:13 +0000 (22:01 -0700)]
fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp

get_seconds() is deprecated in favor of ktime_get_real_seconds(), which
returns a 64-bit timestamp.

In the SYSV file system, the superblock timestamp is only 32 bits wide,
and it is used to check whether a file system is clean, so the best
solution seems to be to force a wraparound and explicitly convert it to an
unsigned 32-bit value.

This is independent of the inode timestamps that are also 32-bit wide on
disk and that come from current_time().

Link: http://lkml.kernel.org/r/20180713145236.3152513-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoadfs: use timespec64 for time conversion
Arnd Bergmann [Wed, 22 Aug 2018 05:01:09 +0000 (22:01 -0700)]
adfs: use timespec64 for time conversion

We just truncate the seconds to 32-bit in one place now, so this can
trivially be converted over to using timespec64 consistently.

Link: http://lkml.kernel.org/r/20180620100133.4035614-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agokernel/sysctl.c: fix typos in comments
Randy Dunlap [Wed, 22 Aug 2018 05:01:06 +0000 (22:01 -0700)]
kernel/sysctl.c: fix typos in comments

Fix a few typos/spellos in kernel/sysctl.c.

Link: http://lkml.kernel.org/r/bb09a8b9-f984-6dd4-b07b-3ecaf200862e@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agodrivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md
Colin Ian King [Wed, 22 Aug 2018 05:01:01 +0000 (22:01 -0700)]
drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md

Pointer md is being assigned but is never used hence it is redundant and
can be removed.

Cleans up clang warning:
warning: variable 'md' set but not used [-Wunused-but-set-variable]

Link: http://lkml.kernel.org/r/20180711082346.5223-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofork: don't copy inconsistent signal handler state to child
Jann Horn [Wed, 22 Aug 2018 05:00:58 +0000 (22:00 -0700)]
fork: don't copy inconsistent signal handler state to child

Before this change, if a multithreaded process forks while one of its
threads is changing a signal handler using sigaction(), the memcpy() in
copy_sighand() can race with the struct assignment in do_sigaction().  It
isn't clear whether this can cause corruption of the userspace signal
handler pointer, but it definitely can cause inconsistency between
different fields of struct sigaction.

Take the appropriate spinlock to avoid this.

I have tested that this patch prevents inconsistency between sa_sigaction
and sa_flags, which is possible before this patch.

Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make get_signal() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:54 +0000 (22:00 -0700)]
signal: make get_signal() return bool

make get_signal() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-18-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make sigkill_pending() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:50 +0000 (22:00 -0700)]
signal: make sigkill_pending() return bool

sigkill_pending() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-17-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make legacy_queue() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:46 +0000 (22:00 -0700)]
signal: make legacy_queue() return bool

legacy_queue() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-16-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make wants_signal() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:42 +0000 (22:00 -0700)]
signal: make wants_signal() return bool

wants_signal() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-15-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make flush_sigqueue_mask() void
Christian Brauner [Wed, 22 Aug 2018 05:00:38 +0000 (22:00 -0700)]
signal: make flush_sigqueue_mask() void

The return value of flush_sigqueue_mask() is never checked anywhere.

Link: http://lkml.kernel.org/r/20180602103653.18181-14-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make unhandled_signal() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:34 +0000 (22:00 -0700)]
signal: make unhandled_signal() return bool

unhandled_signal() already behaves like a boolean function.  Let's
actually declare it as such too.  All callers treat it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-13-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make recalc_sigpending_tsk() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:30 +0000 (22:00 -0700)]
signal: make recalc_sigpending_tsk() return bool

recalc_sigpending_tsk() already behaves like a boolean function.  Let's
actually declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-12-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make has_pending_signals() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:27 +0000 (22:00 -0700)]
signal: make has_pending_signals() return bool

has_pending_signals() already behaves like a boolean function.  Let's
actually declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-11-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make sig_ignored() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:23 +0000 (22:00 -0700)]
signal: make sig_ignored() return bool

sig_ignored() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-10-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make sig_task_ignored() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:19 +0000 (22:00 -0700)]
signal: make sig_task_ignored() return bool

sig_task_ignored() already behaves like a boolean function.  Let's
actually declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-9-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make sig_handler_ignored() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:15 +0000 (22:00 -0700)]
signal: make sig_handler_ignored() return bool

sig_handler_ignored() already behaves like a boolean function.  Let's
actually declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-8-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make kill_ok_by_cred() return bool
Christian Brauner [Wed, 22 Aug 2018 05:00:11 +0000 (22:00 -0700)]
signal: make kill_ok_by_cred() return bool

kill_ok_by_cred() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-7-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: simplify rt_sigaction()
Christian Brauner [Wed, 22 Aug 2018 05:00:07 +0000 (22:00 -0700)]
signal: simplify rt_sigaction()

The goto is not needed and does not add any clarity.  Simply return
-EINVAL on unexpected sigset_t struct size directly.

Link: http://lkml.kernel.org/r/20180602103653.18181-6-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make do_sigpending() void
Christian Brauner [Wed, 22 Aug 2018 05:00:02 +0000 (22:00 -0700)]
signal: make do_sigpending() void

do_sigpending() returned 0 unconditionally so it doesn't make sense to
have it return at all.  This allows us to simplify a bunch of syscall
callers.

Link: http://lkml.kernel.org/r/20180602103653.18181-5-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make may_ptrace_stop() return bool
Christian Brauner [Wed, 22 Aug 2018 04:59:59 +0000 (21:59 -0700)]
signal: make may_ptrace_stop() return bool

may_ptrace_stop() already behaves like a boolean function.  Let's actually
declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-4-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make kill_as_cred_perm() return bool
Christian Brauner [Wed, 22 Aug 2018 04:59:55 +0000 (21:59 -0700)]
signal: make kill_as_cred_perm() return bool

kill_as_cred_perm() already behaves like a boolean function.  Let's
actually declare it as such too.

Link: http://lkml.kernel.org/r/20180602103653.18181-3-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agosignal: make force_sigsegv() void
Christian Brauner [Wed, 22 Aug 2018 04:59:51 +0000 (21:59 -0700)]
signal: make force_sigsegv() void

Patch series "signal: refactor some functions", v3.

This series refactors a bunch of functions in signal.c to simplify parts
of the code.

The greatest single change is declaring the static do_sigpending() helper
as void which makes it possible to remove a bunch of unnecessary checks in
the syscalls later on.

This patch (of 17):

force_sigsegv() returned 0 unconditionally so it doesn't make sense to have
it return at all. In addition, there are no callers that check
force_sigsegv()'s return value.

Link: http://lkml.kernel.org/r/20180602103653.18181-2-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morris <james.morris@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofat: propagate 64-bit inode timestamps
Arnd Bergmann [Wed, 22 Aug 2018 04:59:48 +0000 (21:59 -0700)]
fat: propagate 64-bit inode timestamps

Now that we pass down 64-bit timestamps from VFS, we just need to convert
that correctly into on-disk timestamps.  To make that work correctly, this
changes the last use of time_to_tm() in the kernel to time64_to_tm(),
which also lets use remove that deprecated interfaces.

Similarly, the time_t use in fat_time_fat2unix() truncates the timestamp
on the way in, which can be avoided by using types that are wide enough to
hold the intermediate values during the conversion.

[hirofumi@mail.parknet.co.jp: remove useless temporary variable, needless long long]
Link: http://lkml.kernel.org/r/20180619153646.3637529-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofat: validate ->i_start before using
OGAWA Hirofumi [Wed, 22 Aug 2018 04:59:44 +0000 (21:59 -0700)]
fat: validate ->i_start before using

On corrupted FATfs may have invalid ->i_start.  To handle it, this checks
->i_start before using, and return proper error code.

Link: http://lkml.kernel.org/r/87o9f8y1t5.fsf_-_@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofat: add FITRIM ioctl for FAT file system
Wentao Wang [Wed, 22 Aug 2018 04:59:41 +0000 (21:59 -0700)]
fat: add FITRIM ioctl for FAT file system

Add FITRIM ioctl for FAT file system

[witallwang@gmail.com: use u64s]
Link: http://lkml.kernel.org/r/87h8l37hub.fsf@mail.parknet.co.jp
[hirofumi@mail.parknet.co.jp: bug fixes, coding style fixes, add signal check]
Link: http://lkml.kernel.org/r/87fu10anhj.fsf@mail.parknet.co.jp
Signed-off-by: Wentao Wang <witallwang@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoreiserfs: fix broken xattr handling (heap corruption, bad retval)
Jann Horn [Wed, 22 Aug 2018 04:59:37 +0000 (21:59 -0700)]
reiserfs: fix broken xattr handling (heap corruption, bad retval)

This fixes the following issues:

- When a buffer size is supplied to reiserfs_listxattr() such that each
  individual name fits, but the concatenation of all names doesn't fit,
  reiserfs_listxattr() overflows the supplied buffer.  This leads to a
  kernel heap overflow (verified using KASAN) followed by an out-of-bounds
  usercopy and is therefore a security bug.

- When a buffer size is supplied to reiserfs_listxattr() such that a
  name doesn't fit, -ERANGE should be returned.  But reiserfs instead just
  truncates the list of names; I have verified that if the only xattr on a
  file has a longer name than the supplied buffer length, listxattr()
  incorrectly returns zero.

With my patch applied, -ERANGE is returned in both cases and the memory
corruption doesn't happen anymore.

Credit for making me clean this code up a bit goes to Al Viro, who pointed
out that the ->actor calling convention is suboptimal and should be
changed.

Link: http://lkml.kernel.org/r/20180802151539.5373-1-jannh@google.com
Fixes: 48b32a3553a5 ("reiserfs: use generic xattr handlers")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoreiserfs: change j_timestamp type to time64_t
Arnd Bergmann [Wed, 22 Aug 2018 04:59:34 +0000 (21:59 -0700)]
reiserfs: change j_timestamp type to time64_t

This uses the deprecated time_t type but is write-only, and could be
removed, but as Jeff explains, having a timestamp can be usefule for
post-mortem analysis in crash dumps.

In order to remove one of the last instances of time_t, this changes the
type to time64_t, same as j_trans_start_time.

Link: http://lkml.kernel.org/r/20180622133315.221210-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoreiserfs: remove obsolete print_time function
Arnd Bergmann [Wed, 22 Aug 2018 04:59:30 +0000 (21:59 -0700)]
reiserfs: remove obsolete print_time function

Before linux-2.4.6, print_time() was used to pretty-print an inode time
when running reiserfs in user space, after that it has become obsolete and
is still a bit incorrect: It behaves differently on 32-bit and 64-bit
machines, and uses a static buffer to hold a string, which could lead to
undefined behavior if we ever called this from multiple places
simultaneously.

Since we always want to treat the timestamps as 'unsigned' anyway, simply
printing them as an integer is both simpler and safer while avoiding the
deprecated time_t type.

Link: http://lkml.kernel.org/r/20180620142522.27639-3-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoreiserfs: use monotonic time for j_trans_start_time
Arnd Bergmann [Wed, 22 Aug 2018 04:59:26 +0000 (21:59 -0700)]
reiserfs: use monotonic time for j_trans_start_time

Using CLOCK_REALTIME time_t timestamps breaks on 32-bit systems in 2038,
and gives surprising results with a concurrent settimeofday().

This changes the reiserfs journal timestamps to use ktime_get_seconds()
instead, which makes it use a 64-bit CLOCK_MONOTONIC stamp.

In the procfs output, the monotonic timestamp needs to be converted back
to CLOCK_REALTIME to keep the existing ABI.

Link: http://lkml.kernel.org/r/20180620142522.27639-2-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agohfsplus: drop ACL support
Ernesto A. Fernández [Wed, 22 Aug 2018 04:59:23 +0000 (21:59 -0700)]
hfsplus: drop ACL support

The HFS+ Access Control Lists have not worked at all for the past five
years, and nobody seems to have noticed.  Besides, POSIX draft ACLs are
not compatible with MacOS.  Drop the feature entirely.

Link: http://lkml.kernel.org/r/20180714190608.wtnmmtjqeyladkut@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agohfsplus: fix decomposition of Hangul characters
Ernesto A. Fernández [Wed, 22 Aug 2018 04:59:19 +0000 (21:59 -0700)]
hfsplus: fix decomposition of Hangul characters

Files created under macOS cannot be opened under linux if their names
contain Korean characters, and vice versa.

The Korean alphabet is special because its normalization is done without a
table.  The module deals with it correctly when composing, but forgets
about it for the decomposition.

Fix this using the Hangul decomposition function provided in the Unicode
Standard.  The code fits a bit awkwardly because it requires a buffer,
while all the other normalizations are returned as pointers to the
decomposition table.  This is actually also a bug because reordering may
still be needed, but for now leave it as it is.

The patch will cause trouble for Hangul filenames already created by the
module in the past.  This shouldn't really be concern because its main
purpose was always sharing with macOS.  If a user actually needs to access
such a file the nodecompose mount option should be enough.

Link: http://lkml.kernel.org/r/20180717220951.p6qqrgautc4pxvzu@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reported-by: Ting-Chang Hou <tchou@synology.com>
Tested-by: Ting-Chang Hou <tchou@synology.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agohfsplus: avoid deadlock on file truncation
Ernesto A. Fernández [Wed, 22 Aug 2018 04:59:16 +0000 (21:59 -0700)]
hfsplus: avoid deadlock on file truncation

After an extent is removed from the extent tree, the corresponding bits
are also cleared from the block allocation file.  This is currently done
without releasing the tree lock.

The problem is that the allocation file has extents of its own; if it is
fragmented enough, some of them may be in the extent tree as well, and
hfsplus_get_block() will try to take the lock again.

To avoid deadlock, only hold the extent tree lock during the actual tree
operations.

Link: http://lkml.kernel.org/r/20180709202549.auxwkb6memlegb4a@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agohfsplus: don't return 0 when fill_super() failed
Tetsuo Handa [Wed, 22 Aug 2018 04:59:12 +0000 (21:59 -0700)]
hfsplus: don't return 0 when fill_super() failed

syzbot is reporting NULL pointer dereference at mount_fs() [1].  This is
because hfsplus_fill_super() is by error returning 0 when
hfsplus_fill_super() detected invalid filesystem image, and mount_bdev()
is returning NULL because dget(s->s_root) == NULL if s->s_root == NULL,
and mount_fs() is accessing root->d_sb because IS_ERR(root) == false if
root == NULL.  Fix this by returning -EINVAL when hfsplus_fill_super()
detected invalid filesystem image.

[1] https://syzkaller.appspot.com/bug?id=21acb6850cecbc960c927229e597158cf35f33d0

Link: http://lkml.kernel.org/r/d83ce31a-874c-dd5b-f790-41405983a5be@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+01ffaf5d9568dd1609f7@syzkaller.appspotmail.com>
Reviewed-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofs/nilfs2/file.c: use new return type vm_fault_t
Souptick Joarder [Wed, 22 Aug 2018 04:59:08 +0000 (21:59 -0700)]
fs/nilfs2/file.c: use new return type vm_fault_t

Use new return type vm_fault_t for page_mkwrite handler.

Link: http://lkml.kernel.org/r/1529555928-2411-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agonilfs2: use 64-bit superblock timstamps
Arnd Bergmann [Wed, 22 Aug 2018 04:59:05 +0000 (21:59 -0700)]
nilfs2: use 64-bit superblock timstamps

The mount time field in the superblock uses a 64-bit timestamp, but
calling get_seconds() may truncate the current time to 32 bits.

This changes it to ktime_get_real_seconds() to avoid the potential
overflow.

Link: http://lkml.kernel.org/r/20180620075041.4154396-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: add AUTOFS_EXP_FORCED flag
Ian Kent [Wed, 22 Aug 2018 04:59:01 +0000 (21:59 -0700)]
autofs: add AUTOFS_EXP_FORCED flag

The userspace automount(8) daemon is meant to perform a forced expire when
sent a SIGUSR2.

But since the expiration is routed through the kernel and the kernel
doesn't send an expire request if the mount is busy this hasn't worked at
least since autofs version 5.

Add an AUTOFS_EXP_FORCED flag to allow implemention of the feature and
bump the protocol version so user space can check if it's implemented if
needed.

Link: http://lkml.kernel.org/r/152937734715.21213.6594007182776598970.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: make expire flags usage consistent with v5 params
Ian Kent [Wed, 22 Aug 2018 04:58:58 +0000 (21:58 -0700)]
autofs: make expire flags usage consistent with v5 params

Make the usage of the expire flags consistent by naming the expire flags
the same as it is named in the version 5 miscelaneous ioctl parameters and
only check the bit flags when needed.

Link: http://lkml.kernel.org/r/152937734046.21213.9454131988766280028.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: make autofs_expire_indirect() static
Ian Kent [Wed, 22 Aug 2018 04:58:54 +0000 (21:58 -0700)]
autofs: make autofs_expire_indirect() static

autofs_expire_indirect() isn't used outside of fs/autofs/expire.c so make
it static.

Link: http://lkml.kernel.org/r/152937733512.21213.10509996499623738446.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: make autofs_expire_direct() static
Ian Kent [Wed, 22 Aug 2018 04:58:51 +0000 (21:58 -0700)]
autofs: make autofs_expire_direct() static

autofs_expire_direct() isn't used outside of fs/autofs/expire.c so make it
static.

Link: http://lkml.kernel.org/r/152937732944.21213.11821977712410930973.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: fix clearing AUTOFS_EXP_LEAVES in autofs_expire_indirect()
Ian Kent [Wed, 22 Aug 2018 04:58:48 +0000 (21:58 -0700)]
autofs: fix clearing AUTOFS_EXP_LEAVES in autofs_expire_indirect()

The expire flag AUTOFS_EXP_LEAVES is cleared before the second call to
should_expire() in autofs_expire_indirect() but the parameter passed in
the second call is incorrect.

Fortunately AUTOFS_EXP_LEAVES expire flag has not been used for a long
time but might be needed in the future so fix it rather than remove the
expire leaves functionality.

Link: http://lkml.kernel.org/r/152937732410.21213.7447294898147765076.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: fix inconsistent use of now variable
Ian Kent [Wed, 22 Aug 2018 04:58:44 +0000 (21:58 -0700)]
autofs: fix inconsistent use of now variable

The global variable "now" in fs/autofs/expire.c is used in an inconsistent
way, sometimes using jiffies directly, and sometimes using the "now"
variable, and setting it isn't done consistently either.

But the autofs dentry info last_used field is only updated during path
walks or during expire so jiffies can be used directly and the global
variable "now" removed.

Link: http://lkml.kernel.org/r/152937731702.21213.7371321165189170865.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoautofs: fix directory and symlink access
Ian Kent [Wed, 22 Aug 2018 04:58:41 +0000 (21:58 -0700)]
autofs: fix directory and symlink access

Depending on how it is configured the autofs user space daemon can leave
in use mounts mounted at exit and re-connect to them at start up.  But for
this to work best the state of the autofs file system needs to be left
intact over the restart.

Also, at system shutdown, mounts in an autofs file system might be
umounted exposing a mount point trigger for which subsequent access can
lead to a hang.  So recent versions of automount(8) now does its best to
set autofs file system mounts catatonic at shutdown.

When autofs file system mounts are catatonic it's currently possible to
create and remove directories and symlinks which can be a problem at
restart, as described above.

So return EACCES in the directory, symlink and unlink methods if the
autofs file system is catatonic.

Link: http://lkml.kernel.org/r/152902119090.4144.9561910674530214291.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoinit/main.c: log init process file name
Paul Menzel [Wed, 22 Aug 2018 04:58:37 +0000 (21:58 -0700)]
init/main.c: log init process file name

Add a log message to `run_init_process()`.

This log message serves two purposes.

1.  If the init process is not specified on the Linux Kernel command
    line, the user sees, what file was chosen.

2.  The time stamps shows exactly, when the Linux kernel handed over
    control to the init process.

Link: http://lkml.kernel.org/r/b1fc97fa-4aa9-1904-ddb5-859e78995c41@molgen.mpg.de
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoinit/Kconfig: fix its typos
Randy Dunlap [Wed, 22 Aug 2018 04:58:34 +0000 (21:58 -0700)]
init/Kconfig: fix its typos

Correct typos of "it's" to "its.

Link: http://lkml.kernel.org/r/0ac627b6-5527-55f4-0489-1631aa34fc11@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoinit/: remove ineffective sparse disabling
Luc Van Oostenryck [Wed, 22 Aug 2018 04:58:30 +0000 (21:58 -0700)]
init/: remove ineffective sparse disabling

Sparse checking used to be disabled on init/do_mounts.c and a few related
files because "Many of the syscalls used in this file expect some of the
arguments to be __user pointers not __kernel pointers".

However since 28128c61e ("kconfig.h: Include compiler types to avoid
missed struct attributes") the checks are, in fact, not disabled anymore
because of the more early include of "linux/compiler_types.h"

So remove the now ineffective #undefery that was done to disable these
warnings, as well as the associated comment.

Link: http://lkml.kernel.org/r/20180617115355.53799-1-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofs/eventpoll.c: simplify ep_is_linked() callers
Davidlohr Bueso [Wed, 22 Aug 2018 04:58:26 +0000 (21:58 -0700)]
fs/eventpoll.c: simplify ep_is_linked() callers

Instead of having each caller pass the rdllink explicitly, just have
ep_is_linked() pass it while the callers just need the epi pointer.  This
helper is all about the rdllink, and this change, furthermore, improves
the function's self documentation.

Link: http://lkml.kernel.org/r/20180727053432.16679-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofs/eventpoll.c: loosen irq safety in ep_poll()
Davidlohr Bueso [Wed, 22 Aug 2018 04:58:23 +0000 (21:58 -0700)]
fs/eventpoll.c: loosen irq safety in ep_poll()

Similar to other calls, ep_poll() is not called with interrupts disabled,
and we can therefore avoid the irq save/restore dance and just disable
local irqs.  In fact, the call should never be called in irq context at
all, considering that the only path is

epoll_wait(2) -> do_epoll_wait() -> ep_poll().

When running on a 2 socket 40-core (ht) IvyBridge a common pipe based
epoll_wait(2) microbenchmark, the following performance improvements are
seen:

    # threads       vanilla         dirty
 1          1805587     2106412
 2          1854064     2090762
 4          1805484     2017436
 8          1751222     1974475
 16         1725299     1962104
 32         1378463     1571233
 64          787368      900784

Which is a pretty constantly near 15%.

Also add a lockdep check such that we detect any mischief before
deadlocking.

Link: http://lkml.kernel.org/r/20180727053432.16679-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agofs/eventpoll.c: simply CONFIG_NET_RX_BUSY_POLL ifdefery
Davidlohr Bueso [Wed, 22 Aug 2018 04:58:19 +0000 (21:58 -0700)]
fs/eventpoll.c: simply CONFIG_NET_RX_BUSY_POLL ifdefery

... 'tis easier on the eye.

[akpm@linux-foundation.org: use inlines rather than macros]
Link: http://lkml.kernel.org/r/20180725185620.11020-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>