sfrench/cifs-2.6.git
6 years agos390/virtio: change maintainership
Christian Borntraeger [Thu, 2 Mar 2017 16:08:23 +0000 (17:08 +0100)]
s390/virtio: change maintainership

Halil is doing a lot more work in the virtio area on s390 than I
do. Let's reflect the reality in the maintainers file.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agotools/virtio: fix spelling mistake: "wakeus" -> "wakeups"
Colin Ian King [Sat, 29 Apr 2017 22:14:21 +0000 (23:14 +0100)]
tools/virtio: fix spelling mistake: "wakeus" -> "wakeups"

trivial fix to spelling mistake in an error message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
6 years agovirtio_net: tidy a couple debug statements
Dan Carpenter [Thu, 6 Apr 2017 09:04:47 +0000 (12:04 +0300)]
virtio_net: tidy a couple debug statements

We are printing a decimal value for truesize so we shouldn't use an "0x"
prefix.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agoptr_ring: support testing different batching sizes
Michael S. Tsirkin [Fri, 7 Apr 2017 05:45:32 +0000 (08:45 +0300)]
ptr_ring: support testing different batching sizes

Use the param flag for that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agoringtest: support test specific parameters
Michael S. Tsirkin [Fri, 7 Apr 2017 05:44:23 +0000 (08:44 +0300)]
ringtest: support test specific parameters

Add a new flag for passing test-specific parameters.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agoptr_ring: batch ring zeroing
Michael S. Tsirkin [Fri, 7 Apr 2017 05:25:09 +0000 (08:25 +0300)]
ptr_ring: batch ring zeroing

A known weakness in ptr_ring design is that it does not handle well the
situation when ring is almost full: as entries are consumed they are
immediately used again by the producer, so consumer and producer are
writing to a shared cache line.

To fix this, add batching to consume calls: as entries are
consumed do not write NULL into the ring until we get
a multiple (in current implementation 2x) of cache lines
away from the producer. At that point, write them all out.

We do the write out in the reverse order to keep
producer from sharing cache with consumer for as long
as possible.

Writeout also triggers when ring wraps around - there's
no special reason to do this but it helps keep the code
a bit simpler.

What should we do if getting away from producer by 2 cache lines
would mean we are keeping the ring moe than half empty?
Maybe we should reduce the batching in this case,
current patch simply reduces the batching.

Notes:
- it is no longer true that a call to consume guarantees
  that the following call to produce will succeed.
  No users seem to assume that.
- batching can also in theory reduce the signalling rate:
  users that would previously send interrups to the producer
  to wake it up after consuming each entry would now only
  need to do this once in a batch.
  Doing this would be easy by returning a flag to the caller.
  No users seem to do signalling on consume yet so this was not
  implemented yet.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
6 years agovirtio: virtio_driver doc
Cornelia Huck [Thu, 30 Mar 2017 11:13:33 +0000 (13:13 +0200)]
virtio: virtio_driver doc

Add comments for the virtio_driver members that were not documented.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio_net: don't reset twice on XDP on/off
Michael S. Tsirkin [Wed, 29 Mar 2017 20:12:23 +0000 (23:12 +0300)]
virtio_net: don't reset twice on XDP on/off

We already do a reset once in remove_vq_common -
there appears to be no point in doing another one
when we add/remove XDP.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio_net: fix support for small rings
Michael S. Tsirkin [Thu, 9 Mar 2017 00:21:21 +0000 (02:21 +0200)]
virtio_net: fix support for small rings

When ring size is small (<32 entries) making buffers smaller means a
full ring might not be able to hold enough buffers to fit a single large
packet.

Make sure a ring full of buffers is large enough to allow at least one
packet of max size.

Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio_net: reduce alignment for buffers
Michael S. Tsirkin [Mon, 6 Mar 2017 20:21:35 +0000 (22:21 +0200)]
virtio_net: reduce alignment for buffers

We don't need to align length to any particular
value anymore. Aligning to L1 cache size probably
sill makes sense to reduce false sharing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio_net: rework mergeable buffer handling
Michael S. Tsirkin [Mon, 6 Mar 2017 19:29:47 +0000 (21:29 +0200)]
virtio_net: rework mergeable buffer handling

Use the new _ctx virtio API to maintain true length for each buffer.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agovirtio_net: allow specifying context for rx
Michael S. Tsirkin [Mon, 6 Mar 2017 18:31:21 +0000 (20:31 +0200)]
virtio_net: allow specifying context for rx

With mergeable buffers we never use s/g for rx,
so allow specifying context in that case.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6 years agodrivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison
Karim Eshapa [Tue, 9 May 2017 00:06:50 +0000 (02:06 +0200)]
drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison

Use time_after() for time comparison with the new fix.

Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoDECnet: Use container_of() for embedded struct
Kees Cook [Mon, 8 May 2017 22:31:44 +0000 (15:31 -0700)]
DECnet: Use container_of() for embedded struct

Instead of a direct cross-type cast, use conatiner_of() to locate
the embedded structure, even in the face of future struct layout
randomization.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel...
Paolo Bonzini [Tue, 9 May 2017 10:51:49 +0000 (12:51 +0200)]
Merge tag 'kvm-arm-for-v4.12-round2' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of KVM/ARM Changes for v4.12.

Changes include:
 - A fix related to the 32-bit idmap stub
 - A fix to the bitmask used to deode the operands of an AArch32 CP
   instruction
 - We have moved the files shared between arch/arm/kvm and
   arch/arm64/kvm to virt/kvm/arm
 - We add support for saving/restoring the virtual ITS state to
   userspace

6 years agoKVM: arm/arm64: vgic-its: Cleanup after failed ITT restore
Christoffer Dall [Mon, 8 May 2017 11:31:12 +0000 (13:31 +0200)]
KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore

When failing to restore the ITT for a DTE, we should remove the failed
device entry from the list and free the object.

We slightly refactor vgic_its_destroy to be able to reuse the now
separate vgic_its_free_dte() function.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Don't call map_resources when restoring ITS tables
Christoffer Dall [Mon, 8 May 2017 12:43:45 +0000 (14:43 +0200)]
KVM: arm/arm64: Don't call map_resources when restoring ITS tables

The only reason we called kvm_vgic_map_resources() when restoring the
ITS tables was because we wanted to have the KVM iodevs registered in
the KVM IO bus framework at the time when the ITS was restored such that
a restored and active device can inject MSIs prior to otherwise calling
kvm_vgic_map_resources() from the first run of a VCPU.

Since we now register the KVM iodevs for the redestributors and ITS as
soon as possible (when setting the base addresses), we no longer need
this call and kvm_vgic_map_resources() is again called only when first
running a VCPU.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Register ITS iodev when setting base address
Christoffer Dall [Mon, 8 May 2017 11:14:57 +0000 (13:14 +0200)]
KVM: arm/arm64: Register ITS iodev when setting base address

We have to register the ITS iodevice before running the VM, because in
migration scenarios, we may be restoring a live device that wishes to
inject MSIs before the VCPUs have started.

All we need to register the ITS io device is the base address of the
ITS, so we can simply register that when the base address of the ITS is
set.

  [ Code to fix concurrency issues when setting the ITS base address and
    to fix the undef base address check written by Marc Zyngier ]

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Get rid of its->initialized field
Marc Zyngier [Mon, 8 May 2017 17:15:40 +0000 (18:15 +0100)]
KVM: arm/arm64: Get rid of its->initialized field

The its->initialized doesn't bring much to the table, and creates
unnecessary ordering between setting the address and initializing it
(which amounts to exactly nothing).

Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
it deserves to be.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs
Christoffer Dall [Mon, 8 May 2017 10:30:24 +0000 (12:30 +0200)]
KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs

Instead of waiting with registering KVM iodevs until the first VCPU is
run, we can actually create the iodevs when the redist base address is
set.  The only downside is that we must now also check if we need to do
this for VCPUs which are created after creating the VGIC, because there
is no enforced ordering between creating the VGIC (and setting its base
addresses) and creating the VCPUs.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Slightly rework kvm_vgic_addr
Christoffer Dall [Mon, 8 May 2017 10:28:19 +0000 (12:28 +0200)]
KVM: arm/arm64: Slightly rework kvm_vgic_addr

As we are about to handle setting the address for the redistributor base
region separately from some of the other base addresses, let's rework
this function to leave a little more room for being flexible in what
each type of base address does.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Make vgic_v3_check_base more broadly usable
Christoffer Dall [Mon, 8 May 2017 10:23:51 +0000 (12:23 +0200)]
KVM: arm/arm64: Make vgic_v3_check_base more broadly usable

As we are about to fiddle with the IO device registration mechanism,
let's be a little more careful when setting base addresses as early as
possible.  When setting a base address, we can check that there's
address space enough for its scope and when the last of the two
base addresses (dist and redist) get set, we can also check if the
regions overlap at that time.

This allows us to provide error messages to the user at time when trying
to set the base address, as opposed to later when trying to run the VM.

To do this,  we make vgic_v3_check_base available in the core vgic-v3
code as well as in the other parts of the GICv3 code, namely the MMIO
config code.

We also return true for undefined base addresses so that the function
can be used before all base addresses are set; all callers already check
for uninitialized addresses before calling this function.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Refactor vgic_register_redist_iodevs
Christoffer Dall [Mon, 8 May 2017 10:18:26 +0000 (12:18 +0200)]
KVM: arm/arm64: Refactor vgic_register_redist_iodevs

Split out the function to register all the redistributor iodevs into a
function that handles a single redistributor at a time in preparation
for being able to call this per VCPU as these get created.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus
Christoffer Dall [Mon, 8 May 2017 18:38:40 +0000 (20:38 +0200)]
KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus

There are occasional needs to use the index of vcpu in the kvm->vcpus
array to map something related to a VCPU.  For example, unlike the
vcpu->vcpu_id, the vcpu index is guaranteed to not be sparse across all
vcpus which is useful when allocating a memory area for each vcpu.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agonVMX: Advertise PML to L1 hypervisor
Bandan Das [Fri, 5 May 2017 19:25:15 +0000 (15:25 -0400)]
nVMX: Advertise PML to L1 hypervisor

Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agonVMX: Implement emulated Page Modification Logging
Bandan Das [Fri, 5 May 2017 19:25:14 +0000 (15:25 -0400)]
nVMX: Implement emulated Page Modification Logging

With EPT A/D enabled, processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the pml buffer provided by L1 if the access is
write and the dirty bit is being set.

This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agokvm: x86: Add a hook for arch specific dirty logging emulation
Bandan Das [Fri, 5 May 2017 19:25:13 +0000 (15:25 -0400)]
kvm: x86: Add a hook for arch specific dirty logging emulation

When KVM updates accessed/dirty bits, this hook can be used
to invoke an arch specific function that implements/emulates
dirty logging such as PML.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agokvm: nVMX: Validate CR3 target count on nested VM-entry
Jim Mattson [Fri, 5 May 2017 18:28:09 +0000 (11:28 -0700)]
kvm: nVMX: Validate CR3 target count on nested VM-entry

According to the SDM, the CR3-target count must not be greater than
4. Future processors may support a different number of CR3-target
values. Software should read the VMX capability MSR IA32_VMX_MISC to
determine the number of values supported.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoMerge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus...
Paolo Bonzini [Tue, 9 May 2017 09:50:01 +0000 (11:50 +0200)]
Merge branch 'kvm-ppc-next' of git://git./linux/kernel/git/paulus/powerpc into HEAD

The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]

6 years agoKVM: set no_llseek in stat_fops_per_vm
Geliang Tang [Sat, 6 May 2017 15:37:19 +0000 (23:37 +0800)]
KVM: set no_llseek in stat_fops_per_vm

In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we
use nonseekable_open() to open, we should use no_llseek() to seek,
not generic_file_llseek().

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoKVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable
Christoffer Dall [Mon, 8 May 2017 10:09:13 +0000 (12:09 +0200)]
KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable

This function really doesn't init anything, it enables the CPU
interface, so name it as such, which gives us the name to use for actual
init work later on.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoKVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI
Christoffer Dall [Mon, 8 May 2017 07:31:53 +0000 (09:31 +0200)]
KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI

Clarify what is meant by the save/restore ABI only supporting virtual
physical interrupts.

Relax the requirement of the order that the collection entries are
written in and be clear that there is no particular ordering enforced.

Some cosmetic changes in the capitalization of ID names to align with
the GICv3 manual and remove the empty line in the bottom of the patch.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
6 years agoMerge tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 9 May 2017 03:43:30 +0000 (20:43 -0700)]
Merge tag 'linux-kselftest-4.12-rc1' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:
 "This update consists of:

   - important fixes for build failures and clean target related
     warnings to address regressions introduced in commit 88baa78d1f31
     ("selftests: remove duplicated all and clean target")

   - several minor spelling fixes in and log messages and comment
     blocks.

   - Enabling configs for better test coverage in ftrace, vm, and
     cpufreq tests.

   - .gitignore changes"

* tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (26 commits)
  selftests: x86: add missing executables to .gitignore
  selftests: watchdog: accept multiple params on command line
  selftests: create cpufreq kconfig fragments
  selftests: x86: override clean in lib.mk to fix warnings
  selftests: sync: override clean in lib.mk to fix warnings
  selftests: splice: override clean in lib.mk to fix warnings
  selftests: gpio: fix clean target to remove all generated files and dirs
  selftests: add gpio generated files to .gitignore
  selftests: powerpc: override clean in lib.mk to fix warnings
  selftests: gpio: override clean in lib.mk to fix warnings
  selftests: futex: override clean in lib.mk to fix warnings
  selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean
  selftests: splice: fix clean target to not remove default_file_splice_read.sh
  selftests: gpio: add config fragment for gpio-mockup
  selftests: breakpoints: allow to cross-compile for aarch64/arm64
  selftests/Makefile: Add missed PHONY targets
  selftests/vm/run_vmtests: Fix wrong comment
  selftests/Makefile: Add missed closing `"` in comment
  selftests/vm/run_vmtests: Polish output text
  selftests/timers: fix spelling mistake: "Asynchronous"
  ...

6 years agoMerge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Tue, 9 May 2017 03:36:38 +0000 (20:36 -0700)]
Merge tag 'trace-v4.12-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
 "These are three simple changes.

  The first one is just a switch from using strcpy() to strlcpy().
  Someone thought that it may cause an overflow bug, but since it only
  copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm
  should ever be bigger than that, nor not end with a nul character,
  this change is more of a safety precaution than fixing anything that
  is actually broken.

  The other two changes are simply cleaning and optimizing some code"

* tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Simplify ftrace_match_record() even more
  ftrace: Remove an unneeded condition
  tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

6 years agoMerge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 9 May 2017 03:07:29 +0000 (20:07 -0700)]
Merge tags 'for-linus' and 'for-next' of git://git./linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragllers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux- next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool

6 years agoRevert "ipv4: restore rt->fi for reference counting"
David S. Miller [Tue, 9 May 2017 02:35:32 +0000 (22:35 -0400)]
Revert "ipv4: restore rt->fi for reference counting"

This reverts commit 82486aa6f1b9bc8145e6d0fa2bc0b44307f3b875.

As implemented, this causes dangling netdevice refs.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Tue, 9 May 2017 02:03:25 +0000 (19:03 -0700)]
Merge tag 'pci-v4.12-changes' of git://git./linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

 - use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

 - clean up and unify mmap of PCI BARs (David Woodhouse)

 - export and unify Function Level Reset support (Christoph Hellwig)

 - avoid FLR for Intel 82579 NICs (Sasha Neftin)

 - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

 - short-circuit config access failures for disconnected devices (Keith
   Busch)

 - remove D3 sleep delay when possible (Adrian Hunter)

 - freeze PME scan before suspending devices (Lukas Wunner)

 - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

 - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

 - add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

 - add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

 - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

 - fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

 - add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

 - add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

 - add i.MX7D support (Andrey Smirnov)

 - use generic MSI support for Aardvark (Thomas Petazzoni)

 - make Rockchip driver modular (Brian Norris)

 - advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

 - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

 - convert atomic_t to refcount_t in HV driver (Elena Reshetova)

 - add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

 - fix PCI bus removal in HV driver (Long Li)

 - add support for ThunderX2 DMA alias topology (Jayachandran C)

 - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

 - add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

 - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...

6 years agoMerge tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Tue, 9 May 2017 01:49:23 +0000 (18:49 -0700)]
Merge tag 'tty-4.12-rc1' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the "big" TTY/Serial patch updates for 4.12-rc1

  Not a lot of new things here, the normal number of serial driver
  updates and additions, tiny bugs fixed, and some core files split up
  to make future changes a bit easier for Nicolas's "tiny-tty" work.

  All of these have been in linux-next for a while"

* tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits)
  serial: small Makefile reordering
  tty: split job control support into a file of its own
  tty: move baudrate handling code to a file of its own
  console: move console_init() out of tty_io.c
  serial: 8250_early: Add earlycon support for Palmchip UART
  tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
  vt: make mouse selection of non-ASCII consistent
  vt: set mouse selection word-chars to gpm's default
  imx-serial: Reduce RX DMA startup latency when opening for reading
  serial: omap: suspend device on probe errors
  serial: omap: fix runtime-pm handling on unbind
  tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
  serial: samsung: Remove useless spinlock
  serial: samsung: Add missing checks for dma_map_single failure
  serial: samsung: Use right device for DMA-mapping calls
  serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
  tty: fix comment typo s/repsonsible/responsible/
  tty: amba-pl011: Fix spurious TX interrupts
  serial: xuartps: Enable clocks in the pm disable case also
  serial: core: Re-use struct uart_port {name} field
  ...

6 years agotracing: Use cpumask_available() to check if cpumask variable may be used
Matthias Kaehlcke [Fri, 21 Apr 2017 23:41:10 +0000 (16:41 -0700)]
tracing: Use cpumask_available() to check if cpumask variable may be used

This fixes the following clang warning:

kernel/trace/trace.c:3231:12: warning: address of array 'iter->started'
  will always evaluate to 'true' [-Wpointer-bool-conversion]
        if (iter->started)

Link: http://lkml.kernel.org/r/20170421234110.117075-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Tue, 9 May 2017 01:17:56 +0000 (18:17 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - the rest of MM

 - various misc things

 - procfs updates

 - lib/ updates

 - checkpatch updates

 - kdump/kexec updates

 - add kvmalloc helpers, use them

 - time helper updates for Y2038 issues. We're almost ready to remove
   current_fs_time() but that awaits a btrfs merge.

 - add tracepoints to DAX

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
  selftests/vm: add a test for virtual address range mapping
  dax: add tracepoint to dax_insert_mapping()
  dax: add tracepoint to dax_writeback_one()
  dax: add tracepoints to dax_writeback_mapping_range()
  dax: add tracepoints to dax_load_hole()
  dax: add tracepoints to dax_pfn_mkwrite()
  dax: add tracepoints to dax_iomap_pte_fault()
  mtd: nand: nandsim: convert to memalloc_noreclaim_*()
  treewide: convert PF_MEMALLOC manipulations to new helpers
  mm: introduce memalloc_noreclaim_{save,restore}
  mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
  mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
  mm/huge_memory.c: use zap_deposited_table() more
  time: delete CURRENT_TIME_SEC and CURRENT_TIME
  gfs2: replace CURRENT_TIME with current_time
  apparmorfs: replace CURRENT_TIME with current_time()
  lustre: replace CURRENT_TIME macro
  fs: ubifs: replace CURRENT_TIME_SEC with current_time
  fs: ufs: use ktime_get_real_ts64() for birthtime
  ...

6 years agodrivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
Andrew Morton [Mon, 8 May 2017 23:00:22 +0000 (16:00 -0700)]
drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4

  drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer
  drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0].<anonymous>.template_ahash')
  drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer
  drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].<anonymous>')
  drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer
  drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].<anonymous>')
  ...

gcc-4.4.4 has issues with anon union initializers.  Work around this.

Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoselftests/vm: add a test for virtual address range mapping
Anshuman Khandual [Mon, 8 May 2017 23:00:19 +0000 (16:00 -0700)]
selftests/vm: add a test for virtual address range mapping

This verifies virtual address mapping below and above the 128TB range
and makes sure that address returned are within the expected range
depending upon the hint passed from the user space.

Link: http://lkml.kernel.org/r/20170418095252.20533-1-khandual@linux.vnet.ibm.com
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodax: add tracepoint to dax_insert_mapping()
Ross Zwisler [Mon, 8 May 2017 23:00:16 +0000 (16:00 -0700)]
dax: add tracepoint to dax_insert_mapping()

Add a tracepoint to dax_insert_mapping(), following the same logging
conventions as the rest of DAX.  This tracepoint, along with the one in
dax_load_hole(), lets us know how a DAX PTE fault was serviced.

Here is an example DAX fault that inserts a PTE mapping:

  small-1126  [007] ....
   145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1126  [007] ....
   145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006

  small-1126  [007] ....
   145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-7-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodax: add tracepoint to dax_writeback_one()
Ross Zwisler [Mon, 8 May 2017 23:00:13 +0000 (16:00 -0700)]
dax: add tracepoint to dax_writeback_one()

Add a tracepoint to dax_writeback_one(), following the same logging
conventions as the rest of DAX.

Here is an example range writeback which ends up flushing one PMD and
one PTE:

  test-1265  [003] ....
   496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

  test-1265  [003] ....
   496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200

  test-1265  [003] ....
   496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1

  test-1265  [003] ....
   496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

[akpm@linux-foundation.org: struct blk_dax_ctl has disappeared]
Link: http://lkml.kernel.org/r/20170221195116.13278-6-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodax: add tracepoints to dax_writeback_mapping_range()
Ross Zwisler [Mon, 8 May 2017 23:00:10 +0000 (16:00 -0700)]
dax: add tracepoints to dax_writeback_mapping_range()

Add tracepoints to dax_writeback_mapping_range(), following the same
logging conventions as the rest of DAX.

Here is an example writeback call:

  msync-1085  [006] ....
   200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

  msync-1085  [006] ....
   200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

[ross.zwisler@linux.intel.com: fix regression in dax_writeback_mapping_range()]
Link: http://lkml.kernel.org/r/20170314215358.31451-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170221195116.13278-5-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodax: add tracepoints to dax_load_hole()
Ross Zwisler [Mon, 8 May 2017 23:00:07 +0000 (16:00 -0700)]
dax: add tracepoints to dax_load_hole()

Add tracepoints to dax_load_hole(), following the same logging conventions
as the rest of DAX.

Here is the logging generated by a PTE read from a hole:

  read-1075  [002] ....
    62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280

  read-1075  [002] ....
    62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

  read-1075  [002] ....
    62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodax: add tracepoints to dax_pfn_mkwrite()
Ross Zwisler [Mon, 8 May 2017 23:00:03 +0000 (16:00 -0700)]
dax: add tracepoints to dax_pfn_mkwrite()

Add tracepoints to dax_pfn_mkwrite(), following the same logging
conventions as the rest of DAX.

Here is an example PTE fault followed by a pfn_mkwrite:

  small_aligned-1094  [002] ....
   374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200

  small_aligned-1094  [002] ....
   374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE

  small_aligned-1094  [002] ....
   374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-3-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodax: add tracepoints to dax_iomap_pte_fault()
Ross Zwisler [Mon, 8 May 2017 23:00:00 +0000 (16:00 -0700)]
dax: add tracepoints to dax_iomap_pte_fault()

Patch series "second round of tracepoints for DAX".

This second round of DAX tracepoint patches adds tracing to the PTE
fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
dax_insert_mapping()) and to the writeback path
(dax_writeback_mapping_range(), dax_writeback_one()).

The purpose of this tracing is to give us a high level view of what DAX
is doing, whether faults are being serviced by PMDs or PTEs, and by real
storage or by zero pages covering holes.

I do have some patches nearly ready which also add tracing to
grab_mapping_entry() and dax_insert_mapping_entry().  These are more
targeted at logging how we are interacting with the radix tree, how we
use empty entries for locking, whether we "downgrade" huge zero pages to
4k PTE sized allocations, etc.  In the end it seemed to me that this
might be too detailed to have as constantly present tracepoints, but if
anyone sees value in having tracepoints like this in the DAX code
permanently (Jan?), please let me know and I'll add those last two
patches.

All these tracepoints were done to be consistent with the style of the
XFS tracepoints and with the existing DAX PMD tracepoints.

This patch (of 6):

Add tracepoints to dax_iomap_pte_fault(), following the same logging
conventions as the rest of DAX.

Here is an example fault that initially tries to be serviced by the PMD
fault handler but which falls back to PTEs because the VMA isn't large
enough to hold a PMD:

  small-1086  [005] ....
   71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003

  small-1086  [005] ....
    71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400

  small-1086  [005] ....
    71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK

  small-1086  [005] ....
    71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1086  [005] ....
    71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/20170221195116.13278-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomtd: nand: nandsim: convert to memalloc_noreclaim_*()
Vlastimil Babka [Mon, 8 May 2017 22:59:57 +0000 (15:59 -0700)]
mtd: nand: nandsim: convert to memalloc_noreclaim_*()

Nandsim has own functions set_memalloc() and clear_memalloc() for robust
setting and clearing of PF_MEMALLOC.  Replace them by the new generic
helpers.  No functional change.

Link: http://lkml.kernel.org/r/20170405074700.29871-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotreewide: convert PF_MEMALLOC manipulations to new helpers
Vlastimil Babka [Mon, 8 May 2017 22:59:53 +0000 (15:59 -0700)]
treewide: convert PF_MEMALLOC manipulations to new helpers

We now have memalloc_noreclaim_{save,restore} helpers for robust setting
and clearing of PF_MEMALLOC.  Let's convert the code which was using the
generic tsk_restore_flags().  No functional change.

[vbabka@suse.cz: in net/core/sock.c the hunk is missing]
Link: http://lkml.kernel.org/r/20170405074700.29871-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Wouter Verhelst <w@uter.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: introduce memalloc_noreclaim_{save,restore}
Vlastimil Babka [Mon, 8 May 2017 22:59:50 +0000 (15:59 -0700)]
mm: introduce memalloc_noreclaim_{save,restore}

The previous patch ("mm: prevent potential recursive reclaim due to
clearing PF_MEMALLOC") has shown that simply setting and clearing
PF_MEMALLOC in current->flags can result in wrongly clearing a
pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
Let's introduce helpers that support proper nesting by saving the
previous stat of the flag, similar to the existing memalloc_noio_* and
memalloc_nofs_* helpers.  Convert existing setting/clearing of
PF_MEMALLOC within mm to the new helpers.

There are no known issues with the converted code, but the change makes
it more robust.

Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
Vlastimil Babka [Mon, 8 May 2017 22:59:46 +0000 (15:59 -0700)]
mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC

Patch series "more robust PF_MEMALLOC handling"

This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim.  There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag.  This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier).  Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them.  Patches 3
and 4 convert non-mm code.

This patch (of 4):

__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()).  Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed609 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compation was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.

Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true.  There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.

Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath")
Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
Oliver O'Halloran [Mon, 8 May 2017 22:59:43 +0000 (15:59 -0700)]
mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required

Although all architectures use a deposited page table for THP on
anonymous VMAs, some architectures (s390 and powerpc) require the
deposited storage even for file backed VMAs due to quirks of their MMUs.

This patch adds support for depositing a table in DAX PMD fault handling
path for archs that require it.  Other architectures should see no
functional changes.

Link: http://lkml.kernel.org/r/20170411174233.21902-3-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: linux-nvdimm@ml01.01.org
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm/huge_memory.c: use zap_deposited_table() more
Oliver O'Halloran [Mon, 8 May 2017 22:59:40 +0000 (15:59 -0700)]
mm/huge_memory.c: use zap_deposited_table() more

Depending on the flags of the PMD being zapped there may or may not be a
deposited pgtable to be freed.  In two of the three cases this is open
coded while the third uses the zap_deposited_table() helper.  This patch
converts the others to use the helper to clean things up a bit.

Link: http://lkml.kernel.org/r/20170411174233.21902-2-oohall@gmail.com
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: linux-nvdimm@ml01.01.org
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotime: delete CURRENT_TIME_SEC and CURRENT_TIME
Deepa Dinamani [Mon, 8 May 2017 22:59:37 +0000 (15:59 -0700)]
time: delete CURRENT_TIME_SEC and CURRENT_TIME

All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced
by other time functions.  These macros are also not y2038 safe.  And,
all their use cases can be fulfilled by y2038 safe ktime_get_* variants.

Link: http://lkml.kernel.org/r/1491613030-11599-12-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agogfs2: replace CURRENT_TIME with current_time
Stephen Rothwell [Mon, 8 May 2017 22:59:34 +0000 (15:59 -0700)]
gfs2: replace CURRENT_TIME with current_time

Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoapparmorfs: replace CURRENT_TIME with current_time()
Deepa Dinamani [Mon, 8 May 2017 22:59:31 +0000 (15:59 -0700)]
apparmorfs: replace CURRENT_TIME with current_time()

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time().

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/1491613030-11599-11-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolustre: replace CURRENT_TIME macro
Deepa Dinamani [Mon, 8 May 2017 22:59:28 +0000 (15:59 -0700)]
lustre: replace CURRENT_TIME macro

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for others.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as lustre uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/1491613030-11599-10-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: James Simmons <jsimmons@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs: ubifs: replace CURRENT_TIME_SEC with current_time
Deepa Dinamani [Mon, 8 May 2017 22:59:25 +0000 (15:59 -0700)]
fs: ubifs: replace CURRENT_TIME_SEC with current_time

CURRENT_TIME_SEC is not y2038 safe.  current_time() will be transitioned
to use 64 bit time along with vfs in a separate patch.  There is no plan
to transition CURRENT_TIME_SEC to use y2038 safe time interfaces.

current_time() returns timestamps according to the granularities set in
the inode's super_block.  The granularity check to call
current_fs_time() or CURRENT_TIME_SEC is not required.

Use current_time() directly to update inode timestamp.  Use
timespec_trunc during file system creation, before the first inode is
created.

Link: http://lkml.kernel.org/r/1491613030-11599-9-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs: ufs: use ktime_get_real_ts64() for birthtime
Deepa Dinamani [Mon, 8 May 2017 22:59:22 +0000 (15:59 -0700)]
fs: ufs: use ktime_get_real_ts64() for birthtime

CURRENT_TIME is not y2038 safe.  Replace it with ktime_get_real_ts64().
Inode time formats are already 64 bit long and accommodates time64_t.

Link: http://lkml.kernel.org/r/1491613030-11599-6-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs: ceph: CURRENT_TIME with ktime_get_real_ts()
Deepa Dinamani [Mon, 8 May 2017 22:59:19 +0000 (15:59 -0700)]
fs: ceph: CURRENT_TIME with ktime_get_real_ts()

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
M: Ilya Dryomov <idryomov@gmail.com>
M: "Yan, Zheng" <zyan@redhat.com>
M: Sage Weil <sage@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs: cifs: replace CURRENT_TIME by other appropriate apis
Deepa Dinamani [Mon, 8 May 2017 22:59:16 +0000 (15:59 -0700)]
fs: cifs: replace CURRENT_TIME by other appropriate apis

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for authentication
timestamps and timezone calculations.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

The inode timestamps read from the server are assumed to have correct
granularity and range.

The patch also assumes that the difference between server and client
times lie in the range INT_MIN..INT_MAX.  This is valid because this is
the difference between current times between server and client, and the
largest timezone difference is in the range of one day.

All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned into
using timespec64 when all other timestamps for cifs is transitioned to
use timespec64.

Link: http://lkml.kernel.org/r/1491613030-11599-4-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotrace: make trace_hwlat timestamp y2038 safe
Deepa Dinamani [Mon, 8 May 2017 22:59:13 +0000 (15:59 -0700)]
trace: make trace_hwlat timestamp y2038 safe

struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.

Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.

Link: http://lkml.kernel.org/r/1491613030-11599-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs: f2fs: use ktime_get_real_seconds for sit_info times
Deepa Dinamani [Mon, 8 May 2017 22:59:10 +0000 (15:59 -0700)]
fs: f2fs: use ktime_get_real_seconds for sit_info times

CURRENT_TIME_SEC is not y2038 safe.

Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment
timestamps used by GC algorithm including the segment mtime timestamps.

Link: http://lkml.kernel.org/r/1491613030-11599-2-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoformat-security: move static strings to const
Kees Cook [Mon, 8 May 2017 22:59:05 +0000 (15:59 -0700)]
format-security: move static strings to const

While examining output from trial builds with -Wformat-security enabled,
many strings were found that should be defined as "const", or as a char
array instead of char pointer.  This makes some static analysis easier,
by producing fewer false positives.

As these are all trivial changes, it seemed best to put them all in a
single patch rather than chopping them up per maintainer.

Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jes Sorensen <jes@trained-monkey.org> [runner.c]
Cc: Tony Lindgren <tony@atomide.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Patrice Chotard <patrice.chotard@st.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kejian Yan <yankejian@huawei.com>
Cc: Daode Huang <huangdaode@hisilicon.com>
Cc: Qianqian Xie <xieqianqian@huawei.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Christian Gromm <christian.gromm@microchip.com>
Cc: Andrey Shvetsov <andrey.shvetsov@k2l.de>
Cc: Jason Litzinger <jlitzingerdev@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoDocumentation/vm/transhuge.txt: fix trivial typos
SeongJae Park [Mon, 8 May 2017 22:59:02 +0000 (15:59 -0700)]
Documentation/vm/transhuge.txt: fix trivial typos

[akpm@linux-foundation.org: fixes per Randy]
Link: http://lkml.kernel.org/r/20170405210259.2067-1-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag
Tetsuo Handa [Mon, 8 May 2017 22:58:59 +0000 (15:58 -0700)]
fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag

Commit afddba49d18f ("fs: introduce write_begin, write_end, and
perform_write aops") introduced AOP_FLAG_UNINTERRUPTIBLE flag which was
checked in pagecache_write_begin(), but that check was removed by
4e02ed4b4a2f ("fs: remove prepare_write/commit_write").

Between these two commits, commit d9414774dc0c ("cifs: Convert cifs to
new aops.") added a check in cifs_write_begin(), but that check was soon
removed by commit a98ee8c1c707 ("[CIFS] fix regression in
cifs_write_begin/cifs_write_end").

Therefore, AOP_FLAG_UNINTERRUPTIBLE flag is checked nowhere.  Let's
remove this flag.  This patch has no functionality changes.

Link: http://lkml.kernel.org/r/1489294781-53494-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoinclude/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec
Andi Kleen [Mon, 8 May 2017 22:58:56 +0000 (15:58 -0700)]
include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec

pagefault_disabled_dec is frequently used inline, and it has a WARN_ON
for underflow that expands to about 6.5k of extra code.  The warning
doesn't seem to be that useful and worth so much code so remove it.

If it was needed could make it depending on some debug kernel option.

Saves ~6.5k in my kernel

     text    data     bss     dec     hex filename
  9039417 5367568 11116544        25523529        1857549 vmlinux-before-pf
  9032805 5367568 11116544        25516917        1855b75 vmlinux-pf

Link: http://lkml.kernel.org/r/20170315021431.13107-8-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/scsi/megaraid: remove expensive inline from megasas_return_cmd
Andi Kleen [Mon, 8 May 2017 22:58:53 +0000 (15:58 -0700)]
drivers/scsi/megaraid: remove expensive inline from megasas_return_cmd

Remove an inline from a fairly big function that is used often.  It's
unlikely that calling or not calling it makes a lot of difference.

Saves around 8k text in my kernel.

     text    data     bss     dec     hex filename
  9047801 5367568 11116544        25531913        1859609 vmlinux-before-megasas
  9039417 5367568 11116544        25523529        1857549 vmlinux-megasas

Link: http://lkml.kernel.org/r/20170315021431.13107-7-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Kashyap Desai <kashyap.desai@avagotech.com>
Cc: Sumit Saxena <sumit.saxena@avagotech.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokref: remove WARN_ON for NULL release functions
Andi Kleen [Mon, 8 May 2017 22:58:50 +0000 (15:58 -0700)]
kref: remove WARN_ON for NULL release functions

The kref functions check for NULL release functions.  This WARN_ON seems
rather pointless.  We will eventually release and then just crash
nicely.  It is also somewhat expensive because these functions are
inlined in a lot of places.  Removing the WARN_ONs saves around 2.3k in
this kernel (likely more in others with more drivers)

     text    data     bss     dec     hex filename
  9083992 5367600 11116544        25568136        1862388 vmlinux-before-load-avg
  9070166 5367600 11116544        25554310        185ed86 vmlinux-load-avg

Link: http://lkml.kernel.org/r/20170315021431.13107-5-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotreewide: decouple cacheflush.h and set_memory.h
Laura Abbott [Mon, 8 May 2017 22:58:47 +0000 (15:58 -0700)]
treewide: decouple cacheflush.h and set_memory.h

Now that all call sites, completely decouple cacheflush.h and
set_memory.h

[sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling]
Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au
Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/staging/media/atomisp/pci/atomisp2: use set_memory.h
Andrew Morton [Mon, 8 May 2017 22:58:44 +0000 (15:58 -0700)]
drivers/staging/media/atomisp/pci/atomisp2: use set_memory.h

Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:41 +0000 (15:58 -0700)]
drivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-16-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/misc/sram-exec.c: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:38 +0000 (15:58 -0700)]
drivers/misc/sram-exec.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-15-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoalsa: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:35 +0000 (15:58 -0700)]
alsa: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-14-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/power/snapshot.c: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:32 +0000 (15:58 -0700)]
kernel/power/snapshot.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-13-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/module.c: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:29 +0000 (15:58 -0700)]
kernel/module.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-12-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoinclude/linux/filter.h: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:26 +0000 (15:58 -0700)]
include/linux/filter.h: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-11-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/watchdog/hpwdt.c: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:23 +0000 (15:58 -0700)]
drivers/watchdog/hpwdt.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-10-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/hwtracing/intel_th/msu.c: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:20 +0000 (15:58 -0700)]
drivers/hwtracing/intel_th/msu.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-9-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrm: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:17 +0000 (15:58 -0700)]
drm: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

[akpm@linux-foundation.org: track drivers/gpu/drm/i915/i915_gem_gtt.c linux-next changes]
Link: http://lkml.kernel.org/r/1488920133-27229-8-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoagp: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:14 +0000 (15:58 -0700)]
agp: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-7-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agox86: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:11 +0000 (15:58 -0700)]
x86: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agos390: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:08 +0000 (15:58 -0700)]
s390: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoarm64: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:05 +0000 (15:58 -0700)]
arm64: use set_memory.h header

The set_memory_* functions have moved to set_memory.h.  Use that header
explicitly.

Link: http://lkml.kernel.org/r/1488920133-27229-4-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoarm: use set_memory.h header
Laura Abbott [Mon, 8 May 2017 22:58:02 +0000 (15:58 -0700)]
arm: use set_memory.h header

set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-3-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotreewide: move set_memory_* functions away from cacheflush.h
Laura Abbott [Mon, 8 May 2017 22:57:59 +0000 (15:57 -0700)]
treewide: move set_memory_* functions away from cacheflush.h

Patch series "set_memory_* functions header refactor", v3.

The set_memory_* APIs came out of a desire to have a better way to
change memory attributes.  Many of these attributes were linked to cache
functionality so the prototypes were put in cacheflush.h.  These days,
the APIs have grown and have a much wider use than just cache APIs.  To
support this growth, split off set_memory_* and friends into a separate
header file to avoid growing cacheflush.h for APIs that have nothing to
do with caches.

Link: http://lkml.kernel.org/r/1488920133-27229-2-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotreewide: spelling: correct diffrent[iate] and banlance typos
Joe Perches [Mon, 8 May 2017 22:57:56 +0000 (15:57 -0700)]
treewide: spelling: correct diffrent[iate] and banlance typos

Add these misspellings to scripts/spelling.txt too

Link: http://lkml.kernel.org/r/962aace119675e5fe87be2a88ddac1a5486f8e60.1490931810.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoscripts/spelling.txt: add "intialise(d)" pattern and fix typo instances
Masahiro Yamada [Mon, 8 May 2017 22:57:53 +0000 (15:57 -0700)]
scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances

Fix typos and add the following to the scripts/spelling.txt:

  intialisation||initialisation
  intialised||initialised
  intialise||initialise

This commit does not intend to change the British spelling itself.

Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoscripts/spelling.txt: add regsiter -> register spelling mistake
Stephen Boyd [Mon, 8 May 2017 22:57:50 +0000 (15:57 -0700)]
scripts/spelling.txt: add regsiter -> register spelling mistake

This typo is quite common.  Fix it and add it to the spelling file so
that checkpatch catches it earlier.

Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoscripts/spelling.txt: add "memory" pattern and fix typos
Stephen Boyd [Mon, 8 May 2017 22:57:47 +0000 (15:57 -0700)]
scripts/spelling.txt: add "memory" pattern and fix typos

Fix typos and add the following to the scripts/spelling.txt:

      momery||memory

Link: http://lkml.kernel.org/r/20170317011131.6881-1-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm, vmalloc: use __GFP_HIGHMEM implicitly
Michal Hocko [Mon, 8 May 2017 22:57:44 +0000 (15:57 -0700)]
mm, vmalloc: use __GFP_HIGHMEM implicitly

__vmalloc* allows users to provide gfp flags for the underlying
allocation.  This API is quite popular

  $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
  77

The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space.  About half of users don't
use this flag, though.  This signals that we make the API unnecessarily
too complex.

This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space.  Current users which add __GFP_HIGHMEM
are simplified and drop the flag.

Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm, swap: use kvzalloc to allocate some swap data structures
Huang Ying [Mon, 8 May 2017 22:57:40 +0000 (15:57 -0700)]
mm, swap: use kvzalloc to allocate some swap data structures

Now vzalloc() is used in swap code to allocate various data structures,
such as swap cache, swap slots cache, cluster info, etc.  Because the
size may be too large on some system, so that normal kzalloc() may fail.
But using kzalloc() has some advantages, for example, less memory
fragmentation, less TLB pressure, etc.  So change the data structure
allocation in swap code to use kvzalloc() which will try kzalloc()
firstly, and fallback to vzalloc() if kzalloc() failed.

In general, although kmalloc() will reduce the number of high-order
pages in short term, vmalloc() will cause more pain for memory
fragmentation in the long term.  And the swap data structure allocation
that is changed in this patch is expected to be long term allocation.

From Dave Hansen:
 "for example, we have a two-page data structure. vmalloc() takes two
  effectively random order-0 pages, probably from two different 2M pages
  and pins them. That "kills" two 2M pages. kmalloc(), allocating two
  *contiguous* pages, will not cross a 2M boundary. That means it will
  only "kill" the possibility of a single 2M page. More 2M pages == less
  fragmentation.

The allocation in this patch occurs during swap on time, which is
usually done during system boot, so usually we have high opportunity to
allocate the contiguous pages successfully.

The allocation for swap_map[] in struct swap_info_struct is not changed,
because that is usually quite large and vmalloc_to_page() is used for
it.  That makes it a little harder to change.

Link: http://lkml.kernel.org/r/20170407064911.25447-1-ying.huang@intel.com
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Tim Chen <tim.c.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/md/bcache/super.c: use kvmalloc
Michal Hocko [Mon, 8 May 2017 22:57:37 +0000 (15:57 -0700)]
drivers/md/bcache/super.c: use kvmalloc

bcache_device_init uses kmalloc for small requests and vmalloc for those
which are larger than 64 pages.  This alone is a strange criterion.
Moreover kmalloc can fallback to vmalloc on the failure.  Let's simply
use kvmalloc instead as it knows how to handle the fallback properly

Link: http://lkml.kernel.org/r/20170306103327.2766-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant
Michal Hocko [Mon, 8 May 2017 22:57:34 +0000 (15:57 -0700)]
drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant

copy_params uses kmalloc with vmalloc fallback.  We already have a
helper for that - kvmalloc.  This caller requires GFP_NOIO semantic so
it hasn't been converted with many others by previous patches.  All we
need to achieve this semantic is to use the scope
memalloc_noio_{save,restore} around kvmalloc.

Link: http://lkml.kernel.org/r/20170306103327.2766-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agonet: use kvmalloc with __GFP_REPEAT rather than open coded variant
Michal Hocko [Mon, 8 May 2017 22:57:31 +0000 (15:57 -0700)]
net: use kvmalloc with __GFP_REPEAT rather than open coded variant

fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc with
vmalloc fallback.  Use the kvmalloc variant instead.  Keep the
__GFP_REPEAT flag based on explanation from Eric:

 "At the time, tests on the hardware I had in my labs showed that
  vmalloc() could deliver pages spread all over the memory and that was
  a small penalty (once memory is fragmented enough, not at boot time)"

The way how the code is constructed means, however, that we prefer to go
and hit the OOM killer before we fall back to the vmalloc for requests
<=32kB (with 4kB pages) in the current code.  This is rather disruptive
for something that can be achived with the fallback.  On the other hand
__GFP_REPEAT doesn't have any useful semantic for these requests.  So
the effect of this patch is that requests which fit into 32kB will fall
back to vmalloc easier now.

Link: http://lkml.kernel.org/r/20170306103327.2766-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotreewide: use kv[mz]alloc* rather than opencoded variants
Michal Hocko [Mon, 8 May 2017 22:57:27 +0000 (15:57 -0700)]
treewide: use kv[mz]alloc* rather than opencoded variants

There are many code paths opencoding kvmalloc.  Let's use the helper
instead.  The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator.  E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation.  This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fallback to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously.  There is no guarantee something like that happens
though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.

Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs/xattr.c: zero out memory copied to userspace in getxattr
Michal Hocko [Mon, 8 May 2017 22:57:24 +0000 (15:57 -0700)]
fs/xattr.c: zero out memory copied to userspace in getxattr

getxattr uses vmalloc to allocate memory if kzalloc fails.  This is
filled by vfs_getxattr and then copied to the userspace.  vmalloc,
however, doesn't zero out the memory so if the specific implementation
of the xattr handler is sloppy we can theoretically expose a kernel
memory.  There is no real sign this is really the case but let's make
sure this will not happen and use vzalloc instead.

Fixes: 779302e67835 ("fs/xattr.c:getxattr(): improve handling of allocation failures")
Link: http://lkml.kernel.org/r/20170306103327.2766-1-mhocko@kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agonet/ipv6/ila/ila_xlat.c: simplify a strange allocation pattern
Michal Hocko [Mon, 8 May 2017 22:57:21 +0000 (15:57 -0700)]
net/ipv6/ila/ila_xlat.c: simplify a strange allocation pattern

alloc_ila_locks seemed to c&p from alloc_bucket_locks allocation pattern
which is quite unusual.  The default allocation size is 320 *
sizeof(spinlock_t) which is sub page unless lockdep is enabled when the
performance benefit is really questionable and not worth the subtle code
IMHO.  Also note that the context when we call ila_init_net (modprobe or
a task creating a net namespace) has to be properly configured.

Let's just simplify the code and use kvmalloc helper which is a
transparent way to use kmalloc with vmalloc fallback.

Link: http://lkml.kernel.org/r/20170306103032.2540-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolib/rhashtable.c: simplify a strange allocation pattern
Michal Hocko [Mon, 8 May 2017 22:57:18 +0000 (15:57 -0700)]
lib/rhashtable.c: simplify a strange allocation pattern

alloc_bucket_locks allocation pattern is quite unusual.  We are
preferring vmalloc when CONFIG_NUMA is enabled.  The rationale is that
vmalloc will respect the memory policy of the current process and so the
backing memory will get distributed over multiple nodes if the requester
is configured properly.  At least that is the intention, in reality
rhastable is shrunk and expanded from a kernel worker so no mempolicy
can be assumed.

Let's just simplify the code and use kvmalloc helper, which is a
transparent way to use kmalloc with vmalloc fallback, if the caller is
allowed to block and use the flag otherwise.

Link: http://lkml.kernel.org/r/20170306103032.2540-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>