5 years agoMerge tag 'md/4.12-rc2' of git://
Linus Torvalds [Thu, 18 May 2017 19:04:41 +0000 (12:04 -0700)]
Merge tag 'md/4.12-rc2' of git://git./linux/kernel/git/shli/md

Pull MD fixes from Shaohua Li:

 - Several bug fixes for raid5-cache from Song Liu, mainly handle
   journal disk error

 - Fix bad block handling in choosing raid1 disk from Tomasz Majchrzak

 - Simplify external metadata array sysfs handling from Artur

 - Optimize raid0 discard handling from me, now raid0 will dispatch
   large discard IO directly to underlayer disks.

* tag 'md/4.12-rc2' of git://
  raid1: prefer disk without bad blocks
  md/r5cache: handle sync with data in write back cache
  md/r5cache: gracefully handle journal device errors for writeback mode
  md/raid1/10: avoid unnecessary locking
  md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first
  md/md0: optimize raid0 discard handling
  md: don't return -EAGAIN in md_allow_write for external metadata arrays
  md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock

5 years agoMerge git://
Linus Torvalds [Thu, 18 May 2017 18:40:21 +0000 (11:40 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Don't allow negative TCP reordering values, from Soheil Hassas

 2) Don't overflow while parsing ipv6 header options, from Craig Gallek.

 3) Handle more cleanly the case where an individual route entry during
    a dump will not fit into the allocated netlink SKB, from David

 4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann.

 5) Allow neighbour updates to converge more quickly via gratuitous
    ARPs, from Ihar Hrachyshka.

 6) Fix compile error from CONFIG_INET is disabled, from Eric Dumazet.

 7) Fix use after free in x25 protocol init, from Lin Zhang.

 8) Valid VLAN pvid ranges passed into br_validate(), from Tobias

 9) NULL out address lists in child sockets in SCTP, this is similar to
    the fix we made for inet connection sockets last week. From Eric

10) Fix NULL deref in mlxsw driver, from Ido Schimmel.

* git:// (27 commits)
  mlxsw: spectrum: Avoid possible NULL pointer dereference
  sh_eth: Do not print an error message for probe deferral
  sh_eth: Use platform device for printing before register_netdev()
  mlxsw: spectrum_router: Fix rif counter freeing routine
  mlxsw: spectrum_dpipe: Fix incorrect entry index
  cxgb4: update latest firmware version supported
  qmi_wwan: add another Lenovo EM74xx device ID
  sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
  udp: make *udp*_queue_rcv_skb() functions static
  bridge: netlink: check vlan_default_pvid range
  net: ethernet: faraday: To support device tree usage.
  net: x25: fix one potential use-after-free issue
  bpf: adjust verifier heuristics
  ipv6: Check ip6_find_1stfragopt() return value properly.
  selftests/bpf: fix broken build due to types.h
  bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.
  bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
  net: fix compile error in skb_orphan_partial()
  ipv6: Prevent overrun when parsing v6 header options
  neighbour: update neigh timestamps iff update is effective

5 years agoMerge git://
Linus Torvalds [Thu, 18 May 2017 18:21:10 +0000 (11:21 -0700)]
Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
 "Three sparc bug fixes"

* git://
  sparc/ftrace: Fix ftrace graph time measurement
  sparc: Fix -Wstringop-overflow warning
  sparc64: Fix mapping of 64k pages with MAP_FIXED

5 years agoMerge tag 'kbuild-fixes-v4.12' of git://
Linus Torvalds [Thu, 18 May 2017 18:17:34 +0000 (11:17 -0700)]
Merge tag 'kbuild-fixes-v4.12' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fix from Masahiro Yamada:
 "Fix headers_install to not delete pre-existing headers in the install

* tag 'kbuild-fixes-v4.12' of git://
  kbuild: skip install/check of headers right under uapi directories

5 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 18 May 2017 17:04:42 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace

Pull pid namespace fixes from Eric Biederman:
 "These are two bugs that turn out to have simple fixes that were
  reported during the merge window. Both of these issues have existed
  for a while and it just happens that they both were reported at almost
  the same time"

* 'for-linus' of git://
  pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
  pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes

5 years agoMerge tag 'hwmon-for-linus-v4.12-rc2' of git://
Linus Torvalds [Thu, 18 May 2017 16:38:09 +0000 (09:38 -0700)]
Merge tag 'hwmon-for-linus-v4.12-rc2' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Fix problem with hotplug state machine in coretemp driver"

* tag 'hwmon-for-linus-v4.12-rc2' of git://
  hwmon: (coretemp) Handle frozen hotplug state correctly

5 years agomlxsw: spectrum: Avoid possible NULL pointer dereference
Ido Schimmel [Thu, 18 May 2017 11:03:52 +0000 (13:03 +0200)]
mlxsw: spectrum: Avoid possible NULL pointer dereference

In case we got an FDB notification for a port that doesn't exist we
execute an FDB entry delete to prevent it from re-appearing the next
time we poll for notifications.

If the operation failed we would trigger a NULL pointer dereference as
'mlxsw_sp_port' is NULL.

Fix it by reporting the error using the underlying bus device instead.

Fixes: 12f1501e7511 ("mlxsw: spectrum: remove FDB entry in case we get unknown object notification")
Signed-off-by: Ido Schimmel <>
Signed-off-by: Jiri Pirko <>
Signed-off-by: David S. Miller <>
5 years agosh_eth: Do not print an error message for probe deferral
Geert Uytterhoeven [Thu, 18 May 2017 13:01:35 +0000 (15:01 +0200)]
sh_eth: Do not print an error message for probe deferral

EPROBE_DEFER is not an error, hence printing an error message like

    sh-eth ee700000.ethernet: failed to initialise MDIO

may confuse the user.

To fix this, suppress the error message in case of probe deferral.
While at it, shorten the message, and add the actual error code.

Signed-off-by: Geert Uytterhoeven <>
Reviewed-by: Laurent Pinchart <>
Signed-off-by: David S. Miller <>
5 years agosh_eth: Use platform device for printing before register_netdev()
Geert Uytterhoeven [Thu, 18 May 2017 13:01:34 +0000 (15:01 +0200)]
sh_eth: Use platform device for printing before register_netdev()

The MDIO initialization failure message is printed using the network
device, before it has been registered, leading to:

     (null): failed to initialise MDIO

Use the platform device instead to fix this:

    sh-eth ee700000.ethernet: failed to initialise MDIO

Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
Signed-off-by: Geert Uytterhoeven <>
Reviewed-by: Laurent Pinchart <>
Signed-off-by: David S. Miller <>
5 years agoMerge branch 'mlxsw-fixes'
David S. Miller [Thu, 18 May 2017 15:04:00 +0000 (11:04 -0400)]
Merge branch 'mlxsw-fixes'

Jiri Pirko says:

mlxsw: couple of fixes

Couple of fixes from Arkadi

Signed-off-by: David S. Miller <>
5 years agomlxsw: spectrum_router: Fix rif counter freeing routine
Arkadi Sharshevsky [Thu, 18 May 2017 07:18:53 +0000 (09:18 +0200)]
mlxsw: spectrum_router: Fix rif counter freeing routine

During rif counter freeing the counter index can be invalid. Add check
of validity before freeing the counter.

Fixes: e0c0afd8aa4e ("mlxsw: spectrum: Support for counters on router interfaces")
Signed-off-by: Arkadi Sharshevsky <>
Signed-off-by: Jiri Pirko <>
Signed-off-by: David S. Miller <>
5 years agomlxsw: spectrum_dpipe: Fix incorrect entry index
Arkadi Sharshevsky [Thu, 18 May 2017 07:18:52 +0000 (09:18 +0200)]
mlxsw: spectrum_dpipe: Fix incorrect entry index

In case of disabled counters the entry index will be incorrect. Fix this
by moving the entry index set before the counter status check.

Fixes: 2ba5999f009d ("mlxsw: spectrum: Add Support for erif table entries access")
Signed-off-by: Arkadi Sharshevsky <>
Reviewed-by: Ido Schimmel <>
Signed-off-by: Jiri Pirko <>
Signed-off-by: David S. Miller <>
5 years agocxgb4: update latest firmware version supported
Ganesh Goudar [Wed, 17 May 2017 18:38:16 +0000 (00:08 +0530)]
cxgb4: update latest firmware version supported

Change t4fw_version.h to update latest firmware version
number to

Signed-off-by: Ganesh Goudar <>
Signed-off-by: David S. Miller <>
5 years agoqmi_wwan: add another Lenovo EM74xx device ID
Bjørn Mork [Wed, 17 May 2017 14:31:41 +0000 (16:31 +0200)]
qmi_wwan: add another Lenovo EM74xx device ID

In their infinite wisdom, and never ending quest for end user frustration,
Lenovo has decided to use a new USB device ID for the wwan modules in
their 2017 laptops.  The actual hardware is still the Sierra Wireless
EM7455 or EM7430, depending on region.

Signed-off-by: Bjørn Mork <>
Signed-off-by: David S. Miller <>
5 years agosctp: do not inherit ipv6_{mc|ac|fl}_list from parent
Eric Dumazet [Wed, 17 May 2017 14:16:40 +0000 (07:16 -0700)]
sctp: do not inherit ipv6_{mc|ac|fl}_list from parent

SCTP needs fixes similar to 83eaddab4378 ("ipv6/dccp: do not inherit
ipv6_mc_list from parent"), otherwise bad things can happen.

Signed-off-by: Eric Dumazet <>
Reported-by: Andrey Konovalov <>
Tested-by: Andrey Konovalov <>
Signed-off-by: David S. Miller <>
5 years agoudp: make *udp*_queue_rcv_skb() functions static
Paolo Abeni [Wed, 17 May 2017 12:52:16 +0000 (14:52 +0200)]
udp: make *udp*_queue_rcv_skb() functions static

Since the udp memory accounting refactor, we don't need any more
to export the *udp*_queue_rcv_skb(). Make them static and fix
a couple of sparse warnings:

net/ipv4/udp.c:1615:5: warning: symbol 'udp_queue_rcv_skb' was not
declared. Should it be static?
net/ipv6/udp.c:572:5: warning: symbol 'udpv6_queue_rcv_skb' was not
declared. Should it be static?

Fixes: 850cbaddb52d ("udp: use it's own memory accounting schema")
Fixes: c915fe13cbaa ("udplite: fix NULL pointer dereference")
Signed-off-by: Paolo Abeni <>
Signed-off-by: David S. Miller <>
5 years agobridge: netlink: check vlan_default_pvid range
Tobias Jungel [Wed, 17 May 2017 07:29:12 +0000 (09:29 +0200)]
bridge: netlink: check vlan_default_pvid range

Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
returns -EINVAL in case the pvid is out of bounds.

Reproduce by calling:

[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port vlan ids
bridge0  9999 PVID Egress Untagged

dummy0  9999 PVID Egress Untagged

Fixes: 0f963b7592ef ("bridge: netlink: add support for default_pvid")
Acked-by: Nikolay Aleksandrov <>
Signed-off-by: Tobias Jungel <>
Acked-by: Sabrina Dubroca <>
Signed-off-by: David S. Miller <>
5 years agonet: ethernet: faraday: To support device tree usage.
Greentime Hu [Wed, 17 May 2017 07:28:19 +0000 (15:28 +0800)]
net: ethernet: faraday: To support device tree usage.

To support device tree usage for ftmac100.

Signed-off-by: Greentime Hu <>
Signed-off-by: David S. Miller <>
5 years agonet: x25: fix one potential use-after-free issue
linzhang [Wed, 17 May 2017 04:05:07 +0000 (12:05 +0800)]
net: x25: fix one potential use-after-free issue

The function x25_init is not properly unregister related resources
on error handler.It is will result in kernel oops if x25_init init
failed, so add properly unregister call on error handler.

Also, i adjust the coding style and make x25_register_sysctl properly
return failure.

Signed-off-by: linzhang <>
Signed-off-by: David S. Miller <>
5 years agobpf: adjust verifier heuristics
Daniel Borkmann [Thu, 18 May 2017 01:00:06 +0000 (03:00 +0200)]
bpf: adjust verifier heuristics

Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.

This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.

This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.

Signed-off-by: Daniel Borkmann <>
Acked-by: Alexei Starovoitov <>
Signed-off-by: David S. Miller <>
5 years agoipv6: Check ip6_find_1stfragopt() return value properly.
David S. Miller [Thu, 18 May 2017 02:54:11 +0000 (22:54 -0400)]
ipv6: Check ip6_find_1stfragopt() return value properly.

Do not use unsigned variables to see if it returns a negative
error or not.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <>
Signed-off-by: David S. Miller <>
5 years agoselftests/bpf: fix broken build due to types.h
Yonghong Song [Wed, 17 May 2017 22:18:05 +0000 (15:18 -0700)]
selftests/bpf: fix broken build due to types.h

Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
because of some missing types:
    $ make -C tools/testing/selftests/bpf/
    In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
    ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
                    __aligned_u64   key;
    /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
    static __always_inline __u16 __swab16p(const __u16 *p)
The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.

The fix is to copy missing type definition into
Adding additional include "string.h" resolves __always_inline issue.

Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf selftests.")
Signed-off-by: Yonghong Song <>
Signed-off-by: David S. Miller <>
5 years agoMerge tag 'for-4.12/dm-fixes-2' of git://
Linus Torvalds [Wed, 17 May 2017 21:21:15 +0000 (14:21 -0700)]
Merge tag 'for-4.12/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - a couple DM thin provisioning fixes

 - a few request-based DM and DM multipath fixes for issues that were
   made when merging Christoph's changes with Bart's changes for 4.12

 - a DM bufio unsigned overflow fix

 - a couple pure fixes for the DM cache target.

 - various very small tweaks to the DM cache target that enable
   considerable speed improvements in the face of continuous IO. Given
   that the cache target was significantly reworked for 4.12 I see no
   reason to sit on these advances until 4.13 considering the favorable
   results associated with such minimalist tweaks.

* tag 'for-4.12/dm-fixes-2' of git://
  dm cache: handle kmalloc failure allocating background_tracker struct
  dm bufio: make the parameter "retain_bytes" unsigned long
  dm mpath: multipath_clone_and_map must not return -EIO
  dm mpath: don't return -EIO from dm_report_EIO
  dm rq: add a missing break to map_request
  dm space map disk: fix some book keeping in the disk space map
  dm thin metadata: call precommit before saving the roots
  dm cache policy smq: don't do any writebacks unless IDLE
  dm cache: simplify the IDLE vs BUSY state calculation
  dm cache: track all IO to the cache rather than just the origin device's IO
  dm cache policy smq: stop preemptively demoting blocks
  dm cache policy smq: put newly promoted entries at the top of the multiqueue
  dm cache policy smq: be more aggressive about triggering a writeback
  dm cache policy smq: only demote entries in bottom half of the clean multiqueue
  dm cache: fix incorrect 'idle_time' reset in IO tracker

5 years agoMerge branch 'i2c/for-current' of git://
Linus Torvalds [Wed, 17 May 2017 21:13:44 +0000 (14:13 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Here are some bugfixes from I2C, especially removing a wrongly
  displayed error message for all i2c muxes"

* 'i2c/for-current' of git://
  i2c: xgene: Set ACPI_COMPANION_I2C
  i2c: mv64xxx: don't override deferred probing when getting irq
  i2c: mux: only print failure message on error
  i2c: mux: reg: rename label to indicate what it does
  i2c: mux: reg: put away the parent i2c adapter on probe failure

5 years agoMerge branch 'bnxt_en-DCBX-fixes'
David S. Miller [Wed, 17 May 2017 19:12:50 +0000 (15:12 -0400)]
Merge branch 'bnxt_en-DCBX-fixes'

Michael Chan says:

bnxt_en: DCBX fixes.

2 bug fixes for the case where the NIC's firmware DCBX agent is enabled.
With these fixes, we will return the proper information to lldpad.

Signed-off-by: David S. Miller <>
5 years agobnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.
Michael Chan [Tue, 16 May 2017 20:39:44 +0000 (16:39 -0400)]
bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.

Otherwise, all the host based DCBX settings from lldpad will fail if the
firmware DCBX agent is running.

Signed-off-by: Michael Chan <>
Signed-off-by: David S. Miller <>
5 years agobnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
Michael Chan [Tue, 16 May 2017 20:39:43 +0000 (16:39 -0400)]
bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.

In the current code, bnxt_dcb_init() is called too early before we
determine if the firmware DCBX agent is running or not.  As a result,
we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED
flags properly to report to DCBNL.

Signed-off-by: Michael Chan <>
Signed-off-by: David S. Miller <>
5 years agonet: fix compile error in skb_orphan_partial()
Eric Dumazet [Tue, 16 May 2017 20:27:53 +0000 (13:27 -0700)]
net: fix compile error in skb_orphan_partial()

If CONFIG_INET is not set, net/core/sock.c can not compile :

net/core/sock.c: In function ‘skb_orphan_partial’:
net/core/sock.c:1810:2: error: implicit declaration of function
‘skb_is_tcp_pure_ack’ [-Werror=implicit-function-declaration]
  if (skb_is_tcp_pure_ack(skb))

Fix this by always including <net/tcp.h>

Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
Signed-off-by: Eric Dumazet <>
Reported-by: Paul Gortmaker <>
Reported-by: Randy Dunlap <>
Reported-by: Stephen Rothwell <>
Signed-off-by: David S. Miller <>
5 years agosparc/ftrace: Fix ftrace graph time measurement
Liam R. Howlett [Wed, 17 May 2017 15:47:00 +0000 (11:47 -0400)]
sparc/ftrace: Fix ftrace graph time measurement

The ftrace function_graph time measurements of a given function is not
accurate according to those recorded by ftrace using the function
filters.  This change pulls the x86_64 fix from 'commit 722b3c746953
("ftrace/graph: Trace function entry before updating index")' into the
sparc specific prepare_ftrace_return which stops ftrace from
counting interrupted tasks in the time measurement.

Example measurements for select_task_rq_fair running "hackbench 100
process 1000":

              |  tracing/trace_stat/function0  |  function_graph
 Before patch |  2.802 us                      |  4.255 us
 After patch  |  2.749 us                      |  3.094 us

Signed-off-by: Liam R. Howlett <>
Signed-off-by: David S. Miller <>
5 years agosparc: Fix -Wstringop-overflow warning
Orlando Arias [Tue, 16 May 2017 19:34:00 +0000 (15:34 -0400)]
sparc: Fix -Wstringop-overflow warning


GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
in calls to string handling functions [1][2]. Due to the way
``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
causes a warning to trigger at compile time in the function mem_init(),
which is subsequently converted to an error. The ensuing patch fixes
this issue and aligns the declaration of empty_zero_page to that of
other architectures. Thank you.



Signed-off-by: Orlando Arias <>
Signed-off-by: David S. Miller <>
5 years agosparc64: Fix mapping of 64k pages with MAP_FIXED
Nitin Gupta [Mon, 15 May 2017 23:28:17 +0000 (16:28 -0700)]
sparc64: Fix mapping of 64k pages with MAP_FIXED

An incorrect huge page alignment check caused
mmap failure for 64K pages when MAP_FIXED is used
with address not aligned to HPAGE_SIZE.

Orabug: 25885991

Fixes: dcd1912d21a0 ("sparc64: Add 64K page size support")
Signed-off-by: Nitin Gupta <>
Signed-off-by: David S. Miller <>
5 years agoipv6: Prevent overrun when parsing v6 header options
Craig Gallek [Tue, 16 May 2017 18:36:23 +0000 (14:36 -0400)]
ipv6: Prevent overrun when parsing v6 header options

The KASAN warning repoted below was discovered with a syzkaller
program.  The reproducer is basically:
  int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
  send(s, &one_byte_of_data, 1, MSG_MORE);
  send(s, &more_than_mtu_bytes_data, 2000, 0);

The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.

The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option.  Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.

This fix makes ip6_find_1stfrag return an error if it detects
running out-of-bounds.

[   42.361487] ==================================================================
[   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
[   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
[   42.366469]
[   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
[   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[   42.368824] Call Trace:
[   42.369183]  dump_stack+0xb3/0x10b
[   42.369664]  print_address_description+0x73/0x290
[   42.370325]  kasan_report+0x252/0x370
[   42.370839]  ? ip6_fragment+0x11c8/0x3730
[   42.371396]  check_memory_region+0x13c/0x1a0
[   42.371978]  memcpy+0x23/0x50
[   42.372395]  ip6_fragment+0x11c8/0x3730
[   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
[   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
[   42.374263]  ? ip6_forward+0x2e30/0x2e30
[   42.374803]  ip6_finish_output+0x584/0x990
[   42.375350]  ip6_output+0x1b7/0x690
[   42.375836]  ? ip6_finish_output+0x990/0x990
[   42.376411]  ? ip6_fragment+0x3730/0x3730
[   42.376968]  ip6_local_out+0x95/0x160
[   42.377471]  ip6_send_skb+0xa1/0x330
[   42.377969]  ip6_push_pending_frames+0xb3/0xe0
[   42.378589]  rawv6_sendmsg+0x2051/0x2db0
[   42.379129]  ? rawv6_bind+0x8b0/0x8b0
[   42.379633]  ? _copy_from_user+0x84/0xe0
[   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
[   42.380878]  ? ___sys_sendmsg+0x162/0x930
[   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
[   42.382074]  ? sock_has_perm+0x1f6/0x290
[   42.382614]  ? ___sys_sendmsg+0x167/0x930
[   42.383173]  ? lock_downgrade+0x660/0x660
[   42.383727]  inet_sendmsg+0x123/0x500
[   42.384226]  ? inet_sendmsg+0x123/0x500
[   42.384748]  ? inet_recvmsg+0x540/0x540
[   42.385263]  sock_sendmsg+0xca/0x110
[   42.385758]  SYSC_sendto+0x217/0x380
[   42.386249]  ? SYSC_connect+0x310/0x310
[   42.386783]  ? __might_fault+0x110/0x1d0
[   42.387324]  ? lock_downgrade+0x660/0x660
[   42.387880]  ? __fget_light+0xa1/0x1f0
[   42.388403]  ? __fdget+0x18/0x20
[   42.388851]  ? sock_common_setsockopt+0x95/0xd0
[   42.389472]  ? SyS_setsockopt+0x17f/0x260
[   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
[   42.390650]  SyS_sendto+0x40/0x50
[   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.391731] RIP: 0033:0x7fbbb711e383
[   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
[   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
[   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
[   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
[   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
[   42.397257]
[   42.397411] Allocated by task 3789:
[   42.397702]  save_stack_trace+0x16/0x20
[   42.398005]  save_stack+0x46/0xd0
[   42.398267]  kasan_kmalloc+0xad/0xe0
[   42.398548]  kasan_slab_alloc+0x12/0x20
[   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
[   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
[   42.399654]  __alloc_skb+0xf8/0x580
[   42.400003]  sock_wmalloc+0xab/0xf0
[   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
[   42.400813]  ip6_append_data+0x1a8/0x2f0
[   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
[   42.401505]  inet_sendmsg+0x123/0x500
[   42.401860]  sock_sendmsg+0xca/0x110
[   42.402209]  ___sys_sendmsg+0x7cb/0x930
[   42.402582]  __sys_sendmsg+0xd9/0x190
[   42.402941]  SyS_sendmsg+0x2d/0x50
[   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.403718]
[   42.403871] Freed by task 1794:
[   42.404146]  save_stack_trace+0x16/0x20
[   42.404515]  save_stack+0x46/0xd0
[   42.404827]  kasan_slab_free+0x72/0xc0
[   42.405167]  kfree+0xe8/0x2b0
[   42.405462]  skb_free_head+0x74/0xb0
[   42.405806]  skb_release_data+0x30e/0x3a0
[   42.406198]  skb_release_all+0x4a/0x60
[   42.406563]  consume_skb+0x113/0x2e0
[   42.406910]  skb_free_datagram+0x1a/0xe0
[   42.407288]  netlink_recvmsg+0x60d/0xe40
[   42.407667]  sock_recvmsg+0xd7/0x110
[   42.408022]  ___sys_recvmsg+0x25c/0x580
[   42.408395]  __sys_recvmsg+0xd6/0x190
[   42.408753]  SyS_recvmsg+0x2d/0x50
[   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.409513]
[   42.409665] The buggy address belongs to the object at ffff88000969e780
[   42.409665]  which belongs to the cache kmalloc-512 of size 512
[   42.410846] The buggy address is located 24 bytes inside of
[   42.410846]  512-byte region [ffff88000969e780ffff88000969e980)
[   42.411941] The buggy address belongs to the page:
[   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   42.413298] flags: 0x100000000008100(slab|head)
[   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
[   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
[   42.415074] page dumped because: kasan: bad access detected
[   42.415604]
[   42.415757] Memory state around the buggy address:
[   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   42.418273]                    ^
[   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419882] ==================================================================

Reported-by: Andrey Konovalov <>
Signed-off-by: Craig Gallek <>
Signed-off-by: David S. Miller <>
5 years agokbuild: skip install/check of headers right under uapi directories
Masahiro Yamada [Tue, 16 May 2017 05:15:03 +0000 (14:15 +0900)]
kbuild: skip install/check of headers right under uapi directories

Since commit 61562f981e92 ("uapi: export all arch specifics
directories"), "make INSTALL_HDR_PATH=$root/usr headers_install"
deletes standard glibc headers and others in $(root)/usr/include.

The cause of the issue is that headers_install now starts descending
from arch/$(hdr-arch)/include/uapi with $(root)/usr/include for its
destination when installing asm headers.  So, headers already there
are assumed to be unwanted.

When headers_install starts descending from include/uapi with
$(root)/usr/include for its destination, it works around the problem
by creating an dummy destination $(root)/usr/include/uapi, but this
is tricky.

To fix the problem in a clean way is to skip headers install/check
in include/uapi and arch/$(hdr-arch)/include/uapi because we know
there are only sub-directories in uapi directories.  A good side
effect is the empty destination $(root)/usr/include/uapi will go

I am also removing the trailing slash in the headers_check target to
skip checking in arch/$(hdr-arch)/include/uapi.

Fixes: 61562f981e92 ("uapi: export all arch specifics directories")
Reported-by: Dan Williams <>
Signed-off-by: Masahiro Yamada <>
Tested-by: Dan Williams <>
Acked-by: Nicolas Dichtel <>
5 years agoneighbour: update neigh timestamps iff update is effective
Ihar Hrachyshka [Tue, 16 May 2017 15:44:24 +0000 (08:44 -0700)]
neighbour: update neigh timestamps iff update is effective

It's a common practice to send gratuitous ARPs after moving an
IP address to another device to speed up healing of a service. To
fulfill service availability constraints, the timing of network peers
updating their caches to point to a new location of an IP address can be
particularly important.

Sometimes neigh_update calls won't touch neither lladdr nor state, for
example if an update arrives in locktime interval. The neigh->updated
value is tested by the protocol specific neigh code, which in turn
will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
call to neigh_update() or not. As a result, we may effectively ignore
the update request, bailing out of touching the neigh entry, except that
we still bump its timestamps inside neigh_update.

This may be a problem for updates arriving in quick succession. For
example, consider the following scenario:

A service is moved to another device with its IP address. The new device
sends three gratuitous ARP requests into the network with ~1 seconds
interval between them. Just before the first request arrives to one of
network peer nodes, its neigh entry for the IP address transitions from
STALE to DELAY.  This transition, among other things, updates
neigh->updated. Once the kernel receives the first gratuitous ARP, it
ignores it because its arrival time is inside the locktime interval. The
kernel still bumps neigh->updated. Then the second gratuitous ARP
request arrives, and it's also ignored because it's still in the (new)
locktime interval. Same happens for the third request. The node
eventually heals itself (after delay_first_probe_time seconds since the
initial transition to DELAY state), but it just wasted some time and
require a new ARP request/reply round trip. This unfortunate behaviour
both puts more load on the network, as well as reduces service

This patch changes neigh_update so that it bumps neigh->updated (as well
as neigh->confirmed) only once we are sure that either lladdr or entry
state will change). In the scenario described above, it means that the
second gratuitous ARP request will actually update the entry lladdr.

Ideally, we would update the neigh entry on the very first gratuitous
ARP request. The locktime mechanism is designed to ignore ARP updates in
a short timeframe after a previous ARP update was honoured by the kernel
layer. This would require tracking timestamps for state transitions
separately from timestamps when actual updates are received. This would
probably involve changes in neighbour struct. Therefore, the patch
doesn't tackle the issue of the first gratuitous APR ignored, leaving
it for a follow-up.

Signed-off-by: Ihar Hrachyshka <>
Signed-off-by: David S. Miller <>
5 years agoarp: honour gratuitous ARP _replies_
Ihar Hrachyshka [Tue, 16 May 2017 14:53:43 +0000 (07:53 -0700)]
arp: honour gratuitous ARP _replies_

When arp_accept is 1, gratuitous ARPs are supposed to override matching
entries irrespective of whether they arrive during locktime. This was
implemented in commit 56022a8fdd87 ("ipv4: arp: update neighbour address
when a gratuitous arp is received and arp_accept is set")

There is a glitch in the patch though. RFC 2002, section 4.6, "ARP,
Proxy ARP, and Gratuitous ARP", defines gratuitous ARPs so that they can
be either of Request or Reply type. Those Reply gratuitous ARPs can be
triggered with standard tooling, for example, arping -A option does just

This patch fixes the glitch, making both Request and Reply flavours of
gratuitous ARPs to behave identically.

As per RFC, if gratuitous ARPs are of Reply type, their Target Hardware
Address field should also be set to the link-layer address to which this
cache entry should be updated. The field is present in ARP over Ethernet
but not in IEEE 1394. In this patch, I don't consider any broadcasted
ARP replies as gratuitous if the field is not present, to conform the
standard. It's not clear whether there is such a thing for IEEE 1394 as
a gratuitous ARP reply; until it's cleared up, we will ignore such
broadcasts. Note that they will still update existing ARP cache entries,
assuming they arrive out of locktime time interval.

Signed-off-by: Ihar Hrachyshka <>
Signed-off-by: David S. Miller <>
5 years agodm cache: handle kmalloc failure allocating background_tracker struct
Colin Ian King [Sat, 11 Mar 2017 19:09:45 +0000 (19:09 +0000)]
dm cache: handle kmalloc failure allocating background_tracker struct

Currently there is no kmalloc failure check on the allocation of
the background_tracker struct in btracker_create(), and so a NULL return
will lead to a NULL pointer dereference.  Add a NULL check.

Detected by CoverityScan, CID#1416587 ("Dereference null return value")

Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2")
Signed-off-by: Colin Ian King <>
Signed-off-by: Mike Snitzer <>
5 years agoi2c: xgene: Set ACPI_COMPANION_I2C
Tin Huynh [Wed, 17 May 2017 04:25:34 +0000 (11:25 +0700)]
i2c: xgene: Set ACPI_COMPANION_I2C

With ACPI, i2c-core requires ACPI companion to be set in order for it
to create slave device.
This patch sets the ACPI companion accordingly.

Signed-off-by: Tin Huynh <>
Signed-off-by: Wolfram Sang <>
5 years agoi2c: mv64xxx: don't override deferred probing when getting irq
Thomas Petazzoni [Tue, 16 May 2017 12:07:24 +0000 (14:07 +0200)]
i2c: mv64xxx: don't override deferred probing when getting irq

There is no reason to use platform_get_irq() for non-DT probing and
irq_of_parse_and_map() for DT probing. Indeed, platform_get_irq()
works fine for both.

In addition, using platform_get_irq() properly returns -EPROBE_DEFER
when the interrupt controller is not yet available, so instead of
inventing our own error code (-ENXIO), return the one provided by

Signed-off-by: Thomas Petazzoni <>
Signed-off-by: Wolfram Sang <>
5 years agoMerge tag 'pstore-v4.12-rc2' of git://
Linus Torvalds [Tue, 16 May 2017 20:29:07 +0000 (13:29 -0700)]
Merge tag 'pstore-v4.12-rc2' of git://git./linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
 "Fix bad EFI vars iterator usage"

* tag 'pstore-v4.12-rc2' of git://
  efi-pstore: Fix read iter after pstore API refactor

5 years agomlx5e: add CONFIG_INET dependency
Arnd Bergmann [Tue, 16 May 2017 11:27:49 +0000 (13:27 +0200)]
mlx5e: add CONFIG_INET dependency

We now reference the arp_tbl, which requires IPv4 support to be
enabled in the kernel, otherwise we get a link error:

drivers/net/built-in.o: In function `mlx5e_tc_update_neigh_used_value':
(.text+0x16afec): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_neigh_init':
en_rep.c:(.text+0x16c16d): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_netevent_event':
en_rep.c:(.text+0x16cbb5): undefined reference to `arp_tbl'

This adds a Kconfig dependency for it.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Arnd Bergmann <>
Signed-off-by: David S. Miller <>
5 years agodm bufio: make the parameter "retain_bytes" unsigned long
Mikulas Patocka [Sun, 30 Apr 2017 21:32:28 +0000 (17:32 -0400)]
dm bufio: make the parameter "retain_bytes" unsigned long

Change the type of the parameter "retain_bytes" from unsigned to
unsigned long, so that on 64-bit machines the user can set more than
4GiB of data to be retained.

Also, change the type of the variable "count" in the function
"__evict_old_buffers" to unsigned long.  The assignment
"count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
could result in unsigned long to unsigned overflow and that could result
in buffers not being freed when they should.

While at it, avoid division in get_retain_buffers().  Division is slow,
we can change it to shift because we have precalculated the log2 of
block size.

Signed-off-by: Mikulas Patocka <>
Signed-off-by: Mike Snitzer <>
5 years agonet: Improve handling of failures on link and route dumps
David Ahern [Tue, 16 May 2017 06:19:17 +0000 (23:19 -0700)]
net: Improve handling of failures on link and route dumps

In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.

netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is <= 0 so userspace is notified of the failure. The missing
piece is the rtnetlink dump functions returning the error.

Fix route and link dump functions to return the errors if no object is
added to an skb (detected by skb->len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be ajusted as well.

Reported-by: Jan Moskyto Matejka <>
Signed-off-by: David Ahern <>
Signed-off-by: David S. Miller <>
5 years agonet/smc: Add warning about remote memory exposure
Christoph Hellwig [Tue, 16 May 2017 06:51:38 +0000 (09:51 +0300)]
net/smc: Add warning about remote memory exposure

The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.

Signed-off-by: Christoph Hellwig <>
Signed-off-by: Leon Romanovsky <>
Acked-by: Ursula Braun <>
Signed-off-by: David S. Miller <>
5 years agosmc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY
Ursula Braun [Mon, 15 May 2017 15:33:37 +0000 (17:33 +0200)]
smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY

Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <>
Signed-off-by: David S. Miller <>
5 years agoefi-pstore: Fix read iter after pstore API refactor
Kees Cook [Fri, 12 May 2017 21:52:34 +0000 (14:52 -0700)]
efi-pstore: Fix read iter after pstore API refactor

During the internal pstore API refactoring, the EFI vars read entry was
accidentally made to update a stack variable instead of the pstore
private data pointer. This corrects the problem (and removes the now
needless argument).

Fixes: 125cc42baf8a ("pstore: Replace arguments for read() API")
Signed-off-by: Kees Cook <>
5 years agoMerge branch 'i2c-mux/for-current' of into i2c...
Wolfram Sang [Tue, 16 May 2017 16:57:39 +0000 (18:57 +0200)]
Merge branch 'i2c-mux/for-current' of into i2c/for-current

Pull bugfixes from the i2c mux subsubsystem:

This fixes an old bug in resource cleanup on failure in i2c-mux-reg and
a new log spamming bug from this merge window in the i2c-mux core.

5 years agoipmr: vrf: Find VIFs using the actual device
Thomas Winter [Mon, 15 May 2017 22:14:44 +0000 (10:14 +1200)]
ipmr: vrf: Find VIFs using the actual device

The skb->dev that is passed into ip_mr_input is
the loX device for VRFs. When we lookup a vif
for this dev, none is found as we do not create
vifs for loopbacks. Instead lookup a vif for the
actual device that the packet was received on,
eg the vlan.

Signed-off-by: Thomas Winter <>
cc: David Ahern <>
cc: Nikolay Aleksandrov <>
cc: roopa <>
Acked-by: David Ahern <>
Signed-off-by: David S. Miller <>
5 years agotcp: eliminate negative reordering in tcp_clean_rtx_queue
Soheil Hassas Yeganeh [Mon, 15 May 2017 21:05:47 +0000 (17:05 -0400)]
tcp: eliminate negative reordering in tcp_clean_rtx_queue

tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than

Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.

Fixes: c7caf8d3ed7a ("[TCP]: Fix reord detection due to snd_una covered holes")
Reported-by: Rebecca Isaacs <>
Signed-off-by: Soheil Hassas Yeganeh <>
Signed-off-by: Neal Cardwell <>
Signed-off-by: Yuchung Cheng <>
Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
5 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Tue, 16 May 2017 16:24:44 +0000 (09:24 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

 - convert the debug feature to refcount_t

 - reduce the copy size for strncpy_from_user

 - 8 bug fixes

* 'for-linus' of git://
  s390/virtio: change virtio_feature_desc:features type to __le32
  s390: convert debug_info.ref_count from atomic_t to refcount_t
  s390: move _text symbol to address higher than zero
  s390/qdio: increase string buffer size
  s390/ccwgroup: increase string buffer size
  s390/topology: let topology_mnest_limit() return unsigned char
  s390/uaccess: use sane length for __strncpy_from_user()
  s390/uprobes: fix compile for !KPROBES
  s390/ftrace: fix compile for !MODULES
  s390/cputime: fix incorrect system time

5 years agoMerge tag 'edac_fix_for_4.12' of git://
Linus Torvalds [Tue, 16 May 2017 16:18:18 +0000 (09:18 -0700)]
Merge tag 'edac_fix_for_4.12' of git://git./linux/kernel/git/bp/bp

Pull EDAC fix from Borislav Petkov:
 "A single amd64_edac fix correcting chip select sizes reporting on

* tag 'edac_fix_for_4.12' of git://
  EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h

5 years agoMerge git://
Linus Torvalds [Mon, 15 May 2017 22:50:49 +0000 (15:50 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Track alignment in BPF verifier so that legitimate programs won't be
    rejected on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.

 2) Make tail calls work properly in arm64 BPF JIT, from Deniel

 3) Make the configuration and semantics Generic XDP make more sense and
    don't allow both generic XDP and a driver specific instance to be
    active at the same time. Also from Daniel.

 4) Don't crash on resume in xen-netfront, from Vitaly Kuznetsov.

 5) Fix use-after-free in VRF driver, from Gao Feng.

 6) Use netdev_alloc_skb_ip_align() to avoid unaligned IP headers in
    qca_spi driver, from Stefan Wahren.

 7) Always run cleanup routines in BPF samples when we get SIGTERM, from
    Andy Gospodarek.

 8) The mdio phy code should bring PHYs out of reset using the shared
    GPIO lines before invoking bus->reset(). From Florian Fainelli.

 9) Some USB descriptor access endian fixes in various drivers from
    Johan Hovold.

10) Handle PAUSE advertisements properly in mlx5 driver, from Gal

11) Fix reversed test in mlx5e_setup_tc(), from Saeed Mahameed.

12) Cure netdev leak in AF_PACKET when using timestamping via control
    messages. From Douglas Caetano dos Santos.

13) netcp doesn't support HWTSTAMP_FILTER_ALl, reject it. From Miroslav

* git:// (52 commits)
  ldmvsw: stop the clean timer at beginning of remove
  ldmvsw: unregistering netdev before disable hardware
  net: netcp: fix check of requested timestamping filter
  ipv6: avoid dad-failures for addresses with NODAD
  qed: Fix uninitialized data in aRFS infrastructure
  mdio: mux: fix device_node_continue.cocci warnings
  net/packet: fix missing net_device reference release
  net/mlx4_core: Use min3 to select number of MSI-X vectors
  macvlan: Fix performance issues with vlan tagged packets
  net: stmmac: use correct pointer when printing normal descriptor ring
  net/mlx5: Use underlay QPN from the root name space
  net/mlx5e: IPoIB, Only support regular RQ for now
  net/mlx5e: Fix setup TC ndo
  net/mlx5e: Fix ethtool pause support and advertise reporting
  net/mlx5e: Use the correct pause values for ethtool advertising
  vmxnet3: ensure that adapter is in proper state during force_close
  sfc: revert changes to NIC revision numbers
  net: ch9200: add missing USB-descriptor endianness conversions
  net: irda: irda-usb: fix firmware name on big-endian hosts
  net: dsa: mv88e6xxx: add default case to switch

5 years agoMerge branch 'for-next' of git://
Linus Torvalds [Mon, 15 May 2017 22:27:02 +0000 (15:27 -0700)]
Merge branch 'for-next' of git://

Pull cifs fixes from Steve French:
 "A set of minor cifs fixes"

* 'for-next' of git://
  [CIFS] Minor cleanup of xattr query function
  fs: cifs: transport: Use time_after for time comparison
  SMB2: Fix share type handling
  cifs: cifsacl: Use a temporary ops variable to reduce code length
  Don't delay freeing mids when blocked on slow socket write of request
  CIFS: silence lockdep splat in cifs_relock_file()

5 years agoMerge branch 'ldmsw-fixes'
David S. Miller [Mon, 15 May 2017 19:36:09 +0000 (15:36 -0400)]
Merge branch 'ldmsw-fixes'

Shannon Nelson says:

ldmvsw: port removal stability

Under heavy reboot stress testing we found a couple of timing issues
when removing the device that could cause the kernel great heartburn,
addressed by these two patches.

Signed-off-by: David S. Miller <>
5 years agoldmvsw: stop the clean timer at beginning of remove
Shannon Nelson [Mon, 15 May 2017 17:51:08 +0000 (10:51 -0700)]
ldmvsw: stop the clean timer at beginning of remove

Stop the clean timer earlier to be sure there's no asynchronous
interference while stopping the port.

Orabug: 25748241

Signed-off-by: Shannon Nelson <>
Signed-off-by: David S. Miller <>
5 years agoldmvsw: unregistering netdev before disable hardware
Thomas Tai [Mon, 15 May 2017 17:51:07 +0000 (10:51 -0700)]
ldmvsw: unregistering netdev before disable hardware

When running LDom binding/unbinding test, kernel may panic
in ldmvsw_open(). It is more likely that because we're removing
the ldc connection before unregistering the netdev in vsw_port_remove(),
we set up a window of time where one process could be removing the
device while another trying to UP the device. This also sometimes causes
vio handshake error due to opening a device without closing it completely.
We should unregister the netdev before we disable the "hardware".

Orabug: 2598091325925306

Signed-off-by: Thomas Tai <>
Signed-off-by: Shannon Nelson <>
Signed-off-by: David S. Miller <>
5 years agonet: netcp: fix check of requested timestamping filter
Miroslav Lichvar [Mon, 15 May 2017 14:04:36 +0000 (16:04 +0200)]
net: netcp: fix check of requested timestamping filter

The driver doesn't support timestamping of all received packets and
should return error when trying to enable the HWTSTAMP_FILTER_ALL

Cc: WingMan Kwok <>
Cc: Richard Cochran <>
Signed-off-by: Miroslav Lichvar <>
Acked-by: Richard Cochran <>
Signed-off-by: David S. Miller <>
5 years agodm mpath: multipath_clone_and_map must not return -EIO
Christoph Hellwig [Mon, 15 May 2017 15:28:38 +0000 (17:28 +0200)]
dm mpath: multipath_clone_and_map must not return -EIO

Since 412445ac ("dm: introduce a new DM_MAPIO_KILL return value"), the
clone_and_map_rq methods must not return errno values, so fix it up
to properly return DM_MAPIO_KILL, instead of the -EIO value that snuck
in due to a conflict between two patches.

Signed-off-by: Christoph Hellwig <>
Signed-off-by: Mike Snitzer <>
5 years agodm mpath: don't return -EIO from dm_report_EIO
Christoph Hellwig [Mon, 15 May 2017 15:28:37 +0000 (17:28 +0200)]
dm mpath: don't return -EIO from dm_report_EIO

Instead just turn the macro into a helper for the warning message.
This removes an unnecessary assignment and will allow the next commit to
fix a place where -EIO is the wrong return value.

Signed-off-by: Christoph Hellwig <>
Signed-off-by: Mike Snitzer <>
5 years agodm rq: add a missing break to map_request
Christoph Hellwig [Mon, 15 May 2017 15:28:36 +0000 (17:28 +0200)]
dm rq: add a missing break to map_request

We don't want to bug when receiving a DM_MAPIO_KILL value..

Fixes: 412445ac ("dm: introduce a new DM_MAPIO_KILL return value")
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Mike Snitzer <>
5 years agodm space map disk: fix some book keeping in the disk space map
Joe Thornber [Mon, 15 May 2017 13:45:40 +0000 (09:45 -0400)]
dm space map disk: fix some book keeping in the disk space map

When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm thin metadata: call precommit before saving the roots
Joe Thornber [Mon, 15 May 2017 13:43:05 +0000 (09:43 -0400)]
dm thin metadata: call precommit before saving the roots

These calls were the wrong way round in __write_initial_superblock.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agoMerge tag 'mlx5-fixes-2017-05-12-V2' of git://
David S. Miller [Mon, 15 May 2017 18:38:04 +0000 (14:38 -0400)]
Merge tag 'mlx5-fixes-2017-05-12-V2' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

Mellanox, mlx5 fixes 2017-05-12

This series contains some mlx5 fixes for net.
Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix ethtool pause support and advertise reporting") kernels >= 4.8
("net/mlx5e: Use the correct pause values for ethtool advertising") kernels >= 4.8

 Dropped statistics spinlock patch, it needs some extra work.

Signed-off-by: David S. Miller <>
5 years agoipv6: avoid dad-failures for addresses with NODAD
Mahesh Bandewar [Sat, 13 May 2017 00:03:39 +0000 (17:03 -0700)]
ipv6: avoid dad-failures for addresses with NODAD

Every address gets added with TENTATIVE flag even for the addresses with
IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process
we realize it's an address with NODAD and complete the process without
sending any probe. However the TENTATIVE flags stays on the
address for sometime enough to cause misinterpretation when we receive a NS.
While processing NS, if the address has TENTATIVE flag, we mark it DADFAILED
and endup with an address that was originally configured as NODAD with

We can't avoid scheduling dad_work for addresses with NODAD but we can
avoid adding TENTATIVE flag to avoid this racy situation.

Signed-off-by: Mahesh Bandewar <>
Acked-by: David Ahern <>
Signed-off-by: David S. Miller <>
5 years agoqed: Fix uninitialized data in aRFS infrastructure
Mintz, Yuval [Sun, 14 May 2017 09:21:23 +0000 (12:21 +0300)]
qed: Fix uninitialized data in aRFS infrastructure

Current memset is using incorrect type of variable, causing the
upper-half of the strucutre to be left uninitialized and causing:

  ethernet/qlogic/qed/qed_init_fw_funcs.c: In function 'qed_set_rfs_mode_disable':
  ethernet/qlogic/qed/qed_init_fw_funcs.c:993:3: error: '*((void *)&ramline+4)' is used uninitialized in this function [-Werror=uninitialized]

Fixes: d51e4af5c209 ("qed: aRFS infrastructure support")
Reported-by: Arnd Bergmann <>
Signed-off-by: Yuval Mintz <>
Reviewed-by: Arnd Bergmann <>
Signed-off-by: David S. Miller <>
5 years agomdio: mux: fix device_node_continue.cocci warnings
Julia Lawall [Fri, 12 May 2017 14:54:23 +0000 (22:54 +0800)]
mdio: mux: fix device_node_continue.cocci warnings

Device node iterators put the previous value of the index variable, so an
explicit put causes a double put.

In particular, of_mdiobus_register can fail before doing anything
interesting, so one could view it as a no-op from the reference count
point of view.

Generated by: scripts/coccinelle/iterators/device_node_continue.cocci

CC: Jon Mason <>
Signed-off-by: Julia Lawall <>
Signed-off-by: Fengguang Wu <>
Signed-off-by: David S. Miller <>
5 years agonet/packet: fix missing net_device reference release
Douglas Caetano dos Santos [Fri, 12 May 2017 18:19:15 +0000 (15:19 -0300)]
net/packet: fix missing net_device reference release

When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not

Fixes c14ac9451c348 ("sock: enable timestamping using control messages")
Signed-off-by: Douglas Caetano dos Santos <>
Acked-by: Soheil Hassas Yeganeh <>
Signed-off-by: David S. Miller <>
5 years agonet/mlx4_core: Use min3 to select number of MSI-X vectors [Fri, 12 May 2017 06:10:51 +0000 (09:10 +0300)]
net/mlx4_core: Use min3 to select number of MSI-X vectors

Signed-off-by: Yuval Shaia <>
Reviewed-by: Leon Romanovsky <>
Signed-off-by: David S. Miller <>
5 years agomacvlan: Fix performance issues with vlan tagged packets
Vlad Yasevich [Thu, 11 May 2017 15:09:52 +0000 (11:09 -0400)]
macvlan: Fix performance issues with vlan tagged packets

Macvlan always turns on offload features that have sofware
fallback (NETIF_GSO_SOFTWARE).  This allows much higher guest-guest
communications over macvtap.

However, macvtap does not turn on these features for vlan tagged traffic.
As a result, depending on the HW that mactap is configured on, the
performance of guest-guest communication over a vlan is very
inconsistent.  If the HW supports TSO/UFO over vlans, then the
performance will be fine.  If not, the the performance will suffer
greatly since the VM may continue using TSO/UFO, and will force the host
segment the traffic and possibly overlow the macvtap queue.

This patch adds the always on offloads to vlan_features.  This
makes sure that any vlan tagged traffic between 2 guest will not
be segmented needlessly.

Signed-off-by: Vladislav Yasevich <>
Signed-off-by: David S. Miller <>
5 years agoi2c: mux: only print failure message on error
Peter Rosin [Mon, 15 May 2017 07:03:50 +0000 (09:03 +0200)]
i2c: mux: only print failure message on error

As is, a failure message is printed unconditionally, which is confusing.
And noisy.

Fixes: 8d4d159f25a7 ("i2c: mux: provide more info on failure in i2c_mux_add_adapter")
Signed-off-by: Peter Rosin <>
5 years agoi2c: mux: reg: rename label to indicate what it does
Peter Rosin [Mon, 15 May 2017 16:48:55 +0000 (18:48 +0200)]
i2c: mux: reg: rename label to indicate what it does

That maintains sanity if it is ever called from some other spot, and
also makes the label names coherent.

Signed-off-by: Peter Rosin <>
5 years agoi2c: mux: reg: put away the parent i2c adapter on probe failure
Peter Rosin [Sun, 7 May 2017 05:16:30 +0000 (07:16 +0200)]
i2c: mux: reg: put away the parent i2c adapter on probe failure

It is only prudent to let go of resources that are not used.

Fixes: b3fdd32799d8 ("i2c: mux: Add register-based mux i2c-mux-reg")
Signed-off-by: Peter Rosin <>
5 years agonet: stmmac: use correct pointer when printing normal descriptor ring
Niklas Cassel [Mon, 15 May 2017 08:56:06 +0000 (10:56 +0200)]
net: stmmac: use correct pointer when printing normal descriptor ring

There are two pointers in sysfs_display_ring,
one that increments if using normal dma descriptors,
another if using extended dma descriptors.

When printing the normal dma descriptors, the wrong pointer is used,
thus the printed descriptor addresses are incorrect.

Signed-off-by: Niklas Cassel <>
Signed-off-by: David S. Miller <>
5 years agos390/virtio: change virtio_feature_desc:features type to __le32
Heiko Carstens [Tue, 9 May 2017 10:50:53 +0000 (12:50 +0200)]
s390/virtio: change virtio_feature_desc:features type to __le32

The feature member of virtio_feature_desc contains little endian
values, given that it contents will be converted with
le32_to_cpu(). The "wrong" __u32 type leads to the sparse warnings
In order to avoid them, use the correct __le32 type instead.

drivers/s390/virtio/virtio_ccw.c:749:14: warning: cast to restricted __le32
drivers/s390/virtio/virtio_ccw.c:762:28: warning: cast to restricted __le32

Acked-by: Halil Pasic <>
Acked-by: Cornelia Huck <>
Signed-off-by: Heiko Carstens <>
Signed-off-by: Martin Schwidefsky <>
5 years agodm cache policy smq: don't do any writebacks unless IDLE
Joe Thornber [Thu, 11 May 2017 13:09:04 +0000 (09:09 -0400)]
dm cache policy smq: don't do any writebacks unless IDLE

If there are no clean blocks to be demoted the writeback will be
triggered at that point.  Preemptively writing back can hurt high IO
load scenarios.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache: simplify the IDLE vs BUSY state calculation
Joe Thornber [Thu, 11 May 2017 13:07:16 +0000 (09:07 -0400)]
dm cache: simplify the IDLE vs BUSY state calculation

Drop the MODERATE state since it wasn't buying us much.

Also, in check_migrations(), prepare for the next commit ("dm cache
policy smq: don't do any writebacks unless IDLE") by deferring to the
policy to make the final decision on whether writebacks can be

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache: track all IO to the cache rather than just the origin device's IO
Joe Thornber [Thu, 11 May 2017 12:22:31 +0000 (08:22 -0400)]
dm cache: track all IO to the cache rather than just the origin device's IO

IO tracking used to throttle writebacks when the origin device is busy.

Even if all the IO is going to the fast device, writebacks can
significantly degrade performance.  So track all IO to gauge whether the
cache is busy or not.

Otherwise, synthetic IO tests (e.g. fio) that might send all IO to the
fast device wouldn't cause writebacks to get throttled.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache policy smq: stop preemptively demoting blocks
Joe Thornber [Thu, 11 May 2017 11:48:18 +0000 (07:48 -0400)]
dm cache policy smq: stop preemptively demoting blocks

It causes a lot of churn if the working set's size is close to the fast
device's size.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache policy smq: put newly promoted entries at the top of the multiqueue
Joe Thornber [Thu, 11 May 2017 09:11:06 +0000 (05:11 -0400)]
dm cache policy smq: put newly promoted entries at the top of the multiqueue

This stops entries bouncing in and out of the cache quickly.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache policy smq: be more aggressive about triggering a writeback
Joe Thornber [Thu, 11 May 2017 09:09:38 +0000 (05:09 -0400)]
dm cache policy smq: be more aggressive about triggering a writeback

If there are no clean entries to demote we really want to writeback

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache policy smq: only demote entries in bottom half of the clean multiqueue
Joe Thornber [Thu, 11 May 2017 09:07:34 +0000 (05:07 -0400)]
dm cache policy smq: only demote entries in bottom half of the clean multiqueue

Heavy IO load may mean there are very few clean blocks in the cache, and
we risk demoting entries that get hit a lot.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agodm cache: fix incorrect 'idle_time' reset in IO tracker
Joe Thornber [Thu, 11 May 2017 10:14:16 +0000 (06:14 -0400)]
dm cache: fix incorrect 'idle_time' reset in IO tracker

Some bios have no payload (eg, a FLUSH), don't reset the idle_time when
these come in.

Signed-off-by: Joe Thornber <>
Signed-off-by: Mike Snitzer <>
5 years agohwmon: (coretemp) Handle frozen hotplug state correctly
Thomas Gleixner [Wed, 10 May 2017 14:30:12 +0000 (16:30 +0200)]
hwmon: (coretemp) Handle frozen hotplug state correctly

The recent conversion to the hotplug state machine missed that the original
hotplug notifiers did not execute in the frozen state, which is used on
suspend on resume.

This does not matter on single socket machines, but on multi socket systems
this breaks when the device for a non-boot socket is removed when the last
CPU of that socket is brought offline. The device removal locks up the
machine hard w/o any debug output.

Prevent executing the hotplug callbacks when cpuhp_tasks_frozen is true.

Thanks to Tommi for providing debug information patiently while I failed to
spot the obvious.

Fixes: e00ca5df37ad ("hwmon: (coretemp) Convert to hotplug state machine")
Reported-by: Tommi Rantala <>
Tested-by: Tommi Rantala <>
Signed-off-by: Thomas Gleixner <>
Signed-off-by: Guenter Roeck <>
5 years agonet/mlx5: Use underlay QPN from the root name space
Yishai Hadas [Tue, 25 Apr 2017 07:39:57 +0000 (10:39 +0300)]
net/mlx5: Use underlay QPN from the root name space

Root flow table is dynamically changed by the underlying flow steering
layer, and IPoIB/ULPs have no idea what will be the root flow table in
the future, hence we need a dynamic infrastructure to move Underlay QPs
with the root flow table.

Fixes: b3ba51498bdd ("net/mlx5: Refactor create flow table method to accept underlay QP")
Signed-off-by: Erez Shitrit <>
Signed-off-by: Maor Gottlieb <>
Signed-off-by: Yishai Hadas <>
Signed-off-by: Saeed Mahameed <>
5 years agonet/mlx5e: IPoIB, Only support regular RQ for now
Saeed Mahameed [Thu, 4 May 2017 14:53:32 +0000 (17:53 +0300)]
net/mlx5e: IPoIB, Only support regular RQ for now

IPoIB doesn't support striding RQ at the moment, for this
we need to explicitly choose non striding RQ in IPoIB init,
even if the HW supports it.

Fixes: 8f493ffd88ea ("net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs")
Signed-off-by: Saeed Mahameed <>
5 years agonet/mlx5e: Fix setup TC ndo
Saeed Mahameed [Tue, 9 May 2017 13:40:46 +0000 (16:40 +0300)]
net/mlx5e: Fix setup TC ndo

Fail-safe support patches introduced a trivial bug,
setup tc callback is doing a wrong check of the netdevice state,
the fix is simply to invert the condition.

Fixes: 6f9485af4020 ("net/mlx5e: Fail safe tc setup")
Signed-off-by: Saeed Mahameed <>
5 years agonet/mlx5e: Fix ethtool pause support and advertise reporting
Gal Pressman [Wed, 19 Apr 2017 11:35:15 +0000 (14:35 +0300)]
net/mlx5e: Fix ethtool pause support and advertise reporting

Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <>
Signed-off-by: Saeed Mahameed <>
5 years agonet/mlx5e: Use the correct pause values for ethtool advertising
Gal Pressman [Mon, 3 Apr 2017 12:11:22 +0000 (15:11 +0300)]
net/mlx5e: Use the correct pause values for ethtool advertising

Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969d7 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <>
Signed-off-by: Saeed Mahameed <>
5 years agopid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
Kirill Tkhai [Fri, 12 May 2017 16:11:31 +0000 (19:11 +0300)]
pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()

Imagine we have a pid namespace and a task from its parent's pid_ns,
which made setns() to the pid namespace. The task is doing fork(),
while the pid namespace's child reaper is dying. We have the race
between them:

Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&tasklist_lock)
  write_lock_irq(&tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..

So, just created task p won't receive SIGKILL signal,
and the pid namespace will be in contradictory state.
Only manual kill will help there, but does the userspace
care about this? I suppose, the most users just inject
a task into a pid namespace and wait a SIGCHLD from it.

The patch fixes the problem. It simply checks for
(pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, and can't skip
PIDNS_HASH_ADDING as noted by Oleg:

"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in ->nr_hashed."

If allocation is disabled, we just return -ENOMEM
like it's made for such cases in alloc_pid().

v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account the problem with double irq enabling
found by Eric W. Biederman.

Fixes: c876ad768215 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai <>
CC: Andrew Morton <>
CC: Ingo Molnar <>
CC: Peter Zijlstra <>
CC: Oleg Nesterov <>
CC: Mike Rapoport <>
CC: Michal Hocko <>
CC: Andy Lutomirski <>
CC: "Eric W. Biederman" <>
CC: Andrei Vagin <>
CC: Cyrill Gorcunov <>
CC: Serge Hallyn <>
Acked-by: Oleg Nesterov <>
Signed-off-by: Eric W. Biederman <>
5 years agopid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
Eric W. Biederman [Thu, 11 May 2017 23:21:01 +0000 (18:21 -0500)]
pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes

The code can potentially sleep for an indefinite amount of time in
zap_pid_ns_processes triggering the hung task timeout, and increasing
the system average.  This is undesirable.  Sleep with a task state of
undesirable side effects.

Apparently under heavy load this has been allowing Chrome to trigger
the hung time task timeout error and cause ChromeOS to reboot.

Reported-by: Vovo Yang <>
Reported-by: Guenter Roeck <>
Tested-by: Guenter Roeck <>
Fixes: 6347e9009104 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
Signed-off-by: "Eric W. Biederman" <>
5 years agoLinux 4.12-rc1 v4.12-rc1
Linus Torvalds [Sat, 13 May 2017 20:19:49 +0000 (13:19 -0700)]
Linux 4.12-rc1

5 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Sat, 13 May 2017 17:25:05 +0000 (10:25 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull some more input subsystem updates from Dmitry Torokhov:
 "An updated xpad driver with a few more recognized device IDs, and a
  new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
  via SPI bus"

* 'for-linus' of git://
  Input: cros_ec_keyb - remove extraneous 'const'
  Input: add support for PlayStation 1/2 joypads connected via SPI
  Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
  Input: xpad - sync supported devices with xboxdrv
  Input: xpad - sort supported devices by USB ID

5 years agoMerge tag 'upstream-4.12-rc1' of git://
Linus Torvalds [Sat, 13 May 2017 17:23:12 +0000 (10:23 -0700)]
Merge tag 'upstream-4.12-rc1' of git://

Pull UBI/UBIFS updates from Richard Weinberger:

 - new config option CONFIG_UBIFS_FS_SECURITY

 - minor improvements

 - random fixes

* tag 'upstream-4.12-rc1' of git://
  ubi: Add debugfs file for tracking PEB state
  ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl
  ubifs: Remove unnecessary assignment
  ubifs: Fix cut and paste error on sb type comparisons
  ubi: fastmap: Fix slab corruption
  ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels
  ubi: Make mtd parameter readable
  ubi: Fix section mismatch

5 years agoMerge branch 'for-linus-4.12-rc1' of git://
Linus Torvalds [Sat, 13 May 2017 17:20:02 +0000 (10:20 -0700)]
Merge branch 'for-linus-4.12-rc1' of git://git./linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
 "No new stuff, just fixes"

* 'for-linus-4.12-rc1' of git://
  um: Add missing NR_CPUS include
  um: Fix to call read_initrd after init_bootmem
  um: Include kbuild.h instead of duplicating its macros
  um: Fix PTRACE_POKEUSER on x86_64
  um: Set number of CPUs
  um: Fix _print_addr()

5 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 13 May 2017 16:49:35 +0000 (09:49 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <>:
  mm, docs: update memory.stat description with workingset* entries
  mm: vmscan: scan until it finds eligible pages
  mm, thp: copying user pages must schedule on collapse
  dax: fix PMD data corruption when fault races with write
  dax: fix data corruption when fault races with write
  ext4: return to starting transaction in ext4_dax_huge_fault()
  mm: fix data corruption due to stale mmap reads
  dax: prevent invalidation of mapped DAX entries
  Tigran has moved
  mm, vmalloc: fix vmalloc users tracking properly
  mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
  gcov: support GCC 7.1
  mm, vmstat: Remove spurious WARN() during zoneinfo print
  time: delete current_fs_time()
  hwpoison, memcg: forcibly uncharge LRU pages

5 years ago[CIFS] Minor cleanup of xattr query function
Steve French [Sat, 13 May 2017 01:59:10 +0000 (20:59 -0500)]
[CIFS] Minor cleanup of xattr query function

Some minor cleanup of cifs query xattr functions (will also make
SMB3 xattr implementation cleaner as well).

Signed-off-by: Steve French <>
5 years agofs: cifs: transport: Use time_after for time comparison
Karim Eshapa [Thu, 11 May 2017 23:53:38 +0000 (01:53 +0200)]
fs: cifs: transport: Use time_after for time comparison

Use time_after kernel macro for time comparison
that has safety check.

Signed-off-by: Karim Eshapa <>
Signed-off-by: Steve French <>
5 years agoSMB2: Fix share type handling
Christophe JAILLET [Fri, 12 May 2017 15:59:32 +0000 (17:59 +0200)]
SMB2: Fix share type handling

In fs/cifs/smb2pdu.h, we have:
#define SMB2_SHARE_TYPE_DISK    0x01
#define SMB2_SHARE_TYPE_PIPE    0x02
#define SMB2_SHARE_TYPE_PRINT   0x03

Knowing that, with the current code, the SMB2_SHARE_TYPE_PRINT case can
never trigger and printer share would be interpreted as disk share.

So, test the ShareType value for equality instead.

Fixes: faaf946a7d5b ("CIFS: Add tree connect/disconnect capability for SMB2")
Signed-off-by: Christophe JAILLET <>
Acked-by: Aurelien Aptel <>
Signed-off-by: Steve French <>
5 years agocifs: cifsacl: Use a temporary ops variable to reduce code length
Joe Perches via samba-technical [Sun, 7 May 2017 08:31:47 +0000 (01:31 -0700)]
cifs: cifsacl: Use a temporary ops variable to reduce code length

Create an ops variable to store tcon->ses->server->ops and cache
indirections and reduce code size a trivial bit.

$ size fs/cifs/cifsacl.o*
   text    data     bss     dec     hex filename
   5338     136       8    5482    156a fs/cifs/
   5371     136       8    5515    158b fs/cifs/cifsacl.o.old

Signed-off-by: Joe Perches <>
Acked-by: Shirish Pargaonkar <>
Signed-off-by: Steve French <>
5 years agomm, docs: update memory.stat description with workingset* entries
Roman Gushchin [Fri, 12 May 2017 22:47:09 +0000 (15:47 -0700)]
mm, docs: update memory.stat description with workingset* entries

Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache
workingset transition") introduced three new entries in memory stat

 - workingset_refault
 - workingset_activate
 - workingset_nodereclaim

This commit adds a corresponding description to the cgroup v2 docs.

Signed-off-by: Roman Gushchin <>
Cc: Johannes Weiner <>
Cc: Michal Hocko <>
Cc: Vladimir Davydov <>
Cc: Tejun Heo <>
Cc: Li Zefan <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
5 years agomm: vmscan: scan until it finds eligible pages
Minchan Kim [Fri, 12 May 2017 22:47:06 +0000 (15:47 -0700)]
mm: vmscan: scan until it finds eligible pages

Although there are a ton of free swap and anonymous LRU page in elgible
zones, OOM happened.

  balloon invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=0, oom_score_adj=0
  CPU: 7 PID: 1138 Comm: balloon Not tainted 4.11.0-rc6-mm1-zram-00289-ge228d67e9677-dirty #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
  active_anon:424716 inactive_anon:65314 isolated_anon:0
   active_file:52 inactive_file:46 isolated_file:0
   unevictable:0 dirty:27 writeback:0 unstable:0
   slab_reclaimable:3967 slab_unreclaimable:4125
   mapped:133 shmem:43 pagetables:1674 bounce:0
   free:4637 free_pcp:225 free_cma:0
  Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
  DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 992 992 1952
  DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 959
  Movable free:3644kB min:1980kB low:2960kB high:3940kB active_anon:738560kB inactive_anon:261340kB active_file:188kB inactive_file:640kB unevictable:0kB writepending:20kB present:1048444kB managed:1010816kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:832kB local_pcp:60kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 1*4kB (E) 0*8kB 18*16kB (E) 10*32kB (E) 10*64kB (E) 9*128kB (ME) 8*256kB (E) 2*512kB (E) 2*1024kB (E) 0*2048kB 0*4096kB = 7524kB
  DMA32: 417*4kB (UMEH) 181*8kB (UMEH) 68*16kB (UMEH) 48*32kB (UMEH) 14*64kB (MH) 3*128kB (M) 1*256kB (H) 1*512kB (M) 2*1024kB (M) 0*2048kB 0*4096kB = 9836kB
  Movable: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 0*64kB 1*128kB (M) 2*256kB (M) 4*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 3772kB
  378 total pagecache pages
  17 pages in swap cache
  Swap cache stats: add 17325, delete 17302, find 0/27
  Free swap  = 978940kB
  Total swap = 1048572kB
  524157 pages RAM
  0 pages HighMem/MovableOnly
  12629 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned
  [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
  [  433]     0   433     4904        5      14       3       82             0 upstart-udev-br
  [  438]     0   438    12371        5      27       3      191         -1000 systemd-udevd

With investigation, skipping page of isolate_lru_pages makes reclaim
void because it returns zero nr_taken easily so LRU shrinking is
effectively nothing and just increases priority aggressively.  Finally,
OOM happens.

The problem is that get_scan_count determines nr_to_scan with eligible
zones so although priority drops to zero, it couldn't reclaim any pages
if the LRU contains mostly ineligible pages.


        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
size = size >> sc->priority;

Assumes sc->priority is 0 and LRU list is as follows.


(Ie, small eligible pages are in the head of LRU but others are
 almost ineligible pages)

In that case, size becomes 4 so VM want to scan 4 pages but 4 pages from
tail of the LRU are not eligible pages.  If get_scan_count counts
skipped pages, it doesn't reclaim any pages remained after scanning 4
pages so it ends up OOM happening.

This patch makes isolate_lru_pages try to scan pages until it encounters
eligible zones's pages.

[ clean up mind-bending `for' statement.  Tweak comment text]
Fixes: 3db65812d688 ("Revert "mm, vmscan: account for skipped pages as a partial scan"")
Signed-off-by: Minchan Kim <>
Acked-by: Michal Hocko <>
Acked-by: Johannes Weiner <>
Cc: Mel Gorman <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>