sfrench/cifs-2.6.git
3 years agonet/mlx5e: Fix memory usage issues in offloading TC flows
Jianbo Liu [Thu, 8 Mar 2018 09:20:55 +0000 (09:20 +0000)]
net/mlx5e: Fix memory usage issues in offloading TC flows

For NIC flows, the parsed attributes are not freed when we exit
successfully from mlx5e_configure_flower().

There is possible double free for eswitch flows. If error is returned
from rhashtable_insert_fast(), the parse attrs will be freed in
mlx5e_tc_del_flow(), but they will be freed again before exiting
mlx5e_configure_flower().

To fix both issues we do the following:
(1) change the condition that determines if to issue the free call to
    check if this flow is NIC flow, or it does not have encap action.
(2) reorder the code such that that the check and free calls are done
    before we attempt to add into the hash table.

Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
3 years agonet/mlx5e: Fix traffic being dropped on VF representor
Roi Dayan [Wed, 28 Feb 2018 10:56:42 +0000 (12:56 +0200)]
net/mlx5e: Fix traffic being dropped on VF representor

Increase representor netdev RQ size to avoid dropped packets.
The current size (two) is just too small to keep up with
conventional slow path traffic patterns.
Also match the SQ size to the RQ size.

Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
3 years agonet/mlx5e: Verify coalescing parameters in range
Moshe Shemesh [Thu, 15 Feb 2018 10:41:48 +0000 (12:41 +0200)]
net/mlx5e: Verify coalescing parameters in range

Add check of coalescing parameters received through ethtool are within
range of values supported by the HW.
Driver gets the coalescing rx/tx-usecs and rx/tx-frames as set by the
users through ethtool. The ethtool support up to 32 bit value for each.
However, mlx5 modify cq limits the coalescing time parameter to 12 bit
and coalescing frames parameters to 16 bits.
Return out of range error if user tries to set these parameters to
higher values.

Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
3 years agonet/mlx5: Make eswitch support to depend on switchdev
Or Gerlitz [Thu, 15 Feb 2018 10:39:55 +0000 (12:39 +0200)]
net/mlx5: Make eswitch support to depend on switchdev

Add dependancy for switchdev to be congfigured as any user-space control
plane SW is expected to use the HW switchdev ID to locate the representors
related to VFs of a certain PF and apply SW/offloaded switching on them.

Fixes: e80541ecabd5 ('net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
3 years agonet/mlx5e: Use 32 bits to store VF representor SQ number
Or Gerlitz [Tue, 13 Feb 2018 11:43:39 +0000 (13:43 +0200)]
net/mlx5e: Use 32 bits to store VF representor SQ number

SQs are 32 and not 16 bits, hence it's wrong to use only 16 bits to
store the sq number for which are going to set steering rule, fix that.

Fixes: cb67b832921c ('net/mlx5e: Introduce SRIOV VF representors')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
3 years agonet/mlx5e: Don't override vport admin link state in switchdev mode
Jianbo Liu [Fri, 2 Mar 2018 02:09:08 +0000 (02:09 +0000)]
net/mlx5e: Don't override vport admin link state in switchdev mode

The vport admin original link state will be re-applied after returning
back to legacy mode, it is not right to change the admin link state value
when in switchdev mode.

Use direct vport commands to alter logical vport state in netdev
representor open/close flows rather than the administrative eswitch API.

Fixes: 20a1ea674783 ('net/mlx5e: Support VF vport link state control for SRIOV switchdev mode')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
3 years agonet: dsa: mt7530: fix module autoloading for OF platform drivers
Sean Wang [Mon, 26 Mar 2018 10:07:10 +0000 (18:07 +0800)]
net: dsa: mt7530: fix module autoloading for OF platform drivers

It's required to create a modules.alias via MODULE_DEVICE_TABLE helper
for the OF platform driver. Otherwise, module autoloading cannot work.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: mt7530: remove redundant MODULE_ALIAS entries
Sean Wang [Mon, 26 Mar 2018 10:07:09 +0000 (18:07 +0800)]
net: dsa: mt7530: remove redundant MODULE_ALIAS entries

MODULE_ALIAS exports information to allow the module to be auto-loaded at
boot for the drivers registered using legacy platform registration.

However, currently the driver is always used by DT-only platform,
MODULE_ALIAS is redundant and should be removed properly.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agovhost_net: add missing lock nesting notation
Jason Wang [Mon, 26 Mar 2018 08:10:23 +0000 (16:10 +0800)]
vhost_net: add missing lock nesting notation

We try to hold TX virtqueue mutex in vhost_net_rx_peek_head_len()
after RX virtqueue mutex is held in handle_rx(). This requires an
appropriate lock nesting notation to calm down deadlock detector.

Fixes: 0308813724606 ("vhost_net: basic polling support")
Reported-by: syzbot+7f073540b1384a614e09@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/usb/qmi_wwan.c: Add USB id for lt4120 modem
Torsten Hilbrich [Mon, 26 Mar 2018 05:19:57 +0000 (07:19 +0200)]
net/usb/qmi_wwan.c: Add USB id for lt4120 modem

This is needed to support the modem found in HP EliteBook 820 G3.

Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoteam: move dev_mc_sync after master_upper_dev_link in team_port_add
Xin Long [Sun, 25 Mar 2018 17:25:06 +0000 (01:25 +0800)]
team: move dev_mc_sync after master_upper_dev_link in team_port_add

The same fix as in 'bonding: move dev_mc_sync after master_upper_dev_link
in bond_enslave' is needed for team driver.

The panic can be reproduced easily:

  ip link add team1 type team
  ip link set team1 up
  ip link add link team1 vlan1 type vlan id 80
  ip link set vlan1 master team1

Fixes: cb41c997d444 ("team: team should sync the port's uc/mc addrs when add a port")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'bond-hwaddr-sync-fixes'
David S. Miller [Mon, 26 Mar 2018 16:51:06 +0000 (12:51 -0400)]
Merge branch 'bond-hwaddr-sync-fixes'

Xin Long says:

====================
bonding: a bunch of fixes for dev hwaddr sync in bond_enslave

This patchset is mainly to fix a crash when adding vlan as slave of
bond which is also the parent link in patch 2/3,  and also fix some
err process problems in bond_enslave in patch 1/3 and 3/3.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobonding: process the err returned by dev_set_allmulti properly in bond_enslave
Xin Long [Sun, 25 Mar 2018 17:16:47 +0000 (01:16 +0800)]
bonding: process the err returned by dev_set_allmulti properly in bond_enslave

When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails,
dev_set_promiscuity(-1) should be done before going to the err path.
Otherwise, dev->promiscuity will leak.

Fixes: 7e1a1ac1fbaa ("bonding: Check return of dev_set_promiscuity/allmulti")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobonding: move dev_mc_sync after master_upper_dev_link in bond_enslave
Xin Long [Sun, 25 Mar 2018 17:16:46 +0000 (01:16 +0800)]
bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave

Beniamino found a crash when adding vlan as slave of bond which is also
the parent link:

  ip link add bond1 type bond
  ip link set bond1 up
  ip link add link bond1 vlan1 type vlan id 80
  ip link set vlan1 master bond1

The call trace is as below:

  [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf
  [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30
  [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80
  [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
  [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0
  [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80
  [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding]
  [<ffffffffa8401909>] do_setlink+0x9c9/0xe50
  [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880
  [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260
  [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0
  [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30
  [<ffffffffa8424850>] netlink_unicast+0x170/0x210
  [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420
  [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0

This is actually a dead lock caused by sync slave hwaddr from master when
the master is the slave's 'slave'. This dead loop check is actually done
by netdev_master_upper_dev_link. However, Commit 1f718f0f4f97 ("bonding:
populate neighbour's private on enslave") moved it after dev_mc_sync.

This patch is to fix it by moving dev_mc_sync after master_upper_dev_link,
so that this loop check would be earlier than dev_mc_sync. It also moves
if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an
improvement.

Note team driver also has this issue, I will fix it in another patch.

Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave")
Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobonding: fix the err path for dev hwaddr sync in bond_enslave
Xin Long [Sun, 25 Mar 2018 17:16:45 +0000 (01:16 +0800)]
bonding: fix the err path for dev hwaddr sync in bond_enslave

vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
the err path it should unsync dev hwaddr. Otherwise, the slave
dev's hwaddr will never be unsync when this err happens.

Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the standard ones")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sched, fix OOO packets with pfifo_fast
John Fastabend [Sun, 25 Mar 2018 05:25:06 +0000 (22:25 -0700)]
net: sched, fix OOO packets with pfifo_fast

After the qdisc lock was dropped in pfifo_fast we allow multiple
enqueue threads and dequeue threads to run in parallel. On the
enqueue side the skb bit ooo_okay is used to ensure all related
skbs are enqueued in-order. On the dequeue side though there is
no similar logic. What we observe is with fewer queues than CPUs
it is possible to re-order packets when two instances of
__qdisc_run() are running in parallel. Each thread will dequeue
a skb and then whichever thread calls the ndo op first will
be sent on the wire. This doesn't typically happen because
qdisc_run() is usually triggered by the same core that did the
enqueue. However, drivers will trigger __netif_schedule()
when queues are transitioning from stopped to awake using the
netif_tx_wake_* APIs. When this happens netif_schedule() calls
qdisc_run() on the same CPU that did the netif_tx_wake_* which
is usually done in the interrupt completion context. This CPU
is selected with the irq affinity which is unrelated to the
enqueue operations.

To resolve this we add a RUNNING bit to the qdisc to ensure
only a single dequeue per qdisc is running. Enqueue and dequeue
operations can still run in parallel and also on multi queue
NICs we can still have a dequeue in-flight per qdisc, which
is typically per CPU.

Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: qmi_wwan: add BroadMobi BM806U 2020:2033
Pawel Dembicki [Sat, 24 Mar 2018 21:08:14 +0000 (22:08 +0100)]
net: qmi_wwan: add BroadMobi BM806U 2020:2033

BroadMobi BM806U is an Qualcomm MDM9225 based 3G/4G modem.
Tested hardware BM806U is mounted on D-Link DWR-921-C3 router.
The USB id is added to qmi_wwan.c to allow QMI communication with
the BM806U.

Tested on 4.14 kernel and OpenWRT.

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoDocumentation/isdn: check and fix dead links ...
Sanjeev Gupta [Sat, 24 Mar 2018 05:07:42 +0000 (13:07 +0800)]
Documentation/isdn: check and fix dead links ...

and switch to https where possible.

All links have been eyeballed to verify that the domains have
not changed, etc.

Signed-off-by: Sanjeev Gupta <ghane0@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'wireless-drivers-for-davem-2018-03-24' of git://git.kernel.org/pub/scm...
David S. Miller [Mon, 26 Mar 2018 01:26:07 +0000 (21:26 -0400)]
Merge tag 'wireless-drivers-for-davem-2018-03-24' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.16

Some fixes for 4.16, only for iwlwifi and brcmfmac this time. All
pretty small.

iwlwifi

* fix an issue with the multicast queue

* fix IGTK handling

* fix some missing return value checks

* add support for a HW workaround for issues on some platforms

* a couple of fixes for channel-switch

* a few fixes for the aggregation handling code

brcmfmac

* drop Inter-Access Point Protocol packets by default

* fix check for ISO3166 regulatory code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: the entire IPv6 header chain must fit the first fragment
Paolo Abeni [Fri, 23 Mar 2018 13:47:30 +0000 (14:47 +0100)]
ipv6: the entire IPv6 header chain must fit the first fragment

While building ipv6 datagram we currently allow arbitrary large
extheaders, even beyond pmtu size. The syzbot has found a way
to exploit the above to trigger the following splat:

kernel BUG at ./include/linux/skbuff.h:2073!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
FS:  0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ip6_finish_skb include/net/ipv6.h:969 [inline]
  udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
  udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
  __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
  SYSC_sendmmsg net/socket.c:2167 [inline]
  SyS_sendmmsg+0x35/0x60 net/socket.c:2162
  do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4404c9
RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29
5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d
87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP:
ffff8801bc18f0f0

As stated by RFC 7112 section 5:

   When a host fragments an IPv6 datagram, it MUST include the entire
   IPv6 Header Chain in the First Fragment.

So this patch addresses the issue dropping datagrams with excessive
extheader length. It also updates the error path to report to the
calling socket nonnegative pmtu values.

The issue apparently predates git history.

v1 -> v2: cleanup error path, as per Eric's suggestion

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetlink: make sure nladdr has correct size in netlink_connect()
Alexander Potapenko [Fri, 23 Mar 2018 12:49:02 +0000 (13:49 +0100)]
netlink: make sure nladdr has correct size in netlink_connect()

KMSAN reports use of uninitialized memory in the case when |alen| is
smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
fully copied from the userspace.

Signed-off-by: Alexander Potapenko <glider@google.com>
Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan78xx: Set ASD in MAC_CR when EEE is enabled.
Raghuram Chary J [Fri, 23 Mar 2018 10:18:08 +0000 (15:48 +0530)]
lan78xx: Set ASD in MAC_CR when EEE is enabled.

Description:
EEE does not work with lan7800 when AutoSpeed is not set.
(This can happen when EEPROM is not populated or configured incorrectly)

Root-Cause:
When EEE is enabled, the mac config register ASD is not set
i.e. in default state, causing EEE fail.

Fix:
Set the register when eeprom is not present.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv4: disable SMC TCP option with SYN Cookies
Hans Wippel [Fri, 23 Mar 2018 10:05:45 +0000 (11:05 +0100)]
net/ipv4: disable SMC TCP option with SYN Cookies

Currently, the SMC experimental TCP option in a SYN packet is lost on
the server side when SYN Cookies are active. However, the corresponding
SYNACK sent back to the client contains the SMC option. This causes an
inconsistent view of the SMC capabilities on the client and server.

This patch disables the SMC option in the SYNACK when SYN Cookies are
active to avoid this issue.

Fixes: 60e2a7780793b ("tcp: TCP experimental option for SMC")
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Sat, 24 Mar 2018 21:10:01 +0000 (17:10 -0400)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise
   userspace hits EOPNOTSUPP with valid rules using the meter statement,
   from Florian Westphal.

2) If you send a batch that flushes the existing ruleset (that contains
   a NAT chain) and the new ruleset definition comes with a new NAT
   chain, don't bogusly hit EBUSY. Also from Florian.

3) Missing netlink policy attribute validation, from Florian.

4) Detach conntrack template from skbuff if IP_NODEFRAG is set on,
   from Paolo Abeni.

5) Cache device names in flowtable object, otherwise we may end up
   walking over devices going aways given no rtnl_lock is held.

6) Fix incorrect net_device ingress with ingress hooks.

7) Fix crash when trying to read more data than available in UDP
   packets from the nf_socket infrastructure, from Subash.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}
Subash Abhinov Kasiviswanathan [Fri, 23 Mar 2018 03:12:39 +0000 (21:12 -0600)]
netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6}

skb_header_pointer will copy data into a buffer if data is non linear,
otherwise it will return a pointer in the linear section of the data.
nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later
accesses memory within the size of tcphdr (th->doff) in case of TCP
packets. This causes a crash when running with KASAN with the following
call stack -

BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718
net/netfilter/xt_socket.c:178
Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971
CPU: 2 PID: 28971 Comm: syz-executor Tainted: G    B   W  O    4.9.65+ #1
Call trace:
[<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76
[<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
[<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51
[<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248
[<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline]
[<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
[<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
[<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
[<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739
[<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline]
[<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178

Fix this by copying data into appropriate size headers based on protocol.

Fixes: a583636a83ea ("inet: refactor inet[6]_lookup functions to take skb")
Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agoipv6: fix possible deadlock in rt6_age_examine_exception()
Eric Dumazet [Fri, 23 Mar 2018 14:56:58 +0000 (07:56 -0700)]
ipv6: fix possible deadlock in rt6_age_examine_exception()

syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()

rt6_age_examine_exception() is called while rt6_exception_lock is held.
This lock is the lower one in the lock hierarchy, thus we can not
call dst_neigh_lookup() function, as it can fallback to neigh_create()

We should instead do a pure RCU lookup. As a bonus we avoid
a pair of atomic operations on neigh refcount.

[1]

WARNING: possible circular locking dependency detected
4.16.0-rc4+ #277 Not tainted

syz-executor7/4015 is trying to acquire lock:
 (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928

but task is already holding lock:
 (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&tbl->lock){++-.}:
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
       neigh_create include/net/neighbour.h:315 [inline]
       ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
       dst_neigh_lookup include/net/dst.h:405 [inline]
       rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
       rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
       fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
       fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
       fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
       fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
       fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
       __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
       fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
       fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
       ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #2 (rt6_exception_lock){+.-.}:
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:315 [inline]
       rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
       fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
       fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
       __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
       ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
       __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
       ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
       ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
       inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
       sock_release+0x8d/0x1e0 net/socket.c:594
       sock_close+0x16/0x20 net/socket.c:1149
       __fput+0x327/0x7e0 fs/file_table.c:209
       ____fput+0x15/0x20 fs/file_table.c:243
       task_work_run+0x199/0x270 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x9bb/0x1ad0 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:968
       get_signal+0x73a/0x16d0 kernel/signal.c:2469
       do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
       exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
       prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
       do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:315 [inline]
       __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
       ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
       addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
       fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
       addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
       addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
       rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
       rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
       netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
       netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
       netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:639
       ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
       __sys_sendmsg+0xe5/0x210 net/socket.c:2081
       SYSC_sendmsg net/socket.c:2092 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2088
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

-> #0 (&ndev->lock){++--}:
       lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
       ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
       pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
       pneigh_ifdown net/core/neighbour.c:695 [inline]
       neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
       rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
       addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
       addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7

other info that might help us debug this:

Chain exists of:
  &ndev->lock --> rt6_exception_lock --> &tbl->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&tbl->lock);
                               lock(rt6_exception_lock);
                               lock(&tbl->lock);
  lock(&ndev->lock);

 *** DEADLOCK ***

2 locks held by syz-executor7/4015:
 #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
 #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292

stack backtrace:
CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
 check_prev_add kernel/locking/lockdep.c:1863 [inline]
 check_prevs_add kernel/locking/lockdep.c:1976 [inline]
 validate_chain kernel/locking/lockdep.c:2417 [inline]
 __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
 __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
 _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
 __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
 ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
 pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
 pneigh_ifdown net/core/neighbour.c:695 [inline]
 neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
 rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
 addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
 addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
 call_netdevice_notifiers net/core/dev.c:1725 [inline]
 __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
 dev_change_flags+0xf5/0x140 net/core/dev.c:6994
 devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
 inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
 packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
 sock_do_ioctl+0xef/0x390 net/socket.c:957
 sock_ioctl+0x36b/0x610 net/socket.c:1081
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
 SYSC_ioctl fs/ioctl.c:701 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlxsw-GRE-mtu-changes'
David S. Miller [Fri, 23 Mar 2018 16:54:35 +0000 (12:54 -0400)]
Merge branch 'mlxsw-GRE-mtu-changes'

Ido Schimmel says:

====================
mlxsw: Handle changes to MTU in GRE tunnels

Petr says:

When offloading GRE tunnels, the MTU setting is kept fixed after the
initial offload even as the slow-path configuration changed. Worse: the
offloaded MTU setting is actually just a transient value set at the time
of NETDEV_REGISTER of the tunnel. As of commit ffc2b6ee4174 ("ip_gre:
fix IFLA_MTU ignored on NEWLINK"), that transient value is zero, and
unless there's e.g. a VRF migration that prompts re-offload, it stays at
zero, and all GRE packets end up trapping.

Thus, in patch #1, change the way the MTU is changed post-registration,
so that the full event protocol is observed. That way the drivers get to
see the change and have a chance to react.

In the remaining two patches, implement support for MTU change in mlxsw
driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Handle MTU change of GRE netdevs
Petr Machata [Thu, 22 Mar 2018 17:53:35 +0000 (19:53 +0200)]
mlxsw: spectrum_router: Handle MTU change of GRE netdevs

Update MTU of overlay loopback in accordance with the setting on the
tunnel netdevice.

Fixes: 0063587d3587 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomlxsw: spectrum_router: Move mlxsw_sp_rif_ipip_lb_op()
Petr Machata [Thu, 22 Mar 2018 17:53:34 +0000 (19:53 +0200)]
mlxsw: spectrum_router: Move mlxsw_sp_rif_ipip_lb_op()

Move the function so that it can be called without forward declaration
from a function that will be added in a follow-up patch.

Fixes: 0063587d3587 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoip_tunnel: Emit events for post-register MTU changes
Petr Machata [Thu, 22 Mar 2018 17:53:33 +0000 (19:53 +0200)]
ip_tunnel: Emit events for post-register MTU changes

For tunnels created with IFLA_MTU, MTU of the netdevice is set by
rtnl_create_link() (called from rtnl_newlink()) before the device is
registered. However without IFLA_MTU that's not done.

rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
via ip_tunnel_newlink() calls register_netdevice(), and that emits
NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
MTU of 0.

After ip_tunnel_newlink() corrects the MTU after registering the
netdevice, but since there's no event, the listeners don't get to know
about the MTU until something else happens--such as a NETDEV_UP event.
That's not ideal.

So instead of setting the MTU directly, go through dev_set_mtu(), which
takes care of distributing the necessary NETDEV_PRECHANGEMTU and
NETDEV_CHANGEMTU events.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 23 Mar 2018 01:48:43 +0000 (18:48 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, thp: do not cause memcg oom for thp
  mm/vmscan: wake up flushers for legacy cgroups too
  Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
  mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
  mm/thp: do not wait for lock_page() in deferred_split_scan()
  mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
  x86/mm: implement free pmd/pte page interfaces
  mm/vmalloc: add interfaces to free unmapped page table
  h8300: remove extraneous __BIG_ENDIAN definition
  hugetlbfs: check for pgoff value overflow
  lockdep: fix fs_reclaim warning
  MAINTAINERS: update Mark Fasheh's e-mail
  mm/mempolicy.c: avoid use uninitialized preferred_node

3 years agoMerge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdim...
Linus Torvalds [Fri, 23 Mar 2018 01:37:49 +0000 (18:37 -0700)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "Two regression fixes, two bug fixes for older issues, two fixes for
  new functionality added this cycle that have userspace ABI concerns,
  and a small cleanup. These have appeared in a linux-next release and
  have a build success report from the 0day robot.

   * The 4.16 rework of altmap handling led to some configurations
     leaking page table allocations due to freeing from the altmap
     reservation rather than the page allocator.

     The impact without the fix is leaked memory and a WARN() message
     when tearing down libnvdimm namespaces. The rework also missed a
     place where error handling code needed to be removed that can lead
     to a crash if devm_memremap_pages() fails.

   * acpi_map_pxm_to_node() had a latent bug whereby it could
     misidentify the closest online node to a given proximity domain.

   * Block integrity handling was reworked several kernels back to allow
     calling add_disk() after setting up the integrity profile.

     The nd_btt and nd_blk drivers are just now catching up to fix
     automatic partition detection at driver load time.

   * The new peristence_domain attribute, a platform indicator of
     whether cpu caches are powerfail protected for example, is meant to
     be a single value enum and not a set of flags.

     This oversight was caught while reviewing new userspace code in
     libndctl to communicate the attribute.

     Fix this new enabling up so that we are not stuck with an unwanted
     userspace ABI"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, nfit: fix persistence domain reporting
  libnvdimm, region: hide persistence_domain when unknown
  acpi, numa: fix pxm to online numa node associations
  x86, memremap: fix altmap accounting at free
  libnvdimm: remove redundant assignment to pointer 'dev'
  libnvdimm, {btt, blk}: do integrity setup before add_disk()
  kernel/memremap: Remove stale devres_free() call

3 years agoMerge tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 23 Mar 2018 00:37:44 +0000 (17:37 -0700)]
Merge tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A bunch of fixes all over the place (core, i915, amdgpu, imx, sun4i,
  ast, tegra, vmwgfx), nothing too serious or worrying at this stage.

   - one uapi fix to stop multi-planar images with getfb

   - Sun4i error path and clock fixes

   - udl driver mmap offset fix

   - i915 DP MST and GPU reset fixes

   - vmwgfx mutex and black screen fixes

   - imx array underflow fix and vblank fix

   - amdgpu: display fixes

   - exynos devicetree fix

   - ast mode fix"

* tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux: (29 commits)
  drm/ast: Fixed 1280x800 Display Issue
  drm: udl: Properly check framebuffer mmap offsets
  drm/i915: Specify which engines to reset following semaphore/event lockups
  drm/vmwgfx: Fix a destoy-while-held mutex problem.
  drm/vmwgfx: Fix black screen and device errors when running without fbdev
  drm: Reject getfb for multi-plane framebuffers
  drm/amd/display: Add one to EDID's audio channel count when passing to DC
  drm/amd/display: We shouldn't set format_default on plane as atomic driver
  drm/amd/display: Fix FMT truncation programming
  drm/amd/display: Allow truncation to 10 bits
  drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
  drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
  drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
  drm/amd/display: fix dereferencing possible ERR_PTR()
  drm/amd/display: Refine disable VGA
  drm/tegra: Shutdown on driver unbind
  drm/tegra: dsi: Don't disable regulator on ->exit()
  drm/tegra: dc: Detach IOMMU group from domain only once
  dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node
  drm/imx: move arming of the vblank event to atomic_flush
  ...

3 years agomm, thp: do not cause memcg oom for thp
David Rientjes [Thu, 22 Mar 2018 23:17:45 +0000 (16:17 -0700)]
mm, thp: do not cause memcg oom for thp

Commit 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and
madvised allocations") changed the page allocator to no longer detect
thp allocations based on __GFP_NORETRY.

It did not, however, modify the mem cgroup try_charge() path to avoid
oom kill for either khugepaged collapsing or thp faulting.  It is never
expected to oom kill a process to allocate a hugepage for thp; reclaim
is governed by the thp defrag mode and MADV_HUGEPAGE, but allocations
(and charging) should fallback instead of oom killing processes.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803191409420.124411@chino.kir.corp.google.com
Fixes: 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/vmscan: wake up flushers for legacy cgroups too
Andrey Ryabinin [Thu, 22 Mar 2018 23:17:42 +0000 (16:17 -0700)]
mm/vmscan: wake up flushers for legacy cgroups too

Commit 726d061fbd36 ("mm: vmscan: kick flushers when we encounter dirty
pages on the LRU") added flusher invocation to shrink_inactive_list()
when many dirty pages on the LRU are encountered.

However, shrink_inactive_list() doesn't wake up flushers for legacy
cgroup reclaim, so the next commit bbef938429f5 ("mm: vmscan: remove old
flusher wakeup from direct reclaim path") removed the only source of
flusher's wake up in legacy mem cgroup reclaim path.

This leads to premature OOM if there is too many dirty pages in cgroup:
    # mkdir /sys/fs/cgroup/memory/test
    # echo $$ > /sys/fs/cgroup/memory/test/tasks
    # echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
    # dd if=/dev/zero of=tmp_file bs=1M count=100
    Killed

    dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0

    Call Trace:
     dump_stack+0x46/0x65
     dump_header+0x6b/0x2ac
     oom_kill_process+0x21c/0x4a0
     out_of_memory+0x2a5/0x4b0
     mem_cgroup_out_of_memory+0x3b/0x60
     mem_cgroup_oom_synchronize+0x2ed/0x330
     pagefault_out_of_memory+0x24/0x54
     __do_page_fault+0x521/0x540
     page_fault+0x45/0x50

    Task in /test killed as a result of limit of /test
    memory: usage 51200kB, limit 51200kB, failcnt 73
    memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
    kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
    Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
            mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
    active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
    Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
    Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
    oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Wake up flushers in legacy cgroup reclaim too.

Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
Fixes: bbef938429f5 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoRevert "mm: page_alloc: skip over regions of invalid pfns where possible"
Daniel Vacek [Thu, 22 Mar 2018 23:17:38 +0000 (16:17 -0700)]
Revert "mm: page_alloc: skip over regions of invalid pfns where possible"

This reverts commit b92df1de5d28 ("mm: page_alloc: skip over regions of
invalid pfns where possible").  The commit is meant to be a boot init
speed up skipping the loop in memmap_init_zone() for invalid pfns.

But given some specific memory mapping on x86_64 (or more generally
theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
implementation also skips valid pfns which is plain wrong and causes
'kernel BUG at mm/page_alloc.c:1389!'

  crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
  kernel BUG at mm/page_alloc.c:1389!
  invalid opcode: 0000 [#1] SMP
  --
  RIP: 0010: move_freepages+0x15e/0x160
  --
  Call Trace:
    move_freepages_block+0x73/0x80
    __rmqueue+0x263/0x460
    get_page_from_freelist+0x7e1/0x9e0
    __alloc_pages_nodemask+0x176/0x420
  --

  crash> page_init_bug -v | grep RAM
  <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
  <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
  <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
  <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
  <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
  <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)

  crash> page_init_bug | head -6
  <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
  <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
  <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
  <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
  <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
  BUG, zones differ!

  crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
        PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffffea0001e00000  78000000                0        0  0 0
  ffffea0001ed7fc0  7b5ff000                0        0  0 0
  ffffea0001ed8000  7b600000                0        0  0 0       <<<<
  ffffea0001ede1c0  7b787000                0        0  0 0
  ffffea0001ede200  7b788000                0        0  1 1fffff00000000

Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
Kirill A. Shutemov [Thu, 22 Mar 2018 23:17:35 +0000 (16:17 -0700)]
mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()

shmem_unused_huge_shrink() gets called from reclaim path.  Waiting for
page lock may lead to deadlock there.

There was a bug report that may be attributed to this:

  http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.net

Replace lock_page() with trylock_page() and skip the page if we failed
to lock it.  We will get to the page on the next scan.

We can test for the PageTransHuge() outside the page lock as we only
need protection against splitting the page under us.  Holding pin oni
the page is enough for this.

Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel.com
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Eric Wheeler <linux-mm@lists.ewheeler.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/thp: do not wait for lock_page() in deferred_split_scan()
Kirill A. Shutemov [Thu, 22 Mar 2018 23:17:31 +0000 (16:17 -0700)]
mm/thp: do not wait for lock_page() in deferred_split_scan()

deferred_split_scan() gets called from reclaim path.  Waiting for page
lock may lead to deadlock there.

Replace lock_page() with trylock_page() and skip the page if we failed
to lock it.  We will get to the page on the next scan.

Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel.com
Fixes: 9a982250f773 ("thp: introduce deferred_split_huge_page()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/khugepaged.c: convert VM_BUG_ON() to collapse fail
Kirill A. Shutemov [Thu, 22 Mar 2018 23:17:28 +0000 (16:17 -0700)]
mm/khugepaged.c: convert VM_BUG_ON() to collapse fail

khugepaged is not yet able to convert PTE-mapped huge pages back to PMD
mapped.  We do not collapse such pages.  See check
khugepaged_scan_pmd().

But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate()
somebody managed to instantiate THP in the range and then split the PMD
back to PTEs we would have a problem --
VM_BUG_ON_PAGE(PageCompound(page)) will get triggered.

It's possible since we drop mmap_sem during collapse to re-take for
write.

Replace the VM_BUG_ON() with graceful collapse fail.

Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel.com
Fixes: b1caa957ae6d ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agox86/mm: implement free pmd/pte page interfaces
Toshi Kani [Thu, 22 Mar 2018 23:17:24 +0000 (16:17 -0700)]
x86/mm: implement free pmd/pte page interfaces

Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
clear a given pud/pmd entry and free up lower level page table(s).

The address range associated with the pud/pmd entry must have been
purged by INVLPG.

Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com
Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/vmalloc: add interfaces to free unmapped page table
Toshi Kani [Thu, 22 Mar 2018 23:17:20 +0000 (16:17 -0700)]
mm/vmalloc: add interfaces to free unmapped page table

On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
create pud/pmd mappings.  A kernel panic was observed on arm64 systems
with Cortex-A75 in the following steps as described by Hanjun Guo.

 1. ioremap a 4K size, valid page table will build,
 2. iounmap it, pte0 will set to 0;
 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
    then set the a new value for pmd;
 4. pte0 is leaked;
 5. CPU may meet exception because the old pmd is still in TLB,
    which will lead to kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86.  x86
still has memory leak.

The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:

 - The iounmap() path is shared with vunmap(). Since vmap() only
   supports pte mappings, making vunmap() to free a pte page is an
   overhead for regular vmap users as they do not need a pte page freed
   up.

 - Checking if all entries in a pte page are cleared in the unmap path
   is racy, and serializing this check is expensive.

 - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
   Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
   purge.

Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
clear a given pud/pmd entry and free up a page for the lower level
entries.

This patch implements their stub functions on x86 and arm64, which work
as workaround.

[akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoh8300: remove extraneous __BIG_ENDIAN definition
Arnd Bergmann [Thu, 22 Mar 2018 23:17:17 +0000 (16:17 -0700)]
h8300: remove extraneous __BIG_ENDIAN definition

A bugfix I did earlier caused a build regression on h8300, which defines
the __BIG_ENDIAN macro in a slightly different way than the generic
code:

  arch/h8300/include/asm/byteorder.h:5:0: warning: "__BIG_ENDIAN" redefined

We don't need to define it here, as the same macro is already provided
by the linux/byteorder/big_endian.h, and that version does not conflict.

While this is a v4.16 regression, my earlier patch also got backported
to the 4.14 and 4.15 stable kernels, so we need the fixup there as well.

Link: http://lkml.kernel.org/r/20180313120752.2645129-1-arnd@arndb.de
Fixes: 101110f6271c ("Kbuild: always define endianess in kconfig.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agohugetlbfs: check for pgoff value overflow
Mike Kravetz [Thu, 22 Mar 2018 23:17:13 +0000 (16:17 -0700)]
hugetlbfs: check for pgoff value overflow

A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages system
call.  The hugetlbfs mmap routine uses the byte offset to calculate
reservations and file size.

A sequence such as:

  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);

will result in the following when task exits/file closed,

  kernel BUG at mm/hugetlb.c:749!
  Call Trace:
    hugetlbfs_evict_inode+0x2f/0x40
    evict+0xcb/0x190
    __dentry_kill+0xcb/0x150
    __fput+0x164/0x1e0
    task_work_run+0x84/0xa0
    exit_to_usermode_loop+0x7d/0x80
    do_syscall_64+0x18b/0x190
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start) that leaves invalid state which
causes the BUG.

The previous overflow fix to this code was incomplete and did not take
the remap_file_pages system call into account.

[mike.kravetz@oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm@linux-foundation.org: include mmdebug.h]
[akpm@linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nic Losby <blurbdust@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agolockdep: fix fs_reclaim warning
Tetsuo Handa [Thu, 22 Mar 2018 23:17:10 +0000 (16:17 -0700)]
lockdep: fix fs_reclaim warning

Dave Jones reported fs_reclaim lockdep warnings.

  ============================================
  WARNING: possible recursive locking detected
  4.15.0-rc9-backup-debug+ #1 Not tainted
  --------------------------------------------
  sshd/24800 is trying to acquire lock:
   (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  but task is already holding lock:
   (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(fs_reclaim);
    lock(fs_reclaim);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  2 locks held by sshd/24800:
   #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
   #1:  (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  stack backtrace:
  CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
  Call Trace:
   dump_stack+0xbc/0x13f
   __lock_acquire+0xa09/0x2040
   lock_acquire+0x12e/0x350
   fs_reclaim_acquire.part.102+0x29/0x30
   kmem_cache_alloc+0x3d/0x2c0
   alloc_extent_state+0xa7/0x410
   __clear_extent_bit+0x3ea/0x570
   try_release_extent_mapping+0x21a/0x260
   __btrfs_releasepage+0xb0/0x1c0
   btrfs_releasepage+0x161/0x170
   try_to_release_page+0x162/0x1c0
   shrink_page_list+0x1d5a/0x2fb0
   shrink_inactive_list+0x451/0x940
   shrink_node_memcg.constprop.88+0x4c9/0x5e0
   shrink_node+0x12d/0x260
   try_to_free_pages+0x418/0xaf0
   __alloc_pages_slowpath+0x976/0x1790
   __alloc_pages_nodemask+0x52c/0x5c0
   new_slab+0x374/0x3f0
   ___slab_alloc.constprop.81+0x47e/0x5a0
   __slab_alloc.constprop.80+0x32/0x60
   __kmalloc_track_caller+0x267/0x310
   __kmalloc_reserve.isra.40+0x29/0x80
   __alloc_skb+0xee/0x390
   sk_stream_alloc_skb+0xb8/0x340
   tcp_sendmsg_locked+0x8e6/0x1d30
   tcp_sendmsg+0x27/0x40
   inet_sendmsg+0xd0/0x310
   sock_write_iter+0x17a/0x240
   __vfs_write+0x2ab/0x380
   vfs_write+0xfb/0x260
   SyS_write+0xb6/0x140
   do_syscall_64+0x1e5/0xc05
   entry_SYSCALL64_slow_path+0x25/0x25

This warning is caused by commit d92a8cfcb37e ("locking/lockdep:
Rework FS_RECLAIM annotation") which replaced the use of
lockdep_{set,clear}_current_reclaim_state() in __perform_reclaim()
and lockdep_trace_alloc() in slab_pre_alloc_hook() with
fs_reclaim_acquire()/ fs_reclaim_release().

Since __kmalloc_reserve() from __alloc_skb() adds __GFP_NOMEMALLOC |
__GFP_NOWARN to gfp_mask, and all reclaim path simply propagates
__GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook() is
trying to grab the 'fake' lock again when __perform_reclaim() already
grabbed the 'fake' lock.

The

  /* this guy won't enter reclaim */
  if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
          return false;

test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock
was added by commit cf40bd16fdad ("lockdep: annotate reclaim context
(__GFP_NOFS)").  But that test is outdated because PF_MEMALLOC thread
won't enter reclaim regardless of __GFP_NOMEMALLOC after commit
341ce06f69ab ("page allocator: calculate the alloc_flags for allocation
only once") added the PF_MEMALLOC safeguard (

  /* Avoid recursion of direct reclaim */
  if (p->flags & PF_MEMALLOC)
          goto nopage;

in __alloc_pages_slowpath()).

Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and
allow __need_fs_reclaim() to return false.

Link: http://lkml.kernel.org/r/201802280650.FJC73911.FOSOMLJVFFQtHO@I-love.SAKURA.ne.jp
Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMAINTAINERS: update Mark Fasheh's e-mail
Mark Fasheh [Thu, 22 Mar 2018 23:17:05 +0000 (16:17 -0700)]
MAINTAINERS: update Mark Fasheh's e-mail

I'd like to use my personal e-mail for Ocfs2 requests and review.

Link: http://lkml.kernel.org/r/20180311231356.9385-1-mfasheh@versity.com
Signed-off-by: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agomm/mempolicy.c: avoid use uninitialized preferred_node
Yisheng Xie [Thu, 22 Mar 2018 23:17:02 +0000 (16:17 -0700)]
mm/mempolicy.c: avoid use uninitialized preferred_node

Alexander reported a use of uninitialized memory in __mpol_equal(),
which is caused by incorrect use of preferred_node.

When mempolicy in mode MPOL_PREFERRED with flags MPOL_F_LOCAL, it uses
numa_node_id() instead of preferred_node, however, __mpol_equal() uses
preferred_node without checking whether it is MPOL_F_LOCAL or not.

[akpm@linux-foundation.org: slight comment tweak]
Link: http://lkml.kernel.org/r/4ebee1c2-57f6-bcb8-0e2d-1833d1ee0bb7@huawei.com
Fixes: fc36b8d3d819 ("mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy")
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agodrm/ast: Fixed 1280x800 Display Issue
Y.C. Chen [Mon, 12 Mar 2018 03:40:23 +0000 (11:40 +0800)]
drm/ast: Fixed 1280x800 Display Issue

The original ast driver cannot display properly if the resolution is 1280x800 and the pixel clock is 83.5MHz.
Here is the update to fix it.

Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
3 years agoMerge tag 'acpi-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 22 Mar 2018 23:20:25 +0000 (16:20 -0700)]
Merge tag 'acpi-4.16-rc7' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These revert one recent commit that added incorrect battery quirks for
  some Asus systems and fix an off-by-one error in the watchdog driver
  based on the ACPI WDAT table.

  Specifics:

   - Revert the recent change adding battery quirks for Asus GL502VSK
     and UX305LA as these quirks turn out to be inadequate and possibly
     premature (Daniel Drake).

   - Fix an off-by-one error in the resource allocation part of the
     watchdog driver based on the ACPI WDAT table (Takashi Iwai)"

* tag 'acpi-4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / watchdog: Fix off-by-one error at resource assignment
  Revert "ACPI / battery: Add quirk for Asus GL502VSK and UX305LA"

3 years agoMerge tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 22 Mar 2018 23:13:49 +0000 (16:13 -0700)]
Merge tag 'modules-for-v4.16-rc7' of git://git./linux/kernel/git/jeyu/linux

Pull modules fix from Jessica Yu:
 "Propagate error in modules_open() to avoid possible later NULL
  dereference if seq_open() had failed"

* tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: propagate error in modules_open()

3 years agoMerge branch 'acpi-wdat'
Rafael J. Wysocki [Thu, 22 Mar 2018 22:42:08 +0000 (23:42 +0100)]
Merge branch 'acpi-wdat'

* acpi-wdat:
  ACPI / watchdog: Fix off-by-one error at resource assignment

3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 22 Mar 2018 21:10:29 +0000 (14:10 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Always validate XFRM esn replay attribute, from Florian Westphal.

 2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long.

 3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul
    Triebitz.

 4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel
    Axtens.

 5) Fix some interrupt handling issues in e1000e driver, from Benjamin
    Poitier.

 6) Use strlcpy() in several ethtool get_strings methods, from Florian
    Fainelli.

 7) Fix rhlist dup insertion, from Paul Blakey.

 8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev.

 9) Fix driver unload crash when link is up in smsc911x, from Jeremy
    Linton.

10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric
    Dumazet.

11) Need to purge the write queue when TCP connections are aborted,
    otherwise userspace using MSG_ZEROCOPY can't close the fd. From
    Soheil Hassas Yeganeh.

12) Fix double free in error path of team driver, from Arkadi
    Sharshevsky.

13) Filter fixes for hv_netvsc driver, from Stephen Hemminger.

14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo
    Bianconi.

15) Properly filter out unsupported feature flags in macvlan driver,
    from Shannon Nelson.

16) Don't request loading the diag module for a protocol if the protocol
    itself is not even registered. From Xin Long.

17) If datagram connect fails in ipv6, make sure the socket state is
    consistent afterwards. From Paolo Abeni.

18) Use after free in qed driver, from Dan Carpenter.

19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the
    entry. From Sabrina Dubroca.

20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins.

21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita.

22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel.

23) Fix NULL derefs in error paths of tcf_*_init(), from Davide Caratti.

24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli.

25) Memory leak in gemini driver, from Igor Pylypiv.

26) IDR leaks in error paths of tcf_*_init() functions, from Davide
    Caratti.

27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun.

28) Missing dev_put() in error path of macsec_newlink(), from Dan
    Carpenter.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits)
  macsec: missing dev_put() on error in macsec_newlink()
  net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
  hv_netvsc: common detach logic
  hv_netvsc: change GPAD teardown order on older versions
  hv_netvsc: use RCU to fix concurrent rx and queue changes
  hv_netvsc: disable NAPI before channel close
  net/ipv6: Handle onlink flag with multipath routes
  ppp: avoid loop in xmit recursion detection code
  ipv6: sr: fix NULL pointer dereference when setting encap source address
  ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
  net: aquantia: driver version bump
  net: aquantia: Implement pci shutdown callback
  net: aquantia: Allow live mac address changes
  net: aquantia: Add tx clean budget and valid budget handling logic
  net: aquantia: Change inefficient wait loop on fw data reads
  net: aquantia: Fix a regression with reset on old firmware
  net: aquantia: Fix hardware reset when SPI may rarely hangup
  s390/qeth: on channel error, reject further cmd requests
  s390/qeth: lock read device while queueing next buffer
  s390/qeth: when thread completes, wake up all waiters
  ...

3 years agoMerge tag 'mmc-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Thu, 22 Mar 2018 20:29:55 +0000 (13:29 -0700)]
Merge tag 'mmc-v4.16-rc4' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "A couple of MMC fixes intended for v4.16-rc7:

  MMC host:

   - dw_mmc: Fix the suspend/resume issue for Exynos5433

   - dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit
     systems

   - dw_mmc: Make PIO mode work when failing with idmac when
     dw_mci_reset occurs

   - sdhci-acpi: Re-allow IRQ 0 to fix broken probe

  MMC core:

   - Update EXT_CSD caches to correctly switch partition for ioctl calls

   - Fix tracepoint print of blk_addr and blksz

   - Disable HPI on broken Micron (Numonyx) eMMC cards"

* tag 'mmc-v4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-acpi: Fix IRQ 0
  mmc: dw_mmc: fix falling from idmac to PIO mode when dw_mci_reset occurs
  mmc: core: Fix tracepoint print of blk_addr and blksz
  mmc: core: Disable HPI for certain Micron (Numonyx) eMMC cards
  mmc: dw_mmc: exynos: fix the suspend/resume issue for exynos5433
  mmc: block: fix updating ext_csd caches on ioctl call
  mmc: dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit systems

3 years agoMerge tag 'drm-misc-fixes-2018-03-22' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Thu, 22 Mar 2018 20:19:56 +0000 (06:19 +1000)]
Merge tag 'drm-misc-fixes-2018-03-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Main change is a patch to reject getfb call for multiplanar framebuffers,
then we have a couple of error path fixes on the sun4i driver. Still on that
driver there is a clk fix and finally a mmap offset fix on the udl driver.

* tag 'drm-misc-fixes-2018-03-22' of git://anongit.freedesktop.org/drm/drm-misc:
  drm: udl: Properly check framebuffer mmap offsets
  drm: Reject getfb for multi-plane framebuffers
  drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
  drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
  drm/sun4i: Fix an error handling path in 'sun4i_drv_bind()'
  drm/sun4i: Fix exclusivity of the TCON clocks

3 years agoMerge tag 'drm-intel-fixes-2018-03-21' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 22 Mar 2018 20:15:44 +0000 (06:15 +1000)]
Merge tag 'drm-intel-fixes-2018-03-21' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

One fix for DP MST and one fix for GPU reset on hang check.

* tag 'drm-intel-fixes-2018-03-21' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915: Specify which engines to reset following semaphore/event lockups
  drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.

3 years agoMerge branch 'vmwgfx-fixes-4.16' of git://people.freedesktop.org/~thomash/linux into...
Dave Airlie [Thu, 22 Mar 2018 20:15:08 +0000 (06:15 +1000)]
Merge branch 'vmwgfx-fixes-4.16' of git://people.freedesktop.org/~thomash/linux into drm-fixes

Two vmwgfx fixes for 4.16. Both cc'd stable.

* 'vmwgfx-fixes-4.16' of git://people.freedesktop.org/~thomash/linux:
  drm/vmwgfx: Fix a destoy-while-held mutex problem.
  drm/vmwgfx: Fix black screen and device errors when running without fbdev

3 years agoMerge tag 'imx-drm-fixes-2018-03-22' of git://git.pengutronix.de/git/pza/linux into...
Dave Airlie [Thu, 22 Mar 2018 20:14:15 +0000 (06:14 +1000)]
Merge tag 'imx-drm-fixes-2018-03-22' of git://git.pengutronix.de/git/pza/linux into drm-fixes

drm/imx: fixes for early vblank event issue, array underflow error

- fix an array underflow error by reordering the range check before the array
  subscript in ipu-prg.
- make some local functions static in ipuv3-plane.
- add a missng header for ipu_planes_assign_pre in ipuv3-plane.
- move arming of the vblank event from atomic_begin to atomic_flush, to avoid
  signalling atomic commit completion to userspace before plane atomic_update
  has finished, due to a race condition that is likely to be hit on i.MX6QP on
  PRE enabled channels.

* tag 'imx-drm-fixes-2018-03-22' of git://git.pengutronix.de/git/pza/linux:
  drm/imx: move arming of the vblank event to atomic_flush
  drm/imx: ipuv3-plane: Include "imx-drm.h" header file
  drm/imx: ipuv3-plane: Make functions static when possible
  gpu: ipu-v3: prg: avoid possible array underflow

3 years agomacsec: missing dev_put() on error in macsec_newlink()
Dan Carpenter [Wed, 21 Mar 2018 08:09:01 +0000 (11:09 +0300)]
macsec: missing dev_put() on error in macsec_newlink()

We moved the dev_hold(real_dev); call earlier in the function but forgot
to update the error paths.

Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'mac80211-for-davem-2018-03-21' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Thu, 22 Mar 2018 17:19:10 +0000 (13:19 -0400)]
Merge tag 'mac80211-for-davem-2018-03-21' of git://git./linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two more fixes (in three patches):
 * ath9k_htc doesn't like QoS NDP frames, use regular ones
 * hwsim: set up wmediumd for radios created later
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: Fix functional dsa-loop dependency on FIXED_PHY
Florian Fainelli [Wed, 21 Mar 2018 00:31:10 +0000 (17:31 -0700)]
net: dsa: Fix functional dsa-loop dependency on FIXED_PHY

We have a functional dependency on the FIXED_PHY MDIO bus because we register
fixed PHY devices "the old way" which only works if the code that does this has
had a chance to run before the fixed MDIO bus is probed. Make sure we account
for that and have dsa_loop_bdinfo.o be either built-in or modular depending on
whether CONFIG_FIXED_PHY reflects that too.

Fixes: 98cd1552ea27 ("net: dsa: Mock-up driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'hv_netvsc-fix-races-during-shutdown-and-changes'
David S. Miller [Thu, 22 Mar 2018 16:45:10 +0000 (12:45 -0400)]
Merge branch 'hv_netvsc-fix-races-during-shutdown-and-changes'

Stephen Hemminger says:

====================
hv_netvsc: fix races during shutdown and changes

This set of patches fixes issues identified by Vitaly Kuznetsov and
Mohammed Gamal related to state changes in Hyper-v network driver.

A lot of the issues are because setting up the netvsc device requires
a second step (in work queue) to get all the sub-channels running.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agohv_netvsc: common detach logic
Stephen Hemminger [Tue, 20 Mar 2018 22:03:05 +0000 (15:03 -0700)]
hv_netvsc: common detach logic

Make common function for detaching internals of device
during changes to MTU and RSS. Make sure no more packets
are transmitted and all packets have been received before
doing device teardown.

Change the wait logic to be common and use usleep_range().

Changes transmit enabling logic so that transmit queues are disabled
during the period when lower device is being changed. And enabled
only after sub channels are setup. This avoids issue where it could
be that a packet was being sent while subchannel was not initialized.

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agohv_netvsc: change GPAD teardown order on older versions
Stephen Hemminger [Tue, 20 Mar 2018 22:03:04 +0000 (15:03 -0700)]
hv_netvsc: change GPAD teardown order on older versions

On older versions of Windows, the host ignores messages after
vmbus channel is closed.

Workaround this by doing what Windows does and send the teardown
before close on older versions of NVSP protocol.

Reported-by: Mohammed Gamal <mgamal@redhat.com>
Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agohv_netvsc: use RCU to fix concurrent rx and queue changes
Stephen Hemminger [Tue, 20 Mar 2018 22:03:03 +0000 (15:03 -0700)]
hv_netvsc: use RCU to fix concurrent rx and queue changes

The receive processing may continue to happen while the
internal network device state is in RCU grace period.
The internal RNDIS structure is associated with the
internal netvsc_device structure; both have the same
RCU lifetime.

Defer freeing all associated parts until after grace
period.

Fixes: 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agohv_netvsc: disable NAPI before channel close
Stephen Hemminger [Tue, 20 Mar 2018 22:03:02 +0000 (15:03 -0700)]
hv_netvsc: disable NAPI before channel close

This makes sure that no CPU is still process packets when
the channel is closed.

Fixes: 76bb5db5c749 ("netvsc: fix use after free on module removal")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/ipv6: Handle onlink flag with multipath routes
David Ahern [Tue, 20 Mar 2018 17:06:59 +0000 (10:06 -0700)]
net/ipv6: Handle onlink flag with multipath routes

For multipath routes the ONLINK flag can be specified per nexthop in
rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
to consider the ONLINK setting coming from rtnh_flags. Each loop over
nexthops the config for the sibling route is initialized to the global
config and then per nexthop settings overlayed. The flag is 'or'ed into
fib6_config to handle the ONLINK flag coming from either rtm_flags or
rtnh_flags.

Fixes: fc1e64e1092f ("net/ipv6: Add support for onlink flag")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoppp: avoid loop in xmit recursion detection code
Guillaume Nault [Tue, 20 Mar 2018 15:49:26 +0000 (16:49 +0100)]
ppp: avoid loop in xmit recursion detection code

We already detect situations where a PPP channel sends packets back to
its upper PPP device. While this is enough to avoid deadlocking on xmit
locks, this doesn't prevent packets from looping between the channel
and the unit.

The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq
before checking for xmit recursion. Therefore, __ppp_xmit_process()
might dequeue a packet from ppp->file.xq and send it on the channel
which, in turn, loops it back on the unit. Then ppp_start_xmit()
queues the packet back to ppp->file.xq and __ppp_xmit_process() picks
it up and sends it again through the channel. Therefore, the packet
will loop between __ppp_xmit_process() and ppp_start_xmit() until some
other part of the xmit path drops it.

For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops
the packet after a few iterations. But PPTP reallocates the headroom
if necessary, letting the loop run and exhaust the machine resources
(as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109).

Fix this by letting __ppp_xmit_process() enqueue the skb to
ppp->file.xq, so that we can check for recursion before adding it to
the queue. Now ppp_xmit_process() can drop the packet when recursion is
detected.

__ppp_channel_push() is a bit special. It calls __ppp_xmit_process()
without having any actual packet to send. This is used by
ppp_output_wakeup() to re-enable transmission on the parent unit (for
implementations like ppp_async.c, where the .start_xmit() function
might not consume the skb, leaving it in ppp->xmit_pending and
disabling transmission).
Therefore, __ppp_xmit_process() needs to handle the case where skb is
NULL, dequeuing as many packets as possible from ppp->file.xq.

Reported-by: xu heng <xuheng333@zoho.com>
Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: sr: fix NULL pointer dereference when setting encap source address
David Lebrun [Tue, 20 Mar 2018 14:44:56 +0000 (14:44 +0000)]
ipv6: sr: fix NULL pointer dereference when setting encap source address

When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
source address of the outer IPv6 header, in case none was specified.
Using skb->dev can lead to BUG() when it is in an inconsistent state.
This patch uses the net_device attached to the skb's dst instead.

[940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
[940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
[940807.815725] PGD 0 P4D 0
[940807.847173] Oops: 0000 [#1] SMP PTI
[940807.890073] Modules linked in:
[940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
[940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
[940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
[940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
[940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
[940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
[940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
[940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
[940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
[940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
[940808.939634] Call Trace:
[940808.970041]  <IRQ>
[940808.995250]  ? ip6t_do_table+0x265/0x640
[940809.043341]  seg6_do_srh_encap+0x28f/0x300
[940809.093516]  ? seg6_do_srh+0x1a0/0x210
[940809.139528]  seg6_do_srh+0x1a0/0x210
[940809.183462]  seg6_output+0x28/0x1e0
[940809.226358]  lwtunnel_output+0x3f/0x70
[940809.272370]  ip6_xmit+0x2b8/0x530
[940809.313185]  ? ac6_proc_exit+0x20/0x20
[940809.359197]  inet6_csk_xmit+0x7d/0xc0
[940809.404173]  tcp_transmit_skb+0x548/0x9a0
[940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
[940809.506603]  ? ip6_default_advmss+0x40/0x40
[940809.557824]  ? tcp_current_mss+0x24/0x90
[940809.605925]  tcp_retransmit_skb+0xd/0x80
[940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
[940809.719797]  tcp_ack+0xa47/0x1110
[940809.760612]  tcp_rcv_established+0x13c/0x570
[940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
[940809.858879]  tcp_v6_rcv+0xa5c/0xb10
[940809.901770]  ? seg6_output+0xdd/0x1e0
[940809.946745]  ip6_input_finish+0xbb/0x460
[940809.994837]  ip6_input+0x74/0x80
[940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
[940810.081663]  ipv6_rcv+0x31c/0x4c0
...

Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Reported-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
David Lebrun [Tue, 20 Mar 2018 14:44:55 +0000 (14:44 +0000)]
ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state

The seg6_build_state() function is called with RCU read lock held,
so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.

[   92.770271] =============================
[   92.770628] WARNING: suspicious RCU usage
[   92.770921] 4.16.0-rc4+ #12 Not tainted
[   92.771277] -----------------------------
[   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[   92.772279]
[   92.772279] other info that might help us debug this:
[   92.772279]
[   92.773067]
[   92.773067] rcu_scheduler_active = 2, debug_locks = 1
[   92.773514] 2 locks held by ip/2413:
[   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
[   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
[   92.775065]
[   92.775065] stack backtrace:
[   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
[   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[   92.776608] Call Trace:
[   92.776852]  dump_stack+0x7d/0xbc
[   92.777130]  __schedule+0x133/0xf00
[   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
[   92.777783]  ? __sched_text_start+0x8/0x8
[   92.778073]  ? rcu_is_watching+0x19/0x30
[   92.778383]  ? kernel_text_address+0x49/0x60
[   92.778800]  ? __kernel_text_address+0x9/0x30
[   92.779241]  ? unwind_get_return_address+0x29/0x40
[   92.779727]  ? pcpu_alloc+0x102/0x8f0
[   92.780101]  _cond_resched+0x23/0x50
[   92.780459]  __mutex_lock+0xbd/0xad0
[   92.780818]  ? pcpu_alloc+0x102/0x8f0
[   92.781194]  ? seg6_build_state+0x11d/0x240
[   92.781611]  ? save_stack+0x9b/0xb0
[   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
[   92.782480]  ? seg6_build_state+0x11d/0x240
[   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
[   92.783393]  ? ip6_route_info_create+0x687/0x1640
[   92.783846]  ? ip6_route_add+0x74/0x110
[   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0

Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'aquantia-fixes'
David S. Miller [Thu, 22 Mar 2018 16:02:50 +0000 (12:02 -0400)]
Merge branch 'aquantia-fixes'

Igor Russkikh says:

====================
Aquantia atlantic hot fixes 03-2018

This is a set of atlantic driver hot fixes for various areas:

Some issues with hardware reset covered,
Fixed napi_poll flood happening on some traffic conditions,
Allow system to change MAC address on live device,
Add pci shutdown handler.

patch v2:
- reverse christmas tree
- remove driver private parameter, replacing it with define.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: driver version bump
Igor Russkikh [Tue, 20 Mar 2018 11:40:37 +0000 (14:40 +0300)]
net: aquantia: driver version bump

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: Implement pci shutdown callback
Igor Russkikh [Tue, 20 Mar 2018 11:40:36 +0000 (14:40 +0300)]
net: aquantia: Implement pci shutdown callback

We should close link and all NIC operations during shutdown.
On some systems graceful reboot never closes NIC interface on its own,
but only indicates pci device shutdown. Without explicit handler, NIC
rx rings continued to transfer DMA data into prepared buffers while CPU
rebooted already. That caused memory corruptions on soft reboot.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: Allow live mac address changes
Igor Russkikh [Tue, 20 Mar 2018 11:40:35 +0000 (14:40 +0300)]
net: aquantia: Allow live mac address changes

There is nothing prevents us from changing MAC on the running interface.
Allow this with ndev priv flag.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: Add tx clean budget and valid budget handling logic
Igor Russkikh [Tue, 20 Mar 2018 11:40:34 +0000 (14:40 +0300)]
net: aquantia: Add tx clean budget and valid budget handling logic

We should report to napi full budget only when we have more job to do.
Before this fix, on any tx queue cleanup we forced napi to do poll again.
Thats a waste of cpu resources and caused storming with napi polls when
there was at least one tx on each interrupt.

With this fix we report full budget only when there is more job on TX
to do. Or, as before, when rx budget was fully consumed.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: Change inefficient wait loop on fw data reads
Igor Russkikh [Tue, 20 Mar 2018 11:40:33 +0000 (14:40 +0300)]
net: aquantia: Change inefficient wait loop on fw data reads

B1 hardware changes behavior of mailbox interface, it has busy bit
always raised. Data ready condition should be detected by increment
of address register.

Old code has empty `for` loop, and that caused cpu overloads on B1
hardware. aq_nic_service_timer_cb consumed ~100ms because of that.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: Fix a regression with reset on old firmware
Igor Russkikh [Tue, 20 Mar 2018 11:40:32 +0000 (14:40 +0300)]
net: aquantia: Fix a regression with reset on old firmware

FW 1.5.58 and below needs a fixed delay even after 0x18 register
is filled. Otherwise, setting MPI_INIT state too fast causes
traffic hang.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: aquantia: Fix hardware reset when SPI may rarely hangup
Igor Russkikh [Tue, 20 Mar 2018 11:40:31 +0000 (14:40 +0300)]
net: aquantia: Fix hardware reset when SPI may rarely hangup

Under some circumstances (notably using thunderbolt interface) SPI
on chip reset may be in active transaction.
Here we forcibly cleanup SPI to prevent possible hangups.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 's390-qeth-fixes'
David S. Miller [Thu, 22 Mar 2018 15:52:31 +0000 (11:52 -0400)]
Merge branch 's390-qeth-fixes'

Julian Wiedmann says:

====================
s390/qeth: fixes 2018-03-20

Please apply one final set of qeth patches for 4.16.
All of these fix long-standing bugs, so please queue them up for -stable
as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agos390/qeth: on channel error, reject further cmd requests
Julian Wiedmann [Tue, 20 Mar 2018 06:59:15 +0000 (07:59 +0100)]
s390/qeth: on channel error, reject further cmd requests

When the IRQ handler determines that one of the cmd IO channels has
failed and schedules recovery, block any further cmd requests from
being submitted. The request would inevitably stall, and prevent the
recovery from making progress until the request times out.

This sort of error was observed after Live Guest Relocation, where
the pending IO on the READ channel intentionally gets terminated to
kick-start recovery. Simultaneously the guest executed SIOCETHTOOL,
triggering qeth to issue a QUERY CARD INFO command. The command
then stalled in the inoperabel WRITE channel.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agos390/qeth: lock read device while queueing next buffer
Julian Wiedmann [Tue, 20 Mar 2018 06:59:14 +0000 (07:59 +0100)]
s390/qeth: lock read device while queueing next buffer

For calling ccw_device_start(), issue_next_read() needs to hold the
device's ccwlock.
This is satisfied for the IRQ handler path (where qeth_irq() gets called
under the ccwlock), but we need explicit locking for the initial call by
the MPC initialization.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agos390/qeth: when thread completes, wake up all waiters
Julian Wiedmann [Tue, 20 Mar 2018 06:59:13 +0000 (07:59 +0100)]
s390/qeth: when thread completes, wake up all waiters

qeth_wait_for_threads() is potentially called by multiple users, make
sure to notify all of them after qeth_clear_thread_running_bit()
adjusted the thread_running_mask. With no timeout, callers would
otherwise stall.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agos390/qeth: free netdevice when removing a card
Julian Wiedmann [Tue, 20 Mar 2018 06:59:12 +0000 (07:59 +0100)]
s390/qeth: free netdevice when removing a card

On removal, a qeth card's netdevice is currently not properly freed
because the call chain looks as follows:

qeth_core_remove_device(card)
lx_remove_device(card)
unregister_netdev(card->dev)
card->dev = NULL !!!
qeth_core_free_card(card)
if (card->dev) !!!
free_netdev(card->dev)

Fix it by free'ing the netdev straight after unregistering. This also
fixes the sysfs-driven layer switch case (qeth_dev_layer2_store()),
where the need to free the current netdevice was not considered at all.

Note that free_netdev() takes care of the netif_napi_del() for us too.

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-phy-Add-general-dummy-stubs-for-MMD-register-access'
David S. Miller [Thu, 22 Mar 2018 15:41:08 +0000 (11:41 -0400)]
Merge branch 'net-phy-Add-general-dummy-stubs-for-MMD-register-access'

Kevin Hao says:

====================
net: phy: Add general dummy stubs for MMD register access

v2:
As suggested by Andrew:
  - Add general dummy stubs
  - Also use that for the micrel phy

This patch series fix the Ethernet broken on the mpc8315erdb board introduced
by commit b6b5e8a69118 ("gianfar: Disable EEE autoneg by default").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: micrel: Use the general dummy stubs for MMD register access
Kevin Hao [Tue, 20 Mar 2018 01:44:54 +0000 (09:44 +0800)]
net: phy: micrel: Use the general dummy stubs for MMD register access

The new general dummy stubs for MMD register access were introduced.
Use that for the codes reuse.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b
Kevin Hao [Tue, 20 Mar 2018 01:44:53 +0000 (09:44 +0800)]
net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b

The Ethernet on mpc8315erdb is broken since commit b6b5e8a69118
("gianfar: Disable EEE autoneg by default"). The reason is that
even though the rtl8211b doesn't support the MMD extended registers
access, it does return some random values if we trying to access
the MMD register via indirect method. This makes it seem that the
EEE is supported by this phy device. And the subsequent writing to
the MMD registers does cause the phy malfunction. So use the dummy
stubs for the MMD register access to fix this issue.

Fixes: b6b5e8a69118 ("gianfar: Disable EEE autoneg by default")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: Add general dummy stubs for MMD register access
Kevin Hao [Tue, 20 Mar 2018 01:44:52 +0000 (09:44 +0800)]
net: phy: Add general dummy stubs for MMD register access

For some phy devices, even though they don't support the MMD extended
register access, it does have some side effect if we are trying to
read/write the MMD registers via indirect method. So introduce general
dummy stubs for MMD register access which these devices can use to avoid
such side effect.

Fixes: b6b5e8a69118 ("gianfar: Disable EEE autoneg by default")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'batadv-net-for-davem-20180319' of git://git.open-mesh.org/linux-merge
David S. Miller [Thu, 22 Mar 2018 15:26:13 +0000 (11:26 -0400)]
Merge tag 'batadv-net-for-davem-20180319' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - fix possible IPv6 packet loss when multicast extension is used, by Linus Luessing

 - fix SKB handling issues for TTVN and DAT, by Matthias Schiffer (two patches)

 - fix include for eventpoll, by Sven Eckelmann

 - fix skb checksum for ttvn reroutes, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetfilter: nf_tables: do not hold reference on netdevice from preparation phase
Pablo Neira Ayuso [Tue, 20 Mar 2018 16:00:19 +0000 (17:00 +0100)]
netfilter: nf_tables: do not hold reference on netdevice from preparation phase

The netfilter netdevice event handler hold the nfnl_lock mutex, this
avoids races with a device going away while such device is being
attached to hooks from the netlink control plane. Therefore, either
control plane bails out with ENOENT or netdevice event path waits until
the hook that is attached to net_device is registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nf_tables: cache device name in flowtable object
Pablo Neira Ayuso [Wed, 21 Mar 2018 12:55:42 +0000 (13:55 +0100)]
netfilter: nf_tables: cache device name in flowtable object

Devices going away have to grab the nfnl_lock from the netdev event path
to avoid races with control plane updates.

However, netlink dumps in netfilter do not hold nfnl_lock mutex. Cache
the device name into the objects to avoid an use-after-free situation
for a device that is going away.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: drop template ct when conntrack is skipped.
Paolo Abeni [Thu, 22 Mar 2018 10:08:50 +0000 (11:08 +0100)]
netfilter: drop template ct when conntrack is skipped.

The ipv4 nf_ct code currently skips the nf_conntrak_in() call
for fragmented packets. As a results later matches/target can end
up manipulating template ct entry instead of 'real' ones.

Exploiting the above, syzbot found a way to trigger the following
splat:

WARNING: CPU: 1 PID: 4242 at net/netfilter/xt_cluster.c:55
xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4242 Comm: syzkaller027971 Not tainted 4.16.0-rc2+ #243
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  panic+0x1e4/0x41c kernel/panic.c:183
  __warn+0x1dc/0x200 kernel/panic.c:547
  report_bug+0x211/0x2d0 lib/bug.c:184
  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
  fixup_bug arch/x86/kernel/traps.c:247 [inline]
  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957
RIP: 0010:xt_cluster_hash net/netfilter/xt_cluster.c:55 [inline]
RIP: 0010:xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127
RSP: 0018:ffff8801d2f6f2d0 EFLAGS: 00010293
RAX: ffff8801af700540 RBX: 0000000000000000 RCX: ffffffff84a2d1e1
RDX: 0000000000000000 RSI: ffff8801d2f6f478 RDI: ffff8801cafd336a
RBP: ffff8801d2f6f2e8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b03b3d18
R13: ffff8801cafd3300 R14: dffffc0000000000 R15: ffff8801d2f6f478
  ipt_do_table+0xa91/0x19b0 net/ipv4/netfilter/ip_tables.c:296
  iptable_filter_hook+0x65/0x80 net/ipv4/netfilter/iptable_filter.c:41
  nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
  nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
  nf_hook include/linux/netfilter.h:243 [inline]
  NF_HOOK include/linux/netfilter.h:286 [inline]
  raw_send_hdrinc.isra.17+0xf39/0x1880 net/ipv4/raw.c:432
  raw_sendmsg+0x14cd/0x26b0 net/ipv4/raw.c:669
  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:639
  SYSC_sendto+0x361/0x5c0 net/socket.c:1748
  SyS_sendto+0x40/0x50 net/socket.c:1716
  do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x441b49
RSP: 002b:00007ffff5ca8b18 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441b49
RDX: 0000000000000030 RSI: 0000000020ff7000 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 000000002066354c R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000403470
R13: 0000000000403500 R14: 0000000000000000 R15: 0000000000000000
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

Instead of adding checks for template ct on every target/match
manipulating skb->_nfct, simply drop the template ct when skipping
nf_conntrack_in().

Fixes: 7b4fdf77a450ec ("netfilter: don't track fragmented packets")
Reported-and-tested-by: syzbot+0346441ae0545cfcea3a@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agodrm: udl: Properly check framebuffer mmap offsets
Greg Kroah-Hartman [Wed, 21 Mar 2018 15:45:53 +0000 (16:45 +0100)]
drm: udl: Properly check framebuffer mmap offsets

The memmap options sent to the udl framebuffer driver were not being
checked for all sets of possible crazy values.  Fix this up by properly
bounding the allowed values.

Reported-by: Eyal Itkin <eyalit@checkpoint.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180321154553.GA18454@kroah.com
3 years agoMerge branch 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Wed, 21 Mar 2018 22:52:21 +0000 (08:52 +1000)]
Merge branch 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

A few more fixes for 4.16.  Mostly for displays:
- A fix for DP handling on radeon
- Fix banding on eDP panels
- Fix HBR audio
- Fix for disabling VGA mode on Raven that leads to a corrupt or
  blank display on some platforms

* 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux:
  drm/amd/display: Add one to EDID's audio channel count when passing to DC
  drm/amd/display: We shouldn't set format_default on plane as atomic driver
  drm/amd/display: Fix FMT truncation programming
  drm/amd/display: Allow truncation to 10 bits
  drm/amd/display: fix dereferencing possible ERR_PTR()
  drm/amd/display: Refine disable VGA
  drm/amdgpu: Use atomic function to disable crtcs with dc enabled
  drm/radeon: Don't turn off DP sink when disconnected

3 years agoMerge branch 'net-sched-action-idr-leak'
David S. Miller [Wed, 21 Mar 2018 22:12:46 +0000 (18:12 -0400)]
Merge branch 'net-sched-action-idr-leak'

Davide Caratti says:

====================
fix idr leak in actions

This series fixes situations where a temporary failure to install a TC
action results in the permanent impossibility to reuse the configured
value of 'index'.

Thanks to Cong Wang for the initial review.

v2: fix build error in act_ipt.c, reported by kbuild test robot
====================

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: fix idr leak in the error path of tcf_skbmod_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:28 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_skbmod_init()

tcf_skbmod_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure skbmod rules
using the same idr value will systematically fail with -ENOSPC, unless
the first attempt was done using the 'replace' keyword:

 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
on the error path when the idr has been reserved, but not yet inserted.
Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
implicitly become a 'delete' that leaks refcount in act_skbmod module:

 # rmmod act_skbmod; modprobe act_skbmod
 # tc action add action skbmod swap mac index 100
 # tc action add action skbmod swap mac continue index 100
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 # tc action replace action skbmod swap mac continue index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action skbmod
 #
 # rmmod  act_skbmod
 rmmod: ERROR: Module act_skbmod is in use

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: fix idr leak in the error path of tcf_vlan_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:27 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_vlan_init()

tcf_vlan_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure vlan rules using
the same idr value will systematically fail with -ENOSPC, unless the first
attempt was done using the 'replace' keyword.

 # tc action add action vlan pop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
the error path when the idr has been reserved, but not yet inserted. Also,
don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly
become a 'delete' that leaks refcount in act_vlan module:

 # rmmod act_vlan; modprobe act_vlan
 # tc action add action vlan push id 5 index 100
 # tc action replace action vlan push id 7 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action vlan
 #
 # rmmod act_vlan
 rmmod: ERROR: Module act_vlan is in use

Fixes: 4c5b9d9642c8 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: fix idr leak in the error path of __tcf_ipt_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:26 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of __tcf_ipt_init()

__tcf_ipt_init() can fail after the idr has been successfully reserved.
When this happens, subsequent attempts to configure xt/ipt rules using
the same idr value systematically fail with -ENOSPC:

 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
         target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
         target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
         target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
to avoid NULL pointer dereference.

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: fix idr leak in the error path of tcp_pedit_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:25 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcp_pedit_init()

tcf_pedit_init() can fail to allocate 'keys' after the idr has been
successfully reserved. When this happens, subsequent attempts to configure
a pedit rule using the same idr value systematically fail with -ENOSPC:

 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_act_pedit_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolibnvdimm, nfit: fix persistence domain reporting
Dan Williams [Wed, 21 Mar 2018 22:12:07 +0000 (15:12 -0700)]
libnvdimm, nfit: fix persistence domain reporting

The persistence domain is a point in the platform where once writes
reach that destination the platform claims it will make them persistent
relative to power loss. In the ACPI NFIT this is currently communicated
as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
Durability on Power Loss Capable".

Commit 96c3a239054a "libnvdimm: expose platform persistence attr..."
shows the persistence domain as flags, but it's really an enumerated
hierarchy.

Fix this newly introduced user ABI to show the closest available
persistence domain before userspace develops dependencies on seeing, or
needing to develop code to tolerate, the raw NFIT flags communicated
through the libnvdimm-generic region attribute.

Fixes: 96c3a239054a ("libnvdimm: expose platform persistence attr...")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
3 years agonet/sched: fix idr leak in the error path of tcf_act_police_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:24 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_act_police_init()

tcf_act_police_init() can fail after the idr has been successfully
reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
subsequent attempts to configure a police rule using the same idr value
systematiclly fail with -ENOSPC:

 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 ...

Fix this in the error path of tcf_act_police_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: fix idr leak in the error path of tcf_simp_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:23 +0000 (15:31 +0100)]
net/sched: fix idr leak in the error path of tcf_simp_init()

if the kernel fails to duplicate 'sdata', creation of a new action fails
with -ENOMEM. However, subsequent attempts to install the same action
using the same value of 'index' systematically fail with -ENOSPC, and
that value of 'index' will no more be usable by act_simple, until rmmod /
insmod of act_simple.ko is done:

 # tc actions add action simple sdata hello index 100
 # tc actions list action simple

        action order 0: Simple <hello>
         index 100 ref 1 bind 0
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup().

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/sched: fix idr leak on the error path of tcf_bpf_init()
Davide Caratti [Mon, 19 Mar 2018 14:31:22 +0000 (15:31 +0100)]
net/sched: fix idr leak on the error path of tcf_bpf_init()

when the following command sequence is entered

 # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: Invalid argument
 We have an error talking to the kernel
 # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

act_bpf correctly refuses to install the first TC rule, because 31 is not
a valid instruction. However, it refuses to install the second TC rule,
even if the BPF code is correct. Furthermore, it's no more possible to
install any other rule having the same value of 'index' until act_bpf
module is unloaded/inserted again. After the idr has been reserved, call
tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>