sfrench/cifs-2.6.git
5 years agoMerge tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Greg Kroah-Hartman [Thu, 4 Oct 2018 20:24:38 +0000 (13:24 -0700)]
Merge tag 'ovl-fixes-4.19-rc7' of git://git./linux/kernel/git/mszeredi/vfs

Miklos writes:
  "overlayfs fixes for 4.19-rc7

   This update fixes a couple of regressions in the stacked file update
   added in this cycle, as well as some older bugs uncovered by
   syzkaller.

   There's also one trivial naming change that touches other parts of
   the fs subsystem."

* tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix format of setxattr debug
  ovl: fix access beyond unterminated strings
  ovl: make symbol 'ovl_aops' static
  vfs: swap names of {do,vfs}_clone_file_range()
  ovl: fix freeze protection bypass in ovl_clone_file_range()
  ovl: fix freeze protection bypass in ovl_write_iter()
  ovl: fix memory leak on unlink of indexed file

5 years agoMerge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Greg Kroah-Hartman [Thu, 4 Oct 2018 16:48:10 +0000 (09:48 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Russell writes:
  "A couple of small ARM fixes from Stefan and Thomas:
   - Adding the io_pgetevents syscall
   - Fixing a bounds check in pci_ioremap_io()"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8799/1: mm: fix pci_ioremap_io() offset check
  ARM: 8787/1: wire up io_pgetevents syscall

5 years agoMerge tag 'drm-fixes-2018-10-04' of git://anongit.freedesktop.org/drm/drm
Greg Kroah-Hartman [Thu, 4 Oct 2018 16:18:44 +0000 (09:18 -0700)]
Merge tag 'drm-fixes-2018-10-04' of git://anongit.freedesktop.org/drm/drm

Dave writes:
  "drm exynos, tda9950 and intel fixes

   3 i915 fixes:
     compressed error handling zlib fix
     compiler warning cleanup
     and a minor code cleanup

   2 tda9950:
     Two fixes for the HDMI CEC

   1 exynos:
     A fix required for IOMMU interaction."

* tag 'drm-fixes-2018-10-04' of git://anongit.freedesktop.org/drm/drm:
  drm/i915: Handle incomplete Z_FINISH for compressed error states
  drm/i915: Avoid compiler warning for maybe unused gu_misc_iir
  drm/i915: Do not redefine the has_csr parameter.
  drm/exynos: Use selected dma_dev default iommu domain instead of a fake one
  drm/i2c: tda9950: set MAX_RETRIES for errors only
  drm/i2c: tda9950: fix timeout counter check

5 years agoMerge tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Greg Kroah-Hartman [Thu, 4 Oct 2018 16:17:38 +0000 (09:17 -0700)]
Merge tag 'xfs-fixes-for-4.19-rc6' of git://git./fs/xfs/xfs-linux

Dave writes:
  "XFS fixes for 4.19-rc6

   Accumlated regression and bug fixes for 4.19-rc6, including:

   o make iomap correctly mark dirty pages for sub-page block sizes
   o fix regression in handling extent-to-btree format conversion errors
   o fix torn log wrap detection for new logs
   o various corrupt inode detection fixes
   o various delalloc state fixes
   o cleanup all the missed transaction cancel cases missed from changes merged
     in 4.19-rc1
   o fix lockdep false positive on transaction allocation
   o fix locking and reference counting on buffer log items"

* tag 'xfs-fixes-for-4.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix error handling in xfs_bmap_extents_to_btree
  iomap: set page dirty after partial delalloc on mkwrite
  xfs: remove invalid log recovery first/last cycle check
  xfs: validate inode di_forkoff
  xfs: skip delalloc COW blocks in xfs_reflink_end_cow
  xfs: don't treat unknown di_flags2 as corruption in scrub
  xfs: remove duplicated include from alloc.c
  xfs: don't bring in extents in xfs_bmap_punch_delalloc_range
  xfs: fix transaction leak in xfs_reflink_allocate_cow()
  xfs: avoid lockdep false positives in xfs_trans_alloc
  xfs: refactor xfs_buf_log_item reference count handling
  xfs: clean up xfs_trans_brelse()
  xfs: don't unlock invalidated buf on aborted tx commit
  xfs: remove last of unnecessary xfs_defer_cancel() callers
  xfs: don't crash the vfs on a garbage inline symlink

5 years agoMerge tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Thu, 4 Oct 2018 16:16:11 +0000 (09:16 -0700)]
Merge tag 'riscv-for-linus-4.19-rc7' of git://git./linux/kernel/git/palmer/riscv-linux

Palmer writes:
  "A Single RISC-V Fix for 4.19-rc7

   This tag contains a single patch that managed to get lost in the
   shuffle, which explains why it's so late.  This single line has been
   floating around in various patch sets for months, and fixes our DMA32
   region."

* tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISCV: Fix end PFN for low memory

5 years agoovl: fix format of setxattr debug
Miklos Szeredi [Thu, 4 Oct 2018 12:49:10 +0000 (14:49 +0200)]
ovl: fix format of setxattr debug

Format has a typo: it was meant to be "%.*s", not "%*s".  But at some point
callers grew nonprintable values as well, so use "%*pE" instead with a
maximized length.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
Cc: <stable@vger.kernel.org> # v4.12
5 years agoovl: fix access beyond unterminated strings
Amir Goldstein [Fri, 28 Sep 2018 18:00:48 +0000 (21:00 +0300)]
ovl: fix access beyond unterminated strings

KASAN detected slab-out-of-bounds access in printk from overlayfs,
because string format used %*s instead of %.*s.

> BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604
> Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811
>
> CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36
...
>  printk+0xa7/0xcf kernel/printk/printk.c:1996
>  ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689

Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin")
Cc: <stable@vger.kernel.org> # v4.13
5 years agoMerge branch 'drm-tda9950-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm into...
Dave Airlie [Thu, 4 Oct 2018 00:28:27 +0000 (10:28 +1000)]
Merge branch 'drm-tda9950-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-fixes

two tda9950 fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Russell King <rmk@armlinux.org.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001162948.GA9508@rmk-PC.armlinux.org.uk
5 years agoMerge tag 'drm-intel-fixes-2018-10-03' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Thu, 4 Oct 2018 00:04:38 +0000 (10:04 +1000)]
Merge tag 'drm-intel-fixes-2018-10-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

There's one fix for our zlib incomlete Z_FINISH on our error state handling,
plus a compilation warning fix and a tiny code clean up.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181003202840.GA23560@intel.com
5 years agoMerge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net
Greg Kroah-Hartman [Wed, 3 Oct 2018 23:09:11 +0000 (16:09 -0700)]
Merge gitolite./pub/scm/linux/kernel/git/davem/net

David writes:
  "Networking fixes:
   1) Prefix length validation in xfrm layer, from Steffen Klassert.

   2) TX status reporting fix in mac80211, from Andrei Otcheretianski.

   3) Fix hangs due to TX_DROP in mac80211, from Bob Copeland.

   4) Fix DMA error regression in b43, from Larry Finger.

   5) Add input validation to xenvif_set_hash_mapping(), from Jan Beulich.

   6) SMMU unmapping fix in hns driver, from Yunsheng Lin.

   7) Bluetooh crash in unpairing on SMP, from Matias Karhumaa.

   8) WoL handling fixes in the phy layer, from Heiner Kallweit.

   9) Fix deadlock in bonding, from Mahesh Bandewar.

   10) Fill ttl inherit infor in vxlan driver, from Hangbin Liu.

   11) Fix TX timeouts during netpoll, from Michael Chan.

   12) RXRPC layer fixes from David Howells.

   13) Another batch of ndo_poll_controller() removals to deal with
       excessive resource consumption during load.  From Eric Dumazet.

   14) Fix a specific TIPC failure secnario, from LUU Duc Canh.

   15) Really disable clocks in r8169 during suspend so that low
       power states can actually be reached.

   16) Fix SYN backlog lockdep issue in tcp and dccp, from Eric Dumazet.

   17) Fix RCU locking in netpoll SKB send, which shows up in bonding,
       from Dave Jones.

   18) Fix TX stalls in r8169, from Heiner Kallweit.

   19) Fix locksup in nfp due to control message storms, from Jakub
       Kicinski.

   20) Various rmnet bug fixes from Subash Abhinov Kasiviswanathan and
       Sean Tranchetti.

   21) Fix use after free in ip_cmsg_recv_dstaddr(), from Eric Dumazet."

* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (122 commits)
  ixgbe: check return value of napi_complete_done()
  sctp: fix fall-through annotation
  r8169: always autoneg on resume
  ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()
  net: qualcomm: rmnet: Fix incorrect allocation flag in receive path
  net: qualcomm: rmnet: Fix incorrect allocation flag in transmit
  net: qualcomm: rmnet: Skip processing loopback packets
  net: systemport: Fix wake-up interrupt race during resume
  rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096
  bonding: fix warning message
  inet: make sure to grab rcu_read_lock before using ireq->ireq_opt
  nfp: avoid soft lockups under control message storm
  declance: Fix continuation with the adapter identification message
  net: fec: fix rare tx timeout
  r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO
  tun: napi flags belong to tfile
  tun: initialize napi_mutex unconditionally
  tun: remove unused parameters
  bond: take rcu lock in netpoll_send_skb_on_dev
  rtnetlink: Fail dump if target netnsid is invalid
  ...

5 years agoixgbe: check return value of napi_complete_done()
Song Liu [Wed, 3 Oct 2018 18:30:35 +0000 (11:30 -0700)]
ixgbe: check return value of napi_complete_done()

The NIC driver should only enable interrupts when napi_complete_done()
returns true. This patch adds the check for ixgbe.

Cc: stable@vger.kernel.org # 4.10+
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'linux-kselftest-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Wed, 3 Oct 2018 18:06:49 +0000 (11:06 -0700)]
Merge tag 'linux-kselftest-4.19-rc7' of git://git./linux/kernel/git/shuah/linux-kselftest

Shuah writes:
  "kselftest fixes for 4.19-rc7

   This fixes update for 4.19-rc7 consists one fix to rseq test to
   prevent it from seg-faulting when compiled with -fpie."

* tag 'linux-kselftest-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  rseq/selftests: fix parametrized test with -fpie

5 years agosctp: fix fall-through annotation
Gustavo A. R. Silva [Wed, 3 Oct 2018 10:45:56 +0000 (12:45 +0200)]
sctp: fix fall-through annotation

Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodrm/i915: Handle incomplete Z_FINISH for compressed error states
Chris Wilson [Wed, 3 Oct 2018 08:24:22 +0000 (09:24 +0100)]
drm/i915: Handle incomplete Z_FINISH for compressed error states

The final call to zlib_deflate(Z_FINISH) may require more output
space to be allocated and so needs to re-invoked. Failure to do so in
the current code leads to incomplete zlib streams (albeit intact due to
the use of Z_SYNC_FLUSH) resulting in the occasional short object
capture.

v2: Check against overrunning our pre-allocated page array
v3: Drop Z_SYNC_FLUSH entirely

Testcase: igt/i915-error-capture.js
Fixes: 0a97015d45ee ("drm/i915: Compress GPU objects in error state")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181003082422.23214-1-chris@chris-wilson.co.uk
(cherry picked from commit 83bc0f5b432f60394466deef16fc753e27371d0b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 years agoMerge tag 'media/v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Greg Kroah-Hartman [Wed, 3 Oct 2018 11:23:46 +0000 (04:23 -0700)]
Merge tag 'media/v4.19-3' of git://git./linux/kernel/git/mchehab/linux-media

Mauro writes:
  "media fixes for v4.19-rc6"

* tag 'media/v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: v4l: event: Prevent freeing event subscriptions while accessed

5 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Greg Kroah-Hartman [Wed, 3 Oct 2018 11:22:30 +0000 (04:22 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Jiri writes:
  "HID fixes:
   - hantick touchpad fix from Anisse Astier
   - device ID addition for Ice Lake mobile from Srinivas Pandruvada
   - touchscreen resume fix for certain i2c-hid driven devices from Hans
     de Goede"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: intel-ish-hid: Enable Ice Lake mobile
  HID: i2c-hid: Remove RESEND_REPORT_DESCR quirk and its handling
  HID: i2c-hid: disable runtime PM operations on hantick touchpad

5 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Greg Kroah-Hartman [Wed, 3 Oct 2018 11:21:23 +0000 (04:21 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs

Al writes:
  "xattrs regression fix from Andreas; sat in -next for quite a while."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sysfs: Do not return POSIX ACL xattrs via listxattr

5 years agomedia: v4l: event: Prevent freeing event subscriptions while accessed
Sakari Ailus [Tue, 11 Sep 2018 09:32:37 +0000 (05:32 -0400)]
media: v4l: event: Prevent freeing event subscriptions while accessed

The event subscriptions are added to the subscribed event list while
holding a spinlock, but that lock is subsequently released while still
accessing the subscription object. This makes it possible to unsubscribe
the event --- and freeing the subscription object's memory --- while
the subscription object is simultaneously accessed.

Prevent this by adding a mutex to serialise the event subscription and
unsubscription. This also gives a guarantee to the callback ops that the
add op has returned before the del op is called.

This change also results in making the elems field less special:
subscriptions are only added to the event list once they are fully
initialised.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: stable@vger.kernel.org # for 4.14 and up
Fixes: c3b5b0241f62 ("V4L/DVB: V4L: Events: Add backend")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
5 years agoMerge tag 'exynos-drm-fixes-for-v4.19-rc7' of git://git.kernel.org/pub/scm/linux...
Dave Airlie [Wed, 3 Oct 2018 06:31:11 +0000 (16:31 +1000)]
Merge tag 'exynos-drm-fixes-for-v4.19-rc7' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes

Use default iommu domain instead of fake one
- This patch makes it to reuse default IOMMU domain instead of
  allocating a fake IOMMU domain, and allows some design changes
  for enhancement of IOMMU framework[1] without breaking Exynos DRM.

[1] https://www.spinics.net/lists/arm-kernel/msg676098.html

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1538360696-23579-1-git-send-email-inki.dae@samsung.com
5 years agor8169: always autoneg on resume
Alex Xu (Hello71) [Sun, 30 Sep 2018 15:06:39 +0000 (11:06 -0400)]
r8169: always autoneg on resume

This affects at least versions 25 and 33, so assume all cards are broken
and just renegotiate by default.

Fixes: 10bc6a6042c9 ("r8169: fix autoneg issue on resume with RTL8168E")
Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: fix use-after-free in ip_cmsg_recv_dstaddr()
Eric Dumazet [Sun, 30 Sep 2018 18:33:39 +0000 (11:33 -0700)]
ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()

Caching ip_hdr(skb) before a call to pskb_may_pull() is buggy,
do not do it.

Fixes: 2efd4fca703a ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5-fixes-2018-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 3 Oct 2018 05:20:24 +0000 (22:20 -0700)]
Merge tag 'mlx5-fixes-2018-10-01' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2018-10-01

This pull request includes some fixes to mlx5 driver,
Please pull and let me know if there's any problem.

For -stable v4.11:
"6e0a4a23c59a ('net/mlx5: E-Switch, Fix out of bound access when setting vport rate')"

For -stable v4.18:
"98d6627c372a ('net/mlx5e: Set vlan masks for all offloaded TC rules')"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'rmnet-fixes'
David S. Miller [Wed, 3 Oct 2018 05:16:00 +0000 (22:16 -0700)]
Merge branch 'rmnet-fixes'

Subash Abhinov Kasiviswanathan says:

====================
net: qualcomm: rmnet: Updates 2018-10-02

This series is a set of small fixes for rmnet driver

Patch 1 is a fix for a scenario reported by syzkaller
Patch 2 & 3 are fixes for incorrect allocation flags
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: qualcomm: rmnet: Fix incorrect allocation flag in receive path
Subash Abhinov Kasiviswanathan [Wed, 3 Oct 2018 00:52:03 +0000 (18:52 -0600)]
net: qualcomm: rmnet: Fix incorrect allocation flag in receive path

The incoming skb needs to be reallocated in case the headroom
is not sufficient to adjust the ethernet header. This allocation
needs to be atomic otherwise it results in this splat

 [<600601bb>] ___might_sleep+0x185/0x1a3
 [<603f6314>] ? _raw_spin_unlock_irqrestore+0x0/0x27
 [<60069bb0>] ? __wake_up_common_lock+0x95/0xd1
 [<600602b0>] __might_sleep+0xd7/0xe2
 [<60065598>] ? enqueue_task_fair+0x112/0x209
 [<600eea13>] __kmalloc_track_caller+0x5d/0x124
 [<600ee9b6>] ? __kmalloc_track_caller+0x0/0x124
 [<602696d5>] __kmalloc_reserve.isra.34+0x30/0x7e
 [<603f629b>] ? _raw_spin_lock_irqsave+0x0/0x3d
 [<6026b744>] pskb_expand_head+0xbf/0x310
 [<6025ca6a>] rmnet_rx_handler+0x7e/0x16b
 [<6025c9ec>] ? rmnet_rx_handler+0x0/0x16b
 [<6027ad0c>] __netif_receive_skb_core+0x301/0x96f
 [<60033c17>] ? set_signals+0x0/0x40
 [<6027bbcb>] __netif_receive_skb+0x24/0x8e

Fixes: 74692caf1b0b ("net: qualcomm: rmnet: Process packets over ethernet")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: qualcomm: rmnet: Fix incorrect allocation flag in transmit
Subash Abhinov Kasiviswanathan [Wed, 3 Oct 2018 00:52:02 +0000 (18:52 -0600)]
net: qualcomm: rmnet: Fix incorrect allocation flag in transmit

The incoming skb needs to be reallocated in case the headroom
is not sufficient to add the MAP header. This allocation needs to
be atomic otherwise it results in the following splat

[32805.801456] BUG: sleeping function called from invalid context
[32805.841141] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[32805.904773] task: ffffffd7c5f62280 task.stack: ffffff80464a8000
[32805.910851] pc : ___might_sleep+0x180/0x188
[32805.915143] lr : ___might_sleep+0x180/0x188
[32806.131520] Call trace:
[32806.134041]  ___might_sleep+0x180/0x188
[32806.137980]  __might_sleep+0x50/0x84
[32806.141653]  __kmalloc_track_caller+0x80/0x3bc
[32806.146215]  __kmalloc_reserve+0x3c/0x88
[32806.150241]  pskb_expand_head+0x74/0x288
[32806.154269]  rmnet_egress_handler+0xb0/0x1d8
[32806.162239]  rmnet_vnd_start_xmit+0xc8/0x13c
[32806.166627]  dev_hard_start_xmit+0x148/0x280
[32806.181181]  sch_direct_xmit+0xa4/0x198
[32806.185125]  __qdisc_run+0x1f8/0x310
[32806.188803]  net_tx_action+0x23c/0x26c
[32806.192655]  __do_softirq+0x220/0x408
[32806.196420]  do_softirq+0x4c/0x70

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: qualcomm: rmnet: Skip processing loopback packets
Sean Tranchetti [Wed, 3 Oct 2018 00:52:01 +0000 (18:52 -0600)]
net: qualcomm: rmnet: Skip processing loopback packets

RMNET RX handler was processing invalid packets that were
originally sent on the real device and were looped back via
dev_loopback_xmit(). This was detected using syzkaller.

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: systemport: Fix wake-up interrupt race during resume
Florian Fainelli [Tue, 2 Oct 2018 23:52:03 +0000 (16:52 -0700)]
net: systemport: Fix wake-up interrupt race during resume

The AON_PM_L2 is normally used to trigger and identify the source of a
wake-up event. Since the RX_SYS clock is no longer turned off, we also
have an interrupt being sent to the SYSTEMPORT INTRL_2_0 controller, and
that interrupt remains active up until the magic packet detector is
disabled which happens much later during the driver resumption.

The race happens if we have a CPU that is entering the SYSTEMPORT
INTRL2_0 handler during resume, and another CPU has managed to clear the
wake-up interrupt during bcm_sysport_resume_from_wol(). In that case, we
have the first CPU stuck in the interrupt handler with an interrupt
cause that has been cleared under its feet, and so we keep returning
IRQ_NONE and we never make any progress.

This was not a problem before because we would always turn off the
RX_SYS clock during WoL, so the SYSTEMPORT INTRL2_0 would also be turned
off as well, thus not latching the interrupt.

The fix is to make sure we do not enable either the MPD or
BRCM_TAG_MATCH interrupts since those are redundant with what the
AON_PM_L2 interrupt controller already processes and they would cause
such a race to occur.

Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER")
Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'wireless-drivers-for-davem-2018-10-01' of git://git.kernel.org/pub/scm...
David S. Miller [Tue, 2 Oct 2018 23:16:59 +0000 (16:16 -0700)]
Merge tag 'wireless-drivers-for-davem-2018-10-01' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.19

First, and also hopefully the last, set of fixes for 4.19. All small
but still important fixes

mt76x0

* fix a bug when a virtual interface is removed multiple times

b43

* fix DMA error related regression with proprietary firmware

iwlwifi

* fix an oops which was a regression in v4.19-rc1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agortnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096
Eric Dumazet [Tue, 2 Oct 2018 22:47:35 +0000 (15:47 -0700)]
rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096

We have an impressive number of syzkaller bugs that are linked
to the fact that syzbot was able to create a networking device
with millions of TX (or RX) queues.

Let's limit the number of RX/TX queues to 4096, this really should
cover all known cases.

A separate patch will add various cond_resched() in the loops
handling sysfs entries at device creation and dismantle.

Tested:

lpaa6:~# ip link add gre-4097 numtxqueues 4097 numrxqueues 4097 type ip6gretap
RTNETLINK answers: Invalid argument

lpaa6:~# time ip link add gre-4096 numtxqueues 4096 numrxqueues 4096 type ip6gretap

real 0m0.180s
user 0m0.000s
sys 0m0.107s

Fixes: 76ff5cc91935 ("rtnl: allow to specify number of rx and tx queues on device creation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobonding: fix warning message
Mahesh Bandewar [Tue, 2 Oct 2018 19:14:34 +0000 (12:14 -0700)]
bonding: fix warning message

RX queue config for bonding master could be different from its slave
device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local
packets to bonding master also."), the packet is reinjected into stack
with skb->dev as bonding master. This potentially triggers the
message:

   "bondX received packet on queue Y, but number of RX queues is Z"

whenever the queue that packet is received on is higher than the
numrxqueues on bonding master (Y > Z).

Fixes: 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also.")
Reported-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: make sure to grab rcu_read_lock before using ireq->ireq_opt
Eric Dumazet [Tue, 2 Oct 2018 19:35:05 +0000 (12:35 -0700)]
inet: make sure to grab rcu_read_lock before using ireq->ireq_opt

Timer handlers do not imply rcu_read_lock(), so my recent fix
triggered a LOCKDEP warning when SYNACK is retransmit.

Lets add rcu_read_lock()/rcu_read_unlock() pairs around ireq->ireq_opt
usages instead of guessing what is done by callers, since it is
not worth the pain.

Get rid of ireq_opt_deref() helper since it hides the logic
without real benefit, since it is now a standard rcu_dereference().

Fixes: 1ad98e9d1bdf ("tcp/dccp: fix lockdep issue when SYN is backlogged")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRISCV: Fix end PFN for low memory
Atish Patra [Tue, 11 Sep 2018 18:30:18 +0000 (11:30 -0700)]
RISCV: Fix end PFN for low memory

Use memblock_end_of_DRAM which provides correct last low memory
PFN. Without that, DMA32 region becomes empty resulting in zero
pages being allocated for DMA32.

This patch is based on earlier patch from palmer which never
merged into 4.19. I just edited the commit text to make more
sense.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
5 years agonfp: avoid soft lockups under control message storm
Jakub Kicinski [Tue, 2 Oct 2018 17:10:14 +0000 (10:10 -0700)]
nfp: avoid soft lockups under control message storm

When FW floods the driver with control messages try to exit the cmsg
processing loop every now and then to avoid soft lockups.  Cmsg
processing is generally very lightweight so 512 seems like a reasonable
budget, which should not be exceeded under normal conditions.

Fixes: 77ece8d5f196 ("nfp: add control vNIC datapath")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodeclance: Fix continuation with the adapter identification message
Maciej W. Rozycki [Tue, 2 Oct 2018 13:23:45 +0000 (14:23 +0100)]
declance: Fix continuation with the adapter identification message

Fix a commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing
continuation lines") regression with the `declance' driver, which caused
the adapter identification message to be split between two lines, e.g.:

declance.c: v0.011 by Linux MIPS DECstation task force
tc6: PMAD-AA
, addr = 08:00:2b:1b:2a:6a, irq = 14
tc6: registered as eth0.

Address that properly, by printing identification with a single call,
making the messages now look like:

declance.c: v0.011 by Linux MIPS DECstation task force
tc6: PMAD-AA, addr = 08:00:2b:1b:2a:6a, irq = 14
tc6: registered as eth0.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fec: fix rare tx timeout
Rickard x Andersson [Tue, 2 Oct 2018 12:49:32 +0000 (14:49 +0200)]
net: fec: fix rare tx timeout

During certain heavy network loads TX could time out
with TX ring dump.
TX is sometimes never restarted after reaching
"tx_stop_threshold" because function "fec_enet_tx_queue"
only tests the first queue.

In addition the TX timeout callback function failed to
recover because it also operated only on the first queue.

Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux
Greg Kroah-Hartman [Tue, 2 Oct 2018 12:19:43 +0000 (05:19 -0700)]
Merge tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux

Bartlomiej writes:
  "fbdev fixes for v4.19-rc7:

   - fix OMAPFB_MEMORY_READ ioctl to not leak kernel memory in omapfb driver
     (Tomi Valkeinen)

   - add missing prepare/unprepare clock operations in pxa168fb driver
     (Lubomir Rintel)

   - add nobgrt option in efifb driver to disable ACPI BGRT logo restore
     (Hans de Goede)

   - fix spelling mistake in fall-through annotation in stifb driver
     (Gustavo A. R. Silva)

   - fix URL for uvesafb repository in the documentation (Adam Jackson)"

* tag 'fbdev-v4.19-rc7' of https://github.com/bzolnier/linux:
  video/fbdev/stifb: Fix spelling mistake in fall-through annotation
  uvesafb: Fix URLs in the documentation
  efifb: BGRT: Add nobgrt option
  fbdev/omapfb: fix omapfb_memory_read infoleak
  pxa168fb: prepare the clock

5 years agoMerge tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Greg Kroah-Hartman [Tue, 2 Oct 2018 12:19:04 +0000 (05:19 -0700)]
Merge tag 'mmc-v4.19-rc4' of git://git./linux/kernel/git/ulfh/mmc

Ulf writes:
  "MMC core:
    - Fixup conversion of debounce time to/from ms/us

   MMC host:
    - sdhi: Fixup whitelisting for Gen3 types"

* tag 'mmc-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: slot-gpio: Fix debounce time to use miliseconds again
  mmc: core: Fix debounce time to use microseconds
  mmc: sdhi: sys_dmac: check for all Gen3 types when whitelisting

5 years agor8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO
Heiner Kallweit [Fri, 28 Sep 2018 21:51:54 +0000 (23:51 +0200)]
r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO

Some of the chip-specific hw_start functions set bit TXCFG_AUTO_FIFO
in register TxConfig. The original patch changed the order of some
calls resulting in these changes being overwritten by
rtl_set_tx_config_registers() in rtl_hw_start(). This eventually
resulted in network stalls especially under high load.

Analyzing the chip-specific hw_start functions all chip version from
34, with the exception of version 39, need this bit set.
This patch moves setting this bit to rtl_set_tx_config_registers().

Fixes: 4fd48c4ac0a0 ("r8169: move common initializations to tp->hw_start")
Reported-by: Ortwin Glück <odi@odi.ch>
Reported-by: David Arendt <admin@prnet.org>
Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Tested-by: Tony Atkinson <tatkinson@linux.com>
Tested-by: David Arendt <admin@prnet.org>
Tested-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'tun-races'
David S. Miller [Tue, 2 Oct 2018 06:27:28 +0000 (23:27 -0700)]
Merge branch 'tun-races'

Eric Dumazet says:

====================
tun: address two syzbot reports

Small changes addressing races discovered by syzbot.

First patch is a cleanup.
Second patch moves a mutex init sooner.
Third patch makes sure each tfile gets its own napi enable flags.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: napi flags belong to tfile
Eric Dumazet [Fri, 28 Sep 2018 21:51:49 +0000 (14:51 -0700)]
tun: napi flags belong to tfile

Since tun->flags might be shared by multiple tfile structures,
it is better to make sure tun_get_user() is using the flags
for the current tfile.

Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint
of what could happen, but we need something stronger to please
syzbot.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715
 tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967
 call_write_iter include/linux/fs.h:1808 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457579
Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579
RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a
RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4
R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff
Modules linked in:

RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: initialize napi_mutex unconditionally
Eric Dumazet [Fri, 28 Sep 2018 21:51:48 +0000 (14:51 -0700)]
tun: initialize napi_mutex unconditionally

This is the first part to fix following syzbot report :

console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=443816db871edd66
link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80
Following patch is fixing the race condition, but it seems safer
to initialize this mutex at tfile creation anyway.

Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: remove unused parameters
Eric Dumazet [Fri, 28 Sep 2018 21:51:47 +0000 (14:51 -0700)]
tun: remove unused parameters

tun_napi_disable() and tun_napi_del() do not need
a pointer to the tun_struct

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobond: take rcu lock in netpoll_send_skb_on_dev
Dave Jones [Fri, 28 Sep 2018 20:26:08 +0000 (16:26 -0400)]
bond: take rcu lock in netpoll_send_skb_on_dev

The bonding driver lacks the rcu lock when it calls down into
netdev_lower_get_next_private_rcu from bond_poll_controller, which
results in a trace like:

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
before we call down into netpoll_poll_dev, so just take the lock there.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agortnetlink: Fail dump if target netnsid is invalid
David Ahern [Fri, 28 Sep 2018 19:28:41 +0000 (12:28 -0700)]
rtnetlink: Fail dump if target netnsid is invalid

Link dumps can return results from a target namespace. If the namespace id
is invalid, then the dump request should fail if get_target_net fails
rather than continuing with a dump of the current namespace.

Fixes: 79e1ad148c844 ("rtnetlink: use netnsid to query interface")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "openvswitch: Fix template leak in error cases."
Flavio Leitner [Fri, 28 Sep 2018 17:55:34 +0000 (14:55 -0300)]
Revert "openvswitch: Fix template leak in error cases."

This reverts commit 90c7afc96cbbd77f44094b5b651261968e97de67.

When the commit was merged, the code used nf_ct_put() to free
the entry, but later on commit 76644232e612 ("openvswitch: Free
tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which
is a more appropriate. Now the original problem is removed.

Then 44d6e2f27328 ("net: Replace NF_CT_ASSERT() with WARN_ON().")
replaced a debug assert with a WARN_ON() which is trigged now.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Tue, 2 Oct 2018 05:40:39 +0000 (22:40 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth

Johan Hedberg says:

====================
pull request: bluetooth 2018-09-27

Here's one more Bluetooth fix for 4.19, fixing the handling of an
attempt to unpair a device while pairing is in progress.

Let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: ignore STATE_MSG on wrong link session
LUU Duc Canh [Wed, 26 Sep 2018 20:28:52 +0000 (22:28 +0200)]
tipc: ignore STATE_MSG on wrong link session

The initial session number when a link is created is based on a random
value, taken from struct tipc_net->random. It is then incremented for
each link reset to avoid mixing protocol messages from different link
sessions.

However, when a bearer is reset all its links are deleted, and will
later be re-created using the same random value as the first time.
This means that if the link never went down between creation and
deletion we will still sometimes have two subsequent sessions with
the same session number. In virtual environments with potentially
long transmission times this has turned out to be a real problem.

We now fix this by randomizing the session number each time a link
is created.

With a session number size of 16 bits this gives a risk of session
collision of 1/64k. To reduce this further, we also introduce a sanity
check on the very first STATE message arriving at a link. If this has
an acknowledge value differing from 0, which is logically impossible,
we ignore the message. The final risk for session collision is hence
reduced to 1/4G, which should be sufficient.

Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: act_ipt: check for underflow in __tcf_ipt_init()
Dan Carpenter [Sat, 22 Sep 2018 13:46:48 +0000 (16:46 +0300)]
net: sched: act_ipt: check for underflow in __tcf_ipt_init()

If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we
return -EINVAL.  But we don't check whether it's smaller than
sizeof(struct xt_entry_target) and that could lead to an out of bounds
read.

Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Tue, 2 Oct 2018 05:29:25 +0000 (22:29 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2018-10-01

1) Validate address prefix lengths in the xfrm selector,
   otherwise we may hit undefined behaviour in the
   address matching functions if the prefix is too
   big for the given address family.

2) Fix skb leak on local message size errors.
   From Thadeu Lima de Souza Cascardo.

3) We currently reset the transport header back to the network
   header after a transport mode transformation is applied. This
   leads to an incorrect transport header when multiple transport
   mode transformations are applied. Reset the transport header
   only after all transformations are already applied to fix this.
   From Sowmini Varadhan.

4) We only support one offloaded xfrm, so reset crypto_done after
   the first transformation in xfrm_input(). Otherwise we may call
   the wrong input method for subsequent transformations.
   From Sowmini Varadhan.

5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
   skb_dst_force does not really force a dst refcount anymore, it might
   clear it instead. xfrm code did not expect this, add a check to not
   dereference skb_dst() if it was cleared by skb_dst_force.

6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds
   read in xfrm_state_find. From Sean Tranchetti.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Greg Kroah-Hartman [Tue, 2 Oct 2018 00:24:20 +0000 (17:24 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Will writes:
  "Late arm64 fixes

   - Fix handling of young contiguous ptes for hugetlb mappings

   - Fix livelock when taking access faults on contiguous hugetlb mappings

   - Tighten up register accesses via KVM SET_ONE_REG ioctl()s"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: KVM: Sanitize PSTATE.M when being set from userspace
  arm64: KVM: Tighten guest core register access from userspace
  arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags
  arm64: hugetlb: Fix handling of young ptes

5 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Greg Kroah-Hartman [Tue, 2 Oct 2018 00:23:27 +0000 (17:23 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Olof writes:
  "ARM: SoC fixes

   A handful of fixes that have been coming in the last couple of weeks:

   - Freescale fixes for on-chip accellerators
   - A DT fix for stm32 to avoid fallback to non-DMA SPI mode
   - Fixes for badly specified interrupts on BCM63xx SoCs
   - Allwinner A64 HDMI was incorrectly specified as fully compatble with R40
   - Drive strength fix for SAMA5D2 NAND pins on one board"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: stm32: update SPI6 dmas property on stm32mp157c
  soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
  soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
  ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
  MAINTAINERS: update the Annapurna Labs maintainer email
  ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT
  ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl

5 years agoMerge tag 'pstore-v4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/kees...
Greg Kroah-Hartman [Tue, 2 Oct 2018 00:22:36 +0000 (17:22 -0700)]
Merge tag 'pstore-v4.19-rc7' of https://git./linux/kernel/git/kees/linux

Kees writes:
  "Pstore fixes for v4.19-rc7

   - Fix failure-path memory leak in ramoops_init (nixiaoming)"

* tag 'pstore-v4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Fix failure-path memory leak in ramoops_init

5 years agotcp/dccp: fix lockdep issue when SYN is backlogged
Eric Dumazet [Mon, 1 Oct 2018 22:02:26 +0000 (15:02 -0700)]
tcp/dccp: fix lockdep issue when SYN is backlogged

In normal SYN processing, packets are handled without listener
lock and in RCU protected ingress path.

But syzkaller is known to be able to trick us and SYN
packets might be processed in process context, after being
queued into socket backlog.

In commit 06f877d613be ("tcp/dccp: fix other lockdep splats
accessing ireq_opt") I made a very stupid fix, that happened
to work mostly because of the regular path being RCU protected.

Really the thing protecting ireq->ireq_opt is RCU read lock,
and the pseudo request refcnt is not relevant.

This patch extends what I did in commit 449809a66c1d ("tcp/dccp:
block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
pair in the paths that might be taken when processing SYN from
socket backlog (thus possibly in process context)

Fixes: 06f877d613be ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Mon, 1 Oct 2018 22:41:01 +0000 (15:41 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) Skip ip_sabotage_in() for packet making into the VRF driver,
   otherwise packets are dropped, from David Ahern.

2) Clang compilation warning uncovering typo in the
   nft_validate_register_store() call from nft_osf, from Stefan Agner.

3) Double sizeof netlink message length calculations in ctnetlink,
   from zhong jiang.

4) Missing rb_erase() on batch full in rbtree garbage collector,
   from Taehee Yoo.

5) Calm down compilation warning in nf_hook(), from Florian Westphal.

6) Missing check for non-null sk in xt_socket before validating
   netns procedence, from Flavio Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Set vlan masks for all offloaded TC rules
Jianbo Liu [Sat, 25 Aug 2018 03:29:58 +0000 (03:29 +0000)]
net/mlx5e: Set vlan masks for all offloaded TC rules

In flow steering, if asked to, the hardware matches on the first ethertype
which is not vlan. It's possible to set a rule as follows, which is meant
to match on untagged packet, but will match on a vlan packet:
    tc filter add dev eth0 parent ffff: protocol ip flower ...

To avoid this for packets with single tag, we set vlan masks to tell
hardware to check the tags for every matched packet.

Fixes: 095b6cfd69ce ('net/mlx5e: Add TC vlan match parsing')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Fix out of bound access when setting vport rate
Eran Ben Elisha [Sun, 16 Sep 2018 11:45:27 +0000 (14:45 +0300)]
net/mlx5: E-Switch, Fix out of bound access when setting vport rate

The code that deals with eswitch vport bw guarantee was going beyond the
eswitch vport array limit, fix that.  This was pointed out by the kernel
address sanitizer (KASAN).

The error from KASAN log:
[2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in
mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core]

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules
Alaa Hleihel [Wed, 5 Sep 2018 08:43:23 +0000 (11:43 +0300)]
net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules

If the peer device was already unbound, then do not attempt to modify
it's resources, otherwise we will crash on dereferencing non-existing
device.

Fixes: 5c65c564c962 ("net/mlx5e: Support offloading TC NIC hairpin flows")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agodrm/i915: Avoid compiler warning for maybe unused gu_misc_iir
Chris Wilson [Wed, 26 Sep 2018 10:47:18 +0000 (11:47 +0100)]
drm/i915: Avoid compiler warning for maybe unused gu_misc_iir

/kisskb/src/drivers/gpu/drm/i915/i915_irq.c: warning: 'gu_misc_iir' may be used uninitialized in this function [-Wuninitialized]:  => 3120:10

Silence the compiler warning by ensuring that the local variable is
initialised and removing the guard that is confusing the older gcc.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: df0d28c185ad ("drm/i915/icl: GSE interrupt moves from DE_MISC to GU_MISC")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926104718.17462-1-chris@chris-wilson.co.uk
(cherry picked from commit 7a90938332d80faf973fbcffdf6e674e7b8f0914)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 years agodrm/i915: Do not redefine the has_csr parameter.
Anusha Srivatsa [Fri, 17 Aug 2018 17:33:30 +0000 (10:33 -0700)]
drm/i915: Do not redefine the has_csr parameter.

Let us reuse the already defined has_csr check and not
redefine it.

The main difference is that in effect this will flip .has_csr to 1
(via GEN9_FEATURES which GEN11_FEATURES pulls in).

Suggested-by: Imre Deak <imre.deak@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107382
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1534527210-16841-1-git-send-email-anusha.srivatsa@intel.com
(cherry picked from commit da4468a1aa75457e6134127b19761b7ba62ce945)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 years agoarm64: KVM: Sanitize PSTATE.M when being set from userspace
Marc Zyngier [Thu, 27 Sep 2018 15:53:22 +0000 (16:53 +0100)]
arm64: KVM: Sanitize PSTATE.M when being set from userspace

Not all execution modes are valid for a guest, and some of them
depend on what the HW actually supports. Let's verify that what
userspace provides is compatible with both the VM settings and
the HW capabilities.

Cc: <stable@vger.kernel.org>
Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
5 years agoarm64: KVM: Tighten guest core register access from userspace
Dave Martin [Thu, 27 Sep 2018 15:53:21 +0000 (16:53 +0100)]
arm64: KVM: Tighten guest core register access from userspace

We currently allow userspace to access the core register file
in about any possible way, including straddling multiple
registers and doing unaligned accesses.

This is not the expected use of the ABI, and nobody is actually
using it that way. Let's tighten it by explicitly checking
the size and alignment for each field of the register file.

Cc: <stable@vger.kernel.org>
Fixes: 2f4a07c5f9fe ("arm64: KVM: guest one-reg interface")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[maz: rewrote Dave's initial patch to be more easily backported]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
5 years agodrm/exynos: Use selected dma_dev default iommu domain instead of a fake one
Marek Szyprowski [Fri, 28 Sep 2018 16:09:23 +0000 (18:09 +0200)]
drm/exynos: Use selected dma_dev default iommu domain instead of a fake one

Instead of allocating a fake IOMMU domain for all Exynos DRM components,
simply reuse the default IOMMU domain of the already selected DMA device.
This allows some design changes in IOMMU framework without breaking IOMMU
support in Exynos DRM.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
5 years agoxfs: fix error handling in xfs_bmap_extents_to_btree
Dave Chinner [Sun, 30 Sep 2018 22:11:07 +0000 (08:11 +1000)]
xfs: fix error handling in xfs_bmap_extents_to_btree

Commit 01239d77b9dd ("xfs: fix a null pointer dereference in
xfs_bmap_extents_to_btree") attempted to fix a null pointer
dreference when a fuzzing corruption of some kind was found.
This fix was flawed, resulting in assert failures like:

XFS: Assertion failed: ifp->if_broot == NULL, file: fs/xfs/libxfs/xfs_bmap.c, line: 715
.....
Call Trace:
  xfs_bmap_extents_to_btree+0x6b9/0x7b0
  __xfs_bunmapi+0xae7/0xf00
  ? xfs_log_reserve+0x1c8/0x290
  xfs_reflink_remap_extent+0x20b/0x620
  xfs_reflink_remap_blocks+0x7e/0x290
  xfs_reflink_remap_range+0x311/0x530
  vfs_dedupe_file_range_one+0xd7/0xe0
  vfs_dedupe_file_range+0x15b/0x1a0
  do_vfs_ioctl+0x267/0x6c0

The problem is that the error handling code now asserts that the
inode fork is not in btree format before the error handling code
undoes the modifications that put the fork back in extent format.
Fix this by moving the assert back to after the xfs_iroot_realloc()
call that returns the fork to extent format, and clean up the jump
labels to be meaningful.

Also, returning ENOSPC when xfs_btree_get_bufl() fails to
instantiate the buffer that was allocated (the actual fix in the
commit mentioned above) is incorrect. This is a fatal error - only
an invalid block address or a filesystem shutdown can result in
failing to get a buffer here.

Hence change this to EFSCORRUPTED so that the higher layer knows
this was a corruption related failure and should not treat it as an
ENOSPC error.  This should result in a shutdown (via cancelling a
dirty transaction) which is necessary as we do not attempt to clean
up the (invalid) block that we have already allocated.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agopstore/ram: Fix failure-path memory leak in ramoops_init
Kees Cook [Fri, 28 Sep 2018 22:17:50 +0000 (15:17 -0700)]
pstore/ram: Fix failure-path memory leak in ramoops_init

As reported by nixiaoming, with some minor clarifications:

1) memory leak in ramoops_register_dummy():
   dummy_data = kzalloc(sizeof(*dummy_data), GFP_KERNEL);
   but no kfree() if platform_device_register_data() fails.

2) memory leak in ramoops_init():
   Missing platform_device_unregister(dummy) and kfree(dummy_data)
   if platform_driver_register(&ramoops_driver) fails.

I've clarified the purpose of ramoops_register_dummy(), and added a
common cleanup routine for all three failure paths to call.

Reported-by: nixiaoming <nixiaoming@huawei.com>
Cc: stable@vger.kernel.org
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
5 years agoLinux 4.19-rc6 v4.19-rc6
Greg Kroah-Hartman [Sun, 30 Sep 2018 14:15:35 +0000 (07:15 -0700)]
Linux 4.19-rc6

5 years agoMerge tag 'auxdisplay-for-greg-v4.19-rc6' of https://github.com/ojeda/linux
Greg Kroah-Hartman [Sun, 30 Sep 2018 13:20:33 +0000 (06:20 -0700)]
Merge tag 'auxdisplay-for-greg-v4.19-rc6' of https://github.com/ojeda/linux

Miguel writes:
  "A trivial fix for auxdisplay

    - MAINTAINERS reference fix for moved file
      Reported by Joe Perches"

* tag 'auxdisplay-for-greg-v4.19-rc6' of https://github.com/ojeda/linux:
  MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c

5 years agoMerge tag 'libnvdimm-fixes2-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sun, 30 Sep 2018 13:19:38 +0000 (06:19 -0700)]
Merge tag 'libnvdimm-fixes2-4.19-rc6' of git://git./linux/kernel/git/nvdimm/nvdimm

Dan writes:
  "filesystem-dax for 4.19-rc6

   Fix a deadlock in the new for 4.19 dax_lock_mapping_entry() routine."

* tag 'libnvdimm-fixes2-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix deadlock in dax_lock_mapping_entry()

5 years agoMAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
Miguel Ojeda [Sun, 30 Sep 2018 11:50:05 +0000 (13:50 +0200)]
MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c

Commit 51c1e9b554c9 ("auxdisplay: Move panel.c to drivers/auxdisplay folder")
moved the file, but the MAINTAINERS reference was not updated.

Link: https://lore.kernel.org/lkml/20180928220131.31075-1-joe@perches.com/
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
5 years agoMerge tag 'for-linus-20180929' of git://git.kernel.dk/linux-block
Greg Kroah-Hartman [Sat, 29 Sep 2018 21:52:14 +0000 (14:52 -0700)]
Merge tag 'for-linus-20180929' of git://git.kernel.dk/linux-block

Jens writes:
  "Block fixes for 4.19-rc6

   A set of fixes that should go into this release. This pull request
   contains:

   - A fix (hopefully) for the persistent grants for xen-blkfront. A
     previous fix from this series wasn't complete, hence reverted, and
     this one should hopefully be it. (Boris Ostrovsky)

   - Fix for an elevator drain warning with SMR devices, which is
     triggered when you switch schedulers (Damien)

   - bcache deadlock fix (Guoju Fang)

   - Fix for the block unplug tracepoint, which has had the
     timer/explicit flag reverted since 4.11 (Ilya)

   - Fix a regression in this series where the blk-mq timeout hook is
     invoked with the RCU read lock held, hence preventing it from
     blocking (Keith)

   - NVMe pull from Christoph, with a single multipath fix (Susobhan Dey)"

* tag 'for-linus-20180929' of git://git.kernel.dk/linux-block:
  xen/blkfront: correct purging of persistent grants
  Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
  blk-mq: I/O and timer unplugs are inverted in blktrace
  bcache: add separate workqueue for journal_write to avoid deadlock
  xen/blkfront: When purging persistent grants, keep them in the buffer
  block: fix deadline elevator drain for zoned block devices
  blk-mq: Allow blocking queue tag iter callbacks
  nvme: properly propagate errors in nvme_mpath_init

5 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sat, 29 Sep 2018 21:34:06 +0000 (14:34 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Thomas writes:
  "A single fix for the AMD memory encryption boot code so it does not
   read random garbage instead of the cached encryption bit when a kexec
   kernel is allocated above the 32bit address limit."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix kexec booting failure in the SEV bit detection code

5 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sat, 29 Sep 2018 21:32:49 +0000 (14:32 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Thomas writes:
  "Three small fixes for clocksource drivers:
   - Proper error handling in the Atmel PIT driver
   - Add CLOCK_SOURCE_SUSPEND_NONSTOP for TI SoCs so suspend works again
   - Fix the next event function for Facebook Backpack-CMM BMC chips so
     usleep(100) doesnt sleep several milliseconds"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-atmel-pit: Properly handle error cases
  clocksource/drivers/fttmr010: Fix set_next_event handler
  clocksource/drivers/ti-32k: Add CLOCK_SOURCE_SUSPEND_NONSTOP flag for non-am43 SoCs

5 years agonetlink: fix typo in nla_parse_nested() comment
Johannes Berg [Wed, 26 Sep 2018 20:19:42 +0000 (22:19 +0200)]
netlink: fix typo in nla_parse_nested() comment

Fix a simple typo: attribuets -> attributes

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: Disable clk during suspend / resume
Hans de Goede [Wed, 26 Sep 2018 20:12:39 +0000 (22:12 +0200)]
r8169: Disable clk during suspend / resume

Disable the clk during suspend to save power. Note that tp->clk may be
NULL, the clk core functions handle this without problems.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Carlo Caione <carlo@endlessm.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqlcnic: fix Tx descriptor corruption on 82xx devices
Shahed Shaikh [Wed, 26 Sep 2018 19:41:10 +0000 (12:41 -0700)]
qlcnic: fix Tx descriptor corruption on 82xx devices

In regular NIC transmission flow, driver always configures MAC using
Tx queue zero descriptor as a part of MAC learning flow.
But with multi Tx queue supported NIC, regular transmission can occur on
any non-zero Tx queue and from that context it uses
Tx queue zero descriptor to configure MAC, at the same time TX queue
zero could be used by another CPU for regular transmission
which could lead to Tx queue zero descriptor corruption and cause FW
abort.

This patch fixes this in such a way that driver always configures
learned MAC address from the same Tx queue which is used for
regular transmission.

Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: fix failover problem
LUU Duc Canh [Wed, 26 Sep 2018 19:00:54 +0000 (21:00 +0200)]
tipc: fix failover problem

We see the following scenario:
1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
   Since there is a second working link, failover procedure is started.
2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
   A on node 2. The node item 1->2 goes to state FAILINGOVER.
3) Linke endpoint A/2 receives the failover, and is supposed to take
   down its parallell link endpoint B/2, while producing a FAILOVER
   message to send back to A/1.
4) However, B/2 has already been deleted, so no FAILOVER message can
   created.
5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
   any messages that can bring B/1 up again. We are left with a non-
   redundant link between node 1 and 2.

We fix this with letting endpoint A/2 build a dummy FAILOVER message
to send to back to A/1, so that the situation can be resolved.

Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Greg Kroah-Hartman [Sat, 29 Sep 2018 18:32:03 +0000 (11:32 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Thomas writes:
  "A single fix for a missing sanity check when a pinned event is tried
  to be read on the wrong CPU due to a legit event scheduling failure."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Add sanity check to deal with pinned event failure

5 years agoMerge branch 'net-usb-Check-for-Wake-on-LAN-modes'
David S. Miller [Sat, 29 Sep 2018 18:31:30 +0000 (11:31 -0700)]
Merge branch 'net-usb-Check-for-Wake-on-LAN-modes'

Florian Fainelli says:

====================
net: usb: Check for Wake-on-LAN modes

Most of our USB Ethernet drivers don't seem to be checking properly
whether the user is supplying a correct Wake-on-LAN mode to enter, so
the experience as an user could be confusing, since it would generally
lead to either no wake-up, or the device not being marked for wake-up.

Please review!

Changes in v2:

- fixed lan78xx handling, thanks Woojung!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosmsc95xx: Check for Wake-on-LAN modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:56 +0000 (16:18 -0700)]
smsc95xx: Check for Wake-on-LAN modes

The driver does not check for Wake-on-LAN modes specified by an user,
but will conditionally set the device as wake-up enabled or not based on
that, which could be a very confusing user experience.

Fixes: e0e474a83c18 ("smsc95xx: add wol magic packet support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosmsc75xx: Check for Wake-on-LAN modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:55 +0000 (16:18 -0700)]
smsc75xx: Check for Wake-on-LAN modes

The driver does not check for Wake-on-LAN modes specified by an user,
but will conditionally set the device as wake-up enabled or not based on
that, which could be a very confusing user experience.

Fixes: 6c636503260d ("smsc75xx: add wol magic packet support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8152: Check for supported Wake-on-LAN Modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:54 +0000 (16:18 -0700)]
r8152: Check for supported Wake-on-LAN Modes

The driver does not check for Wake-on-LAN modes specified by an user,
but will conditionally set the device as wake-up enabled or not based on
that, which could be a very confusing user experience.

Fixes: 21ff2e8976b1 ("r8152: support WOL")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosr9800: Check for supported Wake-on-LAN modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:53 +0000 (16:18 -0700)]
sr9800: Check for supported Wake-on-LAN modes

The driver currently silently accepts unsupported Wake-on-LAN modes
(other than WAKE_PHY or WAKE_MAGIC) without reporting that to the user,
which is confusing.

Fixes: 19a38d8e0aa3 ("USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agolan78xx: Check for supported Wake-on-LAN modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:52 +0000 (16:18 -0700)]
lan78xx: Check for supported Wake-on-LAN modes

The driver supports a fair amount of Wake-on-LAN modes, but is not
checking that the user specified one that is supported.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Woojung Huh <Woojung.Huh@Microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoax88179_178a: Check for supported Wake-on-LAN modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:51 +0000 (16:18 -0700)]
ax88179_178a: Check for supported Wake-on-LAN modes

The driver currently silently accepts unsupported Wake-on-LAN modes
(other than WAKE_PHY or WAKE_MAGIC) without reporting that to the user,
which is confusing.

Fixes: e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoasix: Check for supported Wake-on-LAN modes
Florian Fainelli [Fri, 28 Sep 2018 23:18:50 +0000 (16:18 -0700)]
asix: Check for supported Wake-on-LAN modes

The driver currently silently accepts unsupported Wake-on-LAN modes
(other than WAKE_PHY or WAKE_MAGIC) without reporting that to the user,
which is confusing.

Fixes: 2e55cc7210fe ("[PATCH] USB: usbnet (3/9) module for ASIX Ethernet adapters")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ieee802154-for-davem-2018-09-28' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Sat, 29 Sep 2018 18:28:49 +0000 (11:28 -0700)]
Merge branch 'ieee802154-for-davem-2018-09-28' of git://git./linux/kernel/git/sschmidt/wpan

Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2018-09-28

An update from ieee802154 for your *net* tree.

Some cleanup patches throughout the drivers from the Huawei tag team
Yue Haibing and Zhong Jiang.
Xue is replacing some magic numbers with defines in his mcr20a driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'rxrpc-fixes-20180928' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Sat, 29 Sep 2018 18:28:17 +0000 (11:28 -0700)]
Merge tag 'rxrpc-fixes-20180928' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fixes

Here are some miscellaneous fixes for AF_RXRPC:

 (1) Remove a duplicate variable initialisation.

 (2) Fix one of the checks made when we decide to set up a new incoming
     service call in which a flag is being checked in the wrong field of
     the packet header.  This check is abstracted out into helper
     functions.

 (3) Fix RTT gathering.  The code has been trying to make use of socket
     timestamps, but wasn't actually enabling them.  The code has also been
     recording a transmit time for the outgoing packet for which we're
     going to measure the RTT after sending the message - but we can get
     the incoming packet before we get to that and record a negative RTT.

 (4) Fix the emission of BUSY packets (we are emitting ABORTs instead).

 (5) Improve error checking on incoming packets.

 (6) Try to fix a bug in new service call handling whereby a BUG we should
     never be able to reach somehow got triggered.  Do this by moving much
     of the checking as early as possible and not repeating it later
     (depends on (5) above).

 (7) Fix the sockopts set on a UDP6 socket to include the ones set on a
     UDP4 socket so that we receive UDP4 errors and packet-too-large
     notifications too.

 (8) Fix the distribution of errors so that we do it at the point of
     receiving an error in the UDP callback rather than deferring it
     thereby cutting short any transmissions that would otherwise occur in
     the window.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'pm-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Greg Kroah-Hartman [Sat, 29 Sep 2018 13:50:36 +0000 (06:50 -0700)]
Merge tag 'pm-4.19-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Rafael writes:
  "Power management fix for 4.19-rc6

   Fix incorrect __init and __exit annotations in the Qualcomm
   Kryo cpufreq driver (Nathan Chancellor)."

* tag 'pm-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: qcom-kryo: Fix section annotations

5 years agocpufreq: qcom-kryo: Fix section annotations
Nathan Chancellor [Thu, 20 Sep 2018 00:22:21 +0000 (17:22 -0700)]
cpufreq: qcom-kryo: Fix section annotations

There is currently a warning when building the Kryo cpufreq driver into
the kernel image:

WARNING: vmlinux.o(.text+0x8aa424): Section mismatch in reference from
the function qcom_cpufreq_kryo_probe() to the function
.init.text:qcom_cpufreq_kryo_get_msm_id()
The function qcom_cpufreq_kryo_probe() references
the function __init qcom_cpufreq_kryo_get_msm_id().
This is often because qcom_cpufreq_kryo_probe lacks a __init
annotation or the annotation of qcom_cpufreq_kryo_get_msm_id is wrong.

Remove the '__init' annotation from qcom_cpufreq_kryo_get_msm_id
so that there is no more mismatch warning.

Additionally, Nick noticed that the remove function was marked as
'__init' when it should really be marked as '__exit'.

Fixes: 46e2856b8e18 (cpufreq: Add Kryo CPU scaling driver)
Fixes: 5ad7346b4ae2 (cpufreq: kryo: Add module remove and exit)
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5 years agoMerge tag 'dma-mapping-4.19-3' of git://git.infradead.org/users/hch/dma-mapping
Greg Kroah-Hartman [Sat, 29 Sep 2018 09:52:24 +0000 (02:52 -0700)]
Merge tag 'dma-mapping-4.19-3' of git://git.infradead.org/users/hch/dma-mapping

Christoph writes:
  "dma mapping fix for 4.19-rc6

   fix a missing Kconfig symbol for commits introduced in 4.19-rc"

* tag 'dma-mapping-4.19-3' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration

5 years agoiomap: set page dirty after partial delalloc on mkwrite
Brian Foster [Sat, 29 Sep 2018 03:51:01 +0000 (13:51 +1000)]
iomap: set page dirty after partial delalloc on mkwrite

The iomap page fault mechanism currently dirties the associated page
after the full block range of the page has been allocated. This
leaves the page susceptible to delayed allocations without ever
being set dirty on sub-page block sized filesystems.

For example, consider a page fault on a page with one preexisting
real (non-delalloc) block allocated in the middle of the page. The
first iomap_apply() iteration performs delayed allocation on the
range up to the preexisting block, the next iteration finds the
preexisting block, and the last iteration attempts to perform
delayed allocation on the range after the prexisting block to the
end of the page. If the first allocation succeeds and the final
allocation fails with -ENOSPC, iomap_apply() returns the error and
iomap_page_mkwrite() fails to dirty the page having already
performed partial delayed allocation. This eventually results in the
page being invalidated without ever converting the delayed
allocation to real blocks.

This problem is reliably reproduced by generic/083 on XFS on ppc64
systems (64k page size, 4k block size). It results in leaked
delalloc blocks on inode reclaim, which triggers an assert failure
in xfs_fs_destroy_inode() and filesystem accounting inconsistency.

Move the set_page_dirty() call from iomap_page_mkwrite() to the
actor callback, similar to how the buffer head implementation works.
The actor callback is called iff ->iomap_begin() returns success, so
ensures the page is dirtied as soon as possible after an allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: remove invalid log recovery first/last cycle check
Brian Foster [Sat, 29 Sep 2018 03:50:41 +0000 (13:50 +1000)]
xfs: remove invalid log recovery first/last cycle check

One of the first steps of log recovery is to check for the special
case of a zeroed log. If the first cycle in the log is zero or the
tail portion of the log is zeroed, the head is set to the first
instance of cycle 0. xlog_find_zeroed() includes a sanity check that
enforces that the first cycle in the log must be 1 if the last cycle
is 0. While this is true in most cases, the check is not totally
valid because it doesn't consider the case where the filesystem
crashed after a partial/out of order log buffer completion that
wraps around the end of the physical log.

For example, consider a filesystem that has completed most of the
first cycle of the log, reaches the end of the physical log and
splits the next single log buffer write into two in order to wrap
around the end of the log. If these I/Os are reordered, the second
(wrapped) I/O completes and the first happens to fail, the log is
left in a state where the last cycle of the log is 0 and the first
cycle is 2. This causes the xlog_find_zeroed() sanity check to fail
and prevents the filesystem from mounting. This situation has been
reproduced on particular systems via repeated runs of generic/475.

This is an expected state that log recovery already knows how to
deal with, however. Since the log is still partially zeroed, the
head is detected correctly and points to a valid tail. The
subsequent stale block detection clears blocks beyond the head up to
the tail (within a maximum range), with the express purpose of
clearing such out of order writes. As expected, this removes the out
of order cycle 2 blocks at the physical start of the log.

In other words, the only thing that prevents a clean mount and
recovery of the filesystem in this scenario is the specific (last ==
0 && first != 1) sanity check in xlog_find_zeroed(). Since the log
head/tail are now independently validated via cycle, log record and
CRC checks, this highly specific first cycle check is of dubious
value. Remove it and rely on the higher level validation to
determine whether log content is sane and recoverable.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: validate inode di_forkoff
Eric Sandeen [Sat, 29 Sep 2018 03:50:13 +0000 (13:50 +1000)]
xfs: validate inode di_forkoff

Verify the inode di_forkoff, lifted from xfs_repair's
process_check_inode_forkoff().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: skip delalloc COW blocks in xfs_reflink_end_cow
Christoph Hellwig [Sat, 29 Sep 2018 03:49:58 +0000 (13:49 +1000)]
xfs: skip delalloc COW blocks in xfs_reflink_end_cow

The iomap direct I/O code issues a single ->end_io call for the whole
I/O request, and if some of the extents cowered needed a COW operation
it will call xfs_reflink_end_cow over the whole range.

When we do AIO writes we drop the iolock after doing the initial setup,
but before the I/O completion.  Between dropping the lock and completing
the I/O we can have a racing buffered write create new delalloc COW fork
extents in the region covered by the outstanding direct I/O write, and
thus see delalloc COW fork extents in xfs_reflink_end_cow.  As
concurrent writes are fundamentally racy and no guarantees are given we
can simply skip those.

This can be easily reproduced with xfstests generic/208 in always_cow
mode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: don't treat unknown di_flags2 as corruption in scrub
Eric Sandeen [Sat, 29 Sep 2018 03:49:00 +0000 (13:49 +1000)]
xfs: don't treat unknown di_flags2 as corruption in scrub

xchk_inode_flags2() currently treats any di_flags2 values that the
running kernel doesn't recognize as corruption, and calls
xchk_ino_set_corrupt() if they are set.  However, it's entirely possible
that these flags were set in some newer kernel and are quite valid,
but ignored in this kernel.

(Validators don't care one bit about unknown di_flags2.)

Call xchk_ino_set_warning instead, because this may or may not actually
indicate a problem.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: remove duplicated include from alloc.c
YueHaibing [Sat, 29 Sep 2018 03:48:21 +0000 (13:48 +1000)]
xfs: remove duplicated include from alloc.c

Remove duplicated include xfs_alloc.h

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: don't bring in extents in xfs_bmap_punch_delalloc_range
Christoph Hellwig [Sat, 29 Sep 2018 03:47:46 +0000 (13:47 +1000)]
xfs: don't bring in extents in xfs_bmap_punch_delalloc_range

This function is only used to punch out delayed allocations on I/O
failure, which means we need to have read the extents earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: fix transaction leak in xfs_reflink_allocate_cow()
Dave Chinner [Sat, 29 Sep 2018 03:47:15 +0000 (13:47 +1000)]
xfs: fix transaction leak in xfs_reflink_allocate_cow()

When xfs_reflink_allocate_cow() allocates a transaction, it drops
the ILOCK to perform the operation. This Introduces a race condition
where another thread modifying the file can perform the COW
allocation operation underneath us. This result in the retry loop
finding an allocated block and jumping straight to the conversion
code. It does not, however, cancel the transaction it holds and so
this gets leaked. This results in a lockdep warning:

================================================
WARNING: lock held when returning to user space!
4.18.5 #1 Not tainted
------------------------------------------------
worker/6123 is leaving the kernel with locks still held!
1 lock held by worker/6123:
 #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220

And eventually the filesystem deadlocks because it runs out of log
space that is reserved by the leaked transaction and never gets
released.

The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
gotos - it's no surprise that it has bug where the flow through
several goto jumps then fails to clean up context from a non-obvious
logic path. CLean up the logic flow and make sure every path does
the right thing.

Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
Tested-by: Alexander Y. Fomichev <git.user@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: avoid lockdep false positives in xfs_trans_alloc
Dave Chinner [Sat, 29 Sep 2018 03:46:21 +0000 (13:46 +1000)]
xfs: avoid lockdep false positives in xfs_trans_alloc

We've had a few reports of lockdep tripping over memory reclaim
context vs filesystem freeze "deadlocks". They all have looked
to be false positives on analysis, but it seems that they are
being tripped because we take freeze references before we run
a GFP_KERNEL allocation for the struct xfs_trans.

We can avoid this false positive vector just by re-ordering the
operations in xfs_trans_alloc(). That is. we need allocate the
structure before we take the freeze reference and enter the GFP_NOFS
allocation context that follows the xfs_trans around. This prevents
lockdep from seeing the GFP_KERNEL allocation inside the transaction
context, and that prevents it from triggering the freeze level vs
alloc context vs reclaim warnings.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 years agoxfs: refactor xfs_buf_log_item reference count handling
Brian Foster [Sat, 29 Sep 2018 03:45:26 +0000 (13:45 +1000)]
xfs: refactor xfs_buf_log_item reference count handling

The xfs_buf_log_item structure has a reference counter with slightly
tricky semantics. In the common case, a buffer is logged and
committed in a transaction, committed to the on-disk log (added to
the AIL) and then finally written back and removed from the AIL. The
bli refcount covers two potentially overlapping timeframes:

 1. the bli is held in an active transaction
 2. the bli is pinned by the log

The caveat to this approach is that the reference counter does not
purely dictate the lifetime of the bli. IOW, when a dirty buffer is
physically logged and unpinned, the bli refcount may go to zero as
the log item is inserted into the AIL. Only once the buffer is
written back can the bli finally be freed.

The above semantics means that it is not enough for the various
refcount decrementing contexts to release the bli on decrement to
zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
unpin (->iop_unpin()) must all drop the associated reference and
make additional checks to determine if the current context is
responsible for freeing the item.

For example, if a transaction holds but does not dirty a particular
bli, the commit may drop the refcount to zero. If the bli itself is
clean, it is also not AIL resident and must be freed at this time.
The same is true for xfs_trans_brelse(). If the transaction dirties
a bli and then aborts or an unpin results in an abort due to a log
I/O error, the last reference count holder is expected to explicitly
remove the item from the AIL and release it (since an abort means
filesystem shutdown and metadata writeback will never occur).

This leads to fairly complex checks being replicated in a few
different places. Since ->iop_unlock() and xfs_trans_brelse() are
nearly identical, refactor the logic into a common helper that
implements and documents the semantics in one place. This patch does
not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
5 years agoxfs: clean up xfs_trans_brelse()
Brian Foster [Sat, 29 Sep 2018 03:45:02 +0000 (13:45 +1000)]
xfs: clean up xfs_trans_brelse()

xfs_trans_brelse() is a bit of a historical mess, similar to
xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of
commented out code, inconsistency with regard to stale items, etc.

Clean up xfs_trans_brelse() to use similar logic and flow as
xfs_buf_item_unlock() with regard to bli reference count handling.
This patch makes no functional changes, but facilitates further
refactoring of the common bli reference count handling code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>