5 years agolibceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS
Ilya Dryomov [Mon, 26 Jun 2017 10:05:55 +0000 (12:05 +0200)]
libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS

All four SERVER_LUMINOUS feature bits are implemented, switch it on!

NEW_OSDOP_ENCODING doesn't mean much for the client (it signifies
support for MOSDOp v6) but needs to be enabled in order to get the
latest (currently v25) pg_pool_t.

Signed-off-by: Ilya Dryomov <>
Acked-by: Sage Weil <>
5 years agolibceph: osd_state is 32 bits wide in luminous
Ilya Dryomov [Thu, 22 Jun 2017 17:44:06 +0000 (19:44 +0200)]
libceph: osd_state is 32 bits wide in luminous

Signed-off-by: Ilya Dryomov <>
5 years agocrush: remove an obsolete comment
Ilya Dryomov [Thu, 22 Jun 2017 17:44:05 +0000 (19:44 +0200)]
crush: remove an obsolete comment

Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50.

Signed-off-by: Ilya Dryomov <>
5 years agocrush: crush_init_workspace starts with struct crush_work
Ilya Dryomov [Thu, 22 Jun 2017 17:44:05 +0000 (19:44 +0200)]
crush: crush_init_workspace starts with struct crush_work

It is not just a pointer to crush_work, it is the whole structure.
That is not a problem since it only contains a pointer. But it will
be a problem if new data members are added to crush_work.

Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph, crush: per-pool crush_choose_arg_map for crush_do_rule()
Ilya Dryomov [Thu, 22 Jun 2017 17:44:05 +0000 (19:44 +0200)]
libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()

If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,

Signed-off-by: Ilya Dryomov <>
5 years agocrush: implement weight and id overrides for straw2
Ilya Dryomov [Thu, 22 Jun 2017 17:44:05 +0000 (19:44 +0200)]
crush: implement weight and id overrides for straw2

bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: apply_upmap()
Ilya Dryomov [Wed, 21 Jun 2017 15:27:18 +0000 (17:27 +0200)]
libceph: apply_upmap()

Previously, pg_to_raw_osds() didn't filter for existent OSDs because
raw_to_up_osds() would filter for "up" ("up" is predicated on "exists")
and raw_to_up_osds() was called directly after pg_to_raw_osds().  Now,
with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds()
output can affect apply_upmap().  Introduce remove_nonexistent_osds()
to deal with that.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: compute actual pgid in ceph_pg_to_up_acting_osds()
Ilya Dryomov [Wed, 21 Jun 2017 15:27:18 +0000 (17:27 +0200)]
libceph: compute actual pgid in ceph_pg_to_up_acting_osds()

Move raw_pg_to_pg() call out of get_temp_osds() and into
ceph_pg_to_up_acting_osds(), for upcoming apply_upmap().

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: pg_upmap[_items] infrastructure
Ilya Dryomov [Wed, 21 Jun 2017 15:27:18 +0000 (17:27 +0200)]
libceph: pg_upmap[_items] infrastructure

pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap.  (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)

Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: ceph_decode_skip_* helpers
Ilya Dryomov [Wed, 21 Jun 2017 15:27:17 +0000 (17:27 +0200)]
libceph: ceph_decode_skip_* helpers

Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.

Replace skip_name_map() with ceph_decode_skip_map as an example.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: kill __{insert,lookup,remove}_pg_mapping()
Ilya Dryomov [Wed, 21 Jun 2017 15:27:17 +0000 (17:27 +0200)]
libceph: kill __{insert,lookup,remove}_pg_mapping()

Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping().

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: introduce and switch to decode_pg_mapping()
Ilya Dryomov [Wed, 21 Jun 2017 15:27:17 +0000 (17:27 +0200)]
libceph: introduce and switch to decode_pg_mapping()

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: don't pass pgid by value
Ilya Dryomov [Wed, 21 Jun 2017 15:27:17 +0000 (17:27 +0200)]
libceph: don't pass pgid by value

Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping
counterparts: take const struct ceph_pg *.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: respect RADOS_BACKOFF backoffs
Ilya Dryomov [Mon, 19 Jun 2017 10:18:05 +0000 (12:18 +0200)]
libceph: respect RADOS_BACKOFF backoffs

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: make DEFINE_RB_* helpers more general
Ilya Dryomov [Mon, 19 Jun 2017 10:18:05 +0000 (12:18 +0200)]
libceph: make DEFINE_RB_* helpers more general

Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id,
compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare()

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: avoid unnecessary pi lookups in calc_target()
Ilya Dryomov [Thu, 15 Jun 2017 14:30:56 +0000 (16:30 +0200)]
libceph: avoid unnecessary pi lookups in calc_target()

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: use target pi for calc_target() calculations
Ilya Dryomov [Thu, 15 Jun 2017 14:30:55 +0000 (16:30 +0200)]
libceph: use target pi for calc_target() calculations

For luminous and beyond we are encoding the actual spgid, which
requires operating with the correct pg_num, i.e. that of the target

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: always populate t->target_{oid,oloc} in calc_target()
Ilya Dryomov [Thu, 15 Jun 2017 14:30:55 +0000 (16:30 +0200)]
libceph: always populate t->target_{oid,oloc} in calc_target()

need_check_tiering logic doesn't make a whole lot of sense.  Drop it
and apply tiering unconditionally on every calc_target() call instead.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: make sure need_resend targets reflect latest map
Ilya Dryomov [Thu, 15 Jun 2017 14:30:55 +0000 (16:30 +0200)]
libceph: make sure need_resend targets reflect latest map

Otherwise we may miss events like PG splits, pool deletions, etc when
we get multiple incremental maps at once.  Because check_pool_dne() can
now be fed an unlinked request, finish_request() needed to be taught to
handle unlinked requests.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: delete from need_resend_linger before check_linger_pool_dne()
Ilya Dryomov [Thu, 15 Jun 2017 14:30:55 +0000 (16:30 +0200)]
libceph: delete from need_resend_linger before check_linger_pool_dne()

When processing a map update consisting of multiple incrementals, we
may end up running check_linger_pool_dne() on a lingering request that
was previously added to need_resend_linger list.  If it is concluded
that the target pool doesn't exist, the request is killed off while
still on need_resend_linger list, which leads to a crash on a NULL
lreq->osd in kick_requests():

    libceph: linger_id 18446462598732840961 pool does not exist
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: ceph_osdc_handle_map+0x4ae/0x870

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: resend on PG splits if OSD has RESEND_ON_SPLIT
Ilya Dryomov [Thu, 15 Jun 2017 14:30:54 +0000 (16:30 +0200)]
libceph: resend on PG splits if OSD has RESEND_ON_SPLIT

Note that ceph_osd_request_target fields are updated regardless of

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: drop need_resend from calc_target()
Ilya Dryomov [Thu, 15 Jun 2017 14:30:54 +0000 (16:30 +0200)]
libceph: drop need_resend from calc_target()

Replace it with more fine-grained bools to separate updating
ceph_osd_request_target fields and the decision to resend.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: MOSDOp v8 encoding (actual spgid + full hash)
Ilya Dryomov [Thu, 15 Jun 2017 14:30:54 +0000 (16:30 +0200)]
libceph: MOSDOp v8 encoding (actual spgid + full hash)

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: ceph_connection_operations::reencode_message() method
Ilya Dryomov [Thu, 15 Jun 2017 14:30:54 +0000 (16:30 +0200)]
libceph: ceph_connection_operations::reencode_message() method

Give upper layers a chance to reencode the message after the connection
is negotiated and ->peer_features is set.  OSD client will use this to
support both luminous and pre-luminous OSDs (in a single cluster): the
former need MOSDOp v8; the latter will continue to be sent MOSDOp v4.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: encode_{pgid,oloc}() helpers
Ilya Dryomov [Thu, 15 Jun 2017 14:30:53 +0000 (16:30 +0200)]
libceph: encode_{pgid,oloc}() helpers

Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: introduce ceph_spg, ceph_pg_to_primary_shard()
Ilya Dryomov [Thu, 15 Jun 2017 14:30:53 +0000 (16:30 +0200)]
libceph: introduce ceph_spg, ceph_pg_to_primary_shard()

Store both raw pgid and actual spgid in ceph_osd_request_target.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: new pi->last_force_request_resend
Ilya Dryomov [Mon, 5 Jun 2017 12:45:00 +0000 (14:45 +0200)]
libceph: new pi->last_force_request_resend

The old (v15) pi->last_force_request_resend has been repurposed to
make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do
obey pi->last_force_request_resend resend on splits.  See ceph.git
commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend
ops on split").

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: fold [l]req->last_force_resend into ceph_osd_request_target
Ilya Dryomov [Mon, 5 Jun 2017 12:45:00 +0000 (14:45 +0200)]
libceph: fold [l]req->last_force_resend into ceph_osd_request_target

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: support SERVER_JEWEL feature bits
Ilya Dryomov [Mon, 5 Jun 2017 12:45:00 +0000 (14:45 +0200)]
libceph: support SERVER_JEWEL feature bits


Signed-off-by: Ilya Dryomov <>
5 years agolibceph: advertise support for OSD_POOLRESEND
Ilya Dryomov [Mon, 5 Jun 2017 12:45:00 +0000 (14:45 +0200)]
libceph: advertise support for OSD_POOLRESEND

The code has been in place since commit 63244fa123a7 ("libceph:
introduce ceph_osd_request_target, calc_target()"), and, with the
ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now
in working order.

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: handle non-empty dest in ceph_{oloc,oid}_copy()
Ilya Dryomov [Mon, 5 Jun 2017 12:44:59 +0000 (14:44 +0200)]
libceph: handle non-empty dest in ceph_{oloc,oid}_copy()

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: new features macros
Ilya Dryomov [Mon, 5 Jun 2017 12:44:59 +0000 (14:44 +0200)]
libceph: new features macros

Signed-off-by: Ilya Dryomov <>
5 years agolibceph: remove ceph_sanitize_features() workaround
Ilya Dryomov [Mon, 5 Jun 2017 12:44:59 +0000 (14:44 +0200)]
libceph: remove ceph_sanitize_features() workaround

Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a.

Signed-off-by: Ilya Dryomov <>
5 years agoceph: update ceph_dentry_info::lease_session when necessary
Yan, Zheng [Mon, 3 Jul 2017 01:09:10 +0000 (09:09 +0800)]
ceph: update ceph_dentry_info::lease_session when necessary

Current code does not update ceph_dentry_info::lease_session once
it is set. If auth mds of corresponding dentry changes, dentry lease
keeps in an invalid state.

Signed-off-by: "Yan, Zheng" <>
Reviewed-by: Jeff Layton <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: new mount option that specifies fscache uniquifier
Yan, Zheng [Tue, 27 Jun 2017 03:57:56 +0000 (11:57 +0800)]
ceph: new mount option that specifies fscache uniquifier

Current ceph uses FSID as primary index key of fscache data. This
allows ceph to retain cached data across remount. But this causes
problem (kernel opps, fscache does not support sharing data) when
a filesystem get mounted several times (with fscache enabled, with
different mount options).

The fix is adding a new mount option, which specifies uniquifier
for fscache.

Signed-off-by: "Yan, Zheng" <>
Acked-by: Jeff Layton <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: avoid accessing freeing inode in ceph_check_delayed_caps()
Yan, Zheng [Tue, 27 Jun 2017 09:17:24 +0000 (17:17 +0800)]
ceph: avoid accessing freeing inode in ceph_check_delayed_caps()

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: avoid invalid memory dereference in the middle of umount
Yan, Zheng [Thu, 22 Jun 2017 08:26:34 +0000 (16:26 +0800)]
ceph: avoid invalid memory dereference in the middle of umount

extra_mon_dispatch() and debugfs' foo_show functions dereference
fsc->mdsc. we should clean up fsc->client->extra_mon_dispatch
and debugfs before destroying fsc->mds.

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: getattr before read on ceph.* xattrs
Yan, Zheng [Wed, 14 Jun 2017 07:54:56 +0000 (15:54 +0800)]
ceph: getattr before read on ceph.* xattrs

Previously we were returning values for quota, layout
xattrs without any kind of update -- the user just got
whatever happened to be in our cache.

Clearly this extra round trip has a cost, but reads of
these xattrs are fairly rare, happening on admin
intervention rather than in normal operation.

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: don't re-send interrupted flock request
Yan, Zheng [Mon, 5 Jun 2017 03:07:28 +0000 (11:07 +0800)]
ceph: don't re-send interrupted flock request

Don't re-send interrupted flock request in cases of mds failover
and receiving request forward. Because corresponding 'lock intr'
request may have been finished, it won't get re-sent.

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: cleanup writepage_nounlock()
Yan, Zheng [Tue, 23 May 2017 09:48:28 +0000 (17:48 +0800)]
ceph: cleanup writepage_nounlock()

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: redirty page when writepage_nounlock() skips unwritable page
Yan, Zheng [Tue, 23 May 2017 09:18:53 +0000 (17:18 +0800)]
ceph: redirty page when writepage_nounlock() skips unwritable page

Ceph needs to flush dirty page in the order in which in which snap
context they belong to. Dirty pages belong to older snap context
should be flushed earlier. if writepage_nounlock() can not flush a
page, it should redirty the page.

Reported-by: Dan Carpenter <>
Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: remove useless page->mapping check in writepage_nounlock()
Yan, Zheng [Tue, 23 May 2017 09:03:12 +0000 (17:03 +0800)]
ceph: remove useless page->mapping check in writepage_nounlock()

Callers of writepage_nounlock() have already ensured non-null

Reported-by: Dan Carpenter <>
Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: update the 'approaching max_size' code
Yan, Zheng [Mon, 22 May 2017 04:03:32 +0000 (12:03 +0800)]
ceph: update the 'approaching max_size' code

The old 'approaching max_size' code expects MDS set max_size to
'2 * reported_size'. This is no longer true. The new code reports
file size when half of previous max_size increment has been used.

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoceph: re-request max size after importing caps
Yan, Zheng [Tue, 16 May 2017 00:55:34 +0000 (08:55 +0800)]
ceph: re-request max size after importing caps

The 'wanted max size' could be sent to inode's old auth mds, re-send
it to inode's new auth mds if necessary. Otherwise write syscall may

Signed-off-by: "Yan, Zheng" <>
Signed-off-by: Ilya Dryomov <>
5 years agoLinux 4.12 v4.12
Linus Torvalds [Sun, 2 Jul 2017 23:07:02 +0000 (16:07 -0700)]
Linux 4.12

5 years agomoduleparam: fix doc: hwparam_irq configures an IRQ
Sylvain 'ythier' Hitier [Sun, 2 Jul 2017 13:21:56 +0000 (15:21 +0200)]
moduleparam: fix doc: hwparam_irq configures an IRQ

Signed-off-by: Sylvain 'ythier' Hitier <>
Signed-off-by: Linus Torvalds <>
5 years agoMerge branch 'upstream' of git://
Linus Torvalds [Sun, 2 Jul 2017 18:53:44 +0000 (11:53 -0700)]
Merge branch 'upstream' of git://

Pull MIPS fixes from Ralf Baechle:
 "Here's a final round of fixes for 4.12:

   - Fix misordered instructions in assembly code making kenel startup
     via UHB unreliable.

   - Fix special case of MADDF and MADDF emulation.

   - Fix alignment issue in address calculation in pm-cps on 64 bit.

   - Fix IRQ tracing & lockdep when rescheduling

   - Systems with MAARs require post-DMA cache flushes.

  The reordering fix and the MADDF/MSUBF fix have sat in linux-next for
  a number of days. The others haven't propagated from my pull tree to
  linux-next yet but all have survived manual testing and Imagination's
  automated test system and there are no pending bug reports"

* 'upstream' of git://
  MIPS: Avoid accidental raw backtrace
  MIPS: Perform post-DMA cache flushes on systems with MAARs
  MIPS: Fix IRQ tracing & lockdep when rescheduling
  MIPS: pm-cps: Drop manual cache-line alignment of ready_count
  MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
  MIPS: head: Reorder instructions missing a delay slot

5 years agoMerge branch 'fixes' of git://
Linus Torvalds [Sun, 2 Jul 2017 17:09:40 +0000 (10:09 -0700)]
Merge branch 'fixes' of git://

Pull ARM fix from Russell King:
 "One final fix for 4.12 - Doug found a boot failure case triggered by
  requesting a non-even MB vmalloc size"

* 'fixes' of git://
  ARM: 8685/1: ensure memblock-limit is pmd-aligned

5 years agoMerge branch 'x86-urgent-for-linus' of git://
Linus Torvalds [Sat, 1 Jul 2017 16:10:17 +0000 (09:10 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Fixlets for x86:

   - Prevent kexec crash when KASLR is enabled, which was caused by an
     address calculation bug

   - Restore the freeing of PUDs on memory hot remove

   - Correct a negated pointer check in the intel uncore performance
     monitoring driver

   - Plug a memory leak in an error exit path in the RDT code"

* 'x86-urgent-for-linus' of git://
  x86/intel_rdt: Fix memory leak on mount failure
  x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
  x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization
  perf/x86/intel/uncore: Fix wrong box pointer check
  x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD

5 years agoMerge branch 'perf-urgent-for-linus' of git://
Linus Torvalds [Sat, 1 Jul 2017 15:46:52 +0000 (08:46 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
 "The last fix for perf for this cycles:

   - Prevent a segfault when kernel.kptr_restrict=2 is set by avoiding a
     null pointer dereference"

* 'perf-urgent-for-linus' of git://
  perf machine: Fix segfault for kernel.kptr_restrict=2

5 years agoMerge tag 'pinctrl-v4.12-4' of git://
Linus Torvalds [Sat, 1 Jul 2017 15:39:13 +0000 (08:39 -0700)]
Merge tag 'pinctrl-v4.12-4' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fix from Linus Walleij:
 "Brian noticed that this regression has not got a proper fix for the
  entire merge window and consequently we need to revert the offending

  It's part of the RT-mainstream work, the dance goes like this, two
  steps forward, one step back.


   - A last fix for v4.12, an IRQ problem reported early in the merge
     window appears not to have been properly fixed, so the offending
     commit will be reverted and we will find the proper fix for v4.13.

* tag 'pinctrl-v4.12-4' of git://
  Revert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"

5 years agoMerge tag 'gpio-v4.12-4' of git://
Linus Torvalds [Sat, 1 Jul 2017 15:24:54 +0000 (08:24 -0700)]
Merge tag 'gpio-v4.12-4' of git://git./linux/kernel/git/linusw/linux-gpio

Pull last minute fixes for GPIO from Linus Walleij:

 - Fix another ACPI problem with broken BIOSes.

 - Filter out the right GPIO events, making a very user-visible bug go

* tag 'gpio-v4.12-4' of git://
  gpio: acpi: Skip _AEI entries without a handler rather then aborting the scan
  gpiolib: fix filtering out unwanted events

5 years agoMerge tag 'trace-v4.12-rc5' of git://
Linus Torvalds [Sat, 1 Jul 2017 00:18:57 +0000 (17:18 -0700)]
Merge tag 'trace-v4.12-rc5' of git://git./linux/kernel/git/rostedt/linux-trace

Pull last-minute tracing fixes from Steven Rostedt:
 "Two fixes:

  One is for a crash when using the :mod: trace probe command into
  stack_trace_filter. This bug was introduced during the last merge

  The other was there forever. It's a small bug that makes it impossible
  to name a module function for kprobes when the module starts with a

* tag 'trace-v4.12-rc5' of git://
  tracing/kprobes: Allow to create probe with a module name starting with a digit
  ftrace: Fix regression with module command in stack_trace_filter

5 years agouapi/linux/a.out.h: don't use deprecated system-specific predefines.
Zack Weinberg [Wed, 14 Jun 2017 15:14:28 +0000 (08:14 -0700)]
uapi/linux/a.out.h: don't use deprecated system-specific predefines.

uapi/linux/a.out.h uses a number of predefined macros that are
deprecated because they're in the application namespace
(e.g. '#ifdef linux' instead of '#ifdef __linux__').
This patch either corrects or just removes them if they are not
applicable to Linux.

The primary reason this is worth bothering to fix, considering how
obsolete a.out binary support is, is that the GCC build process
considers this such a severe error that it will copy the header into a
private directory and change the macro names, which causes future
updates to the header to be masked.  This header probably doesn't get
updated very often anymore, but it is the _only_ uapi header that gets
this treatment, so IMHO it is worth patching just to drive that number
all the way to zero.

Signed-off-by: Zack Weinberg <>
[hch: removed dead conditionals]
Signed-off-by: Christoph Hellwig <>
Signed-off-by: Linus Torvalds <>
5 years agohashtable: remove repeated phrase from a comment
Jakub Kicinski [Thu, 29 Jun 2017 04:25:48 +0000 (21:25 -0700)]
hashtable: remove repeated phrase from a comment

"in a rcu enabled hashtable" is repeated twice in a comment.

Signed-off-by: Jakub Kicinski <>
Signed-off-by: Linus Torvalds <>
5 years agox86/intel_rdt: Fix memory leak on mount failure
Vikas Shivappa [Mon, 26 Jun 2017 18:55:49 +0000 (11:55 -0700)]
x86/intel_rdt: Fix memory leak on mount failure

If mount fails, the kn_info directory is not freed causing memory leak.

Add the missing error handling path.

Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system")
Signed-off-by: Vikas Shivappa <>
Signed-off-by: Thomas Gleixner <>
5 years agoMerge tag 'powerpc-4.12-8' of git://
Linus Torvalds [Fri, 30 Jun 2017 17:55:34 +0000 (10:55 -0700)]
Merge tag 'powerpc-4.12-8' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Hopefully the last two powerpc fixes for 4.12.

  The CXL one is larger than I'd usually send at rc7, but it fixes new
  code this cycle, so better to have it working for the release. It was
  actually sent a few weeks back but got blocked in testing behind
  another fix that was causing issues.

  We are still tracking one crash in v4.12-rc7, but only one person has
  reproduced it and the commit identified by bisect doesn't touch any of
  the relevant code, so I think it's 50/50 whether that commit is
  actually the problem or it's some code layout / toolchain issue.

  Two fixes for code we merged this cycle:

   - cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

   - Avoid miscompilation w/GCC 4.6.3 on 32-bit - don't inline

  Thanks to Al Viro, Larry Finger, Christophe Lombard"

* tag 'powerpc-4.12-8' of git://
  powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()
  cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

5 years agoMerge tag 'iommu-fixes-v4.12-rc7' of git://
Linus Torvalds [Fri, 30 Jun 2017 17:37:48 +0000 (10:37 -0700)]
Merge tag 'iommu-fixes-v4.12-rc7' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "Two fixes:

   - A fix for AMD IOMMU interrupt remapping code when IRQs are
     forwarded directly to KVM guests

   - Fixed check in the recently merged code to allow tboot with
     Intel VT-d disabled"

* tag 'iommu-fixes-v4.12-rc7' of git://
  iommu/amd: Fix interrupt remapping when disable guest_mode
  iommu/vt-d: Correctly disable Intel IOMMU force on

5 years agoMerge tag 'sound-4.12' of git://
Linus Torvalds [Fri, 30 Jun 2017 17:30:26 +0000 (10:30 -0700)]
Merge tag 'sound-4.12' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Two last-minute HD-audio fixes"

* tag 'sound-4.12' of git://
  ALSA: hda - Fix endless loop of codec configure
  ALSA: hda - set input_path bitmap to zero after moving it to new place

5 years agoMerge branch 'overlayfs-linus' of git://
Linus Torvalds [Fri, 30 Jun 2017 17:22:59 +0000 (10:22 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix two bugs in copy-up code. One introduced in 4.11 and one in

* 'overlayfs-linus' of git://
  ovl: don't set origin on broken lower hardlink
  ovl: copy-up: don't unlock between lookup and link

5 years agox86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
Baoquan He [Tue, 27 Jun 2017 12:39:06 +0000 (20:39 +0800)]
x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug

Kernel text KASLR is separated into physical address and virtual
address randomization. And for virtual address randomization, we
only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE.
So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
but not the original kernel loading address 'output'.

The bug will cause kernel boot failure if kernel is loaded at a different
position than the address, 16M, which is decided at compiled time.
Kexec/kdump is such practical case.

To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial

Tested-by: Dave Young <>
Signed-off-by: Baoquan He <>
Cc: Linus Torvalds <>
Cc: Peter Zijlstra <>
Cc: Thomas Gleixner <>
Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately")
Signed-off-by: Ingo Molnar <>
5 years agox86/boot/KASLR: Add checking for the offset of kernel virtual address randomization
Baoquan He [Tue, 27 Jun 2017 12:39:05 +0000 (20:39 +0800)]
x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization

For kernel text KASLR, the virtual address is confined to area of 1G,
[0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of
virtual address randomization, we only randomize to get an offset
between 16M and 1G, then add this offset to the starting address,
0xffffffff80000000. Here 16M is the offset which is decided at linking
stage. So the amount of the local variable 'virt_addr' which respresents
the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE.

Add a debug check for the offset. If out of bounds, print error
message and hang there.

Suggested-by: Ingo Molnar <>
Signed-off-by: Baoquan He <>
Cc: Linus Torvalds <>
Cc: Peter Zijlstra <>
Cc: Thomas Gleixner <>
Signed-off-by: Ingo Molnar <>
5 years agotracing/kprobes: Allow to create probe with a module name starting with a digit
Sabrina Dubroca [Thu, 22 Jun 2017 09:24:42 +0000 (11:24 +0200)]
tracing/kprobes: Allow to create probe with a module name starting with a digit

Always try to parse an address, since kstrtoul() will safely fail when
given a symbol as input. If that fails (which will be the case for a
symbol), try to parse a symbol instead.

This allows creating a probe such as:

    p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0

Which is necessary for this command to work:

    perf probe -m 8021q -a vlan_gro_receive

Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer")
Acked-by: Masami Hiramatsu <>
Signed-off-by: Sabrina Dubroca <>
Signed-off-by: Steven Rostedt (VMware) <>
5 years agoMIPS: Avoid accidental raw backtrace
James Hogan [Thu, 29 Jun 2017 14:05:04 +0000 (15:05 +0100)]
MIPS: Avoid accidental raw backtrace

Since commit 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with
usermode") show_backtrace() invokes the raw backtracer when
cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels
where user and kernel address spaces overlap.

However this is used by show_stack() which creates its own pt_regs on
the stack and leaves cp0_status uninitialised in most of the code paths.
This results in the non deterministic use of the raw back tracer
depending on the previous stack content.

show_stack() deals exclusively with kernel mode stacks anyway, so
explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure
we get a useful backtrace.

Fixes: 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode")
Signed-off-by: James Hogan <>
Cc: <> # 3.15+
Signed-off-by: Ralf Baechle <>
5 years agoMIPS: Perform post-DMA cache flushes on systems with MAARs
Paul Burton [Tue, 13 Jun 2017 17:01:08 +0000 (10:01 -0700)]
MIPS: Perform post-DMA cache flushes on systems with MAARs

Recent CPUs from Imagination Technologies such as the I6400 or P6600 are
able to speculatively fetch data from memory into caches. This means
that if used in a system with non-coherent DMA they require that caches
be invalidated after a device performs DMA, and before the CPU reads the
DMA'd data, in order to ensure that stale values weren't speculatively

Such CPUs also introduced Memory Accessibility Attribute Registers
(MAARs) in order to control the regions in which they are allowed to
speculate. Thus we can use the presence of MAARs as a good indication
that the CPU requires the above cache maintenance. Use the presence of
MAARs to determine the result of cpu_needs_post_dma_flush() in the
default case, in order to handle these recent CPUs correctly.

Note that the return type of cpu_needs_post_dma_flush() is changed to
bool, such that it's clearer what's happening when cpu_has_maar is cast
to bool for the return value. If this patch were backported to a
pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int
we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is
currently 1ull<<30, so when truncated to an int gives a non-zero value
anyway, but even so the implicit conversion from long long int to bool
makes it clearer to understand what will happen than the implicit
conversion from long long int to int would. The bool return type also
fits this usage better semantically, so seems like an all-round win.

Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting
the return type change.

Signed-off-by: Paul Burton <>
Reviewed-by: Bryan O'Donoghue <>
Tested-by: Bryan O'Donoghue <>
Cc: Ed Blake <>
Signed-off-by: Ralf Baechle <>
5 years agoMIPS: Fix IRQ tracing & lockdep when rescheduling
Paul Burton [Fri, 3 Mar 2017 23:26:05 +0000 (15:26 -0800)]
MIPS: Fix IRQ tracing & lockdep when rescheduling

When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
from arch/mips/kernel/entry.S we disable interrupts. This is true
regardless of whether we reach work_resched from syscall_exit_work,
resume_userspace or by looping after calling schedule(). Although we
disable interrupts in these paths we don't call trace_hardirqs_off()
before calling into C code which may acquire locks, and we therefore
leave lockdep with an inconsistent view of whether interrupts are
both enabled.

Without tracing this interrupt state lockdep will print warnings such
as the following once a task returns from a syscall via
syscall_exit_partial with TIF_NEED_RESCHED set:

[   49.927678] ------------[ cut here ]------------
[   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
[   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
[   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
[   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
[   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
[   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
[   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
[   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
[   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
[   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
[   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
[   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
[   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
[   50.072327]         ...
[   50.076087] Call Trace:
[   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
[   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
[   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
[   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
[   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
[   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
[   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
[   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
[   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
[   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
[   50.148405] possible reason: unannotated irqs-off.
[   50.154600] irq event stamp: 400463
[   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
[   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
[   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
[   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128

Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
schedule() following the work_resched label because:

 1) Interrupts are disabled regardless of the path we take to reach
    work_resched() & schedule().

 2) Performing the tracing here avoids the need to do it in paths which
    disable interrupts but don't call out to C code before hitting a
    path which uses the RESTORE_SOME macro that will call
    trace_hardirqs_on() or trace_hardirqs_off() as appropriate.

We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
syscall_trace_leave() for similar reasons, ensuring that lockdep has a
consistent view of state after we re-enable interrupts.

Signed-off-by: Paul Burton <>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <>
Signed-off-by: Ralf Baechle <>
5 years agoMIPS: pm-cps: Drop manual cache-line alignment of ready_count
Paul Burton [Thu, 2 Mar 2017 22:02:40 +0000 (14:02 -0800)]
MIPS: pm-cps: Drop manual cache-line alignment of ready_count

We allocate memory for a ready_count variable per-CPU, which is accessed
via a cached non-coherent TLB mapping to perform synchronisation between
threads within the core using LL/SC instructions. In order to ensure
that the variable is contained within its own data cache line we
allocate 2 lines worth of memory & align the resulting pointer to a line
boundary. This is however unnecessary, since kmalloc is guaranteed to
return memory which is at least cache-line aligned (see
ARCH_DMA_MINALIGN). Stop the redundant manual alignment.

Besides cleaning up the code & avoiding needless work, this has the side
effect of avoiding an arithmetic error found by Bryan on 64 bit systems
due to the 32 bit size of the former dlinesz. This led the ready_count
variable to have its upper 32b cleared erroneously for MIPS64 kernels,
causing problems when ready_count was later used on MIPS64 via cpuidle.

Signed-off-by: Paul Burton <>
Fixes: 3179d37ee1ed ("MIPS: pm-cps: add PM state entry code for CPS systems")
Reported-by: Bryan O'Donoghue <>
Reviewed-by: Bryan O'Donoghue <>
Tested-by: Bryan O'Donoghue <>
Cc: stable <> # v3.16+
Signed-off-by: Ralf Baechle <>
5 years agoARM: 8685/1: ensure memblock-limit is pmd-aligned
Doug Berger [Thu, 29 Jun 2017 17:41:36 +0000 (18:41 +0100)]
ARM: 8685/1: ensure memblock-limit is pmd-aligned

The pmd containing memblock_limit is cleared by prepare_page_table()
which creates the opportunity for early_alloc() to allocate unmapped
memory if memblock_limit is not pmd aligned causing a boot-time hang.

Commit 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
attempted to resolve this problem, but there is a path through the
adjust_lowmem_bounds() routine where if all memory regions start and
end on pmd-aligned addresses the memblock_limit will be set to

Since arm_lowmem_limit can be affected by the vmalloc early parameter,
the value of arm_lowmem_limit may not be pmd-aligned. This commit
corrects this oversight such that memblock_limit is always rounded
down to pmd-alignment.

Fixes: 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
Signed-off-by: Doug Berger <>
Suggested-by: Mark Rutland <>
Signed-off-by: Russell King <>
5 years agoMerge git://
Linus Torvalds [Thu, 29 Jun 2017 21:30:07 +0000 (14:30 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Need to access netdev->num_rx_queues behind an accessor in netvsc
    driver otherwise the build breaks with some configs, from Arnd

 2) Add dummy xfrm_dev_event() so that build doesn't fail when
    CONFIG_XFRM_OFFLOAD is not set. From Hangbin Liu.

 3) Don't OOPS when pfkey_msg2xfrm_state() signals an erros, from Dan

 4) Fix MCDI command size for filter operations in sfc driver, from
    Martin Habets.

 5) Fix UFO segmenting so that we don't calculate incorrect checksums,
    from Michal Kubecek.

 6) When ipv6 datagram connects fail, reset destination address and
    port. From Wei Wang.

 7) TCP disconnect must reset the cached receive DST, from WANG Cong.

 8) Fix sign extension bug on 32-bit in dev_get_stats(), from Eric

 9) fman driver has to depend on HAS_DMA, from Madalin Bucur.

10) Fix bpf pointer leak with xadd in verifier, from Daniel Borkmann.

11) Fix negative page counts with GFO, from Michal Kubecek.

* git:// (41 commits)
  sfc: fix attempt to translate invalid filter ID
  net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
  bpf: prevent leaking pointer via xadd on unpriviledged
  arcnet: com20020-pci: add missing pdev setup in netdev structure
  arcnet: com20020-pci: fix dev_id calculation
  arcnet: com20020: remove needless base_addr assignment
  Trivial fix to spelling mistake in arc_printk message
  arcnet: change irq handler to lock irqsave
  rocker: move dereference before free
  mlxsw: spectrum_router: Fix NULL pointer dereference
  net: sched: Fix one possible panic when no destroy callback
  virtio-net: serialize tx routine during reset
  net: usb: asix88179_178a: Add support for the Belkin B2B128
  fsl/fman: add dependency on HAS_DMA
  net: prevent sign extension in dev_get_stats()
  tcp: reset sk_rx_dst in tcp_disconnect()
  net: ipv6: reset daddr and dport in sk if connect() fails
  bnx2x: Don't log mc removal needlessly
  bnxt_en: Fix netpoll handling.
  bnxt_en: Add missing logic to handle TPA end error conditions.

5 years agoMerge tag 'for-4.12/dm-fixes-5' of git://
Linus Torvalds [Thu, 29 Jun 2017 21:23:02 +0000 (14:23 -0700)]
Merge tag 'for-4.12/dm-fixes-5' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - dm thinp fix for crash that will occur when metadata device failure
   races with discard passdown to the underlying data device.

 - dm raid fix to not access the superblock's >= 1.9.0 'sectors' member

* tag 'for-4.12/dm-fixes-5' of git://
  dm thin: do not queue freed thin mapping for next stage processing
  dm raid: fix oops on upgrading to extended superblock format

5 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 29 Jun 2017 21:10:37 +0000 (14:10 -0700)]
Merge branch 'for-linus' of git://

Pull block fixes from Jens Axboe:
 "Two fixes that should go into this release.

  One is an nvme regression fix from Keith, fixing a missing queue
  freeze if the controller is being reset. This causes the reset to

  The other is a fix for a leak of the bio protection info, if smaller
  sized O_DIRECT is used. This fix should be more involved as we have
  other problematic paths in the kernel, but given as this isn't a
  regression in this series, we'll tackle those for 4.13"

* 'for-linus' of git://
  block: provide bio_uninit() free freeing integrity/task associations
  nvme/pci: Fix stuck nvme reset

5 years agosfc: fix attempt to translate invalid filter ID
Edward Cree [Thu, 29 Jun 2017 15:50:06 +0000 (16:50 +0100)]
sfc: fix attempt to translate invalid filter ID

When filter insertion fails with no rollback, we were trying to convert
 EFX_EF10_FILTER_ID_INVALID to an id to store in 'ids' (which is either
 vlan->uc or vlan->mc).  This would WARN_ON_ONCE and then record a bogus
 filter ID of 0x1fff, neither of which is a good thing.

Fixes: 0ccb998bf46d ("sfc: fix filter_id misinterpretation in edge case")
Signed-off-by: Edward Cree <>
Signed-off-by: David S. Miller <>
5 years agonet: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
Michal Kubeček [Thu, 29 Jun 2017 09:13:36 +0000 (11:13 +0200)]
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()

Recently I started seeing warnings about pages with refcount -1. The
problem was traced to packets being reused after their head was merged into
a GRO packet by skb_gro_receive(). While bisecting the issue pointed to
commit c21b48cc1bbf ("net: adjust skb->truesize in ___pskb_trim()") and
I have never seen it on a kernel with it reverted, I believe the real
problem appeared earlier when the option to merge head frag in GRO was

Handling NAPI_GRO_FREE_STOLEN_HEAD state was only added to GRO_MERGED_FREE
branch of napi_skb_finish() so that if the driver uses napi_gro_frags()
and head is merged (which in my case happens after the skb_condense()
call added by the commit mentioned above), the skb is reused including the
head that has been merged. As a result, we release the page reference
twice and eventually end up with negative page refcount.

To fix the problem, handle NAPI_GRO_FREE_STOLEN_HEAD in napi_frags_finish()
the same way it's done in napi_skb_finish().

Fixes: d7e8883cfcf4 ("net: make GRO aware of skb->head_frag")
Signed-off-by: Michal Kubecek <>
Signed-off-by: David S. Miller <>
5 years agobpf: prevent leaking pointer via xadd on unpriviledged
Daniel Borkmann [Thu, 29 Jun 2017 01:04:59 +0000 (03:04 +0200)]
bpf: prevent leaking pointer via xadd on unpriviledged

Leaking kernel addresses on unpriviledged is generally disallowed,
for example, verifier rejects the following:

  0: (b7) r0 = 0
  1: (18) r2 = 0xffff897e82304400
  3: (7b) *(u64 *)(r1 +48) = r2
  R2 leaks addr into ctx

Doing pointer arithmetic on them is also forbidden, so that they
don't turn into unknown value and then get leaked out. However,
there's xadd as a special case, where we don't check the src reg
for being a pointer register, e.g. the following will pass:

  0: (b7) r0 = 0
  1: (7b) *(u64 *)(r1 +48) = r0
  2: (18) r2 = 0xffff897e82304400 ; map
  4: (db) lock *(u64 *)(r1 +48) += r2
  5: (95) exit

We could store the pointer into skb->cb, loose the type context,
and then read it out from there again to leak it eventually out
of a map value. Or more easily in a different variant, too:

   0: (bf) r6 = r1
   1: (7a) *(u64 *)(r10 -8) = 0
   2: (bf) r2 = r10
   3: (07) r2 += -8
   4: (18) r1 = 0x0
   6: (85) call bpf_map_lookup_elem#1
   7: (15) if r0 == 0x0 goto pc+3
   R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
   8: (b7) r3 = 0
   9: (7b) *(u64 *)(r0 +0) = r3
  10: (db) lock *(u64 *)(r0 +0) += r6
  11: (b7) r0 = 0
  12: (95) exit

  from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
  11: (b7) r0 = 0
  12: (95) exit

Prevent this by checking xadd src reg for pointer types. Also
add a couple of test cases related to this.

Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <>
Acked-by: Alexei Starovoitov <>
Acked-by: Martin KaFai Lau <>
Acked-by: Edward Cree <>
Signed-off-by: David S. Miller <>
5 years agoperf/x86/intel/uncore: Fix wrong box pointer check
Kan Liang [Thu, 29 Jun 2017 19:09:26 +0000 (12:09 -0700)]
perf/x86/intel/uncore: Fix wrong box pointer check

Should not init a NULL box. It will cause system crash.
The issue looks like caused by a typo.

This was not noticed because there is no NULL box. Also, for most
boxes, they are enabled by default. The init code is not critical.

Fixes: fff4b87e594a ("perf/x86/intel/uncore: Make package handling more robust")
Signed-off-by: Kan Liang <>
Signed-off-by: Thomas Gleixner <>
5 years agoMerge branch 'arcnet-fixes'
David S. Miller [Thu, 29 Jun 2017 19:18:38 +0000 (15:18 -0400)]
Merge branch 'arcnet-fixes'

Michael Grzeschik says:

arcnet: Collection of latest fixes

Here we sum up the recent fixes I collected on the way to use and
stabilise the framework. Part of it is an possible deadlock that we
prevent as well to fix the calculation of the dev_id that can be setup
by an rotary encoder. Beside that we added an trivial spelling patch and
fix some wrong and missing assignments that improves the code footprint.

Signed-off-by: David S. Miller <>
5 years agoarcnet: com20020-pci: add missing pdev setup in netdev structure
Michael Grzeschik [Wed, 28 Jun 2017 16:28:37 +0000 (18:28 +0200)]
arcnet: com20020-pci: add missing pdev setup in netdev structure

We add the pdev data to the pci devices netdev structure. This way
the interface get consistent device names in the userspace (udev).

Signed-off-by: Michael Grzeschik <>
Signed-off-by: David S. Miller <>
5 years agoarcnet: com20020-pci: fix dev_id calculation
Michael Grzeschik [Wed, 28 Jun 2017 16:28:36 +0000 (18:28 +0200)]
arcnet: com20020-pci: fix dev_id calculation

The dev_id was miscalculated. Only the two bits 4-5 are relevant for the
MA1 card. PCIARC1 and PCIFB2 use the four bits 4-7 for id selection.

Signed-off-by: Michael Grzeschik <>
Signed-off-by: David S. Miller <>
5 years agoarcnet: com20020: remove needless base_addr assignment
Michael Grzeschik [Wed, 28 Jun 2017 16:28:35 +0000 (18:28 +0200)]
arcnet: com20020: remove needless base_addr assignment

The assignment is superfluous.

Signed-off-by: Michael Grzeschik <>
Signed-off-by: David S. Miller <>
5 years agoTrivial fix to spelling mistake in arc_printk message
Colin Ian King [Wed, 28 Jun 2017 16:28:34 +0000 (18:28 +0200)]
Trivial fix to spelling mistake in arc_printk message

Signed-off-by: Colin Ian King <>
Signed-off-by: Michael Grzeschik <>
Signed-off-by: David S. Miller <>
5 years agoarcnet: change irq handler to lock irqsave
Michael Grzeschik [Wed, 28 Jun 2017 16:28:33 +0000 (18:28 +0200)]
arcnet: change irq handler to lock irqsave

This patch prevents the arcnet driver from the following deadlock.

[   41.273910] ======================================================
[   41.280397] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[   41.287433] 4.4.0-00034-gc0ae784 #536 Not tainted
[   41.292366] ------------------------------------------------------
[   41.298863] arcecho/233 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
[   41.305628]  (&(&lp->lock)->rlock){+.+...}, at: [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[   41.315199]
[   41.315199] and this task is already holding:
[   41.321324]  (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8
[   41.329593] which would create a new lock dependency:
[   41.334893]  (_xmit_ARCNET#2){+.-...} -> (&(&lp->lock)->rlock){+.+...}
[   41.341801]
[   41.341801] but this new dependency connects a SOFTIRQ-irq-safe lock:
[   41.350108]  (_xmit_ARCNET#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[   41.357539]   [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.362677]   [<c063ab8c>] dev_watchdog+0x5c/0x264
[   41.367723]   [<c0094edc>] call_timer_fn+0x6c/0xf4
[   41.372759]   [<c00950b8>] run_timer_softirq+0x154/0x210
[   41.378340]   [<c0036b30>] __do_softirq+0x144/0x298
[   41.383469]   [<c0036fb4>] irq_exit+0xcc/0x130
[   41.388138]   [<c0085c50>] __handle_domain_irq+0x60/0xb4
[   41.393728]   [<c0014578>] __irq_svc+0x58/0x78
[   41.398402]   [<c0010274>] arch_cpu_idle+0x24/0x3c
[   41.403443]   [<c007127c>] cpu_startup_entry+0x1f8/0x25c
[   41.409029]   [<c09adc90>] start_kernel+0x3c0/0x3cc
[   41.414170]
[   41.414170] to a SOFTIRQ-irq-unsafe lock:
[   41.419931]  (&(&lp->lock)->rlock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
[   41.427996] ...  [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.433409]   [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[   41.439646]   [<c0089120>] handle_nested_irq+0x8c/0xec
[   41.445063]   [<c03c1170>] regmap_irq_thread+0x190/0x314
[   41.450661]   [<c0087244>] irq_thread_fn+0x1c/0x34
[   41.455700]   [<c0087548>] irq_thread+0x13c/0x1dc
[   41.460649]   [<c0050f10>] kthread+0xe4/0xf8
[   41.465158]   [<c000f810>] ret_from_fork+0x14/0x24
[   41.470207]
[   41.470207] other info that might help us debug this:
[   41.470207]
[   41.478627]  Possible interrupt unsafe locking scenario:
[   41.478627]
[   41.485763]        CPU0                    CPU1
[   41.490521]        ----                    ----
[   41.495279]   lock(&(&lp->lock)->rlock);
[   41.499414]                                local_irq_disable();
[   41.505636]                                lock(_xmit_ARCNET#2);
[   41.511967]                                lock(&(&lp->lock)->rlock);
[   41.518741]   <Interrupt>
[   41.521490]     lock(_xmit_ARCNET#2);
[   41.525356]
[   41.525356]  *** DEADLOCK ***
[   41.525356]
[   41.531587] 1 lock held by arcecho/233:
[   41.535617]  #0:  (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8
[   41.544355]
the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[   41.552362] -> (_xmit_ARCNET#2){+.-...} ops: 27 {
[   41.557357]    HARDIRQ-ON-W at:
[   41.560664]                     [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.567445]                     [<c063ba28>] dev_deactivate_many+0x114/0x304
[   41.574866]                     [<c063bc3c>] dev_deactivate+0x24/0x38
[   41.581646]                     [<c0630374>] linkwatch_do_dev+0x40/0x74
[   41.588613]                     [<c06305d8>] __linkwatch_run_queue+0xec/0x140
[   41.596120]                     [<c0630658>] linkwatch_event+0x2c/0x34
[   41.602991]                     [<c004af30>] process_one_work+0x188/0x40c
[   41.610131]                     [<c004b200>] worker_thread+0x4c/0x480
[   41.616912]                     [<c0050f10>] kthread+0xe4/0xf8
[   41.623048]                     [<c000f810>] ret_from_fork+0x14/0x24
[   41.629735]    IN-SOFTIRQ-W at:
[   41.633039]                     [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.639820]                     [<c063ab8c>] dev_watchdog+0x5c/0x264
[   41.646508]                     [<c0094edc>] call_timer_fn+0x6c/0xf4
[   41.653190]                     [<c00950b8>] run_timer_softirq+0x154/0x210
[   41.660425]                     [<c0036b30>] __do_softirq+0x144/0x298
[   41.667201]                     [<c0036fb4>] irq_exit+0xcc/0x130
[   41.673518]                     [<c0085c50>] __handle_domain_irq+0x60/0xb4
[   41.680754]                     [<c0014578>] __irq_svc+0x58/0x78
[   41.687077]                     [<c0010274>] arch_cpu_idle+0x24/0x3c
[   41.693769]                     [<c007127c>] cpu_startup_entry+0x1f8/0x25c
[   41.701006]                     [<c09adc90>] start_kernel+0x3c0/0x3cc
[   41.707791]    INITIAL USE at:
[   41.711003]                    [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.717696]                    [<c063ba28>] dev_deactivate_many+0x114/0x304
[   41.725026]                    [<c063bc3c>] dev_deactivate+0x24/0x38
[   41.731718]                    [<c0630374>] linkwatch_do_dev+0x40/0x74
[   41.738593]                    [<c06305d8>] __linkwatch_run_queue+0xec/0x140
[   41.746011]                    [<c0630658>] linkwatch_event+0x2c/0x34
[   41.752789]                    [<c004af30>] process_one_work+0x188/0x40c
[   41.759847]                    [<c004b200>] worker_thread+0x4c/0x480
[   41.766541]                    [<c0050f10>] kthread+0xe4/0xf8
[   41.772596]                    [<c000f810>] ret_from_fork+0x14/0x24
[   41.779198]  }
[   41.780945]  ... key      at: [<c124d620>] netdev_xmit_lock_key+0x38/0x1c8
[   41.788192]  ... acquired at:
[   41.791309]    [<c007bed8>] lock_acquire+0x70/0x90
[   41.796361]    [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54
[   41.802324]    [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[   41.808844]    [<c06b9380>] packet_direct_xmit+0x130/0x1c8
[   41.814622]    [<c06bc7e4>] packet_sendmsg+0x3b8/0x680
[   41.820034]    [<c05fe8b0>] sock_sendmsg+0x14/0x24
[   41.825091]    [<c05ffd68>] SyS_sendto+0xb8/0xe0
[   41.829956]    [<c05ffda8>] SyS_send+0x18/0x20
[   41.834638]    [<c000f780>] ret_fast_syscall+0x0/0x1c
[   41.839954]
[   41.841514]
the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[   41.850302] -> (&(&lp->lock)->rlock){+.+...} ops: 5 {
[   41.855644]    HARDIRQ-ON-W at:
[   41.858945]                     [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.865726]                     [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[   41.873607]                     [<c0089120>] handle_nested_irq+0x8c/0xec
[   41.880666]                     [<c03c1170>] regmap_irq_thread+0x190/0x314
[   41.887901]                     [<c0087244>] irq_thread_fn+0x1c/0x34
[   41.894593]                     [<c0087548>] irq_thread+0x13c/0x1dc
[   41.901195]                     [<c0050f10>] kthread+0xe4/0xf8
[   41.907338]                     [<c000f810>] ret_from_fork+0x14/0x24
[   41.914025]    SOFTIRQ-ON-W at:
[   41.917328]                     [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.924106]                     [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[   41.931981]                     [<c0089120>] handle_nested_irq+0x8c/0xec
[   41.939028]                     [<c03c1170>] regmap_irq_thread+0x190/0x314
[   41.946264]                     [<c0087244>] irq_thread_fn+0x1c/0x34
[   41.952954]                     [<c0087548>] irq_thread+0x13c/0x1dc
[   41.959548]                     [<c0050f10>] kthread+0xe4/0xf8
[   41.965689]                     [<c000f810>] ret_from_fork+0x14/0x24
[   41.972379]    INITIAL USE at:
[   41.975595]                    [<c06f8fc8>] _raw_spin_lock+0x30/0x40
[   41.982283]                    [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet]
[   41.990063]                    [<c0089120>] handle_nested_irq+0x8c/0xec
[   41.997027]                    [<c03c1170>] regmap_irq_thread+0x190/0x314
[   42.004172]                    [<c0087244>] irq_thread_fn+0x1c/0x34
[   42.010766]                    [<c0087548>] irq_thread+0x13c/0x1dc
[   42.017267]                    [<c0050f10>] kthread+0xe4/0xf8
[   42.023314]                    [<c000f810>] ret_from_fork+0x14/0x24
[   42.029903]  }
[   42.031648]  ... key      at: [<bf0854cc>] __key.42091+0x0/0xfffff0f8 [arcnet]
[   42.039255]  ... acquired at:
[   42.042372]    [<c007bed8>] lock_acquire+0x70/0x90
[   42.047413]    [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54
[   42.053364]    [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet]
[   42.059872]    [<c06b9380>] packet_direct_xmit+0x130/0x1c8
[   42.065634]    [<c06bc7e4>] packet_sendmsg+0x3b8/0x680
[   42.071030]    [<c05fe8b0>] sock_sendmsg+0x14/0x24
[   42.076069]    [<c05ffd68>] SyS_sendto+0xb8/0xe0
[   42.080926]    [<c05ffda8>] SyS_send+0x18/0x20
[   42.085601]    [<c000f780>] ret_fast_syscall+0x0/0x1c
[   42.090918]
[   42.092481]
[   42.092481] stack backtrace:
[   42.097065] CPU: 0 PID: 233 Comm: arcecho Not tainted 4.4.0-00034-gc0ae784 #536
[   42.104751] Hardware name: Generic AM33XX (Flattened Device Tree)
[   42.111183] [<c0017ec8>] (unwind_backtrace) from [<c00139d0>] (show_stack+0x10/0x14)
[   42.119337] [<c00139d0>] (show_stack) from [<c02a82c4>] (dump_stack+0x8c/0x9c)
[   42.126937] [<c02a82c4>] (dump_stack) from [<c0078260>] (check_usage+0x4bc/0x63c)
[   42.134815] [<c0078260>] (check_usage) from [<c0078438>] (check_irq_usage+0x58/0xb0)
[   42.142964] [<c0078438>] (check_irq_usage) from [<c007aaa0>] (__lock_acquire+0x1524/0x20b0)
[   42.151740] [<c007aaa0>] (__lock_acquire) from [<c007bed8>] (lock_acquire+0x70/0x90)
[   42.159886] [<c007bed8>] (lock_acquire) from [<c06f9140>] (_raw_spin_lock_irqsave+0x40/0x54)
[   42.168768] [<c06f9140>] (_raw_spin_lock_irqsave) from [<bf083bc8>] (arcnet_send_packet+0x60/0x1c0 [arcnet])
[   42.179115] [<bf083bc8>] (arcnet_send_packet [arcnet]) from [<c06b9380>] (packet_direct_xmit+0x130/0x1c8)
[   42.189182] [<c06b9380>] (packet_direct_xmit) from [<c06bc7e4>] (packet_sendmsg+0x3b8/0x680)
[   42.198059] [<c06bc7e4>] (packet_sendmsg) from [<c05fe8b0>] (sock_sendmsg+0x14/0x24)
[   42.206199] [<c05fe8b0>] (sock_sendmsg) from [<c05ffd68>] (SyS_sendto+0xb8/0xe0)
[   42.213978] [<c05ffd68>] (SyS_sendto) from [<c05ffda8>] (SyS_send+0x18/0x20)
[   42.221388] [<c05ffda8>] (SyS_send) from [<c000f780>] (ret_fast_syscall+0x0/0x1c)

Signed-off-by: Michael Grzeschik <>
   v1 -> v2: removed unneeded zero assignment of flags
Signed-off-by: David S. Miller <>
5 years agorocker: move dereference before free
Dan Carpenter [Wed, 28 Jun 2017 11:44:21 +0000 (14:44 +0300)]
rocker: move dereference before free

My static checker complains that ofdpa_neigh_del() can sometimes free
"found".   It just makes sense to use it first before deleting it.

Fixes: ecf244f753e0 ("rocker: fix maybe-uninitialized warning")
Signed-off-by: Dan Carpenter <>
Signed-off-by: David S. Miller <>
5 years agomlxsw: spectrum_router: Fix NULL pointer dereference
Ido Schimmel [Wed, 28 Jun 2017 06:03:12 +0000 (09:03 +0300)]
mlxsw: spectrum_router: Fix NULL pointer dereference

In case a VLAN device is enslaved to a bridge we shouldn't create a
router interface (RIF) for it when it's configured with an IP address.
This is already handled by the driver for other types of netdevs, such
as physical ports and LAG devices.

If this IP address is then removed and the interface is subsequently
unlinked from the bridge, a NULL pointer dereference can happen, as the
original 802.1d FID was replaced with an rFID which was then deleted.

To reproduce:
$ ip link set dev enp3s0np9 up
$ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111
$ ip link set dev enp3s0np9.111 up
$ ip link add name br0 type bridge
$ ip link set dev br0 up
$ ip link set enp3s0np9.111 master br0
$ ip address add dev enp3s0np9.111
$ ip address del dev enp3s0np9.111
$ ip link set dev enp3s0np9.111 nomaster

Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <>
Reported-by: Petr Machata <>
Tested-by: Petr Machata <>
Reviewed-by: Petr Machata <>
Signed-off-by: David S. Miller <>
5 years agonet: sched: Fix one possible panic when no destroy callback
Gao Feng [Wed, 28 Jun 2017 04:53:54 +0000 (12:53 +0800)]
net: sched: Fix one possible panic when no destroy callback

When qdisc fail to init, qdisc_create would invoke the destroy callback
to cleanup. But there is no check if the callback exists really. So it
would cause the panic if there is no real destroy callback like the qdisc
codel, fq, and so on.

Take codel as an example following:
When a malicious user constructs one invalid netlink msg, it would cause
codel_init->codel_change->nla_parse_nested failed.
Then kernel would invoke the destroy callback directly but qdisc codel
doesn't define one. It causes one panic as a result.

Now add one the check for destroy to avoid the possible panic.

Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Signed-off-by: Gao Feng <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
5 years agovirtio-net: serialize tx routine during reset
Jason Wang [Wed, 28 Jun 2017 01:51:03 +0000 (09:51 +0800)]
virtio-net: serialize tx routine during reset

We don't hold any tx lock when trying to disable TX during reset, this
would lead a use after free since ndo_start_xmit() tries to access
the virtqueue which has already been freed. Fix this by using
netif_tx_disable() before freeing the vqs, this could make sure no tx
after vq freeing.

Reported-by: Jean-Philippe Menil <>
Tested-by: Jean-Philippe Menil <>
Fixes commit f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend <>
Signed-off-by: Jason Wang <>
Acked-by: Michael S. Tsirkin <>
Acked-by: Robert McCabe <>
Signed-off-by: David S. Miller <>
5 years agoftrace: Fix regression with module command in stack_trace_filter
Steven Rostedt (VMware) [Thu, 29 Jun 2017 14:05:45 +0000 (10:05 -0400)]
ftrace: Fix regression with module command in stack_trace_filter

When doing the following command:

 # echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter

it triggered a crash.

This happened with the clean up of probes. It required all callers to the
regex function (doing ftrace filtering) to have ops->private be a pointer to
a trace_array. But for the stack tracer, that is not the case.

Allow for the ops->private to be NULL, and change the function command
callbacks to handle the trace_array pointer being NULL as well.

Fixes: d2afd57a4b96 ("tracing/ftrace: Allow instances to have their own function probes")
Signed-off-by: Steven Rostedt (VMware) <>
5 years agoRevert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"
Brian Norris [Fri, 23 Jun 2017 20:59:11 +0000 (13:59 -0700)]
Revert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"

This reverts commit 88bb94216f59e10802aaf78c858a4146085faf18.

It introduced a new CONFIG_DEBUG_ATOMIC_SLEEP warning in v4.12-rc1:

[ 7226.716713] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
[ 7226.716716] in_atomic(): 0, irqs_disabled(): 0, pid: 1708, name: bash
[ 7226.716722] CPU: 1 PID: 1708 Comm: bash Not tainted 4.12.0-rc6+ #1213
[ 7226.716724] Hardware name: Google Kevin (DT)
[ 7226.716726] Call trace:
[ 7226.716738] [<ffffff8008089928>] dump_backtrace+0x0/0x24c
[ 7226.716743] [<ffffff8008089b94>] show_stack+0x20/0x28
[ 7226.716749] [<ffffff8008371370>] dump_stack+0x90/0xb0
[ 7226.716755] [<ffffff80080cd2a0>] ___might_sleep+0x10c/0x124
[ 7226.716760] [<ffffff80080cd330>] __might_sleep+0x78/0x88
[ 7226.716765] [<ffffff800879e210>] mutex_lock+0x2c/0x64
[ 7226.716771] [<ffffff80083ad678>] rockchip_irq_bus_lock+0x30/0x3c
[ 7226.716777] [<ffffff80080f6d40>] __irq_get_desc_lock+0x78/0x98
[ 7226.716782] [<ffffff80080f7e6c>] irq_set_irq_wake+0x44/0x12c
[ 7226.716787] [<ffffff8008486e18>] dev_pm_arm_wake_irq+0x4c/0x58
[ 7226.716792] [<ffffff800848b80c>] device_wakeup_arm_wake_irqs+0x3c/0x58
[ 7226.716796] [<ffffff80084896fc>] dpm_suspend_noirq+0xf8/0x3a0
[ 7226.716800] [<ffffff80080f1384>] suspend_devices_and_enter+0x1a4/0x9a8
[ 7226.716803] [<ffffff80080f21ec>] pm_suspend+0x664/0x6a4
[ 7226.716807] [<ffffff80080f04d8>] state_store+0xd4/0xf8

It was reported on -rc1, and it's still not fixed in -rc6, so it should
just be reverted.

Cc: John Keeping <>
Signed-off-by: Brian Norris <>
Reviewed-by: Heiko Stuebner <>
Signed-off-by: Linus Walleij <>
5 years agogpio: acpi: Skip _AEI entries without a handler rather then aborting the scan
Hans de Goede [Fri, 23 Jun 2017 07:26:13 +0000 (09:26 +0200)]
gpio: acpi: Skip _AEI entries without a handler rather then aborting the scan

acpi_walk_resources will stop as soon as the callback passed in returns
an error status. On a x86 tablet I have the first GpioInt in the _AEI
resource list has no handler defined in the DSDT, causing
acpi_walk_resources to abort scanning the rest of the resource list,
which does define valid ACPI GPIO events.

This commit changes the return for not finding a handler from
AE_BAD_PARAMETER to AE_OK so that the rest of the resource list will
get scanned normally in case of missing event handlers.

Signed-off-by: Hans de Goede <>
Acked-by: Mika Westerberg <>
Acked-by: Andy Shevchenko <>
Signed-off-by: Linus Walleij <>
5 years agogpiolib: fix filtering out unwanted events
Bartosz Golaszewski [Fri, 23 Jun 2017 11:45:16 +0000 (13:45 +0200)]
gpiolib: fix filtering out unwanted events

GPIOEVENT_REQUEST_BOTH_EDGES is not a single flag, but a binary OR of

The expression 'le->eflags & GPIOEVENT_REQUEST_BOTH_EDGES' we'll get
evaluated to true even if only one event type was requested.

Fix it by checking both RISING & FALLING flags explicitly.

Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events")
Signed-off-by: Bartosz Golaszewski <>
Signed-off-by: Linus Walleij <>
5 years agoarch: remove unused macro/function thread_saved_pc()
Tobias Klauser [Wed, 28 Jun 2017 13:30:02 +0000 (15:30 +0200)]
arch: remove unused macro/function thread_saved_pc()

The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d5597793 ("sched/core: Remove pointless printout in
sched_show_task()").  Remove the implementations as well.

Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.

Signed-off-by: Tobias Klauser <>
Acked-by: Geert Uytterhoeven <>
Cc: Ingo Molnar <>
Signed-off-by: Linus Torvalds <>
5 years agoblock: provide bio_uninit() free freeing integrity/task associations
Jens Axboe [Wed, 28 Jun 2017 21:30:13 +0000 (15:30 -0600)]
block: provide bio_uninit() free freeing integrity/task associations

Wen reports significant memory leaks with DIF and O_DIRECT:

"With nvme devive + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute

/proc/meminfo | grep SUnreclaim...

SUnreclaim:      6752128 kB
SUnreclaim:      6874880 kB
SUnreclaim:      7238080 kB
SUnreclaim:     22307264 kB
SUnreclaim:     22485888 kB
SUnreclaim:     22720256 kB

When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure."

Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
doesn't go through the regular bio free. This means that any ancillary
data allocated with the bio through the stack is not freed. Hence, we
can leak the integrity data associated with the bio, if the device is
using DIF/DIX.

Fix this by providing a bio_uninit() and export it, so that we can use
it to free this data. Note that this is a minimal fix for this issue.
Any current user of bio's that are allocated outside of
bio_alloc_bioset() suffers from this issue, most notably some drivers.
We will fix those in a more comprehensive patch for 4.13. This also
means that the commit marked as being fixed by this isn't the real
culprit, it's just the most obvious one out there.

Fixes: 542ff7bf18c6 ("block: new direct I/O implementation")
Reported-by: Wen Xiong <>
Reviewed-by: Christoph Hellwig <>
Signed-off-by: Jens Axboe <>
5 years agoMerge tag 'nfs-for-4.12-3' of git://
Linus Torvalds [Wed, 28 Jun 2017 20:27:15 +0000 (13:27 -0700)]
Merge tag 'nfs-for-4.12-3' of git://

Pull NFS client bugfixes from Trond Myklebust:
 "Bugfixes include:

   - stable fix for exclusive create if the server supports the umask

   - trunking detection should handle ERESTARTSYS/EINTR

   - stable fix for a race in the LAYOUTGET function

   - stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left

   - nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()"

* tag 'nfs-for-4.12-3' of git://
  NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
  Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"
  NFSv4.1: Fix a race in nfs4_proc_layoutget
  NFS: Trunking detection should handle ERESTARTSYS/EINTR
  NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask

5 years agoMerge branch 'drm-fixes' of git://
Linus Torvalds [Wed, 28 Jun 2017 20:22:26 +0000 (13:22 -0700)]
Merge branch 'drm-fixes' of git://

Pull drm fixes from Dave Airlie:
 "This is the final set of fixes for -rc8, just a few i915 and one
  vmwgfx ones.

  I'm off on holidays for a week, so if anything shows up for fixes I've
  asked Daniel or Sean Paul to herd it in the right direction"

[ The additional etnaviv fixes were already herded towards me as seen in
  my previous pull - Linus ]

* 'drm-fixes' of git://
  drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
  drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
  drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object
  drm/i915: Retire the VMA's fence tracker before unbinding

5 years agoMerge branch 'etnaviv/fixes' of git://
Linus Torvalds [Wed, 28 Jun 2017 20:13:48 +0000 (13:13 -0700)]
Merge branch 'etnaviv/fixes' of git://

Pull drm/etnaviv fixes from Lucas Stach:
 "I realized I just missed the cut-off point for the final drm fixes
  pull, but I have 2 more etnaviv fixes that need to go into 4.12, as
  they fix fallout from the explicit sync work introduced in the last
  merge window"

[ Pulling directly because Dave is on vacation. Noted by Daniel Vetter,
  and acked by Dave Airlie  - Linus ]

* 'etnaviv/fixes' of git://
  drm/etnaviv: Fix implicit/explicit sync sense inversion
  drm/etnaviv: fix submit flags getting overwritten by BO content

5 years agoiommu/amd: Fix interrupt remapping when disable guest_mode
Suravee Suthikulpanit [Mon, 26 Jun 2017 09:28:04 +0000 (04:28 -0500)]
iommu/amd: Fix interrupt remapping when disable guest_mode

Pass-through devices to VM guest can get updated IRQ affinity
information via irq_set_affinity() when not running in guest mode.
Currently, AMD IOMMU driver in GA mode ignores the updated information
if the pass-through device is setup to use vAPIC regardless of guest_mode.
This could cause invalid interrupt remapping.

Also, the guest_mode bit should be set and cleared only when
SVM updates posted-interrupt interrupt remapping information.

Signed-off-by: Suravee Suthikulpanit <>
Cc: Joerg Roedel <>
Fixes: d98de49a53e48 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <>
5 years agoovl: don't set origin on broken lower hardlink
Miklos Szeredi [Wed, 28 Jun 2017 11:41:22 +0000 (13:41 +0200)]
ovl: don't set origin on broken lower hardlink

When copying up a file that has multiple hard links we need to break any
association with the origin file.  This makes copy-up be essentially an
atomic replace.

The new file has nothing to do with the old one (except having the same
data and metadata initially), so don't set the overlay.origin attribute.

We can relax this in the future when we are able to index upper object by

Signed-off-by: Miklos Szeredi <>
Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
5 years agoovl: copy-up: don't unlock between lookup and link
Miklos Szeredi [Wed, 28 Jun 2017 11:41:22 +0000 (13:41 +0200)]
ovl: copy-up: don't unlock between lookup and link

Nothing prevents mischief on upper layer while we are busy copying up the

Move the lookup right before the looked up dentry is actually used.

Signed-off-by: Miklos Szeredi <>
Fixes: 01ad3eb8a073 ("ovl: concurrent copy up of regular files")
Cc: <> # v4.11
5 years agoALSA: hda - Fix endless loop of codec configure
Takashi Iwai [Wed, 28 Jun 2017 10:02:02 +0000 (12:02 +0200)]
ALSA: hda - Fix endless loop of codec configure

azx_codec_configure() loops over the codecs found on the given
controller via a linked list.  The code used to work in the past, but
in the current version, this may lead to an endless loop when a codec
binding returns an error.

The culprit is that the snd_hda_codec_configure() unregisters the
device upon error, and this eventually deletes the given codec object
from the bus.  Since the list is initialized via list_del_init(), the
next object points to the same device itself.  This behavior change
was introduced at splitting the HD-audio code code, and forgotten to
adapt it here.

For fixing this bug, just use a *_safe() version of list iteration.

Fixes: d068ebc25e6e ("ALSA: hda - Move some codes up to hdac_bus struct")
Reported-by: Daniel Vetter <>
Cc: <>
Signed-off-by: Takashi Iwai <>
5 years agodrm/etnaviv: Fix implicit/explicit sync sense inversion
Daniel Stone [Thu, 22 Jun 2017 11:22:22 +0000 (12:22 +0100)]
drm/etnaviv: Fix implicit/explicit sync sense inversion

We were reading the no-implicit sync flag the wrong way around,
synchronizing too much for the explicit case, and not at all for the
implicit case. Oops.

Signed-off-by: Daniel Stone <>
Signed-off-by: Lucas Stach <>
5 years agodrm/etnaviv: fix submit flags getting overwritten by BO content
Lucas Stach [Tue, 27 Jun 2017 14:02:51 +0000 (16:02 +0200)]
drm/etnaviv: fix submit flags getting overwritten by BO content

The addition of the flags member to etnaviv_gem_submit structure didn't
take into account that the last member of this structure is a variable
length array.

Signed-off-by: Lucas Stach <>