Cédric Le Goater [Thu, 10 Dec 2020 17:14:40 +0000 (18:14 +0100)]
powerpc/xive: Introduce XIVE_IPI_HW_IRQ

The XIVE driver deals with CPU IPIs in a peculiar way. Each CPU has
its own XIVE IPI interrupt allocated at the HW level, for PowerNV, or
at the hypervisor level for pSeries. In practice, these interrupts are
not always used. pSeries/PowerVM prefers local doorbells for local
threads since they are faster. On PowerNV, global doorbells are also
preferred for the same reason.

The mapping in Linux is reduced to a single interrupt using HW
interrupt number 0 and a custom irq_chip to handle EOI. This can cause
performance issues in some benchmarks (ipistorm) on multichip systems.

Clarify the use of the 0 value; this will help in improving multichip
support.
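
As an illustration only (a sketch, not the patch itself), the change
amounts to giving the magic 0 a name so that IPI mappings become
self-describing:

  /* Sketch: all CPU IPIs use HW interrupt source number 0 */
  #define XIVE_IPI_HW_IRQ    0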

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-4-clg@kaod.org
Cédric Le Goater [Thu, 10 Dec 2020 17:14:39 +0000 (18:14 +0100)]
powerpc/xive: Rename XIVE_IRQ_NO_EOI to show it's a flag

This is a simple cleanup to make it easy to identify all flags of the
XIVE interrupt structure. The interrupts flagged with XIVE_IRQ_FLAG_NO_EOI
are the escalations used to wake up vCPUs in KVM. They are handled
very differently from the rest.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-3-clg@kaod.org
Cédric Le Goater [Thu, 10 Dec 2020 17:14:38 +0000 (18:14 +0100)]
KVM: PPC: Book3S HV: XIVE: Show detailed configuration in debug output

This is useful to track allocation of the HW resources on a per-guest
basis. Making sure IPIs are local to the chip of the vCPUs reduces
rerouting between interrupt controllers and gives better performance
in case of pinning. Checking the distribution of VP structures on the
chips also helps in reducing PowerBUS traffic.

[ clg: resurrected show_sources and reworked output ]

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-2-clg@kaod.org
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:59 +0000 (16:08 +0530)]
powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

On POWER platforms where only some groups of threads within a core
share the L2-cache (indicated by the ibm,thread-groups device-tree
property), we currently print the incorrect shared_cpu_map/list for
L2-cache in the sysfs.

This patch reports the correct shared_cpu_map/list on such platforms.

Example:
On a platform with "ibm,thread-groups" set to
                 00000001 00000002 00000004 00000000
                 00000002 00000004 00000006 00000001
                 00000003 00000005 00000007 00000002
                 00000002 00000004 00000000 00000002
                 00000004 00000006 00000001 00000003
                 00000005 00000007

This indicates that threads {0,2,4,6} in the core share the L2-cache
and threads {1,3,5,7} in the core share the L2-cache.

However, without the patch, the shared_cpu_map/list for L2 for CPUs 0,
1 is reported in the sysfs as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,000000ff

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000ff

With the patch, the shared_cpu_map/list for L2 cache for CPUs 0, 1 is
correctly reported as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa
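
For reference, the corrected masks decode as expected: 0x55 is binary
01010101, i.e. bits 0, 2, 4 and 6 set, matching shared_cpu_list 0,2,4,6,
while 0xaa is binary 10101010, i.e. bits 1, 3, 5 and 7 set, matching
1,3,5,7.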

This patch also defines cpu_l2_cache_mask() for the !CONFIG_SMP case.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-6-git-send-email-ego@linux.vnet.ibm.com
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:58 +0000 (16:08 +0530)]
powerpc/smp: Add support detecting thread-groups sharing L2 cache

On POWER systems, groups of threads within a core sharing the L2-cache
can be indicated by the "ibm,thread-groups" property array with the
identifier "2".

This patch adds support for detecting this and, when present, populates
the cpu_l2_cache_mask of every CPU with the core-siblings that share the
L2 cache with that CPU, as specified by the "ibm,thread-groups" property
array.

On a platform with the following "ibm,thread-groups" configuration:
 00000001 00000002 00000004 00000000
 00000002 00000004 00000006 00000001
 00000003 00000005 00000007 00000002
 00000002 00000004 00000000 00000002
 00000004 00000006 00000001 00000003
 00000005 00000007

Without this patch, the sched-domain hierarchy for CPUs 0,1 would be
CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE

CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-7 level=CACHE
domain-2: span=0-15,24-39,48-55 level=MC
domain-3: span=0-55 level=DIE

The CACHE domain at 0-7 is incorrect since the ibm,thread-groups
sub-array
[00000002 00000002 00000004
 00000000 00000002 00000004 00000006
 00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups of threads, each containing "4" threads:
{0,2,4,6} and {1,3,5,7}.

With this patch, the sched-domain hierarchy for CPUs 0,1 would be
CPU0 attaching sched-domain(s):
domain-0: span=0,2,4,6 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE

CPU1 attaching sched-domain(s):
domain-0: span=1,3,5,7 level=SMT
domain-1: span=0-15,24-39,48-55 level=MC
domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (resp. span=1,3,5,7 for
CPU 1) degenerates into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.
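
A hedged, self-contained sketch of the mask derivation described above,
assuming (as in the example) that interrupt server numbers map directly
to logical CPU ids; the function name is illustrative, not the kernel's:

  #include <stdint.h>

  /*
   * Given the groups for the L2 property (e.g. {0,2,4,6} and {1,3,5,7}),
   * return a bitmask of the CPUs sharing the L2 cache with `cpu`.
   */
  static uint64_t l2_sibling_mask(const uint32_t *servers, int nr_groups,
                                  int threads_per_group, uint32_t cpu)
  {
      for (int g = 0; g < nr_groups; g++) {
          const uint32_t *grp = servers + g * threads_per_group;
          uint64_t mask = 0;
          int found = 0;

          for (int t = 0; t < threads_per_group; t++) {
              mask |= UINT64_C(1) << grp[t];
              if (grp[t] == cpu)
                  found = 1;
          }
          if (found)
              return mask;    /* cpu's own group */
      }
      return UINT64_C(1) << cpu;    /* not listed: just itself */
  }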

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-5-git-send-email-ego@linux.vnet.ibm.com
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:57 +0000 (16:08 +0530)]
powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic

init_thread_group_l1_cache_map() initializes the per-cpu cpumask
thread_group_l1_cache_map with the core-siblings which share L1 cache
with the CPU. Make this function generic with respect to the cache
property (L1 or L2) and have it update the suitable mask. This is a
preparatory patch for the
next patch where we will introduce discovery of thread-groups that
share L2-cache.

No functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-4-git-send-email-ego@linux.vnet.ibm.com
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:56 +0000 (16:08 +0530)]
powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map

On platforms which have the "ibm,thread-groups" property, the per-cpu
variable cpu_l1_cache_map keeps track of which group of threads within
the same core share the L1 cache (instruction and data).

This patch renames the variable to "thread_group_l1_cache_map" to make
it consistent with a subsequent patch which will introduce
thread_group_l2_cache_map.

This patch introduces no functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-3-git-send-email-ego@linux.vnet.ibm.com
Gautham R. Shenoy [Thu, 10 Dec 2020 10:38:55 +0000 (16:08 +0530)]
powerpc/smp: Parse ibm,thread-groups with multiple properties

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

where in,

a) provides information about Property "1" being shared by "2" groups,
   each with "4" threads. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
   the second group is {9,11,13,15}. Property "1" indicates that
   the threads in each group share the L1 cache, translation cache and
   instruction data flow.

b) provides information about Property "2" being shared by "2" groups,
   each with "4" threads. The "ibm,ppc-interrupt-server#s" of
   the first group is {8,10,12,14} and the
   "ibm,ppc-interrupt-server#s" of the second group is
   {9,11,13,15}. Property "2" indicates that the threads in each group
   share the L2-cache.
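
A hedged sketch of the decomposition, using the record layout implied
by the example ([property, nr-groups, threads-per-group, then
nr-groups x threads-per-group server numbers], repeated); the struct
and function names are made up for illustration:

  #include <stdint.h>

  struct thread_group_rec {
      uint32_t property;           /* 1 = L1/translation, 2 = L2 */
      uint32_t nr_groups;
      uint32_t threads_per_group;
      const uint32_t *servers;     /* nr_groups * threads_per_group ids */
  };

  /* Split the flat cell array into records; returns count, -1 if truncated. */
  static int parse_thread_groups(const uint32_t *prop, int len,
                                 struct thread_group_rec *out, int max_out)
  {
      int i = 0, n = 0;

      while (i + 3 <= len && n < max_out) {
          int count;

          out[n].property = prop[i];
          out[n].nr_groups = prop[i + 1];
          out[n].threads_per_group = prop[i + 2];
          count = out[n].nr_groups * out[n].threads_per_group;
          if (i + 3 + count > len)
              return -1;
          out[n].servers = &prop[i + 3];
          i += 3 + count;
          n++;
      }
      return n;
  }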

The existing code assumes that the "ibm,thread-groups" encodes
information about only one property. Hence even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

This patch extends the parsing to support platforms which encode
information about multiple properties being shared by the
corresponding groups of threads.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-2-git-send-email-ego@linux.vnet.ibm.com
Ravi Bangoria [Fri, 6 Nov 2020 04:56:50 +0000 (10:26 +0530)]
powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions

POWER10 DD1 has an issue where it generates watchpoint exceptions when
it shouldn't. The conditions under which this occurs are:

 - octword op
 - ending address of DAWR range is less than starting address of op
 - those addresses need to be in the same or in two consecutive 512B
   blocks
 - 'op address + 64B' generates an address that has a carry into bit
   52 (crosses 2K boundary)

Handle such spurious exceptions by considering them extraneous and
emulating/single-stepping the instruction without generating an event.
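
A hedged standalone model of the four conditions (bit 52 in IBM
numbering of a 64-bit address is bit 11 from the LSB, hence the 2K
boundary); this is illustrative, not the kernel's actual helper:

  #include <stdbool.h>
  #include <stdint.h>

  static bool p10dd1_spurious_watchpoint(uint64_t op_start, uint64_t op_size,
                                         uint64_t dawr_end)
  {
      bool octword = op_size == 32;
      /* DAWR range ends below the start of the op */
      bool dawr_below = dawr_end < op_start;
      /* same or two consecutive 512B blocks */
      bool near = dawr_below && (op_start >> 9) - (dawr_end >> 9) <= 1;
      /* op address + 64B carries into bit 52: crosses a 2K boundary */
      bool carry = (op_start & ~UINT64_C(0x7ff)) !=
                   ((op_start + 64) & ~UINT64_C(0x7ff));

      return octword && dawr_below && near && carry;
  }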

[ravi: Fixed build warning reported by lkp@intel.com]
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201106045650.278987-1-ravi.bangoria@linux.ibm.com
Balamuruhan S [Sun, 11 Oct 2020 05:09:08 +0000 (10:39 +0530)]
powerpc/sstep: Add testcases for VSX vector paired load/store instructions

Add testcases for VSX vector paired load/store instructions.
Sample output:

  emulate_step_test: lxvp           : PASS
  emulate_step_test: stxvp          : PASS
  emulate_step_test: lxvpx          : PASS
  emulate_step_test: stxvpx         : PASS
  emulate_step_test: plxvp          : PASS
  emulate_step_test: pstxvp         : PASS

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-6-ravi.bangoria@linux.ibm.com
Balamuruhan S [Sun, 11 Oct 2020 05:09:07 +0000 (10:39 +0530)]
powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions

Add instruction encodings, with the DQ, D0, D1 immediate and XTP, XSP
operands as macros, for the new VSX vector paired instructions (see the
sketch after the list below):
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)
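
For concreteness, a hedged sketch of how an XTP/XSP operand can be
packed: the VSR pair is named by a 6-bit number whose high bit lands in
the low bit of the 5-bit register field (the exact shifts here are an
assumption based on the ISA field layout, not copied from the patch):

  /* Sketch: pack a 6-bit VSR-pair number into the register field */
  #define __PPC_XSP(s)    ((((s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
  #define __PPC_XTP(s)    __PPC_XSP(s)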

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-5-ravi.bangoria@linux.ibm.com
Balamuruhan S [Sun, 11 Oct 2020 05:09:06 +0000 (10:39 +0530)]
powerpc/sstep: Support VSX vector paired storage access instructions

VSX Vector Paired instructions load/store an octword (32 bytes)
between storage and two sequential VSRs. Add emulation support
for these new instructions (a simplified model follows the list):
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)
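
The simplified model promised above, ignoring endian-dependent lane
ordering and faults; vsr_t and the helpers are illustrative, not the
kernel's emulation code:

  #include <stdint.h>
  #include <string.h>

  typedef struct { uint8_t bytes[16]; } vsr_t;    /* one 128-bit VSR */

  /* lxvp-style load: one octword from ea fills VSR[xtp] and VSR[xtp+1] */
  static void model_lxvp(vsr_t vsrs[64], unsigned int xtp, const void *ea)
  {
      memcpy(&vsrs[xtp], ea, 16);
      memcpy(&vsrs[xtp + 1], (const uint8_t *)ea + 16, 16);
  }

  /* stxvp-style store: the register pair back to storage */
  static void model_stxvp(const vsr_t vsrs[64], unsigned int xtp, void *ea)
  {
      memcpy(ea, &vsrs[xtp], 16);
      memcpy((uint8_t *)ea + 16, &vsrs[xtp + 1], 16);
  }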

[kernel test robot reported a build failure]

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-4-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Sun, 11 Oct 2020 05:09:05 +0000 (10:39 +0530)]
powerpc/sstep: Cover new VSX instructions under CONFIG_VSX

Recently added Power10 prefixed VSX instructions are included
unconditionally in the kernel. If they are executed on a
machine without VSX support, it might create issues. Fix that.
Also fix one mnemonic spelling mistake in a comment.

Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-3-ravi.bangoria@linux.ibm.com
Balamuruhan S [Sun, 11 Oct 2020 05:09:04 +0000 (10:39 +0530)]
powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors, which might cause
issues. Restrict that.

Fixes: 3920742b92f5 ("powerpc sstep: Add support for prefixed fixed-point arithmetic")
Fixes: 50b80a12e4cc ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-2-ravi.bangoria@linux.ibm.com
Nicholas Piggin [Thu, 11 Jul 2019 02:24:04 +0000 (12:24 +1000)]
powerpc/64s: Remove idle workaround code from restore_cpu_cpufeatures

Idle code no longer uses the .cpu_restore CPU operation to restore
SPRs, so this workaround is no longer required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711022404.18132-2-npiggin@gmail.com
Athira Rajeev [Wed, 25 Nov 2020 07:26:55 +0000 (02:26 -0500)]
powerpc/perf: Exclude kernel samples while counting events in user space.

The perf event attribute supports the exclude_kernel flag to avoid
sampling/profiling in supervisor state (kernel). Based on this event
attr flag, a Monitor Mode Control Register bit is set to freeze on
supervisor state. But sometimes (due to a hardware limitation), the
Sampled Instruction Address Register (SIAR) locks on to a kernel
address even when freeze on supervisor is set. Add a check to drop
those samples.
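
A hedged standalone model of the check (the kernel-address test is the
usual ppc64 PAGE_OFFSET comparison; names are illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  /* ppc64 kernel virtual addresses start at 0xc000000000000000 */
  static bool is_kernel_addr_model(uint64_t ea)
  {
      return ea >= UINT64_C(0xc000000000000000);
  }

  /* Drop the sample if the event excludes the kernel but the SIAR
   * nevertheless latched a kernel address. */
  static bool drop_sample(bool exclude_kernel, uint64_t siar)
  {
      return exclude_kernel && is_kernel_addr_model(siar);
  }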

Cc: stable@vger.kernel.org
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com
Nicholas Piggin [Sat, 7 Nov 2020 01:43:36 +0000 (11:43 +1000)]
powerpc/64: irq replay remove decrementer overflow check

This is a way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.
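
The 4.3 and 8.6 second figures follow from the decrementer width and
the timebase frequency: with a 32-bit decrementer ticking at roughly
500MHz (matching the figures quoted), the register stays negative for
2^31 ticks, about 4.3 seconds, after the timer expires; beyond that it
wraps positive and does not go negative again, re-raising the
interrupt, until 2^32 ticks, about 8.6 seconds.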

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long.  It's not obvious
this is a good tradeoff. This is already a watchdog-magnitude event and
that situation is not improved significantly by this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Nicholas Piggin [Fri, 6 Nov 2020 04:53:40 +0000 (14:53 +1000)]
powerpc/64s: Remove MSR[ISF] bit

No supported processor implements this mode. Setting the bit in
MSR values can be a bit confusing (and would prevent the bit from
ever being reused). Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201106045340.1935841-1-npiggin@gmail.com
Nicholas Piggin [Wed, 11 Nov 2020 11:07:22 +0000 (21:07 +1000)]
powerpc/64s/iommu: Don't use atomic_ function on atomic64_t type

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111110723.3148665-3-npiggin@gmail.com
Christophe Leroy [Tue, 24 Nov 2020 19:51:57 +0000 (19:51 +0000)]
powerpc/32s: Cleanup around PTE_FLAGS_OFFSET in hash_low.S

PTE_FLAGS_OFFSET is defined in asm/page_32.h and used only
in hash_low.S.

Whether PTE_FLAGS_OFFSET is zero depends on CONFIG_PTE_64BIT.

Instead of tests like #if (PTE_FLAGS_OFFSET != 0), use
CONFIG_PTE_64BIT related code.

Also move the definition of PTE_FLAGS_OFFSET directly into
hash_low.S; that improves readability.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f5bc21db7a33dab55924734e6060c2e9daed562e.1606247495.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 19:51:56 +0000 (19:51 +0000)]
powerpc/32s: In add_hash_page(), calculate VSID later

VSID is only for create_hpte(). When _PAGE_HASHPTE is
already set, add_hash_page() bails out without calling
create_hpte() and doesn't need the value of VSID.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3907199974c89b85a3441cf3f528751173b7649c.1606247495.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 19:51:55 +0000 (19:51 +0000)]
powerpc/32s: Remove unused counters incremented by create_hpte()

primary_pteg_full and htab_hash_searches are not used.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6470ab99e58c84a5445af43ce4d1d772b0dc3e93.1606247495.git.christophe.leroy@csgroup.eu
Christophe Leroy [Fri, 6 Nov 2020 13:20:54 +0000 (13:20 +0000)]
powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions

All hugetlb range freeing functions have a verification like the following,
which only differs by the mask used, depending on the page table level.

  start &= MASK;
  if (start < floor)
      return;
  if (ceiling) {
      ceiling &= MASK;
      if (!ceiling)
          return;
  }
  if (end - 1 > ceiling - 1)
      return;

Refactor that into a helper function which takes the mask as
an argument, returning true when [start; end[ is not fully
contained inside [floor; ceiling[.
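
A hedged sketch of such a helper, transcribing the logic above (the
name is illustrative):

  static bool range_is_outside_limits(unsigned long start, unsigned long end,
                                      unsigned long floor, unsigned long ceiling,
                                      unsigned long mask)
  {
      start &= mask;
      if (start < floor)
          return true;
      if (ceiling) {
          ceiling &= mask;
          if (!ceiling)
              return true;
      }
      return end - 1 > ceiling - 1;
  }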

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/16a571bb32eb6e8cd44bda484c8d81cd8a25e6d7.1604668827.git.christophe.leroy@csgroup.eu
Christophe Leroy [Wed, 9 Dec 2020 05:29:25 +0000 (05:29 +0000)]
powerpc/fault: Perform exception fixup in do_page_fault()

Exception fixup doesn't require the heavy full regs saving;
do it from do_page_fault() directly.

For that, split bad_page_fault() in two parts.

As bad_page_fault() can also be called from other places than
handle_page_fault(), it will still perform exception fixup and
fall back to __bad_page_fault().

handle_page_fault() directly calls __bad_page_fault() as the
exception fixup will now be done by do_page_fault().
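
A hedged sketch of the resulting split, with simplified signatures; the
fixup uses the standard extable helpers:

  /* Outer entry point: still performs exception fixup for other callers */
  void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
  {
      const struct exception_table_entry *entry;

      entry = search_exception_tables(regs->nip);
      if (entry) {
          instruction_pointer_set(regs, extable_fixup(entry));
          return;
      }
      __bad_page_fault(regs, address, sig);    /* the unconditional part */
  }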

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bd07d6fef9237614cd6d318d8f19faeeadaa816b.1607491748.git.christophe.leroy@csgroup.eu
Christophe Leroy [Wed, 9 Dec 2020 05:29:24 +0000 (05:29 +0000)]
powerpc/fault: Avoid heavy search_exception_tables() verification

search_exception_tables() is a heavy operation; we have to avoid it.
When KUAP is selected, we'll know the fault has been blocked by KUAP.
When it is blocked by KUAP, check whether we are in an expected
userspace access place. If so, emit a warning to show that something is
going wrong. Otherwise, just remain silent; it will likely Oops soon.

When KUAP is not selected, it behaves just as if the address was
already in the TLBs and no fault was generated.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9870f01e293a5a76c4f4e4ddd4a6b0f63038c591.1607491748.git.christophe.leroy@csgroup.eu
Christophe Leroy [Wed, 9 Dec 2020 05:29:23 +0000 (05:29 +0000)]
powerpc/mm: Move the WARN() out of bad_kuap_fault()

In order to prepare the removal of calls to
search_exception_tables() on the fast path, move the
WARN() out of bad_kuap_fault().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9501311014bd6507e04b27a0c3035186ccf65cd5.1607491748.git.christophe.leroy@csgroup.eu
Christophe Leroy [Wed, 9 Dec 2020 05:29:22 +0000 (05:29 +0000)]
powerpc/fault: Unnest definition of page_fault_is_write() and page_fault_is_bad()

To make it more readable, separate page_fault_is_write() and page_fault_is_bad()
to avoid several levels of #ifdefs.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6afaac2495248d68f94c438c5ec36b6010931de5.1607491748.git.christophe.leroy@csgroup.eu
Christophe Leroy [Wed, 9 Dec 2020 05:29:21 +0000 (05:29 +0000)]
powerpc/mm: sanity_check_fault() should work for all, not only BOOK3S

The verification and message introduced by commit 374f3f5979f9
("powerpc/mm/hash: Handle user access of kernel address gracefully")
apply to all platforms; they should not be limited to BOOK3S.

Make the BOOK3S version of sanity_check_fault() the one for all,
and bail out earlier if not BOOK3S.

Fixes: 374f3f5979f9 ("powerpc/mm/hash: Handle user access of kernel address gracefully")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe199d5af3578d3bf80035d203a94d742a7a28af.1607491748.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 15:24:59 +0000 (15:24 +0000)]
powerpc/ppc-opcode: Add PPC_RAW_MFSPR()

Add PPC_RAW_MFSPR() to replace open coding done in 8xx-pmu.c
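
For illustration, a hedged guess at the shape of such a macro, modeled
on the existing PPC_RAW_* helpers (0x7c0002a6 is the mfspr opcode
template; ___PPC_RT() and __PPC_SPR() are existing field packers in
ppc-opcode.h):

  #define PPC_RAW_MFSPR(d, spr)    (0x7c0002a6 | ___PPC_RT(d) | __PPC_SPR(spr))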

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e281e3a611eead8817c49cf06a60072a021af823.1606231483.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 15:24:58 +0000 (15:24 +0000)]
powerpc/8xx: Use SPRN_SPRG_SCRATCH2 in DTLB miss exception

Use SPRN_SPRG_SCRATCH2 in DTLB miss exception instead of DAR
in order to be similar to ITLB miss exception.

This also simplifies mpc8xx_pmu_del().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e3cc8f023ef40e1e8ae144e4dd1330a5ff022528.1606231483.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 15:24:57 +0000 (15:24 +0000)]
powerpc/8xx: Use SPRN_SPRG_SCRATCH2 in ITLB miss exception

In order to re-enable MMU earlier, ensure ITLB miss exception
cannot clobber SPRN_SPRG_SCRATCH0 and SPRN_SPRG_SCRATCH1.
Do so by using SPRN_SPRG_SCRATCH2 and SPRN_M_TW instead, like
the DTLB miss exception.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/abc78e8e9577d473691ebb9996c6413b37bfd9ca.1606231483.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 15:24:56 +0000 (15:24 +0000)]
powerpc/8xx: Simplify INVALIDATE_ADJACENT_PAGES_CPU15

We now have r11 available as a scratch register so
INVALIDATE_ADJACENT_PAGES_CPU15() can be simplified.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bdafd651b4ac3a851fd09249f5f3699c50da29f2.1606231483.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 15:24:55 +0000 (15:24 +0000)]
powerpc/8xx: Always pin kernel text TLB

There is no big point in not pinning kernel text anymore, as now
we can keep pinned TLBs even with things like DEBUG_PAGEALLOC.

Remove CONFIG_PIN_TLB_TEXT, making text pinning unconditional.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop ifdef around mmu_pin_tlb() to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/203b89de491e1379f1677a2685211b7c32adfff0.1606231483.git.christophe.leroy@csgroup.eu
Christophe Leroy [Tue, 24 Nov 2020 15:24:54 +0000 (15:24 +0000)]
powerpc/8xx: DEBUG_PAGEALLOC doesn't require an ITLB miss exception handler

Since commit e611939fc8ec ("powerpc/mm: Ensure change_page_attr()
doesn't invalidate pinned TLBs"), pinned TLBs are not anymore
invalidated by __kernel_map_pages() when CONFIG_DEBUG_PAGEALLOC is
selected.

Remove the dependency on CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e796c5fcb5898de827c803cf1ab8ba1d7a5d4b76.1606231483.git.christophe.leroy@csgroup.eu
Christophe Leroy [Fri, 4 Dec 2020 10:12:51 +0000 (10:12 +0000)]
powerpc/process: Remove target specific __set_dabr()

The target-specific __set_dabr() variants are simple functions that can
be inlined directly inside set_dabr(), using IS_ENABLED() instead of #ifdef.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c10b263668e137236c71d76648b03cf2cd1ee66f.1607076733.git.christophe.leroy@csgroup.eu
Christophe Leroy [Fri, 4 Dec 2020 10:11:34 +0000 (10:11 +0000)]
powerpc/8xx: Fix early debug when SMC1 is relocated

When SMC1 is relocated and early debug is selected, the
board hangs in ppc_md.setup_arch(). This is because once
the microcode has been loaded and SMC1 relocated, early
debug writes go into the weeds.

To allow smooth continuation, the SMC1 parameter RAM set up
by the bootloader has to be copied into the new location.

Fixes: 43db76f41824 ("powerpc/8xx: Add microcode patch to move SMC parameter RAM.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b2f71f39eca543f1e4ec06596f09a8b12235c701.1607076683.git.christophe.leroy@csgroup.eu
Christophe Leroy [Mon, 16 Nov 2020 16:09:31 +0000 (16:09 +0000)]
powerpc/32s: Handle PROTFAULT in hash_page() also for CONFIG_PPC_KUAP

On 32-bit hash, handling minor protection faults like clearing the
dirty flag is heavy if done from the normal page fault processing,
because it implies a software hash table lookup to flush the entry,
and then a DSI is taken anyway to add the entry back.

When KUAP was implemented, as explained in commit a68c31fc01ef
("powerpc/32s: Implement Kernel Userspace Access Protection"),
protection faults have been diverted from hash_page() because
hash_page() was not able to identify a KUAP fault.

Implement KUAP verification in hash_page(), by clearing write
permission when the access is a kernel access and Ks is 1.
This works regardless of the address because kernel segments always
have Ks set to 0 while user segments have Ks set to 0 only
when kernel write to userspace is granted.

Then protection faults can be handled by hash_page() even for KUAP.
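
A hedged C model of the rule (the real change is in the hash_page()
assembly; Ks is bit 1 of a segment register, i.e. mask 0x40000000):

  #include <stdbool.h>
  #include <stdint.h>

  #define SR_KS      UINT32_C(0x40000000)    /* supervisor protection key */
  #define PAGE_RW    UINT32_C(0x001)         /* write permission, illustrative */

  /* A kernel access through a segment with Ks=1 must not be granted
   * write permission, so the protection fault surfaces as a KUAP fault. */
  static uint32_t effective_access(uint32_t access, uint32_t sr, bool kernel)
  {
      if (kernel && (sr & SR_KS))
          access &= ~PAGE_RW;
      return access;
  }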

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8a4ffe4798e9ea32aaaccdf85e411bb1beed3500.1605542955.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:44 +0000 (06:29 +0000)]
powerpc/32s: Make support for 603 and 604+ selectable

book3s/32 has two main families:
- CPUs with 603 cores that don't have a HASH PTE table and
perform SW TLB loading.
- Other CPUs based on 604+ cores that have a HASH PTE table.

This leads to some complex logic and additional code to
support both. This makes sense for distribution kernels
that aim at running on any CPU, but when you are fine-tuning
a kernel for an embedded 603-based board you
don't need all the HASH logic.

Allow selection of support for each family, in order to opt
out of unneeded parts of code. At least one must be selected.

Note that some of the CPUs supporting HASH also support SW TLB
loading; however, it is not supported by the Linux kernel at
this time, because they do not have alternate registers in
the TLB miss exception handlers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8dde0cdb629a71abc29b0d85a52a86e920376cb6.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:43 +0000 (06:29 +0000)]
powerpc/32s: Regroup 603 based CPUs in cputable

In order to selectively build the kernel for 603 SW TLB handling,
regroup all 603 based CPUs together.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/45065263fdb9f5cc2a2d210ec2a762ac8bf5b2bc.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:42 +0000 (06:29 +0000)]
powerpc/32s: Remove CONFIG_PPC_BOOK3S_6xx

As 601 is gone, CONFIG_PPC_BOOK3S_6xx and CONFIG_PPC_BOOK3S_32
are redundant.

Remove CONFIG_PPC_BOOK3S_6xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f18c16af37f6f77b577bed8d9e12831b695617ae.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:41 +0000 (06:29 +0000)]
powerpc/32s: Move early_mmu_init() into mmu.c

early_mmu_init() is independent of the MMU type and not
directly linked to TLB handling.

In a following patch, tlb.c will be restricted to the HASH MMU.

Move early_mmu_init() into mmu.c, which is common.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e51b5e2fe6bca623b33116403043d3a1b5eaf826.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:40 +0000 (06:29 +0000)]
powerpc/32s: Inline flush_hash_entry()

flush_hash_entry() is a simple function calling
flush_hash_pages() if it's a hash MMU or doing nothing otherwise.

Inline it.

Also use it in __ptep_test_and_clear_young().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9af895be7d4b404d40e749a2659552fd138e62c4.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:39 +0000 (06:29 +0000)]
powerpc/32s: Inline tlb_flush()

On book3s/32, tlb_flush() does nothing when the CPU has a hash table,
it calls _tlbia() otherwise.

Inline it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ebc933d1c530a19ef3cf7983f6ae94814f6e92ac.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:38 +0000 (06:29 +0000)]
powerpc/32s: Split and inline flush_range()

flush_range() handles both the MMU_FTR_HPTE_TABLE case and
the other case.

The non MMU_FTR_HPTE_TABLE case is trivial as it is only a call
to _tlbie()/_tlbia() which is not worth a dedicated function.

Make flush_range() hash-specific and call it from tlbflush.h based
on mmu_has_feature(MMU_FTR_HPTE_TABLE).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/132ab19aae52abc8e06ab524ec86d4229b5b9c3d.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:37 +0000 (06:29 +0000)]
powerpc/32s: Inline flush_tlb_range() and flush_tlb_kernel_range()

flush_tlb_range() and flush_tlb_kernel_range() are trivial calls to
flush_range().

Make flush_range() global and inline flush_tlb_range()
and flush_tlb_kernel_range().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c7029a78e78709ad9272d7a44260e06b649169b2.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:36 +0000 (06:29 +0000)]
powerpc/32s: Split and inline flush_tlb_mm() and flush_tlb_page()

flush_tlb_mm() and flush_tlb_page() handle both the MMU_FTR_HPTE_TABLE
case and the other case.

The non MMU_FTR_HPTE_TABLE case is trivial as it is only a call
to _tlbie()/_tlbia() which is not worth a dedicated function.

Make flush_tlb_mm() and flush_tlb_page() hash-specific and call
them from tlbflush.h based on mmu_has_feature(MMU_FTR_HPTE_TABLE).
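
A hedged sketch of the resulting dispatch in tlbflush.h (the
hash-specific helper name is assumed):

  static inline void flush_tlb_mm(struct mm_struct *mm)
  {
      if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
          hash_flush_tlb_mm(mm);    /* hash-specific implementation */
      else
          _tlbia();                 /* trivial non-hash case */
  }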

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/11e932ded41ba6d9b251d89b7afa33cc060d3aa4.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:35 +0000 (06:29 +0000)]
powerpc/32s: Move _tlbie() and _tlbia() in a new file

_tlbie() and _tlbia() are used only on 603 cores while the
other functions are used only on cores having a hash table.

Move them into a new file named nohash_low.S.

As the mmu_hash_lock var is used by both, it needs to go
in a common file.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a265b1b17a64153463d361280cb4b43eb1266a4.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:34 +0000 (06:29 +0000)]
powerpc/32s: Inline _tlbie() on non SMP

On non SMP, _tlbie() is just a tlbie plus a sync instruction.

Make it static inline.
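
Given the description above, the inline version can plausibly be as
small as (a sketch, assuming no mmu_hash_lock is needed on non-SMP):

  static inline void _tlbie(unsigned long addr)
  {
      asm volatile ("tlbie %0; sync" : : "r" (addr) : "memory");
  }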

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/475136425541db5c7c8a0395d19d400525b251bc.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:33 +0000 (06:29 +0000)]
powerpc/32s: Move _tlbie() and _tlbia() prototypes to tlbflush.h

In order to use _tlbie() and _tlbia() directly
from asm/book3s/32/tlbflush.h, move their prototypes
from mm/mmu_decl.h to there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/867587af929973ad65f8ef6972f2474a80c1737a.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:32 +0000 (06:29 +0000)]
powerpc/32s: Declare Hash related vars as __initdata

Hash-related vars are used at init only.

Declare them as __initdata.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3878ea30706839fcff9196790ff3f99c128c3f6a.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:31 +0000 (06:29 +0000)]
powerpc/32s: Make Hash var static

The Hash var is used only locally in mmu.c now.

No need to set it in head_32.S anymore.

Make it static and initialise it to the early hash table.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/786c82a89cdfdaabb32b72a44f7c312fa81d192b.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:30 +0000 (06:29 +0000)]
powerpc/32s: Use mmu_has_feature(MMU_FTR_HPTE_TABLE) instead of checking Hash var

We now have an early hash table on hash MMU, so no need to check
the Hash var to know if the Hash table is set or not.

Use mmu_has_feature(MMU_FTR_HPTE_TABLE) instead. This will allow
optimisation via jump_label.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f1766631a9e014b6433f1a3c12c726ddfce34220.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:29 +0000 (06:29 +0000)]
powerpc/32s: Make bat_addrs[] static

This table is used only locally. Declare it static.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/054fec0c139fc4c0a306360b360784733c0a6e65.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:28 +0000 (06:29 +0000)]
powerpc/mm: Remove flush_tlb_page_nohash() prototype.

flush_tlb_page_nohash() was removed by
commit 703b41ad1a87 ("powerpc/mm: remove flush_tlb_page_nohash").

Remove the stale prototype and comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4a58831da6d6ba4fe309b94aa1dd8f02982d46b2.1603348103.git.christophe.leroy@csgroup.eu
Christophe Leroy [Thu, 22 Oct 2020 06:29:27 +0000 (06:29 +0000)]
powerpc/mm: Add mask of always present MMU features

On the same principle as commit 773edeadf672 ("powerpc/mm: Add mask
of possible MMU features"), add mask for MMU features that are
always there in order to optimise out dead branches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4943775fbe91885eb3e09133b093aaf62e55c715.1603348103.git.christophe.leroy@csgroup.eu
Tyrel Datwyler [Tue, 8 Dec 2020 19:54:34 +0000 (13:54 -0600)]
powerpc/rtas: Fix typo of ibm,open-errinjct in RTAS filter

Commit bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
introduced the following error when invoking the errinjct userspace
tool:

  [root@ltcalpine2-lp5 librtas]# errinjct open
  [327884.071171] sys_rtas: RTAS call blocked - exploit attempt?
  [327884.071186] sys_rtas: token=0x26, nargs=0 (called by errinjct)
  errinjct: Could not open RTAS error injection facility
  errinjct: librtas: open: Unexpected I/O error

The entry for ibm,open-errinjct in rtas_filter array has a typo where
the "j" is omitted in the rtas call name. After fixing this typo the
errinjct tool functions again as expected.

  [root@ltcalpine2-lp5 linux]# errinjct open
  RTAS error injection facility open, token = 1

Fixes: bd59380c5ba4 ("powerpc/rtas: Restrict RTAS requests from userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208195434.8289-1-tyreld@linux.ibm.com
Christophe Leroy [Tue, 8 Dec 2020 05:24:19 +0000 (05:24 +0000)]
powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK

low_sleep_handler() can't restore the context from the standard
stack because the stack can hardly be accessed with the MMU off.

Store everything in a global storage area instead of storing
a pointer to the stack in that global storage area.

To avoid a complete churn of the function, still use r1 as
the pointer to the storage area during restore.

Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Reported-by: Giuseppe Sacco <giuseppe@sguazz.it>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Giuseppe Sacco <giuseppe@sguazz.it>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e3e0d8042a3ba75cb4a9546c19c408b5b5b28994.1607404931.git.christophe.leroy@csgroup.eu
Colin Ian King [Mon, 7 Dec 2020 15:54:20 +0000 (15:54 +0000)]
powerpc: fix spelling mistake in Kconfig "seleted" -> "selected"

There is a spelling mistake in the help text of the Kconfig. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207155420.172370-1-colin.king@canonical.com
Nathan Lynch [Mon, 7 Dec 2020 21:52:00 +0000 (15:52 -0600)]
powerpc/pseries/mobility: refactor node lookup during DT update

In pseries_devicetree_update(), with each call to ibm,update-nodes the
partition firmware communicates the node to be deleted or updated by
placing its phandle in the work buffer. Each of delete_dt_node(),
update_dt_node(), and add_dt_node() has duplicate lookups using the
phandle value and corresponding refcount management.

Move the lookup and of_node_put() into pseries_devicetree_update(),
and emit a warning on any failed lookups.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-29-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:59 +0000 (15:51 -0600)]
powerpc/rtas: remove unused rtas_suspend_me_data

All code which used this type has been removed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-28-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:58 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: remove prepare_late() callback

The pseries hibernate code no longer calls into the original
join/suspend code in kernel/rtas.c, so pseries_prepare_late() and
related code don't accomplish anything now.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-27-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:57 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: perform post-suspend fixups later

The pseries hibernate code calls post_mobility_fixup() which is sort
of a dumping ground of fixups that need to run after resuming from
suspend regardless of whether suspend was a hibernation or a
migration. Calling post_mobility_fixup() from
pseries_suspend_enable_irqs() runs this code early in resume with
devices suspended and only one CPU up, while the much more commonly
used migration case runs these fixups in a more typical process
context.

Call post_mobility_fixup() after the suspend core returns a success
status to the hibernate sysfs store method and remove
pseries_suspend_enable_irqs().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-26-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:56 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: remove redundant cacheinfo update

Partitions with cache nodes in the device tree can encounter the
following warning on resume:

CPU 0 already accounted in PowerPC,POWER9@0(Data)
WARNING: CPU: 0 PID: 3177 at arch/powerpc/kernel/cacheinfo.c:197 cacheinfo_cpu_online+0x640/0x820

These calls to cacheinfo_cpu_offline/online have been redundant since
commit e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo
hierarchy post-migration").

Fixes: e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-25-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:55 +0000 (15:51 -0600)]
powerpc/rtas: remove unused rtas_suspend_last_cpu()

rtas_suspend_last_cpu() is now unused, remove it and
__rtas_suspend_last_cpu() which also becomes unused.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-24-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:54 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()

rtas_suspend_last_cpu() and related code perform a lot of work that
isn't relevant to the hibernation workflow. All other CPUs are offline
when called so there is no need to place them in H_JOIN or prod them
on resume, nor is there need for retries or operations on shared
state.

Call the rtas_ibm_suspend_me() wrapper function directly from
pseries_suspend_enter() instead of using rtas_suspend_last_cpu().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-23-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:53 +0000 (15:51 -0600)]
powerpc/rtas: remove rtas_suspend_cpu()

rtas_suspend_cpu() no longer has users; remove it and
__rtas_suspend_cpu() which now becomes unused as well.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-22-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:52 +0000 (15:51 -0600)]
powerpc/machdep: remove suspend_disable_cpu()

There are no users left of the suspend_disable_cpu() callback, remove
it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-21-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:51 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: remove pseries_suspend_cpu()

Since commit 48f6e7f6d948 ("powerpc/pseries: remove cede offline state
for CPUs"), ppc_md.suspend_disable_cpu() is no longer used and all
CPUs (save one) are placed into true offline state as opposed to
H_JOIN. So pseries_suspend_cpu() is effectively unused; remove it.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-20-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:50 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: pass stream id via function arguments

There is no need for the stream id to be a file-global variable; pass
it from hibernate_store() to pseries_suspend_begin() for the
H_VASI_STATE call.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-19-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:49 +0000 (15:51 -0600)]
powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend ops

There are three ways pseries_suspend_begin() can be reached:

1. When "mem" is written to /sys/power/state:

kobj_attr_store()
-> state_store()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This never works because there is no way to supply a valid stream id
using this interface, and H_VASI_STATE is called with a stream id of
zero. So this call path is useless at best.

2. When a stream id is written to /sys/devices/system/power/hibernate.
pseries_suspend_begin() is polled directly from store_hibernate()
until the stream is in the "Suspending" state (i.e. the platform is
ready for the OS to suspend execution):

dev_attr_store()
-> store_hibernate()
  -> pseries_suspend_begin()

3. When a stream id is written to /sys/devices/system/power/hibernate
(continued). After #2, pseries_suspend_begin() is called once again
from the pm core:

dev_attr_store()
-> store_hibernate()
  -> pm_suspend()
    -> suspend_devices_and_enter()
      -> pseries_suspend_begin()

This is redundant because the VASI suspend state is already known to
be Suspending.

The begin() callback of platform_suspend_ops is optional, so we can
simply remove that assignment with no loss of function.

Fixes: 32d8ad4e621d ("powerpc/pseries: Partition hibernation support")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-18-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:48 +0000 (15:51 -0600)]
powerpc/rtas: remove rtas_ibm_suspend_me_unsafe()

rtas_ibm_suspend_me_unsafe() is now unused; remove it and
rtas_percpu_suspend_me() which becomes unused as a result.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-17-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:47 +0000 (15:51 -0600)]
powerpc/rtas: dispatch partition migration requests to pseries

sys_rtas() cannot call ibm,suspend-me directly in the same way it
handles other inputs. Instead it must dispatch the request to code
that can first perform the H_JOIN sequence before any call to
ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair
amount of platform-specific code to implement this.

Since a different, more robust implementation of the suspend sequence
is now in the pseries platform code, we want to dispatch the request
there.

Note that invoking ibm,suspend-me via the RTAS syscall is all but
deprecated; this change preserves ABI compatibility for old programs
while providing to them the benefit of the new partition suspend
implementation. This is a behavior change in that the kernel performs
the device tree update and firmware activation before returning, but
experimentation indicates this is tolerated fine by legacy user space.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-16-nathanl@linux.ibm.com
Nathan Lynch [Mon, 7 Dec 2020 21:51:46 +0000 (15:51 -0600)]
powerpc/pseries/mobility: retry partition suspend after error

This is a mitigation for the relatively rare occurrence where a
virtual IOA can be in a transient state that prevents the
suspend/migration from succeeding, resulting in an error from
ibm,suspend-me.

If the join/suspend sequence returns an error, it is acceptable to
retry as long as the VASI suspend session state is still
"Suspending" (i.e. the platform is still waiting for the OS to
suspend).

Retry a few times on suspend failure while this condition holds,
progressively increasing the delay between attempts. We don't want to
retry indefinitely because firmware emits an error log event on each
unsuccessful attempt.
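
A sketch of the retry policy, assuming illustrative helpers for the
join/suspend attempt and the VASI state check:

#include <linux/delay.h>
#include <linux/printk.h>

static int pseries_suspend_with_retries(u64 handle)
{
        unsigned int delay_ms = 1000;
        int attempt, ret;

        for (attempt = 1; ; attempt++) {
                ret = do_join_and_suspend(handle);      /* illustrative */
                if (!ret || attempt == 5)
                        break;
                if (!vasi_still_suspending(handle))     /* illustrative */
                        break;
                pr_notice("suspend attempt %d failed, retrying in %u ms\n",
                          attempt, delay_ms);
                msleep(delay_ms);
                delay_ms *= 2;  /* progressively increase the delay */
        }
        return ret;
}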

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-15-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: signal suspend cancellation to platform
Nathan Lynch [Mon, 7 Dec 2020 21:51:45 +0000 (15:51 -0600)]
powerpc/pseries/mobility: signal suspend cancellation to platform

If we're returning an error to user space, use H_VASI_SIGNAL to send a
cancellation request to the platform. This isn't strictly required but
it communicates that Linux will not attempt to complete the suspend,
which allows the various entities involved to promptly end the
operation in progress.
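
A sketch of the cancellation path; the H_VASI_SIGNAL_CANCEL code and
the argument order shown are assumptions for illustration:

#include <asm/hvcall.h>

static void pseries_cancel_migration(u64 handle, u32 reason_code)
{
        long hvrc;

        hvrc = plpar_hcall_norets(H_VASI_SIGNAL, handle,
                                  H_VASI_SIGNAL_CANCEL, reason_code);
        if (hvrc != H_SUCCESS)
                pr_err("mobility: H_VASI_SIGNAL error: %ld\n", hvrc);
}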

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-14-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: use stop_machine for join/suspend
Nathan Lynch [Mon, 7 Dec 2020 21:51:44 +0000 (15:51 -0600)]
powerpc/pseries/mobility: use stop_machine for join/suspend

The partition suspend sequence as specified in the platform
architecture requires that all active processor threads call
H_JOIN, which:

- suspends the calling thread until it is the target of
  an H_PROD; or
- immediately returns H_CONTINUE, if the calling thread is the last to
  call H_JOIN. This thread is expected to call ibm,suspend-me to
  completely suspend the partition.

Upon returning from ibm,suspend-me the calling thread must wake all
others using H_PROD.

rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this
protocol, but because of its synchronizing nature this is susceptible
to deadlock versus users of stop_machine() or other callers of
on_each_cpu().

Not only is stop_machine() intended for use cases like this, it
handles error propagation and allows us to keep the data shared
between CPUs minimal: a single atomic counter which ensures exactly
one CPU will wake the others from their joined states.

Switch the migration code to use stop_machine() and a less complex
local implementation of the H_JOIN/ibm,suspend-me logic (see the
sketch below), which carries additional benefits:

- more informative error reporting, appropriately ratelimited
- resets the lockup detector / watchdog on resume to prevent lockup
  warnings when the OS has been suspended for a time exceeding the
  threshold.
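
A reduced sketch of the per-CPU callback: prod_others() stands in for
the H_PROD wakeup loop, rtas_ibm_suspend_me() is the wrapper added
earlier in the series, and error handling is simplified.

#include <linux/stop_machine.h>
#include <linux/nmi.h>
#include <asm/hvcall.h>

static int do_join(void *arg)
{
        atomic_t *counter = arg;
        long hvrc;
        int ret = 0;

        hard_irq_disable();
        hvrc = plpar_hcall_norets(H_JOIN);
        switch (hvrc) {
        case H_CONTINUE:
                /* Last thread to join: suspend the whole partition. */
                ret = rtas_ibm_suspend_me(NULL);
                break;
        case H_SUCCESS:
                /* Woken by H_PROD after resume. */
                break;
        default:
                ret = -EIO;
                break;
        }
        /* Exactly one CPU wakes the others from their joined state. */
        if (atomic_inc_return(counter) == 1)
                prod_others();
        /* Avoid lockup warnings after a long suspension. */
        touch_nmi_watchdog();
        return ret;
}

/* Caller: stop_machine(do_join, &counter, cpu_online_mask); */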

Fixes: 91dc182ca6e2 ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-13-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: extract VASI session polling logic
Nathan Lynch [Mon, 7 Dec 2020 21:51:43 +0000 (15:51 -0600)]
powerpc/pseries/mobility: extract VASI session polling logic

The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to
the caller until the specified VASI suspend session state makes the
transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest
of separating concerns to prepare for a new implementation of the
join/suspend sequence, extract VASI session polling logic into a
couple of local functions. Waiting for the session state to reach
H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures
that we will never get an EAGAIN result necessitating a retry. No
user-visible change in behavior is intended.
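
For illustration, a sketch of the extracted helpers (error mapping
simplified):

#include <linux/delay.h>
#include <asm/hvcall.h>

static int poll_vasi_state(u64 handle, unsigned long *state)
{
        unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
        long hvrc;

        hvrc = plpar_hcall(H_VASI_STATE, retbuf, handle);
        if (hvrc != H_SUCCESS)
                return -EIO;
        *state = retbuf[0];
        return 0;
}

static int wait_for_vasi_session_suspending(u64 handle)
{
        unsigned long state;
        int ret;

        while (!(ret = poll_vasi_state(handle, &state))) {
                if (state == H_VASI_SUSPENDING)
                        return 0;       /* platform is ready */
                if (state != H_VASI_ENABLED)
                        return -EIO;    /* session ended or failed */
                msleep(1000);
        }
        return ret;
}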

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-12-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: use rtas_activate_firmware() on resume
Nathan Lynch [Mon, 7 Dec 2020 21:51:42 +0000 (15:51 -0600)]
powerpc/pseries/mobility: use rtas_activate_firmware() on resume

It's incorrect to abort post-suspend processing if
ibm,activate-firmware isn't available. Use rtas_activate_firmware(),
which logs this condition appropriately and allows us to proceed.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-11-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: error message improvements
Nathan Lynch [Mon, 7 Dec 2020 21:51:41 +0000 (15:51 -0600)]
powerpc/pseries/mobility: error message improvements

- Convert printk(KERN_ERR) to pr_err().
- Include errno in property update failure message.
- Remove reference to "Post-mobility" from device tree update message:
  with pr_err() it will have a "mobility:" prefix.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-10-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: add missing break to default case
Nathan Lynch [Mon, 7 Dec 2020 21:51:40 +0000 (15:51 -0600)]
powerpc/pseries/mobility: add missing break to default case

update_dt_node() has a switch statement where the default case lacks a
break statement.
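
For illustration, the shape of the fix (case labels and error handling
here are illustrative, not the actual update_dt_node() arms):

switch (action) {
case UPDATE_DT_PROP:
        rc = update_dt_property(dn, &prop, prop_name, vd, prop_data);
        break;
default:
        rc = -EINVAL;
        break;          /* previously missing */
}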

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-9-nathanl@linux.ibm.com
3 years agopowerpc/pseries/mobility: don't error on absence of ibm,update-nodes
Nathan Lynch [Mon, 7 Dec 2020 21:51:39 +0000 (15:51 -0600)]
powerpc/pseries/mobility: don't error on absence of ibm,update-nodes

Treat the absence of the ibm,update-nodes function as benign instead
of reporting an error. If the platform does not provide that facility,
it's not a problem for Linux.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-8-nathanl@linux.ibm.com
3 years agopowerpc/hvcall: add token and codes for H_VASI_SIGNAL
Nathan Lynch [Mon, 7 Dec 2020 21:51:38 +0000 (15:51 -0600)]
powerpc/hvcall: add token and codes for H_VASI_SIGNAL

H_VASI_SIGNAL can be used by a partition to request cancellation of
its migration. To be used in future changes.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-7-nathanl@linux.ibm.com
3 years agopowerpc/rtas: add rtas_activate_firmware()
Nathan Lynch [Mon, 7 Dec 2020 21:51:37 +0000 (15:51 -0600)]
powerpc/rtas: add rtas_activate_firmware()

Provide a documented wrapper function for the ibm,activate-firmware
service, which must be called after a partition migration or
hibernation.

If the function is absent or the call fails, the OS will continue to
run normally with the current firmware, so there is no need to perform
any recovery. Just log it and continue.
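
A sketch of the wrapper, for illustration:

#include <asm/rtas.h>

void rtas_activate_firmware(void)
{
        int token = rtas_token("ibm,activate-firmware");
        int fwrc;

        if (token == RTAS_UNKNOWN_SERVICE) {
                pr_notice("ibm,activate-firmware method unavailable\n");
                return;
        }

        do {
                fwrc = rtas_call(token, 0, 1, NULL);
        } while (rtas_busy_delay(fwrc));        /* sleeps on busy statuses */

        if (fwrc)
                pr_err("ibm,activate-firmware failed (%i)\n", fwrc);
}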

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-6-nathanl@linux.ibm.com
3 years agopowerpc/rtas: add rtas_ibm_suspend_me()
Nathan Lynch [Mon, 7 Dec 2020 21:51:36 +0000 (15:51 -0600)]
powerpc/rtas: add rtas_ibm_suspend_me()

Now that the name is available, provide a simple wrapper for
ibm,suspend-me which returns both a Linux errno and optionally the
actual RTAS status to the caller.
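
A sketch of the wrapper; the real version maps each PAPR-defined
status to a distinct errno, so only the general shape is shown:

#include <asm/rtas.h>

int rtas_ibm_suspend_me(int *fw_status)
{
        int fwrc;
        int ret;

        fwrc = rtas_call(rtas_token("ibm,suspend-me"), 0, 1, NULL);
        switch (fwrc) {
        case 0:
                ret = 0;
                break;
        default:        /* see the status definitions added earlier */
                ret = -EIO;
                break;
        }
        if (fw_status)
                *fw_status = fwrc;
        return ret;
}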

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-5-nathanl@linux.ibm.com
3 years agopowerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe
Nathan Lynch [Mon, 7 Dec 2020 21:51:35 +0000 (15:51 -0600)]
powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafe

The pseries partition suspend sequence requires that all active CPUs
call H_JOIN, which suspends all but one of them with interrupts
disabled. The "chosen" CPU is then to call ibm,suspend-me to complete
the suspend. Upon returning from ibm,suspend-me, the chosen CPU is to
use H_PROD to wake the joined CPUs.

Using on_each_cpu() for this, as rtas_ibm_suspend_me() does to
implement partition migration, is susceptible to deadlock with other
users of on_each_cpu() and with users of stop_machine APIs. The
callback passed to on_each_cpu() is not allowed to synchronize with
other CPUs in the way it is used here.

Complicating the fix is the fact that rtas_ibm_suspend_me() also
occupies the function name that should be used to provide a more
conventional wrapper for ibm,suspend-me. Rename rtas_ibm_suspend_me()
to rtas_ibm_suspend_me_unsafe() to free up the name and indicate that
it should not gain users.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-4-nathanl@linux.ibm.com
3 years agopowerpc/rtas: complete ibm,suspend-me status codes
Nathan Lynch [Mon, 7 Dec 2020 21:51:34 +0000 (15:51 -0600)]
powerpc/rtas: complete ibm,suspend-me status codes

We don't account for all the possible return codes of ibm,suspend-me.
Add definitions for the missing ones.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-3-nathanl@linux.ibm.com
3 years agopowerpc/rtas: prevent suspend-related sys_rtas use on LE
Nathan Lynch [Mon, 7 Dec 2020 21:51:33 +0000 (15:51 -0600)]
powerpc/rtas: prevent suspend-related sys_rtas use on LE

While drmgr has had work in some areas to make its RTAS syscall
interactions endian-neutral, its code for performing partition
migration via the syscall has never worked on LE. While it is able to
complete ibm,suspend-me successfully, it crashes when attempting the
subsequent ibm,update-nodes call.

drmgr is the only known (or plausible) user of ibm,suspend-me,
ibm,update-nodes, and ibm,update-properties, so allow them only in
big-endian configurations.
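
One way to express the restriction, as a sketch (the filter hook name
is illustrative):

#include <linux/kernel.h>
#include <linux/string.h>

static bool rtas_syscall_filter_allowed(const char *name)
{
        static const char * const be_only[] = {
                "ibm,suspend-me", "ibm,update-nodes", "ibm,update-properties",
        };
        int i;

        if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
                return true;
        for (i = 0; i < ARRAY_SIZE(be_only); i++)
                if (!strcmp(name, be_only[i]))
                        return false;   /* broken on LE, see above */
        return true;
}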

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207215200.1785968-2-nathanl@linux.ibm.com
3 years agopowerpc/book3s64/kuap: Improve error reporting with KUAP
Aneesh Kumar K.V [Tue, 8 Dec 2020 03:15:39 +0000 (08:45 +0530)]
powerpc/book3s64/kuap: Improve error reporting with KUAP

This partially reverts commit eb232b162446 ("powerpc/book3s64/kuap: Improve
error reporting with KUAP") and updates the fault handler to print

[   55.022514] Kernel attempted to access user page (7e6725b70000) - exploit attempt? (uid: 0)
[   55.022528] BUG: Unable to handle kernel data access on read at 0x7e6725b70000
[   55.022533] Faulting instruction address: 0xc000000000e8b9bc
[   55.022540] Oops: Kernel access of bad area, sig: 11 [#1]
....

when the kernel accesses a userspace address without unlocking AMR.

bad_kuap_fault() was added as part of commit 5e5be3aed230 ("powerpc/mm: Detect
bad KUAP faults") to catch accesses to userspace that are incorrectly blocked
by AMR. Hence retain the full stack dump there even with hash translation.
Also, add a comment explaining the difference between hash and radix.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208031539.84878-1-aneesh.kumar@linux.ibm.com
3 years agopowerpc/powernv/idle: Restore CIABR after idle for Power9
Jordan Niethe [Mon, 7 Dec 2020 01:05:19 +0000 (12:05 +1100)]
powerpc/powernv/idle: Restore CIABR after idle for Power9

On Power9, CIABR is lost after idle. This means that instruction
breakpoints set by xmon which use CIABR do not work. Fix this by
restoring CIABR after idle.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207010519.15597-2-jniethe5@gmail.com
3 years agopowerpc/book3s64/kexec: Clear CIABR on kexec
Jordan Niethe [Mon, 7 Dec 2020 01:05:18 +0000 (12:05 +1100)]
powerpc/book3s64/kexec: Clear CIABR on kexec

The value in CIABR persists across kexec which can lead to unintended
results when the new kernel hits the old kernel's breakpoint. For
example:

0:mon> bi $loadavg_proc_show
0:mon> b
   type            address
1 inst   c000000000519060  loadavg_proc_show+0x0/0x130
0:mon> x

$ kexec -l /mnt/vmlinux --initrd=/mnt/rootfs.cpio.gz --append='xmon=off'
$ kexec -e

$ cat /proc/loadavg
Trace/breakpoint trap

Make sure CIABR is cleared so this does not happen.
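
The fix reduces to clearing the register during kexec teardown; a
sketch (placement in the teardown path is assumed):

#include <asm/reg.h>
#include <asm/cpu_has_feature.h>

static void kexec_clear_ciabr(void)
{
        /* CIABR exists from ISA 2.07 (POWER8) onward */
        if (cpu_has_feature(CPU_FTR_ARCH_207S))
                mtspr(SPRN_CIABR, 0);   /* drop any stale breakpoint */
}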

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201207010519.15597-1-jniethe5@gmail.com
3 years agopowerpc: Remove ucache_bsize
Christophe Leroy [Tue, 17 Nov 2020 05:07:59 +0000 (05:07 +0000)]
powerpc: Remove ucache_bsize

ppc601 and e200 were the only users of ucache_bsize; both are now gone.

Remove ucache_bsize.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/288b6048597c0fdc495b203fda57a223d89499d2.1605589460.git.christophe.leroy@csgroup.eu
3 years agopowerpc: Retire e200 core (mpc555x processor)
Christophe Leroy [Tue, 17 Nov 2020 05:07:58 +0000 (05:07 +0000)]
powerpc: Retire e200 core (mpc555x processor)

There is no defconfig selecting CONFIG_E200, and no platform.

e200 is an earlier version of booke, a predecessor of e500,
with some particularities such as a unified cache instead of
separate instruction and data caches.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/34ebc3ba2c768d97f363bd5f2deea2356e9ae127.1605589460.git.christophe.leroy@csgroup.eu
3 years agopowerpc: Fix update form addressing in inline assembly
Christophe Leroy [Thu, 22 Oct 2020 09:29:21 +0000 (09:29 +0000)]
powerpc: Fix update form addressing in inline assembly

In several places, inline assembly uses the "%Un" modifier
to enable the use of instructions with update-form addressing,
but the associated "<>" constraint is missing.

As mentioned in the previous patch, this fails with gcc 4.9, so
"<>" can't be used directly.

Use the UPD_CONSTR macro everywhere the %Un modifier is used.
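
For illustration, the resulting pattern (a simplified sketch of the
asm/io.h style accessors):

/* UPD_CONSTR is "<>" on compilers that accept it, "" on gcc 4.9 */
static inline void out_be32(volatile u32 __iomem *addr, u32 val)
{
        __asm__ __volatile__("stw%U0%X0 %1,%0"
                             : "=m" UPD_CONSTR (*addr)
                             : "r" (val));
}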

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62eab5ca595485c192de1765bdac099f633a21d0.1603358942.git.christophe.leroy@csgroup.eu
3 years agopowerpc: Fix incorrect stw{,ux,u,x} instructions in __set_pte_at
Mathieu Desnoyers [Thu, 22 Oct 2020 09:29:20 +0000 (09:29 +0000)]
powerpc: Fix incorrect stw{,ux,u,x} instructions in __set_pte_at

The placeholder for instruction selection should use the second
argument's operand, which is %1, not %0. This could generate incorrect
assembly code if the memory addressing of operand %0 is in a different
form from that of operand %1.

Also remove the %Un placeholder, because having %Un placeholders
for two operands that are based on the same local variable (ptep)
doesn't make much sense. Incidentally, this doesn't change the current
behaviour, because the "<>" constraint is missing for the associated
"=m".

[chleroy: revised commit log iaw segher's comments and removed %U0]
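
The corrected sequence then looks like this (sketch; each %Xn modifier
now pairs with its own memory operand and the %Un modifiers are gone):

static inline void __set_pte_at_sketch(pte_t *ptep, pte_t pte)
{
        __asm__ __volatile__("\
                stw%X0 %2,%0\n\
                eieio\n\
                stw%X1 %L2,%1"
                : "=m" (*ptep), "=m" (*((unsigned char *)ptep + 4))
                : "r" (pte) : "memory");
}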

Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support")
Cc: <stable@vger.kernel.org> # v2.6.28+
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96354bd77977a6a933fe9020da57629007fdb920.1603358942.git.christophe.leroy@csgroup.eu
3 years agopowerpc/xmon: Change printk() to pr_cont()
Christophe Leroy [Fri, 4 Dec 2020 10:35:38 +0000 (10:35 +0000)]
powerpc/xmon: Change printk() to pr_cont()

For some time now, printk() has appended a line break to each call,
leading to unusable xmon output if there is no udbg backend available:

  [   54.288722] sysrq: Entering xmon
  [   54.292209] Vector: 0  at [cace3d2c]
  [   54.292274]     pc:
  [   54.292331] c0023650
  [   54.292468] : xmon+0x28/0x58
  [   54.292519]
  [   54.292574]     lr:
  [   54.292630] c0023724
  [   54.292749] : sysrq_handle_xmon+0xa4/0xfc
  [   54.292801]
  [   54.292867]     sp: cace3de8
  [   54.292931]    msr: 9032
  [   54.292999]   current = 0xc28d0000
  [   54.293072]     pid   = 377, comm = sh
  [   54.293157] Linux version 5.10.0-rc6-s3k-dev-01364-gedf13f0ccd76-dirty (root@po17688vm.idsi0.si.c-s.fr) (powerpc64-linux-gcc (GCC) 10.1.0, GNU ld (GNU Binutils) 2.34) #4211 PREEMPT Fri Dec 4 09:32:11 UTC 2020
  [   54.293287] enter ? for help
  [   54.293470] [cace3de8]
  [   54.293532] c0023724
  [   54.293654]  sysrq_handle_xmon+0xa4/0xfc
  [   54.293711]  (unreliable)
  ...
  [   54.296002]
  [   54.296159] --- Exception: c01 (System Call) at
  [   54.296217] 0fd4e784
  [   54.296303]
  [   54.296375] SP (7fca6ff0) is in userspace
  [   54.296431] mon>
  [   54.296484]  <no input ...>

Use pr_cont() instead.
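
For illustration, the pattern of the change (a reduced sketch, not the
actual xmon code):

#include <linux/printk.h>

static void print_pc_line(unsigned long pc, const char *sym, unsigned long off)
{
        printk("    pc: ");                     /* starts the logical line */
        pr_cont("%lx", pc);                     /* continuations stay on it */
        pr_cont(": %s+0x%lx\n", sym, off);
}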

Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Mention that it only happens when udbg is not available]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8a6ec704416ecd5ff2bd26213c9bc026bdd19de.1607077340.git.christophe.leroy@csgroup.eu
3 years agopowerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
Alexey Kardashevskiy [Sun, 22 Nov 2020 07:38:28 +0000 (18:38 +1100)]
powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU

We execute certain NPU2 setup code (such as mapping an LPID to a device
in NPU2) unconditionally if an Nvlink bridge is detected. However, this
cannot succeed on POWER8NVL machines, so errors appear in dmesg. This is
harmless, as skiboot returns an error and the only place we check it is
vfio-pci, but that code does not get called on P8+ either.

This adds a check that the pnv_npu2_xxx helpers are called on a machine
with an NPU2, which initializes pnv_phb::npu in pnv_npu2_init();
pnv_phb::npu == NULL on POWER8/NVL (Naples).

While at it, fix NULL dereferencing in pnv_npu_peers_take_ownership/
pnv_npu_peers_release_ownership, which occurs when GPUs on the mentioned
P8s cause an EEH, which happens if "vfio-pci" disables devices using
the D3 power state; the vfio-pci disable_idle_d3 module parameter
controls this and must be set on Naples. The EEH handling clears
the entire pnv_ioda_pe struct in pnv_ioda_free_pe(), hence
the NULL dereferencing. We cannot recover from that, but at least we
stop crashing.
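
A sketch of the guard used by the helpers (the accessor path is an
assumption for illustration):

static bool pnv_phb_has_npu2(struct pci_dev *gpdev)
{
        struct pci_controller *hose = pci_bus_to_host(gpdev->bus);
        struct pnv_phb *phb = hose->private_data;

        /* pnv_phb::npu is only set by pnv_npu2_init() */
        return phb->npu != NULL;
}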

Tested on
- POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
  NVIDIA GV100 10de:1db1 driver 418.39
- POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
  NVIDIA P100 10de:15f9 driver 396.47

Fixes: 1b785611e119 ("powerpc/powernv/npu: Add release_ownership hook")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201122073828.15446-1-aik@ozlabs.ru
3 years agolkdtm/powerpc: Add SLB multihit test
Ganesh Goudar [Mon, 30 Nov 2020 08:30:57 +0000 (14:00 +0530)]
lkdtm/powerpc: Add SLB multihit test

To test machine check handling, add support for injecting SLB
multihit errors.

Co-developed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
[mpe: Use CONFIG_PPC_BOOK3S_64 to fix compile errors reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201130083057.135610-1-ganeshgr@linux.ibm.com
3 years agopowernv/pci: Print an error when device enable is blocked
Oliver O'Halloran [Thu, 9 Apr 2020 06:13:37 +0000 (16:13 +1000)]
powernv/pci: Print an error when device enable is blocked

If the platform decides to block enabling the device, nothing is
currently printed. This can lead to some confusion, since the dmesg
output will usually show an error with no context, e.g.

e1000e: probe of 0022:01:00.0 failed with error -22

This shouldn't be spammy, since pci_enable_device() already prints a
message when it succeeds.
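
A sketch of the hook with the added message (the structure field and
message text are assumptions based on the powernv code):

static bool pnv_pci_enable_device_hook(struct pci_dev *dev)
{
        struct pci_dn *pdn = pci_get_pdn(dev);

        if (!pdn || pdn->pe_number == IODA_INVALID_PE) {
                pci_err(dev, "pci_enable_device() blocked, no PE assigned\n");
                return false;
        }
        return true;
}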

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200409061337.9187-1-oohall@gmail.com
3 years agopowerpc/pci: Remove LSI mappings on device teardown
Oliver O'Halloran [Wed, 2 Dec 2020 00:52:22 +0000 (11:52 +1100)]
powerpc/pci: Remove LSI mappings on device teardown

When a passthrough IO adapter is removed from a pseries machine using hash
MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
to clear all page table entries related to the adapter. If some are still
present, the RTAS call which isolates the PCI slot returns error 9001
"valid outstanding translations" and the removal of the IO adapter fails.
This is because, when the PHBs are scanned, Linux automatically maps
the INTx interrupts in the Linux interrupt number space, but these
mappings are never removed.

This problem can be fixed by adding the corresponding unmap operation when
the device is removed. There's no pcibios_* hook for the remove case, but
the same effect can be achieved using a bus notifier.

Because INTx are shared among PHBs (and potentially across the system),
this adds tracking of virq to unmap them only when the last user is gone.
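
A sketch of the refcounted teardown (the structure and helper names
are illustrative):

#include <linux/irqdomain.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>

struct intx_entry {
        unsigned int virq;
        struct kref users;
        struct list_head node;
};

static void intx_entry_release(struct kref *kref)
{
        struct intx_entry *e = container_of(kref, struct intx_entry, users);

        list_del(&e->node);
        irq_dispose_mapping(e->virq);   /* last user gone: unmap the LSI */
        kfree(e);
}

/* Called from a bus notifier on device removal. */
static void pcibios_release_intx(struct intx_entry *e)
{
        kref_put(&e->users, intx_entry_release);
}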

[aik: added refcounter]

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201202005222.5477-1-aik@ozlabs.ru
3 years agopowerpc: add security.config, enforcing lockdown=integrity
Daniel Axtens [Thu, 3 Dec 2020 04:28:07 +0000 (15:28 +1100)]
powerpc: add security.config, enforcing lockdown=integrity

It's sometimes handy to have a config that boots a bit like a system
under secure boot (forcing lockdown=integrity, without needing any
extra stuff like a command line option).

This config file allows that, and also turns on a few assorted security
and hardening options for good measure.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201203042807.1293655-1-dja@axtens.net
3 years agopowerpc/44x: Don't support 47x code and non 47x code at the same time
Christophe Leroy [Sun, 18 Oct 2020 17:25:18 +0000 (17:25 +0000)]
powerpc/44x: Don't support 47x code and non 47x code at the same time

440/460 variants and 470 variants are not compatible, so there is no
need to build code that supports both and selects between them using
MMU features.

Just use CONFIG_PPC_47x to decide what to build.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c3e64da3d5d068c69a201e03bbae7da055761e5b.1603041883.git.christophe.leroy@csgroup.eu