This facility can be used to prevent such uncontrolled
GPE floodings.
Format: <int>
- Support masking of GPEs numbered from 0x00 to 0x7f.
acpi_no_auto_serialize [HW,ACPI]
Disable auto-serialization of AML methods
acpi_sleep= [HW,ACPI] Sleep options
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
- old_ordering, nonvs, sci_force_enable }
+ old_ordering, nonvs, sci_force_enable, nobl }
See Documentation/power/video.txt for information on
s3_bios and s3_mode.
s3_beep is for debugging; it makes the PC's speaker beep
sci_force_enable causes the kernel to set SCI_EN directly
on resume from S1/S3 (which is against the ACPI spec,
but some broken systems don't work without it).
+ nobl causes the internal blacklist of systems known to
+ behave incorrectly in some ways with respect to system
+ suspend and resume to be ignored (use wisely).
acpi_use_timer_override [HW,ACPI]
Use timer override. For some broken Nvidia NF5 boards
not play well with APC CPU idle - disable it if you have
APC and your system crashes randomly.
- apic= [APIC,X86-32] Advanced Programmable Interrupt Controller
+ apic= [APIC,X86] Advanced Programmable Interrupt Controller
Change the output verbosity whilst booting
Format: { quiet (default) | verbose | debug }
Change the amount of debugging information output
when initialising the APIC and IO-APIC components.
+ For X86-32, this can also be used to specify an APIC
+ driver name.
+ Format: apic=driver_name
+ Examples: apic=bigsmp
apic_extnmi= [APIC,X86] External NMI delivery setting
Format: { bsp (default) | all | none }
console=brl,ttyS0
For now, only VisioBraille is supported.
+ console_msg_format=
+ [KNL] Change console messages format
+ default
+ By default we print messages on consoles in
+ "[time stamp] text\n" format (time stamp may not be
+ printed, depending on CONFIG_PRINTK_TIME or
+ `printk_time' param).
+ syslog
+ Switch to syslog format: "<%u>[time stamp] text\n"
+ IOW, each message will have a facility and loglevel
+ prefix. The format is similar to one used by syslog()
+ syscall, or to executing "dmesg -S --raw" or to reading
+ from /proc/kmsg.
+
consoleblank= [KNL] The console blank (screen saver) timeout in
seconds. A value of 0 disables the blank timer.
Defaults to 0.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
- crossrelease_fullstack
- [KNL] Allow to record full stack trace in cross-release
-
cryptomgr.notests
[KNL] Disable crypto self-tests
isapnp= [ISAPNP]
Format: <RDP>,<reset>,<pci_scan>,<verbosity>
- isolcpus= [KNL,SMP] Isolate a given set of CPUs from disturbance.
+ isolcpus= [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
[Deprecated - use cpusets instead]
Format: [flag-list,]<cpu-list>
[KVM,ARM] Trap guest accesses to GICv3 common
system registers
+ kvm-arm.vgic_v4_enable=
+ [KVM,ARM] Allow use of GICv4 for direct injection of
+ LPIs.
+
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
This tests the locking primitive's ability to
transition abruptly to and from idle.
- locktorture.torture_runnable= [BOOT]
- Start locktorture running at boot time.
-
locktorture.torture_type= [KNL]
Specify the locking implementation to test.
This is useful when you use a panic=... timeout and
need the box quickly up again.
+ These settings can be accessed at runtime via
+ the nmi_watchdog and hardlockup_panic sysctls.
+
netpoll.carrier_timeout=
[NET] Specifies amount of time (in seconds) that
netpoll should wait for a carrier. By default netpoll
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
+ nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
+ (indirect branch prediction) vulnerability. System may
+ allow data leaks with this option, which is equivalent
+ to spectre_v2=off.
+
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fallback to
enabling legacy floating-point and sse state.
Valid arguments: on, off
Default: on
- nohz_full= [KNL,BOOT]
+ nohz_full= [KNL,BOOT,SMP,ISOL]
The argument is a cpu list, as described above.
In kernels built with CONFIG_NO_HZ_FULL=y, set
the specified list of CPUs whose tick will be stopped
pcie_scan_all Scan all possible PCIe devices. Otherwise we
only look for one device below a PCIe downstream
port.
+ big_root_window Try to add a big 64bit memory window to the PCIe
+ root complex on AMD CPUs. Some GFX hardware
+ can resize a BAR to allow access to all VRAM.
+ Adding the window is slightly risky (it may
+ conflict with unreported devices), so this
+ taints the kernel.
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
instead using the legacy FADT method
profile= [KNL] Enable kernel profiling via /proc/profile
- Format: [schedule,]<number>
+ Format: [<profiletype>,]<number>
+ Param: <profiletype>: "schedule", "sleep", or "kvm"
+ [defaults to kernel profiling]
Param: "schedule" - profile schedule points.
- Param: <number> - step/bucket size as a power of 2 for
- statistical time based profiling.
Param: "sleep" - profile D-state sleeping (millisecs).
Requires CONFIG_SCHEDSTATS
Param: "kvm" - profile VM exits.
+ Param: <number> - step/bucket size as a power of 2 for
+ statistical time based profiling.
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64] Control Page Table Isolation of user and
+ kernel address spaces. Disabling this feature
+ removes hardening, but improves performance of
+ system calls and interrupts.
+
+ on - unconditionally enable
+ off - unconditionally disable
+ auto - kernel detects whether your CPU model is
+ vulnerable to issues that PTI mitigates
+
+ Not specifying this option is equivalent to pti=auto.
+
+ nopti [X86_64]
+ Equivalent to pti=off
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
the same as for rcuperf.nreaders.
N, where N is the number of CPUs
- rcuperf.perf_runnable= [BOOT]
- Start rcuperf running at boot time.
-
rcuperf.perf_type= [KNL]
Specify the RCU implementation to test.
Test RCU's dyntick-idle handling. See also the
rcutorture.shuffle_interval parameter.
- rcutorture.torture_runnable= [BOOT]
- Start rcutorture running at boot time.
-
rcutorture.torture_type= [KNL]
Specify the RCU implementation to test.
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
- cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, mba.
+ cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
+ mba.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/laptops/sonypi.txt
+ spectre_v2= [X86] Control mitigation of Spectre variant 2
+ (indirect branch speculation) vulnerability.
+
+ on - unconditionally enable
+ off - unconditionally disable
+ auto - kernel detects whether your CPU model is
+ vulnerable
+
+ Selecting 'on' will, and 'auto' may, choose a
+ mitigation method at run time according to the
+ CPU, the available microcode, the setting of the
+ CONFIG_RETPOLINE configuration option, and the
+ compiler with which the kernel was built.
+
+ Specific mitigations can also be selected manually:
+
+ retpoline - replace indirect branches
+ retpoline,generic - google's original retpoline
+ retpoline,amd - AMD-specific minimal thunk
+
+ Not specifying this option is equivalent to
+ spectre_v2=auto.
+
spia_io_base= [HW,MTD]
spia_fio_base=
spia_pedr=
s64 %lld or %llx
u64 %llu or %llx
-If <type> is dependent on a config option for its size (e.g., ``sector_t``,
-``blkcnt_t``) or is architecture-dependent for its size (e.g., ``tcflag_t``),
-use a format specifier of its largest possible type and explicitly cast to it.
+
+If <type> is dependent on a config option for its size (e.g., sector_t,
+blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a
+format specifier of its largest possible type and explicitly cast to it.
Example::
printk("test: sector number/total blocks: %llu/%llu\n",
(unsigned long long)sector, (unsigned long long)blockcount);
-Reminder: ``sizeof()`` result is of type ``size_t``.
+Reminder: sizeof() returns type size_t.
-The kernel's printf does not support ``%n``. For obvious reasons, floating
-point formats (``%e, %f, %g, %a``) are also not recognized. Use of any
+The kernel's printf does not support %n. Floating point formats (%e, %f,
+%g, %a) are also not recognized, for obvious reasons. Use of any
unsupported specifier or length qualifier results in a WARN and early
-return from vsnprintf.
+return from vsnprintf().
+
+Pointer types
+=============
+
+A raw pointer value may be printed with %p which will hash the address
+before printing. The kernel also supports extended specifiers for printing
+pointers of different types.
+
+Plain Pointers
+--------------
+
+::
-Raw pointer value SHOULD be printed with %p. The kernel supports
-the following extended format specifiers for pointer types:
+ %p abcdef12 or 00000000abcdef12
+
+Pointers printed without a specifier extension (i.e unadorned %p) are
+hashed to prevent leaking information about the kernel memory layout. This
+has the added benefit of providing a unique identifier. On 64-bit machines
+the first 32 bits are zeroed. If you *really* want the address see %px
+below.
Symbols/Function Pointers
-=========================
+-------------------------
::
+ %pS versatile_init+0x0/0x110
+ %ps versatile_init
%pF versatile_init+0x0/0x110
%pf versatile_init
- %pS versatile_init+0x0/0x110
%pSR versatile_init+0x9/0x110
(with __builtin_extract_return_addr() translation)
- %ps versatile_init
%pB prev_fn_of_versatile_init+0x88/0x88
- The ``F`` and ``f`` specifiers are for printing function pointers,
- for example, f->func, &gettimeofday. They have the same result as
- ``S`` and ``s`` specifiers. But they do an extra conversion on
- ia64, ppc64 and parisc64 architectures where the function pointers
- are actually function descriptors.
+
-format. They result in the symbol name with (``S``) or without (``s``)
+ The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic
++format. They result in the symbol name with (S) or without (s)
+ offsets. If KALLSYMS are disabled then the symbol address is printed instead.
- The ``S`` and ``s`` specifiers can be used for printing symbols
- from direct addresses, for example, __builtin_return_address(0),
- (void *)regs->ip. They result in the symbol name with (S) or
- without (s) offsets. If KALLSYMS are disabled then the symbol
- address is printed instead.
+ Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+ and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
+ parisc64 function pointers are indirect and, in fact, are function
+ descriptors, which require additional dereferencing before we can lookup
+ the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
+ platforms (when needed), so ``F`` and ``f`` exist for compatibility
+ reasons only.
The ``B`` specifier results in the symbol name with offsets and should be
used when printing stack backtraces. The specifier takes into
consideration the effect of compiler optimisations which may occur
-when tail-call``s are used and marked with the noreturn GCC attribute.
+when tail-calls are used and marked with the noreturn GCC attribute.
- Examples::
-
- printk("Going to call: %pF\n", gettimeofday);
- printk("Going to call: %pF\n", p->func);
- printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
- printk("%s: called from %pS\n", __func__,
- (void *)__builtin_return_address(0));
- printk("Faulted at %pS\n", (void *)regs->ip);
- printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);
-
Kernel Pointers
-===============
+---------------
::
- %pK 0x01234567 or 0x0123456789abcdef
+ %pK 01234567 or 0123456789abcdef
For printing kernel pointers which should be hidden from unprivileged
-users. The behaviour of ``%pK`` depends on the ``kptr_restrict sysctl`` - see
+users. The behaviour of %pK depends on the kptr_restrict sysctl - see
Documentation/sysctl/kernel.txt for more details.
+Unmodified Addresses
+--------------------
+
+::
+
+ %px 01234567 or 0123456789abcdef
+
+For printing pointers when you *really* want to print the address. Please
+consider whether or not you are leaking sensitive information about the
+kernel memory layout before printing pointers with %px. %px is functionally
+equivalent to %lx (or %lu). %px is preferred because it is more uniquely
+grep'able. If in the future we need to modify the way the kernel handles
+printing pointers we will be better equipped to find the call sites.
+
Struct Resources
-================
+----------------
::
[mem 0x0000000060000000-0x000000006fffffff pref]
For printing struct resources. The ``R`` and ``r`` specifiers result in a
-printed resource with (``R``) or without (``r``) a decoded flags member.
+printed resource with (R) or without (r) a decoded flags member.
+
Passed by reference.
-Physical addresses types ``phys_addr_t``
-========================================
+Physical address types phys_addr_t
+----------------------------------
::
%pa[p] 0x01234567 or 0x0123456789abcdef
-For printing a ``phys_addr_t`` type (and its derivatives, such as
-``resource_size_t``) which can vary based on build options, regardless of
-the width of the CPU data path. Passed by reference.
+For printing a phys_addr_t type (and its derivatives, such as
+resource_size_t) which can vary based on build options, regardless of the
+width of the CPU data path.
+
+Passed by reference.
-DMA addresses types ``dma_addr_t``
-==================================
+DMA address types dma_addr_t
+----------------------------
::
%pad 0x01234567 or 0x0123456789abcdef
-For printing a ``dma_addr_t`` type which can vary based on build options,
-regardless of the width of the CPU data path. Passed by reference.
+For printing a dma_addr_t type which can vary based on build options,
+regardless of the width of the CPU data path.
+
+Passed by reference.
Raw buffer as an escaped string
-===============================
+-------------------------------
::
1b 62 20 5c 43 07 22 90 0d 5d
-few examples show how the conversion would be done (the result string
-without surrounding quotes)::
+A few examples show how the conversion would be done (excluding surrounding
+quotes)::
%*pE "\eb \C\a"\220\r]"
%*pEhp "\x1bb \C\x07"\x90\x0d]"
of flags (see :c:func:`string_escape_mem` kernel documentation for the
details):
- - ``a`` - ESCAPE_ANY
- - ``c`` - ESCAPE_SPECIAL
- - ``h`` - ESCAPE_HEX
- - ``n`` - ESCAPE_NULL
- - ``o`` - ESCAPE_OCTAL
- - ``p`` - ESCAPE_NP
- - ``s`` - ESCAPE_SPACE
+ - a - ESCAPE_ANY
+ - c - ESCAPE_SPECIAL
+ - h - ESCAPE_HEX
+ - n - ESCAPE_NULL
+ - o - ESCAPE_OCTAL
+ - p - ESCAPE_NP
+ - s - ESCAPE_SPACE
By default ESCAPE_ANY_NP is used.
ESCAPE_ANY_NP is the sane choice for many cases, in particularly for
printing SSIDs.
-If field width is omitted the 1 byte only will be escaped.
+If field width is omitted then 1 byte only will be escaped.
Raw buffer as a hex string
-==========================
+--------------------------
::
%*phD 00-01-02- ... -3f
%*phN 000102 ... 3f
-For printing a small buffers (up to 64 bytes long) as a hex string with
-certain separator. For the larger buffers consider to use
+For printing small buffers (up to 64 bytes long) as a hex string with a
+certain separator. For larger buffers consider using
:c:func:`print_hex_dump`.
MAC/FDDI addresses
-==================
+------------------
::
%pmR 050403020100
For printing 6-byte MAC/FDDI addresses in hex notation. The ``M`` and ``m``
-specifiers result in a printed address with (``M``) or without (``m``) byte
-separators. The default byte separator is the colon (``:``).
+specifiers result in a printed address with (M) or without (m) byte
+separators. The default byte separator is the colon (:).
Where FDDI addresses are concerned the ``F`` specifier can be used after
-the ``M`` specifier to use dash (``-``) separators instead of the default
+the ``M`` specifier to use dash (-) separators instead of the default
separator.
For Bluetooth addresses the ``R`` specifier shall be used after the ``M``
Passed by reference.
IPv4 addresses
-==============
+--------------
::
%p[Ii]4[hnbl]
For printing IPv4 dot-separated decimal addresses. The ``I4`` and ``i4``
-specifiers result in a printed address with (``i4``) or without (``I4``)
-leading zeros.
+specifiers result in a printed address with (i4) or without (I4) leading
+zeros.
The additional ``h``, ``n``, ``b``, and ``l`` specifiers are used to specify
host, network, big or little endian order addresses respectively. Where
Passed by reference.
IPv6 addresses
-==============
+--------------
::
%pI6c 1:2:3:4:5:6:7:8
For printing IPv6 network-order 16-bit hex addresses. The ``I6`` and ``i6``
-specifiers result in a printed address with (``I6``) or without (``i6``)
+specifiers result in a printed address with (I6) or without (i6)
colon-separators. Leading zeros are always used.
The additional ``c`` specifier can be used with the ``I`` specifier to
Passed by reference.
IPv4/IPv6 addresses (generic, with port, flowinfo, scope)
-=========================================================
+---------------------------------------------------------
::
%pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345
%p[Ii]S[pfschnbl]
-For printing an IP address without the need to distinguish whether it``s
-of type AF_INET or AF_INET6, a pointer to a valid ``struct sockaddr``,
+For printing an IP address without the need to distinguish whether it's of
+type AF_INET or AF_INET6. A pointer to a valid struct sockaddr,
specified through ``IS`` or ``iS``, can be passed to this format specifier.
The additional ``p``, ``f``, and ``s`` specifiers are used to specify port
%pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789
UUID/GUID addresses
-===================
+-------------------
::
%pUl 03020100-0504-0706-0809-0a0b0c0e0e0f
%pUL 03020100-0504-0706-0809-0A0B0C0E0E0F
-For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L',
-'b' and 'B' specifiers are used to specify a little endian order in
-lower ('l') or upper case ('L') hex characters - and big endian order
-in lower ('b') or upper case ('B') hex characters.
+For printing 16-byte UUID/GUIDs addresses. The additional ``l``, ``L``,
+``b`` and ``B`` specifiers are used to specify a little endian order in
+lower (l) or upper case (L) hex notation - and big endian order in lower (b)
+or upper case (B) hex notation.
Where no additional specifiers are used the default big endian
-order with lower case hex characters will be printed.
+order with lower case hex notation will be printed.
Passed by reference.
dentry names
-============
+------------
::
%pd{,2,3,4}
%pD{,2,3,4}
-For printing dentry name; if we race with :c:func:`d_move`, the name might be
-a mix of old and new ones, but it won't oops. ``%pd`` dentry is a safer
-equivalent of ``%s`` ``dentry->d_name.name`` we used to use, ``%pd<n>`` prints
-``n`` last components. ``%pD`` does the same thing for struct file.
+For printing dentry name; if we race with :c:func:`d_move`, the name might
+be a mix of old and new ones, but it won't oops. %pd dentry is a safer
+equivalent of %s dentry->d_name.name we used to use, %pd<n> prints ``n``
+last components. %pD does the same thing for struct file.
Passed by reference.
block_device names
-==================
+------------------
::
For printing name of block_device pointers.
struct va_format
-================
+----------------
::
Passed by reference.
kobjects
-========
+--------
::
- %pO
+ %pOF[fnpPcCF]
- Base specifier for kobject based structs. Must be followed with
- character for specific type of kobject as listed below:
- Device tree nodes:
+For printing kobject based structs (device nodes). Default behaviour is
+equivalent to %pOFf.
- %pOF[fnpPcCF]
+ - f - device node full_name
+ - n - device node name
+ - p - device node phandle
+ - P - device node path spec (name + @unit)
+ - F - device node flags
+ - c - major compatible string
+ - C - full compatible string
- For printing device tree nodes. The optional arguments are:
- f device node full_name
- n device node name
- p device node phandle
- P device node path spec (name + @unit)
- F device node flags
- c major compatible string
- C full compatible string
- Without any arguments prints full_name (same as %pOFf)
- The separator when using multiple arguments is ':'
+The separator when using multiple arguments is ':'
- Examples:
+Examples::
%pOF /foo/bar@0 - Node full name
%pOFf /foo/bar@0 - Same as above
P - Populated
B - Populated bus
- Passed by reference.
-
+Passed by reference.
struct clk
-==========
+----------
::
%pCn pll1
%pCr 1560000000
-For printing struct clk structures. ``%pC`` and ``%pCn`` print the name
+For printing struct clk structures. %pC and %pCn print the name
(Common Clock Framework) or address (legacy clock framework) of the
-structure; ``%pCr`` prints the current clock rate.
+structure; %pCr prints the current clock rate.
Passed by reference.
bitmap and its derivatives such as cpumask and nodemask
-=======================================================
+-------------------------------------------------------
::
%*pbl 0,3-6,8-10
For printing bitmap and its derivatives such as cpumask and nodemask,
-``%*pb`` output the bitmap with field width as the number of bits and ``%*pbl``
+%*pb outputs the bitmap with field width as the number of bits and %*pbl
output the bitmap as range list with field width as the number of bits.
Passed by reference.
Flags bitfields such as page flags, gfp_flags
-=============================================
+---------------------------------------------
::
expect ``unsigned long *``) and [g]fp_flags (expects ``gfp_t *``). The flag
names and print order depends on the particular type.
-Note that this format should not be used directly in :c:func:`TP_printk()` part
-of a tracepoint. Instead, use the ``show_*_flags()`` functions from
-<trace/events/mmflags.h>.
+Note that this format should not be used directly in the
+:c:func:`TP_printk()` part of a tracepoint. Instead, use the show_*_flags()
+functions from <trace/events/mmflags.h>.
Passed by reference.
Network device features
-=======================
+-----------------------
::
Passed by reference.
-If you add other ``%p`` extensions, please extend lib/test_printf.c with
-one or more test cases, if at all feasible.
+Thanks
+======
+If you add other %p extensions, please extend <lib/test_printf.c> with
+one or more test cases, if at all feasible.
Thank you for your cooperation and attention.
#include <linux/delay.h>
#include <linux/reboot.h>
#include <linux/interrupt.h>
- #include <linux/kallsyms.h>
#include <linux/init.h>
#include <linux/cpu.h>
#include <linux/elfcore.h>
show_regs_print_info(KERN_DEFAULT);
print_pstate(regs);
- print_symbol("pc : %s\n", regs->pc);
- print_symbol("lr : %s\n", lr);
+ printk("pc : %pS\n", (void *)regs->pc);
+ printk("lr : %pS\n", (void *)lr);
printk("sp : %016llx\n", sp);
i = top_reg;
clear_tsk_thread_flag(p, TIF_SVE);
p->thread.sve_state = NULL;
+ /*
+ * In case p was allocated the same task_struct pointer as some
+ * other recently-exited task, make sure p is disassociated from
+ * any cpu that may have run that now-exited task recently.
+ * Otherwise we could erroneously skip reloading the FPSIMD
+ * registers for p.
+ */
+ fpsimd_flush_task_state(p);
+
if (likely(!(p->flags & PF_KTHREAD))) {
*childregs = *current_pt_regs();
childregs->regs[0] = 0;
static void tls_thread_switch(struct task_struct *next)
{
- unsigned long tpidr, tpidrro;
-
tls_preserve_current_state();
- tpidr = *task_user_tls(next);
- tpidrro = is_compat_thread(task_thread_info(next)) ?
- next->thread.tp_value : 0;
+ if (is_compat_thread(task_thread_info(next)))
+ write_sysreg(next->thread.tp_value, tpidrro_el0);
+ else if (!arm64_kernel_unmapped_at_el0())
+ write_sysreg(0, tpidrro_el0);
- write_sysreg(tpidr, tpidr_el0);
- write_sysreg(tpidrro, tpidrro_el0);
+ write_sysreg(*task_user_tls(next), tpidr_el0);
}
/* Restore the UAO state depending on next's addr_limit */
#include <asm/cache.h>
#include <asm/ptrace.h>
#include <asm/pgtable.h>
+#include <asm/thread_info.h>
#include <asm-generic/vmlinux.lds.h>
RODATA
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+ __start_opd = .;
*(.opd)
+ __end_opd = .;
}
/*
#include <asm/io.h>
#include <asm/pgtable.h>
#include <asm/unwinder.h>
-
- extern char _etext, _stext;
+ #include <asm/sections.h>
int kstack_depth_to_print = 0x180;
int lwa_flag;
siginfo_t info;
if (user_mode(regs)) {
- /* Send a SIGSEGV */
- info.si_signo = SIGSEGV;
+ /* Send a SIGBUS */
+ info.si_signo = SIGBUS;
info.si_errno = 0;
- /* info.si_code has been set above */
- info.si_addr = (void *)address;
- force_sig_info(SIGSEGV, &info, current);
+ info.si_code = BUS_ADRALN;
+ info.si_addr = (void __user *)address;
+ force_sig_info(SIGBUS, &info, current);
} else {
printk("KERNEL: Unaligned Access 0x%.8lx\n", address);
show_registers(regs);
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/fs.h>
+#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/personality.h>
#include <linux/ptrace.h>
return 1;
}
+/*
+ * Idle thread support
+ *
+ * Detect when running on QEMU with SeaBIOS PDC Firmware and let
+ * QEMU idle the host too.
+ */
+
+int running_on_qemu __read_mostly;
+
+void __cpuidle arch_cpu_idle_dead(void)
+{
+ /* nop on real hardware, qemu will offline CPU. */
+ asm volatile("or %%r31,%%r31,%%r31\n":::);
+}
+
+void __cpuidle arch_cpu_idle(void)
+{
+ local_irq_enable();
+
+ /* nop on real hardware, qemu will idle sleep. */
+ asm volatile("or %%r10,%%r10,%%r10\n":::);
+}
+
+static int __init parisc_idle_init(void)
+{
+ const char *marker;
+
+ /* check QEMU/SeaBIOS marker in PAGE0 */
+ marker = (char *) &PAGE0->pad0;
+ running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0);
+
+ if (!running_on_qemu)
+ cpu_idle_poll_ctrl(1);
+
+ return 0;
+}
+arch_initcall(parisc_idle_init);
+
/*
* Copy architecture-specific thread state
*/
ptr = p;
return ptr;
}
+
+ void *dereference_kernel_function_descriptor(void *ptr)
+ {
+ if (ptr < (void *)__start_opd ||
+ ptr >= (void *)__end_opd)
+ return ptr;
+
+ return dereference_function_descriptor(ptr);
+ }
#endif
static inline unsigned long brk_rnd(void)
/* Read-only data */
RO_DATA(PAGE_SIZE)
+#ifdef CONFIG_PPC64
+ . = ALIGN(8);
+ __rfi_flush_fixup : AT(ADDR(__rfi_flush_fixup) - LOAD_OFFSET) {
+ __start___rfi_flush_fixup = .;
+ *(__rfi_flush_fixup)
+ __stop___rfi_flush_fixup = .;
+ }
+#endif
+
EXCEPTION_TABLE(0)
NOTES :kernel :notes
}
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+ __start_opd = .;
*(.opd)
+ __end_opd = .;
}
. = ALIGN(256);
#include <linux/capability.h>
#include <linux/miscdevice.h>
#include <linux/ratelimit.h>
- #include <linux/kallsyms.h>
#include <linux/rcupdate.h>
#include <linux/kobject.h>
#include <linux/uaccess.h>
m->cs, m->ip);
if (m->cs == __KERNEL_CS)
- print_symbol("{%s}", m->ip);
+ pr_cont("{%pS}", (void *)m->ip);
pr_cont("\n");
}
bool mce_is_memory_error(struct mce *m)
{
if (m->cpuvendor == X86_VENDOR_AMD) {
- /* ErrCodeExt[20:16] */
- u8 xec = (m->status >> 16) & 0x1f;
+ return amd_mce_is_memory_error(m);
- return (xec == 0x0 || xec == 0x8);
} else if (m->cpuvendor == X86_VENDOR_INTEL) {
/*
* Intel SDM Volume 3B - 15.9.2 Compound Error Codes
}
EXPORT_SYMBOL_GPL(mce_is_memory_error);
+static bool mce_is_correctable(struct mce *m)
+{
+ if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
+ return false;
+
+ if (m->status & MCI_STATUS_UC)
+ return false;
+
+ return true;
+}
+
static bool cec_add_mce(struct mce *m)
{
if (!m)
/* We eat only correctable DRAM errors with usable addresses. */
if (mce_is_memory_error(m) &&
- !(m->status & MCI_STATUS_UC) &&
+ mce_is_correctable(m) &&
mce_usable_address(m))
if (!cec_add_elem(m->addr >> PAGE_SHIFT))
return true;
if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
pfn = mce->addr >> PAGE_SHIFT;
- memory_failure(pfn, MCE_VECTOR, 0);
+ memory_failure(pfn, 0);
}
return NOTIFY_OK;
pr_err("Uncorrected hardware memory error in user-access at %llx", m->addr);
if (!(m->mcgstatus & MCG_STATUS_RIPV))
flags |= MF_MUST_KILL;
- ret = memory_failure(m->addr >> PAGE_SHIFT, MCE_VECTOR, flags);
+ ret = memory_failure(m->addr >> PAGE_SHIFT, flags);
if (ret)
pr_err("Memory error not recovered");
return ret;
EXPORT_SYMBOL_GPL(do_machine_check);
#ifndef CONFIG_MEMORY_FAILURE
-int memory_failure(unsigned long pfn, int vector, int flags)
+int memory_failure(unsigned long pfn, int flags)
{
/* mce_severity() should not hand us an ACTION_REQUIRED error */
BUG_ON(flags & MF_ACTION_REQUIRED);
void (*machine_check_vector)(struct pt_regs *, long error_code) =
unexpected_machine_check;
+dotraplinkage void do_mce(struct pt_regs *regs, long error_code)
+{
+ machine_check_vector(regs, error_code);
+}
+
/*
* Called for each booted CPU to set up machine checks.
* Must be called with preempt off:
+// SPDX-License-Identifier: GPL-2.0
/*
* drivers/base/core.c - core driver model code (device registration, etc)
*
* Copyright (c) 2002-3 Open Source Development Labs
* Copyright (c) 2006 Greg Kroah-Hartman <gregkh@suse.de>
* Copyright (c) 2006 Novell, Inc.
- *
- * This file is released under the GPLv2
- *
*/
#include <linux/device.h>
#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/genhd.h>
- #include <linux/kallsyms.h>
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/netdevice.h>
if (dev_attr->show)
ret = dev_attr->show(dev, dev_attr, buf);
if (ret >= (ssize_t)PAGE_SIZE) {
- print_symbol("dev_attr_show: %s returned bad count\n",
- (unsigned long)dev_attr->show);
+ printk("dev_attr_show: %pS returned bad count\n",
+ dev_attr->show);
}
return ret;
}
return 0;
klist_iter_init(&parent->p->klist_children, &i);
- while ((child = next_device(&i)) && !error)
+ while (!error && (child = next_device(&i)))
error = fn(child, data);
klist_iter_exit(&i);
return error;
+// SPDX-License-Identifier: GPL-2.0
/*
* fs/sysfs/file.c - sysfs regular (text) file implementation
*
* Copyright (c) 2007 SUSE Linux Products GmbH
* Copyright (c) 2007 Tejun Heo <teheo@suse.de>
*
- * This file is released under the GPLv2.
- *
* Please see Documentation/filesystems/sysfs.txt for more information.
*/
#include <linux/module.h>
#include <linux/kobject.h>
- #include <linux/kallsyms.h>
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/mutex.h>
* indicate truncated result or overflow in normal use cases.
*/
if (count >= (ssize_t)PAGE_SIZE) {
- print_symbol("fill_read_buffer: %s returned bad count\n",
- (unsigned long)ops->show);
+ printk("fill_read_buffer: %pS returned bad count\n",
+ ops->show);
/* Try to struggle along */
count = PAGE_SIZE - 1;
}
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/stddef.h>
+ #include <linux/mm.h>
+ #include <linux/module.h>
+
+ #include <asm/sections.h>
#define KSYM_NAME_LEN 128
#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s]") + (KSYM_NAME_LEN - 1) + \
2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + 1)
-#ifndef CONFIG_64BIT
-# define KALLSYM_FMT "%08lx"
-#else
-# define KALLSYM_FMT "%016lx"
-#endif
-
struct module;
+ static inline int is_kernel_inittext(unsigned long addr)
+ {
+ if (addr >= (unsigned long)_sinittext
+ && addr <= (unsigned long)_einittext)
+ return 1;
+ return 0;
+ }
+
+ static inline int is_kernel_text(unsigned long addr)
+ {
+ if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
+ arch_is_kernel_text(addr))
+ return 1;
+ return in_gate_area_no_mm(addr);
+ }
+
+ static inline int is_kernel(unsigned long addr)
+ {
+ if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
+ return 1;
+ return in_gate_area_no_mm(addr);
+ }
+
+ static inline int is_ksym_addr(unsigned long addr)
+ {
+ if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
+ return is_kernel(addr);
+
+ return is_kernel_text(addr) || is_kernel_inittext(addr);
+ }
+
+ static inline void *dereference_symbol_descriptor(void *ptr)
+ {
+ #ifdef HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR
+ struct module *mod;
+
+ ptr = dereference_kernel_function_descriptor(ptr);
+ if (is_ksym_addr((unsigned long)ptr))
+ return ptr;
+
+ preempt_disable();
+ mod = __module_address((unsigned long)ptr);
+ preempt_enable();
+
+ if (mod)
+ ptr = dereference_module_function_descriptor(mod, ptr);
+ #endif
+ return ptr;
+ }
+
#ifdef CONFIG_KALLSYMS
/* Lookup the address for a symbol. Returns 0 if not found. */
unsigned long kallsyms_lookup_name(const char *name);
extern int sprint_symbol_no_offset(char *buffer, unsigned long address);
extern int sprint_backtrace(char *buffer, unsigned long address);
- /* Look up a kernel symbol and print it to the kernel messages. */
- extern void __print_symbol(const char *fmt, unsigned long address);
-
int lookup_symbol_name(unsigned long addr, char *symname);
int lookup_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name);
return false;
}
- /* Stupid that this does nothing, but I didn't create this mess. */
- #define __print_symbol(fmt, addr)
#endif /*CONFIG_KALLSYMS*/
- /* This macro allows us to keep printk typechecking */
- static __printf(1, 2)
- void __check_printsym_format(const char *fmt, ...)
- {
- }
-
- static inline void print_symbol(const char *fmt, unsigned long addr)
- {
- __check_printsym_format(fmt, "");
- __print_symbol(fmt, (unsigned long)
- __builtin_extract_return_addr((void *)addr));
- }
-
static inline void print_ip_sym(unsigned long ip)
{
printk("[<%p>] %pS\n", (void *) ip, (void *) ip);
#include <linux/jump_label.h>
#include <linux/export.h>
#include <linux/rbtree_latch.h>
+#include <linux/error-injection.h>
#include <linux/percpu.h>
#include <asm/module.h>
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
+
+#ifdef CONFIG_FUNCTION_ERROR_INJECTION
+ struct error_injection_entry *ei_funcs;
+ unsigned int num_ei_funcs;
+#endif
} ____cacheline_aligned __randomize_layout;
#ifndef MODULE_ARCH_INIT
#define MODULE_ARCH_INIT {}
__mod ? __mod->name : "kernel"; \
})
+ /* Dereference module function descriptor */
+ void *dereference_module_function_descriptor(struct module *mod, void *ptr);
+
/* For kallsyms to ask for address resolution. namebuf should be at
* least KSYM_NAME_LEN long: a pointer to namebuf is returned if
* found, otherwise NULL. */
return false;
}
+ /* Dereference module function descriptor */
+ static inline
+ void *dereference_module_function_descriptor(struct module *mod, void *ptr)
+ {
+ return ptr;
+ }
+
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
static inline void module_bug_cleanup(struct module *mod) {}
#endif /* CONFIG_GENERIC_BUG */
+#ifdef RETPOLINE
+extern bool retpoline_module_ok(bool has_retpoline);
+#else
+static inline bool retpoline_module_ok(bool has_retpoline)
+{
+ return true;
+}
+#endif
+
#ifdef CONFIG_MODULE_SIG
static inline bool module_sig_ok(struct module *module)
{
* Debugging printout:
*/
- #include <linux/kallsyms.h>
-
#define ___P(f) if (desc->status_use_accessors & f) printk("%14s set\n", #f)
#define ___PS(f) if (desc->istate & f) printk("%14s set\n", #f)
/* FIXME */
static inline void print_irq_desc(unsigned int irq, struct irq_desc *desc)
{
+ static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 5);
+
+ if (!__ratelimit(&ratelimit))
+ return;
+
printk("irq %d, desc: %p, depth: %d, count: %d, unhandled: %d\n",
irq, desc, desc->depth, desc->irq_count, desc->irqs_unhandled);
- printk("->handle_irq(): %p, ", desc->handle_irq);
- print_symbol("%s\n", (unsigned long)desc->handle_irq);
- printk("->irq_data.chip(): %p, ", desc->irq_data.chip);
- print_symbol("%s\n", (unsigned long)desc->irq_data.chip);
+ printk("->handle_irq(): %p, %pS\n",
+ desc->handle_irq, desc->handle_irq);
+ printk("->irq_data.chip(): %p, %pS\n",
+ desc->irq_data.chip, desc->irq_data.chip);
printk("->action(): %p\n", desc->action);
if (desc->action) {
- printk("->action->handler(): %p, ", desc->action->handler);
- print_symbol("%s\n", (unsigned long)desc->action->handler);
+ printk("->action->handler(): %p, %pS\n",
+ desc->action->handler, desc->action->handler);
}
___P(IRQ_LEVEL);
* compression (see scripts/kallsyms.c for a more complete description)
*/
#include <linux/kallsyms.h>
- #include <linux/module.h>
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/fs.h>
#include <linux/err.h>
#include <linux/proc_fs.h>
#include <linux/sched.h> /* for cond_resched */
- #include <linux/mm.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/filter.h>
#include <linux/ftrace.h>
#include <linux/compiler.h>
- #include <asm/sections.h>
-
/*
* These will be re-linked against their real values
* during the second link stage.
extern const unsigned long kallsyms_markers[] __weak;
- static inline int is_kernel_inittext(unsigned long addr)
- {
- if (addr >= (unsigned long)_sinittext
- && addr <= (unsigned long)_einittext)
- return 1;
- return 0;
- }
-
- static inline int is_kernel_text(unsigned long addr)
- {
- if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
- arch_is_kernel_text(addr))
- return 1;
- return in_gate_area_no_mm(addr);
- }
-
- static inline int is_kernel(unsigned long addr)
- {
- if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
- return 1;
- return in_gate_area_no_mm(addr);
- }
-
- static int is_ksym_addr(unsigned long addr)
- {
- if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
- return is_kernel(addr);
-
- return is_kernel_text(addr) || is_kernel_inittext(addr);
- }
-
/*
* Expand a compressed symbol data into the resulting uncompressed string,
* if uncompressed string is too long (>= maxlen), it will be truncated,
return __sprint_symbol(buffer, address, -1, 1);
}
- /* Look up a kernel symbol and print it to the kernel messages. */
- void __print_symbol(const char *fmt, unsigned long address)
- {
- char buffer[KSYM_SYMBOL_LEN];
-
- sprint_symbol(buffer, address);
-
- printk(fmt, buffer);
- }
- EXPORT_SYMBOL(__print_symbol);
-
/* To avoid using get_symbol_offset for every symbol, we carry prefix along. */
struct kallsym_iter {
loff_t pos;
static int s_show(struct seq_file *m, void *p)
{
- unsigned long value;
+ void *value;
struct kallsym_iter *iter = m->private;
/* Some debugging symbols have no name. Ignore them. */
if (!iter->name[0])
return 0;
- value = iter->show_value ? iter->value : 0;
+ value = iter->show_value ? (void *)iter->value : NULL;
if (iter->module_name[0]) {
char type;
*/
type = iter->exported ? toupper(iter->type) :
tolower(iter->type);
- seq_printf(m, KALLSYM_FMT " %c %s\t[%s]\n", value,
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
type, iter->name, iter->module_name);
} else
- seq_printf(m, KALLSYM_FMT " %c %s\n", value,
+ seq_printf(m, "%px %c %s\n", value,
iter->type, iter->name);
return 0;
}
}
#endif /* CONFIG_LIVEPATCH */
+static void check_modinfo_retpoline(struct module *mod, struct load_info *info)
+{
+ if (retpoline_module_ok(get_modinfo(info, "retpoline")))
+ return;
+
+ pr_warn("%s: loading module not compiled with retpoline compiler.\n",
+ mod->name);
+}
+
/* Sets info->hdr and info->len. */
static int copy_module_from_user(const void __user *umod, unsigned long len,
struct load_info *info)
add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
}
+ check_modinfo_retpoline(mod, info);
+
if (get_modinfo(info, "staging")) {
add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
pr_warn("%s: module is from the staging directory, the quality "
sizeof(*mod->ftrace_callsites),
&mod->num_ftrace_callsites);
#endif
-
+#ifdef CONFIG_FUNCTION_ERROR_INJECTION
+ mod->ei_funcs = section_objs(info, "_error_injection_whitelist",
+ sizeof(*mod->ei_funcs),
+ &mod->num_ei_funcs);
+#endif
mod->extable = section_objs(info, "__ex_table",
sizeof(*mod->extable), &mod->num_exentries);
return symname(kallsyms, best);
}
+ void * __weak dereference_module_function_descriptor(struct module *mod,
+ void *ptr)
+ {
+ return ptr;
+ }
+
/* For kallsyms to ask for address resolution. NULL means not found. Careful
* not to lock to avoid deadlock on oopses, simply disable preemption. */
const char *module_address_lookup(unsigned long addr,
{
struct module *mod = list_entry(p, struct module, list);
char buf[MODULE_FLAGS_BUF_SIZE];
- unsigned long value;
+ void *value;
/* We always ignore unformed modules. */
if (mod->state == MODULE_STATE_UNFORMED)
mod->state == MODULE_STATE_COMING ? "Loading" :
"Live");
/* Used by oprofile and other similar tools. */
- value = m->private ? 0 : (unsigned long)mod->core_layout.base;
- seq_printf(m, " 0x" KALLSYM_FMT, value);
+ value = m->private ? NULL : mod->core_layout.base;
+ seq_printf(m, " 0x%px", value);
/* Taints info */
if (mod->taints)
/*
* Set sysctl string accordingly:
*/
- if (devkmsg_log == DEVKMSG_LOG_MASK_ON) {
- memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
- strncpy(devkmsg_log_str, "on", 2);
- } else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF) {
- memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
- strncpy(devkmsg_log_str, "off", 3);
- }
+ if (devkmsg_log == DEVKMSG_LOG_MASK_ON)
+ strcpy(devkmsg_log_str, "on");
+ else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF)
+ strcpy(devkmsg_log_str, "off");
/* else "ratelimit" which is set by default. */
/*
/* Flag: console code may call schedule() */
static int console_may_schedule;
+ enum con_msg_format_flags {
+ MSG_FORMAT_DEFAULT = 0,
+ MSG_FORMAT_SYSLOG = (1 << 0),
+ };
+
+ static int console_msg_format = MSG_FORMAT_DEFAULT;
+
/*
* The printk log buffer consists of a chain of concatenated variable
* length records. Every record starts with a record header, containing
return ret;
}
-static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
+static __poll_t devkmsg_poll(struct file *file, poll_table *wait)
{
struct devkmsg_user *user = file->private_data;
- int ret = 0;
+ __poll_t ret = 0;
if (!user)
return POLLERR|POLLNVAL;
return do_syslog(type, buf, len, SYSLOG_FROM_READER);
}
+ /*
+ * Special console_lock variants that help to reduce the risk of soft-lockups.
+ * They allow to pass console_lock to another printk() call using a busy wait.
+ */
+
+ #ifdef CONFIG_LOCKDEP
+ static struct lockdep_map console_owner_dep_map = {
+ .name = "console_owner"
+ };
+ #endif
+
+ static DEFINE_RAW_SPINLOCK(console_owner_lock);
+ static struct task_struct *console_owner;
+ static bool console_waiter;
+
+ /**
+ * console_lock_spinning_enable - mark beginning of code where another
+ * thread might safely busy wait
+ *
+ * This basically converts console_lock into a spinlock. This marks
+ * the section where the console_lock owner can not sleep, because
+ * there may be a waiter spinning (like a spinlock). Also it must be
+ * ready to hand over the lock at the end of the section.
+ */
+ static void console_lock_spinning_enable(void)
+ {
+ raw_spin_lock(&console_owner_lock);
+ console_owner = current;
+ raw_spin_unlock(&console_owner_lock);
+
+ /* The waiter may spin on us after setting console_owner */
+ spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+ }
+
+ /**
+ * console_lock_spinning_disable_and_check - mark end of code where another
+ * thread was able to busy wait and check if there is a waiter
+ *
+ * This is called at the end of the section where spinning is allowed.
+ * It has two functions. First, it is a signal that it is no longer
+ * safe to start busy waiting for the lock. Second, it checks if
+ * there is a busy waiter and passes the lock rights to her.
+ *
+ * Important: Callers lose the lock if there was a busy waiter.
+ * They must not touch items synchronized by console_lock
+ * in this case.
+ *
+ * Return: 1 if the lock rights were passed, 0 otherwise.
+ */
+ static int console_lock_spinning_disable_and_check(void)
+ {
+ int waiter;
+
+ raw_spin_lock(&console_owner_lock);
+ waiter = READ_ONCE(console_waiter);
+ console_owner = NULL;
+ raw_spin_unlock(&console_owner_lock);
+
+ if (!waiter) {
+ spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+ return 0;
+ }
+
+ /* The waiter is now free to continue */
+ WRITE_ONCE(console_waiter, false);
+
+ spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
+ /*
+ * Hand off console_lock to waiter. The waiter will perform
+ * the up(). After this, the waiter is the console_lock owner.
+ */
+ mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+ return 1;
+ }
+
+ /**
+ * console_trylock_spinning - try to get console_lock by busy waiting
+ *
+ * This allows to busy wait for the console_lock when the current
+ * owner is running in specially marked sections. It means that
+ * the current owner is running and cannot reschedule until it
+ * is ready to lose the lock.
+ *
+ * Return: 1 if we got the lock, 0 othrewise
+ */
+ static int console_trylock_spinning(void)
+ {
+ struct task_struct *owner = NULL;
+ bool waiter;
+ bool spin = false;
+ unsigned long flags;
+
+ if (console_trylock())
+ return 1;
+
+ printk_safe_enter_irqsave(flags);
+
+ raw_spin_lock(&console_owner_lock);
+ owner = READ_ONCE(console_owner);
+ waiter = READ_ONCE(console_waiter);
+ if (!waiter && owner && owner != current) {
+ WRITE_ONCE(console_waiter, true);
+ spin = true;
+ }
+ raw_spin_unlock(&console_owner_lock);
+
+ /*
+ * If there is an active printk() writing to the
+ * consoles, instead of having it write our data too,
+ * see if we can offload that load from the active
+ * printer, and do some printing ourselves.
+ * Go into a spin only if there isn't already a waiter
+ * spinning, and there is an active printer, and
+ * that active printer isn't us (recursive printk?).
+ */
+ if (!spin) {
+ printk_safe_exit_irqrestore(flags);
+ return 0;
+ }
+
+ /* We spin waiting for the owner to release us */
+ spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+ /* Owner will clear console_waiter on hand off */
+ while (READ_ONCE(console_waiter))
+ cpu_relax();
+ spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
+ printk_safe_exit_irqrestore(flags);
+ /*
+ * The owner passed the console lock to us.
+ * Since we did not spin on console lock, annotate
+ * this as a trylock. Otherwise lockdep will
+ * complain.
+ */
+ mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+
+ return 1;
+ }
+
/*
* Call the console drivers, asking them to write out
* log_buf[start] to log_buf[end - 1].
/* If called from the scheduler, we can not call up(). */
if (!in_sched) {
+ /*
+ * Disable preemption to avoid being preempted while holding
+ * console_sem which would prevent anyone from printing to
+ * console
+ */
+ preempt_disable();
/*
* Try to acquire and then immediately release the console
* semaphore. The release will print out buffers and wake up
* /dev/kmsg and syslog() users.
*/
- if (console_trylock())
+ if (console_trylock_spinning())
console_unlock();
+ preempt_enable();
}
return printed_len;
static ssize_t msg_print_ext_body(char *buf, size_t size,
char *dict, size_t dict_len,
char *text, size_t text_len) { return 0; }
+ static void console_lock_spinning_enable(void) { }
+ static int console_lock_spinning_disable_and_check(void) { return 0; }
static void call_console_drivers(const char *ext_text, size_t ext_len,
const char *text, size_t len) {}
static size_t msg_print_text(const struct printk_log *msg,
c->index = idx;
return 0;
}
+
+ static int __init console_msg_format_setup(char *str)
+ {
+ if (!strcmp(str, "syslog"))
+ console_msg_format = MSG_FORMAT_SYSLOG;
+ if (!strcmp(str, "default"))
+ console_msg_format = MSG_FORMAT_DEFAULT;
+ return 1;
+ }
+ __setup("console_msg_format=", console_msg_format_setup);
+
/*
* Set up a console. Called via do_early_param() in init/main.c
* for each "console=" parameter in the boot command line.
return 0;
}
console_locked = 1;
- /*
- * When PREEMPT_COUNT disabled we can't reliably detect if it's
- * safe to schedule (e.g. calling printk while holding a spin_lock),
- * because preempt_disable()/preempt_enable() are just barriers there
- * and preempt_count() is always 0.
- *
- * RCU read sections have a separate preemption counter when
- * PREEMPT_RCU enabled thus we must take extra care and check
- * rcu_preempt_depth(), otherwise RCU read sections modify
- * preempt_count().
- */
- console_may_schedule = !oops_in_progress &&
- preemptible() &&
- !rcu_preempt_depth();
+ console_may_schedule = 0;
return 1;
}
EXPORT_SYMBOL(console_trylock);
goto skip;
}
- len += msg_print_text(msg, false, text + len, sizeof(text) - len);
+ len += msg_print_text(msg,
+ console_msg_format & MSG_FORMAT_SYSLOG,
+ text + len,
+ sizeof(text) - len);
if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(ext_text,
sizeof(ext_text),
console_seq++;
raw_spin_unlock(&logbuf_lock);
+ /*
+ * While actively printing out messages, if another printk()
+ * were to occur on another CPU, it may wait for this one to
+ * finish. This task can not be preempted if there is a
+ * waiter waiting to take over.
+ */
+ console_lock_spinning_enable();
+
stop_critical_timings(); /* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
start_critical_timings();
+
+ if (console_lock_spinning_disable_and_check()) {
+ printk_safe_exit_irqrestore(flags);
+ return;
+ }
+
printk_safe_exit_irqrestore(flags);
if (do_cond_resched)
cond_resched();
}
+
console_locked = 0;
/* Release the exclusive_console once it is used */
void show_regs_print_info(const char *log_lvl)
{
dump_stack_print_info(log_lvl);
-
- printk("%stask: %p task.stack: %p\n",
- log_lvl, current, task_stack_page(current));
}
#endif
#include <linux/uuid.h>
#include <linux/of.h>
#include <net/addrconf.h>
+#include <linux/siphash.h>
+#include <linux/compiler.h>
#ifdef CONFIG_BLOCK
#include <linux/blkdev.h>
#endif
#include "../mm/internal.h" /* For the trace_print_flags arrays */
#include <asm/page.h> /* for PAGE_SIZE */
- #include <asm/sections.h> /* for dereference_function_descriptor() */
#include <asm/byteorder.h> /* cpu_to_le16 */
#include <linux/string_helpers.h>
return string(buf, end, uuid, spec);
}
+int kptr_restrict __read_mostly;
+
+static noinline_for_stack
+char *restricted_pointer(char *buf, char *end, const void *ptr,
+ struct printf_spec spec)
+{
+ spec.base = 16;
+ spec.flags |= SMALL;
+ if (spec.field_width == -1) {
+ spec.field_width = 2 * sizeof(ptr);
+ spec.flags |= ZEROPAD;
+ }
+
+ switch (kptr_restrict) {
+ case 0:
+ /* Always print %pK values */
+ break;
+ case 1: {
+ const struct cred *cred;
+
+ /*
+ * kptr_restrict==1 cannot be used in IRQ context
+ * because its test for CAP_SYSLOG would be meaningless.
+ */
+ if (in_irq() || in_serving_softirq() || in_nmi())
+ return string(buf, end, "pK-error", spec);
+
+ /*
+ * Only print the real pointer value if the current
+ * process has CAP_SYSLOG and is running with the
+ * same credentials it started with. This is because
+ * access to files is checked at open() time, but %pK
+ * checks permission at read() time. We don't want to
+ * leak pointer values if a binary opens a file using
+ * %pK and then elevates privileges before reading it.
+ */
+ cred = current_cred();
+ if (!has_capability_noaudit(current, CAP_SYSLOG) ||
+ !uid_eq(cred->euid, cred->uid) ||
+ !gid_eq(cred->egid, cred->gid))
+ ptr = NULL;
+ break;
+ }
+ case 2:
+ default:
+ /* Always print 0's for %pK */
+ ptr = NULL;
+ break;
+ }
+
+ return number(buf, end, (unsigned long)ptr, spec);
+}
+
static noinline_for_stack
char *netdev_bits(char *buf, char *end, const void *addr, const char *fmt)
{
return widen_string(buf, buf - buf_start, end, spec);
}
-int kptr_restrict __read_mostly;
+static noinline_for_stack
+char *pointer_string(char *buf, char *end, const void *ptr,
+ struct printf_spec spec)
+{
+ spec.base = 16;
+ spec.flags |= SMALL;
+ if (spec.field_width == -1) {
+ spec.field_width = 2 * sizeof(ptr);
+ spec.flags |= ZEROPAD;
+ }
+
+ return number(buf, end, (unsigned long int)ptr, spec);
+}
+
+static bool have_filled_random_ptr_key __read_mostly;
+static siphash_key_t ptr_key __read_mostly;
+
+static void fill_random_ptr_key(struct random_ready_callback *unused)
+{
+ get_random_bytes(&ptr_key, sizeof(ptr_key));
+ /*
+ * have_filled_random_ptr_key==true is dependent on get_random_bytes().
+ * ptr_to_id() needs to see have_filled_random_ptr_key==true
+ * after get_random_bytes() returns.
+ */
+ smp_mb();
+ WRITE_ONCE(have_filled_random_ptr_key, true);
+}
+
+static struct random_ready_callback random_ready = {
+ .func = fill_random_ptr_key
+};
+
+static int __init initialize_ptr_random(void)
+{
+ int ret = add_random_ready_callback(&random_ready);
+
+ if (!ret) {
+ return 0;
+ } else if (ret == -EALREADY) {
+ fill_random_ptr_key(&random_ready);
+ return 0;
+ }
+
+ return ret;
+}
+early_initcall(initialize_ptr_random);
+
+/* Maps a pointer to a 32 bit unique identifier. */
+static char *ptr_to_id(char *buf, char *end, void *ptr, struct printf_spec spec)
+{
+ unsigned long hashval;
+ const int default_width = 2 * sizeof(ptr);
+
+ if (unlikely(!have_filled_random_ptr_key)) {
+ spec.field_width = default_width;
+ /* string length must be less than default_width */
+ return string(buf, end, "(ptrval)", spec);
+ }
+
+#ifdef CONFIG_64BIT
+ hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
+ /*
+ * Mask off the first 32 bits, this makes explicit that we have
+ * modified the address (and 32 bits is plenty for a unique ID).
+ */
+ hashval = hashval & 0xffffffff;
+#else
+ hashval = (unsigned long)siphash_1u32((u32)ptr, &ptr_key);
+#endif
+
+ spec.flags |= SMALL;
+ if (spec.field_width == -1) {
+ spec.field_width = default_width;
+ spec.flags |= ZEROPAD;
+ }
+ spec.base = 16;
+
+ return number(buf, end, hashval, spec);
+}
/*
* Show a '%p' thing. A kernel extension is that the '%p' is followed
* c major compatible string
* C full compatible string
*
- * ** Please update also Documentation/printk-formats.txt when making changes **
+ * - 'x' For printing the address. Equivalent to "%lx".
+ *
+ * ** When making changes please also update:
+ * Documentation/core-api/printk-formats.rst
*
* Note: The difference between 'S' and 'F' is that on ia64 and ppc64
* function pointers are really function descriptors, which contain a
* pointer to the real address.
+ *
+ * Note: The default behaviour (unadorned %p) is to hash the address,
+ * rendering it useful as a unique identifier.
*/
static noinline_for_stack
char *pointer(const char *fmt, char *buf, char *end, void *ptr,
switch (*fmt) {
case 'F':
case 'f':
- ptr = dereference_function_descriptor(ptr);
- /* Fallthrough */
case 'S':
case 's':
+ ptr = dereference_symbol_descriptor(ptr);
+ /* Fallthrough */
case 'B':
return symbol_string(buf, end, ptr, spec, fmt);
case 'R':
return buf;
}
case 'K':
- switch (kptr_restrict) {
- case 0:
- /* Always print %pK values */
- break;
- case 1: {
- const struct cred *cred;
-
- /*
- * kptr_restrict==1 cannot be used in IRQ context
- * because its test for CAP_SYSLOG would be meaningless.
- */
- if (in_irq() || in_serving_softirq() || in_nmi()) {
- if (spec.field_width == -1)
- spec.field_width = default_width;
- return string(buf, end, "pK-error", spec);
- }
-
- /*
- * Only print the real pointer value if the current
- * process has CAP_SYSLOG and is running with the
- * same credentials it started with. This is because
- * access to files is checked at open() time, but %pK
- * checks permission at read() time. We don't want to
- * leak pointer values if a binary opens a file using
- * %pK and then elevates privileges before reading it.
- */
- cred = current_cred();
- if (!has_capability_noaudit(current, CAP_SYSLOG) ||
- !uid_eq(cred->euid, cred->uid) ||
- !gid_eq(cred->egid, cred->gid))
- ptr = NULL;
+ if (!kptr_restrict)
break;
- }
- case 2:
- default:
- /* Always print 0's for %pK */
- ptr = NULL;
- break;
- }
- break;
-
+ return restricted_pointer(buf, end, ptr, spec);
case 'N':
return netdev_bits(buf, end, ptr, fmt);
case 'a':
case 'F':
return device_node_string(buf, end, ptr, spec, fmt + 1);
}
+ case 'x':
+ return pointer_string(buf, end, ptr, spec);
}
- spec.flags |= SMALL;
- if (spec.field_width == -1) {
- spec.field_width = default_width;
- spec.flags |= ZEROPAD;
- }
- spec.base = 16;
- return number(buf, end, (unsigned long) ptr, spec);
+ /* default is to _not_ leak addresses, hash before printing */
+ return ptr_to_id(buf, end, ptr, spec);
}
/*
* - ``%n`` is unsupported
* - ``%p*`` is handled by pointer()
*
- * See pointer() or Documentation/printk-formats.txt for more
+ * See pointer() or Documentation/core-api/printk-formats.rst for more
* extensive description.
*
* **Please update the documentation in both places when making changes**
{
struct printf_spec spec = {0};
char *str, *end;
+ int width;
str = (char *)bin_buf;
end = (char *)(bin_buf + size);
#define save_arg(type) \
-do { \
+({ \
+ unsigned long long value; \
if (sizeof(type) == 8) { \
- unsigned long long value; \
+ unsigned long long val8; \
str = PTR_ALIGN(str, sizeof(u32)); \
- value = va_arg(args, unsigned long long); \
+ val8 = va_arg(args, unsigned long long); \
if (str + sizeof(type) <= end) { \
- *(u32 *)str = *(u32 *)&value; \
- *(u32 *)(str + 4) = *((u32 *)&value + 1); \
+ *(u32 *)str = *(u32 *)&val8; \
+ *(u32 *)(str + 4) = *((u32 *)&val8 + 1); \
} \
+ value = val8; \
} else { \
- unsigned long value; \
+ unsigned int val4; \
str = PTR_ALIGN(str, sizeof(type)); \
- value = va_arg(args, int); \
+ val4 = va_arg(args, int); \
if (str + sizeof(type) <= end) \
- *(typeof(type) *)str = (type)value; \
+ *(typeof(type) *)str = (type)(long)val4; \
+ value = (unsigned long long)val4; \
} \
str += sizeof(type); \
-} while (0)
+ value; \
+})
while (*fmt) {
int read = format_decode(fmt, &spec);
case FORMAT_TYPE_WIDTH:
case FORMAT_TYPE_PRECISION:
- save_arg(int);
+ width = (int)save_arg(int);
+ /* Pointers may require the width */
+ if (*fmt == 'p')
+ set_field_width(&spec, width);
break;
case FORMAT_TYPE_CHAR:
}
case FORMAT_TYPE_PTR:
- save_arg(void *);
+ /* Dereferenced pointers must be done now */
+ switch (*fmt) {
+ /* Dereference of functions is still OK */
+ case 'S':
+ case 's':
+ case 'F':
+ case 'f':
+ save_arg(void *);
+ break;
+ default:
+ if (!isalnum(*fmt)) {
+ save_arg(void *);
+ break;
+ }
+ str = pointer(fmt, str, end, va_arg(args, void *),
+ spec);
+ if (str + 1 < end)
+ *str++ = '\0';
+ else
+ end[-1] = '\0'; /* Must be nul terminated */
+ }
/* skip all alphanumeric pointer suffixes */
while (isalnum(*fmt))
fmt++;
break;
}
- case FORMAT_TYPE_PTR:
- str = pointer(fmt, str, end, get_arg(void *), spec);
+ case FORMAT_TYPE_PTR: {
+ bool process = false;
+ int copy, len;
+ /* Non function dereferences were already done */
+ switch (*fmt) {
+ case 'S':
+ case 's':
+ case 'F':
+ case 'f':
+ process = true;
+ break;
+ default:
+ if (!isalnum(*fmt)) {
+ process = true;
+ break;
+ }
+ /* Pointer dereference was already processed */
+ if (str < end) {
+ len = copy = strlen(args);
+ if (copy > end - str)
+ copy = end - str;
+ memcpy(str, args, copy);
+ str += len;
+ args += len;
+ }
+ }
+ if (process)
+ str = pointer(fmt, str, end, get_arg(void *), spec);
+
while (isalnum(*fmt))
fmt++;
break;
+ }
case FORMAT_TYPE_PERCENT_CHAR:
if (str < end)
}
}
+# check for smp_read_barrier_depends and read_barrier_depends
+ if (!$file && $line =~ /\b(smp_|)read_barrier_depends\s*\(/) {
+ WARN("READ_BARRIER_DEPENDS",
+ "$1read_barrier_depends should only be used in READ_ONCE or DEC Alpha code\n" . $herecurr);
+ }
+
# check of hardware specific defines
if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
CHK("ARCH_DEFINES",
for (my $count = $linenr; $count <= $lc; $count++) {
my $fmt = get_quoted_string($lines[$count - 1], raw_line($count, 0));
$fmt =~ s/%%//g;
- if ($fmt =~ /(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNOx]).)/) {
- if ($fmt =~ /(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNO]).)/) {
++ if ($fmt =~ /(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNOx]).)/) {
$bad_extension = $1;
last;
}
}
if ($bad_extension ne "") {
my $stat_real = raw_line($linenr, 0);
+ my $ext_type = "Invalid";
+ my $use = "";
for (my $count = $linenr + 1; $count <= $lc; $count++) {
$stat_real = $stat_real . "\n" . raw_line($count, 0);
}
+ if ($bad_extension =~ /p[Ff]/) {
+ $ext_type = "Deprecated";
+ $use = " - use %pS instead";
+ $use =~ s/pS/ps/ if ($bad_extension =~ /pf/);
+ }
WARN("VSPRINTF_POINTER_EXTENSION",
- "Invalid vsprintf pointer extension '$bad_extension'\n" . "$here\n$stat_real\n");
+ "$ext_type vsprintf pointer extension '$bad_extension'$use\n" . "$here\n$stat_real\n");
}
}
}
}
-# whine about ACCESS_ONCE
- if ($^V && $^V ge 5.10.0 &&
- $line =~ /\bACCESS_ONCE\s*$balanced_parens\s*(=(?!=))?\s*($FuncArg)?/) {
- my $par = $1;
- my $eq = $2;
- my $fun = $3;
- $par =~ s/^\(\s*(.*)\s*\)$/$1/;
- if (defined($eq)) {
- if (WARN("PREFER_WRITE_ONCE",
- "Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR>\n" . $herecurr) &&
- $fix) {
- $fixed[$fixlinenr] =~ s/\bACCESS_ONCE\s*\(\s*\Q$par\E\s*\)\s*$eq\s*\Q$fun\E/WRITE_ONCE($par, $fun)/;
- }
- } else {
- if (WARN("PREFER_READ_ONCE",
- "Prefer READ_ONCE(<FOO>) over ACCESS_ONCE(<FOO>)\n" . $herecurr) &&
- $fix) {
- $fixed[$fixlinenr] =~ s/\bACCESS_ONCE\s*\(\s*\Q$par\E\s*\)/READ_ONCE($par)/;
- }
- }
- }
-
# check for mutex_trylock_recursive usage
if ($line =~ /mutex_trylock_recursive/) {
ERROR("LOCKING",