sfrench/cifs-2.6.git
4 years agox86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32
Borislav Petkov [Fri, 3 Apr 2015 08:28:34 +0000 (10:28 +0200)]
x86/asm/entry: Drop now unused ENABLE_INTERRUPTS_SYSEXIT32

Commit:

  4214a16b0297 ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER")

removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the
macro now too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER
Andy Lutomirski [Fri, 3 Apr 2015 00:12:12 +0000 (17:12 -0700)]
x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER

SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
with usergs and IRQs on.  That means that we rely on STI to
correctly mask interrupts for one instruction.  This is okay by
itself, but the semantics with respect to NMIs are unclear.

Avoid the whole issue by using SYSRETL instead.  For background,
Intel CPUs don't allow SYSCALL from compat mode, but they do
allow SYSRETL back to compat mode.  Go figure.

To avoid doing too much at once, this doesn't revamp the calling
convention.  We still return with EBP, EDX, and ECX on the user
stack.

Oddly this seems to be 30 cycles or so faster.  Avoiding POPFQ
and STI will account for under half of that, I think, so my best
guess is that Intel just optimizes SYSRET much better than
SYSEXIT.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1
Andy Lutomirski [Thu, 2 Apr 2015 19:41:45 +0000 (12:41 -0700)]
x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1

We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
and we unnecessarily cache the value in tss.sp1.  We never
read the cached value.

Remove all of the caching.  It serves no purpose.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/32: Improve a TOP_OF_KERNEL_STACK_PADDING comment
Andy Lutomirski [Thu, 2 Apr 2015 19:41:44 +0000 (12:41 -0700)]
x86/asm/entry/32: Improve a TOP_OF_KERNEL_STACK_PADDING comment

At Denys' request, clean up the comment describing stack padding
in the 32-bit sysenter path.

No code changes.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm: Add support for the CLWB instruction
Ross Zwisler [Tue, 27 Jan 2015 16:53:51 +0000 (09:53 -0700)]
x86/asm: Add support for the CLWB instruction

Add support for the new CLWB (cache line write back)
instruction.  This instruction was announced in the document
"Intel Architecture Instruction Set Extensions Programming
Reference" with reference number 319433-022.

  https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

The CLWB instruction is used to write back the contents of
dirtied cache lines to memory without evicting the cache lines
from the processor's cache hierarchy.  This should be used in
favor of clflushopt or clflush in cases where you require the
cache line to be written to memory but plan to access the data
again in the near future.

One of the main use cases for this is with persistent memory
where CLWB can be used with PCOMMIT to ensure that data has been
accepted to memory and is durable on the DIMM.

This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
and PCOMMIT with appropriate fencing:

void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;

for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);

/* Flush any possible final partial cacheline */
clwb(vend);

/*
 * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
 * (MFENCE via mb() also works)
 */
wmb();

/* PCOMMIT and the required SFENCE for ordering */
pcommit_sfence();
}

After this function completes the data pointed to by vaddr is
has been accepted to memory and will be durable if the vaddr
points to persistent memory.

Regarding the details of how the alternatives assembly is set
up, we need one additional byte at the beginning of the CLFLUSH
so that we can flip it into a CLFLUSHOPT by changing that byte
into a 0x66 prefix.  Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no
functional effect with the plain CLFLUSH, but I've been told
that executing a CLFLUSH + prefix should be faster than
executing a CLFLUSH + NOP.

We had to hard code the assembly for CLWB because, lacking the
ability to assemble the CLWB instruction itself, the next
closest thing is to have an xsaveopt instruction with a 0x66
prefix.  Unfortunately XSAVEOPT itself is also relatively new,
and isn't included by all the GCC versions that the kernel needs
to support.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/cpu: Factor out common CPU initialization code, fix 32-bit Xen PV guests
Boris Ostrovsky [Wed, 1 Apr 2015 14:12:14 +0000 (10:12 -0400)]
x86/cpu: Factor out common CPU initialization code, fix 32-bit Xen PV guests

Some of x86 bare-metal and Xen CPU initialization code is common
between the two and therefore can be factored out to avoid code
duplication.

As a side effect, doing so will also extend the fix provided by
commit a7fcf28d431e ("x86/asm/entry: Replace this_cpu_sp0() with
current_top_of_stack() to x86_32") to 32-bit Xen PV guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: konrad.wilk@oracle.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1427897534-5086-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/boot/64: Use __BOOT_TSS instead of literal $0x20
Denys Vlasenko [Wed, 1 Apr 2015 14:50:58 +0000 (16:50 +0200)]
x86/asm/boot/64: Use __BOOT_TSS instead of literal $0x20

__BOOT_TSS = (GDT_ENTRY_BOOT_TSS * 8)
GDT_ENTRY_BOOT_TSS = (GDT_ENTRY_BOOT_CS + 2)
GDT_ENTRY_BOOT_CS = 2

(2 + 2) * 8 = 4 * 8 = 32 = 0x20

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Fold the 'test_in_nmi' macro into its only user
Denys Vlasenko [Wed, 1 Apr 2015 14:50:57 +0000 (16:50 +0200)]
x86/asm/entry/64: Fold the 'test_in_nmi' macro into its only user

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Use local label to skip around sycall dispatch
Denys Vlasenko [Tue, 31 Mar 2015 17:00:11 +0000 (19:00 +0200)]
x86/asm/entry/64: Use local label to skip around sycall dispatch

Logically, we just want to jump around the following instruction
and its prologue/epilogue:

  call *sys_call_table(,%rax,8)

if the syscall number is too big - we do not specifically target
the "int_ret_from_sys_call" label.

Use a local, numerical label for this jump, for more clarity.

This also makes the code smaller:

 -ffffffff8187756b:      0f 87 0f 00 00 00       ja     ffffffff81877580 <int_ret_from_sys_call>
 +ffffffff8187756b:      77 0f                   ja     ffffffff8187757c <int_ret_from_sys_call>

because jumps to global labels are never translated to short jump
instructions by GAS.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-9-git-send-email-dvlasenk@redhat.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm: Replace "MOVQ $imm, %reg" with MOVL
Denys Vlasenko [Tue, 31 Mar 2015 17:00:10 +0000 (19:00 +0200)]
x86/asm: Replace "MOVQ $imm, %reg" with MOVL

There is no reason to use MOVQ to load a non-negative immediate
constant value into a 64-bit register. MOVL does the same, since
the upper 32 bits are zero-extended by the CPU.

This makes the code a bit smaller, while leaving functionality
unchanged.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-8-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Simplify looping around preempt_schedule_irq()
Denys Vlasenko [Tue, 31 Mar 2015 17:00:07 +0000 (19:00 +0200)]
x86/asm/entry/64: Simplify looping around preempt_schedule_irq()

At the 'exit_intr' label we test whether interrupt/exception was in
kernel. If it did, we jump to the preemption check. If preemption
does happen (IOW if we call preempt_schedule_irq()), we go back to
'exit_intr'.

But it's pointless, we already know that the test succeeded last
time, preemption doesn't change the fact that interrupt/exception
was in the kernel.

We can go back directly to checking PER_CPU_VAR(__preempt_count) instead.

This makes the 'exit_intr' label unused, drop it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS()
Denys Vlasenko [Tue, 31 Mar 2015 17:00:06 +0000 (19:00 +0200)]
x86/asm/entry/64: Remove redundant DISABLE_INTERRUPTS()

At this location, we already have interrupts off, always.
To be more specific, we already disabled them here:

    ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Simplify retint_kernel label usage, make retint_restore_args label...
Denys Vlasenko [Tue, 31 Mar 2015 17:00:05 +0000 (19:00 +0200)]
x86/asm/entry/64: Simplify retint_kernel label usage, make retint_restore_args label local

Get rid of #define obfuscation of retint_kernel in
CONFIG_PREEMPT case by defining retint_kernel label always, not
only for CONFIG_PREEMPT.

Strip retint_kernel of .global-ness (ENTRY macro) - it has no
users outside of this file.

This looks like cosmetics, but it is not:
"je LABEL" can be optimized to short jump by assember
only if LABEL is not global, for global labels jump is always
a near one with relocation.

Convert retint_restore_args to a local numeric label, making it
clearer that it is not used elsewhere in the file.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/32: Use smaller PUSH instructions instead of MOV, to build 'pt_regs...
Denys Vlasenko [Tue, 31 Mar 2015 17:00:04 +0000 (19:00 +0200)]
x86/asm/entry/32: Use smaller PUSH instructions instead of MOV, to build 'pt_regs' on stack

This mimics the recent similar 64-bit change.
Saves ~110 bytes of code.

Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
I also looked at the diff of entry_64.o disassembly, to have
a different view of the changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Do not TRACE_IRQS fast SYSRET64 path
Denys Vlasenko [Tue, 31 Mar 2015 17:00:03 +0000 (19:00 +0200)]
x86/asm/entry/64: Do not TRACE_IRQS fast SYSRET64 path

SYSRET code path has a small irq-off block.
On this code path, TRACE_IRQS_ON can't be called right before
interrupts are enabled for real, we can't clobber registers
there. So current code does it earlier, in a safe place.

But with this, TRACE_IRQS_OFF/ON frames just two fast
instructions, which is ridiculous: now most of irq-off block is
_outside_ of the framing.

Do the same thing that we do on SYSCALL entry: do not track this
irq-off block, it is very small to ever cause noticeable irq
latency.

Be careful: make sure that "jnz int_ret_from_sys_call_irqs_off"
now does invoke TRACE_IRQS_OFF - move
int_ret_from_sys_call_irqs_off label before TRACE_IRQS_OFF.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427821211-25099-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Remove user_mode_ignore_vm86()
Ingo Molnar [Sun, 29 Mar 2015 09:02:34 +0000 (11:02 +0200)]
x86/asm/entry: Remove user_mode_ignore_vm86()

user_mode_ignore_vm86() can be used instead of user_mode(), in
places where we have already done a v8086_mode() security
check of ptregs.

But doing this check in the wrong place would be a bug that
could result in security problems, and also the naming still
isn't very clear.

Furthermore, it only affects 32-bit kernels, while most
development happens on 64-bit kernels.

If we replace them with user_mode() checks then the cost is only
a very minor increase in various slowpaths:

   text             data   bss     dec              hex    filename
   10573391         703562 1753042 13029995         c6d26b vmlinux.o.before
   10573423         703562 1753042 13030027         c6d28b vmlinux.o.after

So lets get rid of this distinction once and for all.

Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Do not GET_THREAD_INFO() too early
Denys Vlasenko [Mon, 30 Mar 2015 18:09:34 +0000 (20:09 +0200)]
x86/asm/entry/64: Do not GET_THREAD_INFO() too early

At exit_intr, we GET_THREAD_INFO(%rcx) and then jump to
retint_kernel if saved CS was from kernel. But the code at
retint_kernel doesn't need %rcx.

Move GET_THREAD_INFO(%rcx) down, after CS check and branch.

While at it, remove "has a correct top of stack" comment.
After recent changes which eliminated FIXUP_TOP_OF_STACK,
we always have a correct pt_regs layout.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427738975-7391-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Move retint_kernel code block closer to its user
Denys Vlasenko [Mon, 30 Mar 2015 18:09:31 +0000 (20:09 +0200)]
x86/asm/entry/64: Move retint_kernel code block closer to its user

The "retint_kernel" code block is misplaced. Since its logical
continuation is "retint_restore_args", it is more natural to
place it above that label. This also makes two jumps "short".

This change only moves code block around, without changing
logic.

This enables the next simplification: making
"retint_restore_args" label a local numeric one.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427738975-7391-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/32: Make register zero-extension more prominent
Denys Vlasenko [Fri, 27 Mar 2015 10:36:21 +0000 (11:36 +0100)]
x86/asm/entry/32: Make register zero-extension more prominent

There are a couple of syscall argument zero-extension instructions in
the 32-bit compat entry code, and it was mentioned that people keep
trying to optimize them out, introducing bugs.

Make them more visible, and add a "do not remove" comment.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427452582-21624-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/32: Update "interrupt off" comments
Denys Vlasenko [Fri, 27 Mar 2015 10:36:20 +0000 (11:36 +0100)]
x86/asm/entry/32: Update "interrupt off" comments

The existing comment has proven to be not very clear.

Replace it with a comment similar to the one we now have in the 64-bit
syscall entry point. (Three instances, one per 32-bit syscall entry).

In the INT80 entry point's CFI annotations, replace mysterious
expressions with numric constants. In this case, raw numbers
look more understandable.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427452582-21624-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Add missing CFI annotation
Denys Vlasenko [Fri, 27 Mar 2015 10:36:19 +0000 (11:36 +0100)]
x86/asm/entry/64: Add missing CFI annotation

This is a missing bit of the recent MOV-to-PUSH conversion.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427452582-21624-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Fix comment about SYSENTER MSRs
Denys Vlasenko [Fri, 27 Mar 2015 10:59:16 +0000 (11:59 +0100)]
x86/asm/entry/64: Fix comment about SYSENTER MSRs

The comment is ancient, it dates to the time when only AMD's
x86_64 implementation existed. AMD wasn't (and still isn't)
supporting SYSENTER, so these writes were "just in case" back
then.

This has changed: Intel's x86_64 appeared, and Intel does
support SYSENTER in long mode. "Some future 64-bit CPU" is here
already.

The code may appear "buggy" for AMD as it stands, since
MSR_IA32_SYSENTER_EIP is only 32-bit for AMD CPUs. Writing a
kernel function's address to it would drop high bits. Subsequent
use of this MSR for branch via SYSENTER seem to allow user to
transition to CPL0 while executing his code. Scary, eh?

Explain why that is not a bug: because SYSENTER insn would not
work on AMD CPU.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427453956-21931-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/irq/tracing: Do not save callee-preserved registers around lockdep_sys_exit_thunk
Denys Vlasenko [Wed, 25 Mar 2015 20:14:28 +0000 (21:14 +0100)]
x86/irq/tracing: Do not save callee-preserved registers around lockdep_sys_exit_thunk

Internally, lockdep_sys_exit_thunk saves callee-clobbered
registers, and calls a C function, lockdep_sys_exit. Thus,
callee-preserved registers won't be mangled, there is no need to
save them.

Patch was run-tested.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427314468-12763-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/irq/tracing: Fold ARCH_LOCKDEP_SYS_EXIT defines into their users
Denys Vlasenko [Wed, 25 Mar 2015 20:14:27 +0000 (21:14 +0100)]
x86/irq/tracing: Fold ARCH_LOCKDEP_SYS_EXIT defines into their users

There is no need to have an extra level of macro indirection
here.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427314468-12763-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/irq/tracing: Move ARCH_LOCKDEP_SYS_EXIT defines closer to their users
Denys Vlasenko [Wed, 25 Mar 2015 20:14:26 +0000 (21:14 +0100)]
x86/irq/tracing: Move ARCH_LOCKDEP_SYS_EXIT defines closer to their users

This change simply moves defines around (even if it's not
obvious in a patch form). Nothing is changed.

This is a preparation for folding ARCH_LOCKDEP_SYS_EXIT defines
into their users.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427314468-12763-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Use smaller instructions
Denys Vlasenko [Wed, 25 Mar 2015 17:18:15 +0000 (18:18 +0100)]
x86/asm/entry/64: Use smaller instructions

The $AUDIT_ARCH_X86_64 parameter to syscall_trace_enter_phase1/2
is a 32-bit constant, loading it with 32-bit MOV produces 5-byte
insn instead of 10-byte MOVABS one.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427303896-24023-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Use better label name, fix comments
Denys Vlasenko [Wed, 25 Mar 2015 17:18:13 +0000 (18:18 +0100)]
x86/asm/entry/64: Use better label name, fix comments

A named label "ret_from_sys_call" implies that there are jumps
to this location from elsewhere, as happens with many other
labels in this file.

But this label is used only by the JMP a few insns above.
To make that obvious, use local numeric label instead.

Improve comments:

"and return regs->ax" isn't too informative. We always return
regs->ax.

The comment suggesting that it'd be cool to use rip relative
addressing for CALL is deleted. It's unclear why that would be
an improvement - we aren't striving to use position-independent
code here. PIC code here would require something like LEA
sys_call_table(%rip),reg + CALL *(reg,%rax*8)...

"iret frame is also incomplete" is no longer true, fix that too.

Also fix typo in comment.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427303896-24023-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoMerge branch 'x86/urgent' into x86/asm, to resolve conflict
Ingo Molnar [Tue, 24 Mar 2015 20:14:07 +0000 (21:14 +0100)]
Merge branch 'x86/urgent' into x86/asm, to resolve conflict

Conflicts:
arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm: Further improve segment.h readability
Ingo Molnar [Tue, 24 Mar 2015 19:45:42 +0000 (20:45 +0100)]
x86/asm: Further improve segment.h readability

 - extend/clarify explanations where necessary

 - move comments from macro values to before the macro, to
   make them more consistent, and to reduce preprocessor overhead

 - sort GDT index and selector values likewise by number

 - use consistent, modern kernel coding style across the file

 - capitalize consistently

 - use consistent vertical spacing

 - remove the unused get_limit() method (noticed by Andy Lutomirski)

No change in code (verified with objdump -d):

 64-bit defconfig+kvmconfig:

   815a129bc1f80de6445c1d8ca5b97cad  vmlinux.o.before.asm
   815a129bc1f80de6445c1d8ca5b97cad  vmlinux.o.after.asm

 32-bit defconfig+kvmconfig:

   e659ef045159ddf41a0771b33a34aae5  vmlinux.o.before.asm
   e659ef045159ddf41a0771b33a34aae5  vmlinux.o.after.asm

Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Check for syscall exit work with IRQs disabled
Andy Lutomirski [Mon, 23 Mar 2015 19:32:54 +0000 (12:32 -0700)]
x86/asm/entry: Check for syscall exit work with IRQs disabled

We currently have a race: if we're preempted during syscall
exit, we can fail to process syscall return work that is queued
up while we're preempted in ret_from_sys_call after checking
ti.flags.

Fix it by disabling interrupts before checking ti.flags.

Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 96b6352c1271 ("x86_64, entry: Remove the syscall exit audit")
Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO()
Ingo Molnar [Tue, 24 Mar 2015 18:44:42 +0000 (19:44 +0100)]
x86/asm/entry/64: Rename THREAD_INFO() to ASM_THREAD_INFO()

The THREAD_INFO() macro has a somewhat confusingly generic name,
defined in a generic .h C header file. It also does not make it
clear that it constructs a memory operand for use in assembly
code.

Rename it to ASM_THREAD_INFO() to make it all glaringly
obvious on first glance.

Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro
Ingo Molnar [Tue, 24 Mar 2015 18:44:11 +0000 (19:44 +0100)]
x86/asm/entry/64: Merge the field offset into the THREAD_INFO() macro

Before:

   TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d

After:

   movl    THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d

to turn it into a clear thread_info accessor.

No code changed:

 md5:
   fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.before.asm
   fb4cb2b3ce05d89940ca304efc8ff183  ia32entry.o.after.asm

   e39f2958a5d1300158e276e4f7663263  entry_64.o.before.asm
   e39f2958a5d1300158e276e4f7663263  entry_64.o.after.asm

Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Improve the THREAD_INFO() macro explanation
Ingo Molnar [Tue, 24 Mar 2015 18:43:11 +0000 (19:43 +0100)]
x86/asm/entry/64: Improve the THREAD_INFO() macro explanation

Explain the background, and add a real example.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/20150324184311.GA14760@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Always set up SYSENTER MSRs
Ingo Molnar [Tue, 24 Mar 2015 13:41:37 +0000 (14:41 +0100)]
x86/asm/entry/64: Always set up SYSENTER MSRs

On CONFIG_IA32_EMULATION=y kernels we set up
MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
kernels we leave them unchanged.

Clear them to make sure the instruction is disabled properly.

SYSCALL is set up properly in both cases.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm: Deobfuscate segment.h
Denys Vlasenko [Sun, 22 Mar 2015 21:01:12 +0000 (22:01 +0100)]
x86/asm: Deobfuscate segment.h

This file just defines a number of constants, and a few macros
and inline functions. It is particularly badly written.

For example, it is not trivial to see how descriptors are
numbered (you'd expect that should be easy, right?).

This change deobfuscates it via the following changes:

Group all GDT_ENTRY_foo together (move intervening stuff away).

Number them explicitly: use a number, not PREV_DEFINE+1, +2, +3:
I want to immediately see that GDT_ENTRY_PNPBIOS_CS32 is 18.
Seeing (GDT_ENTRY_KERNEL_BASE+6) instead is not useful.

The above change allows to remove GDT_ENTRY_KERNEL_BASE
and GDT_ENTRY_PNPBIOS_BASE, which weren't used anywhere else.

After a group of GDT_ENTRY_foo, define all selector values.

Remove or improve some comments. In particular:
Comment deleted as stating the obvious:
    /*
     * The GDT has 32 entries
     */
    #define GDT_ENTRIES 32

"The segment offset needs to contain a RPL. Grr. -AK"
    changed to
"Selectors need to also have a correct RPL (+3 thingy)"

"GDT layout to get 64bit syscall right (sysret hardcodes gdt
offsets)" expanded into a description *how exactly* sysret
hardcodes them.

Patch was tested to compile and not change vmlinux.o
on 32-bit and 64-bit builds (verified with objdump).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Get rid of int_ret_from_sys_call_fixup
Denys Vlasenko [Thu, 19 Mar 2015 17:17:49 +0000 (18:17 +0100)]
x86/asm/entry/64: Get rid of int_ret_from_sys_call_fixup

With the FIXUP_TOP_OF_STACK macro removed, this intermediate jump
is unnecessary.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Get rid of the FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK macros
Denys Vlasenko [Thu, 19 Mar 2015 17:17:48 +0000 (18:17 +0100)]
x86/asm/entry/64: Get rid of the FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK macros

The FIXUP_TOP_OF_STACK macro is only necessary because we don't save %r11
to pt_regs->r11 on SYSCALL64 fast path, but we want ptrace to see it populated.

Bite the bullet, add a single additional PUSH instruction, and remove
the FIXUP_TOP_OF_STACK macro.

The RESTORE_TOP_OF_STACK macro is already a nop. Remove it too.

On SandyBridge CPU, it does not get slower:
measured 54.22 ns per getpid syscall before and after last two
changes on defconfig kernel.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Use PUSH instructions to build pt_regs on stack
Denys Vlasenko [Thu, 19 Mar 2015 17:17:47 +0000 (18:17 +0100)]
x86/asm/entry/64: Use PUSH instructions to build pt_regs on stack

With this change, on SYSCALL64 code path we are now populating
pt_regs->cs, pt_regs->ss and pt_regs->rcx unconditionally and
therefore don't need to do that in FIXUP_TOP_OF_STACK.

We lose a number of large instructions there:

    text    data     bss     dec     hex filename
   13298       0       0   13298    33f2 entry_64_before.o
   12978       0       0   12978    32b2 entry_64.o

What's more important, we convert two "MOVQ $imm,off(%rsp)" to
"PUSH $imm" (the ones which fill pt_regs->cs,ss).

Before this patch, placing them on fast path was slowing it down
by two cycles: this form of MOV is very large, 12 bytes, and
this probably reduces decode bandwidth to one instruction per cycle
when CPU sees them.

Therefore they were living in FIXUP_TOP_OF_STACK instead (away
from fast path).

"PUSH $imm" is a small 2-byte instruction. Moving it to fast path does
not slow it down in my measurements.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-3-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Get rid of KERNEL_STACK_OFFSET
Denys Vlasenko [Thu, 19 Mar 2015 17:17:46 +0000 (18:17 +0100)]
x86/asm/entry: Get rid of KERNEL_STACK_OFFSET

PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Change the THREAD_INFO() definition to not depend on KERNEL_STACK_O...
Denys Vlasenko [Thu, 19 Mar 2015 17:17:45 +0000 (18:17 +0100)]
x86/asm/entry/64: Change the THREAD_INFO() definition to not depend on KERNEL_STACK_OFFSET

This changes the THREAD_INFO() definition and all its callsites
so that they do not count stack position from
(top of stack - KERNEL_STACK_OFFSET), but from top of stack.

Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
- "calculate thread_info's address using information that
rsp is SIZEOF_PTREGS bytes below top of stack".

While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
"((off)-THREAD_SIZE)(reg)". The form without parentheses
falsely looks like we invoke THREAD_SIZE() macro.

Improve comment atop THREAD_INFO macro definition.

This patch does not change generated code (verified by objdump).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Fold syscall32_cpu_init() into its sole user
Denys Vlasenko [Sun, 22 Mar 2015 19:48:14 +0000 (20:48 +0100)]
x86/asm/entry/64: Fold syscall32_cpu_init() into its sole user

Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.

Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry/64: Fix incorrect comment
Denys Vlasenko [Mon, 23 Mar 2015 13:03:59 +0000 (14:03 +0100)]
x86/asm/entry/64: Fix incorrect comment

The recent old_rsp -> rsp_scratch rename also changed this
comment, but in this case "old_rsp" was not referring to
PER_CPU(old_rsp).

Fix this.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427115839-6397-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Replace some open-coded VM86 checks with v8086_mode() checks
Andy Lutomirski [Thu, 19 Mar 2015 01:33:35 +0000 (18:33 -0700)]
x86/asm/entry: Replace some open-coded VM86 checks with v8086_mode() checks

This allows us to remove some unnecessary ifdefs.  There should
be no change to the generated code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f7e00f0d668e253abf0bd8bf36491ac47bd761ff.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Remove user_mode_vm()
Andy Lutomirski [Thu, 19 Mar 2015 01:33:34 +0000 (18:33 -0700)]
x86/asm/entry: Remove user_mode_vm()

It has no callers anymore.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a594afd6a0bddb1311bd7c92a15201c87fbb8681.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'
Andy Lutomirski [Thu, 19 Mar 2015 01:33:33 +0000 (18:33 -0700)]
x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()'

user_mode_vm() and user_mode() are now the same.  Change all callers
of user_mode_vm() to user_mode().

The next patch will remove the definition of user_mode_vm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org
[ Merged to a more recent kernel. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode
Andy Lutomirski [Thu, 19 Mar 2015 01:33:32 +0000 (18:33 -0700)]
x86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode

user_mode() is now identical to user_mode_vm().  Subsequent patches
will change all callers of user_mode_vm() to user_mode() and then
delete user_mode_vm().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0dd03eacb5f0a2b5ba0240de25347a31b493c289.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Use user_mode_ignore_vm86() where appropriate
Andy Lutomirski [Thu, 19 Mar 2015 01:33:31 +0000 (18:33 -0700)]
x86/asm/entry: Use user_mode_ignore_vm86() where appropriate

A few of the user_mode() checks in traps.c are immediately after
explicit checks for vm86 mode.  Change them to user_mode_ignore_vm86().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0b324d5b75c3402be07f8d3c6245ed7f4995029e.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()
Andy Lutomirski [Thu, 19 Mar 2015 01:33:30 +0000 (18:33 -0700)]
x86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()

There's no point in checking the VM bit on 64-bit, and, since
we're explicitly checking it, we can use user_mode_ignore_vm86()
after the check.

While we're at it, rearrange the #ifdef slightly to make the code
flow a bit clearer.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Add user_mode_ignore_vm86()
Andy Lutomirski [Thu, 19 Mar 2015 01:33:29 +0000 (18:33 -0700)]
x86/asm/entry: Add user_mode_ignore_vm86()

user_mode() is dangerous and user_mode_vm() has a confusing name.

Add user_mode_ignore_vm86() (equivalent to current user_mode()).
We'll change the small number of legitimate users of user_mode()
to user_mode_ignore_vm86().

Inspired by grsec, although this works rather differently.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/202c56ca63823c338af8e2e54948dbe222da6343.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoMerge tag 'v4.0-rc5' into x86/asm, to resolve conflicts
Ingo Molnar [Mon, 23 Mar 2015 10:13:15 +0000 (11:13 +0100)]
Merge tag 'v4.0-rc5' into x86/asm, to resolve conflicts

Conflicts:
arch/x86/kernel/entry_64.S

Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry, perf: Fix incorrect TIF_IA32 check in code_segment_base()
Andy Lutomirski [Thu, 19 Mar 2015 01:33:28 +0000 (18:33 -0700)]
x86/asm/entry, perf: Fix incorrect TIF_IA32 check in code_segment_base()

We want to check whether user code is in 32-bit mode, not
whether the task is nominally 32-bit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/33e5107085ce347a8303560302b15c2cadd62c4c.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/mm/fault: Use TASK_SIZE_MAX in is_prefetch()
Andy Lutomirski [Thu, 19 Mar 2015 01:33:27 +0000 (18:33 -0700)]
x86/mm/fault: Use TASK_SIZE_MAX in is_prefetch()

This is slightly shorter and slightly faster.  It's also more
correct: the split between user and kernel addresses is
TASK_SIZE_MAX, regardless of ti->flags.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/09156b63bad90a327827003c9e53faa82ef4c56e.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agox86/asm/entry: Fix execve() and sigreturn() syscalls to always return via IRET
Brian Gerst [Sat, 21 Mar 2015 22:54:21 +0000 (18:54 -0400)]
x86/asm/entry: Fix execve() and sigreturn() syscalls to always return via IRET

Both the execve() and sigreturn() family of syscalls have the
ability to change registers in ways that may not be compatabile
with the syscall path they were called from.

In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
and some bits in eflags.

These syscalls have stubs that are hardcoded to jump to the IRET path,
and not return to the original syscall path.

The following commit:

   76f5df43cab5e76 ("Always allocate a complete "struct pt_regs" on the kernel stack")

recently changed this for some 32-bit compat syscalls, but introduced a bug where
execve from a 32-bit program to a 64-bit program would fail because it still returned
via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.

This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
that the IRET path is always taken on exit to userspace.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
[ Improved the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
4 years agoLinux 4.0-rc5 v4.0-rc5
Linus Torvalds [Sun, 22 Mar 2015 23:50:21 +0000 (16:50 -0700)]
Linux 4.0-rc5

4 years agoMerge tag 'md/4.0-rc4-fix' of git://neil.brown.name/md
Linus Torvalds [Sun, 22 Mar 2015 23:38:19 +0000 (16:38 -0700)]
Merge tag 'md/4.0-rc4-fix' of git://neil.brown.name/md

Pull bugfix for md from Neil Brown:
 "One fix for md in 4.0-rc4

  Regression in recent patch causes crash on error path"

* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
  md: fix problems with freeing private data after ->run failure.

4 years agoMerge tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 22 Mar 2015 19:07:47 +0000 (12:07 -0700)]
Merge tag 'driver-core-4.0-rc5' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two bugfixes for things reported.  One regression in kernfs,
  and another issue fixed in the LZ4 code that was fixed in the
  "upstream" codebase that solves a reported kernel crash

  Both have been in linux-next for a while"

* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  LZ4 : fix the data abort issue
  kernfs: handle poll correctly on 'direct_read' files.

4 years agoMerge tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 22 Mar 2015 19:03:14 +0000 (12:03 -0700)]
Merge tag 'char-misc-4.0-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are three fixes for 4.0-rc5 that revert 3 PCMCIA patches that
  were merged in 4.0-rc1 that cause regressions.  So let's revert them
  for now and they will be reworked and resent sometime in the future.

  All have been tested in linux-next for a while"

* tag 'char-misc-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "pcmcia: add a new resource manager for non ISA systems"
  Revert "pcmcia: fix incorrect bracketing on a test"
  Revert "pcmcia: add missing include for new pci resource handler"

4 years agoMerge tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 22 Mar 2015 18:59:02 +0000 (11:59 -0700)]
Merge tag 'staging-4.0-rc5' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are four small staging driver fixes, all for the vt6656 and
  vt6655 drivers, that resolve some reported issues with them.

  All of these patches have been in linux next for a while"

* tag 'staging-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  vt6655: Fix late setting of byRFType.
  vt6655: RFbSetPower fix missing rate RATE_12M
  staging: vt6656: vnt_rf_setpower: fix missing rate RATE_12M
  staging: vt6655: vnt_tx_packet fix dma_idx selection.

4 years agoMerge tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 22 Mar 2015 18:54:29 +0000 (11:54 -0700)]
Merge tag 'tty-4.0-rc5' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial driver fix from Greg KH:
 "Here's a single 8250 serial driver that fixes a reported deadlock with
  the serial console and the tty driver.

  It's been in linux-next for a while now"

* tag 'tty-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_dw: Fix deadlock in LCR workaround

4 years agoMerge tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 22 Mar 2015 18:33:55 +0000 (11:33 -0700)]
Merge tag 'usb-4.0-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB / PHY driver fixes from Greg KH:
 "Here's a number of USB and PHY driver fixes for 4.0-rc5.

  The largest thing here is a revert of a gadget function driver patch
  that removes 500 lines of code.  Other than that, it's a number of
  reported bugs fixes and new quirk/id entries.

  All have been in linux-next for a while"

* tag 'usb-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
  usb: common: otg-fsm: only signal connect after switching to peripheral
  uas: Add US_FL_NO_ATA_1X for Initio Corporation controllers / devices
  USB: ehci-atmel: rework clk handling
  MAINTAINERS: add entry for USB OTG FSM
  usb: chipidea: otg: add a_alt_hnp_support response for B device
  phy: omap-usb2: Fix missing clk_prepare call when using old dt name
  phy: ti/omap: Fix modalias
  phy: core: Fixup return value of phy_exit when !pm_runtime_enabled
  phy: miphy28lp: Convert to devm_kcalloc and fix wrong sizof
  phy: miphy365x: Convert to devm_kcalloc and fix wrong sizeof
  phy: twl4030-usb: Remove redundant assignment for twl->linkstat
  phy: exynos5-usbdrd: Fix off-by-one valid value checking for args->args[0]
  phy: Find the right match in devm_phy_destroy()
  phy: rockchip-usb: Fixup rockchip_usb_phy_power_on failure path
  phy: ti-pipe3: Simplify ti_pipe3_dpll_wait_lock implementation
  phy: samsung-usb2: Remove NULL terminating entry from phys array
  phy: hix5hd2-sata: Check return value of platform_get_resource
  phy: exynos-dp-video: Kill exynos_dp_video_phy_pwr_isol function
  Revert "usb: gadget: zero: Add support for interrupt EP"
  Revert "xhci: Clear the host side toggle manually when endpoint is 'soft reset'"
  ...

4 years agoMerge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Sat, 21 Mar 2015 20:05:37 +0000 (13:05 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave dmaengine fixes from Vinod Koul:
 "Four fixes for dw, pl08x, imx-sdma and at_hdmac driver.  Nothing
  unusual here, simple fixes to these drivers"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: pl08x: Define capabilities for generic capabilities reporting
  dmaengine: dw: append MODULE_ALIAS for platform driver
  dmaengine: imx-sdma: switch to dynamic context mode after script loaded
  dmaengine: at_hdmac: Fix calculation of the residual bytes

4 years agoMerge tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sat, 21 Mar 2015 19:51:36 +0000 (12:51 -0700)]
Merge tag 'pm+acpi-4.0-rc5' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are fixes for recent regressions (PCI/ACPI resources and at91
  RTC locking), a stable-candidate powercap RAPL driver fix and two ARM
  cpuidle fixes (one stable-candidate too).

  Specifics:

   - Revert a recent PCI commit related to IRQ resources management that
     introduced a regression for drivers attempting to bind to devices
     whose previous drivers did not balance pci_enable_device() and
     pci_disable_device() as expected (Rafael J Wysocki).

   - Fix a deadlock in at91_rtc_interrupt() introduced by a typo in a
     recent commit related to wakeup interrupt handling (Dan Carpenter).

   - Allow the power capping RAPL (Running-Average Power Limit) driver
     to use different energy units for domains within one CPU package
     which is necessary to handle Intel Haswell EP processors correctly
     (Jacob Pan).

   - Improve the cpuidle mvebu driver's handling of Armada XP SoCs by
     updating the target residency and exit latency numbers for those
     chips (Sebastien Rannou).

   - Prevent the cpuidle mvebu driver from calling cpu_pm_enter() twice
     in a row before cpu_pm_exit() is called on the same CPU which
     breaks the core's assumptions regarding the usage of those
     functions (Gregory Clement)"

* tag 'pm+acpi-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "x86/PCI: Refine the way to release PCI IRQ resources"
  rtc: at91rm9200: double locking bug in at91_rtc_interrupt()
  powercap / RAPL: handle domains with different energy units
  cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
  cpuidle: mvebu: Fix the CPU PM notifier usage

4 years agoMerge git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Sat, 21 Mar 2015 19:41:50 +0000 (12:41 -0700)]
Merge git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "A bunch of fixes across drivers:

  radeon:
     disable two ended allocation for now, it breaks some stuff

  amdkfd:
     misc fixes

  nouveau:
     fix irq loop problem, add basic support for GM206 (new hw)

  i915:
     fix some WARNs people were seeing

  exynos:
     fix some iommu interactions causing boot failures"

* git://people.freedesktop.org/~airlied/linux:
  drm/radeon: drop ttm two ended allocation
  drm/exynos: fix the initialization order in FIMD
  drm/exynos: fix typo config name correctly.
  drm/exynos: Check for NULL dereference of crtc
  drm/exynos: IS_ERR() vs NULL bug
  drm/exynos: remove unused files
  drm/i915: Make sure the primary plane is enabled before reading out the fb state
  drm/nouveau/bios: fix i2c table parsing for dcb 4.1
  drm/nouveau/device/gm100: Basic GM206 bring up (as copy of GM204)
  drm/nouveau/device: post write to NV_PMC_BOOT_1 when flipping endian switch
  drm/nouveau/gr/gf100: fix some accidental or'ing of buffer addresses
  drm/nouveau/fifo/nv04: remove the loop from the interrupt handler
  drm/radeon: Changing number of compute pipe lines
  drm/amdkfd: Fix SDMA queue init. in non-HWS mode
  drm/amdkfd: destroy mqd when destroying kernel queue
  drm/i915: Ensure plane->state->fb stays in sync with plane->fb

4 years agoMerge tag 'devicetree-fixes-for-4.0-part2' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 21 Mar 2015 19:33:01 +0000 (12:33 -0700)]
Merge tag 'devicetree-fixes-for-4.0-part2' of git://git./linux/kernel/git/robh/linux

Pull more DeviceTree fixes vfom Rob Herring:

 - revert setting stdout-path as preferred console.  This caused
   regressions in PowerMACs and other systems.

 - yet another fix for stdout-path option parsing.

 - fix error path handling in of_irq_parse_one

* tag 'devicetree-fixes-for-4.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  Revert "of: Fix premature bootconsole disable with 'stdout-path'"
  of: handle both '/' and ':' in path strings
  of: unittest: Add option string test case with longer path
  of/irq: Fix of_irq_parse_one() returned error codes

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Sat, 21 Mar 2015 18:24:38 +0000 (11:24 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
 "Here are current target-pending fixes for v4.0-rc5 code that have made
  their way into the queue over the last weeks.

  The fixes this round include:

   - Fix long-standing iser-target logout bug related to early
     conn_logout_comp completion, resulting in iscsi_conn use-after-tree
     OOpsen.  (Sagi + nab)

   - Fix long-standing tcm_fc bug in ft_invl_hw_context() failure
     handing for DDP hw offload.  (DanC)

   - Fix incorrect use of unprotected __transport_register_session() in
     tcm_qla2xxx + other single local se_node_acl fabrics.  (Bart)

   - Fix reference leak in target_submit_cmd() -> target_get_sess_cmd()
     for ack_kref=1 failure path.  (Bart)

   - Fix pSCSI backend ->get_device_type() statistics OOPs with
     un-configured device.  (Olaf + nab)

   - Fix virtual LUN=0 target_configure_device failure OOPs at modprobe
     time.  (Claudio + nab)

   - Fix FUA write false positive failure regression in v4.0-rc1 code.
     (Christophe Vu-Brugier + HCH)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0
  target: Fix virtual LUN=0 target_configure_device failure OOPs
  target/pscsi: Fix NULL pointer dereference in get_device_type
  tcm_fc: missing curly braces in ft_invl_hw_context()
  target: Fix reference leak in target_get_sess_cmd() error path
  loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session
  tcm_qla2xxx: Fix incorrect use of __transport_register_session
  iscsi-target: Avoid early conn_logout_comp for iser connections
  Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"
  target: Disallow changing of WRITE cache/FUA attrs after export

4 years agoMerge tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Sat, 21 Mar 2015 18:15:13 +0000 (11:15 -0700)]
Merge tag 'dm-4.0-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull devicemapper fixes from Mike Snitzer:
 "A handful of stable fixes for DM:
   - fix thin target to always zero-fill reads to unprovisioned blocks
   - fix to interlock device destruction's suspend from internal
     suspends
   - fix 2 snapshot exception store handover bugs
   - fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing"

* tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME
  dm snapshot: suspend merging snapshot when doing exception handover
  dm snapshot: suspend origin when doing exception handover
  dm: hold suspend_lock while suspending device during device deletion
  dm thin: fix to consistently zero-fill reads to unprovisioned blocks

4 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux...
Linus Torvalds [Sat, 21 Mar 2015 17:53:37 +0000 (10:53 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "Most of these are fixing extent reservation accounting, or corners
  with tree writeback during commit.

  Josef's set does add a test, which isn't strictly a fix, but it'll
  keep us from making this same mistake again"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix outstanding_extents accounting in DIO
  Btrfs: add sanity test for outstanding_extents accounting
  Btrfs: just free dummy extent buffers
  Btrfs: account merges/splits properly
  Btrfs: prepare block group cache before writing
  Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list)
  Btrfs: account for the correct number of extents for delalloc reservations
  Btrfs: fix merge delalloc logic
  Btrfs: fix comp_oper to get right order
  Btrfs: catch transaction abortion after waiting for it
  btrfs: fix sizeof format specifier in btrfs_check_super_valid()

4 years agoMerge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Sat, 21 Mar 2015 17:41:15 +0000 (10:41 -0700)]
Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux

Pull nfsd bufix from Bruce Fields:
 "This is a fix for a crash easily triggered by 4.1 activity to a server
  built with CONFIG_NFSD_PNFS.

  There are some more bugfixes queued up that I intend to pass along
  next week, but this is the most critical"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  Subject: nfsd: don't recursively call nfsd4_cb_layout_fail

4 years agoMerge tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs
Linus Torvalds [Sat, 21 Mar 2015 17:36:44 +0000 (10:36 -0700)]
Merge tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs

Pull UBI fix from Artem Bityutskiy:
 "This fixes a bug introduced during the v4.0 merge window where we
  forgot to put braces where they should be"

* tag 'upstream-4.0-rc5' of git://git.infradead.org/linux-ubifs:
  UBI: fix missing brace control flow

4 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Sat, 21 Mar 2015 17:24:10 +0000 (10:24 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - mm switching fix where the kernel pgd ends up in the user TTBR0 after
   returning from an EFI run-time services call

 - fix __GFP_ZERO handling for atomic pool and CMA DMA allocations (the
   generic code does get the gfp flags, so it's left with the arch code
   to memzero accordingly)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Honor __GFP_ZERO in dma allocations
  arm64: efi: don't restore TTBR0 if active_mm points at init_mm

4 years agoMerge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds [Sat, 21 Mar 2015 17:03:22 +0000 (10:03 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "Another few ARM fixes.  Fabrice fixed the L2 cache DT parsing to allow
  prefetch configuration to be specified even when the cache size
  parsing fails.

  Laura noticed that the setting of page attributes wasn't working for
  modules due to is_module_addr() always returning false.

  Marc Gonzalez (aka Mason) noticed a potential latent bug with the way
  we read one of the CPUID registers (where we could attempt to read a
  non-present CPUID register which may fault.)

  I've fixed an issue where 32-bit DMA masks were failing with memory
  which extended to the top of physical address space, and I've also
  added debugging output of the page tables when we hit a data access
  exception which we don't specifically handle - prompted by the lack of
  information in a bug report"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8313/1: Use read_cpuid_ext() macro instead of inline asm
  ARM: 8311/1: Don't use is_module_addr in setting page attributes
  ARM: 8310/1: l2c: Fix prefetch settings dt parsing
  ARM: dump pgd, pmd and pte states on unhandled data abort faults
  ARM: dma-api: fix off-by-one error in __dma_supported()

4 years agoMerge branches 'pm-cpuidle', 'powercap', 'irq-pm' and 'acpi-resources'
Rafael J. Wysocki [Fri, 20 Mar 2015 23:39:12 +0000 (00:39 +0100)]
Merge branches 'pm-cpuidle', 'powercap', 'irq-pm' and 'acpi-resources'

* pm-cpuidle:
  cpuidle: mvebu: Update cpuidle thresholds for Armada XP SOCs
  cpuidle: mvebu: Fix the CPU PM notifier usage

* powercap:
  powercap / RAPL: handle domains with different energy units

* irq-pm:
  rtc: at91rm9200: double locking bug in at91_rtc_interrupt()

* acpi-resources:
  Revert "x86/PCI: Refine the way to release PCI IRQ resources"

4 years agomd: fix problems with freeing private data after ->run failure.
NeilBrown [Fri, 13 Mar 2015 00:51:18 +0000 (11:51 +1100)]
md: fix problems with freeing private data after ->run failure.

If ->run() fails, it can either free the data structures it
allocated, or leave that task to ->free() which will be called
on failures.

However:
  md.c calls ->free() even if ->private_data is NULL, which
     causes problems in some personalities.
  raid0.c frees the data, but doesn't clear ->private_data,
     which will become a problem when we fix md.c

So better fix both these issues at once.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: 5aa61f427e4979be733e4847b9199ff9cc48a47e
URL: https://bugzilla.kernel.org/show_bug.cgi?id=94381
Signed-off-by: NeilBrown <neilb@suse.de>
4 years agoarm64: Honor __GFP_ZERO in dma allocations
Suzuki K. Poulose [Thu, 19 Mar 2015 18:17:09 +0000 (18:17 +0000)]
arm64: Honor __GFP_ZERO in dma allocations

Current implementation doesn't zero out the pages allocated.
Honor the __GFP_ZERO flag and zero out if set.

Cc: <stable@vger.kernel.org> # v3.14+
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoarm64: efi: don't restore TTBR0 if active_mm points at init_mm
Will Deacon [Thu, 19 Mar 2015 15:43:00 +0000 (15:43 +0000)]
arm64: efi: don't restore TTBR0 if active_mm points at init_mm

init_mm isn't a normal mm: it has swapper_pg_dir as its pgd (which
contains kernel mappings) and is used as the active_mm for the idle
thread.

When restoring the pgd after an EFI call, we write current->active_mm
into TTBR0. If the current task is actually the idle thread (e.g. when
initialising the EFI RTC before entering userspace), then the TLB can
erroneously populate itself with junk global entries as a result of
speculative table walks.

When we do eventually return to userspace, the task can end up hitting
these junk mappings leading to lockups, corruption or crashes.

This patch fixes the problem in the same way as the CPU suspend code by
ensuring that we never switch to the init_mm in efi_set_pgd and instead
point TTBR0 at the zero page. A check is also added to cpu_switch_mm to
BUG if we get passed swapper_pg_dir.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: f3cdfd239da5 ("arm64/efi: move SetVirtualAddressMap() to UEFI stub")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 years agoRevert "x86/PCI: Refine the way to release PCI IRQ resources"
Rafael J. Wysocki [Fri, 20 Mar 2015 13:56:19 +0000 (14:56 +0100)]
Revert "x86/PCI: Refine the way to release PCI IRQ resources"

Commit b4b55cda5874 (Refine the way to release PCI IRQ resources)
introduced a regression in the PCI IRQ resource management by causing
the IRQ resource of a device, established when pci_enabled_device()
is called on a fully disabled device, to be released when the driver
is unbound from the device, regardless of the enable_cnt.

This leads to the situation that an ill-behaved driver can now make a
device unusable to subsequent drivers by an imbalance in their use of
pci_enable/disable_device().  That is a serious problem for secondary
drivers like vfio-pci, which are innocent of the transgressions of
the previous driver.

Since the solution of this problem is not immediate and requires
further discussion, revert commit b4b55cda5874 and the issue it was
supposed to address (a bug related to xen-pciback) will be taken
care of in a different way going forward.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
4 years agoMerge tag 'drm-intel-fixes-2015-03-19' of git://anongit.freedesktop.org/drm-intel...
Dave Airlie [Fri, 20 Mar 2015 07:32:21 +0000 (17:32 +1000)]
Merge tag 'drm-intel-fixes-2015-03-19' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Backporting a couple of plane related fixes from drm-next to v4.0.

* tag 'drm-intel-fixes-2015-03-19' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Make sure the primary plane is enabled before reading out the fb state
  drm/i915: Ensure plane->state->fb stays in sync with plane->fb

4 years agoMerge tag 'drm-amdkfd-fixes-2015-03-19' of git://people.freedesktop.org/~gabbayo...
Dave Airlie [Fri, 20 Mar 2015 07:32:01 +0000 (17:32 +1000)]
Merge tag 'drm-amdkfd-fixes-2015-03-19' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes

- Fixing SDMA initialization when in non-HWS mode (debug mode)
- Memory leak fix when destroying kernel queue
- Fix number of available compute pipelines according to new firmware

* tag 'drm-amdkfd-fixes-2015-03-19' of git://people.freedesktop.org/~gabbayo/linux:
  drm/radeon: Changing number of compute pipe lines
  drm/amdkfd: Fix SDMA queue init. in non-HWS mode
  drm/amdkfd: destroy mqd when destroying kernel queue

4 years agotarget: do not reject FUA CDBs when write cache is enabled but emulate_write_cache...
Christophe Vu-Brugier [Thu, 19 Mar 2015 13:30:13 +0000 (14:30 +0100)]
target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0

A check that rejects a CDB with FUA bit set if no write cache is
emulated was added by the following commit:

  fde9f50 target: Add sanity checks for DPO/FUA bit usage

The condition is as follows:

  if (!dev->dev_attrib.emulate_fua_write ||
      !dev->dev_attrib.emulate_write_cache)

However, this check is wrong if the backend device supports WCE but
"emulate_write_cache" is disabled.

This patch uses se_dev_check_wce() (previously named
spc_check_dev_wce) to invoke transport->get_write_cache() if the
device has a write cache or check the "emulate_write_cache" attribute
otherwise.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agotarget: Fix virtual LUN=0 target_configure_device failure OOPs
Nicholas Bellinger [Thu, 5 Mar 2015 03:28:24 +0000 (03:28 +0000)]
target: Fix virtual LUN=0 target_configure_device failure OOPs

This patch fixes a NULL pointer dereference triggered by a late
target_configure_device() -> alloc_workqueue() failure that results
in target_free_device() being called with DF_CONFIGURED already set,
which subsequently OOPses in destroy_workqueue() code.

Currently this only happens at modprobe target_core_mod time when
core_dev_setup_virtual_lun0() -> target_configure_device() fails,
and the explicit target_free_device() gets called.

To address this bug originally introduced by commit 0fd97ccf45, go
ahead and move DF_CONFIGURED to end of target_configure_device()
code to handle this special failure case.

Reported-by: Claudio Fleiner <cmf@daterainc.com>
Cc: Claudio Fleiner <cmf@daterainc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v3.7+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agotarget/pscsi: Fix NULL pointer dereference in get_device_type
Nicholas Bellinger [Fri, 27 Feb 2015 11:54:13 +0000 (03:54 -0800)]
target/pscsi: Fix NULL pointer dereference in get_device_type

This patch fixes a NULL pointer dereference OOPs with pSCSI backends
within target_core_stat.c code.  The bug is caused by a configfs attr
read if no pscsi_dev_virt->pdv_sd has been configured.

Reported-by: Olaf Hering <olaf@aepfle.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agotcm_fc: missing curly braces in ft_invl_hw_context()
Dan Carpenter [Wed, 25 Feb 2015 13:21:03 +0000 (16:21 +0300)]
tcm_fc: missing curly braces in ft_invl_hw_context()

This patch adds a missing set of conditional check braces in
ft_invl_hw_context() originally introduced by commit dcd998ccd
when handling DDP failures in ft_recv_write_data() code.

 commit dcd998ccdbf74a7d8fe0f0a44e85da1ed5975946
 Author: Kiran Patil <kiran.patil@intel.com>
 Date:   Wed Aug 3 09:20:01 2011 +0000

    tcm_fc: Handle DDP/SW fc_frame_payload_get failures in ft_recv_write_data

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: <stable@vger.kernel.org> # 3.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agotarget: Fix reference leak in target_get_sess_cmd() error path
Bart Van Assche [Wed, 18 Feb 2015 14:33:58 +0000 (15:33 +0100)]
target: Fix reference leak in target_get_sess_cmd() error path

This patch fixes a se_cmd->cmd_kref leak buf when se_sess->sess_tearing_down
is true within target_get_sess_cmd() submission path code.

This se_cmd reference leak can occur during active session shutdown when
ack_kref=1 is passed by target_submit_cmd_[map_sgls,tmr]() callers.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agoloop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session
Bart Van Assche [Thu, 12 Feb 2015 10:48:49 +0000 (11:48 +0100)]
loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session

This patch changes loopback, usb-gadget, vhost-scsi and xen-scsiback
fabric code to invoke transport_register_session() instead of the
unprotected flavour, to ensure se_tpg->session_lock is taken when
adding new session list nodes to se_tpg->tpg_sess_list.

Note that since these four fabric drivers already hold their own
internal TPG mutexes when accessing se_tpg->tpg_sess_list, and
consist of a single se_session created through configfs attribute
access, no list corruption can currently occur.

So for correctness sake, go ahead and use the se_tpg->session_lock
protected version for these four fabric drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agotcm_qla2xxx: Fix incorrect use of __transport_register_session
Bart Van Assche [Fri, 20 Mar 2015 05:25:16 +0000 (22:25 -0700)]
tcm_qla2xxx: Fix incorrect use of __transport_register_session

This patch fixes the incorrect use of __transport_register_session()
in tcm_qla2xxx_check_initiator_node_acl() code, that does not perform
explicit se_tpg->session_lock when accessing se_tpg->tpg_sess_list
to add new se_sess nodes.

Given that tcm_qla2xxx_check_initiator_node_acl() is not called with
qla_hw->hardware_lock held for all accesses of ->tpg_sess_list, the
code should be using transport_register_session() instead.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: <stable@vger.kernel.org> # 3.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agoiscsi-target: Avoid early conn_logout_comp for iser connections
Nicholas Bellinger [Mon, 23 Feb 2015 08:57:51 +0000 (00:57 -0800)]
iscsi-target: Avoid early conn_logout_comp for iser connections

This patch fixes a iser specific logout bug where early complete()
of conn->conn_logout_comp in iscsit_close_connection() was causing
isert_wait4logout() to complete too soon, triggering a use after
free NULL pointer dereference of iscsi_conn memory.

The complete() was originally added for traditional iscsi-target
when a ISCSI_LOGOUT_OP failed in iscsi_target_rx_opcode(), but given
iser-target does not wait in logout failure, this special case needs
to be avoided.

Reported-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Slava Shwartsman <valyushash@gmail.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agoRevert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"
Nicholas Bellinger [Thu, 26 Feb 2015 06:56:37 +0000 (22:56 -0800)]
Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target"

This reverts commit 72859d91d93319c00a18c29f577e56bf73a8654a.

The original patch was wrong, iscsit_close_connection() still needs
to release iscsi_conn during both normal + exception IN_LOGOUT status
with ib_isert enabled.

The original OOPs is due to completing conn_logout_comp early within
iscsit_close_connection(), causing isert_wait4logout() to complete
instead of waiting for iscsit_logout_post_handler_*() to be called.

Reported-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agotarget: Disallow changing of WRITE cache/FUA attrs after export
Nicholas Bellinger [Mon, 23 Feb 2015 06:17:13 +0000 (22:17 -0800)]
target: Disallow changing of WRITE cache/FUA attrs after export

Now that incoming FUA=1 bit check is enforced for backends with FUA or
WCE disabled, go ahead and disallow the changing of related backend
attributes when active fabric exports exist.

This is required to avoid potential failures with existing initiator
LUN registrations that have been previously created with FUA=1.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
4 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Thu, 19 Mar 2015 23:43:10 +0000 (16:43 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:
 "An update to Synaptics driver that makes it usable with the 2015
  lineup from Lenovo"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Revert "Input: synaptics - use dmax in input_mt_assign_slots"
  Input: synaptics - remove X250 from the topbuttonpad list
  Input: synaptics - remove X1 Carbon 3rd gen from the topbuttonpad list
  Input: synaptics - re-route tracksticks buttons on the Lenovo 2015 series
  Input: synaptics - remove TOPBUTTONPAD property for Lenovos 2015
  Input: synaptics - retrieve the extended capabilities in query $10
  Input: synaptics - do not retrieve the board id on old firmwares
  Input: synaptics - handle spurious release of trackstick buttons
  Input: synaptics - fix middle button on Lenovo 2015 products
  Input: synaptics - skip quirks when post-2013 dimensions
  Input: synaptics - support min/max board id in min_max_pnpid_table
  Input: synaptics - remove obsolete min/max quirk for X240
  Input: synaptics - query min dimensions for fw v8.1
  Input: synaptics - log queried and quirked dimension values
  Input: synaptics - split synaptics_resolution(), query first

4 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi...
Linus Torvalds [Thu, 19 Mar 2015 23:36:24 +0000 (16:36 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "This fixes bugs in zero-copy splice to the fuse device"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: explicitly set /dev/fuse file's private_data
  fuse: set stolen page uptodate
  fuse: notify: don't move pages

4 years agoMerge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszere...
Linus Torvalds [Thu, 19 Mar 2015 23:27:36 +0000 (16:27 -0700)]
Merge branch 'overlayfs-next' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "This fixes minor issues with the multi-layer update in v4.0"

* 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: upper fs should not be R/O
  ovl: check lowerdir amount for non-upper mount
  ovl: print error message for invalid mount options

4 years agoMerge tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc
Linus Torvalds [Thu, 19 Mar 2015 23:18:30 +0000 (16:18 -0700)]
Merge tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC fix from Ulf Hansson:
 "MMC core: fix error path in mmc_pwrseq_simple_alloc()"

* tag 'mmc-v4.0-rc4' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: pwrseq_simple: fix error path in mmc_pwrseq_simple_alloc

4 years agoMerge tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Thu, 19 Mar 2015 22:52:28 +0000 (15:52 -0700)]
Merge tag 'pinctrl-v4.0-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Here is a slew of pin control fixes I've accumulated for the v4.0
  kernel.  Nothing special, just driver fixes (mainly embedded Intel it
  seems) and a misunderstanding regarding the stub functions was
  reverted:

   - Fix up consumer return values on pin control stubs.
   - Four patches fixing up the interrupt handling and sleep context
     save in the Baytrail driver.
   - Make default output directions work properly in the Cherryview
     driver.
   - Fix interrupt locking in the AT91 driver.
   - Fix setting interrupt generating lines as input in the sunxi
     driver"

* tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
  pinctrl: at91: move lock/unlock_as_irq calls into request/release
  pinctrl: update direction_output function of cherryview driver
  pinctrl: baytrail: Save pin context over system sleep
  pinctrl: baytrail: Rework interrupt handling
  pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode
  pinctrl: baytrail: Relax GPIO request rules
  Revert "pinctrl: consumer: use correct retval for placeholder functions"

4 years agoMerge tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next
Linus Torvalds [Thu, 19 Mar 2015 22:24:28 +0000 (15:24 -0700)]
Merge tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next

Pull two arch/nios2 fixes from Ley Foon Tan:
 - Remove ucontext.h from exported arch headers
 - nios2: mm: do not invoke OOM killer on kernel fault OOM

* tag 'nios2-fixes-v4.0-rc5' of git://git.rocketboards.org/linux-socfpga-next:
  nios2: mm: do not invoke OOM killer on kernel fault OOM
  nios2: Remove ucontext.h from exported arch headers

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
Linus Torvalds [Thu, 19 Mar 2015 20:16:49 +0000 (13:16 -0700)]
Merge git://git./linux/kernel/git/davem/ide

Pull IDE fix from David Miller:
 "Just one fix to convert a by-hand conversion of jiffies to msecs, from
  Nicholas McGuire"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
  ide_tape: convert jiffies with jiffies_to_msecs

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Thu, 19 Mar 2015 20:11:55 +0000 (13:11 -0700)]
Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

 1) Some command cases of semtimedop() not even handled due to miscoded
    comparison on sparc64.  From Rob Gardner.

 2) Due to two bugs, /proc/kcore wan't working properly on sparc.

 3) Make sure fatal traps stop all running cpus, from Dave Kleikamp.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Fix /proc/kcore
  sparc: semtimedop() unreachable due to comparison error
  sparc: io_64.h: Replace io function-link macros
  sparc64: fatal trap should stop all cpus
  arch: sparc: kernel: starfire.c: Remove unused function
  arch: sparc: kernel: traps_64.c: Remove some unused functions

4 years agoSubject: nfsd: don't recursively call nfsd4_cb_layout_fail
Christoph Hellwig [Thu, 5 Mar 2015 13:17:31 +0000 (14:17 +0100)]
Subject: nfsd: don't recursively call nfsd4_cb_layout_fail

Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS
layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
leading to stack overflows.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes:  c5c707f9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4layouts.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 3c1bfa1..1028a06 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)

  rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));

- nfsd4_cb_layout_fail(ls);
-
  printk(KERN_WARNING
  "nfsd: client %s failed to respond to layout recall. "
  "  Fencing..\n", addr_str);
--
1.9.1

4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 19 Mar 2015 18:19:44 +0000 (11:19 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix packet header offset calculation in _decode_session6(), from
    Hajime Tazaki.

 2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang.

 3) Be sure to clear state properly when scans fail in iwlwifi mvm code,
    from Luciano Coelho.

 4) iwlwifi tries to stop scans that aren't actually running, also from
    Luciano Coelho.

 5) mac80211 should drop mesh frames that are not encrypted, fix from
    Bob Copeland.

 6) Add new device ID to b43 wireless driver for BCM432228 chips, from
    Rafał Miłecki.

 7) Fix accidental addition of members after variable sized array in
    struct tc_u_hnode, from WANG Cong.

 8) Don't re-enable interrupts until after we call napi_complete() in
    ibmveth and WIZnet drivers, frm Yongbae Park.

 9) Fix regression in vlan tag handling of fec driver, from Fugang Duan.

10) If a network namespace change fails during rtnl_newlink(), we don't
    unwind the device registry properly.

11) Fix two TCP regressions, from Neal Cardwell:
  - Don't allow snd_cwnd_cnt to accumulate huge values due to missing
    test in tcp_cong_avoid_ai().
  - Restore CUBIC back to advancing cwnd by 1.5x packets per RTT.

12) Fix performance regression in xne-netback involving push TX
    notifications, from David Vrabel.

13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not
    dereference blindly.  From Willem de Bruijn.

14) Fix potential stack overflow in RDS protocol stack, from Arnd
    Bergmann.

15) VXLAN_VID_MASK used incorrectly in new remote checksum offload
    support of VXLAN driver.  Fix from Alexey Kodanev.

16) Fix too small netlink SKB allocation in inet_diag layer, from Eric
    Dumazet.

17) ieee80211_check_combinations() does not count interfaces correctly,
    from Andrei Otcheretianski.

18) Hardware feature determination in bxn2x driver references a piece of
    software state that actually isn't initialized yet, fix from Michal
    Schmidt.

19) inet_csk_wait_for_connect() needs a sched_annotate_sleep()
    annoation, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  Revert "net: cx82310_eth: use common match macro"
  net/mlx4_en: Set statistics bitmap at port init
  IB/mlx4: Saturate RoCE port PMA counters in case of overflow
  net/mlx4_en: Fix off-by-one in ethtool statistics display
  IB/mlx4: Verify net device validity on port change event
  act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome
  Revert "smc91x: retrieve IRQ and trigger flags in a modern way"
  inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()
  ip6_tunnel: fix error code when tunnel exists
  netdevice.h: fix ndo_bridge_* comments
  bnx2x: fix encapsulation features on 57710/57711
  mac80211: ignore CSA to same channel
  nl80211: ignore HT/VHT capabilities without QoS/WMM
  mac80211: ask for ECSA IE to be considered for beacon parse CRC
  mac80211: count interfaces correctly for combination checks
  isdn: icn: use strlcpy() when parsing setup options
  rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
  caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
  bridge: reset bridge mtu after deleting an interface
  can: kvaser_usb: Fix tx queue start/stop race conditions
  ...

4 years agofuse: explicitly set /dev/fuse file's private_data
Tom Van Braeckel [Mon, 12 Jan 2015 04:22:16 +0000 (05:22 +0100)]
fuse: explicitly set /dev/fuse file's private_data

The misc subsystem (which is used for /dev/fuse) initializes private_data to
point to the misc device when a driver has registered a custom open file
operation, and initializes it to NULL when a custom open file operation has
*not* been provided.

This subtle quirk is confusing, to the point where kernel code registers
*empty* file open operations to have private_data point to the misc device
structure. And it leads to bugs, where the addition or removal of a custom open
file operation surprisingly changes the initial contents of a file's
private_data structure.

So to simplify things in the misc subsystem, a patch [1] has been proposed to
*always* set the private_data to point to the misc device, instead of only
doing this when a custom open file operation has been registered.

But before this patch can be applied we need to modify drivers that make the
assumption that a misc device file's private_data is initialized to NULL
because they didn't register a custom open file operation, so they don't rely
on this assumption anymore. FUSE uses private_data to store the fuse_conn and
errors out if this is not initialized to NULL at mount time.

Hence, we now set a file's private_data to NULL explicitly, to be independent
of whatever value the misc subsystem initializes it to by default.

[1] https://lkml.org/lkml/2014/12/4/939

Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
4 years agoRevert "of: Fix premature bootconsole disable with 'stdout-path'"
Peter Hurley [Tue, 17 Mar 2015 20:46:33 +0000 (16:46 -0400)]
Revert "of: Fix premature bootconsole disable with 'stdout-path'"

This reverts commit 2fa645cb2703d9b3786d850db815414dfeefa51d.

The assumption that at least 1 preferred console will be registered
when the stdout-path property is set is invalid, which can result
in _no_ consoles.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Rob Herring <robh@kernel.org>