sfrench/cifs-2.6.git
6 years agoCIFS: SMBD: Upper layer performs SMB write via RDMA read through memory registration
Long Li [Thu, 23 Nov 2017 00:38:45 +0000 (17:38 -0700)]
CIFS: SMBD: Upper layer performs SMB write via RDMA read through memory registration

When sending I/O, if size is larger than rdma_readwrite_threshold we prepare
to send SMB write packet for a RDMA read via memory registration. The actual
I/O is done by remote peer through local RDMA hardware. Modify the relevant
fields in the packet accordingly, and append a smbd_buffer_descriptor_v1 to
the end of the SMB write packet.

On write I/O finish, deregister the memory region if this was for a RDMA read.
If remote invalidation is not used, the call to smbd_deregister_mr will do
local invalidation and possibly wait. Memory region is normally deregistered
in MID callback as soon as it's used. There are situations where the MID may
not be created on I/O failure, under which memory region is deregistered when
write data context is released.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Implement RDMA memory registration
Long Li [Thu, 23 Nov 2017 00:38:44 +0000 (17:38 -0700)]
CIFS: SMBD: Implement RDMA memory registration

Memory registration is used for transferring payload via RDMA read or write.
After I/O is done, memory registrations are recovered and reused. This
process can be time consuming and is done in a work queue.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
6 years agoCIFS: SMBD: Upper layer sends data via RDMA send
Long Li [Thu, 23 Nov 2017 00:38:43 +0000 (17:38 -0700)]
CIFS: SMBD: Upper layer sends data via RDMA send

With SMB Direct connected, use it for sending data via RDMA send.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Implement function to send data via RDMA send
Long Li [Thu, 23 Nov 2017 00:38:42 +0000 (17:38 -0700)]
CIFS: SMBD: Implement function to send data via RDMA send

The transport doesn't maintain send buffers or send queue for transferring
payload via RDMA send. There is no data copy in the transport on send.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Upper layer receives data via RDMA receive
Long Li [Thu, 23 Nov 2017 00:38:41 +0000 (17:38 -0700)]
CIFS: SMBD: Upper layer receives data via RDMA receive

With SMB Direct connected, use it for receiving data via RDMA receive.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Implement function to receive data via RDMA receive
Long Li [Thu, 23 Nov 2017 00:38:40 +0000 (17:38 -0700)]
CIFS: SMBD: Implement function to receive data via RDMA receive

On the receive path, the transport maintains receive buffers and a reassembly
queue for transferring payload via RDMA recv. There is data copy in the
transport on recv when it copies the payload to upper layer.

The transport recognizes the RFC1002 header length use in the SMB
upper layer payloads in CIFS. Because this length is mainly used for TCP and
not applicable to RDMA, it is handled as a out-of-band information and is
never sent over the wire, and the trasnport behaves like TCP to upper layer
by processing and exposing the length correctly on data payloads.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Set SMB Direct maximum read or write size for I/O
Long Li [Thu, 23 Nov 2017 00:38:39 +0000 (17:38 -0700)]
CIFS: SMBD: Set SMB Direct maximum read or write size for I/O

When connecting over SMB Direct, the transport negotiates its maximum I/O sizes
with the server and determines how to choose to do RDMA send/recv vs
read/write. Expose these maximum I/O sizes to upper layer so we will get the
correct sized payloads.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Upper layer destroys SMB Direct session on shutdown or umount
Long Li [Thu, 23 Nov 2017 00:38:38 +0000 (17:38 -0700)]
CIFS: SMBD: Upper layer destroys SMB Direct session on shutdown or umount

When upper layer wants to umount, make it call shutdown on transport when
SMB Direct is used.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Implement function to destroy a SMB Direct connection
Long Li [Thu, 23 Nov 2017 00:38:37 +0000 (17:38 -0700)]
CIFS: SMBD: Implement function to destroy a SMB Direct connection

Add function to tear down a SMB Direct connection. This is used by upper layer
to free all SMB Direct connection and transport resources.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Upper layer reconnects to SMB Direct session
Long Li [Thu, 23 Nov 2017 00:38:36 +0000 (17:38 -0700)]
CIFS: SMBD: Upper layer reconnects to SMB Direct session

Do a reconnect on SMB Direct when it is used as the connection. Reconnect can
happen for many reasons and it's mostly the decision of SMB2 upper layer.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agoCIFS: SMBD: Implement function to reconnect to a SMB Direct transport
Long Li [Thu, 23 Nov 2017 00:38:35 +0000 (17:38 -0700)]
CIFS: SMBD: Implement function to reconnect to a SMB Direct transport

Add function to implement a reconnect to SMB Direct. This involves tearing down
the current connection and establishing/negotiating a new connection.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agoCIFS: SMBD: Upper layer connects to SMBDirect session
Long Li [Thu, 23 Nov 2017 00:38:34 +0000 (17:38 -0700)]
CIFS: SMBD: Upper layer connects to SMBDirect session

When "rdma" is specified in the mount option, make CIFS connect to
SMB Direct.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
6 years agocifs: fix build errors for SMB_DIRECT
Randy Dunlap [Tue, 2 Jan 2018 17:43:10 +0000 (09:43 -0800)]
cifs: fix build errors for SMB_DIRECT

Prevent build errors when CIFS=y and INFINIBAND=m.

fs/cifs/smbdirect.o: In function `smbd_qp_async_error_upcall':
smbdirect.c:(.text+0x28c): undefined reference to `ib_event_msg'
fs/cifs/smbdirect.o: In function `smbd_destroy_rdma_work':
smbdirect.c:(.text+0xfde): undefined reference to `ib_drain_qp'
smbdirect.c:(.text+0xfea): undefined reference to `rdma_destroy_qp'
smbdirect.c:(.text+0x12a0): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x12ac): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x12b8): undefined reference to `ib_dealloc_pd'
smbdirect.c:(.text+0x12c4): undefined reference to `rdma_destroy_id'
fs/cifs/smbdirect.o: In function `_smbd_get_connection':
smbdirect.c:(.text+0x168c): undefined reference to `rdma_create_id'
smbdirect.c:(.text+0x1713): undefined reference to `rdma_resolve_addr'
smbdirect.c:(.text+0x1780): undefined reference to `rdma_resolve_route'
smbdirect.c:(.text+0x17e3): undefined reference to `rdma_destroy_id'
smbdirect.c:(.text+0x183d): undefined reference to `rdma_destroy_id'
smbdirect.c:(.text+0x199d): undefined reference to `ib_alloc_cq'
smbdirect.c:(.text+0x19d9): undefined reference to `ib_alloc_cq'
smbdirect.c:(.text+0x1a89): undefined reference to `rdma_create_qp'
smbdirect.c:(.text+0x1b3c): undefined reference to `rdma_connect'
smbdirect.c:(.text+0x2538): undefined reference to `rdma_destroy_qp'
smbdirect.c:(.text+0x2549): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x255a): undefined reference to `ib_free_cq'
smbdirect.c:(.text+0x2563): undefined reference to `ib_dealloc_pd'
smbdirect.c:(.text+0x256c): undefined reference to `rdma_destroy_id'
smbdirect.c:(.text+0x25f0): undefined reference to `__ib_alloc_pd'
smbdirect.c:(.text+0x26bb): undefined reference to `rdma_disconnect'
fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
smbdirect.c:(.text+0x62): undefined reference to `rdma_disconnect'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org (moderated for non-subscribers)
Signed-off-by: Steve French <smfrench@gmail.com>
6 years agocifs: Fix missing put_xid in cifs_file_strict_mmap
Matthew Wilcox [Fri, 15 Dec 2017 20:48:32 +0000 (12:48 -0800)]
cifs: Fix missing put_xid in cifs_file_strict_mmap

If cifs_zap_mapping() returned an error, we would return without putting
the xid that we got earlier.  Restructure cifs_file_strict_mmap() and
cifs_file_mmap() to be more similar to each other and have a single
point of return that always puts the xid.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
6 years agoCIFS: SMBD: export protocol initial values
Long Li [Tue, 7 Nov 2017 08:54:58 +0000 (01:54 -0700)]
CIFS: SMBD: export protocol initial values

For use-configurable SMB Direct protocol values, export them to /proc/fs/cifs.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agoCIFS: SMBD: Implement function to create a SMB Direct connection
Long Li [Sat, 18 Nov 2017 01:26:52 +0000 (17:26 -0800)]
CIFS: SMBD: Implement function to create a SMB Direct connection

The upper layer calls this function to connect to peer through SMB Direct.
Each SMB Direct connection is based on a RDMA RC Queue Pair.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agoCIFS: SMBD: Establish SMB Direct connection
Long Li [Sun, 5 Nov 2017 01:17:24 +0000 (18:17 -0700)]
CIFS: SMBD: Establish SMB Direct connection

Add code to implement the core functions to establish a SMB Direct connection.

1. Establish an RDMA connection to SMB server.
2. Negotiate and setup SMB Direct protocol.
3. Implement idle connection timer and credit management.

SMB Direct is enabled by setting CONFIG_CIFS_SMB_DIRECT.

Add to Makefile to enable building SMB Direct.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agoCIFS: SMBD: Add SMB Direct protocol initial values and constants
Long Li [Tue, 7 Nov 2017 08:54:56 +0000 (01:54 -0700)]
CIFS: SMBD: Add SMB Direct protocol initial values and constants

To prepare for protocol implementation, add constants and user-configurable
values for the SMB Direct protocol.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agoCIFS: SMBD: Add rdma mount option
Long Li [Tue, 7 Nov 2017 08:54:55 +0000 (01:54 -0700)]
CIFS: SMBD: Add rdma mount option

Add "rdma" to CIFS mount options to connect to SMB Direct.
Add checks to validate this is used on SMB 3.X dialects.

To connect to SMBDirect, use "mount.cifs -o rdma,vers=3.x".
At the time of this patch, 3.x can be 3.0, 3.02 or 3.1.1.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
6 years agoCIFS: SMBD: Introduce kernel config option CONFIG_CIFS_SMB_DIRECT
Long Li [Tue, 7 Nov 2017 08:54:54 +0000 (01:54 -0700)]
CIFS: SMBD: Introduce kernel config option CONFIG_CIFS_SMB_DIRECT

Build SMB Direct code when this option is set.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
6 years agoCIFS: SMBD: Add parameter rdata to smb2_new_read_req
Long Li [Tue, 7 Nov 2017 08:54:53 +0000 (01:54 -0700)]
CIFS: SMBD: Add parameter rdata to smb2_new_read_req

This patch is for preparing upper layer for doing SMB read via RDMA write.

When we assemble the SMB read packet header, we need to know the I/O layout
if this request is to use a RDMA write. rdata has all the information we need
for memory registration. Add rdata to smb2_new_read_req.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
6 years agocifs: avoid a kmalloc in smb2_send_recv/SendReceive2 for the common case
Ronnie Sahlberg [Tue, 21 Nov 2017 04:08:07 +0000 (15:08 +1100)]
cifs: avoid a kmalloc in smb2_send_recv/SendReceive2 for the common case

In both functions, use an array of 8 (arbitrary but should be big enough
for all current uses) iov and avoid having to kmalloc the array
for the common case.

If 8 is too small, then fall back to the original behaviour and use
kmalloc/kfree.

This should not change any behaviour but should save us a tiny amount of
cpu cycles.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove small_smb2_init
Ronnie Sahlberg [Tue, 21 Nov 2017 00:04:42 +0000 (11:04 +1100)]
cifs: remove small_smb2_init

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_lease_ack
Ronnie Sahlberg [Tue, 21 Nov 2017 00:04:37 +0000 (11:04 +1100)]
cifs: remove rfc1002 header from smb2_lease_ack

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove unused variable from SMB2_read
Ronnie Sahlberg [Mon, 20 Nov 2017 22:57:45 +0000 (09:57 +1100)]
cifs: remove unused variable from SMB2_read

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
6 years agocifs: remove rfc1002 header from smb2_oplock_break we get from server
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:43 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_oplock_break we get from server

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_query_info_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:46 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_query_info_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_query_directory_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:45 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_query_directory_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_set_info_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:44 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_set_info_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2 read/write requests
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:41 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2 read/write requests

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_lock_req
Ronnie Sahlberg [Mon, 20 Nov 2017 23:07:27 +0000 (10:07 +1100)]
cifs: remove rfc1002 header from smb2_lock_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
6 years agocifs: remove rfc1002 header from smb2_flush_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:39 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_flush_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_create_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:38 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_create_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
6 years agocifs: remove rfc1002 header from smb2_sess_setup_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:36 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_sess_setup_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_tree_connect_req
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:23 +0000 (12:14 +1100)]
cifs: remove rfc1002 header from smb2_tree_connect_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_echo_req
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:21 +0000 (12:14 +1100)]
cifs: remove rfc1002 header from smb2_echo_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_ioctl_req
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:20 +0000 (12:14 +1100)]
cifs: remove rfc1002 header from smb2_ioctl_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_close_req
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:19 +0000 (12:14 +1100)]
cifs: remove rfc1002 header from smb2_close_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_tree_disconnect_req
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:18 +0000 (12:14 +1100)]
cifs: remove rfc1002 header from smb2_tree_disconnect_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_logoff_req
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:17 +0000 (12:14 +1100)]
cifs: remove rfc1002 header from smb2_logoff_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
6 years agocifs: remove rfc1002 header from smb2_negotiate_req
Ronnie Sahlberg [Mon, 20 Nov 2017 00:24:30 +0000 (11:24 +1100)]
cifs: remove rfc1002 header from smb2_negotiate_req

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
6 years agocifs: Add smb2_send_recv
Ronnie Sahlberg [Thu, 9 Nov 2017 01:14:15 +0000 (12:14 +1100)]
cifs: Add smb2_send_recv

This function is similar to SendReceive2 except it does not expect
a 4 byte rfc1002 length header in the first io vector.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 25 Jan 2018 01:24:30 +0000 (17:24 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Avoid negative netdev refcount in error flow of xfrm state add, from
    Aviad Yehezkel.

 2) Fix tcpdump decoding of IPSEC decap'd frames by filling in the
    ethernet header protocol field in xfrm{4,6}_mode_tunnel_input().
    From Yossi Kuperman.

 3) Fix a syzbot triggered skb_under_panic in pppoe having to do with
    failing to allocate an appropriate amount of headroom. From
    Guillaume Nault.

 4) Fix memory leak in vmxnet3 driver, from Neil Horman.

 5) Cure out-of-bounds packet memory access in em_nbyte EMATCH module,
    from Wolfgang Bumiller.

 6) Restrict what kinds of sockets can be bound to the KCM multiplexer
    and also disallow when another layer has attached to the socket and
    made use of sk_user_data. From Tom Herbert.

 7) Fix use before init of IOTLB in vhost code, from Jason Wang.

 8) Correct STACR register write bit definition in IBM emac driver, from
    Ivan Mikhaylov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net/ibm/emac: wrong bit is used for STA control register write
  net/ibm/emac: add 8192 rx/tx fifo size
  vhost: do not try to access device IOTLB when not initialized
  vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
  i40e: flower: check if TC offload is enabled on a netdev
  qed: Free reserved MR tid
  qed: Remove reserveration of dpi for kernel
  kcm: Check if sk_user_data already set in kcm_attach
  kcm: Only allow TCP sockets to be attached to a KCM mux
  net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
  net: sched: em_nbyte: don't add the data offset twice
  mlxsw: spectrum_router: Don't log an error on missing neighbor
  vmxnet3: repair memory leak
  ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
  pppoe: take ->needed_headroom of lower device into account on xmit
  xfrm: fix boolean assignment in xfrm_get_type_offload
  xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
  xfrm: fix error flow in case of add state fails
  xfrm: Add SA to hardware at the end of xfrm_state_construct()

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Wed, 24 Jan 2018 23:49:02 +0000 (15:49 -0800)]
Merge git://git./linux/kernel/git/davem/sparc

Pull sparc bugfix from David Miller:
 "Sparc Makefile typo fix"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64

6 years agonet/ibm/emac: wrong bit is used for STA control register write
Ivan Mikhaylov [Wed, 24 Jan 2018 12:53:25 +0000 (15:53 +0300)]
net/ibm/emac: wrong bit is used for STA control register write

STA control register has areas of mode and opcodes for opeations. 18 bit is
using for mode selection, where 0 is old MIO/MDIO access method and 1 is
indirect access mode. 19-20 bits are using for setting up read/write
operation(STA opcodes). In current state 'read' is set into old MIO/MDIO mode
with 19 bit and write operation is set into 18 bit which is mode selection,
not a write operation. To correlate write with read we set it into 20 bit.
All those bit operations are MSB 0 based.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ibm/emac: add 8192 rx/tx fifo size
Ivan Mikhaylov [Wed, 24 Jan 2018 12:53:24 +0000 (15:53 +0300)]
net/ibm/emac: add 8192 rx/tx fifo size

emac4syn chips has availability to use 8192 rx/tx fifo buffer sizes,
in current state if we set it up in dts 8192 as example, we will get
only 2048 which may impact on network speed.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovhost: do not try to access device IOTLB when not initialized
Jason Wang [Tue, 23 Jan 2018 09:27:26 +0000 (17:27 +0800)]
vhost: do not try to access device IOTLB when not initialized

The code will try to access dev->iotlb when processing
VHOST_IOTLB_INVALIDATE even if it was not initialized which may lead
to NULL pointer dereference. Fixes this by check dev->iotlb before.

Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovhost: use mutex_lock_nested() in vhost_dev_lock_vqs()
Jason Wang [Tue, 23 Jan 2018 09:27:25 +0000 (17:27 +0800)]
vhost: use mutex_lock_nested() in vhost_dev_lock_vqs()

We used to call mutex_lock() in vhost_dev_lock_vqs() which tries to
hold mutexes of all virtqueues. This may confuse lockdep to report a
possible deadlock because of trying to hold locks belong to same
class. Switch to use mutex_lock_nested() to avoid false positive.

Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Reported-by: syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi40e: flower: check if TC offload is enabled on a netdev
Jakub Kicinski [Tue, 23 Jan 2018 08:08:40 +0000 (00:08 -0800)]
i40e: flower: check if TC offload is enabled on a netdev

Since TC block changes drivers are required to check if
the TC hw offload flag is set on the interface themselves.

Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower")
Fixes: 44ae12a768b7 ("net: sched: move the can_offload check from binding phase to rule insertion phase")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64
Corentin Labbe [Tue, 23 Jan 2018 14:33:14 +0000 (14:33 +0000)]
sparc64: fix typo in CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64

This patch fixes the typo CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64

Fixes: 81658ad0d923 ("sparc64: Add CAMELLIA driver making use of the new camellia opcodes.")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'qed-rdma-bug-fixes'
David S. Miller [Wed, 24 Jan 2018 21:44:21 +0000 (16:44 -0500)]
Merge branch 'qed-rdma-bug-fixes'

Michal Kalderon says:

====================
qed: rdma bug fixes

This patch contains two small bug fixes related to RDMA.
Both related to resource reservations.
====================

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Free reserved MR tid
Michal Kalderon [Tue, 23 Jan 2018 09:33:47 +0000 (11:33 +0200)]
qed: Free reserved MR tid

A tid was allocated for reserved MR during initialization but
not freed. This lead to an annoying output message during
rdma unload flow.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Remove reserveration of dpi for kernel
Michal Kalderon [Tue, 23 Jan 2018 09:33:46 +0000 (11:33 +0200)]
qed: Remove reserveration of dpi for kernel

Double reservation for kernel dedicated dpi was performed.
Once in the core module and once in qedr.
Remove the reservation from core.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'kcm-fix-two-syzcaller-issues'
David S. Miller [Wed, 24 Jan 2018 20:54:31 +0000 (15:54 -0500)]
Merge branch 'kcm-fix-two-syzcaller-issues'

Tom Herbert says:

====================
kcm: fix two syzcaller issues

In this patch set:

- Don't allow attaching non-TCP or listener sockets to a KCM mux.
- In kcm_attach Check if sk_user_data is already set. This is
  under lock to avoid race conditions. More work is need to make
  all of the users of sk_user_data to use the same locking.

- v2
  Remove unncessary check for not PF_KCM in kcm_attach (suggested by
  Guillaume Nault)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agokcm: Check if sk_user_data already set in kcm_attach
Tom Herbert [Wed, 24 Jan 2018 20:35:41 +0000 (12:35 -0800)]
kcm: Check if sk_user_data already set in kcm_attach

This is needed to prevent sk_user_data being overwritten.
The check is done under the callback lock. This should prevent
a socket from being attached twice to a KCM mux. It also prevents
a socket from being attached for other use cases of sk_user_data
as long as the other cases set sk_user_data under the lock.
Followup work is needed to unify all the use cases of sk_user_data
to use the same locking.

Reported-by: syzbot+114b15f2be420a8886c3@syzkaller.appspotmail.com
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Tom Herbert <tom@quantonium.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agokcm: Only allow TCP sockets to be attached to a KCM mux
Tom Herbert [Wed, 24 Jan 2018 20:35:40 +0000 (12:35 -0800)]
kcm: Only allow TCP sockets to be attached to a KCM mux

TCP sockets for IPv4 and IPv6 that are not listeners or in closed
stated are allowed to be attached to a KCM mux.

Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+8865eaff7f9acd593945@syzkaller.appspotmail.com
Signed-off-by: Tom Herbert <tom@quantonium.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMAINTAINERS: update email address for James Morris
James Morris [Wed, 24 Jan 2018 20:53:57 +0000 (07:53 +1100)]
MAINTAINERS: update email address for James Morris

Update my email address.

Signed-off-by: James Morris <jmorris@namei.org>
6 years agonet: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr
Wolfgang Bumiller [Thu, 18 Jan 2018 10:32:36 +0000 (11:32 +0100)]
net: sched: fix TCF_LAYER_LINK case in tcf_get_base_ptr

TCF_LAYER_LINK and TCF_LAYER_NETWORK returned the same pointer as
skb->data points to the network header.
Use skb_mac_header instead.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: em_nbyte: don't add the data offset twice
Wolfgang Bumiller [Thu, 18 Jan 2018 10:32:35 +0000 (11:32 +0100)]
net: sched: em_nbyte: don't add the data offset twice

'ptr' is shifted by the offset and then validated,
the memcmp should not add it a second time.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 24 Jan 2018 18:08:16 +0000 (10:08 -0800)]
Merge tag 'trace-v4.15-rc9' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "With the new ORC unwinder, ftrace stack tracing became disfunctional.

  One was that ORC didn't know how to handle the ftrace callbacks in
  general (which Josh fixed).

  The other was that ORC would just bail if it hit a dynamically
  allocated trampoline. Which means all ftrace stack tracing that
  happens from the function tracer would produce no results (that
  includes killing the max stack size tracer). I added a check to the
  ORC unwinder to see if the trampoline belonged to ftrace, and if it
  did, use the orc entry of the static trampoline that was used to
  create the dynamic one (it would be identical).

  Finally, I noticed that the skip values of the stack tracing were out
  of whack. I went through and fixed them up"

* tag 'trace-v4.15-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Update stack trace skipping for ORC unwinder
  ftrace, orc, x86: Handle ftrace dynamically allocated trampolines
  x86/ftrace: Fix ORC unwinding from ftrace handlers

6 years agoMAINTAINERS: clarify that only verified bugs should be submitted to security@
Willy Tarreau [Thu, 4 Jan 2018 13:31:25 +0000 (14:31 +0100)]
MAINTAINERS: clarify that only verified bugs should be submitted to security@

We're seeing a raise of automated reports from testing tools and reports
about address leaks that are not really exploitable as-is, many of which
do not represent an immediate risk justifying to work in closed places.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoRevert "module: Add retpoline tag to VERMAGIC"
Greg Kroah-Hartman [Wed, 24 Jan 2018 14:28:17 +0000 (15:28 +0100)]
Revert "module: Add retpoline tag to VERMAGIC"

This reverts commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12.

Turns out distros do not want to make retpoline as part of their "ABI",
so this patch should not have been merged.  Sorry Andi, this was my
fault, I suggested it when your original patch was the "correct" way of
doing this instead.

Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 6cfb521ac0d5 ("module: Add retpoline tag to VERMAGIC")
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomlxsw: spectrum_router: Don't log an error on missing neighbor
Yuval Mintz [Wed, 24 Jan 2018 09:02:09 +0000 (10:02 +0100)]
mlxsw: spectrum_router: Don't log an error on missing neighbor

Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Wed, 24 Jan 2018 15:32:29 +0000 (10:32 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2018-01-24

1) Only offloads SAs after they are fully initialized.
   Otherwise a NIC may receive packets on a SA we can
   not yet handle in the stack.
   From Yossi Kuperman.

2) Fix negative refcount in case of a failing offload.
   From Aviad Yehezkel.

3) Fix inner IP ptoro version when decapsulating
   from interaddress family tunnels.
   From Yossi Kuperman.

4) Use true or false for boolean variables instead of an
   integer value in xfrm_get_type_offload.
   From Gustavo A. R. Silva.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovmxnet3: repair memory leak
Neil Horman [Mon, 22 Jan 2018 21:06:37 +0000 (16:06 -0500)]
vmxnet3: repair memory leak

with the introduction of commit
b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info
is improperly handled.  While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when  the mtu of a vmxnet3 interface is changed
frequently.

Fix is pretty straight forward, just move the NULL set to after the
free.

Tested by myself with successful results

Applies to net, and should likely be queued for stable, please

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
Ben Hutchings [Mon, 22 Jan 2018 20:06:42 +0000 (20:06 +0000)]
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL

Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agopppoe: take ->needed_headroom of lower device into account on xmit
Guillaume Nault [Mon, 22 Jan 2018 17:06:37 +0000 (18:06 +0100)]
pppoe: take ->needed_headroom of lower device into account on xmit

In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").

But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.

This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.

skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  skb_under_panic net/core/skbuff.c:114 [inline]
  skb_push+0xce/0xf0 net/core/skbuff.c:1714
  ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
  dev_hard_header include/linux/netdevice.h:2723 [inline]
  pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  sock_write_iter+0x31a/0x5d0 net/socket.c:909
  call_write_iter include/linux/fs.h:1775 [inline]
  do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
  do_iter_write+0x154/0x540 fs/read_write.c:932
  vfs_writev+0x18a/0x340 fs/read_write.c:977
  do_writev+0xfc/0x2a0 fs/read_write.c:1012
  SYSC_writev fs/read_write.c:1085 [inline]
  SyS_writev+0x27/0x30 fs/read_write.c:1082
  entry_SYSCALL_64_fastpath+0x29/0xa0

Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.

Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().

Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotracing: Update stack trace skipping for ORC unwinder
Steven Rostedt (VMware) [Tue, 23 Jan 2018 18:25:04 +0000 (13:25 -0500)]
tracing: Update stack trace skipping for ORC unwinder

With the addition of ORC unwinder and FRAME POINTER unwinder, the stack
trace skipping requirements have changed.

I went through the tracing stack trace dumps with ORC and with frame
pointers and recalculated the proper values.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
6 years agoftrace, orc, x86: Handle ftrace dynamically allocated trampolines
Steven Rostedt (VMware) [Tue, 23 Jan 2018 03:32:51 +0000 (22:32 -0500)]
ftrace, orc, x86: Handle ftrace dynamically allocated trampolines

The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.

Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
6 years agoMerge tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Tue, 23 Jan 2018 20:45:40 +0000 (12:45 -0800)]
Merge tag 'pci-v4.15-fixes-3' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "Fix AMD regression due to not re-enabling the big window on resume
  (Christian König)"

* tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Enable AMD 64-bit window on resume

6 years agox86/ftrace: Fix ORC unwinding from ftrace handlers
Josh Poimboeuf [Tue, 23 Jan 2018 04:07:46 +0000 (22:07 -0600)]
x86/ftrace: Fix ORC unwinding from ftrace handlers

Steven Rostedt discovered that the ftrace stack tracer is broken when
it's used with the ORC unwinder.  The problem is that objtool is
instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't
generate any ORC data for it.

Fix it by making the asm code objtool-friendly:

- Objtool doesn't like the fact that save_mcount_regs pushes RBP at the
  beginning, but it's never restored (directly, at least).  So just skip
  the original RBP push, which is only needed for frame pointers anyway.

- Annotate some functions as normal callable functions with
  ENTRY/ENDPROC.

- Add an empty unwind hint to return_to_handler().  The return address
  isn't on the stack, so there's nothing ORC can do there.  It will just
  punt in the unlikely case it tries to unwind from that code.

With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile
annotation so objtool can read the file.

Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 23 Jan 2018 16:52:55 +0000 (08:52 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix divide by zero in mlx5, from Talut Batheesh.

 2) Guard against invalid GSO packets coming from untrusted guests and
    arriving in qdisc_pkt_len_init(), from Eric Dumazet.

 3) Similarly add such protection to the various protocol GSO handlers.
    From Willem de Bruijn.

 4) Fix regression added to IGMP source address checking for IGMPv3
    reports, from Felix Feitkau.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  tls: Correct length of scatterlist in tls_sw_sendpage
  be2net: restore properly promisc mode after queues reconfiguration
  net: igmp: fix source address check for IGMPv3 reports
  gso: validate gso_type in GSO handlers
  net: qdisc_pkt_len_init() should be more robust
  ibmvnic: Allocate and request vpd in init_resources
  ibmvnic: Revert to previous mtu when unsupported value requested
  ibmvnic: Modify buffer size and number of queues on failover
  rds: tcp: compute m_ack_seq as offset from ->write_seq
  usbnet: silence an unnecessary warning
  cxgb4: fix endianness for vlan value in cxgb4_tc_flower
  cxgb4: set filter type to 1 for ETH_P_IPV6
  net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

6 years agoxfrm: fix boolean assignment in xfrm_get_type_offload
Gustavo A. R. Silva [Mon, 22 Jan 2018 22:34:09 +0000 (16:34 -0600)]
xfrm: fix boolean assignment in xfrm_get_type_offload

Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Fixes: ffdb5211da1c ("xfrm: Auto-load xfrm offload modules")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6 years agoxfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version
Yossi Kuperman [Mon, 22 Jan 2018 22:16:21 +0000 (00:16 +0200)]
xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version

IPSec tunnel mode supports encapsulation of IPv4 over IPv6 and vice-versa.

The outer IP header is stripped and the inner IP inherits the original
Ethernet header. Tcpdump fails to properly decode the inner packet in
case that h_proto is different than the inner IP version.

Fix h_proto to reflect the inner IP version.

Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6 years agonfsd: auth: Fix gid sorting when rootsquash enabled
Ben Hutchings [Mon, 22 Jan 2018 20:11:06 +0000 (20:11 +0000)]
nfsd: auth: Fix gid sorting when rootsquash enabled

Commit bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility
group_info allocators") appears to break nfsd rootsquash in a pretty
major way.

It adds a call to groups_sort() inside the loop that copies/squashes
gids, which means the valid gids are sorted along with the following
garbage.  The net result is that the highest numbered valid gids are
replaced with any lower-valued garbage gids, possibly including 0.

We should sort only once, after filling in all the gids.

Fixes: bdcf0a423ea1 ("kernel: make groups_sort calling a responsibility ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoorangefs: initialize op on loop restart in orangefs_devreq_read
Martin Brandenburg [Mon, 22 Jan 2018 20:44:52 +0000 (15:44 -0500)]
orangefs: initialize op on loop restart in orangefs_devreq_read

In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops.  If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN.  If the op has been given up on, the loop
is restarted via a goto.  The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.

This is triggered by interrupting a process while the op is being copied
to the client-core.  It's a fairly small window, but it's there.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoorangefs: use list_for_each_entry_safe in purge_waiting_ops
Martin Brandenburg [Mon, 22 Jan 2018 20:44:51 +0000 (15:44 -0500)]
orangefs: use list_for_each_entry_safe in purge_waiting_ops

set_op_state_purged can delete the op.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agotls: Correct length of scatterlist in tls_sw_sendpage
Dave Watson [Fri, 19 Jan 2018 20:30:13 +0000 (12:30 -0800)]
tls: Correct length of scatterlist in tls_sw_sendpage

The scatterlist is reused by both sendmsg and sendfile.
If a sendmsg of smaller number of pages is followed by a sendfile
of larger number of pages, the scatterlist may be too short, resulting
in a crash in gcm_encrypt.

Add sg_unmark_end to make the list the correct length.

tls_sw_sendmsg already calls sg_unmark_end correctly when it allocates
memory in alloc_sg, or in zerocopy_from_iter.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobe2net: restore properly promisc mode after queues reconfiguration
Ivan Vecera [Fri, 19 Jan 2018 19:23:50 +0000 (20:23 +0100)]
be2net: restore properly promisc mode after queues reconfiguration

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: igmp: fix source address check for IGMPv3 reports
Felix Fietkau [Fri, 19 Jan 2018 10:50:46 +0000 (11:50 +0100)]
net: igmp: fix source address check for IGMPv3 reports

Commit "net: igmp: Use correct source address on IGMPv3 reports"
introduced a check to validate the source address of locally generated
IGMPv3 packets.
Instead of checking the local interface address directly, it uses
inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
local subnet (or equal to the point-to-point address if used).

This breaks for point-to-point interfaces, so check against
ifa->ifa_local directly.

Cc: Kevin Cernekee <cernekee@chromium.org>
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agogso: validate gso_type in GSO handlers
Willem de Bruijn [Fri, 19 Jan 2018 14:29:18 +0000 (09:29 -0500)]
gso: validate gso_type in GSO handlers

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com>
Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qdisc_pkt_len_init() should be more robust
Eric Dumazet [Fri, 19 Jan 2018 03:59:19 +0000 (19:59 -0800)]
net: qdisc_pkt_len_init() should be more robust

Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.

tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.

Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")

We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ibmvnic-reset-behavior-fixes'
David S. Miller [Mon, 22 Jan 2018 20:46:56 +0000 (15:46 -0500)]
Merge branch 'ibmvnic-reset-behavior-fixes'

John Allen says:

====================
ibmvnic: Reset behavior fixes

This patchset fixes a number of issues related to ibmvnic reset uncovered
from testing new Power9 machines with Everglades adapters and the new
functionality to change mtu and other parameters in the driver.

Changes since v1:
-In patch 1/3, added the line to free the long term buffers before
allocating a new one. This change inadvertently uncovered the problem
that the number of queues can change after a failover as well. To fix
this, we check whether or not the number of queues has changed in
do_reset and if they have, we do a full release and init of the queues.
-In patch 1/3, added variables to the adapter struct to track how
many rx/tx pools have actually been allocated and modify the release
pools routines to use these values rather than the possibly incorrect
req_rx/tx_queues values.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Allocate and request vpd in init_resources
John Allen [Thu, 18 Jan 2018 22:27:58 +0000 (16:27 -0600)]
ibmvnic: Allocate and request vpd in init_resources

In reset events in which our memory allocations need to be reallocated,
VPD data is being freed, but never reallocated. This can cause issues if
we later attempt to access that memory or reset and attempt to free the
memory. This patch moves the allocation of the VPD data to init_resources
so that it will be symmetrically freed during release resources.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Revert to previous mtu when unsupported value requested
John Allen [Thu, 18 Jan 2018 22:27:12 +0000 (16:27 -0600)]
ibmvnic: Revert to previous mtu when unsupported value requested

If we request an unsupported mtu value, the vnic server will suggest a
different value. Currently we take the suggested value without question
and login with that value. However, the behavior doesn't seem completely
sane as attempting to change the mtu to some specific value will change
the mtu to some completely different value most of the time. This patch
fixes the issue by logging in with the previously used mtu value and
printing an error message saying that the given mtu is unsupported.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Modify buffer size and number of queues on failover
John Allen [Thu, 18 Jan 2018 22:26:31 +0000 (16:26 -0600)]
ibmvnic: Modify buffer size and number of queues on failover

Using newer backing devices can cause the required padding at the end of
buffer as well as the number of queues to change after a failover.
Since we currently assume that these values never change, after a
failover to a backing device with different capabilities, we can get
errors from the vnic server, attempt to free long term buffers that are
no longer there, or not free long term buffers that should be freed.

This patch resolves the issue by checking whether any of these values
change, and if so perform the necessary re-allocations.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: tcp: compute m_ack_seq as offset from ->write_seq
Sowmini Varadhan [Thu, 18 Jan 2018 21:11:07 +0000 (13:11 -0800)]
rds: tcp: compute m_ack_seq as offset from ->write_seq

rds-tcp uses m_ack_seq to track the tcp ack# that indicates
that the peer has received a rds_message. The m_ack_seq is
used in rds_tcp_is_acked() to figure out when it is safe to
drop the rds_message from the RDS retransmit queue.

The m_ack_seq must be calculated as an offset from the right
edge of the in-flight tcp buffer, i.e., it should be based on
the ->write_seq, not the ->snd_nxt.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agousbnet: silence an unnecessary warning
Oliver Neukum [Wed, 17 Jan 2018 15:33:54 +0000 (16:33 +0100)]
usbnet: silence an unnecessary warning

That a kevent could not be scheduled is not an error.
Such handlers must be able to deal with multiple events anyway.
As the successful scheduling of a work is a debug event, make
the failure debug priority, too.

V2: coding style

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: Cristian Caravena <caravena@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'cxgb4-tc-flower-offload-fixes'
David S. Miller [Mon, 22 Jan 2018 20:26:57 +0000 (15:26 -0500)]
Merge branch 'cxgb4-tc-flower-offload-fixes'

Daniel Borkmann says:

====================
pull-request: bpf 2018-01-18

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a divide by zero due to wrong if (src_reg == 0) check in
   64-bit mode. Properly handle this in interpreter and mask it
   also generically in verifier to guard against similar checks
   in JITs, from Eric and Alexei.

2) Fix a bug in arm64 JIT when tail calls are involved and progs
   have different stack sizes, from Daniel.

3) Reject stores into BPF context that are not expected BPF_STX |
   BPF_MEM variant, from Daniel.

4) Mark dst reg as unknown on {s,u}bounds adjustments when the
   src reg has derived bounds from dead branches, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: fix endianness for vlan value in cxgb4_tc_flower
Kumar Sanghvi [Wed, 17 Jan 2018 06:43:34 +0000 (12:13 +0530)]
cxgb4: fix endianness for vlan value in cxgb4_tc_flower

Don't change endianness when assigning vlan value in cxgb4_tc_flower
code when processing flow match parameters. The value gets converted
to network order as part of filtering code in set_filter_wr.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agocxgb4: set filter type to 1 for ETH_P_IPV6
Kumar Sanghvi [Wed, 17 Jan 2018 06:43:33 +0000 (12:13 +0530)]
cxgb4: set filter type to 1 for ETH_P_IPV6

For ethtype_key = ETH_P_IPV6, set filter type as 1 in cxgb4_tc_flower
code when processing flow match parameters.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomm, page_vma_mapped: Introduce pfn_in_hpage()
Kirill A. Shutemov [Mon, 22 Jan 2018 09:22:30 +0000 (12:22 +0300)]
mm, page_vma_mapped: Introduce pfn_in_hpage()

The new helper would check if the pfn belongs to the page. For huge
pages it checks if the PFN is within range covered by the huge page.

The helper is used in check_pte(). The original code the helper replaces
had two call to page_to_pfn(). page_to_pfn() is relatively costly.

Although current GCC is able to optimize code to have one call, it's
better to do this explicitly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()
Kirill A. Shutemov [Fri, 19 Jan 2018 12:49:24 +0000 (15:49 +0300)]
mm, page_vma_mapped: Drop faulty pointer arithmetics in check_pte()

Tetsuo reported random crashes under memory pressure on 32-bit x86
system and tracked down to change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As ->pte may point to an arbitrary page we have to check that they are
belong to the section before doing math. Otherwise it may lead to weird
results.

It wasn't noticed until now as mem_map[] is virtually contiguous on
flatmem or vmemmap sparsemem. Pointer arithmetic just works against all
'struct page' pointers. But with classic sparsemem, it doesn't because
each section memap is allocated separately and so consecutive pfns
crossing two sections might have struct pages at completely unrelated
addresses.

Let's restructure code a bit and replace pointer arithmetic with
operations on pfns.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agonet/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
Talat Batheesh [Sun, 21 Jan 2018 03:30:42 +0000 (05:30 +0200)]
net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

Helmut reported a bug about division by zero while
running traffic and doing physical cable pull test.

When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is division by zero.

This patch prevent this division for both ppms and epms.

Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoLinux 4.15-rc9 v4.15-rc9
Linus Torvalds [Sun, 21 Jan 2018 21:51:26 +0000 (13:51 -0800)]
Linux 4.15-rc9

6 years agoMerge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Jan 2018 18:48:35 +0000 (10:48 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 pti fixes from Thomas Gleixner:
 "A small set of fixes for the meltdown/spectre mitigations:

   - Make kprobes aware of retpolines to prevent probes in the retpoline
     thunks.

   - Make the machine check exception speculation protected. MCE used to
     issue an indirect call directly from the ASM entry code. Convert
     that to a direct call into a C-function and issue the indirect call
     from there so the compiler can add the retpoline protection,

   - Make the vmexit_fill_RSB() assembly less stupid

   - Fix a typo in the PTI documentation"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
  x86/pti: Document fix wrong index
  kprobes/x86: Disable optimizing on the function jumps to indirect thunk
  kprobes/x86: Blacklist indirect thunk functions for kprobes
  retpoline: Introduce start/end markers of indirect thunk
  x86/mce: Make machine check speculation protected

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Jan 2018 18:41:48 +0000 (10:41 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 kexec fix from Thomas Gleixner:
 "A single fix for the WBINVD issue introduced by the SME support which
  causes kexec fails on non AMD/SME capable CPUs. Issue WBINVD only when
  the CPU has SME and avoid doing so in a loop"

[ Side note: this patch fixes the problem, but it isn't entirely clear
  why it is required. The wbinvd should just work regardless, but there
  seems to be some system - as opposed to CPU - issue, since the wbinvd
  causes more problems later in the shutdown sequence, but wbinvd
  instructions while the system is still active are not problematic.

  Possibly some SMI or pending machine check issue on the affected system ]

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()

6 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Jan 2018 18:39:58 +0000 (10:39 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single fix for the new matrix allocator to prevent vector exhaustion
  by certain network drivers which allocate gazillions of unused vectors
  which cannot be put into reservation mode due to MSI and the lack of
  MSI entry masking.

  The fix/workaround is to spread the vectors across CPUs by searching
  the supplied target CPU mask for the CPU with the smallest number of
  allocated vectors"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq/matrix: Spread interrupts on allocation

6 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88...
Linus Torvalds [Sun, 21 Jan 2018 04:12:47 +0000 (20:12 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mattst88/alpha

Pull alpha fixes from Matt Turner:
 "A build fix and a regression fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha/PCI: Fix noname IRQ level detection
  alpha: extend memset16 to EV6 optimised routines

6 years agox86: Use __nostackprotect for sme_encrypt_kernel
Laura Abbott [Sun, 21 Jan 2018 01:14:02 +0000 (17:14 -0800)]
x86: Use __nostackprotect for sme_encrypt_kernel

Commit bacf6b499e11 ("x86/mm: Use a struct to reduce parameters for SME
PGD mapping") moved some parameters into a structure.

The structure was large enough to trigger the stack protection canary in
sme_encrypt_kernel which doesn't work this early, causing reboots.

Mark sme_encrypt_kernel appropriately to not use the canary.

Fixes: bacf6b499e11 ("x86/mm: Use a struct to reduce parameters for SME PGD mapping")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>