sfrench/cifs-2.6.git
5 years agoMerge tag 'mlx5e-updates-2018-12-14' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Sat, 15 Dec 2018 21:29:56 +0000 (13:29 -0800)]
Merge tag 'mlx5e-updates-2018-12-14' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-14 (VF Lag)

From Aviv Heller,

Subsequent patches introduce VF LAG, which provdies load-balancing and
high-availability capabilities for VFs associated with different
physical ports of the same Connect-X card.

This series consists of the following:
 - mlx5 devcom, driver infrastructure that facilitates operations that involve
   both core devices (physical functions) of the same card, to synchronize and
   communicate between two driver instances of the same card.
 - Infrastructure for TC rule duplication.
 - Changes to LAG logic to enable its use when SR-IOV is enabled
 - PFs in switchdev mode is the only mode currently supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-mitigate-retpoline-overhead'
David S. Miller [Sat, 15 Dec 2018 21:23:03 +0000 (13:23 -0800)]
Merge branch 'net-mitigate-retpoline-overhead'

Paolo Abeni says:

====================
net: mitigate retpoline overhead

The spectre v2 counter-measures, aka retpolines, are a source of measurable
overhead[1]. We can partially address that when the function pointer refers to
a builtin symbol resorting to a list of tests vs well-known builtin function and
direct calls.

Experimental results show that replacing a single indirect call via
retpoline with several branches and a direct call gives performance gains
even when multiple branches are added - 5 or more, as reported in [2].

This may lead to some uglification around the indirect calls. In netconf 2018
Eric Dumazet described a technique to hide the most relevant part of the needed
boilerplate with some macro help.

This series is a [re-]implementation of such idea, exposing the introduced
helpers in a new header file. They are later leveraged to avoid the indirect
call overhead in the GRO path, when possible.

Overall this gives > 10% performance improvement for UDP GRO benchmark and
smaller but measurable for TCP syn flood.

The added infra can be used in follow-up patches to cope with retpoline overhead
in other points of the networking stack (e.g. at the qdisc layer) and possibly
even in other subsystems.

v2  -> v3:
 - fix build error with CONFIG_IPV6=m

v1  -> v2:
 - list explicitly the builtin function names in INDIRECT_CALL_*(),
   as suggested by Ed Cree
 - expand the recipients list

rfc -> v1:
 - use branch prediction hints, as suggested by Eric

[1] http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf
[2] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoudp: use indirect call wrappers for GRO socket lookup
Paolo Abeni [Fri, 14 Dec 2018 10:52:00 +0000 (11:52 +0100)]
udp: use indirect call wrappers for GRO socket lookup

This avoids another indirect call for UDP GRO. Again, the test
for the IPv6 variant is performed first.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: use indirect call wrappers at GRO transport layer
Paolo Abeni [Fri, 14 Dec 2018 10:51:59 +0000 (11:51 +0100)]
net: use indirect call wrappers at GRO transport layer

This avoids an indirect call in the receive path for TCP and UDP
packets. TCP takes precedence on UDP, so that we have a single
additional conditional in the common case.

When IPV6 is build as module, all gro symbols except UDPv6 are
builtin, while the latter belong to the ipv6 module, so we
need some special care.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes
v2 -> v3:
 - fix build issue with CONFIG_IPV6=m

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: use indirect call wrappers at GRO network layer
Paolo Abeni [Fri, 14 Dec 2018 10:51:58 +0000 (11:51 +0100)]
net: use indirect call wrappers at GRO network layer

This avoids an indirect calls for L3 GRO receive path, both
for ipv4 and ipv6, if the latter is not compiled as a module.

Note that when IPv6 is compiled as builtin, it will be checked first,
so we have a single additional compare for the more common path.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoindirect call wrappers: helpers to speed-up indirect calls of builtin
Paolo Abeni [Fri, 14 Dec 2018 10:51:57 +0000 (11:51 +0100)]
indirect call wrappers: helpers to speed-up indirect calls of builtin

This header define a bunch of helpers that allow avoiding the
retpoline overhead when calling builtin functions via function pointers.
It boils down to explicitly comparing the function pointers to
known builtin functions and eventually invoke directly the latter.

The macros defined here implement the boilerplate for the above schema
and will be used by the next patches.

rfc -> v1:
 - use branch prediction hint, as suggested by Eric
v1  -> v2:
 - list explicitly the builtin function names in INDIRECT_CALL_*(),
   as suggested by Ed Cree

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: remove mmio reads on Tx
Ilias Apalodimas [Fri, 14 Dec 2018 08:59:01 +0000 (10:59 +0200)]
net: socionext: remove mmio reads on Tx

Currently the driver issues 2 mmio reads to figure out the number of
transmitted packets and clean them. We can get rid of the expensive
reads since BIT 31 of the Tx descriptor can be used for that.
We can also remove the budget counting of Tx completions since all of
the descriptors are not deliberately processed.

Performance numbers using pktgen are:
size  pre-patch(pps)  post-patch(pps)
64       362483           427916
128      358315           411686
256      352725           389683
512      215675           216464
1024     113812           114442

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socionext: correctly recover txq after being full
Ilias Apalodimas [Fri, 14 Dec 2018 08:59:00 +0000 (10:59 +0200)]
net: socionext: correctly recover txq after being full

Running pktgen with packets sizes > 512b ends up in the interface Txq
getting stuck.
"netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!"
appears on dmesg but the interface never recovers. It requires an
ifconfig down/up to make the interface usable again.

The reason that triggers this, is a race condition between
.ndo_start_xmit and the napi completion. The available budget is
calculated first and indicates the queue is full. Due to a costly
netif_err() the queue is not stopped in time while the napi completion
runs, clears the irq and frees up descriptors, thus the queue never wakes
up again.

Fix this by moving the print after stopping the queue, make the print
ratelimited, add barriers and check for cleaned descriptors..

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: ravb: Add support for r8a774c0 SoC
Fabrizio Castro [Thu, 13 Dec 2018 20:18:34 +0000 (20:18 +0000)]
dt-bindings: net: ravb: Add support for r8a774c0 SoC

Document RZ/G2E (R8A774C0) SoC bindings.

Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Improve neighbour struct layout
David Ahern [Thu, 13 Dec 2018 16:16:50 +0000 (08:16 -0800)]
neighbor: Improve neighbour struct layout

Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes.
Ensure ha element is always 8-byte aligned.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: sched: simplify the qdisc_leaf code
Tonghao Zhang [Thu, 13 Dec 2018 08:43:23 +0000 (00:43 -0800)]
net: sched: simplify the qdisc_leaf code

Except for returning, the var leaf is not
used in the qdisc_leaf(). For simplicity, remove it.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Fix handling of LLA with VRF and sockets bound to VRF
David Ahern [Wed, 12 Dec 2018 23:27:38 +0000 (15:27 -0800)]
ipv6: Fix handling of LLA with VRF and sockets bound to VRF

A recent commit allows sockets bound to a VRF to receive ipv6 link local
packets. However, it only works for UDP and worse TCP connection attempts
to the LLA with the only listener bound to the VRF just hang where as
before the client gets a reset and connection refused. Fix by adjusting
ir_iif for LL addresses and packets received through a device enslaved
to a VRF.

Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cc: Mike Manning <mmanning@vyatta.att-mail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: improve spurious interrupt detection
Heiner Kallweit [Sat, 15 Dec 2018 15:25:05 +0000 (16:25 +0100)]
r8169: improve spurious interrupt detection

Improve detection of spurious interrupts by checking against the
interrupt mask as currently set in the chip.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()
Yangtao Li [Sat, 15 Dec 2018 07:59:30 +0000 (02:59 -0500)]
cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()

We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define
such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the
DEFINE_SHOW_ATTRIBUTE macro to simplify some code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipconfig: convert to DEFINE_SHOW_ATTRIBUTE
Yangtao Li [Sat, 15 Dec 2018 07:19:53 +0000 (02:19 -0500)]
ipconfig: convert to DEFINE_SHOW_ATTRIBUTE

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'
David S. Miller [Sat, 15 Dec 2018 18:54:18 +0000 (10:54 -0800)]
Merge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'

Salil Mehta says:

====================
net: hns3: Add more commands to Debugfs in HNS3 driver

This patch-set adds few more debugfs commands to HNS3 Ethernet
Driver. Support has been added to query info related to below
items:
1. Packet buffer descriptor ("echo bd info [queue no] [bd index] > cmd")
2. Manager table("echo dump mng tbl > cmd")
3. Dfx status register("echo dump reg ssu [prt id] > cmd")
4. Dcb status register("echo dump reg dcb [port id] > cmd")
5. Queue map ("echo queue map [queue no] > cmd")
6. Tm map ("echo tm map [queue no] > cmd")

NOTE: Above commands are *read-only* and are only intended to
query the information from the SoC(and dump inside the kernel,
for now) and in no way tries to perform write operations for
the purpose of configuration etc.

Change Log:
V1-->V2:
1. Addressed the GCC-8.2 compiler issue reported by David S. Miller.
Link: https://lkml.org/lkml/2018/12/14/1298
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "tm map" status information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:58 +0000 (15:31 +0000)]
net: hns3: Add "tm map" status information query function

This patch prints dcb register status  information by module.

debugfs command:
root@(none)# echo dump tm map 100 > cmd
queue_id | qset_id | pri_id | tc_id
0100     | 0065    | 08     | 00
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "queue map" information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:57 +0000 (15:31 +0000)]
net: hns3: Add "queue map" information query function

This patch prints queue map information.

debugfs command:
echo dump queue map > cmd

Sample Command:
root@(none)# echo queue map > cmd
 local queue id | global queue id | vector id
          0              32             769
          1              33             770
          2              34             771
          3              35             772
          4              36             773
          5              37             774
          6              38             775
          7              39             776
          8              40             777
          9              41             778
         10              42             779
         11              43             780
         12              44             781
         13              45             782
         14              46             783
         15              47             784
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "dcb register" status information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:56 +0000 (15:31 +0000)]
net: hns3: Add "dcb register" status information query function

This patch prints dcb register status  information by module.

debugfs command:
root@(none)# echo dump reg dcb > cmd
 roce_qset_mask: 0x0
 nic_qs_mask: 0x0
 qs_shaping_pass: 0x0
 qs_bp_sts: 0x0
 pri_mask: 0x0
 pri_cshaping_pass: 0x0
 pri_pshaping_pass: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "status register" information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:55 +0000 (15:31 +0000)]
net: hns3: Add "status register" information query function

This patch prints status register information by module.

debugfs command:
echo dump reg [mode name] > cmd

Sample Command:
root@(none)# echo dump reg bios common > cmd
 BP_CPU_STATE: 0x0
 DFX_MSIX_INFO_NIC_0: 0xc000
 DFX_MSIX_INFO_NIC_1: 0xf
 DFX_MSIX_INFO_NIC_2: 0x2
 DFX_MSIX_INFO_NIC_3: 0x2
 DFX_MSIX_INFO_ROC_0: 0xc000
 DFX_MSIX_INFO_ROC_1: 0x0
 DFX_MSIX_INFO_ROC_2: 0x0
 DFX_MSIX_INFO_ROC_3: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "manager table" information query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:54 +0000 (15:31 +0000)]
net: hns3: Add "manager table" information query function

This patch prints manager table information.

debugfs command:
echo dump mng tbl > cmd

Sample Command:
root@(none)# echo dump mng tbl > cmd
 entry|mac_addr         |mask|ether|mask|vlan|mask|i_map|i_dir|e_type
 00   |01:00:5e:00:00:01|0   |00000|0   |0000|0   |00   |00   |0
 01   |c2:f1:c5:82:68:17|0   |00000|0   |0000|0   |00   |00   |0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Add "bd info" query function
liuzhongzhu [Sat, 15 Dec 2018 15:31:53 +0000 (15:31 +0000)]
net: hns3: Add "bd info" query function

This patch prints Sending and receiving
package descriptor information.

debugfs command:
echo dump bd info 1 > cmd

Sample Command:
root@(none)# echo bd info 1 > cmd
hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0
hns3 0000:7d:00.0: (TX) addr: 0x0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)send_size: 0
hns3 0000:7d:00.0: (TX)vlan_tso: 0
hns3 0000:7d:00.0: (TX)l2_len: 0
hns3 0000:7d:00.0: (TX)l3_len: 0
hns3 0000:7d:00.0: (TX)l4_len: 0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)tv: 0
hns3 0000:7d:00.0: (TX)vlan_msec: 0
hns3 0000:7d:00.0: (TX)ol2_len: 0
hns3 0000:7d:00.0: (TX)ol3_len: 0
hns3 0000:7d:00.0: (TX)ol4_len: 0
hns3 0000:7d:00.0: (TX)paylen: 0
hns3 0000:7d:00.0: (TX)vld_ra_ri: 0
hns3 0000:7d:00.0: (TX)mss: 0
hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120
hns3 0000:7d:00.0: (RX)addr: 0xffee7000
hns3 0000:7d:00.0: (RX)pkt_len: 0
hns3 0000:7d:00.0: (RX)size: 0
hns3 0000:7d:00.0: (RX)rss_hash: 0
hns3 0000:7d:00.0: (RX)fd_id: 0
hns3 0000:7d:00.0: (RX)vlan_tag: 0
hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0
hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0
hns3 0000:7d:00.0: (RX)bd_base_info: 0

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-prefer-listeners-bound-to-an-address'
David S. Miller [Fri, 14 Dec 2018 23:55:21 +0000 (15:55 -0800)]
Merge branch 'net-prefer-listeners-bound-to-an-address'

Peter Oskolkov says:

====================
net: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patchset eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In a future patchset I plan to explore whether it is possible
to remove port-only hashtables completely: additional refactoring
will be required, as some non-lookup code uses the hashtables.
====================

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: test that listening sockets match on address properly
Peter Oskolkov [Wed, 12 Dec 2018 21:15:37 +0000 (13:15 -0800)]
selftests: net: test that listening sockets match on address properly

This patch adds a selftest that verifies that a socket listening
on a specific address is chosen in preference over sockets
that listen on any address. The test covers UDP/UDP6/TCP/TCP6.

It is based on, and similar to, reuseport_dualstack.c selftest.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tcp6: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:36 +0000 (13:15 -0800)]
net: tcp6: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tcp: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:35 +0000 (13:15 -0800)]
net: tcp: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: udp6: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:34 +0000 (13:15 -0800)]
net: udp6: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: udp: prefer listeners bound to an address
Peter Oskolkov [Wed, 12 Dec 2018 21:15:33 +0000 (13:15 -0800)]
net: udp: prefer listeners bound to an address

A relatively common use case is to have several IPs configured
on a host, and have different listeners for each of them. We would
like to add a "catch all" listener on addr_any, to match incoming
connections not served by any of the listeners bound to a specific
address.

However, port-only lookups can match addr_any sockets when sockets
listening on specific addresses are present if so_reuseport flag
is set. This patch eliminates lookups into port-only hashtable,
as lookups by (addr,port) tuple are easily available.

In addition, compute_score() is tweaked to _not_ match
addr_any sockets to specific addresses, as hash collisions
could result in the unwanted behavior described above.

Tested: the patch compiles; full test in the last patch in this
patchset. Existing reuseport_* selftests also pass.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoadd snmp counters document
yupeng [Wed, 12 Dec 2018 08:14:10 +0000 (00:14 -0800)]
add snmp counters document

Add explainations for some general IP counters, SACK and DSACK related
counters

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'neighbor-More-gc_list-changes'
David S. Miller [Fri, 14 Dec 2018 23:44:47 +0000 (15:44 -0800)]
Merge branch 'neighbor-More-gc_list-changes'

David Ahern says:

====================
neighbor: More gc_list changes

More gc_list changes and cleanups.

The first 2 patches are bug fixes from the first gc_list change.
Specifically, fix the locking order to be consistent - table lock
followed by neighbor lock, and then entries in the FAILED state
should always be candidates for forced_gc without waiting for any
time span (return to the eviction logic prior to the separate gc_list).

Patch 3 removes 2 now unnecessary arguments to neigh_del.

Patch 4 moves a helper from a header file to core code in preparation
for Patch 5 which removes NTF_EXT_LEARNED entries from the gc_list.
These entries are already exempt from forced_gc; patch 5 removes them
from consideration and makes them on par with PERMANENT entries given
that they are also managed by userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Remove externally learned entries from gc_list
David Ahern [Wed, 12 Dec 2018 01:57:25 +0000 (18:57 -0700)]
neighbor: Remove externally learned entries from gc_list

Externally learned entries are similar to PERMANENT entries in the
sense they are managed by userspace and can not be garbage collected.
As such remove them from the gc_list, remove the flags check from
neigh_forced_gc and skip threshold checks in neigh_alloc. As with
PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED
entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Move neigh_update_ext_learned to core file
David Ahern [Wed, 12 Dec 2018 01:57:24 +0000 (18:57 -0700)]
neighbor: Move neigh_update_ext_learned to core file

neigh_update_ext_learned has one caller in neighbour.c so does not need
to be defined in the header. Move it and in the process remove the
intialization of ndm_flags and just set it based on the flags check.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Remove state and flags arguments to neigh_del
David Ahern [Wed, 12 Dec 2018 01:57:23 +0000 (18:57 -0700)]
neighbor: Remove state and flags arguments to neigh_del

neigh_del now only has 1 caller, and the state and flags arguments
are both 0. Remove them and simplify neigh_del.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Fix state check in neigh_forced_gc
David Ahern [Wed, 12 Dec 2018 01:57:22 +0000 (18:57 -0700)]
neighbor: Fix state check in neigh_forced_gc

PERMANENT entries are not on the gc_list so the state check is now
redundant. Also, the move to not purge entries until after 5 seconds
should not apply to FAILED entries; those can be removed immediately
to make way for newer ones. This restores the previous logic prior to
the gc_list.

Fixes: 58956317c8de ("neighbor: Improve garbage collection")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoneighbor: Fix locking order for gc_list changes
David Ahern [Wed, 12 Dec 2018 01:57:21 +0000 (18:57 -0700)]
neighbor: Fix locking order for gc_list changes

Lock checker noted an inverted lock order between neigh_change_state
(neighbor lock then table lock) and neigh_periodic_work (table lock and
then neighbor lock) resulting in:

[  121.057652] ======================================================
[  121.058740] WARNING: possible circular locking dependency detected
[  121.059861] 4.20.0-rc6+ #43 Not tainted
[  121.060546] ------------------------------------------------------
[  121.061630] kworker/0:2/65 is trying to acquire lock:
[  121.062519] (____ptrval____) (&n->lock){++--}, at: neigh_periodic_work+0x237/0x324
[  121.063894]
[  121.063894] but task is already holding lock:
[  121.064920] (____ptrval____) (&tbl->lock){+.-.}, at: neigh_periodic_work+0x194/0x324
[  121.066274]
[  121.066274] which lock already depends on the new lock.
[  121.066274]
[  121.067693]
[  121.067693] the existing dependency chain (in reverse order) is:
...

Fix by renaming neigh_change_state to neigh_update_gc_list, changing
it to only manage whether an entry should be on the gc_list and taking
locks in the same order as neigh_periodic_work. Invoke at the end of
neigh_update only if diff between old or new states has the PERMANENT
flag set.

Fixes: 8cc196d6ef86 ("neighbor: gc_list changes should be protected by table lock")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: fold tcf_block_cb_call() into tc_setup_cb_call()
Cong Wang [Tue, 11 Dec 2018 19:15:46 +0000 (11:15 -0800)]
net_sched: fold tcf_block_cb_call() into tc_setup_cb_call()

After commit 69bd48404f25 ("net/sched: Remove egdev mechanism"),
tc_setup_cb_call() is nearly identical to tcf_block_cb_call(),
so we can just fold tcf_block_cb_call() into tc_setup_cb_call()
and remove its unused parameter 'exts'.

Fixes: 69bd48404f25 ("net/sched: Remove egdev mechanism")
Cc: Oz Shlomo <ozsh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/ibmvnic: Remove tests of member address
Wen Yang [Tue, 11 Dec 2018 04:20:46 +0000 (12:20 +0800)]
net/ibmvnic: Remove tests of member address

The driver was checking for non-NULL address.
- adapter->napi[i]

This is pointless as these will be always non-NULL, since the
'dapter->napi' is allocated in init_napi().
It is safe to get rid of useless checks for addresses to fix the
coccinelle warning:
>>drivers/net/ethernet/ibm/ibmvnic.c: test of a variable/field address
Since such statements always return true, they are redundant.

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Thomas Falcon <tlfalcon@linux.ibm.com>
CC: John Allen <jallen@linux.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linuxppc-dev@lists.ozlabs.org
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotun: replace get_cpu_ptr with this_cpu_ptr when bh disabled
Prashant Bhole [Tue, 11 Dec 2018 02:43:07 +0000 (11:43 +0900)]
tun: replace get_cpu_ptr with this_cpu_ptr when bh disabled

tun_xdp_one() runs with local bh disabled. So there is no need to
disable preemption by calling get_cpu_ptr while updating stats. This
patch replaces the use of get_cpu_ptr() with this_cpu_ptr() as a
micro-optimization. Also removes related put_cpu_ptr call.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5: Handle LAG FW commands failure gracefully
Aviv Heller [Mon, 27 Aug 2018 12:39:40 +0000 (15:39 +0300)]
net/mlx5: Handle LAG FW commands failure gracefully

When create_lag or destroy_lag FW commands fail, display an appropriate
error message, and try to recover, if possible.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Make RoCE and SR-IOV LAG modes explicit
Aviv Heller [Thu, 23 Aug 2018 10:47:53 +0000 (13:47 +0300)]
net/mlx5: Make RoCE and SR-IOV LAG modes explicit

With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()
Aviv Heller [Thu, 23 Aug 2018 10:34:10 +0000 (13:34 +0300)]
net/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()

The new name better represents the function's aim, and sets a precedent
for a '__' prefix for internal, non-locked versions of LAG APIs.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Allow co-enablement of uplink LAG and SRIOV
Rabie Loulou [Wed, 23 May 2018 11:19:07 +0000 (14:19 +0300)]
net/mlx5: Allow co-enablement of uplink LAG and SRIOV

Enable setting uplink LAG if sriov is enabled on both ports in switchdev
mode.

Once the sriov mode is changed from switchdev for any of the ports, the
LAG instance is disabled.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Allow/disallow LAG according to pre-req only
Rabie Loulou [Wed, 6 Jun 2018 13:31:34 +0000 (16:31 +0300)]
net/mlx5: Allow/disallow LAG according to pre-req only

Remove the lag forbid/allow functions, change the lag prereq check to
run in the do-bond logic, so every change in the prereq state will
cause LAG to be disabled/enabled accordingly after the next do-bond run.

Add lag update function, so every component which changes the prereq
state and want the LAG to re-calc the conditions can call the update
function.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Adjustments for the activate LAG logic to run under sriov
Rabie Loulou [Thu, 26 Apr 2018 13:45:41 +0000 (16:45 +0300)]
net/mlx5: Adjustments for the activate LAG logic to run under sriov

When HW lag is set/unset, roce must not be enabled on the port, as such
we wrap such changes with roce enable/disable either directly or through
re-creation of IB device.

Currently, lag and sriov are mutually exclusive, so by definition this
code doesn't run under sriov.

Towards changing this exclusion, we need to make sure that roce will not
be enabled on the eswitch manager port under sriov since this is
requirement of the switchdev mode.

We are going strict here and avoiding this all together under sriov.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG
Aviv Heller [Tue, 2 Oct 2018 14:18:44 +0000 (17:18 +0300)]
net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG

Under uplink LAG, packets that match a flow might arrive on both uplink
ports and transmitted through both as part of supporting aggregation and
high-availability.

When the netdevs representing the uplinks are set into LAG (bonding,
teaming), duplicate the TC flow offloading into each of the per-uplink
e-switches.

Duplication is not required if the source is the bond device, since in
this case it is assumed that the bond and the uplink netdevs share the
same TC block, and thus duplication will occur naturally by the stack.

Note that under encapsulation scheme, both flows will use the same
neighbour and hence both will contribute to the last-used feedback
towards the stack.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Offload TC e-switch rules with egress LAG device
Rabie Loulou [Wed, 6 Jun 2018 13:27:08 +0000 (16:27 +0300)]
net/mlx5e: Offload TC e-switch rules with egress LAG device

When parsing TC FDB actions, if the egress device is a bond/team
net-device which enslaved the uplink representor of the e-switch,
use the uplink representor as the destination in the HW rule.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: In case of LAG, one switch parent id is used for all representors
Rabie Loulou [Wed, 6 Jun 2018 12:49:27 +0000 (15:49 +0300)]
net/mlx5e: In case of LAG, one switch parent id is used for all representors

When the uplink representors are put into lag, set all the
representors (VFs and uplinks) of the same NIC to return the same
switchdev id.

Currently, the route lookup code on the encapsulation offload path
assumes that same switchdev id for the source and dest devices means
that the dest is also mlx5 HW netdev. This doesn't hold anymore when we
align the switchdev Id of the uplinks to be same, which in turn causes
the bond/team to return that id to the caller. As such, enhance the
relevant check to take into account the uplink lag case.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules
Shahar Klein [Thu, 30 Nov 2017 15:55:46 +0000 (17:55 +0200)]
net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules

Assign a counter dev attribute according to device capability and use
it for management of counters related to offloaded eswitch TC flows.

With upcoming support for uplink LAG, we have two HW rules per one
logical SW (TC) rule. Although the HW supports attaching one counter
to multiple rules, we are allocating counter per HW rule.

We need this separation for two reasons:

1. "flow eswitch" counter affinity HW require the counter to be
allocated on the device where the eswitch rule is set.

2. for some use-cases (multi-path routing) each HW flow relates to
different neighbour, hence our neigh update logic must have a per-rule
HW accountant in order to provide the proper feedback to the kernel.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Infrastructure for duplicated offloading of TC flows
Roi Dayan [Sun, 11 Nov 2018 20:24:03 +0000 (22:24 +0200)]
net/mlx5e: Infrastructure for duplicated offloading of TC flows

Under uplink LAG or multipath schemes, traffic that matches one flow
might arrive on both uplink ports and transmitted through both
as part of supporting aggregation and high-availability.

To cope with the fact that the SW model might use logical SW port
(e.g uplink team or bond) but we have two HW ports with e-switch on
each, there are cases where in order to offload a SW TC rule we
need to duplicate it to two HW flows.

Since each HW rule has its own counter we also aggregate the counter
of both rules when a flow stats query is executed from user-space.

Introduce the changes for the different elements (add/delete/stats),
currently nothing is duplicated.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: E-Switch, Add peer miss rules
Roi Dayan [Sun, 11 Nov 2018 15:50:16 +0000 (17:50 +0200)]
net/mlx5e: E-Switch, Add peer miss rules

In the sriov offloads mode, packets that are not matched by any
other rule are sent towards the e-switch vport manager for further
processing.

Under upcoming patches (e.g for uplink LAG), packets sent from VF
vports belonging to esw0 (e-switch related to PF0) might end up in
esw1 (e-switch related to PF1) due to muxing logic applied by the
FW.

In such a case we still want the missed packet to be sent to the
"base" esw manager vport in order to present the control plane a
consistent view of the source (VF reresentor) port.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Introduce inter-device communication mechanism
Aviv Heller [Tue, 4 Dec 2018 19:24:46 +0000 (21:24 +0200)]
net/mlx5: Introduce inter-device communication mechanism

This introduces devcom, a generic mechanism for performing operations
on both physical functions of the same Connect-X card.

The first user of this API is merged eswitch, which will be introduced
in subsequent patches.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agohamradio, ppp: change semaphore to completion
Arnd Bergmann [Mon, 10 Dec 2018 21:52:56 +0000 (22:52 +0100)]
hamradio, ppp: change semaphore to completion

ppp and hamradio have copies of the same code that uses a semaphore
in place of a completion for historic reasons. Make it use the
proper interface instead in all copies.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohns3: prevent building without CONFIG_INET
Arnd Bergmann [Mon, 10 Dec 2018 20:44:34 +0000 (21:44 +0100)]
hns3: prevent building without CONFIG_INET

We now get a link failure when CONFIG_INET is disabled, since
tcp_gro_complete is unavailable:

drivers/net/ethernet/hisilicon/hns3/hns3_enet.o: In function `hns3_set_gro_param':
hns3_enet.c:(.text+0x230c): undefined reference to `tcp_gro_complete'

Add an explicit CONFIG_INET dependency here to avoid the broken
configuration.

Fixes: a6d53b97a2e7 ("net: hns3: Adds GRO params to SKB for the stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Saeed Mahameed [Fri, 14 Dec 2018 19:15:25 +0000 (11:15 -0800)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux

mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:
1) Lag refactroing and flow counter affinity bits.
2) mlx5 core cleanups

By Roi Dayan (2) and others
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Fold the modify lag code into function
Shahar Klein [Thu, 13 Dec 2018 03:11:41 +0000 (19:11 -0800)]
net/mlx5: Fold the modify lag code into function

Handle the code of modifying the lag affinity within a separate function.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Add lag affinity info to log
Roi Dayan [Thu, 13 Dec 2018 03:11:40 +0000 (19:11 -0800)]
net/mlx5: Add lag affinity info to log

Report the initial LAG port affinity upon LAG creation.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Split the activate lag function into two routines
Roi Dayan [Thu, 13 Dec 2018 03:11:39 +0000 (19:11 -0800)]
net/mlx5: Split the activate lag function into two routines

Split the activate lag function in order to be symmetric with
the deactivate lag function.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Introduce flow counter affinity
Shahar Klein [Thu, 13 Dec 2018 03:11:38 +0000 (19:11 -0800)]
net/mlx5: E-Switch, Introduce flow counter affinity

This dictates the device affinity for eswitch flow counters, set by the FW
according to the HW device capabilities.

Under "source eswitch" affinity, the counter should be allocated on the
device related to the source vport in the match. This covers both non
merged e-switch mode as well as old FW that does not advertise this cap.

Under "flow eswitch" affinity, the counter should be allocated on the
device where the eswitch rule is set.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoIB/mlx5: Unify e-switch representors load approach between uplink and VFs
Mark Bloch [Thu, 13 Dec 2018 03:11:37 +0000 (19:11 -0800)]
IB/mlx5: Unify e-switch representors load approach between uplink and VFs

When in switchdev mode and the add function is called by the core
level driver, make sure we only register the callbacks, but don't
create the mlx5 IB device or initialize anything. With this change
all the IB devices in switchdev mode are created only once the load
callback is invoked by the e-switch core sub-module. This follows
the design paradigm under which the all the Eth representors must
be loaded before any of IB reprs is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: Use lowercase 'X' for hex values
Saeed Mahameed [Thu, 13 Dec 2018 03:11:36 +0000 (19:11 -0800)]
net/mlx5: Use lowercase 'X' for hex values

Apparently gcc is cool with upper case '0X' but it is not commonly used.
Replace '0X' with lowercase '0x' in mlx5_ifc.h file.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge branch 'Introduce-NETDEV_PRE_CHANGEADDR'
David S. Miller [Fri, 14 Dec 2018 02:41:39 +0000 (18:41 -0800)]
Merge branch 'Introduce-NETDEV_PRE_CHANGEADDR'

Petr Machata says:

====================
Introduce NETDEV_PRE_CHANGEADDR

Spectrum devices have a limitation that all router interfaces need to
have the same address prefix. In Spectrum-1, the requirement is for the
initial 38 bits of all RIFs to be the same, in Spectrum-2 the limit is
36 bits. Currently violations of this requirement are not diagnosed. At
the same time, if the condition is not upheld, the mismatched MAC
address ends up overwriting the common prefix, and all RIF MAC addresses
silently change to the new prefix.

It is therefore desirable to be able at least to diagnose the issue, and
better to reject attempts to change MAC addresses in ways that is
incompatible with the device.

Currently MAC address changes are notified through emission of
NETDEV_CHANGEADDR, which is done after the change. Extending this
message to allow vetoing is certainly possible, but several other
notification types have instead adopted a simple two-stage approach:
first a "pre" notification is sent to make sure all interested parties
are OK with the change that's about to be done. Then the change is done,
and afterwards a "post" notification is sent.

This dual approach is easier to use: when the change is vetoed, nothing
has changed yet, and it's therefore unnecessary to roll anything back.
Therefore this patchset introduces it for NETDEV_CHANGEADDR as well.

One prominent path to emitting NETDEV_CHANGEADDR is through
dev_set_mac_address(). Therefore in patch #1, give this function an
extack argument, so that a textual reason for rejection (or a warning)
can be communicated back to the user.

In patch #2, add the new notification type. In patch #3, have dev.c emit
the notification for instances of dev_addr change, or addition of an
address to dev_addrs list.

In patches #4 and #5, extend the bridge driver to handle and emit the
new notifier.

In patch #6, change IPVLAN to emit the new notifier.

Likewise for bonding driver in patches #7 and #8. Note that the team
driver doesn't need this treatment, as it goes through
dev_set_mac_address().

In patches #9, #10 and #11 adapt mlxsw to veto MAC addresses on router
interfaces, if they violate the requirement that all RIF MAC addresses
have the same prefix.

Finally in patches #12 and #13, add a test for vetoing of a direct
change of a port device MAC, and indirect change of a bridge MAC.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Test FID RIF MAC vetoing
Petr Machata [Thu, 13 Dec 2018 11:54:56 +0000 (11:54 +0000)]
selftests: mlxsw: Test FID RIF MAC vetoing

When a FID RIF is created for a bridge with IP address, its MAC address
must obey the same requirements as other RIFs. Test that attempts to
change the address incompatibly by attaching a device are vetoed with
extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: Test RIF MAC vetoing
Petr Machata [Thu, 13 Dec 2018 11:54:54 +0000 (11:54 +0000)]
selftests: mlxsw: Test RIF MAC vetoing

Test that attempts to change address in a way that violates Spectrum
requirements are vetoed with extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_router: Veto unsupported RIF MAC addresses
Petr Machata [Thu, 13 Dec 2018 11:54:52 +0000 (11:54 +0000)]
mlxsw: spectrum_router: Veto unsupported RIF MAC addresses

On NETDEV_PRE_CHANGEADDR, if the change is related to a RIF interface,
verify that it satisfies the criterion that all RIF interfaces have the
same MAC address prefix, as indicated by mlxsw_sp.mac_mask.

Additionally, besides explicit address changes, check that the address
of an interface for which a RIF is about to be added matches the
required pattern as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum: Add mlxsw_sp.mac_mask
Petr Machata [Thu, 13 Dec 2018 11:54:50 +0000 (11:54 +0000)]
mlxsw: spectrum: Add mlxsw_sp.mac_mask

The Spectrum hardware demands that all router interfaces in the system
have the same first 38 resp. 36 bits of MAC address: the former limit
holds on Spectrum, the latter on Spectrum-2. Add a field that refers to
the required prefix mask and initialize in mlxsw_sp1_init() and
mlxsw_sp2_init().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_router: Generalize mlxsw_sp_netdevice_router_port_event()
Petr Machata [Thu, 13 Dec 2018 11:54:48 +0000 (11:54 +0000)]
mlxsw: spectrum_router: Generalize mlxsw_sp_netdevice_router_port_event()

Prepare mlxsw_sp_netdevice_router_port_event() for handling of
NETDEV_PRE_CHANGEADDR. Split out the part that deals with the actual
changes and call it for the two events currently handled.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bonding: Issue NETDEV_PRE_CHANGEADDR
Petr Machata [Thu, 13 Dec 2018 11:54:46 +0000 (11:54 +0000)]
net: bonding: Issue NETDEV_PRE_CHANGEADDR

Give interested parties an opportunity to veto an impending HW address
change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bonding: Give bond_set_dev_addr() a return value
Petr Machata [Thu, 13 Dec 2018 11:54:44 +0000 (11:54 +0000)]
net: bonding: Give bond_set_dev_addr() a return value

Before NETDEV_CHANGEADDR, bond driver should emit NETDEV_PRE_CHANGEADDR,
and allow consumers to veto the address change. To propagate further the
return code from NETDEV_PRE_CHANGEADDR, give the function that
implements address change a return value.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipvlan: Issue NETDEV_PRE_CHANGEADDR
Petr Machata [Thu, 13 Dec 2018 11:54:41 +0000 (11:54 +0000)]
net: ipvlan: Issue NETDEV_PRE_CHANGEADDR

A NETDEV_CHANGEADDR event implies a change of address of each of the
IPVLANs of this IPVLAN device. Therefore propagate NETDEV_PRE_CHANGEADDR
to all the IPVLANs.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: Handle NETDEV_PRE_CHANGEADDR from ports
Petr Machata [Thu, 13 Dec 2018 11:54:39 +0000 (11:54 +0000)]
net: bridge: Handle NETDEV_PRE_CHANGEADDR from ports

When a port device seeks approval of a potential new MAC address, make
sure that should the bridge device end up using this address, all
interested parties would agree with it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: Issue NETDEV_PRE_CHANGEADDR
Petr Machata [Thu, 13 Dec 2018 11:54:37 +0000 (11:54 +0000)]
net: bridge: Issue NETDEV_PRE_CHANGEADDR

When a port is attached to a bridge, the address of the bridge in
question may change as well. Even if it would not change at this
point (because the current bridge address is lower), it might end up
changing later as a result of detach of another port, which can't be
vetoed.

Therefore issue NETDEV_PRE_CHANGEADDR regardless of whether the address
will be used at this point or not, and make sure all involved parties
would agree with the change.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dev: Issue NETDEV_PRE_CHANGEADDR
Petr Machata [Thu, 13 Dec 2018 11:54:35 +0000 (11:54 +0000)]
net: dev: Issue NETDEV_PRE_CHANGEADDR

When a device address is about to be changed, or an address added to the
list of device HW addresses, it is necessary to ensure that all
interested parties can support the address. Therefore, send the
NETDEV_PRE_CHANGEADDR notification, and if anyone bails on it, do not
change the address.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dev: Add NETDEV_PRE_CHANGEADDR
Petr Machata [Thu, 13 Dec 2018 11:54:33 +0000 (11:54 +0000)]
net: dev: Add NETDEV_PRE_CHANGEADDR

The NETDEV_CHANGEADDR notification is emitted after a device address
changes. Extending this message to allow vetoing is certainly possible,
but several other notification types have instead adopted a simple
two-stage approach: first a "pre" notification is sent to make sure all
interested parties are OK with a change that's about to be done. Then
the change is done, and afterwards a "post" notification is sent.

This dual approach is easier to use: when the change is vetoed, nothing
has changed yet, and it's therefore unnecessary to roll anything back.
Therefore adopt it for NETDEV_CHANGEADDR as well.

To that end, add NETDEV_PRE_CHANGEADDR and an info structure to go along
with it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dev: Add extack argument to dev_set_mac_address()
Petr Machata [Thu, 13 Dec 2018 11:54:30 +0000 (11:54 +0000)]
net: dev: Add extack argument to dev_set_mac_address()

A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5e-updates-2018-12-11' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 13 Dec 2018 06:22:53 +0000 (22:22 -0800)]
Merge tag 'mlx5e-updates-2018-12-11' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-11

From Eli Britstein,
Patches 1-10 adds remote mirroring support.
Patches 1-4 refactor encap related code as pre-steps for using per
destination encapsulation properties.
Patches 5-7 use extended destination feature for single/multi
destination scenarios that have a single encap destination.
Patches 8-10 enable multiple encap destinations for a TC flow.

From, Daniel Jurgens,
Patch 11, Use CQE padding for Ethernet CQs, PPC showed up to a 24%
improvement in small packet throughput

From Eyal Davidovich,
patches 12-14, FW monitor counter support
FW monitor counters feature came to solve the delayed reporting of
FW stats in the atomic get_stats64 ndo, since we can't access the
FW at that stage, this feature will enable immediate FW stats updates
in the driver via fw events on specific stats updates.

Patch 12, cleanup to avoid querying a FW counter when it is not
supported
Patch 13, Monitor counters FW commands support
Patch 14, Use monitor counters in ethernet netdevice to update FW
stats reported in the atomic get_stats64 ndo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet-next: stmmac: dwmac-mediatek: add module license info
Biao Huang [Thu, 13 Dec 2018 02:41:37 +0000 (10:41 +0800)]
net-next: stmmac: dwmac-mediatek: add module license info

Add MODULE_LICENSE info to fix this:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.o

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5e: Remove set but not used variable 'upriv'
YueHaibing [Wed, 12 Dec 2018 08:33:53 +0000 (08:33 +0000)]
net/mlx5e: Remove set but not used variable 'upriv'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_vport_rep_load':
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1490:21: warning:
 variable 'upriv' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_vport_rep_unload':
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1557:21: warning:
 variable 'upriv' set but not used [-Wunused-but-set-variable]

It not used any more since commit ef381359e3a8 ("net/mlx5e: Replace egdev with
indirect block notifications"). Also remove unused variable 'uplink_rpriv'
after this change.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/mlx5: Remove duplicated include from eswitch.c
YueHaibing [Wed, 12 Dec 2018 08:03:38 +0000 (08:03 +0000)]
net/mlx5: Remove duplicated include from eswitch.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoMerge branch 'Pass-extack-to-SWITCHDEV_PORT_OBJ_ADD'
David S. Miller [Thu, 13 Dec 2018 00:34:22 +0000 (16:34 -0800)]
Merge branch 'Pass-extack-to-SWITCHDEV_PORT_OBJ_ADD'

Petr Machata says:

====================
Pass extack to SWITCHDEV_PORT_OBJ_ADD

Drivers may need to do validation as a result of port object addition.
An example is mlxsw, which needs to check the configuration of a VXLAN
device attached to an offloaded bridge. Without a mapped VLAN, the
invalidity of the device is not important, but as soon as a pvid,
untagged VLAN is configured for the device, it has to be validated and
offloaded. Should the validation fail, there's currently no way to
communicate details of the failure to the user, beyond an error number.

Because currently, extack is not available at all in that area of code,
this patch starts down at the RTNL level and progresses up towards the
driver(s).

In patch #1, ndo_bridge_setlink is updated to include extack, and
callbacks of all clients are updated as well (ignoring the argument).

In patch #2, the bridge driver is updated to propagate the extack
through to the switchdev border, br_switchdev_port_vlan_add().

Patches #3, #4 and #5 then gradually extend switchdev to pass the extack
argument through to the switchdev blocking notifier chain.

Patches #6 and #7 then update mlxsw to pass the extack argument from
VXLAN events resp. port events on to mlxsw_sp_bridge_8021q_vxlan_join().

Finally in patches #8 and #9, the code paths from the previous two
patches are verified to yield an error message.

v2:
- Patch #1:
    - In ndo_bridge_setlink(), keep the whole extack declaration on the
      same line.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: extack: Test VLAN add on a port device
Petr Machata [Wed, 12 Dec 2018 17:03:06 +0000 (17:03 +0000)]
selftests: mlxsw: extack: Test VLAN add on a port device

Test mapping a VLAN at a port device such that on the same VLAN, there
already is an unoffloadable VXLAN device.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: mlxsw: extack: Test VLAN add on a VXLAN device
Petr Machata [Wed, 12 Dec 2018 17:03:04 +0000 (17:03 +0000)]
selftests: mlxsw: extack: Test VLAN add on a VXLAN device

Test mapping a VLAN at a VXLAN device that can't be offloaded.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_switchdev: Propagate extack on port VLAN events
Petr Machata [Wed, 12 Dec 2018 17:03:02 +0000 (17:03 +0000)]
mlxsw: spectrum_switchdev: Propagate extack on port VLAN events

After switchdev_handle_port_obj_add() was extended in a preceding patch,
mlxsw_sp_port_obj_add() now takes an extack argument. Propagate it
further by extending a callee chain from mlxsw_sp_port_vlans_add(), via
mlxsw_sp_bridge_port_vlan_add() via mlxsw_sp_port_vlan_bridge_join() via
mlxsw_sp_port_vlan_fid_join() to mlxsw_sp_bridge_ops.fid_get, adding an
extack argument for each of them.

This code path is used when a VLAN is added to a port netdevice if there
already is an unoffloadable VXLAN device with that VLAN mapped.

mlxsw_sp_bridge_8021d_port_join() is updated to obey the new interfaces
changed by the abovementioned code, propagating extack ultimately from
NETDEV_CHANGEUPPER events.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_switchdev: Propagate extack on VXLAN VLAN events
Petr Machata [Wed, 12 Dec 2018 17:03:00 +0000 (17:03 +0000)]
mlxsw: spectrum_switchdev: Propagate extack on VXLAN VLAN events

Now that VLAN port object addition notifications carry an extack,
propagate it from mlxsw_sp_switchdev_vxlan_vlans_add() through
mlxsw_sp_switchdev_vxlan_vlan_add() to
mlxsw_sp_bridge_8021q_vxlan_join().

This code path is used when a VLAN is added to a VXLAN netdevice that
cannot be offloaded.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: switchdev: Add extack to switchdev_handle_port_obj_add() callback
Petr Machata [Wed, 12 Dec 2018 17:02:56 +0000 (17:02 +0000)]
net: switchdev: Add extack to switchdev_handle_port_obj_add() callback

Drivers use switchdev_handle_port_obj_add() to handle recursive descent
through lower devices. Change this function prototype to take add_cb
that itself takes an extack argument. Decode extack from
switchdev_notifier_port_obj_info and pass it to add_cb.

Update mlxsw and ocelot drivers which use this helper.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: switchdev: Add extack to struct switchdev_notifier_info
Petr Machata [Wed, 12 Dec 2018 17:02:54 +0000 (17:02 +0000)]
net: switchdev: Add extack to struct switchdev_notifier_info

In order to pass extack to the drivers that need it, add an extack field
to struct switchdev_notifier_info, and an extack argument to the
function call_switchdev_blocking_notifiers(). Also add a helper function
switchdev_notifier_info_to_extack().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: switchdev: Add extack argument to switchdev_port_obj_add()
Petr Machata [Wed, 12 Dec 2018 17:02:52 +0000 (17:02 +0000)]
net: switchdev: Add extack argument to switchdev_port_obj_add()

After the previous patch, bridge driver has extack argument available to
pass to switchdev. Therefore extend switchdev_port_obj_add() with this
argument, updating all callers, and passing the argument through to
switchdev_port_obj_notify().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bridge: Propagate extack to switchdev
Petr Machata [Wed, 12 Dec 2018 17:02:50 +0000 (17:02 +0000)]
net: bridge: Propagate extack to switchdev

ndo_bridge_setlink has been updated in the previous patch to have extack
available, and changelink RTNL op has had this argument since the time
extack was added. Propagate both through the bridge driver to eventually
reach br_switchdev_port_vlan_add(), where it will be used by subsequent
patches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ndo_bridge_setlink: Add extack
Petr Machata [Wed, 12 Dec 2018 17:02:48 +0000 (17:02 +0000)]
net: ndo_bridge_setlink: Add extack

Drivers may not be able to implement a VLAN addition or reconfiguration.
In those cases it's desirable to explain to the user that it was
rejected (and why).

To that end, add extack argument to ndo_bridge_setlink. Adapt all users
to that change.

Following patches will use the new argument in the bridge driver.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt: remove printing of hwrm message
Jonathan Toppins [Wed, 12 Dec 2018 16:58:51 +0000 (11:58 -0500)]
bnxt: remove printing of hwrm message

bnxt_en 0000:19:00.0 (unregistered net_device) (uninitialized): hwrm
req_type 0x190 seq id 0x6 error 0xffff

The message above is commonly seen when a newer driver is used on
hardware with older firmware. The issue is this message means nothing to
anyone except Broadcom. Remove the message to not confuse users as this
message is really not very informative.

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetdevsim: convert to DEFINE_SHOW_ATTRIBUTE
Yangtao Li [Wed, 12 Dec 2018 16:40:07 +0000 (11:40 -0500)]
netdevsim: convert to DEFINE_SHOW_ATTRIBUTE

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'stmmac-mt2712-support'
David S. Miller [Wed, 12 Dec 2018 23:21:01 +0000 (15:21 -0800)]
Merge branch 'stmmac-mt2712-support'

Biao Huang says:

====================
add Ethernet driver support for mt2712

Changes in v6:
 modifications according to comments from Rob/Andrew/Sean:
 1. use delay_ps instead of delay stage.
 2. add comments in driver to avoid confusion.
 2. rewrite set_delay function.
 3. modify binding document for properties: tx-delay-ps/rx-delay-ps/pericfg etc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-binding: mediatek-dwmac: add binding document for MediaTek MT2712 DWMAC
Biao Huang [Wed, 12 Dec 2018 09:35:32 +0000 (17:35 +0800)]
dt-binding: mediatek-dwmac: add binding document for MediaTek MT2712 DWMAC

The commit adds the device tree binding documentation for the MediaTek DWMAC
found on MediaTek MT2712.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agostmmac: dwmac-mediatek: add support for mt2712
Biao Huang [Wed, 12 Dec 2018 09:35:31 +0000 (17:35 +0800)]
stmmac: dwmac-mediatek: add support for mt2712

Add Ethernet support for MediaTek SoCs from the mt2712 family

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-Add-Spectrum-2-multicast-routing-support'
David S. Miller [Wed, 12 Dec 2018 07:01:34 +0000 (23:01 -0800)]
Merge branch 'mlxsw-Add-Spectrum-2-multicast-routing-support'

Ido Schimmel says:

====================
mlxsw: Add Spectrum-2 multicast routing support

Nir says:

In Spectrum the firmware provided an abstraction for multicast routing
on top of the policy engine. In Spectrum-2 this is no longer the case
and the driver must interact directly with the policy engine in order to
program multicast routes. Every route is written as an ACL rule, its
priority set according to route type (*,G) or (S,G) and its action is an
appropriate multicast routing action. Multicast routes are written to a
specific ACL group which is bound to the appropriate IP protocol
IPv4/IPv6.

Patch #1 adds PEMRBT register needed to declare which ACL group is
dedicated for each IP protocol multicast routing function.

Patch #2 Changes initialization order and puts ACL before router as
multicast router now uses ACL module.

Patch #3 adds Spectrum-2 ACL keys needed for multicast route matching.

Patch #4 adds another ACL profile - in addition to existing flower
profile - which allows the multicast routing module to program rules
directly into the ACL block.

Patch #5 adds the ability to update ACL rules' action, since multicast
routes actions may be updated after being configured.

Patch #6 separates rule creation operation and rule action creation
operation as in multicast router the action is created before the route
is inserted.

Patch #7 sharpens priority handling in Spectrum-2, to ensure incorrect
values are not set to rule's priority.

Patch #8 adds the implementation of multicast routing for IPv4 and IPv6
over existing ACL rule programming

Finally, patch #9 adds a test for IPv4/IPv6 multicast routing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: forwarding: Add multicast routing test
Nir Dotan [Mon, 10 Dec 2018 07:11:46 +0000 (07:11 +0000)]
selftests: forwarding: Add multicast routing test

Introduce basic testing for both IPv4 and IPv6 multicast. The test creates
an (S,G) type route, sends traffic and verifies traffic arrives when the
route is present and then verifies traffic does not arrive after deleting
the route.
This test requires smcroute - https://github.com/troglobit/smcroute which
is a tool that allows creation of static multicast routes.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_router: Add Multicast routing support for Spectrum-2
Nir Dotan [Mon, 10 Dec 2018 07:11:45 +0000 (07:11 +0000)]
mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2

Add implementation of Spectrum-2 multicast routes for both IPv4 and IPv6 by
using ACL module explicitly.
In Spectrum-2, multicast routes are set as ACL rules, so initialization
takes care of creating dedicated ACL groups and binding them to the
appropriate multicast routing protocol IPv4/IPv6, and afterwards routes
configuration translates to setting explicit ACL rules.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Limit priority value
Nir Dotan [Mon, 10 Dec 2018 07:11:44 +0000 (07:11 +0000)]
mlxsw: spectrum_acl: Limit priority value

In Spectrum-2, higher priority value wins and priority valid values are in
the range of {1,cap_kvd_size-1}. mlxsw_sp_acl_tcam_priority_get converts
from lower-bound priorities alike tc flower to Spectrum-2 HW range. Up
until now tc flower did not provide priority 0 or reached the maximal
value, however multicast routing does provide priority 0.

Therefore, Change mlxsw_sp_acl_tcam_priority_get to verify priority is in
the correct range. Make sure priority is never set to zero and never
exceeds the maximal allowed value.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Support rule creation without action creation
Nir Dotan [Mon, 10 Dec 2018 07:11:43 +0000 (07:11 +0000)]
mlxsw: spectrum_acl: Support rule creation without action creation

Up until now, when ACL rule was created its action was created with it.
It suits well for tc flower where ACL rule always needs an action, however
it does not suit multicast router, where the action is created prior to
setting a route, which in Spectrum-2 is actually an ACL rule.

Add support for rule creation without action creation. Do it by adding
afa_block argument to mlxsw_sp_acl_rule_create, which if NULL then an
action would be created, also add an indication within struct
mlxsw_sp_acl_rule_info that tells if the action should be destroyed when
the rule is destroyed.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Add replace rule action operation
Nir Dotan [Mon, 10 Dec 2018 07:11:41 +0000 (07:11 +0000)]
mlxsw: spectrum_acl: Add replace rule action operation

Multicast routes actions may be updated after creation. An example for that
is an addition of an egress interface to an existing route.
So far, as tc flower API dictated, ACL rules were either created or
deleted. Since multicast routes in Spectrum-2 are written to ACL as any
rule, it is required to allow the update of a rule's action as it may
change.

Add methods and operations to support updating rule's action. This is
supported only for Spectrum-2.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Add multicast router profile operations
Nir Dotan [Mon, 10 Dec 2018 07:11:40 +0000 (07:11 +0000)]
mlxsw: spectrum_acl: Add multicast router profile operations

Add specific ACL operations needed for programming multicast routing ACL
groups and routes.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>