git.samba.org - sfrench/cifs-2.6.git/log

pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US

Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS
(like in PSCHED_TICKS_PER_SEC already) to avoid misleading.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix fib_trie rebalancing

While doing trie_rebalance(): resize(), inflate(), halve() RCU free
tnodes before updating their parents. It depends on RCU delaying the
real destruction, but if RCU readers start after call_rcu() and before
parent update they could access freed memory.

It is currently prevented with preempt_disable() on the update side,
but it's not safe, except maybe classic RCU, plus it conflicts with
memory allocations with GFP_KERNEL flag used from these functions.

This patch explicitly delays freeing of tnodes by adding them to the
list, which is flushed after the update is finished.

Reported-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver

The current build shows a warning with the DTL-1 driver:

CC [M] drivers/bluetooth/dtl1_cs.o
drivers/bluetooth/dtl1_cs.c: In function ‘dtl1_hci_send_frame’:
drivers/bluetooth/dtl1_cs.c:396: warning: ‘nsh.type’ may be used uninitialized in this function

Fix this by adding a proper error for unknown packet types.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Fix Kconfig issue with RFKILL integration

Since the re-write of the RFKILL subsystem it is no longer good to just
select RFKILL, but it is important to add a proper depends on rule.

Based on a report by Alexander Beregalov <a.beregalov@gmail.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

PIM-SM: namespace changes

IPv4:
  - make PIM register vifs netns local
  - set the netns when a PIM register vif is created
  - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

IPv6:
  - make PIM register vifs netns local
  - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
    by adding the protocol handler when multicast routing is initialized

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: update ARPD help text

Removed the statements about ARP cache size as this config option does
not affect it. The cache size is controlled by neigh_table gc thresholds.

Remove also expiremental and obsolete markings as the API originally
intended for arp caching is useful for implementing ARP-like protocols
(e.g. NHRP) in user space and has been there for a long enough time.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use a deferred timer in rt_check_expire

For the sake of power saver lovers, use a deferrable timer to fire
rt_check_expire()

As some big routers cache equilibrium depends on garbage collection
done in time, we take into account elapsed time between two
rt_check_expire() invocations to adjust the amount of slots we have to
check.

Based on an initial idea and patch from Tero Kristo

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: fix kconfig bool/tristate muckup

menuconfig IEEE802154_DRIVERS is a bool that depends on tristate IEEE802154.
If the IEEE802154 symbol is 'm', the bool becomes 'y'.
This allows tristate symbols under IEEE802154_DRIVERS to be configured as
'y' and cause build problems.
Changing the menuconfig bool to a tristate fixes this.

drivers/built-in.o: In function `fake_scan_req':
fakehard.c:(.text+0x46d625): undefined reference to `ieee802154_nl_scan_confirm'
drivers/built-in.o: In function `fake_disassoc_req':
fakehard.c:(.text+0x46d66f): undefined reference to `ieee802154_nl_disassoc_confirm'
drivers/built-in.o: In function `fake_assoc_req':
fakehard.c:(.text+0x46d6be): undefined reference to `ieee802154_nl_assoc_confirm'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: initialization rework

Need to rework how bonding devices are initialized to make it more
amenable to creating bonding devices via netlink.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: use is_zero_ether_addr

Remove bogus non-portable possibly unaligned way of testing
for zero addres..

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: network device names are case sensative

The bonding device acts unlike all other Linux network device functions
in that it ignores case of device names. The developer must have come
from windows!

Cleanup the management of names and use standard routines where possible.
Flag places where bonding device still doesn't work right with network
namespaces.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: elminate bad refcount code

The "expected_refcount" stuff in bonding sysfs module is a mistake.
Sysfs does proper refcounting, and it is okay to remove a bond device
that has some user process holding the file open.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix style issues

Resolve some of the complaints from checkpatch, and remove "magic emacs format"
comments, and useless MODULE_SUPPORTED_DEVICE(). But should not
change actual code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix destructor

It is not safe to use a network device destructor that is a function in
the module, since it can be called after module is unloaded if sysfs
handle is open.

When eventually using netlink, the device cleanup code needs to be done
via uninit function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove bonding read/write semaphore

The whole read/write semaphore locking can be removed. It doesn't add any
protection that isn't already done by using the RTNL mutex properly.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: initialize before registration

Avoid a unnecessary carrier state transistion that happens when device
is registered.
Lockdep works better if initialization is done before registration as well.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: bond_create always called with default parameters

bond_create() is always called with same parameters so move the argument
down.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-next' of git://git./linux/kernel/git/sameo/irda-2.6

Merge branch 'master' of git://git./linux/kernel/git/kaber/nf-next-2.6

x_tables: Convert printk to pr_err

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: conntrack: optional reliable conntrack event delivery

This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.

The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.

At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.

The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.

During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.

A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.

For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.

This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

list_nulls: add hlist_nulls_add_head and hlist_nulls_del

This patch adds the hlist_nulls_add_head() function which is
based on hlist_nulls_add_head_rcu() but without the use of
rcu_assign_pointer(). It also adds hlist_nulls_del which is
exactly the same like hlist_nulls_del_rcu().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: conntrack: move helper destruction to nf_ct_helper_destroy()

This patch moves the helper destruction to a function that lives
in nf_conntrack_helper.c. This new function is used in the patch
to add ctnetlink reliable event delivery.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: conntrack: move event caching to conntrack extension infrastructure

This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.

The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.

BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_conntrack: use mod_timer_pending() for conntrack refresh

Use mod_timer_pending() instead of atomic sequence of del_timer()/
add_timer(). mod_timer_pending() does not rearm an inactive timer,
so we don't need the conntrack lock anymore to make sure we don't
accidentally rearm a timer of a conntrack which is in the process
of being destroyed.

With this change, we don't need to take the global lock anymore at all,
counter updates can be performed under the per-conntrack lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_log: fix sleeping function called from invalid context

Fix regression introduced by 17625274 "netfilter: sysctl support of
logger choice":

BUG: sleeping function called from invalid context at /mnt/s390test/linux-2.6-tip/arch/s390/include/asm/uaccess.h:234
in_atomic(): 1, irqs_disabled(): 0, pid: 3245, name: sysctl
CPU: 1 Not tainted 2.6.30-rc8-tipjun10-02053-g39ae214 #1
Process sysctl (pid: 3245, task: 000000007f675da0, ksp: 000000007eb17cf0)
0000000000000000 000000007eb17be8 0000000000000002 0000000000000000
       000000007eb17c88 000000007eb17c00 000000007eb17c00 0000000000048156
       00000000003e2de8 000000007f676118 000000007eb17f10 0000000000000000
       0000000000000000 000000007eb17be8 000000000000000d 000000007eb17c58
       00000000003e2050 000000000001635c 000000007eb17be8 000000007eb17c30
Call Trace:
(Ý<00000000000162e6>¨ show_trace+0x13a/0x148)
Ý<00000000000349ea>¨ __might_sleep+0x13a/0x164
Ý<0000000000050300>¨ proc_dostring+0x134/0x22c
Ý<0000000000312b70>¨ nf_log_proc_dostring+0xfc/0x188
Ý<0000000000136f5e>¨ proc_sys_call_handler+0xf6/0x118
Ý<0000000000136fda>¨ proc_sys_read+0x26/0x34
Ý<00000000000d6e9c>¨ vfs_read+0xac/0x158
Ý<00000000000d703e>¨ SyS_read+0x56/0x88
Ý<0000000000027f42>¨ sysc_noemu+0x10/0x16

Use the nf_log_mutex instead of RCU to fix this.

Reported-and-tested-by: Maran Pakkirisamy <maranpsamy@in.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

net: use symbolic values for ndo_start_xmit() return codes

Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively.

0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases
where its in direct proximity to one of the other values.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 8)

Fix up USB drivers that return an errno value (result of usb_submit_urb())
to qdisc_restart(), causing qdisc_restart() to print a warning and requeue/
retransmit the skb.

- hso: skb is freed: use after free
- at76_usb: skb is freed: use after free

Compile tested only.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 7)

Fix up ATM drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning an requeue/retransmit the skb.

- lec: condition can only be remedied by userspace, until that retransmissions

Compile tested only.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 6)

Fix up hamradio drivers that return an errno value to dev_queue_xmit(), causing
it to print a warning an free the skb.

- bpqether: skb is freed: use after free

Compile tested only.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 5)

Fix up s390 drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning an requeue/retransmit the skb.

- claw: impossible condition, simply remove it

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 4)

Fix up WAN drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning an requeue/retransmit the skb.

- cycx_x25: intention appears to be to requeue the skb

Does not compile cleanly for me even without this patch, so untested.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 3)

net: fix network drivers ndo_start_xmit() return values (part 3)

Fix up wireless drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning an requeue/retransmit the skb.

- airo: transmission not implemented for chip, intention is to free and abort
- ipw2200: transmission not implemented for promiscous mode, intention is to
drop
- prism54: intention is to drop
- wl3501_cs: intention appears to be to drop
- zd1201: error counter indicates intention is to drop

All drivers compile tested.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network drivers ndo_start_xmit() return values (part 2)

Fix up IRDA drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning an requeue/retransmit the skb.

- donauboe: intention appears to be to have the skb retransmitted without
error message
- irda-usb: intention is to drop silently according to comment
- kingsub-sir: skb is freed: use after free
- ks959-sir: skb is freed: use after free
- ksdazzle-sir: skb is freed: use after free
- mcs7880: skb is freed: use after free

All but donauboe compile tested.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix network driver ndo_start_xmit() return values (part 1)

Fix up drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning and requeue/retransmit the skb.

- xpnet: memory allocation error, intention is to drop
- ethoc: oversized packet, packet must be dropped
- ibmlana: skb freed: use after free
- rrunner: skb freed: use after free

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: add git tree to MAINTAINERS file

We now have an IrDA git tree on kernel.org:
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/irda-2.6.git

Signed-off-by: Samuel Ortiz <samuel@sortiz.org>

irda: smsc wait count reaches -1

The sir retries count reaches -1 rather than 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

irda: new Blackfin on-chip SIR IrDA driver

Signed-off-by: Graff Yang <graff.yang@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

bridge: Simplify interface for ATM LANE

This patch changes FDB entry check for ATM LANE bridge integration.
There's no point in holding a FDB entry around SKB building.

br_fdb_get()/br_fdb_put() pair are changed into single br_fdb_test_addr()
hook that checks if the addr has FDB entry pointing to other port
to the one the request arrived on.

FDB entry refcounting is removed as it's not used anywhere else.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PATCH] net core: Some interface flags not returned by SIOCGIFFLAGS

Commit b00055aacdb172c05067612278ba27265fcd05ce " [NET] core: add
RFC2863 operstate" defined new interface flag values.  Its
documentation specified that these flags could be accessed from user
space via SIOCGIFFLAGS.  However, this does not work because the new
flags do not fit in that ioctl's argument width.

Change the documentation to match the code's behavior.  Also change
the source to explicitly show the truncation.  This _should_ have no
effect on executable code, and did not with gcc 4.2.4 generating x86
code.

A new ioctl could be defined to return all interface flags to user
space.  However, since this has been broken for three years with no
one complaining, there doesn't seem much need.  They are still
accessible via netlink.

Reported-by: "Fredrik Arnerup" <fredrik.arnerup@edgeware.tv>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Fix IP alignment on non-mergeable RX path

We need to enforce the IP alignment on the non-mergeable RX path just
like the other RX path. Not doing so results in misaligned IP
headers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/kaber/nf-next-2.6

net: ntohs() misuse

Some drivers incorrectly use ntohs() instead of htons()

A cleanup as htons() returns same result than ntohs(),
but better to use the proper one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'linux-2.6.31.y' of git://git./linux/kernel/git/inaky/wimax

netfilter: ip_tables: fix build error

Fix build error introduced by commit bb70dfa5 (netfilter: xtables:
consolidate comefrom debug cast access):

net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table':
net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function)
net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once
net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.)

Signed-off-by: Patrick McHardy <kaber@trash.net>

wimax: fix gcc warnings in sh4 when calling BUG()

SH4's BUG() seems to confuse the compiler as it is considered to
return; thus, some functions would trigger usage of uninitialized
variables or non-void functions returning void.

Work around by initializing/returning.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax: fix warning caused by not checking retval of rfkill_set_hw_state()

Caused by an API update. The return value can be safely ignored, as
there is notthing we can do with it.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

mISDN: cleanup mISDNhw.h

Remove unused stuff.

Signed-off-by: Karsten Keil <keil@b1-systems.de>

mISDN: Do not disable IRQ in ph_data_ind()

This fix triggering the WARN_ON_ONCE(in_irq() || irqs_disabled()); in
local_bh_enable().

Here is no need to grab this lock, this was wrong at all and may
cause a deadlock and access to freed memory, since on a TEI remove
the current listelement can be deleted under us. So this is clearly
a case for list_for_each_entry_safe.

Signed-off-by: Karsten Keil <keil@b1-systems.de>

drivers/isdn/i4l/isdn_tty.c: fix check for array overindexing

The check for overindexing of dev->mdm.info[] has an off-by-one.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Karsten Keil <keil@b1-systems.de>

mISDN: Free hfcpci IRQ if init was not successful

If we get no interrupts for after 3 resets we need to unregister
the interrupt function, which is already done outside the loop.

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Karsten Keil <keil@b1-systems.de>

mISDN: Fix overlapping data access

Remove code rewriting a buffer by itself.
This fix bug 12970 on bugzilla.kernel.org.

Signed-off-by: Karsten Keil <keil@b1-systems.de>

ISDN:Fix DMA alloc for hfcpci

Replace wrong code with correct DMA API functions.

Signed-off-by: Karsten Keil <keil@b1-systems.de>

netfilter: nf_ct_tcp: fix up build after merge

Replace the last occurence of tcp_lock by the per-conntrack lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>

Merge branch 'master' of git://git./linux/kernel/git/davem/net-next-2.6

Merge branch 'master' of git://git./linux/kernel/git/holtmann/bluetooth-next-2.6

bonding: fix multiple module load problem

Some users still load bond module multiple times to create bonding
devices.  This accidentally was broken by a later patch about
the time sysfs was fixed.  According to Jay, it was broken
by:
   commit b8a9787eddb0e4665f31dd1d64584732b2b5d051
   Author: Jay Vosburgh <fubar@us.ibm.com>
   Date:   Fri Jun 13 18:12:04 2008 -0700

     bonding: Allow setting max_bonds to zero

Note: sysfs and procfs still produce WARN() messages when this is done
so the sysfs method is the recommended API.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: fix state transition INCOMPLETE->FAILED via Netlink request

The current code errors out the INCOMPLETE neigh entry skb queue only from
the timer if maximum probes have been attempted and there has been no reply.
This also causes the transtion to FAILED state.

However, the neigh entry can be also updated via Netlink to inform that the
address is unavailable. Currently, neigh_update() just stops the timers and
leaves the pending skb's unreleased. This results that the clean up code in
the timer callback is never called, preventing also proper garbage collection.

This fixes neigh_update() to process the pending skb queue immediately if
INCOMPLETE -> FAILED state transtion occurs due to a Netlink request.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

wimax/i2400m: use -EL3RST to indicate device reset instead of -ERESTARTSYS

When the i2400m device resets, the driver code will force some
functions to return a -ERESTARTSYS error code, which can is used by
the caller to determine which recovery actions to take.

However, in certain situations the only thing that can be done is to
bubble up said error code to user space, for handling.

However, -ERESTARSYS was a poor choice, as it is supposed to be used
by the kernel only.

As such, replace -ERESTARTSYS with -EL3RST; as well, in
i2400m_msg_to_dev(), when the device is in boot mode (following a
recent reset), return -EL3RST instead of -ENODEV (meaning the device
is in bootrom mode after a reset, not that the device was
disconnected, and thus, normal commands cannot be executed).

Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>

wimax/i2400m: when bootstrap fails, reinitialize the bootrom

When a device reset happens during firmware load [in
i2400m_dev_bootstrap()], __i2400m_dev_start() will retry a number of
times. However, for those retries to be able to accomplish anything,
the device's bootrom has to be reinitialized.

Thus, on the retry path, pass the I2400M_MAC_REINIT to the firmware
load code.

Signed-off-by: Cindy H Kao <cindy.h.kao@intel.com>

wimax/i2400m/sdio: Move all the RX code to a unified, IRQ based receive routine

The current SDIO code was working in polling mode for boot-mode
(firmware load) mode. This was causing issues on some hardware.

Moved all the RX code to use a unified IRQ handler that based on the
type of data the device is sending can discriminate and decide which
is the right destination.

As well, all the reads from the device are made to be at least the
block size (256); the driver will ignore the rest when not needed.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: don't reset device when bootrom init retries are exceeded

When i2400m_bootrom_init() fails to put the device into a state of
being ready to accept firmware, the driver was currently trying to
reset it if it failed to do so. This is not too useful; as part of
trying to put the device in the right state a few resets have already
been tried.

At this point, things are probably fried out and an extra reset might
do more harm than good (for example causing reseting of other
functions in the same composite device).

So it is left up to the callers to determine the error path to take
(at the end this is always i2400m_setup(), who depending on how many
retries are left, might give up on the device).

From a fix by Cindy H. Kao.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m/sdio: Add device specific poke table.

Add a poke table for the SDIO device (as it is different than USB).

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>

wimax/i2400m: move boot time poke table out of common driver

This change moves the table of "pokes" performed on the device at boot
time to the bus specific portion of the driver.

Different models of the i2400m device supported by this driver require
different poke tables, thus having a single table that works for all
is impossible. For that, the table is moved to the bus-specific
driver, who can decide which table to use based on the specifics of
the device and point the generic driver to it.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>

wimax/i2400m: Allow bus-specific driver to specify retry count

The code that sets up the i2400m (firmware load and general driver
setup after it) includes a couple of retry loops.

The SDIO device sometimes can get in more complicated corners than the
USB one (due to its interaction with other SDIO functions), that
require trying a few more times.

To solve that, without having a failing USB device taking longer to be
considered dead, allow the retry counts to be specified by the
bus-specific driver, which the general driver takes as a parameter.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: if a device reboot happens during probe, handle it

When a device reboot happens when we are under probe, with init_mutex
taken, make sure we can recover. Have dev_reset_handle set boot mode
and i2400m_msg_to_dev() will see it and fail gracefully instead of
timing out.

Found and diagnosed by Cindy H. Kao.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: fix oops when the TX FIFO fills up due to a missing check

When the TX FIFO filled up and i2400m_tx_new() failed to allocate a
new TX message header, a missing check for said condition was causing a
kernel oops when trying to dereference a NULL i2400m->tx_msg pointer.

Found and diagnosed by Cindy H. Kao.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: don't reset device on i2400m_dev_shutdown()

i2400m_dev_shutdown() tried to reset the device to put it in a known
state before shutting down.

But that turned out to be pointless. We reach this case in two paths:

1 - when the device resets, to clean up state
2 - when the driver is unloaded, for the same

however, in both cases it is pointless; in (1) the device is already
reset, why do it again? in (2) we can't -- the USB stack, for example,
doesn't allow communicating with the device when the driver is being
unbound and if the device is disconnected, the device is gone already.

So just remove it. Leave the function as a placeholder for future
cleanups that will be done from data allocated by the driver during
device operation.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: fix panic due to missed corner cases on tail_room calculation

i2400m_tx_skip_tail() needs to handle the special case of being called
when the tail room that is left over in the FIFO is zero.

This happens when a TX message header was opened at the very end of
the FIFO (without payloads). The i2400m_tx_close() code already marked
said TX message (header) to be skipped and this function should be
doing nothing.

It is called anyway because it is part of a common "corner case" path
handling which takes care of more cases than only this one.

The tail room computation was also improved to take care of the case
when tx_in is at the end of the buffer boundary; tail_room has to be
modded (%) to the buffer size. To do that in a single well-documented
place, __i2400m_tx_tail_room() is introduced and used.

Treat i2400m->tx_in == 0 as a corner case and handle it accordingly.

Found and diagnosed by Cindy H. Kao.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: fix panic/warnings caused by missed check on empty TX message

In some situations, when a new TX message header is started, there
might be no space for data payloads. In this case the message is left
with zero payloads and the i2400m_tx_close() function has just to mark
it as "to skip". If it tries to go ahead it will overwrite things
because there is no space to add padding as defined by the
bus-specific layer. This can cause buffer overruns and in some stress
cases, panics.

Found and diagnosed by Cindy H. Kao.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: rename misleading I2400M_PL_PAD to I2400M_PL_ALIGN

The constant is being use as an alignment factor, not as a padding
factor; made reading/reviewing the code quite confusing.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m/sdio: Implement I2400M_RT_BUS reset type

This reset type causes the WiMAX function to be disabled and
re-enabled, which will force the WiMAX device to reset and enter boot
mode.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>

wimax/i2400m: Change d_printf() level for secure boot messages

Changing debug level of print out to support validation engineers
getting the messages they need.

Signed-off-by: <dirk.j.brandewie@intel.com>

wimax/i2400m: i2400m_schedule_work() doesn't need i2400m->work_queue

By mistake, the BUG_ON() check was left in there and it will fail when
called if i2400m->work_queue is still not setup.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: i2400m's work queue should be initialized before RX support

RX support is the only user of the work-queue, to process
reports/notifications from the device. Thus, it needs the work queue
to be initialized first.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: don't call netif_start_queue() in _tx_msg_sent()

Reported and fixed by Cindy H Kao.

When the device is stopped __i2400m_dev_stop() stops the network
queue.

However, when this is done in the middle of heavy network operation,
when the bus-specific subdriver is still wrapping up and it reports a
sent TX transaction with _tx_msg_sent() right after the device was
stopped, the queue was being started again, which was causing a stream
of oopsen and finally a panic.

In any case, said call has no place there. It's a left over from an
early implementation that was discarded later on.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

wimax/i2400m: introduce module parameter to disable entering power save

The i2400m driver waits for the device to report being ready for
entering power save before asking it to do so. This module parameter
allows control of said operation; if disabled, the driver won't ask
the device to enter power save mode.

This is useful in setups where power saving is not so important or
when the overhead imposed by network reentry after power save is not
acceptable; by combining this with parameter 'idle_mode_disabled', the
driver will always maintain both the connection and the device in
active state.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

net: No more expensive sock_hold()/sock_put() on each tx

One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.

skb_set_owner_w() doesnt call sock_hold() anymore

sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: No need to check vfree() pointer.

vfree() does its own 'NULL' check, so no need for check before
calling it.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix be_tx_q_clean() being called on freed queues

In the tx queue destroy path, be_tx_q_clean() is currently called after the tx queues are freed; it must be called before.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix early reset of rx-completion

be_rx_compl_get() must not reset(via the valid word) the rx_compl as the rx_compl is not processed yet; it must be reset after it is processed.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix rx stats updation in non-lro path

rx stats are not getting updated when an rx_compl with only one frag is rcvd in non-lro path.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix netdev stats rx_errors and rx_dropped

Fix netdev stat rx_errors to cover length related errors and checksum errors and rx_dropped to the pkts dropped due to lack of buffers

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Use cancel_delayed_work_sync instead of cancel_delayed_work()

Use cancel_delayed_work_sycn instead of cancel_delayed_work() to reliably kill be_worker() as it rearms itself.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tehuti: No need check vfree() pointer.

vfree() does its own 'NULL' check, so no need for check before
calling it.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxge: No need for check vfree() pointer.

vfree() does its own 'NULL' check, so no need for check before
calling it.

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb3: remove __GFP_NOFAIL usage

Pre-allocate a skb at init time to be used for control messages to the HW
if skb allocation fails.

Tolerate failures to send messages initializing some memories at the cost of
parity error detection for these memories.
Retry sending connection id release messages if both alloc_skb(GFP_ATOMIC)
and alloc_skb(GFP_KERNEL) fail.
Do not bring the interface up if messages binding queue set to port fail to
be sent.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use dev_kfree_skb() instead of dev_kfree_skb_irq()

rtl8169_tx_interrupt() is used from NAPI context, it can
directly free skbs. dev_kfree_skb_irq() is a leftover from
pre-NAPI times of this driver.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Expose 100/1000BASE-T MDI-X status via ethtool

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mdio: Expose 10GBASE-T MDI-X status via ethtool

This is available in a standard MDIO register in 10GBASE-T PHYs.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Allow RX buf rings to be > than 4096 bytes.

RX buffer rings can be comprised of non-contiguous fixed
size chunks of memory. The ring is given to the hardware
as a pointer to a location that stores the location of
the queue. If the queue is greater than 4096 bytes then
the hardware gets a list of said pointers.
This patch addes the necessary logic to generate the list if
the queue size exceeds 4096.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Relax alignment on TX harware queue.

The alignment was on size of queue boundary, but the hardware
only requires 4-byte alignment.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1c: WAKE_MCAST tested twice, not WAKE_UCAST

The WAKE_MCAST bit is tested twice, the first should be WAKE_UCAST.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Jie Yang <jie.yang@atheros.com>
Cc: Jay Cliburn <jcliburn@gmail.com>
Cc: Chris Snook <csnook@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

de2104x: support for systems lacking cache coherence

Add a configurable Descriptor Skip Length for systems that lack cache
coherence.

(akpm: I think this should be done as a module parameter, not a
compile-tinme option)

Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/macvlan.c: fix cloning of tagged VLAN interfaces

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13348

akpm: the reporter disappeared, so I typed it in again.

It is not possible to make clone of tagged VLAN interface to be used as
mac-based vlan interfave.

How reproducible:
Use any 802.1q tagged vlan interface, e.g. eth2.700 and clone it:

  ip link add link eth2.700 address 00:04:75:cb:38:09 macvlan0 type macvlan
  ip link set dev macvlan0 up
  ip addr add 10.195.1.1/24 dev macvlan0

So far, so good. Now try to ping anything via macvlan0:

  ping 10.195.1.2

Actual results:
For every attempted packet tx kernel writes to console:

------------[ cut here ]------------
WARNING: at net/8021q/vlan_dev.c:254 vlan_dev_hard_header+0x36/0x126 [8021q]()
Hardware name: M22ES
Modules linked in: arptable_filter arp_tables bridge veth macvlan arc4 ecb
ppp_mppe ppp_async crc_ccitt ppp_generic slhc autofs4 sunrpc 8021q garp stp
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
x_tables dm_mirror dm_region_hash dm_log dm_multipath dm_mod sbs sbshc lp
floppy snd_intel8x0 joydev snd_seq_dummy snd_intel8x0m snd_ac97_codec
ide_cd_mod ac97_bus snd_seq_oss cdrom snd_seq_midi_event serio_raw snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss parport_pc snd_pcm parport battery
8139cp snd_timer i2c_sis96x ac button snd rtc_cmos rtc_core 8139too soundcore
rtc_lib mii i2c_core pcspkr snd_page_alloc pata_sis libata sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: ip_tables]
Pid: 0, comm: swapper Tainted: G        W  2.6.29.3 #1
Call Trace:
[<c0425f48>] warn_slowpath+0x60/0x9f
[<c0425f6f>] warn_slowpath+0x87/0x9f
[<dffb850d>] vlan_dev_hard_header+0x0/0x126 [8021q]
[<dffb8543>] vlan_dev_hard_header+0x36/0x126 [8021q]
[<dffb850d>] vlan_dev_hard_header+0x0/0x126 [8021q]
[<df83155d>] macvlan_hard_header+0x3c/0x47 [macvlan]
[<df831521>] macvlan_hard_header+0x0/0x47 [macvlan]
[<c062bf3f>] arp_create+0xef/0x1ff
[<c062c08c>] arp_send+0x3d/0x54
[<c062c916>] arp_solicit+0x16c/0x177
[<c05fadd2>] neigh_timer_handler+0x227/0x269
[<c05fabab>] neigh_timer_handler+0x0/0x269
[<c042ce4d>] run_timer_softirq+0xf0/0x141
[<c0429e5a>] __do_softirq+0x76/0xf8
[<c0429de4>] __do_softirq+0x0/0xf8
<IRQ>  [<c044fb67>] handle_level_irq+0x0/0xad
[<c0429db7>] irq_exit+0x35/0x62
[<c04046bb>] do_IRQ+0xdf/0xf4
[<c04035a7>] common_interrupt+0x27/0x2c
[<c04079c5>] default_idle+0x2a/0x3d
[<c0401bb6>] cpu_idle+0x57/0x70

Macvlan driver always uses standard ethernet header length for all types
of interface to which it is linked.  This patch fixes this problem.

Reported-by: <sg.tweak@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e100: add non-MII PHY support

Restore support for cards with MII-lacking PHYs as compared to removed
pre-2.6.29 eepro100 driver: use full low-level MII I/O access abstraction
by providing clean PHY-specific mdio_ctrl() functions for either standard
MII-compliant cards, slightly special ones or non-MII PHY ones.

We now have another netdev_priv member for mdio_ctrl(), thus we have some
array indirection, but we save some additional opcodes for specific
phy_82552_v handling in the common case.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Andreas Mohr <andi@lisas.de>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: Use '%Zu' printf format for size_t.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6

cfg80211: fix rfkill locking problem

rfkill currently requires a global lock within the
rfkill_register() function, and holds that lock over
calls to the set_block() methods. This means that we
cannot hold a lock around rfkill_register() that we
also require in set_block(), directly or indirectly.
Fix cfg80211 to register rfkill outside the block
locked by its global lock. Much of what cfg80211 does
in the locked block doesn't need to be locked anyway.

Reported-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: disable PS while probing AP

When associated, but probing the AP because we detected
beacon loss, we need to disable powersave to be able to
receive the probe response. Change the code to do that by
checking whether we're trying to probe when determining
the possibility of going into PS, and recalculate the PS
ability at the necessary spots.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>