sfrench/cifs-2.6.git
16 years ago[XFS] Ensure "both" features2 slots are consistent
Eric Sandeen [Thu, 10 Apr 2008 02:19:34 +0000 (12:19 +1000)]
[XFS] Ensure "both" features2 slots are consistent

Since older kernels may look in the sb_bad_features2 slot for flags,
rather than zeroing it out on fixup, we should make it equal to the
sb_features2 value.

Also, if the ATTR2 flag was not found prior to features2 fixup, it was not
set in the mount flags, so re-check after the fixup so that the current
session will use the feature.

Also fix up the comments to reflect these changes.

SGI-PV: 980085
SGI-Modid: xfs-linux-melb:xfs-kern:30778a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
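A minimal sketch of the fixup described in the commit above, written as plain userspace C. The struct, the flag value and the function names are illustrative assumptions, not the real XFS definitions; merging the two slots with a bitwise OR is also an assumption, since the message only says the two slots should end up equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, for illustration only. */
#define SB_FEATURE2_ATTR2 0x08u

/* Simplified stand-in for the two superblock slots; not the real xfs_sb_t. */
struct sb_slots {
    uint32_t features2;      /* the slot current kernels read */
    uint32_t bad_features2;  /* the misaligned slot older kernels may read */
};

/*
 * Merge the flags from both slots and store the result in both, so that a
 * kernel reading either slot sees the same feature set. Returns true when
 * the superblock was changed and would need to be written back.
 */
static bool fixup_features2(struct sb_slots *sb)
{
    assert(sb != NULL);
    if (sb->features2 == sb->bad_features2)
        return false;
    sb->features2 |= sb->bad_features2;
    sb->bad_features2 = sb->features2;
    return true;
}

/* Re-check the flag after the fixup so the current session uses the feature. */
static bool attr2_enabled(const struct sb_slots *sb)
{
    assert(sb != NULL);
    return (sb->features2 & SB_FEATURE2_ATTR2) != 0;
}
```

Checking the flag only after the fixup has run is what lets a flag that was present only in the bad slot take effect for the current mount.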
16 years ago[XFS] Fix superblock features2 field alignment problem
David Chinner [Thu, 6 Mar 2008 02:45:50 +0000 (13:45 +1100)]
[XFS] Fix superblock features2 field alignment problem

Due to the xfs_dsb_t structure not being 64 bit aligned, the last field of
the on-disk superblock can vary in location. This causes problems when the
filesystem gets moved to a different platform, or there is a 32 bit
userspace and 64 bit kernel.

This patch detects the defect at mount time, logs a warning such as:

XFS: correcting sb_features alignment problem

in dmesg and corrects the problem so that everything is OK. It also
blacklists the bad field in the superblock so it does not get used for
something else later on.

SGI-PV: 977636
SGI-Modid: xfs-linux-melb:xfs-kern:30539a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
16 years ago[XFS] remove shouting-indirection macros from xfs_sb.h
Eric Sandeen [Thu, 6 Mar 2008 02:44:28 +0000 (13:44 +1100)]
[XFS] remove shouting-indirection macros from xfs_sb.h

Remove macro-to-small-function indirection from xfs_sb.h, and remove some
which are completely unused.

SGI-PV: 976035
SGI-Modid: xfs-linux-melb:xfs-kern:30528a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
16 years agosplice: fix infinite loop in generic_file_splice_read()
Jens Axboe [Thu, 10 Apr 2008 06:24:25 +0000 (08:24 +0200)]
splice: fix infinite loop in generic_file_splice_read()

There's a quirky loop in generic_file_splice_read() that could go
on indefinitely, if the file splice returns 0 permanently (and not
just as a temporary condition). Get rid of the loop and pass
back -EAGAIN correctly from __generic_file_splice_read(), so we
handle that condition properly as well.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
16 years ago[SPARC]: Fix several regset and ptrace bugs.
David S. Miller [Thu, 10 Apr 2008 02:39:25 +0000 (19:39 -0700)]
[SPARC]: Fix several regset and ptrace bugs.

1) ptrace should pass 'current' to task_user_regset_view()

2) When fetching general registers using a 64-bit view, and
   the target is 32-bit, we have to convert.

3) Skip the whole register window get/set code block if
   the user isn't asking to access anything in there.

   Otherwise we have problems if the user doesn't have
   an address space setup.  Fetching ptrace register is
   still valid at such a time, and ptrace does not try
   to access the register window area of the regset.

Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agopop previous section in alternative.c
Steven Rostedt [Wed, 9 Apr 2008 23:04:07 +0000 (19:04 -0400)]
pop previous section in alternative.c

gcc expects all toplevel assembly to return to the original section type.
The code in alternative.c does not do this. This caused some strange bugs
in sched-devel where code would end up in the .rodata section and when
the kernel sets the NX bit on all .rodata, the kernel would crash when
executing this code.

This patch adds a .previous marker to return the code back to the
original section.

Credit goes to Andrew Pinski for telling me it wasn't a gcc bug but a
bug in the toplevel asm code in the kernel.  ;-)

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Thu, 10 Apr 2008 01:36:12 +0000 (18:36 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: don't BUG if fs reuses a superblock

16 years agoSELinux: don't BUG if fs reuses a superblock
Eric Paris [Wed, 9 Apr 2008 18:08:35 +0000 (14:08 -0400)]
SELinux: don't BUG if fs reuses a superblock

I (wrongly) assumed that nfs_xdev_get_sb() would not ever share a superblock
and so cloning mount options would always be correct.  Turns out that isn't
the case and we could fall over a BUG_ON() that wasn't a BUG at all.  Since
there is little we can do to reconcile different mount options this patch
just leaves the sb alone and the first set of options wins.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: James Morris <jmorris@namei.org>
16 years agoBNX2X: Correct bringing chip out of reset
Eliezer Tamir [Wed, 9 Apr 2008 22:25:46 +0000 (15:25 -0700)]
BNX2X: Correct bringing chip out of reset

Fixed bug: Wrong register was written to when bringing the chip out of
reset.

[ Bump driver version and release date -DaveM ]

Signed-off-by: Eliezer Tamir <eliezert@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_nat: autoload IPv4 connection tracking
Jan Engelhardt [Wed, 9 Apr 2008 22:14:58 +0000 (15:14 -0700)]
[NETFILTER]: nf_nat: autoload IPv4 connection tracking

Without this patch, the generic L3 tracker would kick in
if nf_conntrack_ipv4 was not loaded before nf_nat, which
would lead to translation problems with ICMP errors.

NAT does not make sense without IPv4 connection tracking
anyway, so just add a call to need_ipv4_conntrack().

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: xt_hashlimit: fix mask calculation
Patrick McHardy [Wed, 9 Apr 2008 22:14:18 +0000 (15:14 -0700)]
[NETFILTER]: xt_hashlimit: fix mask calculation

Shifts larger than the data type are undefined, don't try to shift
an u32 by 32. Also remove some special-casing of bitmasks divisible
by 32.

Based on patch by Jan Engelhardt <jengelh@computergmbh.de>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
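The shift problem in the xt_hashlimit commit above can be sketched in plain C as follows. This is an illustration of the technique, not the patched kernel function: the name prefix_mask is hypothetical and the mask is built in host byte order only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a host-order mask with the top `len` bits set (0 <= len <= 32).
 * len == 0 is handled separately because shifting a 32-bit value by 32
 * is undefined behaviour in C.
 */
static uint32_t prefix_mask(unsigned int len)
{
    assert(len <= 32);
    return len ? ~(uint32_t)0 << (32 - len) : 0;
}
```

With the zero case split out, the shift count is always between 0 and 31, so no special-casing of the other prefix lengths is needed.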
16 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Wed, 9 Apr 2008 22:10:14 +0000 (15:10 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6

16 years ago[XFRM]: xfrm_user: fix selector family initialization
Patrick McHardy [Wed, 9 Apr 2008 22:08:24 +0000 (15:08 -0700)]
[XFRM]: xfrm_user: fix selector family initialization

Commit df9dcb45 ([IPSEC]: Fix inter address family IPsec tunnel handling)
broke openswan by removing the selector initialization for tunnel mode
in case it is uninitialized.

This patch restores the initialization, fixing openswan, but probably
breaking inter-family tunnels again (unknown since the patch author
disappeared). The correct thing for inter-family tunnels is probably
to simply initialize the selector family explicitly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agort61pci: rt61pci_beacon_update do not free skb twice
Daniel Wagner [Wed, 9 Apr 2008 14:29:01 +0000 (16:29 +0200)]
rt61pci: rt61pci_beacon_update do not free skb twice

The layer above will free the skb in an error case.

Signed-off-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Wed, 9 Apr 2008 15:06:27 +0000 (08:06 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  ata/sata_fsl: Remove unused variable in sata_fsl_probe
  pata_sil680: Fix build on arch/ppc

16 years agossb-mipscore: Fix interrupt vectors
Michael Buesch [Tue, 8 Apr 2008 09:17:29 +0000 (11:17 +0200)]
ssb-mipscore: Fix interrupt vectors

This fixes assignment of the interrupt vectors on the SSB MIPS core.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agossb-pcicore: Fix IRQ TPS flag handling
Larry Finger [Tue, 8 Apr 2008 08:28:24 +0000 (10:28 +0200)]
ssb-pcicore: Fix IRQ TPS flag handling

This fixes the TPS flag handling for the SSB pcicore driver.
This fixes interrupts on some devices.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agomac80211: use short_preamble mode from capability if ERP IE not present
Vladimir Koutny [Mon, 31 Mar 2008 15:05:10 +0000 (17:05 +0200)]
mac80211: use short_preamble mode from capability if ERP IE not present

When associating to a b-only AP where there is no ERP IE, short preamble
mode is left at previous state (probably also protection mode). In this
case, disable protection and use short preamble mode as specified in
capability field. The same is done if capability field is changed on-the-fly.

Signed-off-by: Vladimir Koutny <vlado@ksp.sk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agoata/sata_fsl: Remove unused variable in sata_fsl_probe
Johann Felix Soden [Sun, 6 Apr 2008 13:10:54 +0000 (15:10 +0200)]
ata/sata_fsl: Remove unused variable in sata_fsl_probe

In sata_fsl_probe memory is allocated but never used or deallocated.
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10404
Thanks to Daniel Marjamäki for the bug report.

Reported-by: Daniel Marjamäki <danielm77@spray.se>
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
16 years agopata_sil680: Fix build on arch/ppc
Benjamin Herrenschmidt [Tue, 8 Apr 2008 21:51:07 +0000 (07:51 +1000)]
pata_sil680: Fix build on arch/ppc

Commit 0f436eff54f90419ac1b8accfb3e6e17c4b49a4e breaks build on
arch/ppc as it doesn't implement the machine_is() macro.

This fixes it by using CONFIG_PPC_MERGE instead which represents
arch/powerpc only, while CONFIG_PPC is set for both.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
16 years agoMerge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
Linus Torvalds [Wed, 9 Apr 2008 01:26:31 +0000 (18:26 -0700)]
Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Fix a memory leak in rpc_create()
  fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private()
  NFS: initialize flags field in nfs_open_context
  SUNRPC: don't call flush_dcache_page() with an invalid pointer

16 years agomtd/chips: add missing set_current_state() to cfi_{amdstd,staa}_sync()
Dmitry Adamushko [Wed, 9 Apr 2008 00:41:59 +0000 (17:41 -0700)]
mtd/chips: add missing set_current_state() to cfi_{amdstd,staa}_sync()

cfi_amdstd_sync() and cfi_staa_sync() call schedule() without changing task's
state appropriately.

In case of e.g.  chip->state == FL_ERASING, cfi_*_sync() will be busy-looping
either redundantly for a fixed interval of time (for SCHED_NORMAL tasks) or
possibly endlessly (for RT tasks and UP).

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agospi: spi_bfin5xx: remove unused label
Michael Hennerich [Wed, 9 Apr 2008 00:41:58 +0000 (17:41 -0700)]
spi: spi_bfin5xx: remove unused label

Remove unused label, and associated compiler warning.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agospi: documentation tweaks
David Brownell [Wed, 9 Apr 2008 00:41:58 +0000 (17:41 -0700)]
spi: documentation tweaks

Update SPI documentation to clarify some areas of recent confusion: clock
polarity takes effect when chipselect goes active; and zero length buffers are
OK in certain cases.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agospi: spi_bfin5xx: fix probe() sequencing
Vitja Makarov [Wed, 9 Apr 2008 00:41:57 +0000 (17:41 -0700)]
spi: spi_bfin5xx: fix probe() sequencing

Fix bug in SPI probe: first initialize peripheral pins, and only then
register the SPI master device.  This fixes problems with SPI drivers built
into the kernel.

Signed-off-by: Vitja Makarov <vitja.makarov@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agospi: spi_bfin5xx build fix
Mike Frysinger [Wed, 9 Apr 2008 00:41:57 +0000 (17:41 -0700)]
spi: spi_bfin5xx build fix

Fix breakage caused by overzealous line wrapping; there should be only one
format string.

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoacpi: fix "buggy BIOS check" when CPUs are hot removed
Alok Kataria [Wed, 9 Apr 2008 00:41:56 +0000 (17:41 -0700)]
acpi: fix "buggy BIOS check" when CPUs are hot removed

Fixes a BUG in ACPI hotplugging.

processor_device_array[pr->id] needs to be set to NULL when removing a CPU.
Else the "buggy BIOS check" in acpi_processor_start mistakenly fires when a
CPU is removed from the system and then later re-added.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Dan Arai <arai@vmware.com>
Cc: Len Brown <lenb@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
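A toy model of the hot-remove fix described above. The array, struct and function names are hypothetical stand-ins, and the exact condition used by the real acpi_processor_start() is an assumption here; the sketch only shows why the slot has to be cleared on removal.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 4

struct processor { int id; };

/* One slot per CPU id, remembering which device object claimed it. */
static struct processor *device_array[MAX_CPUS];

/*
 * Returns 0 on success, or -1 when the slot is already claimed by a
 * different object (the analogue of the "buggy BIOS check").
 */
static int processor_start(struct processor *pr)
{
    assert(pr != NULL && pr->id >= 0 && pr->id < MAX_CPUS);
    if (device_array[pr->id] != NULL && device_array[pr->id] != pr)
        return -1;
    device_array[pr->id] = pr;
    return 0;
}

/* Clearing the slot on removal is the step the fix adds. */
static void processor_remove(struct processor *pr)
{
    assert(pr != NULL && pr->id >= 0 && pr->id < MAX_CPUS);
    device_array[pr->id] = NULL;
}
```

Without the reset in processor_remove(), re-adding the same CPU id with a freshly allocated object would trip the check even though nothing is wrong.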
16 years agoes1968: fix sleep-while-holding-lock bug
Arjan van de Ven [Wed, 9 Apr 2008 00:41:55 +0000 (17:41 -0700)]
es1968: fix sleep-while-holding-lock bug

snd_es1968_ac97_read() calls snd_es1968_ac97_wait() first outside a locked
area, and later, while holding a lock.

snd_es1968_ac97_wait() has a polling loop with a cond_resched() inside it,
which sleeps, so the second call is invalid.

This patch adds a version of the wait function that just pure polls.  While
this is not very elegant in principle, it's very likely the easiest thing to
do here, we already checked if the chip was ready (while yielding) just
before, so it is very unlikely to take a long time here.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomemcg: fix node_state handling
KAMEZAWA Hiroyuki [Wed, 9 Apr 2008 00:41:54 +0000 (17:41 -0700)]
memcg: fix node_state handling

This should be N_NORMAL_MEMORY.

N_NORMAL_MEMORY is "true" if a node has memory for the kernel.  N_HIGH_MEMORY
is "true" if a node has memory for HIGHMEM.  (If CONFIG_HIGHMEM=n, always
"true")

This check is used for testing whether we can use kmalloc_node() on a node.
Then, if there is a node which only contains HIGHMEM, the system will call
kmalloc_node() on a node which doesn't contain memory for the kernel.  If it happens
under SLUB, the kernel will panic.  I think this only happens on x86_32-numa.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoacpi thermal: fix result check
Krzysztof Helt [Wed, 9 Apr 2008 00:41:52 +0000 (17:41 -0700)]
acpi thermal: fix result check

thermal_zone_device_register() uses the ERR_PTR macro on its return values.  A
correct check is to use the IS_ERR() macro.

The 2.6.25 kernels panic on Compaq AP550 without this patch as it has more
than 10 (THERMAL_MAX_TRIPS) trip points (there are 12).

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
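The error-pointer convention behind the acpi thermal fix above can be sketched in userspace C. The helpers below are lowercase re-creations written for this example, not the kernel's own macros, and zone_register() is a hypothetical stand-in for a registration function that rejects too many trip points.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* assumption: error codes occupy the top 4095 addresses */
#define MAX_TRIPS 10    /* mirrors the limit of 10 quoted in the commit message */

struct zone { int trips; };

/* Encode a negative errno value in a pointer, and decode it again. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Fails with an encoded -EINVAL instead of returning NULL. */
static struct zone *zone_register(struct zone *z, int trips)
{
    assert(z != NULL);
    if (trips > MAX_TRIPS)
        return err_ptr(-EINVAL);
    z->trips = trips;
    return z;
}
```

A caller that only tests the result against NULL would treat the encoded error as a valid object, which is why the check has to be the is_err() style.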
16 years agoub: remove BUG() after __blk_end_request and fix the condition causing it
Pete Zaitcev [Wed, 9 Apr 2008 00:41:51 +0000 (17:41 -0700)]
ub: remove BUG() after __blk_end_request and fix the condition causing it

When __blk_end_request returns nonzero, it means that the request was
not completely processed and some BIOs are still attached. Since we
have dequeued it by that time, it means leaking requests and hanging
processes, which is why BUG() was in there. In ub this happens if
a packet request ends normally, but with residue (e.g. when scsi_id
issues INQUIRY).

The fix is to make sure that arguments passed to __blk_end_request
are correct: the full request length and not just transferred length.
The transferred length is indicated to applications by adjusting
rq->data_len with old, unchanged code outside of this patch.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: Greg KH <greg@kroah.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
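A toy model of the completion rule in the ub commit above. The struct and the end_request() function are simplified assumptions and do not reproduce the block layer API; they only show why passing the transferred length leaves the request unfinished.

```c
#include <assert.h>
#include <stddef.h>

/* Toy request: only tracks how many bytes are still attached to it. */
struct request {
    unsigned int bytes_left;
};

/*
 * Complete nbytes of the request. Returns nonzero when part of the
 * request is still outstanding, zero when it is fully completed.
 */
static int end_request(struct request *rq, unsigned int nbytes)
{
    assert(rq != NULL);
    if (nbytes >= rq->bytes_left) {
        rq->bytes_left = 0;
        return 0;
    }
    rq->bytes_left -= nbytes;
    return 1;
}
```

A request that ended normally but with residue is therefore completed with its full length, and the short transfer is reported separately.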
16 years agoSUNRPC: Fix a memory leak in rpc_create()
Chuck Lever [Mon, 7 Apr 2008 20:52:44 +0000 (16:52 -0400)]
SUNRPC: Fix a memory leak in rpc_create()

Commit 510deb0d was supposed to move the xprt_create_transport() call in
rpc_create(), but neglected to remove the old call site.  This resulted in
a transport leak after every rpc_create() call.

This leak is present in 2.6.24 and 2.6.25.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agofix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_...
Bryan Wu [Wed, 2 Apr 2008 03:23:39 +0000 (20:23 -0700)]
fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private()

NFS needs a NOMMU version mmap function to support uClinux on NOMMU machine
http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992

Signed-off-by: Bryan Wu <cooloney@kernel.org>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoNFS: initialize flags field in nfs_open_context
Jeff Layton [Mon, 31 Mar 2008 19:01:58 +0000 (15:01 -0400)]
NFS: initialize flags field in nfs_open_context

The nfs_open_context struct had a "flags" field added recently, but the
allocator isn't initializing it. It also looks like the allocator isn't
initializing the mode or list either, but they seem to be overwritten
by the caller, so that's less of an issue.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16 years agoSUNRPC: don't call flush_dcache_page() with an invalid pointer
Trond Myklebust [Mon, 31 Mar 2008 21:02:02 +0000 (17:02 -0400)]
SUNRPC: don't call flush_dcache_page() with an invalid pointer

Fix a problem in _copy_to_pages(), whereby it may call flush_dcache_page()
with an invalid pointer due to the fact that 'pgto' gets incremented
beyond the end of the page array. Fix is to exit the loop without this
unnecessary increment of pgto.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
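The loop shape described in the commit above can be sketched as follows. This is a simplified illustration, not the SUNRPC code: the page size is tiny, there is no starting offset within the first page, and flush_page() is a hypothetical stand-in for flush_dcache_page() that records which page it was asked to flush.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ ((size_t)4)  /* tiny page size so the example is easy to follow */

static size_t flush_count;   /* number of flush calls made */
static size_t last_flushed;  /* index of the most recently flushed page */

/* Stand-in for flush_dcache_page(); records which page was flushed. */
static void flush_page(size_t index, size_t npages)
{
    assert(index < npages);  /* flushing past the array would be the bug */
    flush_count++;
    last_flushed = index;
}

/* Copy len bytes from src into an array of npages pages of PAGE_SZ bytes. */
static void copy_to_pages(char pages[][PAGE_SZ], size_t npages,
                          const char *src, size_t len)
{
    size_t pg = 0;

    assert(len > 0 && len <= npages * PAGE_SZ);
    for (;;) {
        size_t copy = len < PAGE_SZ ? len : PAGE_SZ;

        memcpy(pages[pg], src, copy);
        len -= copy;
        if (len == 0)
            break;           /* leave pg on the last page written */
        flush_page(pg, npages);
        pg++;
        src += copy;
    }
    flush_page(pg, npages);  /* pg is still a valid index here */
}
```

Breaking out before the increment means the final flush always refers to the last page that was actually written, even when the data fills the array exactly.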
16 years ago[NET]: Undo code bloat in hot paths due to print_mac().
David S. Miller [Tue, 8 Apr 2008 23:50:44 +0000 (16:50 -0700)]
[NET]: Undo code bloat in hot paths due to print_mac().

If print_mac() is used inside of a pr_debug() the compiler
can't see that the call is redundant so still performs it
even if pr_debug() ends up being a nop.

So don't use print_mac() in such cases in hot code paths,
use MAC_FMT et al. instead.

As noted by Joe Perches, pr_debug() could be modified to
handle this better, but that is a change to an interface
used by the entire kernel and thus needs to be validated
carefully.  This here is thus the less risky fix for
2.6.25.

Signed-off-by: David S. Miller <davem@davemloft.net>
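The cost described in the commit above can be illustrated with a small userspace sketch. It assumes the debug print is compiled as an empty function rather than removed by the preprocessor; debug_noop() and format_mac() are hypothetical stand-ins for pr_debug() and print_mac().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static int format_calls;  /* counts how often the helper actually runs */

/* Stand-in for a print_mac()-style helper that formats into a buffer. */
static const char *format_mac(char *buf, size_t len, const unsigned char *addr)
{
    assert(buf != NULL && addr != NULL);
    format_calls++;
    snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
             addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
    return buf;
}

/* A debug print that does nothing; its arguments are still evaluated. */
static int debug_noop(const char *fmt, ...)
{
    (void)fmt;
    return 0;
}

/* Helper call as an argument: the formatting work is done regardless. */
static void hot_path_with_helper(const unsigned char *addr)
{
    char buf[18];

    debug_noop("peer %s\n", format_mac(buf, sizeof(buf), addr));
}

/* Format-string variant: only cheap byte loads are passed as arguments. */
static void hot_path_with_format(const unsigned char *addr)
{
    debug_noop("peer %02x:%02x:%02x:%02x:%02x:%02x\n",
               addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
}
```

Because the helper has a visible side effect, the compiler has to keep the call in the first variant; the second variant leaves nothing expensive to evaluate.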
16 years ago[TCP]: Don't allow FRTO to take place while MTU is being probed
Ilpo Järvinen [Tue, 8 Apr 2008 05:33:57 +0000 (22:33 -0700)]
[TCP]: Don't allow FRTO to take place while MTU is being probed

MTU probe can cause some remedies for FRTO because the normal
packet ordering may be violated allowing FRTO to make a wrong
decision (it might not be that serious threat for anything
though). Thus it's safer to not run FRTO while MTU probe is
underway.

It seems that the basic FRTO variant should also look for an
skb at probe_seq.start to check if that's retransmitted one
but I didn't implement it now (plain seqno in window check
isn't robust against wraparounds).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TCP]: tcp_simple_retransmit can cause S+L
Ilpo Järvinen [Tue, 8 Apr 2008 05:33:07 +0000 (22:33 -0700)]
[TCP]: tcp_simple_retransmit can cause S+L

This fixes Bugzilla #10384

tcp_simple_retransmit does L increment without any checking
whatsoever for overflowing S+L when Reno is in use.

The simplest scenario I can currently think of is rather
complex in practice (there might be some more straightforward
cases though). Ie., if mss is reduced during mtu probing, it
may end up marking everything lost and if some duplicate ACKs
arrived prior to that sacked_out will be non-zero as well,
leading to S+L > packets_out, tcp_clean_rtx_queue on the next
cumulative ACK or tcp_fastretrans_alert on the next duplicate
ACK will fix the S counter.

More straightforward (but questionable) solution would be to
just call tcp_reset_reno_sack() in tcp_simple_retransmit but
it would negatively impact the probe's retransmission, ie.,
the retransmissions would not occur if some duplicate ACKs
had arrived.

So I had to add reno sacked_out resetting to CA_Loss state
when the first cumulative ACK arrives (this stale sacked_out
might actually be the explanation for the reports of left_out
overflows in kernel prior to 2.6.23 and S+L overflow reports
of 2.6.24). However, this alone won't be enough to fix kernel
before 2.6.24 because it is building on top of the commit
1b6d427bb7e ([TCP]: Reduce sacked_out with reno when purging
write_queue) to keep the sacked_out from overflowing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
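The counter invariant discussed in the commit above, S+L <= packets_out, can be shown with a toy model. The struct and helpers below are illustrative assumptions and not the kernel's TCP code; they only reproduce the arithmetic of the scenario in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counters named after the fields mentioned in the commit message. */
struct tcp_counters {
    unsigned int packets_out;  /* segments in flight */
    unsigned int sacked_out;   /* S: with Reno, inferred from duplicate ACKs */
    unsigned int lost_out;     /* L: segments marked lost */
};

/* The invariant the commit is about: S + L must not exceed packets_out. */
static bool counters_consistent(const struct tcp_counters *tp)
{
    assert(tp != NULL);
    return tp->sacked_out + tp->lost_out <= tp->packets_out;
}

/* Models marking everything lost, for example after the mss is reduced. */
static void mark_all_lost(struct tcp_counters *tp)
{
    assert(tp != NULL);
    tp->lost_out = tp->packets_out;
}

/* Models clearing a stale Reno sacked_out count. */
static void reset_reno_sacked(struct tcp_counters *tp)
{
    assert(tp != NULL);
    tp->sacked_out = 0;
}
```

With three duplicate ACKs already counted, marking all ten segments lost gives S+L = 13 > 10 until the stale S count is cleared.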
16 years ago[TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb
Ilpo Järvinen [Tue, 8 Apr 2008 05:32:38 +0000 (22:32 -0700)]
[TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb

Fixes a long-standing bug which makes NewReno recovery crippled.
With GSO the whole head skb was marked as LOST which is in
violation of NewReno procedure that only wants to mark one packet
and ended up breaking our TCP code by causing counter overflow
because our code was built on top of assumption about valid
NewReno procedure. This manifested as triggering a WARN_ON for
the overflow in a number of places.

It seems a relatively safe alternative to just do nothing if
tcp_fragment fails due to oom because another duplicate ACK is
likely to be received soon and the fragmentation will be retried.

Special thanks goes to Soeren Sonnenburg <kernel@nn7.de> who was
lucky enough to be able to reproduce this so that the warning
for the overflow was hit. It's not as easy a task as it seems even
if this bug happens quite often because the amount of outstanding
data is pretty significant for the mismarkings to lead to an
overflow.

Because it's very late in the 2.6.25-rc cycle (if this even makes it in
time), I didn't want to touch anything with SACK enabled here.
Fragmenting might be useful for it as well but it's more or less
a policy decision rather than mandatory fix. Thus there's no need
to rush and we can postpone considering tcp_fragment with SACK
for 2.6.26.

In 2.6.24 and earlier, this very same bug existed but the effect
is slightly different because of small changes in the if
conditions that fit the patch's context. With them nothing
got the lost marker and thus no retransmissions happened.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack
Ilpo Järvinen [Tue, 8 Apr 2008 05:31:38 +0000 (22:31 -0700)]
[TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack

The fast retransmission can be forced locally to the rfc3517
branch in tcp_update_scoreboard instead of making such fragile
constructs deeper in tcp_mark_head_lost.

This is necessary for the next patch which must not have
loopholes for cnt > packets check. As one can notice,
readability got some improvements too because of this :-).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agonl80211: fix STA AID bug
Johannes Berg [Mon, 7 Apr 2008 12:35:46 +0000 (14:35 +0200)]
nl80211: fix STA AID bug

This fixes the STA AID setting and actually makes hostapd/mac80211
work properly in presence of power-saving stations.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agob43legacy: fix bcm4303 crash
Stefano Brivio [Sun, 6 Apr 2008 15:05:07 +0000 (17:05 +0200)]
b43legacy: fix bcm4303 crash

This fixes a hard crash which happened upon driver loading on bcm4303 rev.
2 devices.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agoiwlwifi: fix n-band association problem
Abhijeet Kolekar [Fri, 4 Apr 2008 21:32:01 +0000 (14:32 -0700)]
iwlwifi: fix n-band association problem

This patch enables the IWL4965_HT flag (n-band) in Kconfig.
Removed the "depends on n" from Kconfig for config IWL4965_HT

Signed-off-by: Abhijeet Kolekar <abhijeet.kolekar@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agoipw2200: set MAC address on radiotap interface
Daniel Drake [Wed, 2 Apr 2008 19:33:54 +0000 (20:33 +0100)]
ipw2200: set MAC address on radiotap interface

Commit bada339ba24dee9e143bfb42e1dc61f146619846 enforces that all
interfaces have a valid MAC address before they are brought up.

ipw2200 does not assign a MAC address to its radiotap interface, meaning
that the radiotap interface cannot be brought up in 2.6.24.
https://bugs.gentoo.org/show_bug.cgi?id=215714

Fix this by copying the MAC address from the real interface.

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agolibertas: fix mode initialization problem
Holger Schurig [Wed, 2 Apr 2008 14:34:51 +0000 (16:34 +0200)]
libertas: fix mode initialization problem

After moving lbs_find_best_network_ssid() from scan.c to assoc.c gcc was
able to deduce that new_mode might stay uninitialized.

Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 8 Apr 2008 02:15:35 +0000 (19:15 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: more GFP_NOFS fixups to prevent selinux from re-entering the fs code

16 years agopvrusb2: fix broken build due to patch order dependency
Michael Krufky [Mon, 7 Apr 2008 00:40:17 +0000 (20:40 -0400)]
pvrusb2: fix broken build due to patch order dependency

Fix broken build due to patch order dependency.  A future patch requires
the lines that break the current build.  Disable those lines for now.

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoSELinux: more GFP_NOFS fixups to prevent selinux from re-entering the fs code
Stephen Smalley [Fri, 4 Apr 2008 12:46:05 +0000 (08:46 -0400)]
SELinux: more GFP_NOFS fixups to prevent selinux from re-entering the fs code

More cases where SELinux must not re-enter the fs code. Called from the
d_instantiate security hook.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
16 years agoMerge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstrea...
Linus Torvalds [Mon, 7 Apr 2008 21:54:07 +0000 (14:54 -0700)]
Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstream-linus

* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstream-linus:
  [MIPS] Handle aliases in vmalloc correctly.

16 years ago[MIPS] Handle aliases in vmalloc correctly.
Ralf Baechle [Sat, 5 Apr 2008 14:13:23 +0000 (15:13 +0100)]
[MIPS] Handle aliases in vmalloc correctly.

flush_cache_vmap / flush_cache_vunmap were calling flush_cache_all which -
having been deprecated - turned into a nop ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
Linus Torvalds [Mon, 7 Apr 2008 21:26:53 +0000 (14:26 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  siimage: fix kernel oops on PPC 44x

16 years agosiimage: fix kernel oops on PPC 44x
Sergei Shtylyov [Mon, 7 Apr 2008 21:30:10 +0000 (23:30 +0200)]
siimage: fix kernel oops on PPC 44x

Fix kernel oops due to machine check occurring in init_chipset_siimage() on PPC
44x platforms.  These 32-bit CPUs have 36-bit physical address and PCI I/O and
memory spaces are mapped beyond 4 GB; arch/ppc/ code has a fixup in ioremap()
that creates an illusion of the PCI I/O and memory resources being mapped below
4 GB, while arch/powerpc/ code got rid of this fixup with PPC 44x having instead
CONFIG_RESOURCES_64BIT=y -- this causes the resources to be truncated to 32-bit
'unsigned long' type in this driver, and so non-existent memory being ioremap'ed
and then accessed...

Thanks to Valentine Barshak for providing an initial patch and explanations.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
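The truncation described in the siimage commit above can be reproduced with fixed-width types. This is a sketch under stated assumptions: uint32_t models a 32-bit 'unsigned long', the function names are hypothetical, and the addresses used with it are made-up examples.

```c
#include <assert.h>
#include <stdint.h>

/* Does a 64-bit resource address survive being stored in 32 bits? */
static int fits_in_32bit(uint64_t resource_start)
{
    return resource_start <= UINT32_MAX;
}

/* Models storing a 64-bit resource in a 32-bit 'unsigned long'. */
static uint32_t store_in_32bit(uint64_t resource_start)
{
    return (uint32_t)resource_start;  /* upper bits are silently dropped */
}

/* Variant that refuses addresses which would be truncated. */
static uint32_t store_checked(uint64_t resource_start)
{
    assert(fits_in_32bit(resource_start));
    return (uint32_t)resource_start;
}
```

A 36-bit address above 4 GB loses its top bits in store_in_32bit(), so the value that is later mapped points at a different, possibly non-existent, location.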
16 years agovirtio_net: remove overzealous printk
Anthony Liguori [Mon, 7 Apr 2008 20:33:16 +0000 (15:33 -0500)]
virtio_net: remove overzealous printk

The 'disable_cb' is really just a hint and as such, it's possible for more
work to get queued up while callbacks are disabled.  Under stress with an
SMP guest, this printk triggers very frequently.  There is no race here; this
is how things are designed to work, so let's just remove the printk.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoRevert "smc91x: fix build breakage from the SMC_GET_MAC_ADDR API upgrade"
Linus Torvalds [Mon, 7 Apr 2008 20:20:08 +0000 (13:20 -0700)]
Revert "smc91x: fix build breakage from the SMC_GET_MAC_ADDR API upgrade"

This reverts commit 9e6db60825ef7e7999abc610ce256ba768e58162, which was
merged without the API it needed, causing build breakage.

Reported-by: Bryan Wu <cooloney@kernel.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux...
Linus Torvalds [Mon, 7 Apr 2008 20:14:37 +0000 (13:14 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/x86/linux-2.6-x86

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: fix 64-bit asm NOPS for CONFIG_GENERIC_CPU
  x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier()
  revert "x86: tsc prevent time going backwards"

16 years agovirtio: remove overzealous BUG_ON.
Rusty Russell [Mon, 7 Apr 2008 04:30:28 +0000 (14:30 +1000)]
virtio: remove overzealous BUG_ON.

The 'disable_cb' callback is designed as an optimization to tell the host
we don't need callbacks now.  As it is not reliable, the debug check is
overzealous: it can happen on two CPUs at the same time.  Document this.

Even if it were reliable, the virtio_net driver doesn't disable
callbacks on transmit so the START_USE/END_USE debugging reentrance
protection can be easily tripped even on UP.

Thanks to Balaji Rao for the bug report and testing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agox86: fix 64-bit asm NOPS for CONFIG_GENERIC_CPU
Suresh Siddha [Mon, 7 Apr 2008 18:56:34 +0000 (11:56 -0700)]
x86: fix 64-bit asm NOPS for CONFIG_GENERIC_CPU

ASM_NOPs for the 64-bit kernel with CONFIG_GENERIC_CPU are broken
with the recent x86 nops merge. They were using GENERIC_NOPS
which will truncate the upper 32bits of %rsi, because of the missing
64bit rex prefix.

For now, fall back ASM NOPS for generic cpu to K8 NOPS, similar
to the code before the wrong x86 nop merge.

This should resolve the crash seen by Ingo on a test-system:

BUG: unable to handle kernel paging request at 00000000d80d8ee8
IP: [<ffffffff802121af>] save_i387_ia32+0x61/0xd8
PGD b8e0067 PUD 51490067 PMD 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in:
Pid: 3871, comm: distcc Not tainted 2.6.25-rc7-sched-devel.git-x86-latest.git #359
RIP: 0010:[<ffffffff802121af>]  [<ffffffff802121af>] save_i387_ia32+0x61/0xd8
RSP: 0000:ffff81003abd3cb8  EFLAGS: 00010246
RAX: ffff810082e93400 RBX: 00000000ffc37f84 RCX: ffff8100d80d8ee0
RDX: 0000000000000000 RSI: 00000000d80d8ee0 RDI: ffff810082e93400
RBP: 00000000ffc37fdc R08: 00000000ffc37f88 R09: 0000000000000008
R10: ffff81003abd2000 R11: 0000000000000000 R12: ffff810082e93400
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff81011fb12dc0(0063) knlGS:00000000f7f1a6c0
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000d80d8ee8 CR3: 0000000076922000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process distcc (pid: 3871, threadinfo ffff81003abd2000, task ffff8100d80d8ee0)
Stack:  ffff8100bb670380 ffffffff8026de50 0000000000000118 0000000000000002
 0000000000000002 ffff81003abd3e68 ffff81003abd3ed8 ffff81003abd3de8
 ffff81003abd3d18 ffffffff80229785 ffff8100d80d8ee0 ffff810001041280
Call Trace:
 [<ffffffff8026de50>] ? __generic_file_aio_write_nolock+0x343/0x377
 [<ffffffff80229785>] ? update_curr+0x54/0x64
 [<ffffffff80227cd3>] ? ia32_setup_sigcontext+0x125/0x1d2
 [<ffffffff8022839f>] ? ia32_setup_frame+0x73/0x1a5
 [<ffffffff8020b2a5>] ? do_notify_resume+0x1aa/0x7db
 [<ffffffff8024ae8c>] ? getnstimeofday+0x31/0x85
 [<ffffffff80249858>] ? ktime_get_ts+0x17/0x48
 [<ffffffff80249933>] ? ktime_get+0xc/0x41
 [<ffffffff8024973e>] ? hrtimer_nanosleep+0x75/0xd5
 [<ffffffff80249261>] ? hrtimer_wakeup+0x0/0x21
 [<ffffffff8020bfbc>] ? int_signal+0x12/0x17
 [<ffffffff8030e6b3>] ? dummy_file_free_security+0x0/0x1

Code: a6 08 05 00 00 f6 40 14 01 74 34 4c 89 e7 48 0f ae 07 48 8b 86 08 05 00 00 80 78 02 00 79 02 db e2 90 8d b4 26 00 00 00 00 89 f6 <48> 8b 46 08 83 60 14 fe 0f 20 c0 48 83 c8 08 0f 22 c0 eb 07 c6
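
The register dump above fits the description: RCX holds the task pointer
ffff8100d80d8ee0 while RSI holds 00000000d80d8ee0, and the faulting access
appears to be a load from 8(%rsi), which matches CR2. In 64-bit mode a write
to a 32-bit register clears the upper 32 bits of the full register, so a NOP
built from a 32-bit move without a REX prefix is not a NOP at all. A minimal
C model of that effect (the function name is hypothetical):

```c
#include <stdint.h>

/* Models a 32-bit register write on x86-64: the upper 32 bits become zero. */
static uint64_t write_low32(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;
}
```

Feeding it the RCX value from the dump yields the RSI value from the dump.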

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier()
Karsten Wiese [Mon, 7 Apr 2008 10:14:45 +0000 (12:14 +0200)]
x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier()

In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it
instead of smp_processor_id() in the call to set_cyc2ns_scale().
This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update
the intended cpu's cyc2ns.
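
A rough userspace model of the idea, not the kernel code: the notifier
payload names the CPU whose frequency changed, and only that CPU's scale is
updated. The struct, the array, the shift of 10 and the scale formula are all
assumptions made for this sketch.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for the model */
#define SCALE_SHIFT   10	/* assumed fixed-point shift */

static uint64_t cyc2ns_model[NR_CPUS_MODEL];

struct freq_change_model {	/* stands in for the notifier payload */
	unsigned int cpu;	/* CPU whose frequency changed */
	unsigned long new_khz;	/* its new frequency in kHz */
};

/* Update the scale of the CPU named in the notification, not the caller's. */
static void update_scale(const struct freq_change_model *freq)
{
	if (freq->cpu >= NR_CPUS_MODEL || freq->new_khz == 0)
		return;

	/* Nanoseconds per cycle in fixed point: (1e6 << shift) / kHz. */
	cyc2ns_model[freq->cpu] =
		(UINT64_C(1000000) << SCALE_SHIFT) / freq->new_khz;
}
```

Because the target CPU comes from the payload, the result does not depend on
which CPU happens to run the notifier.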

Related mail/thread: http://lkml.org/lkml/2007/12/7/130

Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agorevert "x86: tsc prevent time going backwards"
Ingo Molnar [Mon, 7 Apr 2008 18:58:08 +0000 (20:58 +0200)]
revert "x86: tsc prevent time going backwards"

revert:

| commit 47001d603375f857a7fab0e9c095d964a1ea0039
| Author: Thomas Gleixner <tglx@linutronix.de>
| Date:   Tue Apr 1 19:45:18 2008 +0200
|
|     x86: tsc prevent time going backwards

it has been identified as causing a suspend regression - and the
commit fixes a longstanding bug that existed before 2.6.25 was
opened - so it can wait some more until the effects are better
understood.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Mon, 7 Apr 2008 15:36:57 +0000 (08:36 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  fix endian lossage in forcedeth
  net/tokenring/olympic.c section fixes
  net: marvell.c fix sparse shadowed variable warning
  [VLAN]: Fix egress priority mappings leak.
  [TG3]: Add PHY workaround for 5784
  [NET]: srandom32 fixes for networking v2
  [IPV6]: Fix refcounting for anycast dst entries.
  [IPV6]: inet6_dev on loopback should be kept until namespace stop.
  [IPV6]: Event type in addrconf_ifdown is mis-used.
  [ICMP]: Ensure that ICMP relookup maintains status quo

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Mon, 7 Apr 2008 15:36:37 +0000 (08:36 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Fix user accesses in regset code.
  [SPARC64]: Fix FPU saving in 64-bit signal handling.

16 years agoMerge branch 'pci_id_updates' of git://git.kernel.org/pub/scm/linux/kernel/git/mcheha...
Linus Torvalds [Sun, 6 Apr 2008 23:12:24 +0000 (16:12 -0700)]
Merge branch 'pci_id_updates' of git://git./linux/kernel/git/mchehab/v4l-dvb

* 'pci_id_updates' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (7497): pvrusb2: add new usb pid for 73xxx models
  V4L/DVB (7496): pvrusb2: add new usb pid for 75xxx models

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
Linus Torvalds [Sun, 6 Apr 2008 23:11:57 +0000 (16:11 -0700)]
Merge git://git./linux/kernel/git/mchehab/v4l-dvb

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (7499): v4l/dvb Kconfig: Fix bugzilla #10067
  V4L/DVB (7495): s5h1409: fix blown-away bit in function s5h1409_set_gpio
  V4L/DVB (7460): bttv: Bt832 - fix possible NULL pointer deref

16 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
Linus Torvalds [Sun, 6 Apr 2008 23:11:22 +0000 (16:11 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] it8712f_wdt Zero MSB timeout byte when disabling watchdog

16 years agoFix booting pentium+ with dodgy TSC
Rusty Russell [Sun, 6 Apr 2008 07:23:38 +0000 (17:23 +1000)]
Fix booting pentium+ with dodgy TSC

We handle a broken tsc these days, so no need to panic.  We clear the
TSC bit when tsc_init decides it's unreliable (e.g. under lguest w/ bad
host TSC), leading to bogus panic.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofix IS_I9XX macro in i915 DRM driver
Jesse Barnes [Sun, 6 Apr 2008 18:55:04 +0000 (11:55 -0700)]
fix IS_I9XX macro in i915 DRM driver

Now that we're mapping registers in the DRM driver at load time, the
driver actually checks the PCI ID, so we need to make sure the macros
have all the right bits (and longer term use the DRM headers as the sole
copy of the PCI & register definitions).

This patch adds 945GME support to the DRM headers, fixing a regression
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10395.

Tested-by: Alexander Oltu <alexander@all-2.com>
Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoscsi: fix sense_slab/bio swapping livelock
Hugh Dickins [Sun, 6 Apr 2008 22:56:57 +0000 (23:56 +0100)]
scsi: fix sense_slab/bio swapping livelock

Since 2.6.25-rc7, I've been seeing an occasional livelock on one x86_64
machine, copying kernel trees to tmpfs, paging out to swap.

Signature: 6000 pages under writeback but never getting written; most
tasks of interest trying to reclaim, but each get_swap_bio waiting for a
bio in mempool_alloc's io_schedule_timeout(5*HZ); every five seconds an
atomic page allocation failure report from kblockd failing to allocate a
sense_buffer in __scsi_get_command.

__scsi_get_command has a (one item) free_list to protect against this,
but rc1's [SCSI] use dynamically allocated sense buffer
de25deb18016f66dcdede165d07654559bb332bc upset that slightly.  When it
fails to allocate from the separate sense_slab, instead of giving up, it
must fall back to the command free_list, which is sure to have a
sense_buffer attached.
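
The fallback described above can be sketched as follows. This is a
simplified model, not the SCSI midlayer code: the struct names are
hypothetical, and the two "fresh" arguments stand for the results of the
command and sense buffer allocations, either of which may be NULL.

```c
#include <stddef.h>

struct cmd_model {
	unsigned char *sense_buffer;
};

struct host_model {
	struct cmd_model *reserved;	/* one-item free list, buffer attached */
};

/*
 * fresh_cmd and fresh_sense are the outcomes of the normal allocations.
 * Real code would also free fresh_cmd when only the sense buffer failed.
 */
static struct cmd_model *get_command(struct host_model *host,
				     struct cmd_model *fresh_cmd,
				     unsigned char *fresh_sense)
{
	struct cmd_model *cmd;

	if (fresh_cmd && fresh_sense) {
		fresh_cmd->sense_buffer = fresh_sense;
		return fresh_cmd;
	}

	/* Either allocation failed: hand out the reserved command instead. */
	cmd = host->reserved;
	host->reserved = NULL;
	return cmd;
}
```

The point is that a failed sense buffer allocation leads to the reserved
command rather than to an immediate NULL return.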

Either my earlier -rc testing missed this, or there's some recent
contributory factor.  One very significant factor is SLUB, which merges
slab caches when it can, and on 64-bit happens to merge both bio cache
and sense_slab cache into kmalloc's 128-byte cache: so that under this
swapping load, bios above are liable to gobble up all the slots needed
for scsi_cmnd sense_buffers below.

That's disturbing behaviour, and I tried a few things to fix it.  Adding
a no-op constructor to the sense_slab inhibits SLUB from merging it, and
stops all the allocation failures I was seeing; but it's rather a hack,
and perhaps in different configurations we have other caches on the
swapout path which are ill-merged.

Another alternative is to revert the separate sense_slab, using
cache-line-aligned sense_buffer allocated beyond scsi_cmnd from the one
kmem_cache; but that might waste more memory, and is only a way of
diverting around the known problem.

While I don't like seeing the allocation failures, and hate the idea of
all those bios piled up above a scsi host working one by one, it does
seem to emerge fairly soon with the livelock fix.  So lacking better
ideas, stick with that one clear fix for now.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.ziljstra@chello.nl>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoV4L/DVB (7497): pvrusb2: add new usb pid for 73xxx models
Michael Krufky [Sun, 16 Mar 2008 02:59:29 +0000 (23:59 -0300)]
V4L/DVB (7497): pvrusb2: add new usb pid for 73xxx models

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
16 years agoV4L/DVB (7496): pvrusb2: add new usb pid for 75xxx models
Michael Krufky [Sat, 8 Mar 2008 09:07:38 +0000 (06:07 -0300)]
V4L/DVB (7496): pvrusb2: add new usb pid for 75xxx models

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
16 years agoV4L/DVB (7499): v4l/dvb Kconfig: Fix bugzilla #10067
Mauro Carvalho Chehab [Thu, 3 Apr 2008 23:08:04 +0000 (20:08 -0300)]
V4L/DVB (7499): v4l/dvb Kconfig: Fix bugzilla #10067

tda8290 breaks if tuner is selected, but CONFIG_DVB=n.

Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
16 years agoV4L/DVB (7495): s5h1409: fix blown-away bit in function s5h1409_set_gpio
Michael Krufky [Thu, 3 Apr 2008 01:14:41 +0000 (22:14 -0300)]
V4L/DVB (7495): s5h1409: fix blown-away bit in function s5h1409_set_gpio

Preserve all other bits when setting gpio.
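
A generic read-modify-write sketch of that idea, assuming a 16-bit register
and a caller-chosen bit position below 16; neither is taken from the s5h1409
driver, and the function name is hypothetical.

```c
#include <stdint.h>

/* Set or clear one bit while leaving the rest of the register untouched. */
static uint16_t set_bit_preserving(uint16_t reg, unsigned int bit, int enable)
{
	uint16_t mask = (uint16_t)(1u << bit);

	return enable ? (uint16_t)(reg | mask) : (uint16_t)(reg & ~mask);
}
```

Writing a constant to the register instead would blow away whatever the
other bits held.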

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Steven Toth <stoth@hauppauge.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
16 years agoV4L/DVB (7460): bttv: Bt832 - fix possible NULL pointer deref
Cyrill Gorcunov [Tue, 1 Apr 2008 19:48:23 +0000 (16:48 -0300)]
V4L/DVB (7460): bttv: Bt832 - fix possible NULL pointer deref

This patch fixes a potential NULL pointer dereference.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
16 years ago[WATCHDOG] it8712f_wdt Zero MSB timeout byte when disabling watchdog
Andrew Paprocki [Wed, 2 Apr 2008 06:43:19 +0000 (02:43 -0400)]
[WATCHDOG] it8712f_wdt Zero MSB timeout byte when disabling watchdog

I noticed this while testing the latest code. I'm not sure if it is required,
but the normal (or LSB) timeout value is set to zero, so the MSB should
be as well to stay consistent.

If the chip revision is >= 8, set MSB of the 16-bit timeout value to zero
when disabling the watchdog in it8712f_wdt_disable().

Signed-off-by: Andrew Paprocki <andrew@ishiboo.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
16 years agoRevert "ACPI: Ignore _BQC object when registering backlight device"
Linus Torvalds [Sat, 5 Apr 2008 19:14:13 +0000 (12:14 -0700)]
Revert "ACPI: Ignore _BQC object when registering backlight device"

This reverts commit 7c0ea45be4f114d85ee35caeead8e1660699c46f which
caused a regression with the backlight being set to off when a laptop
doesn't have a _BQC entry to query the actual backlight value.  The code
blindly then falls back on a value of 0.

See
http://bugzilla.kernel.org/show_bug.cgi?id=10387
http://lkml.org/lkml/2008/4/2/366

for details.

Bisected-and-reported-by: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstrea...
Linus Torvalds [Fri, 4 Apr 2008 22:09:44 +0000 (15:09 -0700)]
Merge branch 'upstream' of git://git./linux/kernel/git/ralf/upstream-linus

* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/upstream-linus:
  [MIPS] Make KGDB compile on UP
  [MIPS] Pb1200: Fix header breakage

16 years agoMerge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
David S. Miller [Fri, 4 Apr 2008 22:00:52 +0000 (15:00 -0700)]
Merge branch 'upstream-davem' of /linux/kernel/git/jgarzik/netdev-2.6

16 years agoipmi: change device node ordering to reflect probe order
Carol Hebert [Fri, 4 Apr 2008 21:30:03 +0000 (14:30 -0700)]
ipmi: change device node ordering to reflect probe order

In 2.6.14 a patch was merged which switched the order of the ipmi device
naming from in-order-of-discovery over to reverse-order-of-discovery.

So on systems with multiple BMC interfaces, the ipmi device names are being
created in reverse order relative to how they are discovered on the system
(e.g.  on an IBM x3950 multinode server with N nodes, the device name for the
BMC in the first node is /dev/ipmiN-1 and the device name for the BMC in the
last node is /dev/ipmi0, etc.).

The problem is caused by the list handling routines chosen in dmi_scan.c.
Using list_add() causes the multiple ipmi devices to be added to the device
list using a stack-paradigm and so the ipmi driver subsequently pulls them off
during initialization in LIFO order.  This patch changes the
dmi_save_ipmi_device() list handling paradigm to a queue, thereby allowing the
ipmi driver to build the ipmi device names in the order in which they are
found on the system.
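
The difference between the two paradigms can be shown with a tiny standalone
list. This is not the kernel's list.h; the types and function names are
hypothetical, with push_front() playing the role of a head insert such as
list_add() and push_back() the role of a tail insert.

```c
#include <stddef.h>

struct node {
	int id;			/* discovery order of the device */
	struct node *next;
};

struct list {
	struct node *head;
	struct node *tail;
};

/* Stack-style insert: the newest entry ends up first. */
static void push_front(struct list *l, struct node *n)
{
	n->next = l->head;
	l->head = n;
	if (!l->tail)
		l->tail = n;
}

/* Queue-style insert: entries keep their discovery order. */
static void push_back(struct list *l, struct node *n)
{
	n->next = NULL;
	if (l->tail)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}
```

Walking a list built with push_front() visits devices 2, 1, 0; the same
devices added with push_back() come out as 0, 1, 2.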

Signed-off-by: Carol Hebert <cah@us.ibm.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomtd: fix broken state in CFI driver caused by FL_SHUTDOWN
Alexey Korolev [Fri, 4 Apr 2008 21:30:01 +0000 (14:30 -0700)]
mtd: fix broken state in CFI driver caused by FL_SHUTDOWN

The CFI driver in the 2.6.24 kernel is broken.  Not so intensive read/write
operations cause incomplete writes which lead to kernel panics in JFFS2.

We investigated the issue - it is caused by a bug in the FL_SHUTDOWN parsing
code.  Sometimes the chip returns -EIO as if it were in the FL_SHUTDOWN state
when it should wait in FL_POINT (the conditions are checked in the wrong order).

The following patch fixes the bug in state parsing code of CFI.  Also I've
added comments to notify developers if they want to add a new case in the future.

Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Reviewed-by: Joern Engel <joern@logfs.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agomemory controller: make memory resource control aware of boot options
Balbir Singh [Fri, 4 Apr 2008 21:29:59 +0000 (14:29 -0700)]
memory controller: make memory resource control aware of boot options

A boot option for the memory controller was discussed on lkml.  It is a good
idea to add it, since it saves memory for people who want to turn off the
memory controller.

By default the option is on for the following two reasons:

1. It provides compatibility with the current scheme where the memory
   controller turns on if the config option is enabled
2. It allows for wider testing of the memory controller, once the config
   option is enabled

We still allow the create, destroy callbacks to succeed, since they are not
aware of boot options.  We do not populate the directory with memory resource
controller specific files.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agocgroups: add cgroup support for enabling controllers at boot time
Paul Menage [Fri, 4 Apr 2008 21:29:57 +0000 (14:29 -0700)]
cgroups: add cgroup support for enabling controllers at boot time

The effects of cgroup_disable=foo are:

- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem

As a result there will only ever be one call to foo->create(), at init time;
all processes will stay in this group, and the group will never be mounted on
a visible hierarchy.  Any additional effects (e.g.  not allocating metadata)
are up to the foo subsystem.

This doesn't handle early_init subsystems (their "disabled" bit isn't set),
but it could easily be extended to do so if any of the early_init systems
wanted it - I think it would just involve some nastier parameter processing
since it would occur before the command-line argument parser had been run.
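
A minimal userspace sketch of how such an option value might be matched
against subsystem names. The struct, the function name and the acceptance of
a comma-separated list are assumptions for illustration; this is not the
parser from the patch.

```c
#include <string.h>

struct subsys_model {
	const char *name;
	int disabled;
};

/* Mark each subsystem named in a comma-separated list as disabled. */
static void parse_cgroup_disable(const char *arg,
				 struct subsys_model *subsys, size_t count)
{
	while (*arg) {
		size_t len = strcspn(arg, ",");	/* length of this token */
		size_t i;

		for (i = 0; i < count; i++) {
			/* Require an exact match, not just a common prefix. */
			if (strlen(subsys[i].name) == len &&
			    strncmp(arg, subsys[i].name, len) == 0)
				subsys[i].disabled = 1;
		}
		arg += len;
		if (*arg == ',')
			arg++;
	}
}
```

Unknown names and empty tokens are ignored, and "cpu" does not accidentally
disable a subsystem called "cpuacct".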

Hugh said:

  Ballpark figures, I'm trying to get this question out rather than
  processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
  to the affected paths, booting with cgroup_disable=memory cuts that back to
  1% overhead (due to slightly bigger struct page).

  I'm no expert on distros, they may have no interest whatever in
  CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
  without it, or apply the cgroup_disable=memory patches.

Unix bench's execl test result on x86_64 was

== just after boot without mounting any cgroup fs.==
mem_cgroup=off : Execl Throughput       43.0     3150.1      732.6
mem_cgroup=on  : Execl Throughput       43.0     2932.6      682.0
==

[lizf@cn.fujitsu.com: fix boot option parsing]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago[MIPS] Make KGDB compile on UP
Sergei Shtylyov [Thu, 20 Mar 2008 17:59:34 +0000 (20:59 +0300)]
[MIPS] Make KGDB compile on UP

Building a UP kernel with KGDB enabled produces the following errors and warning
(fatal due to -Werror in arch/mips/kernel/Makefile):

In file included from arch/mips/kernel/gdb-stub.c:142:
include/asm/smp.h:25:1: "raw_smp_processor_id" redefined
In file included from include/linux/sched.h:69,
                 from arch/mips/kernel/gdb-stub.c:126:
include/linux/smp.h:88:1: this is the location of the previous definition
In file included from arch/mips/kernel/gdb-stub.c:142:
include/asm/smp.h:62: error: redefinition of 'smp_send_reschedule'
include/linux/smp.h:102: error: previous definition of 'smp_send_reschedule' was here
include/asm/smp.h: In function `smp_send_reschedule':
include/asm/smp.h:65: error: dereferencing pointer to incomplete type
arch/mips/kernel/gdb-stub.c: At top level:
arch/mips/kernel/gdb-stub.c:660: warning: 'kgdb_wait' defined but not used

Fix the errors by not directly including <asm/smp.h> (which is already included
by <linux/smp.h>) and the warning by enclosing kgdb_wait() in #ifdef CONFIG_SMP.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
16 years ago[MIPS] Pb1200: Fix header breakage
Sergei Shtylyov [Wed, 2 Apr 2008 19:53:19 +0000 (23:53 +0400)]
[MIPS] Pb1200: Fix header breakage

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux...
Linus Torvalds [Fri, 4 Apr 2008 21:42:58 +0000 (14:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/x86/linux-2.6-x86

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: revert assign IRQs to hpet timer
  x86: tsc prevent time going backwards
  xen: Clear PG_pinned in release_{pt,pd}()
  xen: Do not pin/unpin PMD pages
  xen: refactor xen_{alloc,release}_{pt,pd}()
  x86, agpgart: scary messages are fortunately obsolete
  xen: fix grant table bug
  x86: fix breakage of vSMP irq operations
  x86: print message if nmi_watchdog=2 cannot be enabled
  x86: fix nmi_watchdog=2 on Pentium-D CPUs

16 years agom68k: update defconfigs for 2.6.25
Geert Uytterhoeven [Fri, 4 Apr 2008 12:58:42 +0000 (14:58 +0200)]
m68k: update defconfigs for 2.6.25

Long overdue update of the m68k defconfigs

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agom68k: use KBUILD_DEFCONFIG
Adrian Bunk [Fri, 4 Apr 2008 12:57:38 +0000 (14:57 +0200)]
m68k: use KBUILD_DEFCONFIG

The default defconfig should be one from arch/m68k/configs/

arch/m68k/defconfig was not exactly identical to amiga_defconfig, but
considering how long both have gone without any update, that doesn't
seem to have been on purpose.

Signed-off-by: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Fri, 4 Apr 2008 21:40:04 +0000 (14:40 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  pata_ali: disable ATAPI DMA
  libata: ATA_12/16 doesn't fall into ATAPI_MISC
  libata: uninline atapi_cmd_type()
  libata: fix IDENTIFY order in ata_bus_probe()

16 years agoBe more careful about marking buffers dirty
Linus Torvalds [Fri, 4 Apr 2008 21:38:17 +0000 (14:38 -0700)]
Be more careful about marking buffers dirty

Mikulas Patocka noted that the optimization where we check if a buffer
was already dirty (and we avoid re-dirtying it) was not really SMP-safe.

Since the read of the old status was not synchronized with anything, an
aggressive CPU re-ordering of memory accesses might have moved that read
up to before the data was even written to the buffer, and another CPU
that cleaned it again, causing the newly dirty state to never actually
hit the disk.

Admittedly this would probably never trigger in practice, but it's still
wrong.

Mikulas sent a patch that fixed the problem, but I dislike the subtlety
of the whole optimization, so this is an alternate fix that is more
explicit about the particular SMP ordering for the optimization, and
separates out the speculative reads of the buffer state into its own
conditional (and makes the memory barrier only happen if we are likely
to actually hit the optimized case in the first place).
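
The shape of that fix can be modeled with C11 atomics. This is a sketch under
assumptions, not the buffer-head code: the flag value and function name are
hypothetical, and atomic_thread_fence() stands in for the kernel's memory
barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BH_DIRTY_MODEL 0x1u	/* hypothetical dirty flag */

/*
 * Returns true if this call made the buffer dirty, meaning the caller
 * must queue it for writeback.
 */
static bool mark_dirty(atomic_uint *state)
{
	/* Speculative peek: only pay for a barrier if it looks dirty. */
	if (atomic_load_explicit(state, memory_order_relaxed) & BH_DIRTY_MODEL) {
		/* Order the earlier data writes before the re-check. */
		atomic_thread_fence(memory_order_seq_cst);
		if (atomic_load_explicit(state, memory_order_relaxed) &
		    BH_DIRTY_MODEL)
			return false;	/* still dirty: nothing to do */
	}

	/* Atomic test-and-set; the old value says who dirtied it first. */
	return !(atomic_fetch_or(state, BH_DIRTY_MODEL) & BH_DIRTY_MODEL);
}
```

The common already-dirty case costs one relaxed read plus a barrier, and the
not-dirty case goes straight to the atomic test-and-set.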

I considered removing the optimization entirely, but Andrew argued for
its continued existence. I'm a push-over.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoparport_pc: make sure to release IO ports after probing for IT87XX
Linus Torvalds [Fri, 4 Apr 2008 21:30:31 +0000 (14:30 -0700)]
parport_pc: make sure to release IO ports after probing for IT87XX

Commit f63fd7e299ee13da071ecfce2b90b58c5e1562b1 ("parport_pc: detection
for SuperIO IT87XX POST") only released the IO port region on success,
not when the probe for the IT87XX chip failed.

That caused not only a reserved region to leak, but also caused an oops
when the driver module was unloaded and somebody tried to cat
/proc/ioports - because the string that was assigned to the IO port
region was a static string in the module virtual address area.

Reported-by: Lubos Lunak <l.lunak@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Petr Cvek <petr.cvek@tul.cz>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofix endian lossage in forcedeth
Al Viro [Wed, 26 Mar 2008 05:57:12 +0000 (05:57 +0000)]
fix endian lossage in forcedeth

a) if you initialize something with le32_to_cpu(...), then |= it
with host-endian and feed to cpu_to_le32(), it's most definitely
*not* __le32.  As sparse would've told you...

b) the whole sequence is |= cpu_to_le32(host-endian constant)
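
Point b) can be illustrated in portable C. to_le32() below is a hypothetical
stand-in for cpu_to_le32() that works on any host byte order, and the flag
values are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_le32(): returns v laid out little-endian. */
static uint32_t to_le32(uint32_t v)
{
	uint8_t bytes[4] = {
		(uint8_t)v, (uint8_t)(v >> 8),
		(uint8_t)(v >> 16), (uint8_t)(v >> 24)
	};
	uint32_t out;

	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* OR a host-endian constant into a little-endian descriptor field. */
static void le32_set_flags(uint32_t *le_field, uint32_t host_flags)
{
	*le_field |= to_le32(host_flags);	/* convert first, then combine */
}
```

The field stays little-endian throughout, so no round trip through
host-endian is needed and sparse has nothing to complain about.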

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
16 years agonet/tokenring/olympic.c section fixes
Adrian Bunk [Sun, 30 Mar 2008 22:40:04 +0000 (01:40 +0300)]
net/tokenring/olympic.c section fixes

My previous section fix only turned one section problem into another
section problem.

This patch fixes it for real.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
16 years agonet: marvell.c fix sparse shadowed variable warning
Harvey Harrison [Thu, 3 Apr 2008 00:33:35 +0000 (17:33 -0700)]
net: marvell.c fix sparse shadowed variable warning

The other if blocks don't redeclare temp, remove the redeclaration in
the final if() block.

drivers/net/phy/marvell.c:214:7: warning: symbol 'temp' shadows an earlier one
drivers/net/phy/marvell.c:160:6: originally declared here

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
16 years ago[VLAN]: Fix egress priority mappings leak.
Pavel Emelyanov [Fri, 4 Apr 2008 19:45:12 +0000 (12:45 -0700)]
[VLAN]: Fix egress priority mappings leak.

These entries are allocated in vlan_dev_set_egress_priority,
but are never released and leak on vlan device removal.

Drop these in vlan's ->uninit callback - after the device is
brought down and everyone is notified that it is going to
be unregistered.

Found during testing vlan netnsization patchset.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agox86: revert assign IRQs to hpet timer
Thomas Gleixner [Fri, 4 Apr 2008 14:26:10 +0000 (16:26 +0200)]
x86: revert assign IRQs to hpet timer

The commits:

commit 37a47db8d7f0f38dac5acf5a13abbc8f401707fa
Author: Balaji Rao <balajirrao@gmail.com>
Date:   Wed Jan 30 13:30:03 2008 +0100

    x86: assign IRQs to HPET timers, fix

and

commit e3f37a54f690d3e64995ea7ecea08c5ab3070faf
Author: Balaji Rao <balajirrao@gmail.com>
Date:   Wed Jan 30 13:30:03 2008 +0100

    x86: assign IRQs to HPET timers

have been identified to cause a regression on some platforms due to
the assignment of legacy IRQs which makes the legacy devices
connected to those IRQs dysfunctional.

Revert them.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: tsc prevent time going backwards
Thomas Gleixner [Tue, 1 Apr 2008 17:45:18 +0000 (19:45 +0200)]
x86: tsc prevent time going backwards

We already catch most of the TSC problems by sanity checks, but there
is a subtle bug which has been in the code for ever. This can cause
time jumps in the range of hours.

This was reported in:
     http://lkml.org/lkml/2007/8/23/96
and
     http://lkml.org/lkml/2008/3/31/23

I was able to reproduce the problem with a gettimeofday loop test on a
dual core and a quad core machine which both have synchronized
TSCs. The TSCs seem not to be perfectly in sync though, but the
kernel is not able to detect the slight delta in the sync check. Still
there exists an extremely small window where this delta can be observed
with a real big time jump. So far I was only able to reproduce this
with the vsyscall gettimeofday implementation, but in theory this
might be observable with the syscall based version as well.

CPU0 updates the clock source variables under xtime/vsyscall lock and
CPU1, where the TSC is slightly behind CPU0, is reading the time right
after the seqlock was unlocked.

The clocksource reference data was updated with the TSC from CPU0 and
the value which is read from TSC on CPU1 is less than the reference
data. This results in a huge delta value due to the unsigned
subtraction of the TSC value and the reference value. This algorithm
cannot be changed due to the support of wrapping clock sources like
the PM timer.

The huge delta is converted to nanoseconds and added to xtime, which
is then observable by the caller. The next gettimeofday call on CPU1
will show the correct time again as now the TSC has advanced above the
reference value.

To prevent this TSC specific wreckage we need to compare the TSC value
against the reference value and return the latter when it is larger
than the actual TSC value.
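
A minimal sketch of the failure and the clamp described above, in plain
userspace C. The names cycles_delta() and read_tsc_clamped() are
hypothetical and only illustrate the arithmetic; this is not the kernel
code touched by this patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cycle counter type. */
typedef uint64_t cycle_t;

/*
 * Delta as a wrapping clocksource computes it: an unsigned subtraction,
 * masked to the counter width. If now < last, the result wraps around
 * to a huge positive value instead of going negative.
 */
static cycle_t cycles_delta(cycle_t now, cycle_t last, cycle_t mask)
{
    return (now - last) & mask;
}

/*
 * Clamp described in the text: when the TSC readout is behind the
 * reference value, return the reference value so the delta becomes 0.
 */
static cycle_t read_tsc_clamped(cycle_t raw_tsc, cycle_t cycle_last)
{
    return raw_tsc >= cycle_last ? raw_tsc : cycle_last;
}
```

With a reference value of 1000 taken on CPU0 and a raw readout of 990 on
CPU1, cycles_delta() on the raw value wraps to nearly 2^64 cycles, while
the clamped readout yields a delta of 0.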

I considered marking the TSC unstable when the readout is smaller than
the reference value, but this would render an otherwise good and fast
clocksource unusable without a really good reason.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoxen: Clear PG_pinned in release_{pt,pd}()
Mark McLoughlin [Wed, 2 Apr 2008 14:36:38 +0000 (15:36 +0100)]
xen: Clear PG_pinned in release_{pt,pd}()

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoxen: Do not pin/unpin PMD pages
Mark McLoughlin [Wed, 2 Apr 2008 14:36:37 +0000 (15:36 +0100)]
xen: Do not pin/unpin PMD pages

i.e. with this simple test case:

    int fd = open("/dev/zero", O_RDONLY);
    munmap(mmap((void *)0x40000000, 0x1000, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000);
    close(fd);

we currently get:

   kernel BUG at arch/x86/xen/enlighten.c:678!
   ...
   EIP is at xen_release_pt+0x79/0xa9
   ...
   Call Trace:
    [<c041da25>] ? __pmd_free_tlb+0x1a/0x75
    [<c047a192>] ? free_pgd_range+0x1d2/0x2b5
    [<c047a2f3>] ? free_pgtables+0x7e/0x93
    [<c047b272>] ? unmap_region+0xb9/0xf5
    [<c047c1bd>] ? do_munmap+0x193/0x1f5
    [<c047c24f>] ? sys_munmap+0x30/0x3f
    [<c0408cce>] ? syscall_call+0x7/0xb
    =======================

and xen complains:

  (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned

Further details at:

  https://bugzilla.redhat.com/436453

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoxen: refactor xen_{alloc,release}_{pt,pd}()
Mark McLoughlin [Wed, 2 Apr 2008 14:36:36 +0000 (15:36 +0100)]
xen: refactor xen_{alloc,release}_{pt,pd}()

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Cc: xen-devel@lists.xensource.com
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86, agpgart: scary messages are fortunately obsolete
Pavel Machek [Tue, 1 Apr 2008 12:24:03 +0000 (14:24 +0200)]
x86, agpgart: scary messages are fortunately obsolete

Fix obsolete printks in aperture-64. We used to not handle a missing
agpgart, but we handle it okay now.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoxen: fix grant table bug
Michael Abd-El-Malek [Fri, 4 Apr 2008 09:33:48 +0000 (02:33 -0700)]
xen: fix grant table bug

Fix memory corruption and crash due to mis-sized grant table.

A PV OS has two grant table data structures: the grant table itself
and a free list.  The free list is composed of an array of pages,
which grow dynamically as the guest OS requires more grants.  While
the grant table contains 8-byte entries, the free list contains 4-byte
entries.  So we have half as many pages in the free list as in the
grant table.

There was a bug in the free list allocation code. The free list was
indexed as if it was the same size as the grant table.  But it's only
half as large.  So memory got corrupted, and I was seeing crashes in
the slab allocator later on.
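
The size relationship above can be sketched in userspace C. The 4096-byte
page size and all names here (PAGE_SIZE_BYTES, free_list_pages(), and so
on) are assumptions for illustration, not the identifiers used in the
grant table code.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES  4096u  /* assumed page size */
#define GRANT_ENTRY_SIZE 8u     /* grant table entry size, from the text */
#define FREE_ENTRY_SIZE  4u     /* free list entry size, from the text */

/* Entries that fit in one page of the grant table. */
static unsigned int grant_entries_per_page(void)
{
    return PAGE_SIZE_BYTES / GRANT_ENTRY_SIZE;
}

/* Entries that fit in one page of the free list. */
static unsigned int free_entries_per_page(void)
{
    return PAGE_SIZE_BYTES / FREE_ENTRY_SIZE;
}

/*
 * Free-list pages needed to hold one entry per grant, rounded up.
 * Using nr_grant_frames itself as the free-list page count would index
 * past the end of the free-list page array.
 */
static unsigned int free_list_pages(unsigned int nr_grant_frames)
{
    unsigned int entries = nr_grant_frames * grant_entries_per_page();
    unsigned int per_page = free_entries_per_page();

    return (entries + per_page - 1) / per_page;
}
```

Under these assumptions, four grant table pages hold 2048 entries, which
need only two free-list pages.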

Taken from:

  http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/4018c0da3360

Signed-off-by: Michael Abd-El-Malek <mabdelmalek@cmu.edu>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agox86: fix breakage of vSMP irq operations
Ravikiran G Thirumalai [Fri, 4 Apr 2008 10:06:29 +0000 (03:06 -0700)]
x86: fix breakage of vSMP irq operations

2.6.25-rc* stopped working with CONFIG_X86_VSMP on vSMP machines.

Looks like the vsmp irq ops got accidentally removed during the merge of
x86_64 pvops in 2.6.25: commit 6abcd98ffafbff81f0bfd7ee1d129e634af13245
removed the vsmp irq ops.

Tested both with and without CONFIG_X86_VSMP, on vSMP and non-vSMP
x86_64 machines.

Please apply.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>