11 years ago ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c
Tao Ma [Wed, 23 Dec 2009 06:31:15 +0000 (14:31 +0800)]
ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c

In ocfs2_value_metas_in_xattr_header, we should use
le16_to_cpu for ocfs2_extent_list.l_next_free_rec.
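
As a rough illustration of why the conversion matters, the sketch below
decodes a little-endian 16-bit on-disk field byte by byte in plain
user-space C. sketch_le16_to_cpu() is a hypothetical stand-in for the
kernel's le16_to_cpu(), not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for le16_to_cpu(): decode a 16-bit little-endian
 * field from raw disk bytes. Working on bytes gives the same result on
 * little-endian and big-endian CPUs.
 */
uint16_t sketch_le16_to_cpu(const uint8_t *disk)
{
    assert(disk != NULL);
    return (uint16_t)(disk[0] | ((uint16_t)disk[1] << 8));
}
```

Reading the raw field without such a conversion would only give the
right value on little-endian machines.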

Signed-off-by: Tao Ma <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2/trivial: Use proper mask for 2 places in heartbeat.c
Tao Ma [Tue, 22 Dec 2009 02:32:15 +0000 (10:32 +0800)]
ocfs2/trivial: Use proper mask for 2 places in heartbeat.c

I just noticed today that there are 2 places that use "mlog(0,...)"
in fs/ocfs2/cluster/heartbeat.c, but that file actually has no default
mask prefix.
So change them to mlog(ML_HEARTBEAT,...).

Signed-off-by: Tao Ma <>
Signed-off-by: Joel Becker <>
11 years ago Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.
Tristan Ye [Tue, 22 Dec 2009 01:11:58 +0000 (09:11 +0800)]
Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink.

A fast symlink can be treated the same as an inlined file, since in
both cases the data extent we want to return is stored in the
metadata block. A regular symlink can simply be treated the same as
a regular file.

Signed-off-by: Tristan Ye <>
Acked-by: Sunil Mushran <>
Signed-off-by: Joel Becker <>
11 years ago Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?
Tristan Ye [Thu, 17 Dec 2009 10:42:16 +0000 (18:42 +0800)]
Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode?

Let userspace have a chance to get the extent info of a
directory just like extN did.

Signed-off-by: Tristan Ye <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: Use FIEMAP_EXTENT_SHARED
Sunil Mushran [Thu, 3 Dec 2009 20:46:52 +0000 (12:46 -0800)]

Adds FIEMAP_EXTENT_SHARED flag to refcounted extents.

Signed-off-by: Sunil Mushran <>
Acked-by: Mark Fasheh <>
Signed-off-by: Joel Becker <>
11 years ago fiemap: Add new extent flag FIEMAP_EXTENT_SHARED
Sunil Mushran [Thu, 3 Dec 2009 20:46:51 +0000 (12:46 -0800)]
fiemap: Add new extent flag FIEMAP_EXTENT_SHARED

Some filesystems may allow multiple files to point to a particular
extent.  This patch adds flag FIEMAP_EXTENT_SHARED to denote extents
that are shared with other inodes.

Signed-off-by: Sunil Mushran <>
Acked-by: Mark Fasheh <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: replace u8 by __u8 in ocfs2_fs.h
Coly Li [Sun, 6 Dec 2009 14:38:53 +0000 (22:38 +0800)]
ocfs2: replace u8 by __u8 in ocfs2_fs.h

This patch replaces data type 'u8' with '__u8', which follows the coding
style of ocfs2_fs.h and is portable to user space for ocfs2-tools.

Signed-off-by: Coly Li <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: explicit declare uninitialized var in user_cluster_connect()
Coly Li [Thu, 3 Dec 2009 18:02:35 +0000 (02:02 +0800)]
ocfs2: explicit declare uninitialized var in user_cluster_connect()

This patch explicitly declares an uninitialized local variable in
user_cluster_connect(), to remove a compiler warning.

Signed-off-by: Coly Li <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock()
Jeff Liu [Tue, 15 Dec 2009 06:08:28 +0000 (14:08 +0800)]
ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock()

osb->s_mount_opt has already been checked against OCFS2_MOUNT_POSIX_ACL before
calling ocfs2_get_acl_nolock() in ocfs2_init_acl() and ocfs2_get_acl(), so
remove the redundant check.

Signed-off-by: Jeff Liu <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: return -EAGAIN instead of EAGAIN in dlm
Tiger Yang [Thu, 19 Nov 2009 02:17:46 +0000 (10:17 +0800)]
ocfs2: return -EAGAIN instead of EAGAIN in dlm

We used to return positive EAGAIN to indicate a retry action
is needed in dlm_begin_reco_handler(). Now we return negative
-EAGAIN to erase the confusion caused by this error code.
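
The convention can be sketched in user-space C as follows. The function
names are hypothetical and only model the idea that a handler returns 0
or a negative errno, so that callers can treat every negative value as an
error and single out -EAGAIN as "retry".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical handler: 0 on success, negative errno otherwise. */
int sketch_begin_reco(int node_is_busy)
{
    if (node_is_busy)
        return -EAGAIN;     /* negative: the caller should retry */
    return 0;
}

/* A caller only has to recognise the negative retry code. */
int sketch_should_retry(int ret)
{
    assert(ret <= 0);       /* by convention never a positive errno */
    return ret == -EAGAIN;
}
```

With a positive EAGAIN, a generic "ret < 0" error check in a caller
would have missed the retry request.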

Signed-off-by: Tiger Yang <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2/cluster: Make fence method configurable - v2
Sunil Mushran [Wed, 18 Nov 2009 00:29:19 +0000 (16:29 -0800)]
ocfs2/cluster: Make fence method configurable - v2

By default, o2cb fences the box by calling emergency_restart(). While this
scheme works well in production, it gets in the way during testing as it
does not let the tester take stack/core dumps for analysis.

This patch allows the user to dynamically change the fence method to panic() by:
# echo "panic" > /sys/kernel/config/cluster/<clustername>/fence_method
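
How such an attribute write might be parsed can be sketched like this.
The enum, the function name and the "reset" keyword for the default
method are assumptions made for illustration; only "panic" is taken from
the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_fence_method {
    SKETCH_FENCE_RESET,     /* assumed name for the emergency_restart() default */
    SKETCH_FENCE_PANIC,
    SKETCH_FENCE_INVALID
};

/* Map the written text to a method; echo appends a newline, so ignore it. */
enum sketch_fence_method sketch_parse_fence_method(const char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = strcspn(buf, "\n");
    if (len == 5 && strncmp(buf, "panic", len) == 0)
        return SKETCH_FENCE_PANIC;
    if (len == 5 && strncmp(buf, "reset", len) == 0)
        return SKETCH_FENCE_RESET;
    return SKETCH_FENCE_INVALID;
}
```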

Signed-off-by: Sunil Mushran <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: Set MS_POSIXACL on remount
Jan Kara [Thu, 15 Oct 2009 12:54:05 +0000 (14:54 +0200)]
ocfs2: Set MS_POSIXACL on remount

We have to set MS_POSIXACL on remount as well. Otherwise VFS
would not know we started supporting ACLs after remount and
thus ACLs would not work.

Signed-off-by: Jan Kara <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: Make acl use the default
Jan Kara [Thu, 15 Oct 2009 12:54:04 +0000 (14:54 +0200)]
ocfs2: Make acl use the default

Change acl mount option handling to match that of XFS and BTRFS;
hopefully it is also easier to use now. When the admin does not specify
any acl mount option, acls are enabled if and only if the filesystem has
the xattr feature enabled. If the admin specifies the 'acl' mount option,
we fail the mount if the filesystem does not have the xattr feature and
thus acls cannot be enabled.
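
The decision described above can be sketched as a small function. All
names are hypothetical, and both the handling of an explicit "noacl"
option and the use of -EINVAL for the failed mount are assumptions, since
the text does not spell them out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_acl_opt { SKETCH_ACL_UNSET, SKETCH_ACL_ON, SKETCH_ACL_OFF };

/*
 * Returns 0 and stores the decision in *acl_enabled, or a negative errno
 * when the mount has to fail.
 */
int sketch_resolve_acl(enum sketch_acl_opt opt, bool has_xattr,
                       bool *acl_enabled)
{
    assert(acl_enabled != NULL);
    switch (opt) {
    case SKETCH_ACL_ON:
        if (!has_xattr)
            return -EINVAL;         /* 'acl' requested but impossible */
        *acl_enabled = true;
        return 0;
    case SKETCH_ACL_OFF:            /* assumed: 'noacl' always disables */
        *acl_enabled = false;
        return 0;
    default:                        /* no option given: follow xattr */
        *acl_enabled = has_xattr;
        return 0;
    }
}
```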

Signed-off-by: Jan Kara <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: Always include ACL support
Jan Kara [Thu, 15 Oct 2009 12:54:03 +0000 (14:54 +0200)]
ocfs2: Always include ACL support

To become consistent with filesystems such as XFS or BTRFS, make posix
ACLs always available. This also reduces possibility of
misconfiguration on admin's side.

Signed-off-by: Jan Kara <>
Signed-off-by: Joel Becker <>
11 years ago ocfs2: duplicate inline data properly during reflink.
Tao Ma [Thu, 15 Oct 2009 03:10:49 +0000 (11:10 +0800)]
ocfs2: duplicate inline data properly during reflink.

The old reflink fails to handle inodes with inline data and will oops
if it encounters them.  This patch copies inline data to the new inode.
Extended attributes may still be refcounted.

Signed-off-by: Tao Ma <>
Signed-off-by: Joel Becker <>
Tested-by: Tristan Ye <>
11 years ago ocfs2: Move ocfs2_complete_reflink to the right place.
Tao Ma [Thu, 15 Oct 2009 03:10:48 +0000 (11:10 +0800)]
ocfs2: Move ocfs2_complete_reflink to the right place.

As its name ocfs2_complete_reflink indicates, it should
be called after all the work for reflink is done, so
it really should be called after we reflink the xattrs.

Signed-off-by: Tao Ma <>
Signed-off-by: Joel Becker <>
Tested-by: Tristan Ye <>
11 years ago ocfs2: Return -EINVAL when a device is not ocfs2.
Joel Becker [Thu, 29 Oct 2009 05:28:24 +0000 (22:28 -0700)]
ocfs2: Return -EINVAL when a device is not ocfs2.

In case of non-modular kernels the root filesystem is mounted by trying
several filesystems. If ocfs2 was tried before the actual filesystem
type, the mount would fail because ocfs2_sb_probe() returns -EAGAIN
instead of -EINVAL.  ocfs2 will now return -EINVAL properly.

Signed-off-by: Joel Becker <>
Reported-by: Laszlo Attila Toth <>
11 years ago Merge git://
Linus Torvalds [Thu, 22 Oct 2009 22:35:16 +0000 (07:35 +0900)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus

* git://
  move virtrng_remove to .devexit.text
  move virtballoon_remove to .devexit.text
  virtio_blk: Revert serial number support
  virtio: let header files include virtio_ids.h
  virtio_blk: revert QUEUE_FLAG_VIRT addition

11 years ago Merge git://
Linus Torvalds [Thu, 22 Oct 2009 22:34:23 +0000 (07:34 +0900)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git:// (21 commits)
  niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
  KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
  KS8851: Fix MAC address write order
  KS8851: Add soft reset at probe time
  net: fix section mismatch in fec.c
  net: Fix struct inet_timewait_sock bitfield annotation
  tcp: Try to catch MSG_PEEK bug
  bluetooth: static lock key fix
  bluetooth: scheduling while atomic bug fix
  tcp: fix TCP_DEFER_ACCEPT retrans calculation
  tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
  tcp: accept socket after TCP_DEFER_ACCEPT period
  Revert "tcp: fix tcp_defer_accept to consider the timeout"
  AF_UNIX: Fix deadlock on connecting to shutdown socket
  ethoc: clear only pending irqs
  ethoc: inline regs access
  vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
  virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
  be2net: fix support for PCI hot plug

11 years ago move virtrng_remove to .devexit.text
Uwe Kleine-König [Thu, 1 Oct 2009 08:28:35 +0000 (10:28 +0200)]
move virtrng_remove to .devexit.text

The function virtrng_remove is used only wrapped by __devexit_p so define
it using __devexit.

Signed-off-by: Uwe Kleine-König <>
Acked-by: Sam Ravnborg <>
Cc: Rusty Russell <>
Cc: Michael S. Tsirkin <>
Acked-by: Christian Borntraeger <>
Signed-off-by: Rusty Russell <>
11 years ago move virtballoon_remove to .devexit.text
Uwe Kleine-König [Thu, 1 Oct 2009 08:28:33 +0000 (10:28 +0200)]
move virtballoon_remove to .devexit.text

The function virtballoon_remove is used only wrapped by __devexit_p so
define it using __devexit.

Signed-off-by: Uwe Kleine-König <>
Acked-by: Sam Ravnborg <>
Acked-by: Michael S. Tsirkin <>
Signed-off-by: Rusty Russell <>
11 years ago virtio_blk: Revert serial number support
Rusty Russell [Thu, 22 Oct 2009 22:39:28 +0000 (16:39 -0600)]
virtio_blk: Revert serial number support

This reverts "Add serial number support for virtio_blk, V4a".

Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
on virtio config space, so no one could ever use this.

This is coming back later in a cleaner form.

Signed-off-by: Rusty Russell <>
Cc: john cooper <>
Cc: Jens Axboe <>
11 years ago virtio: let header files include virtio_ids.h
Christian Borntraeger [Wed, 30 Sep 2009 09:17:21 +0000 (11:17 +0200)]
virtio: let header files include virtio_ids.h


commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This no longer works, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <>
Signed-off-by: Christian Borntraeger <>
Signed-off-by: Rusty Russell <>
11 years ago virtio_blk: revert QUEUE_FLAG_VIRT addition
Christoph Hellwig [Fri, 4 Sep 2009 20:44:42 +0000 (22:44 +0200)]
virtio_blk: revert QUEUE_FLAG_VIRT addition

It seems like the addition of QUEUE_FLAG_VIRT causes major performance
regressions for Fedora users:

while I can't reproduce those extreme regressions myself I think the flag
is wrong.


  QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT which causes the queue
  to be unplugged immediately.  This is not a good behaviour for at least
  qemu and kvm where we do have significant overhead for every
  I/O operation.  Even with all the latest speedups (native AIO,
  MSI support, zero copy) we can only get native speed for up to 128kb
  I/O requests; we already are down to 66% of native performance for 4kb
  requests even on my laptop running the Intel X25-M SSD for which the
  QUEUE_FLAG_NONROT was designed.
  If we ever get virtio-blk overhead low enough that this flag makes
  sense it should only be set based on a feature flag set by the host.

Signed-off-by: Christoph Hellwig <>
Signed-off-by: Rusty Russell <>
11 years ago niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied...
Joyce Yu [Thu, 22 Oct 2009 00:21:10 +0000 (17:21 -0700)]
niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case

Signed-off-by: Joyce Yu <>
Signed-off-by: David S. Miller <>
11 years ago Merge branch 'for-linus' of git://
Linus Torvalds [Wed, 21 Oct 2009 23:28:28 +0000 (08:28 +0900)]
Merge branch 'for-linus' of git://

* 'for-linus' of git://
  dnotify: ignore FS_EVENT_ON_CHILD
  inotify: fix coalesce duplicate events into a single event in special case
  inotify: deprecate the inotify kernel interface
  fsnotify: do not set group for a mark before it is on the i_list

11 years ago Merge branch 'for-linus' of git://
Linus Torvalds [Wed, 21 Oct 2009 23:27:12 +0000 (08:27 +0900)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://
  Input: hp_sdc_rtc - fix test in hp_sdc_rtc_read_rt()
  Input: atkbd - consolidate force release quirks for volume keys
  Input: logips2pp - model 73 is actually TrackMan FX
  Input: i8042 - add Sony Vaio VGN-FZ240E to the nomux list
  Input: fix locking issue in /proc/bus/input/ handlers
  Input: atkbd - postpone restoring LED/repeat rate at resume
  Input: atkbd - restore resetting LED state at startup
  Input: i8042 - make pnp_data_busted variable boolean instead of int
  Input: synaptics - add another Protege M300 to rate blacklist

11 years ago Merge branch 'kvm-updates/2.6.32' of git://
Linus Torvalds [Wed, 21 Oct 2009 23:26:15 +0000 (08:26 +0900)]
Merge branch 'kvm-updates/2.6.32' of git://git./virt/kvm/kvm

* 'kvm-updates/2.6.32' of git://
  KVM: Prevent kvm_init from corrupting debugfs structures
  KVM: MMU: fix pointer cast
  KVM: use proper hrtimer function to retrieve expiration time

11 years ago Merge git://
Linus Torvalds [Wed, 21 Oct 2009 23:25:36 +0000 (08:25 +0900)]
Merge git://git./linux/kernel/git/agk/linux-2.6-dm

* git://
  dm snapshot: allow chunk size to be less than page size
  dm snapshot: use unsigned integer chunk size
  dm snapshot: lock snapshot while supplying status
  dm exception store: fix failed set_chunk_size error path
  dm snapshot: require non zero chunk size by end of ctr
  dm: dec_pending needs locking to save error value
  dm: add missing del_gendisk to alloc_dev error path
  dm log: userspace fix incorrect luid cast in userspace_ctr
  dm snapshot: free exception store on init failure
  dm snapshot: sort by chunk size to fix race

11 years ago PM: Make warning in suspend_test_finish() less likely to happen
Rafael J. Wysocki [Tue, 20 Oct 2009 04:45:02 +0000 (06:45 +0200)]
PM: Make warning in suspend_test_finish() less likely to happen

Increase TEST_SUSPEND_SECONDS to 10 so the warning in
suspend_test_finish() doesn't annoy the users of slower systems so much.

Also, make the warning print the suspend-resume cycle time, so that we
know why the warning actually triggered.

Patch prepared during the hacking session at the Kernel Summit in Tokyo.

Signed-off-by: Rafael J. Wysocki <>
Signed-off-by: Linus Torvalds <>
11 years ago mmc: at91_mci: Don't include asm/mach/mmc.h
Uwe Kleine-König [Wed, 21 Oct 2009 07:46:59 +0000 (09:46 +0200)]
mmc: at91_mci: Don't include asm/mach/mmc.h

This fixes a compile bug introduced in

6ef297f (ARM: 5720/1: Move MMCI header to amba include dir)

That commit moved arch/arm/include/asm/mach/mmc.h to
include/linux/amba/mmci.h.  Just removing the include was enough.

Signed-off-by: Uwe Kleine-König <>
Acked-by: Linus Walleij <>
Acked-by: Nicolas Ferre <>
Acked-by: Bill Gatliff <>
Cc: Catalin Marinas <>
Cc: Russell King <>
Cc: Pierre Ossman <>
Cc: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
11 years ago Merge branch 'sh/for-2.6.32' of git://
Linus Torvalds [Wed, 21 Oct 2009 23:17:15 +0000 (08:17 +0900)]
Merge branch 'sh/for-2.6.32' of git://git./linux/kernel/git/lethal/sh-2.6

* 'sh/for-2.6.32' of git://
  sh: Kill off stray HAVE_FTRACE_SYSCALLS reference.
  sh: Remove BKL from landisk gio.
  sh: disabled cache handling fix.
  sh: Fix up single page flushing to use PAGE_SIZE.

11 years ago Merge git://
Linus Torvalds [Wed, 21 Oct 2009 23:16:01 +0000 (08:16 +0900)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

* git://
  crypto: aesni-intel - Fix irq_fpu_usable usage
  crypto: padlock-sha - Fix stack alignment

11 years ago nfs: Fix nfs_parse_mount_options() kfree() leak
Yinghai Lu [Tue, 20 Oct 2009 05:13:46 +0000 (14:13 +0900)]
nfs: Fix nfs_parse_mount_options() kfree() leak

Fix a (small) memory leak in one of the error paths of the NFS mount
options parsing code.

Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
rpc/rdma transport module automatically).

Reported-by: Yinghai Lu <>
Reported-by: Pekka Enberg <>
Signed-off-by: Ingo Molnar <>
Signed-off-by: Trond Myklebust <>
Signed-off-by: Linus Torvalds <>
11 years ago fs: pipe.c null pointer dereference
Earl Chew [Mon, 19 Oct 2009 22:55:41 +0000 (15:55 -0700)]
fs: pipe.c null pointer dereference

This patch fixes a null pointer exception in pipe_rdwr_open() which
generates the stack trace:

> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
>  [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
>  [<ffffffff8028125c>] __dentry_open+0x13c/0x230
>  [<ffffffff8028143d>] do_filp_open+0x2d/0x40
>  [<ffffffff802814aa>] do_sys_open+0x5a/0x100
>  [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67

The failure mode is triggered by an attempt to open an anonymous
pipe via /proc/pid/fd/* as exemplified by this script:

while : ; do
   { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
   OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
        { read PID REST ; echo $PID; } )
   OUT="${OUT%% *}"
   DELAY=$((RANDOM * 1000 / 32768))
   usleep $((DELAY * 1000 + RANDOM % 1000 ))
   echo n > /proc/$OUT/fd/1                 # Trigger defect
done

Note that the failure window is quite small and I could only
reliably reproduce the defect by inserting a small delay
in pipe_rdwr_open(). For example:

 static int
 pipe_rdwr_open(struct inode *inode, struct file *filp)

Although the defect was observed in pipe_rdwr_open(), I think it
makes sense to replicate the change through all the pipe_*_open()
functions.

The core of the change is to verify that inode->i_pipe has not
been released before attempting to manipulate it. If inode->i_pipe
is no longer present, return ENOENT to indicate so.

The comment about potentially using atomic_t for i_pipe->readers
and i_pipe->writers has also been removed because it is no longer
relevant in this context. The inode->i_mutex lock must be used so
that inode->i_pipe can be dealt with correctly.
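
The shape of the check can be sketched in user-space C. The structures
and the function below are simplified, hypothetical models (the caller is
assumed to already hold the inode lock); they are not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_pipe {
    int readers;
    int writers;
};

struct sketch_inode {
    struct sketch_pipe *i_pipe;     /* NULL once the pipe was released */
};

/* Caller is assumed to hold the lock that protects inode->i_pipe. */
int sketch_pipe_rdwr_open(struct sketch_inode *inode)
{
    assert(inode != NULL);
    if (inode->i_pipe == NULL)
        return -ENOENT;             /* pipe already gone, do not touch it */

    inode->i_pipe->readers++;
    inode->i_pipe->writers++;
    return 0;
}
```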

Signed-off-by: Earl Chew <>
Signed-off-by: Linus Torvalds <>
11 years ago KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
Ben Dooks [Mon, 19 Oct 2009 23:49:05 +0000 (23:49 +0000)]
KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST

In ks8851_set_rx_mode() the case handling IFF_MULTICAST was also setting
the RXCR1_AE bit by accident. This meant that all unicast frames were
being accepted by the device. Remove RXCR1_AE from this case.

Note, RXCR1_AE was also masking a problem with setting the MAC address
properly, so this patch needs to be applied after fixing the MAC write order.

Fixes a bug reported by Doong, Ping of Micrel. This version of the
patch avoids setting RXCR1_ME for all cases.

Signed-off-by: Ben Dooks <>
Signed-off-by: David S. Miller <>
11 years ago KS8851: Fix MAC address write order
Ben Dooks [Mon, 19 Oct 2009 23:49:04 +0000 (23:49 +0000)]
KS8851: Fix MAC address write order

The MAC address register was being written in the wrong order, so add
a new address macro to convert mac-address byte to register address and
a ks8851_wrreg8() function to write each byte without having to worry
about any difficult byte swapping.
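
A sketch of the idea, using a simulated register file: each MAC byte is
mapped to its own register address and written individually, so no
16-bit byte swapping is involved. The register addresses, the macro names
and the helper functions are assumptions for illustration and are not
taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAC_LEN   6
#define SKETCH_MAR_TOP   0x15u  /* assumed register address of MAC byte 0 */
#define SKETCH_REG_SPACE 0x16u  /* size of the simulated register file */

/* Convert a MAC byte index (0..5) into a register address. */
unsigned int sketch_mar(unsigned int byte_index)
{
    assert(byte_index < SKETCH_MAC_LEN);
    return SKETCH_MAR_TOP - byte_index;
}

/* Write one byte into the simulated register file. */
void sketch_wrreg8(uint8_t *regs, unsigned int addr, uint8_t value)
{
    assert(regs != NULL && addr < SKETCH_REG_SPACE);
    regs[addr] = value;
}

/* Write the whole address one byte at a time. */
void sketch_write_mac(uint8_t *regs, const uint8_t *mac)
{
    assert(mac != NULL);
    for (unsigned int i = 0; i < SKETCH_MAC_LEN; i++)
        sketch_wrreg8(regs, sketch_mar(i), mac[i]);
}
```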

Fixes a bug reported by Doong, Ping of Micrel.

Signed-off-by: Ben Dooks <>
Signed-off-by: David S. Miller <>
11 years ago KS8851: Add soft reset at probe time
Ben Dooks [Mon, 19 Oct 2009 23:49:03 +0000 (23:49 +0000)]
KS8851: Add soft reset at probe time

Issue a full soft reset at probe time.

This was reported by Doong Ping of Micrel, but with no explanation of why
this is necessary or what bug it is fixing. Add it as it does not seem to hurt
the current driver and ensures that the device is in a known state when we
start setting it up.

Signed-off-by: Ben Dooks <>
Signed-off-by: David S. Miller <>
11 years ago net: fix section mismatch in fec.c
Steven King [Wed, 21 Oct 2009 01:51:37 +0000 (18:51 -0700)]
net: fix section mismatch in fec.c

fec_enet_init is called by both fec_probe and fec_resume, so it
shouldn't be marked as __init.

Signed-off-by: Steven King <>
Signed-off-by: David S. Miller <>
11 years ago dnotify: ignore FS_EVENT_ON_CHILD
Andreas Gruenbacher [Wed, 14 Oct 2009 22:13:23 +0000 (00:13 +0200)]
dnotify: ignore FS_EVENT_ON_CHILD

Mask off FS_EVENT_ON_CHILD in dnotify_handle_event().  Otherwise, when there
is more than one watch on a directory and dnotify_should_send_event()
succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause
spurious events.

This case was overlooked in commit e42e2773.
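
The effect of the masking can be shown with a simplified model in which
every watch on a directory carries an "on child" marker bit. The bit
values and function names are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CREATE         0x01u
#define SKETCH_DELETE         0x02u
#define SKETCH_EVENT_ON_CHILD 0x80u     /* hypothetical marker bit */

/* Unfixed test: a watch fires when it shares any bit with the event. */
bool sketch_watch_fires(uint32_t watch_mask, uint32_t event_mask)
{
    return (watch_mask & event_mask) != 0;
}

/* Fixed test: strip the marker bit before comparing. */
bool sketch_watch_fires_fixed(uint32_t watch_mask, uint32_t event_mask)
{
    return (watch_mask & (event_mask & ~SKETCH_EVENT_ON_CHILD)) != 0;
}

/* A delete-only watch must not fire for a create event. */
void sketch_dnotify_demo(void)
{
    uint32_t delete_watch = SKETCH_DELETE | SKETCH_EVENT_ON_CHILD;
    uint32_t create_event = SKETCH_CREATE | SKETCH_EVENT_ON_CHILD;

    assert(sketch_watch_fires(delete_watch, create_event));         /* spurious */
    assert(!sketch_watch_fires_fixed(delete_watch, create_event));  /* filtered */
}
```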

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

/* Signal handlers: report which watch fired. */
static void create_event(int s, siginfo_t* si, void* p)
{
    printf("create\n");
}

static void delete_event(int s, siginfo_t* si, void* p)
{
    printf("delete\n");
}

int main (void) {
    struct sigaction action;
    char *tmpdir, *file;
    int fd1, fd2;

    sigemptyset (&action.sa_mask);
    action.sa_flags = SA_SIGINFO;

    action.sa_sigaction = create_event;
    sigaction (SIGRTMIN + 0, &action, NULL);

    action.sa_sigaction = delete_event;
    sigaction (SIGRTMIN + 1, &action, NULL);

#   define TMPDIR "/tmp/test.XXXXXX"
    tmpdir = malloc(strlen(TMPDIR) + 1);
    strcpy(tmpdir, TMPDIR);
    mkdtemp(tmpdir);

#   define TMPFILE "/file"
    file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1);
    /* TMPFILE already starts with '/', so no extra separator is needed */
    sprintf(file, "%s%s", tmpdir, TMPFILE);

    /* First watch: create events only */
    fd1 = open (tmpdir, O_RDONLY);
    fcntl(fd1, F_SETSIG, SIGRTMIN);
    fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE);

    /* Second watch: delete events only */
    fd2 = open (tmpdir, O_RDONLY);
    fcntl(fd2, F_SETSIG, SIGRTMIN + 1);
    fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE);

    if (fork()) {
        /* This triggers a create event */
        creat(file, 0600);
        /* This triggers a create and delete event (!) */
        unlink(file);
    } else {
        sleep(1);
        rmdir(tmpdir);
    }

    return 0;
}

Signed-off-by: Andreas Gruenbacher <>
Signed-off-by: Eric Paris <>
11 years ago net: Fix struct inet_timewait_sock bitfield annotation
Eric Dumazet [Sun, 18 Oct 2009 22:48:51 +0000 (22:48 +0000)]
net: Fix struct inet_timewait_sock bitfield annotation

commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
added 4/8 bytes in struct inet_timewait_sock.

Fix this by declaring tw_ipv6_offset in the 'flags' bitfield.
The 14-bit hole is named tw_pad to make it clearly apparent.

Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
11 years ago tcp: Try to catch MSG_PEEK bug
Herbert Xu [Mon, 19 Oct 2009 19:41:06 +0000 (19:41 +0000)]
tcp: Try to catch MSG_PEEK bug

This patch tries to print out more information when we hit the
MSG_PEEK bug in tcp_recvmsg.  It's been around since at least
2005 and it's about time that we finally fix it.

Signed-off-by: Herbert Xu <>
Signed-off-by: David S. Miller <>
11 years ago crypto: aesni-intel - Fix irq_fpu_usable usage
Huang Ying [Tue, 20 Oct 2009 07:20:47 +0000 (16:20 +0900)]
crypto: aesni-intel - Fix irq_fpu_usable usage

When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
function were changed too, from measuring whether the kernel is using the
FPU, that is, the FPU is NOT available, to measuring whether the FPU is
usable, that is, the FPU is available.

But the usage of irq_fpu_usable in aesni-intel_glue.c is not changed
accordingly. This patch fixes this.
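
The inverted meaning can be sketched as follows. The probe callbacks and
the path enum are hypothetical stand-ins; the point is only that with the
new semantics a caller falls back when the probe returns false, where the
old-style call sites fell back when it returned true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_path { SKETCH_PATH_FALLBACK, SKETCH_PATH_FPU };

/* Hypothetical probes with the new meaning: true means the FPU may be used. */
bool sketch_fpu_free(void) { return true; }
bool sketch_fpu_busy(void) { return false; }

/* Correct use of a "usable" probe: fall back only when it says no. */
enum sketch_path sketch_choose_path(bool (*fpu_usable)(void))
{
    assert(fpu_usable != NULL);
    if (!fpu_usable())
        return SKETCH_PATH_FALLBACK;
    return SKETCH_PATH_FPU;
}
```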

Signed-off-by: Huang Ying <>
Signed-off-by: Herbert Xu <>
11 years ago net: Fix IP_MULTICAST_IF
Eric Dumazet [Mon, 19 Oct 2009 06:41:58 +0000 (06:41 +0000)]

ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.

This function should be called only with RTNL or dev_base_lock held, or reader
could see a corrupt hash chain and eventually enter an endless loop.

Fix is to call dev_get_by_index()/dev_put().

If this happens to be performance critical, we could define a new dev_exist_by_index()
function to avoid touching dev refcount.

Signed-off-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
11 years ago bluetooth: static lock key fix
Dave Young [Sun, 18 Oct 2009 20:28:30 +0000 (20:28 +0000)]
bluetooth: static lock key fix

When shutting down a ppp connection, a lockdep warning about a
non-static key is triggered because the lock is not initialized
properly at that time.

Fix this by changing the lock/skb_queue_head init order.

[   94.339261] INFO: trying to register non-static key.
[   94.342509] the code is fine but needs lockdep annotation.
[   94.342509] turning off the locking correctness validator.
[   94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
[   94.342509] Call Trace:
[   94.342509]  [<c0248fbe>] register_lock_class+0x58/0x241
[   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
[   94.342509]  [<c024ab34>] __lock_acquire+0xac/0xb73
[   94.342509]  [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
[   94.342509]  [<c024b662>] lock_acquire+0x67/0x84
[   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
[   94.342509]  [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
[   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
[   94.342509]  [<c04cd1eb>] skb_dequeue+0x15/0x41
[   94.342509]  [<c054a648>] ? _read_unlock+0x1d/0x20
[   94.342509]  [<c04cd641>] skb_queue_purge+0x14/0x1b
[   94.342509]  [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
[   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
[   94.342509]  [<c0249c04>] ? mark_lock+0x1e/0x1c7
[   94.342509]  [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
[   94.342509]  [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
[   94.342509]  [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
[   94.342509]  [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
[   94.342509]  [<c02302c4>] tasklet_action+0x69/0xc1
[   94.342509]  [<c022fbef>] __do_softirq+0x94/0x11e
[   94.342509]  [<c022fcaf>] do_softirq+0x36/0x5a
[   94.342509]  [<c022fe14>] irq_exit+0x35/0x68
[   94.342509]  [<c0204ced>] do_IRQ+0x72/0x89
[   94.342509]  [<c02038ee>] common_interrupt+0x2e/0x34
[   94.342509]  [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
[   94.342509]  [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
[   94.342509]  [<c049d238>] cpuidle_idle_call+0x5c/0x94
[   94.342509]  [<c02023f8>] cpu_idle+0x4e/0x6f
[   94.342509]  [<c0534153>] rest_init+0x53/0x55
[   94.342509]  [<c0781894>] start_kernel+0x2f0/0x2f5
[   94.342509]  [<c0781091>] i386_start_kernel+0x91/0x96

Reported-by: Oliver Hartkopp <>
Signed-off-by: Dave Young <>
Tested-by: Oliver Hartkopp <>
Signed-off-by: David S. Miller <>
11 years ago bluetooth: scheduling while atomic bug fix
Dave Young [Sun, 18 Oct 2009 20:24:41 +0000 (20:24 +0000)]
bluetooth: scheduling while atomic bug fix

Due to driver core changes, dev_set_drvdata will call kzalloc, which may
sleep, but hci_conn_add will be called in atomic context.

As with dev_set_name, move dev_set_drvdata to the work queue function.

The resulting warning looks like this:

Oct  2 17:41:59 darkstar kernel: [  438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
Oct  2 17:41:59 darkstar kernel: [  438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
Oct  2 17:41:59 darkstar kernel: [  438.001348] 2 locks held by sdptool/2133:
Oct  2 17:41:59 darkstar kernel: [  438.001350]  #0:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap]
Oct  2 17:41:59 darkstar kernel: [  438.001360]  #1:  (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap]
Oct  2 17:41:59 darkstar kernel: [  438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
Oct  2 17:41:59 darkstar kernel: [  438.001373] Call Trace:
Oct  2 17:41:59 darkstar kernel: [  438.001381]  [<c022433f>] __might_sleep+0xde/0xe5
Oct  2 17:41:59 darkstar kernel: [  438.001386]  [<c0298843>] __kmalloc+0x4a/0x15a
Oct  2 17:41:59 darkstar kernel: [  438.001392]  [<c03f0065>] ? kzalloc+0xb/0xd
Oct  2 17:41:59 darkstar kernel: [  438.001396]  [<c03f0065>] kzalloc+0xb/0xd
Oct  2 17:41:59 darkstar kernel: [  438.001400]  [<c03f04ff>] device_private_init+0x15/0x3d
Oct  2 17:41:59 darkstar kernel: [  438.001405]  [<c03f24c5>] dev_set_drvdata+0x18/0x26
Oct  2 17:41:59 darkstar kernel: [  438.001414]  [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001422]  [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001429]  [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001437]  [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001442]  [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap]
Oct  2 17:41:59 darkstar kernel: [  438.001448]  [<c04c8df5>] sys_connect+0x60/0x7a
Oct  2 17:41:59 darkstar kernel: [  438.001453]  [<c024b703>] ? lock_release_non_nested+0x84/0x1de
Oct  2 17:41:59 darkstar kernel: [  438.001458]  [<c028804b>] ? might_fault+0x47/0x81
Oct  2 17:41:59 darkstar kernel: [  438.001462]  [<c028804b>] ? might_fault+0x47/0x81
Oct  2 17:41:59 darkstar kernel: [  438.001468]  [<c033361f>] ? __copy_from_user_ll+0x11/0xce
Oct  2 17:41:59 darkstar kernel: [  438.001472]  [<c04c9419>] sys_socketcall+0x82/0x17b
Oct  2 17:41:59 darkstar kernel: [  438.001477]  [<c020329d>] syscall_call+0x7/0xb

Signed-off-by: Dave Young <>
Signed-off-by: David S. Miller <>
11 years ago tcp: fix TCP_DEFER_ACCEPT retrans calculation
Julian Anastasov [Mon, 19 Oct 2009 10:10:40 +0000 (10:10 +0000)]
tcp: fix TCP_DEFER_ACCEPT retrans calculation

Fix TCP_DEFER_ACCEPT conversion between seconds and
retransmission to match the TCP SYN-ACK retransmission periods
because the time is converted to such retransmissions. The old
algorithm selects one more retransmission in some cases. Allow
up to 255 retransmissions.
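
One way to do such a conversion is sketched below: count how many
retransmission periods, each twice as long as the previous one up to a
cap, are needed to cover the requested number of seconds. The function
name, the use of whole seconds as the unit and the example values of a
3 second initial timeout and a 120 second cap are assumptions for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of SYN-ACK retransmissions needed to cover "seconds". */
uint8_t sketch_secs_to_retrans(int seconds, int timeout, int rto_max)
{
    uint8_t res = 0;

    assert(timeout > 0 && rto_max >= timeout);
    if (seconds > 0) {
        int period = timeout;       /* time covered by the first timeout */

        res = 1;
        while (seconds > period && res < 255) {
            res++;
            timeout <<= 1;          /* exponential backoff */
            if (timeout > rto_max)
                timeout = rto_max;
            period += timeout;
        }
    }
    return res;
}
```

With these example values the periods end at 3, 9 and 21 seconds, so a
deferral of 4 to 9 seconds maps to 2 retransmissions and 10 seconds
maps to 3.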

Signed-off-by: Julian Anastasov <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
11 years ago tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
Julian Anastasov [Mon, 19 Oct 2009 10:03:58 +0000 (10:03 +0000)]
tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT

Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT
users to not retransmit SYN-ACKs during the deferring period if
ACK from client was received. The goal is to reduce traffic
during the deferring period. When the period is finished
we continue with sending SYN-ACKs (at least one) but this time
any traffic from client will change the request to established
socket allowing application to terminate it properly.
Also, do not drop acked request if sending of SYN-ACK fails.

Signed-off-by: Julian Anastasov <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
11 years ago tcp: accept socket after TCP_DEFER_ACCEPT period
Julian Anastasov [Mon, 19 Oct 2009 10:01:56 +0000 (10:01 +0000)]
tcp: accept socket after TCP_DEFER_ACCEPT period

Willy Tarreau and many other folks in recent years
were concerned about what happens when the TCP_DEFER_ACCEPT period
expires for clients which sent an ACK packet. They prefer clients
that actively resend ACK on our SYN-ACK retransmissions to be
converted from open requests to sockets and queued to the
listener for accepting after the deferring period is finished.
Then application server can decide to wait longer for data
or to properly terminate the connection with FIN if read()
returns EAGAIN which is an indication for accepting after
the deferring period. This change still can have side effects
for applications that expect always to see data on the accepted
socket. Others can be prepared to work in both modes (with or
without TCP_DEFER_ACCEPT period) and their data processing can
ignore the read=EAGAIN notification and allocate resources for
clients which proved to have no data to send during the deferring
period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not
as a timeout) to wait for data will notice clients that didn't
send data for 3 seconds but that still resend ACKs.
Thanks to Willy Tarreau for the initial idea and to
Eric Dumazet for the review and testing the change.

Signed-off-by: Julian Anastasov <>
Acked-by: Eric Dumazet <>
Signed-off-by: David S. Miller <>
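
The server-side decision described above (data arrived, or the socket was
accepted only because the deferring period ran out) can be sketched from
userspace. This is a minimal sketch, not code from the patch: the enum and
both function names are hypothetical, and it assumes a Linux system and a
non-blocking accepted socket, since only then does read() report EAGAIN.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical names for the outcomes a server may want to distinguish. */
enum first_read_outcome {
	FIRST_READ_DATA,        /* client sent data during the deferring period */
	FIRST_READ_NO_DATA_YET, /* accepted after the period expired, no data */
	FIRST_READ_PEER_CLOSED, /* orderly shutdown by the client */
	FIRST_READ_ERROR        /* any other failure */
};

/* Ask the kernel to defer accept() on a listening TCP socket (Linux only). */
int set_defer_accept(int listen_fd, int seconds)
{
	return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
			  &seconds, sizeof(seconds));
}

/*
 * Classify the result of the first non-blocking read() on an accepted
 * socket. The caller passes the return value of read() and the errno
 * value saved right after the call.
 */
enum first_read_outcome classify_first_read(ssize_t nread, int saved_errno)
{
	if (nread > 0)
		return FIRST_READ_DATA;
	if (nread == 0)
		return FIRST_READ_PEER_CLOSED;
	if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK)
		return FIRST_READ_NO_DATA_YET;
	return FIRST_READ_ERROR;
}
```

A server that gets FIRST_READ_NO_DATA_YET can either keep waiting for data
or close the connection, which are the two choices the message mentions.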
11 years agoRevert "tcp: fix tcp_defer_accept to consider the timeout"
David S. Miller [Tue, 20 Oct 2009 02:12:36 +0000 (19:12 -0700)]
Revert "tcp: fix tcp_defer_accept to consider the timeout"

This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea.

Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
with a more correct way to deal with this.

Signed-off-by: David S. Miller <>
11 years agoAF_UNIX: Fix deadlock on connecting to shutdown socket
Tomoki Sekiyama [Mon, 19 Oct 2009 06:17:37 +0000 (23:17 -0700)]
AF_UNIX: Fix deadlock on connecting to shutdown socket

I found a deadlock bug in UNIX domain sockets which allows non-root
users to mount a DoS attack against the local machine.

How to reproduce:
 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstract
    namespace(*), and shutdown(2) it.
 2. Repeat connect(2)ing to the listening socket from the other sockets
    until the connection backlog is full.
 3. connect(2) takes the CPU forever. If every core is taken, the
    system hangs.

PoC code: (Run as many times as cores on SMP machines.)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	int ret;
	int csd;
	int lsd;
	struct sockaddr_un sun;

	/* make an abstract name address (*) */
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	sprintf(&sun.sun_path[1], "%d", getpid());

	/* create the listening socket and shut it down */
	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
	bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
	listen(lsd, 1);
	shutdown(lsd, SHUT_RDWR);

	/* connect loop */
	alarm(15); /* forcibly exit the loop after 15 sec */
	for (;;) {
		csd = socket(AF_UNIX, SOCK_STREAM, 0);
		ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
		if (-1 == ret) {
			/* a fixed kernel refuses the connection here */
			perror("connect()");
			break;
		}
		puts("Connection OK");
	}
	return 0;
}

(*) Make sun_path[0] = 0 to use the abstract namespace.
    If a file-based socket is used, the system doesn't deadlock because
    of context switches in the file system layer.

Why this happens:
 Error checks between unix_socket_connect() and unix_wait_for_peer() are
 inconsistent. The former calls the latter to wait until the backlog is
 processed. Although the latter returns without doing anything when the
 socket is shut down, the former doesn't check the shutdown state and
 just retries calling the latter forever.

 The patch below adds a shutdown check to unix_socket_connect(), so
 connect(2) to the shut-down socket will return -ECONNREFUSED.

Signed-off-by: Tomoki Sekiyama <>
Signed-off-by: Masanori Yoshida <>
Signed-off-by: David S. Miller <>
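
The inconsistency explained above can be modelled in a few lines of
userspace C. This is only a sketch of the control flow, not the kernel
code: every name here (model_listener, model_connect, MODEL_RCV_SHUTDOWN,
MODEL_SPUN_OUT) is hypothetical, and a bounded spin count stands in for
"forever" so that the model terminates.

```c
#include <errno.h>

#define MODEL_RCV_SHUTDOWN 1 /* hypothetical flag for a shut-down listener */
#define MODEL_SPUN_OUT     1 /* hypothetical result: the loop never made progress */

struct model_listener {
	int queued;   /* pending connections */
	int backlog;  /* maximum pending connections */
	int shutdown; /* MODEL_RCV_SHUTDOWN once shutdown(2) was called */
};

/* Waiting for the peer returns at once if the listener is shut down. */
static void model_wait_for_peer(struct model_listener *l)
{
	if (l->shutdown & MODEL_RCV_SHUTDOWN)
		return;
	/* A real implementation would sleep until the backlog drains;
	 * the model simply pretends one pending entry was accepted. */
	if (l->queued > 0)
		l->queued--;
}

/*
 * check_shutdown == 0 models the old behaviour, 1 the fixed one.
 * max_spins bounds the retry loop, which is unbounded in the bug report.
 */
int model_connect(struct model_listener *l, int check_shutdown, int max_spins)
{
	int spins;

	for (spins = 0; spins < max_spins; spins++) {
		if (check_shutdown && (l->shutdown & MODEL_RCV_SHUTDOWN))
			return -ECONNREFUSED;
		if (l->queued < l->backlog) {
			l->queued++;
			return 0;
		}
		model_wait_for_peer(l);
	}
	return MODEL_SPUN_OUT;
}
```

With a full backlog and a shut-down listener, the unchecked variant burns
all of its spins, while the checked variant fails fast with -ECONNREFUSED.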
11 years agoethoc: clear only pending irqs
Thomas Chou [Wed, 7 Oct 2009 14:16:43 +0000 (14:16 +0000)]
ethoc: clear only pending irqs

This patch fixes the problem of dropped packets due to lost
interrupt requests. We should only clear what was pending at the
moment we read the irq source reg.

Signed-off-by: Thomas Chou <>
Signed-off-by: David S. Miller <>
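
Why clearing only the bits that were read matters can be shown with a toy
register model. This is a sketch under stated assumptions, not the ethoc
driver: the register is assumed to be write-1-to-clear, all names are
hypothetical, and the `late` argument models an interrupt that is raised
between reading the source register and acknowledging it.

```c
#include <stdint.h>

/* Hypothetical write-1-to-clear interrupt source register. */
struct model_irq_source {
	uint32_t bits;
};

static uint32_t model_irq_read(const struct model_irq_source *reg)
{
	return reg->bits;
}

static void model_irq_ack(struct model_irq_source *reg, uint32_t mask)
{
	reg->bits &= ~mask; /* every bit set in mask is cleared */
}

/* Acknowledge everything: an interrupt raised after the read is lost. */
uint32_t handle_ack_all(struct model_irq_source *reg, uint32_t late)
{
	uint32_t pending = model_irq_read(reg);

	reg->bits |= late;              /* hardware raises a new interrupt */
	model_irq_ack(reg, UINT32_MAX); /* clears the late bit as well */
	return pending;
}

/* Acknowledge only what was read: the late interrupt stays pending. */
uint32_t handle_ack_pending(struct model_irq_source *reg, uint32_t late)
{
	uint32_t pending = model_irq_read(reg);

	reg->bits |= late;
	model_irq_ack(reg, pending);
	return pending;
}
```

In the second variant the late bit is still set when the handler returns,
so the next pass over the register sees it.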
11 years agoethoc: inline regs access
Thomas Chou [Wed, 7 Oct 2009 14:16:42 +0000 (14:16 +0000)]
ethoc: inline regs access

Signed-off-by: Thomas Chou <>
Signed-off-by: David S. Miller <>
11 years agoinotify: fix coalesce duplicate events into a single event in special case
Wei Yongjun [Wed, 14 Oct 2009 12:54:03 +0000 (20:54 +0800)]
inotify: fix coalesce duplicate events into a single event in special case

If we rename a dir entry, like this:

  rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2")
  rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ")

The duplicate events should be coalesced into a single event. But those two
events are not coalesced, due to a bad check in event_compare(): it cannot
match the two NULL inodes as the same event.

Signed-off-by: Wei Yongjun <>
Signed-off-by: Eric Paris <>
11 years agoinotify: deprecate the inotify kernel interface
Eric Paris [Mon, 29 Jun 2009 15:13:30 +0000 (11:13 -0400)]
inotify: deprecate the inotify kernel interface

In 2.6.33 there will be no users of the inotify interface.  Mark it for
removal as fsnotify is more generic and is easier to use.

Signed-off-by: Eric Paris <>
11 years agofsnotify: do not set group for a mark before it is on the i_list
Eric Paris [Fri, 11 Sep 2009 17:03:19 +0000 (13:03 -0400)]
fsnotify: do not set group for a mark before it is on the i_list

fsnotify_add_mark is supposed to add a mark to the g_list and i_list and to
set the group and inode for the mark.  fsnotify_destroy_mark_by_entry uses
the fact that ->group != NULL to know if this group should be destroyed or
if it's already been done.

But fsnotify_add_mark sets the group and inode before it actually adds the
mark to the i_list and g_list.  This can result in a race in inotify; it
requires 3 tasks.

task 1: sys_inotify_add_watch("file")
   ^--- returns wd = [a]
task 2: sys_inotify_add_watch("file")
   ^--- returns wd = [b]
        returns to userspace;
task 3: sys_inotify_rm_watch([a])
   ^--- gives us the pointer from task 1
task 1: fsnotify_add_mark()
   ^--- this is going to set the mark->group and mark->inode fields, but will
        return -EEXIST because of the race with [b].
task 3: destroys the mark
   ^--- since ->group != NULL we call back
        into inotify_freeing_mark(), after which [a] is no longer in the idr

task 1: since fsnotify_add_mark() failed we call:
inotify_remove_from_idr([a])     <------WHOOPS it's not in the idr, this could
                                        have been any entry added later!

The fix is to make sure we don't set mark->group until we are sure the mark is
on the inode and fsnotify_add_mark will return success.

Signed-off-by: Eric Paris <>
11 years agoInput: hp_sdc_rtc - fix test in hp_sdc_rtc_read_rt()
Roel Kluin [Sun, 18 Oct 2009 07:17:15 +0000 (00:17 -0700)]
Input: hp_sdc_rtc - fix test in hp_sdc_rtc_read_rt()

If left unsigned the hp_sdc_rtc_read_i8042timer() return value will not
be checked correctly.

Signed-off-by: Roel Kluin <>
Signed-off-by: Dmitry Torokhov <>
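
The signedness problem named above is easy to reproduce outside the driver.
A minimal sketch follows; model_read_timer() is a hypothetical stand-in for
a helper that returns a negative value on error, and is not the real
hp_sdc_rtc_read_i8042timer().

```c
#include <stdint.h>

/* Hypothetical helper: returns -1 on failure, a timer value otherwise. */
static int model_read_timer(int fail)
{
	return fail ? -1 : 42;
}

/* Buggy pattern: the result lands in an unsigned variable, so an error
 * turns into a large positive number and a "< 0" test can never fire. */
int64_t read_into_unsigned(int fail)
{
	uint32_t value = (uint32_t)model_read_timer(fail);

	return (int64_t)value;
}

/* Fixed pattern: a signed variable keeps the error negative. */
int64_t read_into_signed(int fail)
{
	int32_t value = model_read_timer(fail);

	return (int64_t)value;
}
```

Both functions widen to int64_t only so that the difference is visible to
the caller; the error shows up as 4294967295 in the first and -1 in the
second.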
11 years agoInput: atkbd - consolidate force release quirks for volume keys
Herton Ronaldo Krzesinski [Fri, 16 Oct 2009 23:13:59 +0000 (16:13 -0700)]
Input: atkbd - consolidate force release quirks for volume keys

Some machines share the same key list for the volume up/down release key
quirks, so use only one key list.

Signed-off-by: Herton Ronaldo Krzesinski <>
Signed-off-by: Dmitry Torokhov <>
11 years agoInput: logips2pp - model 73 is actually TrackMan FX
Dmitry Torokhov [Thu, 15 Oct 2009 16:46:48 +0000 (09:46 -0700)]
Input: logips2pp - model 73 is actually TrackMan FX

Reported-and-tested-by: Harald Dunkel <>
Signed-off-by: Dmitry Torokhov <>
11 years agoInput: i8042 - add Sony Vaio VGN-FZ240E to the nomux list
Dmitry Torokhov [Thu, 15 Oct 2009 16:46:48 +0000 (09:46 -0700)]
Input: i8042 - add Sony Vaio VGN-FZ240E to the nomux list

On this model, when KBD is in active multiplexing mode, acknowledgements
to reset and get ID commands issued on KBD port sometimes are delivered
to AUX3 port (touchpad) which messes up device detection. Legacy KBC
mode works fine and since there are no external PS/2 ports on this laptop
and no support for docking station we can safely disable active MUX mode.

Tested-by: Carlos R. Mafra <>
Signed-off-by: Dmitry Torokhov <>
11 years agovmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
Randy Dunlap [Sat, 17 Oct 2009 00:54:34 +0000 (17:54 -0700)]
vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n

vmxnet3 was using dprintk() for debugging output.  This was
defined in <linux/dst.h> and was the only thing that was
used from that header file.  This caused compile errors
when CONFIG_BLOCK was not enabled due to bio* and BIO*
uses in the header file, so change this driver to use
dev_dbg() for debugging output.

include/linux/dst.h:520: error: dereferencing pointer to incomplete type
include/linux/dst.h:520: error: 'BIO_POOL_BITS' undeclared (first use in this function)
include/linux/dst.h:521: error: dereferencing pointer to incomplete type
include/linux/dst.h:522: error: dereferencing pointer to incomplete type
include/linux/dst.h:525: error: dereferencing pointer to incomplete type
make[4]: *** [drivers/net/vmxnet3/vmxnet3_drv.o] Error 1

Signed-off-by: Randy Dunlap <>
Signed-off-by: Bhavesh Davda <>
Signed-off-by: David S. Miller <>
11 years agodm snapshot: allow chunk size to be less than page size
Mikulas Patocka [Fri, 16 Oct 2009 22:18:22 +0000 (23:18 +0100)]
dm snapshot: allow chunk size to be less than page size

Allow the snapshot chunk size to be smaller than the page size.
The code is now capable of handling this due to some previous
fixes and enhancements.

As the page size varies between computers, prior to this patch,
the chunk size of a snapshot dictated which machines could read it:
Snapshots created on one machine might not be readable on another.

Signed-off-by: Mikulas Patocka <>
Reviewed-by: Mike Snitzer <>
Reviewed-by: Jonathan Brassow <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm snapshot: use unsigned integer chunk size
Mikulas Patocka [Fri, 16 Oct 2009 22:18:17 +0000 (23:18 +0100)]
dm snapshot: use unsigned integer chunk size

Use unsigned integer chunk size.

The maximum chunk size is 512kB and there will never be a need for a 4GB
chunk size, so the number can be 32-bit. This fixes a compiler failure on
32-bit systems with large block devices.

Signed-off-by: Mikulas Patocka <>
Signed-off-by: Mike Snitzer <>
Reviewed-by: Jonathan Brassow <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm snapshot: lock snapshot while supplying status
Mikulas Patocka [Fri, 16 Oct 2009 22:18:16 +0000 (23:18 +0100)]
dm snapshot: lock snapshot while supplying status

This patch locks the snapshot when returning status.  It fixes a race
in which it could return an invalid number of free chunks if someone
was simultaneously modifying it.

Signed-off-by: Mikulas Patocka <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm exception store: fix failed set_chunk_size error path
Mikulas Patocka [Fri, 16 Oct 2009 22:18:16 +0000 (23:18 +0100)]
dm exception store: fix failed set_chunk_size error path

Properly close the device if failing because of an invalid chunk size.

Signed-off-by: Mikulas Patocka <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm snapshot: require non zero chunk size by end of ctr
Mikulas Patocka [Fri, 16 Oct 2009 22:18:16 +0000 (23:18 +0100)]
dm snapshot: require non zero chunk size by end of ctr

If we are creating a snapshot with a memory-stored exception store, fail if
the user didn't specify a chunk size. A zero chunk size would probably crash
a lot of places in the rest of the snapshot code.

Signed-off-by: Mikulas Patocka <>
Reviewed-by: Jonathan Brassow <>
Reviewed-by: Mike Snitzer <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm: dec_pending needs locking to save error value
Kiyoshi Ueda [Fri, 16 Oct 2009 22:18:15 +0000 (23:18 +0100)]
dm: dec_pending needs locking to save error value

Multiple instances of dec_pending() can run concurrently so a lock is
needed when it saves the first error code.

I have never experienced an actual problem without locking and just found
this during code inspection while implementing the barrier support
patch for request-based dm.

This patch adds the locking.
I've done compile, boot and basic I/O testing.

Signed-off-by: Kiyoshi Ueda <>
Signed-off-by: Jun'ichi Nomura <>
Signed-off-by: Alasdair G Kergon <>
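
The idea of taking a lock around "save the first error code" can be
sketched in userspace. This is not the dm code: struct model_io and both
functions are hypothetical, a pthread mutex stands in for whatever kernel
lock the patch uses, and the keep-the-first-error policy is an assumption
taken from the wording of the message.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical I/O descriptor shared by concurrent completions. */
struct model_io {
	pthread_mutex_t lock;
	int error; /* 0 while no completion has reported an error */
};

void model_io_init(struct model_io *io)
{
	pthread_mutex_init(&io->lock, NULL);
	io->error = 0;
}

/* Called once per completed sub-request; may run concurrently. */
void model_dec_pending(struct model_io *io, int error)
{
	if (!error)
		return;

	/* Test and store under one lock so that two failing completions
	 * cannot both see error == 0 and overwrite each other. */
	pthread_mutex_lock(&io->lock);
	if (!io->error)
		io->error = error;
	pthread_mutex_unlock(&io->lock);
}
```

Without the lock, the test and the store are two separate steps, and a
second failing completion could slip in between them.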
11 years agodm: add missing del_gendisk to alloc_dev error path
Zdenek Kabelac [Fri, 16 Oct 2009 22:18:15 +0000 (23:18 +0100)]
dm: add missing del_gendisk to alloc_dev error path

Add missing del_gendisk() to the error path when creation of the workqueue
fails. Otherwise there is a resource leak and the following warning is shown:

WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x160()
sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0'

Signed-off-by: Zdenek Kabelac <>
Reviewed-by: Jonathan Brassow <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm log: userspace fix incorrect luid cast in userspace_ctr
Andrew Morton [Fri, 16 Oct 2009 22:18:15 +0000 (23:18 +0100)]
dm log: userspace fix incorrect luid cast in userspace_ctr


drivers/md/dm-log-userspace-base.c: In function `userspace_ctr':
drivers/md/dm-log-userspace-base.c:159: warning: cast from pointer to integer of different size

Cc: Jonathan Brassow <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm snapshot: free exception store on init failure
Jonathan Brassow [Fri, 16 Oct 2009 22:18:14 +0000 (23:18 +0100)]
dm snapshot: free exception store on init failure

While initializing the snapshot module, if we fail to register
the snapshot target then we must back-out the exception store
module initialization.

Signed-off-by: Jonathan Brassow <>
Reviewed-by: Mikulas Patocka <>
Reviewed-by: Mike Snitzer <>
Signed-off-by: Alasdair G Kergon <>
11 years agodm snapshot: sort by chunk size to fix race
Mikulas Patocka [Fri, 16 Oct 2009 22:18:14 +0000 (23:18 +0100)]
dm snapshot: sort by chunk size to fix race

Avoid a race causing corruption when snapshots of the same origin have
different chunk sizes by sorting the internal list of snapshots by chunk
size, largest first.

For example, let's have two snapshots with different chunk sizes. The
first snapshot (1) has small chunk size and the second snapshot (2) has
large chunk size.  Let's have chunks A, B, C in these snapshots:
snapshot1: ====A====   ====B====
snapshot2: ==========C==========

(Chunk size is a power of 2. Chunks are aligned.)

A write to the origin at a position within A and C comes along. It
triggers reallocation of A, then reallocation of C and links them
together using A as the 'primary' exception.

Then another write to the origin comes along at a position within B and
C.  It creates pending exception for B.  C already has a reallocation in
progress and it already has a primary exception (A), so nothing is done
to it: B and C are not linked.

If the reallocation of B finishes before the reallocation of C, then because
there is no link with the pending exception for C it does not know to
wait for it, so the second write is dispatched to the origin and causes
data corruption in chunk C in snapshot2.

To avoid this situation, we maintain snapshots sorted in descending
order of chunk size.  This leads to a guaranteed ordering on the links
between the pending exceptions and avoids the problem explained above -
both A and B now get linked to C.

Signed-off-by: Mikulas Patocka <>
Signed-off-by: Alasdair G Kergon <>
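
Keeping a list sorted in descending order of chunk size, as described
above, comes down to a sorted insert. A minimal sketch with a hypothetical
singly linked list follows; the dm code uses its own list type and
snapshot structure, which are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical snapshot record; only the sort key matters here. */
struct model_snapshot {
	unsigned chunk_size;
	struct model_snapshot *next;
};

/* Insert s so that the list stays sorted by chunk size, largest first.
 * Entries with an equal chunk size keep their insertion order. */
void model_insert_sorted(struct model_snapshot **head,
			 struct model_snapshot *s)
{
	struct model_snapshot **link = head;

	/* Skip every entry that must stay in front of s. */
	while (*link && (*link)->chunk_size >= s->chunk_size)
		link = &(*link)->next;

	s->next = *link;
	*link = s;
}
```

Walking such a list visits the snapshot with the largest chunks first,
which is the ordering the message relies on to link A and B to C.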
11 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Fri, 16 Oct 2009 17:13:58 +0000 (10:13 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/bp/bp

* 'for-linus' of git://
  amd64_edac: fix DRAM base and limit extraction masks, v2

11 years agoamd64_edac: fix DRAM base and limit extraction masks, v2
Borislav Petkov [Mon, 12 Oct 2009 15:23:03 +0000 (17:23 +0200)]
amd64_edac: fix DRAM base and limit extraction masks, v2

This is a proper fix as a follow-up to 66216a7 and 916d11b.

Signed-off-by: Borislav Petkov <>
11 years agoMerge branch 'upstream-linus' of git://
Linus Torvalds [Fri, 16 Oct 2009 16:25:11 +0000 (09:25 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://
  sata_mv: Prevent PIO commands to be defered too long if traffic in progress.
  pata_sc1200: Fix crash on boot
  libata: fix internal command failure handling
  libata: fix PMP initialization
  sata_nv: make sure link is brough up online when skipping hardreset
  ahci / atiixp / pci quirks: rename AMD SB900 into Hudson-2
  ahci: Add the AHCI controller Linux Device ID for NVIDIA chipsets.
  pata_via: extend the rev_max for VT6330

11 years agoKVM: Prevent kvm_init from corrupting debugfs structures
Darrick J. Wong [Wed, 14 Oct 2009 23:21:00 +0000 (16:21 -0700)]
KVM: Prevent kvm_init from corrupting debugfs structures

I'm seeing an oops condition when kvm-intel and kvm-amd are modprobe'd
during boot (say on an Intel system) and then rmmod'd:

   # modprobe kvm-intel
     kvm_arch_init()  <-- stores debugfs dentries internally
     (success, etc)

   # modprobe kvm-amd
     kvm_init_debug() <-- second initialization clobbers kvm's
                          internal pointers to dentries
     kvm_exit_debug() <-- and frees them

   # rmmod kvm-intel
     kvm_exit_debug() <-- double free of debugfs files!


If execution gets to the end of kvm_init(), then the calling module has been
established as the kvm provider.  Move the debugfs initialization to the end of
the function, and remove the now-unnecessary call to kvm_exit_debug() from the
error path.  That way we avoid trampling on the debugfs entries and freeing
them twice.

Signed-off-by: Darrick J. Wong <>
Signed-off-by: Marcelo Tosatti <>
11 years agoKVM: MMU: fix pointer cast
Frederik Deweerdt [Fri, 9 Oct 2009 11:42:56 +0000 (11:42 +0000)]
KVM: MMU: fix pointer cast

On a 32-bit compile, commit 3da0dd433dc399a8c0124d0614d82a09b6a49bce
introduced the following warnings:

arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different size
arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different size

The following patch uses 'unsigned long' instead of u64 to match the
pointer size on both arches.

Signed-off-by: Frederik Deweerdt <>
Signed-off-by: Marcelo Tosatti <>
11 years agoKVM: use proper hrtimer function to retrieve expiration time
Marcelo Tosatti [Thu, 8 Oct 2009 13:55:03 +0000 (10:55 -0300)]
KVM: use proper hrtimer function to retrieve expiration time

hrtimer->base can be temporarily NULL due to racing hrtimer_start.
See switch_hrtimer_base/lock_hrtimer_base.

Use hrtimer_get_remaining which is robust against it.

Signed-off-by: Marcelo Tosatti <>
Signed-off-by: Avi Kivity <>
11 years agosata_mv: Prevent PIO commands to be defered too long if traffic in progress.
Gwendal Grignou [Mon, 12 Oct 2009 22:44:00 +0000 (15:44 -0700)]
sata_mv: Prevent PIO commands to be defered too long if traffic in progress.

Use excl_link when non-NCQ commands are deferred, to be sure they are processed
as soon as outstanding commands are completed. It prevents some commands from
being deferred indefinitely when using a port multiplier.

Signed-off-by: Gwendal Grignou <>
Signed-off-by: Jeff Garzik <>
11 years agopata_sc1200: Fix crash on boot
Alan Cox [Tue, 6 Oct 2009 15:07:51 +0000 (16:07 +0100)]
pata_sc1200: Fix crash on boot

The SC1200 needs a NULL terminator or it may cause a crash on boot.

Bug #14227

Also correct a bogus comment as the driver had serializing added so can run
dual port.

Signed-off-by: Alan Cox <>
Signed-off-by: Jeff Garzik <>
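
The need for a NULL terminator is typical of tables that are walked until a
sentinel is found. The sketch below shows the general pattern only; the
commit message does not say which table lacked its terminator, and the
names model_port_info and model_count_ports are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-port description. */
struct model_port_info {
	const char *name;
};

/* Walk a table of pointers until the NULL sentinel. If the sentinel is
 * missing, this loop reads past the end of the array, which is how a
 * missing terminator can turn into a crash. */
size_t model_count_ports(const struct model_port_info *const *table)
{
	size_t n = 0;

	while (table[n] != NULL)
		n++;
	return n;
}
```

A caller would pass something like `{ &info, NULL }`; dropping the final
NULL leaves the walk with nothing to stop on.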
11 years agolibata: fix internal command failure handling
Tejun Heo [Fri, 16 Oct 2009 04:00:51 +0000 (13:00 +0900)]
libata: fix internal command failure handling

When an internal command fails, it should be failed directly without
invoking EH.  In the original implementation, this was accomplished by
letting internal commands bypass failure handling in ata_qc_complete().
However, later changes added post-successful-completion handling to
that code path and the success path is no longer adequate as the internal
command failure path.  One of the visible problems is that internal
command failure due to timeout or other freeze conditions would
spuriously trigger WARN_ON_ONCE() in the success path.

This patch updates failure path such that internal command failure
handling is contained there.

Signed-off-by: Tejun Heo <>
Signed-off-by: Jeff Garzik <>
11 years agolibata: fix PMP initialization
Tejun Heo [Thu, 15 Oct 2009 14:37:32 +0000 (23:37 +0900)]
libata: fix PMP initialization

Commit 842faa6c1a1d6faddf3377948e5cf214812c6c90 fixed error handling
during attach by not committing detected device class to dev->class
while attaching a new device.  However, this change missed the PMP
class check in the configuration loop causing a new PMP device to go
through ata_dev_configure() as if it were an ATA or ATAPI device.

As a PMP device doesn't have regular IDENTIFY data, this makes
ata_dev_configure() try to configure a PMP device using invalid
data.  For the most part, it wasn't too harmful and went unnoticed, but
it ends up clearing dev->flags, which may have ATA_DFLAG_AN set by
sata_pmp_attach().  This means that SATA_PMP_FEAT_NOTIFY ends up being
disabled on PMPs, and on PMPs which honor the flag this breaks hotplug.

This problem was discovered and reported by Ethan Hsiao.

Signed-off-by: Tejun Heo <>
Reported-by: Ethan Hsiao <>
Signed-off-by: Jeff Garzik <>
11 years agosata_nv: make sure link is brough up online when skipping hardreset
Tejun Heo [Wed, 14 Oct 2009 02:18:28 +0000 (11:18 +0900)]
sata_nv: make sure link is brough up online when skipping hardreset

prereset doesn't bring the link online if hardreset is about to happen, and
nv_hardreset() may skip the reset if conditions are not right, so softreset
may be entered with a non-working link if the system firmware didn't
bring it up before entering OS code, which can happen during resume.
This patch makes nv_hardreset() bring up the link if it's skipping
hardreset.

This bug was reported by in the following bug entry.

Signed-off-by: Tejun Heo <>
Signed-off-by: Jeff Garzik <>
11 years agoahci / atiixp / pci quirks: rename AMD SB900 into Hudson-2
Shane Huang [Tue, 13 Oct 2009 03:14:00 +0000 (11:14 +0800)]
ahci / atiixp / pci quirks: rename AMD SB900 into Hudson-2

This patch renames the code name SB900 to Hudson-2.

Signed-off-by: Shane Huang <>
Signed-off-by: Jeff Garzik <>
11 years agoahci: Add the AHCI controller Linux Device ID for NVIDIA chipsets.
peer chen [Thu, 15 Oct 2009 08:34:56 +0000 (16:34 +0800)]
ahci: Add the AHCI controller Linux Device ID for NVIDIA chipsets.

Add the generic device ID for NVIDIA AHCI controller.

Signed-off-by: Peer Chen <>
Signed-off-by: Jeff Garzik <>
11 years agopata_via: extend the rev_max for VT6330
[Fri, 16 Oct 2009 07:45:23 +0000 (15:45 +0800)]
pata_via: extend the rev_max for VT6330

Fix the VT6330 issue, which occurs because the rev_max of VT6330 exceeds 0x2f.
The VT6415 and VT6330 share the same device ID.

Signed-off-by: Joseph Chan <>
Signed-off-by: Jeff Garzik <>
11 years agosh: Kill off stray HAVE_FTRACE_SYSCALLS reference.
Paul Mundt [Fri, 16 Oct 2009 09:14:19 +0000 (18:14 +0900)]
sh: Kill off stray HAVE_FTRACE_SYSCALLS reference.

This seems to have popped back in via some merge damage. Kill it off.

Signed-off-by: Paul Mundt <>
11 years agosh: Remove BKL from landisk gio.
Thomas Gleixner [Fri, 16 Oct 2009 05:42:33 +0000 (14:42 +0900)]
sh: Remove BKL from landisk gio.

The open function got the BKL via the big push down. Replace it by
preempt_enable/disable as this is sufficient for an UP machine.

The ioctl can be unlocked because there is no functionality which
requires serialization. The usage by multiple callers is broken with
and without the BKL due to the local static variable addr.

Signed-off-by: Thomas Gleixner <>
Signed-off-by: Paul Mundt <>
11 years agosh: disabled cache handling fix.
Magnus Damm [Fri, 16 Oct 2009 05:38:48 +0000 (14:38 +0900)]
sh: disabled cache handling fix.

Add code to handle the cache disabled case. Fixes breakage introduced by
37443ef3f0406e855e169c87ae3f4ffb4b6ff635 ("sh: Migrate SH-4 cacheflush
ops to function pointers."). Without this patch configuring caches off
with CONFIG_CACHE_OFF=y makes kfr2r09 and migo-r lock up in fbdev
deferred io or early user space.

Signed-off-by: Magnus Damm <>
Signed-off-by: Paul Mundt <>
11 years agosh: Fix up single page flushing to use PAGE_SIZE.
Valentin Sitdikov [Fri, 16 Oct 2009 05:15:38 +0000 (14:15 +0900)]
sh: Fix up single page flushing to use PAGE_SIZE.

Presently the SH-4 cache flushing code uses flush_cache_4096() for most
of the real flushing work, which breaks down to a fixed 4096 unroll and
increment. Not only is this sub-optimal for larger page sizes, it has also
uncovered a bug in sh4_flush_dcache_page() when large page sizes are used
and we have no cache aliases, resulting in only a part of the page's
D-cache lines being written back.

Signed-off-by: Valentin Sitdikov <>
Signed-off-by: Paul Mundt <>
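
The partial write-back described above is a matter of arithmetic: a flush
that always covers 4096 bytes touches only a fraction of the cache lines of
a larger page. The sketch below counts lines for a given span; the function
name is hypothetical, and the 32-byte line size and 64kB page size used in
the usage note are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Count the cache lines touched when flushing `span` bytes in steps of
 * `line_size` bytes, starting at a line-aligned address. */
size_t model_lines_flushed(size_t span, size_t line_size)
{
	size_t offset;
	size_t lines = 0;

	for (offset = 0; offset < span; offset += line_size)
		lines++;
	return lines;
}
```

With 32-byte lines, a fixed 4096-byte flush covers 128 lines, while a 64kB
page spans 2048 lines, so sixteen such flushes (or one flush sized by
PAGE_SIZE) are needed to write the whole page back.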
11 years agoLinux 2.6.32-rc5 v2.6.32-rc5
Linus Torvalds [Fri, 16 Oct 2009 00:41:50 +0000 (17:41 -0700)]
Linux 2.6.32-rc5

11 years agoMerge branch 'docs-next' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:21:42 +0000 (15:21 -0700)]
Merge branch 'docs-next' of git://

* 'docs-next' of git://
  Update flex_arrays.txt

11 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:21:20 +0000 (15:21 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/teigland/dlm

* 'for-linus' of git://
  dlm: fix socket fd translation
  dlm: fix lowcomms_connect_node for sctp

11 years agoMerge branch 'x86-fixes-for-linus' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:20:17 +0000 (15:20 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://
  Revert "x86: linker script syntax nits"
  x86, perf_event: Rename 'performance counter interrupt'

11 years agoKEYS: get_instantiation_keyring() should inc the keyring refcount in all cases
David Howells [Thu, 15 Oct 2009 09:14:35 +0000 (10:14 +0100)]
KEYS: get_instantiation_keyring() should inc the keyring refcount in all cases

The destination keyring specified to request_key() and co. is made available to
the process that instantiates the key (the slave process started by
/sbin/request-key typically).  This is passed in the request_key_auth struct as
the dest_keyring member.

keyctl_instantiate_key() and keyctl_negate_key() call get_instantiation_keyring()
to get the keyring to attach the newly constructed key to at the end of
instantiation.  This may be given a specific keyring into which a link will be
made later, or it may be asked to find the keyring passed to request_key().  In
the former case, it returns a keyring with the refcount incremented by
lookup_user_key(); in the latter case, it returns the keyring from the
request_key_auth struct - and does _not_ increment the refcount.

The latter case will eventually result in an oops when the keyring prematurely
runs out of references and gets destroyed.  The effect may take some time to
show up as the key is destroyed lazily.

To fix this, the keyring returned by get_instantiation_keyring() must always
have its refcount incremented, no matter where it comes from.

This can be tested by setting /etc/request-key.conf to:

#OP     TYPE    DESCRIPTION     CALLOUT INFO    PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create  *       test:*          *               |/bin/false %u %g %d %{user:_display}
negate  *       *               *               /bin/keyctl negate %k 10 @u

and then doing:

keyctl add user _display aaaaaaaa @u
while keyctl request2 user test:x test:x @u &&
      keyctl list @u;
do
        keyctl request2 user test:x test:x @u;
        sleep 31;
        keyctl list @u;
done

which will oops eventually.  Changing the negate line to have @u rather than
%S at the end is important as that forces the latter case by passing a special
keyring ID rather than an actual keyring ID.

Reported-by: Alexander Zangerl <>
Signed-off-by: David Howells <>
Tested-by: Alexander Zangerl <>
Signed-off-by: Linus Torvalds <>
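
The reference-count imbalance described above can be shown with a toy
model. This is a sketch, not the kernel's key code: the structure, the
helper names and the take_ref_on_fallback switch are hypothetical, and a
plain integer stands in for the real refcount.

```c
#include <stddef.h>

/* Hypothetical keyring with a plain integer reference count. */
struct model_keyring {
	int refcount;
};

static struct model_keyring *model_get(struct model_keyring *k)
{
	k->refcount++;
	return k;
}

/* The caller always drops one reference when it is done. */
void model_put(struct model_keyring *k)
{
	k->refcount--;
}

/*
 * explicit_ring models a specific keyring named by the caller, which is
 * returned with a reference taken. When it is NULL, fall back to the
 * destination keyring recorded at request time. take_ref_on_fallback == 0
 * models the old behaviour, 1 the fixed one.
 */
struct model_keyring *
model_get_instantiation_keyring(struct model_keyring *explicit_ring,
				struct model_keyring *auth_dest,
				int take_ref_on_fallback)
{
	if (explicit_ring)
		return model_get(explicit_ring);
	if (!auth_dest)
		return NULL;
	return take_ref_on_fallback ? model_get(auth_dest) : auth_dest;
}
```

Because the caller drops a reference in every case, the old fallback path
leaves the destination keyring one reference short each time it is used.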
11 years agoMerge branch 'merge' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:15:03 +0000 (15:15 -0700)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

* 'merge' of git://
  powerpc/pci: Fix MODPOST warning
  powerpc/oprofile: Add ppc750 CL as supported by oprofile
  powerpc: warning: allocated section `.data_nosave' not in segment
  powerpc/kgdb: Fix build failure caused by "kgdb.c: unused variable 'acc'"
  powerpc: Fix hypervisor TLB batching
  powerpc/mm: Fix hang accessing top of vmalloc space
  powerpc: Fix memory leak in axon_msi.c
  powerpc/pmac: Fix issues with sleep on some powerbooks
  powerpc64/ftrace: use PACA to retrieve TOC in mod_return_to_handler
  powerpc/ftrace: show real return addresses in modules

11 years agoMerge branch 'release' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:10:27 +0000 (15:10 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6

* 'release' of git://
  ACPI button: don't try to use a non-existent lid device
  ACPI: video: Loosen strictness of video bus detection code
  eeepc-laptop: Prevent a panic when disabling RT2860 wireless when associated
  eeepc-laptop: Properly annote eeepc_enable_camera().
  ACPI / PCI: Fix NULL pointer dereference in acpi_get_pci_dev() (rev. 2)
  fujitsu-laptop: address missed led-class ifdef fixup
  ACPI: Kconfig, fix proc aggregator text
  ACPI: add AC/DC notifier

11 years agoMerge branch 'omap-fixes-for-linus' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:09:55 +0000 (15:09 -0700)]
Merge branch 'omap-fixes-for-linus' of git://git./linux/kernel/git/tmlind/linux-omap-2.6

* 'omap-fixes-for-linus' of git://
  OMAP2xxx clock: set up clockdomain pointer in struct clk
  OMAP: Fix race condition with autodeps
  omap: McBSP: Fix incorrect receiver stop in omap_mcbsp_stop
  omap: Initialization of SDRC params on Zoom2
  omap: RX-51: Drop I2C-1 speed to 2200
  omap: SDMA: Fixing bug in omap_dma_set_global_params()
  omap: CONFIG_ISP1301_OMAP redefined in Beagle defconfig

11 years agoMerge branch 'master' of git://
Linus Torvalds [Thu, 15 Oct 2009 22:06:37 +0000 (15:06 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/mason/btrfs-unstable

* 'master' of git://
  Btrfs: always pin metadata in discard mode
  Btrfs: enable discard support
  Btrfs: add -o discard option
  Btrfs: properly wait log writers during log sync
  Btrfs: fix possible ENOSPC problems with truncate
  Btrfs: fix btrfs acl #ifdef checks
  Btrfs: streamline tree-log btree block writeout
  Btrfs: avoid tree log commit when there are no changes
  Btrfs: only write one super copy during fsync

11 years agoMerge git://
Linus Torvalds [Thu, 15 Oct 2009 22:06:02 +0000 (15:06 -0700)]
Merge git://git./linux/kernel/git/gregkh/tty-2.6

* git://
  tty: fix vt_compat_ioctl

11 years agoMerge git://
Linus Torvalds [Thu, 15 Oct 2009 22:05:46 +0000 (15:05 -0700)]
Merge git://git./linux/kernel/git/gregkh/driver-core-2.6

* git://
  sysfs: Allow sysfs_notify_dirent to be called from interrupt context.
  sysfs: Allow sysfs_move_dir(..., NULL) again.