we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.
For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checksum calculation in nf_nat_snmp_basic.
Based on patches by Clark Wang <wtweeker@163.com> and
Stephen Hemminger <shemminger@vyatta.com>.
https://bugzilla.kernel.org/show_bug.cgi?id=17622
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as rcu_read_unlock() is called, there is no guarantee current
thread can safely derefence t pointer, rcu protected.
Fix is to copy t->alloc_size in a temporary variable.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_me_harder can't create the route cache when the outdev is the same
with the indev for the skbs whichout a valid protocol set.
__mkroute_input functions has this check:
1998 if (skb->protocol != htons(ETH_P_IP)) {
1999 /* Not IP (i.e. ARP). Do not create route, if it is
2000 * invalid for proxy arp. DNAT routes are always valid.
2001 *
2002 * Proxy arp feature have been extended to allow, ARP
2003 * replies back to the same interface, to support
2004 * Private VLAN switch technologies. See arp.c.
2005 */
2006 if (out_dev == in_dev &&
2007 IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
2008 err = -EINVAL;
2009 goto cleanup;
2010 }
2011 }
This patch gives the new skb a valid protocol to bypass this check. In order
to make ipt_REJECT work with bridges, you also need to enable ip_forward.
This patch also fixes a regression. When we used skb_copy_expand(), we
didn't have this issue stated above, as the protocol was properly set.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I initially noticed this because of the compiler warning below, but it
does seem to be a valid concern in the case where ct_sip_get_header()
returns 0 in the first iteration of the while loop.
net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp':
net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function
Signed-off-by: Simon Horman <horms@verge.net.au>
[Patrick: changed NF_DROP to NF_ACCEPT]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Prevent no-handler signal syscall restart recursion.
sparc: Don't mask signal when we can't setup signal frame.
sparc64: Fix race in signal instruction flushing.
sparc64: Support RAW perf events.
Make sigreturn zero regs->trap, make do_signal() do the same on all
paths. As it is, signal interrupting e.g. read() from fd 512 (==
ERESTARTSYS) with another signal getting unblocked when the first
handler finishes will lead to restart one insn earlier than it ought
to. Same for multiple signals with in-kernel handlers interrupting
that sucker at the same time. Same for multiple signals of any kind
interrupting that sucker on 64bit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
bdi: Fix warnings in __mark_inode_dirty for /dev/zero and friends
char: Mark /dev/zero and /dev/kmem as not capable of writeback
bdi: Initialize noop_backing_dev_info properly
cfq-iosched: fix a kernel OOPs when usb key is inserted
block: fix blk_rq_map_kern bio direction flag
cciss: freeing uninitialized data on error path
The log eventfd signalling got put in dead code.
We didn't notice because qemu currently does polling
instead of eventfd select.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Make sure we stay within the cache boundaries when updating the
register cache.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
On the HT-Omega Claro halo card, the ADC data must be captured from the
second I2S input. Using the default first input, which isn't connected
to anything, would result in silence.
Signed-off-by: Erik J. Staab <ejs@insightbb.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Inodes of devices such as /dev/zero can get dirty for example via
utime(2) syscall or due to atime update. Backing device of such inodes
(zero_bdi, etc.) is however unable to handle dirty inodes and thus
__mark_inode_dirty complains. In fact, inode should be rather dirtied
against backing device of the filesystem holding it. This is generally a
good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes
in these pseudofilesystems are referenced from ordinary filesystem
inodes and carry mapping with real data of the device. Thus for these
inodes we have to use inode->i_mapping->backing_dev_info as we did so
far. We distinguish these filesystems by checking whether sb->s_bdi
points to a non-trivial backing device or not.
Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /.
There's a device inode A described by a path "/dev/sdb" on this
filesystem. This inode will be dirtied against backing device "8:0"
after this patch. bdev filesystem contains block device inode B coupled
with our inode A. When someone modifies a page of /dev/sdb, it's B that
gets dirtied and the dirtying happens against the backing device "8:16".
Thus both inodes get filed to a correct bdi list.
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
These devices don't do any writeback but their device inodes still can get
dirty so mark bdi appropriately so that bdi code does the right thing and files
inodes to lists of bdi carrying the device inodes.
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Properly initialize this backing dev info so that writeback code does not
barf when getting to it e.g. via sb->s_bdi.
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Explicitly clear the "in-syscall" bit when we have no signal
handler and back up the program counters to back up the system
call.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't invoke the signal handler tracehook in that situation
either.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes sense for a BO to move after a process has requested
exclusive RW access on it (e.g. because the BO used to be located in
unmappable VRAM and we intercepted the CPU access from the fault
handler).
If we let the ghost object inherit cpu_writers from the original
object, ttm_bo_release_list() will raise a kernel BUG when the ghost
object is destroyed. This can be reproduced with the nouveau driver on
nv5x.
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The lock structs are currently protected by the BKL, but are accessed by
code in fs/locks.c and misc file system and DLM code. These stubs will
allow all users to switch to the new interface before the implementation
is changed to a spinlock.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the i2c bus receives an interrupt with both BB (bus busy) and
ARDY (register access ready) statuses set during the tranfer of the last message
the bus was put to idle while still busy.
This caused bus to timeout.
Signed-off-by: Mathias Nyman <mathias.nyman@nokia.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Special care should be taken when slow path is hit in ip_fragment() :
When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.
Many thanks to Nick Bowler, who provided a very clean bug report and
test program.
Thanks to Jarek for reviewing my first patch and providing a V2
While Nick bisection pointed to commit 2b85a34e91 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)
A side effect is to extend work done in commit b2722b1c3a
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9eecabcb9a ("intel-iommu: Abort
IOMMU setup for igfx if BIOS gave no shadow GTT space") uses a bunch of
magic numbers. Provide #defines for those to make it look slightly saner.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix nohz balance kick
sched: Fix user time incorrectly accounted as system time on 32-bit
skb->truesize is set in core network.
Dont change it unless dealing with fragments.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->truesize is set in core network.
Dont change it unless dealing with fragments.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: select CRYPTO
ceph: check mapping to determine if FILE_CACHE cap is used
ceph: only send one flushsnap per cap_snap per mds session
ceph: fix cap_snap and realm split
ceph: stop sending FLUSHSNAPs when we hit a dirty capsnap
ceph: correctly set 'follows' in flushsnap messages
ceph: fix dn offset during readdir_prepopulate
ceph: fix file offset wrapping at 4GB on 32-bit archs
ceph: fix reconnect encoding for old servers
ceph: fix pagelist kunmap tail
ceph: fix null pointer deref on anon root dentry release
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
drm/i915: Hold a reference to the object whilst unbinding the eviction list
drm/i915,agp/intel: Add second set of PCI-IDs for B43
drm/i915: Fix Sandybridge fence registers
drm/i915/crt: Downgrade warnings for hotplug failures
drm/i915: Ensure that the crtcinfo is populated during mode_fixup()
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: update comments to reflect LHCALL_LOAD_GDT_ENTRY.
virtio: console: Prevent userspace from submitting NULL buffers
virtio: console: Fix poll blocking even though there is data to read
earlyprintk can take and I/O port, so we need to handle this case in
the setup code too, otherwise 0x3f8 will be treated as a baud rate.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C7B05A6.4010801@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Torsten reported that there is garbage output,
after commit 8fee13a48e (x86,
setup: enable early console output from the decompressor)
It turns out we missed the offset for that case.
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C7B0578.8090807@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
There's a situation where the nohz balancer will try to wake itself:
cpu-x is idle which is also ilb_cpu
got a scheduler tick during idle
and the nohz_kick_needed() in trigger_load_balance() checks for
rq_x->nr_running which might not be zero (because of someone waking a
task on this rq etc) and this leads to the situation of the cpu-x
sending a kick to itself.
And this can cause a lockup.
Avoid this by not marking ourself eligible for kicking.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1284400941.2684.19.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike reported a kernel crash when a usb key hotplug is performed while all
kernel thrads are not in a root cgroup and are running in one of the child
cgroups of blkio controller.
BUG: unable to handle kernel NULL pointer dereference at 0000002c
IP: [<c11c7b08>] cfq_get_queue+0x232/0x412
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:1.0/host3/scsi_host/host3/uevent
[..]
Pid: 30039, comm: scsi_scan_3 Not tainted 2.6.35.2-fg.roam #1 Volvi2 /Aspire 4315
EIP: 0060:[<c11c7b08>] EFLAGS: 00010086 CPU: 0
EIP is at cfq_get_queue+0x232/0x412
EAX: f705f9c0 EBX: e977abac ECX: 00000000 EDX: 00000000
ESI: f00da400 EDI: f00da4ec EBP: e977a800 ESP: dff8fd00
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process scsi_scan_3 (pid: 30039, ti=dff8e000 task=f6b6c9a0 task.ti=dff8e000)
Stack:
00000000 00000000 00000001 01ff0000 f00da508 00000000 f00da524 f00da540
<0> e7994940 dd631750 f705f9c0 e977a820 e977ac44 f00da4d0 00000001 f6b6c9a0
<0> 00000010 00008010 0000000b 00000000 00000001 e977a800 dd76fac0 00000246
Call Trace:
[<c11c7f10>] ? cfq_set_request+0x228/0x34c
[<c11c7ce8>] ? cfq_set_request+0x0/0x34c
[<c11bb3b9>] ? elv_set_request+0xf/0x1c
[<c11bdd51>] ? get_request+0x1ad/0x22f
[<c11bddf2>] ? get_request_wait+0x1f/0x11a
[<c11d013b>] ? kvasprintf+0x33/0x3b
[<c127b537>] ? scsi_execute+0x1d/0x103
[<c127b675>] ? scsi_execute_req+0x58/0x83
[<c127c391>] ? scsi_probe_and_add_lun+0x188/0x7c2
[<c12718c6>] ? attribute_container_add_device+0x15/0xfa
[<c11c95d1>] ? kobject_get+0xf/0x13
[<c126d1db>] ? get_device+0x10/0x14
[<c127be93>] ? scsi_alloc_target+0x217/0x24d
[<c127cbd8>] ? __scsi_scan_target+0x95/0x480
[<c10204eb>] ? dequeue_entity+0x14/0x1fe
[<c1020491>] ? update_curr+0x165/0x1ab
[<c1020491>] ? update_curr+0x165/0x1ab
[<c127d00d>] ? scsi_scan_channel+0x4a/0x76
[<c127d0b0>] ? scsi_scan_host_selected+0x77/0xad
[<c127d13c>] ? do_scan_async+0x0/0x11a
[<c127d137>] ? do_scsi_scan_host+0x51/0x56
[<c127d13c>] ? do_scan_async+0x0/0x11a
[<c127d14a>] ? do_scan_async+0xe/0x11a
[<c127d13c>] ? do_scan_async+0x0/0x11a
[<c10354c5>] ? kthread+0x5e/0x63
[<c1035467>] ? kthread+0x0/0x63
[<c1002af6>] ? kernel_thread_helper+0x6/0x10
Code: 44 24 1c 54 83 44 24 18 54 83 fa 03 75 94 8b 06 c7 86 64 02 00 00 01 00 00 00 83 e0 03 09 f0 89 06 8b 44 24 28 8b 90 58 01 00 00 <8b> 42 2c 85 c0 75 03 8b 42 08 8d 54 24 48 52 8d 4c 24 50 51 68
EIP: [<c11c7b08>] cfq_get_queue+0x232/0x412 SS:ESP 0068:dff8fd00
CR2: 000000000000002c
---[ end trace 9a88306573f69b12 ]---
The problem here is that we don't have bdi->dev information available when
thread does some IO. Hence when dev_name() tries to access bdi->dev, it
crashes.
This problem does not happen if kernel threads are in root group as root
group is statically allocated at device initialization time and we don't
hit this piece of code.
Fix it by delaying the filling of major and minor number information of
device in blk_group. Initially a blk_group is created with 0 as device
information and this information is filled later once some more IO comes
in from same group.
Reported-by: Mike Kazantsev <mk.fraggod@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This bug was introduced in 7b6d91daee
"block: unify flags for struct bio and struct request"
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The "h->scatter_list" is allocated inside a for loop. If any of those
allocations fail, then the rest of the list is uninitialized data. When
we free it we should start from the top and free backwards so that we
don't call kfree() on uninitialized pointers.
Also if the allocation for "h->scatter_list" fails then we would get an
Oops here. I should have noticed this when I send: 4ee69851c "cciss:
handle allocation failure." but I didn't. Sorry about that.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
If another cpu does a very wide munmap() on the signal frame area,
it can tear down the page table hierarchy from underneath us.
Borrow an idea from the 64-bit fault path's get_user_insn(), and
disable cross call interrupts during the page table traversal
to lock them in place while we operate.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
pcpu_first/last_unit_cpu are used to track which cpu has the first and
last units assigned. This in turn is used to determine the span of a
chunk for man/unmap cache flushes and whether an address belongs to
the first chunk or not in per_cpu_ptr_to_phys().
When the number of possible CPUs isn't power of two, a chunk may
contain unassigned units towards the end of a chunk. The logic to
determine pcpu_last_unit_cpu was incorrect when there was an unused
unit at the end of a chunk. It failed to ignore the unused unit and
assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.
This was discovered through kdump failure which was caused by
malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
CPUs by CAI Qian.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: CAI Qian <caiqian@redhat.com>
Cc: stable@kernel.org
We used to have a hypercall which reloaded the entire GDT, then we
switched to one which loaded a single entry (to match the IDT code).
Some comments were not updated, so fix them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported by: Eviatar Khen <eviatarkhen@gmail.com>
A userspace could submit a buffer with 0 length to be written to the
host. Prevent such a situation.
This was not needed previously, but recent changes in the way write()
works exposed this condition to trigger a virtqueue event to the host,
causing a NULL buffer to be sent across.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: stable@kernel.org
I found this while working on a Linux agent for spice, the symptom I was
seeing was select blocking on the spice vdagent virtio serial port even
though there were messages queued up there.
virtio_console's port_fops_poll checks port->inbuf != NULL to determine
if read won't block. However if an application reads enough bytes from
inbuf through port_fops_read, to empty the current port->inbuf,
port->inbuf will be NULL even though there may be buffers left in the
virtqueue.
This causes poll() to block even though there is data to be read,
this patch fixes this by using will_read_block(port) instead of the
port->inbuf != NULL check.
Signed-off-By: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org