Device mapper uses its own bounce_pfn, which may differ from the one on the
underlying device. As a result, dm can build incorrect requests containing sg
elements larger than the underlying device is able to handle.
This caused slab corruption in the i2o layer, which occurred on the i386 arch
when very long direct IO requests were addressed to a dm-over-i2o device.
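A minimal userspace sketch of the idea, assuming a stacking driver folds each underlying device's limit into its own with a "smallest non-zero value wins" rule. The struct and function names (io_limits, min_not_zero_ul, stack_limits) are hypothetical, not the dm-table API:

```c
#include <assert.h>

/* Hypothetical subset of the per-queue limits a stacking driver must honour. */
struct io_limits {
	unsigned long bounce_pfn; /* highest page frame the device can reach directly */
};

/* Return the smaller of two values, treating 0 as "no limit set yet". */
static unsigned long min_not_zero_ul(unsigned long a, unsigned long b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Fold one underlying device's limits into the stacked (dm) limits.
 * If dm kept its own bounce_pfn instead of combining it here, it could
 * build requests that the lower device cannot handle.
 */
static void stack_limits(struct io_limits *top, const struct io_limits *bottom)
{
	top->bounce_pfn = min_not_zero_ul(top->bounce_pfn, bottom->bounce_pfn);
}
```

Stacking two devices with different limits leaves the stricter one in place, so no request built by the upper layer exceeds what either lower device accepts.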
Signed-off-by: Vasily Averin <vvs@sw.ru>
Cc: <stable@kernel.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
After switching data directions, deadline always starts the next batch
from the lowest-sector request. This gives excessive deadline expiries
and large latency and throughput disparity between high- and low-sector
requests; an order of magnitude in some tests.
This patch changes the batching behaviour so new batches start from the
request whose expiry is earliest.
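The difference between the two starting points can be sketched as below. This is a simplified model under stated assumptions: struct dl_request and both pick functions are hypothetical, a linear scan stands in for the scheduler's sorted FIFO (whose head is already the earliest deadline), and jiffies wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request: only the fields the choice depends on. */
struct dl_request {
	unsigned long long sector; /* start sector on disk */
	unsigned long expires;     /* deadline in ticks (wraparound ignored) */
};

/* Old behaviour: after a direction switch, begin at the lowest sector. */
static const struct dl_request *
pick_lowest_sector(const struct dl_request *rqs, size_t n)
{
	const struct dl_request *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || rqs[i].sector < best->sector)
			best = &rqs[i];
	return best;
}

/* New behaviour: begin the batch at the request whose expiry is earliest. */
static const struct dl_request *
pick_earliest_expiry(const struct dl_request *rqs, size_t n)
{
	const struct dl_request *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || rqs[i].expires < best->expires)
			best = &rqs[i];
	return best;
}
```

With requests at sectors 100, 5000 and 2000, the old rule always restarts at sector 100, so the high-sector request can sit until it expires; the new rule starts at whichever request is closest to its deadline.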
Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The deadline I/O scheduler does not reset the batch count when starting
a new batch at a higher-sectored request. This means the second and
subsequent batch in the same data direction will never exceed a single
request in size whenever higher-sectored requests are pending.
This patch gives new batches in the same data direction as old ones
their full quota of requests by resetting the batch count.
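A minimal model of the batch accounting, assuming a per-batch counter compared against a fixed quota. The struct and helper names are hypothetical; only the reset in start_new_batch() corresponds to the change described above.

```c
#include <assert.h>

/* Hypothetical, simplified scheduler state for one data direction. */
struct dl_state {
	unsigned int batching;   /* requests dispatched in the current batch */
	unsigned int fifo_batch; /* quota of requests per batch */
};

/* Returns 1 if the current batch may dispatch another request. */
static int batch_has_room(const struct dl_state *dd)
{
	return dd->batching < dd->fifo_batch;
}

/*
 * Start a new batch in the same data direction. Without this reset, a batch
 * starting at a higher-sectored request inherits an exhausted counter and
 * ends after a single request.
 */
static void start_new_batch(struct dl_state *dd)
{
	dd->batching = 0;
}

/* Account for one dispatched request. */
static void dispatch_one(struct dl_state *dd)
{
	dd->batching++;
}
```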
Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Factor out finding the next request in sector-sorted order into
a new function, deadline_latter_request().
Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sg_mark_end() overwrites the page_link information, but all users want
__sg_mark_end() behaviour where we just set the end bit. That is the most
natural way to use the sg list, since you'll fill it in and then mark the
end point.
So change sg_mark_end() to only set the termination bit. Add a sg_magic
debug check as well, and clear a chain pointer if it is set.
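The new semantics can be sketched in userspace as below. It assumes the page_link encoding where the low two bits are flags (0x01 for a chain pointer, 0x02 for the last entry) and the remaining bits hold the page pointer; struct mock_sg and the magic value are hypothetical stand-ins, and assert() stands in for the debug-only sg_magic check.

```c
#include <assert.h>

#define SG_CHAIN_BIT  0x01UL
#define SG_END_BIT    0x02UL
#define SG_MAGIC_MOCK 0x87654321UL /* hypothetical debug magic value */

/* Userspace mock of the relevant part of struct scatterlist. */
struct mock_sg {
	unsigned long sg_magic;
	unsigned long page_link;
};

/* Mark sg as the last entry without clobbering the page pointer bits. */
static void mock_sg_mark_end(struct mock_sg *sg)
{
	assert(sg->sg_magic == SG_MAGIC_MOCK); /* stands in for the debug check */
	sg->page_link |= SG_END_BIT;           /* set the termination bit */
	sg->page_link &= ~SG_CHAIN_BIT;        /* an end entry is not a chain link */
}
```

The point of the change is visible in the masks: only the flag bits are touched, so an entry that was already filled in keeps its page after being marked as the end.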
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Non-architecture-specific code should not #include <asm/scatterlist.h>.
This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds in the x3proto and magicpanelr2 mach types, plugs in
highlander and rts7751r2d groups, and also hooks up the r2d
subtypes.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
R7780RP can't do byte-sized accesses to CF, so it needs to do word-sized
accesses with low-byte masking. This same problem exists
on older versions of the R2D, with the same workaround having
been implemented in 43f4b8c757
there. Follow that change for the highlander boards.
This does not impact R7780MP or SH7785 based Highlander modules.
If you're unfortunate enough to be stuck with an R7780RP, this
patch is for you!
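A sketch of what "word-sized access with low-byte masking" means, assuming each 8-bit CF register is exposed in the low byte of a 16-bit slot. The cf_window layout and the cf_readb()/cf_writeb() helpers are hypothetical illustrations, not the board code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CF register window that only tolerates 16-bit accesses. */
struct cf_window {
	volatile uint16_t regs[16];
};

/* Byte read emulated with a word read, keeping only the low byte. */
static uint8_t cf_readb(const struct cf_window *cf, unsigned int reg)
{
	return (uint8_t)(cf->regs[reg] & 0xff);
}

/* Byte write emulated with a word write of the zero-extended value. */
static void cf_writeb(struct cf_window *cf, unsigned int reg, uint8_t val)
{
	cf->regs[reg] = val;
}
```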
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
It's assumed that .eh_frame is terminated with a 4-byte zero in shared
libraries and executables. This seems to be the case for VDSOs too.
Without this terminator, I saw failures when unwinding from the VDSO,
though I don't know how other architectures handle this issue.
For normal libraries, crtendS.o provides this terminator. We could use
such a terminating object, or we can add a 4-byte zero by modifying
the linker script, as the patch below does.
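Why the terminator matters can be sketched with a simplified walker. This assumes each CIE/FDE record starts with a 4-byte length (excluding the length field itself) in host byte order, and it ignores the 64-bit extended-length form; count_eh_frame_records() is a hypothetical helper, not an unwinder API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count the records in an .eh_frame image, stopping at a zero length.
 * The size bound is a safety net for this sketch only: an unwinder that
 * has just a start pointer relies on the 4-byte zero to know where the
 * section ends, and without it walks into whatever follows.
 */
static size_t count_eh_frame_records(const unsigned char *p, size_t size)
{
	size_t count = 0, off = 0;

	while (off + 4 <= size) {
		uint32_t len;

		memcpy(&len, p + off, sizeof(len)); /* host byte order assumed */
		if (len == 0)
			break; /* the 4-byte zero terminator */
		off += 4 + (size_t)len;
		count++;
	}
	return count;
}
```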
Signed-off-by: Kaz Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
When configuring the kernel natively, the uname matching is off,
so fix up the uname mangling to get the proper SUBARCH. This needs
an explicit range so that SH-5 doesn't break.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
While using separate IRQ stacks can cut down on stack consumption,
many users can also use 4k stacks directly without also needing
separate stacks for softirqs and hardirqs.
With this split, we support the same rationale for 4KSTACKS as
m68knommu, with the IRQSTACKS abstraction as per ppc64.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
movca.l is restricted to SH-4 and up only, though compilers that
are unable to support ISA tuning (especially older versions of
binutils) will happily compile in the bogus opcode on older parts.
Conditionalize it to fix SH-3 regressions noted by Kristoffer.
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Many mouse drivers are often compiled into the kernel at the same time (e.g. in
Linux distributions) just to make sure that at least one driver will succeed
in finding its mouse device. Nevertheless, only the inport and logitech busmouse
drivers report with KERN_ERR log level if the mouse wasn't found. They
should use KERN_INFO instead, because it's not an error if the mouse isn't
attached at all.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Fountains do not support the change mode request and therefore
should be excluded from idle reset attempts.
Also:
- do not re-submit the URB when we decide that the touchpad needs to be
  reinitialized
- do not repeat size detection when reinitializing the touchpad
- add missing KERN_* prefixes to messages
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
When testing the myri10ge driver with 2.6.24-rc1, I found
that the machine crashed under heavy load:
Unable to handle kernel paging request at 0000000000100108 RIP:
[<ffffffff803cc8dd>] net_rx_action+0x11b/0x184
The address corresponds to the list_move_tail() in
netif_rx_complete():
	if (unlikely(work == weight))
		list_move_tail(&n->poll_list, list);
Eventually, I traced the crashes to calling netif_rx_complete() with
work_done == budget. From looking at other drivers, it appears that
one should only call netif_rx_complete() when work_done < budget.
To fix it, I changed the test in myri10ge_poll() so that it refers
to work_done rather than looking at the rx ring status. If
work_done is < budget, that implies we have no more packets to
process. Any races will be resolved by the NIC when the write to
irq_claim is made.
In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
report a work_done one larger than was actually done. This is because
the increment was done in the loop conditional, so work_done would be
incremented regardless of whether the test passed or failed.
This would trigger the WARN_ON_ONCE(work > weight); warning in
net_rx_action. I've moved the increment of work_done
inside the loop. Note that this would only be a problem when we had
exceeded our budget.
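The off-by-one and the completion rule can be reproduced with a small model, assuming 'pending' is the number of packets waiting on the ring. clean_rx_buggy(), clean_rx_fixed() and should_complete() are hypothetical helpers, not driver code:

```c
#include <assert.h>

/* Buggy form: the increment lives in the loop condition, so the final,
 * failing test still bumps work_done past the budget. */
static int clean_rx_buggy(int pending, int budget)
{
	int work_done = 0;

	while (pending > 0 && work_done++ < budget)
		pending--; /* process one packet */
	return work_done;
}

/* Fixed form: count a packet only after it has been processed. */
static int clean_rx_fixed(int pending, int budget)
{
	int work_done = 0;

	while (pending > 0 && work_done < budget) {
		pending--; /* process one packet */
		work_done++;
	}
	return work_done;
}

/* Polling should only be completed when the budget was not exhausted. */
static int should_complete(int work_done, int budget)
{
	return work_done < budget;
}
```

With 100 packets pending and a budget of 64, the buggy loop reports 65; with only 10 pending, the short-circuit on pending hides the bug, which matches the observation that it only bites once the budget is exceeded.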
Signed off by: Andrew Gallatin <gallatin@myri.com>
Andrew Gallatin Myricom Inc
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Driver shouldn't complain if the register range is larger than what
it expects. This works around failures with some device trees.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When not building an arch/powerpc kernel, the mpc5200 FEC driver depends
on some symbols which are not defined (BESTCOMM & BESTCOMM_FEC).
This patch flips around the dependency logic so that it cannot be
selected unless BESTCOMM_FEC is selected first. This way, Kconfig
stops complaining.
Also, the driver only works for arch/powerpc (not arch/ppc) anyway, so
it should also depend on PPC_MERGE.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[IRDA] IRNET: Fix build when TCGETS2 is defined.
[NET]: docbook fixes for netif_ functions
[NET]: Hide the net_ns kmem cache
[NET]: Mark the setup_net as __net_init
[NET]: Hide the dead code in the net_namespace.c
[NET]: Relax the reference counting of init_net_ns
[NETNS]: Make the init/exit hooks checks outside the loop
[NET]: Forget the zero_it argument of sk_alloc()
[NET]: Remove bogus zero_it argument from sk_alloc
[NET]: Make the sk_clone() lighter
[NET]: Move some core sock setup into sk_prot_alloc
[NET]: Auto-zero the allocated sock object
[NET]: Cleanup the allocation/freeing of the sock object
[NET]: Move the get_net() from sock_copy()
[NET]: Move the sock_copy() from the header
[TCP]: Another TAGBITS -> SACKED_ACKED|LOST conversion
[TCP]: Process DSACKs that reside within a SACK block
Documentation updates for network interfaces.
1. Add doc for netif_napi_add
2. Remove doc for unused returns from netif_rx
3. Add doc for netif_receive_skb
[ Incorporated minor mods from Randy Dunlap -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This cache is only required to create new namespaces,
but we won't have them in the CONFIG_NET_NS=n case.
Hide it under the appropriate ifdef.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The setup_net is called for the init net namespace
only (in the CONFIG_NET_NS=n case, of course) from the __init
function, so mark it as __net_init so that it is discarded
along with its caller after boot.
Yet again, in a perfect world this would be under
#ifdef CONFIG_NET_NS, but it isn't guaranteed that every
subsystem is registered *after* the init_net_ns is set
up. Once we are sure that we don't start registering
them before the init net setup, we'll be able to move
this code under the ifdef.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The namespace creation/destruction code is never called
if CONFIG_NET_NS is n, so it's OK to move it under the
appropriate ifdef.
The copy_net_ns() in the "n" case checks the flags and
returns -EINVAL when a new net ns is requested. In a perfect
world this stub would be in net_namespace.h, but this
function needs to know the CLONE_NEWNET value and thus
requires sched.h. On the other hand, this header is
included in almost every .c file in the networking code,
and making all this code depend on sched.h would be
suicidal.
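A userspace sketch of the "n"-case stub as described: refuse CLONE_NEWNET, otherwise share the existing namespace. CLONE_NEWNET is defined locally with its <linux/sched.h> value so the block is self-contained; err_ptr(), is_err() and ptr_err() are hypothetical stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000 /* value from <linux/sched.h> */
#endif

struct net { int dummy; };

/* Stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(): errors live in the top 4095 values. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* With CONFIG_NET_NS=n there is only one namespace: refuse to create
 * another one and let everything else share the existing one. */
static struct net *copy_net_ns_stub(unsigned long flags, struct net *old_net)
{
	if (flags & CLONE_NEWNET)
		return err_ptr(-EINVAL);
	return old_net;
}
```

Keeping this in a .c file means only that file needs the CLONE_NEWNET definition, which is the header-dependency concern raised above.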
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_NET_NS is n, there's no need to refcount
the initial net namespace. So relax this code by providing
trivial stubs for the "n" case.
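The shape of such stubs, as a userspace sketch: a plain int stands in for atomic_t, and the CONFIG_NET_NS=y branch is simplified (the kernel's put_net() also tears the namespace down when the count drops to zero).

```c
#include <assert.h>

/* Uncomment to build the refcounting variant. */
/* #define CONFIG_NET_NS */

struct net {
	int count; /* reference count; a plain int stands in for atomic_t */
};

#ifdef CONFIG_NET_NS
static struct net *get_net(struct net *net)
{
	net->count++;
	return net;
}

static void put_net(struct net *net)
{
	net->count--; /* the kernel version frees the namespace at zero */
}
#else
/* Only init_net exists and it is never freed, so counting is pointless. */
static struct net *get_net(struct net *net)
{
	return net;
}

static void put_net(struct net *net)
{
	(void)net;
}
#endif
```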
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new pernet something (subsys, device or operations) is
being registered, its init callback is called for each
namespace that currently exists in the system. During
unregistration, the same is done with the exit callback.
However, not every pernet something has both callbacks, yet the
check that the appropriate pointer is not NULL is performed
inside the for_each_net() loop.
This is strange, to say the least, so move the check outside the loop.
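A simplified model of the hoisted check, assuming namespaces form a singly linked list. All names here (struct pernet_ops, register_pernet_sketch() and friends) are hypothetical, and error unwinding is omitted for brevity:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the namespace list and pernet ops. */
struct net {
	struct net *next;
	int initialised;
};

struct pernet_ops {
	int (*init)(struct net *net);  /* optional, may be NULL */
	void (*exit)(struct net *net); /* optional, may be NULL */
};

/* Run ops->init on every existing namespace (no unwinding on error). */
static int register_pernet_sketch(struct net *list, const struct pernet_ops *ops)
{
	struct net *net;
	int err;

	if (!ops->init) /* checked once, not once per namespace */
		return 0;
	for (net = list; net; net = net->next) {
		err = ops->init(net);
		if (err)
			return err;
	}
	return 0;
}

/* Run ops->exit on every existing namespace. */
static void unregister_pernet_sketch(struct net *list, const struct pernet_ops *ops)
{
	struct net *net;

	if (!ops->exit) /* same hoisting on the exit side */
		return;
	for (net = list; net; net = net->next)
		ops->exit(net);
}

/* Example callbacks. */
static int mark_init(struct net *net) { net->initialised = 1; return 0; }
static void mark_exit(struct net *net) { net->initialised = 0; }
```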
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.
Also, fix the checkpatch.pl warnings about assignments
inside if statements.
This patch is rather big, and it is logically part of the previous one.
I split it in the hope of making the patches more readable. Hopefully
this particular split helps.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
At this point nobody calls sk_alloc() with zero_it == 0,
so remove the unneeded checks from it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sk_prot_alloc() already does everything that
sk_clone() needs. Besides, sk_prot_alloc() takes about half
as many arguments as sk_alloc() does, so calling sk_prot_alloc()
saves a bit of stack.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The security_sk_alloc() and the module_get are part of the
object allocation, so move them to the proper place.
Note that since we no longer reset the newly allocated sock
in sk_alloc() (the memset() was removed by the previous
patch), we can safely do this.
Also fix the error path in sk_prot_alloc() to release the security
context if needed.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a __GFP_ZERO flag that allocates a zeroed chunk of memory.
Use it in sk_alloc() to avoid a hand-made memset().
This is a temporary patch that will help us in the near future :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock object is allocated either from the generic cache with
kmalloc(), or from the prot->slab cache.
Move this logic into an isolated set of helpers and make
sk_alloc()/sk_free() look a bit nicer.
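A userspace model of such a helper pair, assuming a protocol descriptor that either carries its own cache or only an object size. The mock_cache/mock_proto types and the *_sketch functions are hypothetical; calloc() stands in for kmem_cache_alloc()/kmalloc() with __GFP_ZERO, and free() for kmem_cache_free()/kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache: fixed object size plus a live count. */
struct mock_cache {
	size_t obj_size;
	int live; /* objects currently allocated from this cache */
};

/* Hypothetical subset of a protocol descriptor. */
struct mock_proto {
	struct mock_cache *slab; /* per-protocol cache, or NULL */
	size_t obj_size;         /* object size for the generic path */
};

/* Allocate a zeroed sock object: protocol cache if present, else generic. */
static void *sk_prot_alloc_sketch(const struct mock_proto *prot)
{
	if (prot->slab) {
		void *sk = calloc(1, prot->slab->obj_size); /* ~ cache alloc + __GFP_ZERO */

		if (sk)
			prot->slab->live++;
		return sk;
	}
	return calloc(1, prot->obj_size); /* ~ kmalloc + __GFP_ZERO */
}

/* Release an object through the same path it was allocated from. */
static void sk_prot_free_sketch(const struct mock_proto *prot, void *sk)
{
	if (prot->slab)
		prot->slab->live--;
	free(sk);
}
```

Keeping the cache-versus-generic decision inside one alloc/free pair means callers such as sk_alloc() and sk_free() no longer need to know which path a given protocol uses.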
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock_copy() is supposed to just clone the socket. In a perfect
world it would be just a memcpy(), but we have to handle the security
mark correctly. All the extra setup must be performed in the sk_clone()
call, so move the get_net() to a more appropriate place.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock_copy() call is not used outside the sock.c file,
so just move it into sock.c.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to commit 3eec0047d9, the point of this is to avoid
skipping R-bit skbs.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSACKs inside another SACK block were missed if the start_seq of the
DSACK was larger than the SACK block's, because sorting prioritizes
full processing of the SACK block before the DSACK. After SACK block
sorting, the situation is like this:
SSSSSSSSS
   D
          SSSSSS
                 SSSSSSS
Because write_queue is walked in-order, when the first SACK block
has been processed, TCP is already past the skb for which the
DSACK arrived and we haven't taught it to backtrack (nor should
we), so TCP just continues processing by going to the next SACK
block after the DSACK (if any).
Whenever such a DSACK is present, do an embedded check while
processing the preceding SACK block.
If the DSACK is below snd_una, there won't be an overlapping SACK
block, and thus no problem in that case. Also, if the start_seq of
the DSACK is equal to that of the actual block, it will be processed
first.
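The condition that needs the embedded check can be sketched as follows, assuming wraparound-safe comparisons in the style of TCP's before()/after(). The struct and dsack_embedded_in() are hypothetical helpers, not the tcp_input.c code:

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons in the style of before()/after(). */
static int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int seq_after(uint32_t a, uint32_t b) { return seq_before(b, a); }

struct sack_range {
	uint32_t start_seq;
	uint32_t end_seq;
};

/*
 * Returns 1 when the DSACK starts strictly inside the SACK block: walking
 * the block first would carry the in-order write-queue walk past the
 * DSACKed skb, so the DSACK has to be checked while the block is processed.
 * An equal start_seq is excluded because that DSACK is processed first.
 */
static int dsack_embedded_in(const struct sack_range *blk,
			     const struct sack_range *dsack)
{
	return seq_after(dsack->start_seq, blk->start_seq) &&
	       seq_before(dsack->start_seq, blk->end_seq);
}
```

Using the second test case below (DSACK 5792-7240 inside block 2896-7240), the function reports the embedded case, while the first case with equal start sequence numbers does not.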
Tested this by using netem to duplicate 15% of packets, and
by printing the SACK blocks when found_dup_sack is true and the
selected skb in the dup_sack = 1 branch (if taken):
SACK block 0: 4344-5792 (relative to snd_una 2019137317)
SACK block 1: 4344-5792 (relative to snd_una 2019137317)
equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...
SACK block 0: 5792-7240 (relative to snd_una 2019214061)
SACK block 1: 2896-7240 (relative to snd_una 2019214061)
DSACK skb match 5792-7240 (relative to snd_una)
...and next_dup = 1 case (after the not shown start_seq sort),
went to dup_sack = 1 branch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was found by make randconfig
If the kernel .text is very large, the .fixup section branches
are too far away to be relocated correctly.
Use a "sethi %hi(label), reg; jmpl reg + %lo(label), %g0" sequence
instead of the branch to fix this.
There is another case in switch_to() involving a branch, which
is fixed similarly.
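The arithmetic behind the fix, as a sketch: %hi(x) is the top 22 bits of a 32-bit address (loaded by sethi into bits 31..10) and %lo(x) the low 10 bits, so the pair reaches any 32-bit target, whereas a PC-relative branch only reaches as far as its word-displacement field allows. The helper names are hypothetical, and which branch format (22-bit Bicc or 19-bit BPcc displacement) the original fixups used is left as a parameter rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(x): top 22 bits; %lo(x): low 10 bits. */
static uint32_t sparc_hi(uint32_t x) { return x >> 10; }
static uint32_t sparc_lo(uint32_t x) { return x & 0x3ff; }

/* Target reached by "sethi %hi(label), reg; jmpl reg + %lo(label), %g0". */
static uint32_t sethi_jmpl_target(uint32_t label)
{
	uint32_t reg = sparc_hi(label) << 10; /* sethi zeroes the low 10 bits */

	return reg + sparc_lo(label);
}

/* Can a branch with a signed 'disp_bits'-bit word displacement reach 'to'? */
static int branch_reaches(uint32_t from, uint32_t to, unsigned int disp_bits)
{
	int64_t disp = ((int64_t)to - (int64_t)from) / 4;
	int64_t limit = (int64_t)1 << (disp_bits - 1);

	return disp >= -limit && disp < limit;
}
```

A 19-bit word displacement covers only about 1 MB in each direction and a 22-bit one about 8 MB, which a large randconfig .text can exceed; the sethi/jmpl pair has no such limit for 32-bit addresses.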
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't export verify_compat_iovec when CONFIG_NET is
disabled, and consequently the Solaris compat module
should also depend upon CONFIG_NET.
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_BUG is turned off, the standard trick of:
	switch (x) {
	case X:
		...
	case Y:
		...
	default:
		BUG();
	}
to mark impossible cases does not work because BUG() evaluates
to nothing and thus GCC just sees a fallthrough code path.
Add an explicit KERN_ERR log message and a do_exit() to trap
this case.
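A userspace sketch of the pattern: BUG() is defined here as the empty statement it becomes with CONFIG_BUG disabled, and fprintf()+exit() stand in for the KERN_ERR printk() and do_exit() the change adds. The enum and apply() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* With CONFIG_BUG disabled, BUG() expands to nothing. */
#define BUG() do { } while (0)

enum op { OP_ADD, OP_SUB };

/* Without the explicit trap, control would leave the switch with
 * 'result' unset once BUG() compiles away. */
static int apply(enum op op, int a, int b)
{
	int result;

	switch (op) {
	case OP_ADD:
		result = a + b;
		break;
	case OP_SUB:
		result = a - b;
		break;
	default:
		BUG();
		fprintf(stderr, "apply: impossible op %d\n", (int)op);
		exit(EXIT_FAILURE); /* noreturn, so no fallthrough path remains */
	}
	return result;
}
```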
Signed-off-by: David S. Miller <davem@davemloft.net>
It is unused since we went to an I-cache flush that solely used
the 'flush' instruction, and its presence breaks the build
when PAGE_SIZE is 512KB.
Signed-off-by: David S. Miller <davem@davemloft.net>