* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (123 commits)
wimax/i2400m: add CREDITS and MAINTAINERS entries
wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install
i2400m: Makefile and Kconfig
i2400m/SDIO: TX and RX path backends
i2400m/SDIO: firmware upload backend
i2400m/SDIO: probe/disconnect, dev init/shutdown and reset backends
i2400m/SDIO: header for the SDIO subdriver
i2400m/USB: TX and RX path backends
i2400m/USB: firmware upload backend
i2400m/USB: probe/disconnect, dev init/shutdown and reset backends
i2400m/USB: header for the USB bus driver
i2400m: debugfs controls
i2400m: various functions for device management
i2400m: RX and TX data/control paths
i2400m: firmware loading and bootrom initialization
i2400m: linkage to the networking stack
i2400m: Generic probe/disconnect, reset and message passing
i2400m: host/device procotol and core driver definitions
i2400m: documentation and instructions for usage
wimax: Makefile, Kconfig and docbook linkage for the stack
...
A race between svc_revisit and svc_delete_xprt can result in
deferred requests holding references on a transport that can never be
recovered because dead transports are not enqueued for subsequent
processing.
Check for XPT_DEAD in revisit to clean up completing deferrals on a dead
transport and sweep a transport's deferred queue to do the same for queued
but unprocessed deferrals.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The rqstp structure has a pointer to a svc_deferred_req record
that is allocated when requests are deferred. This record is common
to all transports and can be freed in common code.
Move the kfree of the rq_deferred to the common svc_xprt_release
function.
This also fixes a memory leak in the RDMA transport which does not
kfree the dr structure in it's version of the xpo_release_rqst callback.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
This patch provides Makefile and KConfig for the WiMAX stack,
integrating them into the networking stack's Makefile, Kconfig and
doc-book templates.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Implements the three basic operations provided by the stack's control
interface to WiMAX devices:
- Messaging channel between user space and driver/device
This implements a direct communication channel between user space
and the driver/device, by which free form messages can be sent back
and forth.
This is intended for device-specific features, vendor quirks, etc.
- RF-kill framework integration
Provide most of the RF-Kill integration for WiMAX drivers so that
all device drivers have to do is after wimax_dev_add() is call
wimax_report_rfkill_{hw,sw}() to update initial state and then every
time it changes.
Provides wimax_rfkill() for the kernel to call to set software
RF-Kill status and/or query current hardware and software switch
status.
Exports wimax_rfkill() over generic netlink to user space.
- Reset a WiMAX device
Provides wimax_reset() for the kernel to reset a wimax device as
needed and exports it over generic netlink to user space.
This API is clearly limited, as it still provides no way to do the
basic scan, connect and disconnect in a hardware independent way. The
WiMAX case is more complex than WiFi due to the way networks are
discovered and provisioned.
The next developments are to add the basic operations so they can be
offerent by different drivers. However, we'd like to get more vendors
to jump in and provide feedback of how the user/kernel API/abstraction
layer should be.
The user space code for the i2400m, as of now, uses the messaging
channel, but that will change as the API evolves.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add an EXPORT_SYMBOL() to genl_unregister_mc_group(), to allow
unregistering groups on the run. EXPORT_SYMBOL_GPL() is not used as
the rest of the functions exported by this module (eg:
genl_register_mc_group) are also not _GPL().
Cleanup is currently done when unregistering a family, but there is
no way to unregister a single multicast group due to that function not
being exported. Seems to be a mistake as it is documented as for
external consumption.
This is needed by the WiMAX stack to be able to cleanup unused mc
groups.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Implements the basic life cycles of a 'struct wimax_dev', some common
generic netlink functionality for marshalling calls to user space,
and the device state machine.
For looking up net devices based on their generic netlink family IDs,
use a low overhead method that optimizes for the case where most
systems have a single WiMAX device, or at most, a very low number of
WiMAX adaptors.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This file contains a simple debug framework that is used in the stack;
it allows the debug level to be controlled at compile-time (so the
debug code is optimized out) and at run-time (for what wasn't compiled
out).
This is eventually going to be moved to use dynamic_printk(). Just
need to find time to do it.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This file contains declarations and definitions used by the different
submodules of the stack.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #3]
Revert "CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]"
SELinux: shrink sizeof av_inhert selinux_class_perm and context
CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]
keys: fix sparse warning by adding __user annotation to cast
smack: Add support for unlabeled network hosts and networks
selinux: Deprecate and schedule the removal of the the compat_net functionality
netlabel: Update kernel configuration API
Convert this driver to use net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AF_CAN core delivered always cloned sk_buffs to the AF_CAN
protocols, although this was _only_ needed by the can-raw protocol.
With this (additionally documented) change, the AF_CAN core calls the
callback functions of the registered AF_CAN protocols with the original
(uncloned) sk_buff pointer and let's the can-raw protocol do the
skb_clone() itself which omits all unneeded skb_clone() calls for other
AF_CAN protocols.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GRO interfaces for hardware-assisted VLAN reception.
With this in place we're now at parity with LRO as far as the
interface is concerned. That is, you can now take any LRO driver
and convert it over to GRO.
As the CB memory clashes with GRO's use of CB, I've removed it
entirely by storing dev in skb->dev. This is OK because VLAN
gets called first thing in netif_receive_skb and skb->dev is
not used in between us calling netif_rx and netif_receive_skb
getting called.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously GRO's only entry point from the outside is through
napi_gro_receive and napi_gro_frags. These interfaces are for
device drivers.
This patch rearranges things to provide a new set of interfaces
for VLANs. These interfaces are for internal use only. The
VLAN code itself can then provide a set of entry points for
device drivers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users have been converted to either the general-purpose allocator,
dma_find_channel, or dma_request_channel.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that clients no longer need to be notified of channel arrival
dma_async_client_register can simply increment the dmaengine_ref_count.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use the general-purpose channel allocation provided by dmaengine.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
async_tx and net_dma each have open-coded versions of issue_pending_all,
so provide a common routine in dmaengine.
The implementation needs to walk the global device list, so implement
rcu to allow dma_issue_pending_all to run lockless. Clients protect
themselves from channel removal events by holding a dmaengine reference.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Simply, if a client wants any dmaengine channel then prevent all dmaengine
modules from being removed. Once the clients are done re-enable module
removal.
Why?, beyond reducing complication:
1/ Tracking reference counts per-transaction in an efficient manner, as
is currently done, requires a complicated scheme to avoid cache-line
bouncing effects.
2/ Per-transaction ref-counting gives the false impression that a
dma-driver can be gracefully removed ahead of its user (net, md, or
dma-slave)
3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but
if such an engine were built one day we still would not need to notify
clients of remove events. The driver can simply return NULL to a
->prep() request, something that is much easier for a client to handle.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We want to ensure that connected sockets close down the connection when we
set XPT_CLOSE, so that we don't keep it hanging while cleaning up all the
stuff that is keeping a reference to the socket.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:
(serv->sv_nrthreads + 3) * 20
Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.
Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.
This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
qlge: Fix sparse warnings for tx ring indexes.
qlge: Fix sparse warning regarding rx buffer queues.
qlge: Fix sparse endian warning in ql_hw_csum_setup().
qlge: Fix sparse endian warning for inbound packet control block flags.
qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
myri10ge: print MAC and serial number on probe failure
pkt_sched: cls_u32: Fix locking in u32_change()
iucv: fix cpu hotplug
af_iucv: Free iucv path/socket in path_pending callback
af_iucv: avoid left over IUCV connections from failing connects
af_iucv: New error return codes for connect()
net/ehea: bitops work on unsigned longs
Revert "net: Fix for initial link state in 2.6.28"
tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
tcp: don't mask EOF and socket errors on nonblocking splice receive
dccp: Integrate the TFRC library with DCCP
dccp: Clean up ccid.c after integration of CCID plugins
dccp: Lockless integration of CCID congestion-control plugins
qeth: get rid of extra argument after printk to dev_* conversion
qeth: No large send using EDDP for HiperSockets.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
inotify: fix type errors in interfaces
fix breakage in reiserfs_new_inode()
fix the treatment of jfs special inodes
vfs: remove duplicate code in get_fs_type()
add a vfs_fsync helper
sys_execve and sys_uselib do not call into fsnotify
zero i_uid/i_gid on inode allocation
inode->i_op is never NULL
ntfs: don't NULL i_op
isofs check for NULL ->i_op in root directory is dead code
affs: do not zero ->i_op
kill suid bit only for regular files
vfs: lseek(fd, 0, SEEK_CUR) race condition
New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.
Reported-by: m0sia <m0sia@plotinka.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the iucv module is compiled in/loaded but no user is registered cpu
hot remove doesn't work. Reason for that is that the iucv cpu hotplug
notifier on CPU_DOWN_PREPARE checks if the iucv_buffer_cpumask would
be empty after the corresponding bit would be cleared. However the bit
was never set since iucv wasn't enable. That causes all cpu hot unplug
operations to fail in this scenario.
To fix this use iucv_path_table as an indicator wether iucv is enabled
or not.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free iucv path after iucv_path_sever() calls in iucv_callback_connreq()
(path_pending() iucv callback).
If iucv_path_accept() fails, free path and free/kill newly created socket.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For certain types of AFIUCV socket connect failures IUCV connections
are left over. Add some cleanup-statements to avoid cluttered IUCV
connections.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the iucv_path_connect() call fails then return an error code that
corresponds to the iucv_path_connect() failure condition; instead of
returning -ECONNREFUSED for any failure.
This helps to improve error handling for user space applications
(e.g. inform the user that the z/VM guest is not authorized to
connect to other guest virtual machines).
The error return codes are based on those described in connect(2).
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 22604c8668.
We can't fix this issue in this way, because we now can try
to take the dev_base_lock rwlock as a writer in software interrupt
context and that is not allowed without major surgery elsewhere.
This initial link state problem needs to be solved in some other
way.
Signed-off-by: David S. Miller <davem@davemloft.net>
... and don't bother in callers. Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.
i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In splice TCP receive, the SPLICE_F_NONBLOCK flag is used
to compute the "timeo" value. So checking it again inside
of the main receive loop to trigger -EAGAIN processing is
entirely unnecessary.
Noticed by Jarek P. and Lennert Buytenhek.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, setting SPLICE_F_NONBLOCK on splice from a TCP socket
results in masking of EOF (RDHUP) and error conditions on the socket
by an -EAGAIN return. Move the NONBLOCK check in tcp_splice_read()
to be after the EOF and error checks to fix this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch integrates the TFRC library, which is a dependency of CCID-3 (and
CCID-4), with the new use of CCIDs in the DCCP module.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans up after integrating the CCID modules and, in addition,
* moves the if/else cases from ccid_delete() into ccid_hc_{tx,rx}_delete();
* removes the 'gfp' argument to ccid_new() - since it is always gfp_any().
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:
* enables a faster connection path by eliminating the need to always go
through the CCID registration lock;
* updates the implementation to use only a single array whose size equals
the number of configured CCIDs instead of the maximum (256);
* since the CCIDs are now fixed array elements, synchronization is no
longer needed, simplifying use and implementation.
CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit ca109491f6 ("hrtimer:
removing all ur callback modes") the hrtimer callbacks are processed
only in hardirq context.
This patch moves some functionality into tasklets to run in softirq
context.
Additionally some duplicated code was removed in bcm_rx_thr_flush()
and an avoidable memcpy was removed from bcm_rx_handler().
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kfree_skb instead of kfree for struct sk_buff pointers.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Michael Marineau <mike@marineau.org>
Commit b47300168e "Do not fire linkwatch
events until the device is registered." was made as a workaround for
drivers that call netif_carrier_off before registering the device.
Unfortunately this causes these drivers to incorrectly report their
link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
flag when the interface is first brought up. This issues was
previously pointed out[1] but was dismissed saying that IFF_RUNNING is
not related to the link status. From my digging IFF_RUNNING, as
reported to userspace, is based on the link state. It is set based on
__LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
not reported to user space so it may well be independent of the link,
I don't know if and when it may get set.)
The end result depends slightly depending on the driver. The the two I
tested were e1000e and b44. With e1000e if the system is booted
without a network cable attached the interface will falsely report
RUNNING when it is brought up causing NetworkManager to attempt to
start it and eventually time out. With b44 when the system is booted
with a network cable attached and brought up with dhcpcd it will time
out the first time.
The attached patch that will still set the operstate variable
correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called
but then return rather than skipping the linkwatch_fire_event call
entirely as the previous fix did. (sorry it isn't inline, I don't have
a patch friendly email client at the moment)
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 4dec9b807b ("rfkill: strip pointless
notifier chain") removed the only user of rfkill_led_trigger() that was not
guarded by #ifdef CONFIG_RFKILL_LEDS. Therefore, move rfkill_led_trigger()
completely inside #ifdef CONFIG_RFKILL_LEDS and avoid the compile time
warning:
net/rfkill/rfkill.c:59: warning: 'rfkill_led_trigger' defined but not used
Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.
It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb. This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.
The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.
The merging itself is rather simple. We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry. Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.
If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself. The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to excellent diagnosis by Eduard Guzovsky.
The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up. If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.
IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).
So simply mimick this on the ipv6 side.
Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roel Kluin noted that line is unsigned so one test is unneccessary. Also
add a warning for another flaw I noticed while making this change.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add new LSM hooks for path-based checks. Call them on directory-modifying
operations at the points where we still know the vfsmount involved.
Signed-off-by: Kentaro Takeda <takedakn@nttdata.co.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Toshiharu Harada <haradats@nttdata.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.
Signed-off-by: Paul Moore <paul.moore@hp.com>
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (70 commits)
fs/nfs/nfs4proc.c: make nfs4_map_errors() static
rpc: add service field to new upcall
rpc: add target field to new upcall
nfsd: support callbacks with gss flavors
rpc: allow gss callbacks to client
rpc: pass target name down to rpc level on callbacks
nfsd: pass client principal name in rsc downcall
rpc: implement new upcall
rpc: store pointer to pipe inode in gss upcall message
rpc: use count of pipe openers to wait for first open
rpc: track number of users of the gss upcall pipe
rpc: call release_pipe only on last close
rpc: add an rpc_pipe_open method
rpc: minor gss_alloc_msg cleanup
rpc: factor out warning code from gss_pipe_destroy_msg
rpc: remove unnecessary assignment
NFS: remove unused status from encode routines
NFS: increment number of operations in each encode routine
NFS: fix comment placement in nfs4xdr.c
NFS: fix tabs in nfs4xdr.c
...
When we converted the protocol atomic counters such as the orphan
count and the total socket count deadlocks were introduced due to
the mismatch in BH status of the spots that used the percpu counter
operations.
Based on the diagnosis and patch by Peter Zijlstra, this patch
fixes these issues by disabling BH where we may be in process
context.
Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In future all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators
and other comparisons.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
cls_cgroup can't be compiled as a module, since it's not supported by
cgroup.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- It's better to use container_of() instead of casting cgroup_subsys_state *
to cgroup_cls_state *.
- Add helper function task_cls_state().
- Rename net_cls_state() to cgrp_cls_state().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing a cgroup, an oops was triggered immediately. The cause
is wrong kfree() in cgrp_destroy().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During network namespace teardown we either move or delete
all of the network devices associated with a network namespace.
In the case of veth devices deleting one will also delete it's
pair device. If both devices are in the same network namespace
then for_each_netdev_safe is insufficient as next may point
to the second veth device we have deleted.
To avoid problems I do what we do in __rtnl_kill_links and
restart the scan of the device list, after we have deleted
a device.
Currently dev_change_netnamespace does not appear to suffer from
this problem, but wireless devices are also paired and likely
should be moved between network namespaces together. So I have
errored on the side of caution and restart the scan of the network
devices in that case as well.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
The initial skb may have been freed after napi_gro_complete in
napi_gro_receive if it was merged into an existing packet. Thus
we cannot check same_flow (which indicates whether it was merged)
after calling napi_gro_complete.
This patch fixes this by saving the same_flow status before the
call to napi_gro_complete.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent GRO patches introduced the NAPI removal of devices in
free_netdev. For drivers that can change the number of queues during
driver operation, the NAPI infrastructure doesn't allow the freeing and
re-addition of NAPI entities without reloading the driver.
This change reinitializes the dev_list in each NAPI struct on delete,
instead of just deleting it (and assigning the list pointers to POISON).
Drivers that wish to remove/re-add NAPI will need to re-initialize the
netdev napi_list after removing all NAPI instances, before re-adding NAPI
devices again.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a useless ret variable from the IPv4 ESP/UDP
decapsulation code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
atif is tested for being NULL twice, with the same effect in each case. I
have kept the second test, as it seems to fit well with the comment above it.
A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@
if (x@p1 == NULL || ...) { ... when forall
return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
x@p2 == NULL
|
x@p2 != NULL
)
// another path to the test that is not through p1?
@s exists@
local idexpression r.x;
position r.p1,r.p2;
@@
... when != x@p1
(
x@p2 == NULL
|
x@p2 != NULL
)
@fix depends on !s@
position r.p1,r.p2;
expression x,E;
statement S1,S2;
@@
(
- if ((x@p2 != NULL) || ...)
S1
|
- if ((x@p2 == NULL) && ...) S1
|
- BUG_ON(x@p2 == NULL);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.
This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.
Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.
Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted. This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.
The only potential downside is that applications on the remote
end may see multiple SIGURG notifications. However, this would
occur anyway with other TCP stacks. More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The latest ietf socket extensions API draft said:
8.1.21. Set or Get the SCTP Partial Delivery Point
Note also that the call will fail if the user attempts to set
this value larger than the socket receive buffer size.
This patch add this validity check for SCTP_PARTIAL_DELIVERY_POINT
socket option.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If FWD-TSN chunk is received with bad stream ID, the sctp will not do the
validity check, this may cause memory overflow when overwrite the TSN of
the stream ID.
The FORWARD-TSN chunk is like this:
FORWARD-TSN chunk
Type = 192
Flags = 0
Length = 172
NewTSN = 99
Stream = 10000
StreamSequence = 0xFFFF
This patch fix this problem by discard the chunk if stream ID is not
less than MIS.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket
extensions API draft.
8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)
This option gets the current number of associations that are attached
to a one-to-many style socket. The option value is an uint32_t.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just fix a typo in socket.c.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brings maxseg socket option set/get into line with the latest ietf socket
extensions API draft, while maintaining backwards compatibility.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 656299f706
(vlan: convert to net_device_ops) added a net_device_ops
with a NULL ndo_start_xmit field.
This gives a crash in dev_hard_start_xmit()
Fix it using two net_device_ops structures, one for hwaccel vlan,
one for non hwaccel vlan.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the followinf proc entries per-netns:
/proc/net/igmp
/proc/net/mcfilter
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks like everything is already ready.
Required for ebtables(8) for one thing.
Also, required for ipmr per-netns (coming soon). (Benjamin)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a locking free version of iucv_message_receive and iucv_message_send
that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
This patch extends the new upcall with a "service" field that currently
can have 2 values: "*" or "nfs". These values specify matching rules for
principals in the keytab file. The "*" means that gssd is allowed to use
"root", "nfs", or "host" keytab entries while the other option requires
"nfs".
Restricting gssd to use the "nfs" principal is needed for when the
server performs a callback to the client. The server in this case has
to authenticate itself as an "nfs" principal.
We also need "service" field to distiguish between two client-side cases
both currently using a uid of 0: the case of regular file access by the
root user, and the case of state-management calls (such as setclientid)
which should use a keytab for authentication. (And the upcall should
fail if an appropriate principal can't be found.)
Signed-off: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch extends the new upcall by adding a "target" field
communicating who we want to authenticate to (equivalently, the service
principal that we want to acquire a ticket for).
Signed-off: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds server-side support for callbacks other than AUTH_SYS.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds client-side support to allow for callbacks other than
AUTH_SYS.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rpc client needs to know the principal that the setclientid was done
as, so it can tell gssd who to authenticate to.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Two principals are involved in krb5 authentication: the target, who we
authenticate *to* (normally the name of the server, like
nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we
authenticate *as* (normally a user, like bfields@UMICH.EDU)
In the case of NFSv4 callbacks, the target of the callback should be the
source of the client's setclientid call, and the source should be the
nfs server's own principal.
Therefore we allow svcgssd to pass down the name of the principal that
just authenticated, so that on setclientid we can store that principal
name with the new client, to be used later on callbacks.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Implement the new upcall. We decide which version of the upcall gssd
will use (new or old), by creating both pipes (the new one named "gssd",
the old one named after the mechanism (e.g., "krb5")), and then waiting
to see which version gssd actually opens.
We don't permit pipes of the two different types to be opened at once.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Keep a pointer to the inode that the message is queued on in the struct
gss_upcall_msg. This will be convenient, especially after we have a
choice of two pipes that an upcall could be queued on.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a global variable pipe_version which will eventually be used
to keep track of which version of the upcall gssd is using.
For now, though, it only keeps track of whether any pipe is open or not;
it is negative if not, zero if one is opened. We use this to wait for
the first gssd to open a pipe.
(Minor digression: note this waits only for the very first open of any
pipe, not for the first open of a pipe for a given auth; thus we still
need the RPC_PIPE_WAIT_FOR_OPEN behavior to wait for gssd to open new
pipes that pop up on subsequent mounts.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Keep a count of the number of pipes open plus the number of messages on
a pipe. This count isn't used yet.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I can't see any reason we need to call this until either the kernel or
the last gssd closes the pipe.
Also, this allows to guarantee that open_pipe and release_pipe are
called strictly in pairs; open_pipe on gssd's first open, release_pipe
on gssd's last close (or on the close of the kernel side of the pipe, if
that comes first).
That will make it very easy for the gss code to keep track of which
pipes gssd is using.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want to transition to a new gssd upcall which is text-based and more
easily extensible.
To simplify upgrades, as well as testing and debugging, it will help if
we can upgrade gssd (to a version which understands the new upcall)
without having to choose at boot (or module-load) time whether we want
the new or the old upcall.
We will do this by providing two different pipes: one named, as
currently, after the mechanism (normally "krb5"), and supporting the
old upcall. One named "gssd" and supporting the new upcall version.
We allow gssd to indicate which version it supports by its choice of
which pipe to open.
As we have no interest in supporting *simultaneous* use of both
versions, we'll forbid opening both pipes at the same time.
So, add a new pipe_open callback to the rpc_pipefs api, which the gss
code can use to track which pipes have been open, and to refuse opens of
incompatible pipes.
We only need this to be called on the first open of a given pipe.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I want to add a little more code here, so it'll be convenient to have
this flatter.
Also, I'll want to add another error condition, so it'll be more
convenient to return -ENOMEM than NULL in the error case. The only
caller is already converting NULL to -ENOMEM anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We'll want to call this from elsewhere soon. And this is a bit nicer
anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We're just about to kfree() gss_auth, so there's no point to setting any
of its fields.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There's a bit of a chicken and egg problem when it comes to destroying
auth_gss credentials. When we destroy the last instance of a GSSAPI RPC
credential, we should send a NULL RPC call with a GSS procedure of
RPCSEC_GSS_DESTROY to hint to the server that it can destroy those
creds.
This isn't happening because we're setting clearing the uptodate bit on
the credentials and then setting the operations to the gss_nullops. When
we go to do the RPC call, we try to refresh the creds. That fails with
-EACCES and the call fails.
Fix this by not clearing the UPTODATE bit for the credentials and adding
a new crdestroy op for gss_nullops that just tears down the cred without
trying to destroy the context.
The only difference between this patch and the first one is the removal
of some minor formatting deltas.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hi.
I've been looking at a bugzilla which describes a problem where
a customer was advised to use either the "noac" or "actimeo=0"
mount options to solve a consistency problem that they were
seeing in the file attributes. It turned out that this solution
did not work reliably for them because sometimes, the local
attribute cache was believed to be valid and not timed out.
(With an attribute cache timeout of 0, the cache should always
appear to be timed out.)
In looking at this situation, it appears to me that the problem
is that the attribute cache timeout code has an off-by-one
error in it. It is assuming that the cache is valid in the
region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The
cache should be considered valid only in the region,
[read_cache_jiffies, read_cache_jiffies + attrtimeo). With this
change, the options, "noac" and "actimeo=0", work as originally
expected.
This problem was previously addressed by special casing the
attrtimeo == 0 case. However, since the problem is only an off-
by-one error, the cleaner solution is address the off-by-one
error and thus, not require the special case.
Thanx...
ps
Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We've never considered the sunrpc code as part of any ABI to be used by
out-of-tree modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Somehow, this escaped the previous purge. There should be no need to keep
any extra locks in the XDR callbacks.
The NFS client XDR code only writes into private objects, whereas all reads
of shared objects are confined to fields that do not change, such as
filehandles...
Ditto for lockd, the NFSv2/v3 client mount code, and rpcbind.
The nfsd XDR code may require the BKL, but since it does a synchronous RPC
call from a thread that already holds the lock, that issue is moot.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
While implementing a TCQ_F_THROTTLED flag there was used an smp_wmb()
in qdisc_watchdog(), but since this flag is practically used only in
sch_netem(), and since it's not even clear what reordering is avoided
here (TCQ_F_THROTTLED vs. __QDISC_STATE_SCHED?) it seems the barrier
could be safely removed.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A command like this: "brctl addif br1 eth1" issued as a user gave me
an oops when bridge module wasn't loaded. It's caused by using a dev
pointer before checking for NULL.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some gcc versions warn that ret may be used uninitialized in
sfq_enqueue(). It's a false positive, so let's annotate this.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds the Backward Congestion Notification Address (BCNA) attribute to the
Backward Congestion Notification (BCN) interface for Data Center Bridging
(DCB), which was missing. Receive the BCNA attribute in the ixgbe driver.
The BCNA attribute is for a switch to inform the endstation about the physical
port identification in order to support BCN on aggregated links.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Data Center Bridging (DCB) had no way to know if setstate had failed in the
driver. This patch enables dcb netlink code to handle the status for the DCB
setstate interface. Likewise it allows the driver to return a failed status
if MSI-X isn't enabled.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements dynamic power save for mac80211. Basically it
means enabling power save mode after an idle period. Implementing it
dynamically gives a good compromise of low power consumption and low
latency. Some hardware have support for this in firmware, but some
require the host to do it.
The dynamic power save is implemented by adding an timeout to
ieee80211_subif_start_xmit(). The timeout can be enabled from userspace
with Wireless Extensions. For example, the command below enables the
dynamic power save and sets the time timeout to 500 ms:
iwconfig wlan0 power timeout 500m
Power save now only works with devices which handle power save in firmware.
It's also disabled by default and the heuristics when and how to enable is
considered as a policy decision and will be left for the userspace to handle.
In case the firmware has support for this, drivers can disable this feature
with IEEE80211_HW_NO_STACK_DYNAMIC_PS.
Big thanks to Johannes Berg for the help with the design and code.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is a preparation for the dynamic power save support. In future there are
two paths to stop the master queues and we need to track this properly to
avoid starting queues incorrectly. Implement this by adding a status
array for each queue.
The original idea and design is from Johannes Berg, I just did
the implementation based on his notes. All the bugs are mine, of course.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Also disable power save when disassociated. It makes no sense to have
power save enabled while disassociated.
iwlwifi seems to have this check in the driver, but it's better to do this
in mac80211 instead.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In stress testing p54usb, the WARN_ON() in ieee80211_tasklet_handler() was
triggered; however, there is no logging of the received value for packet
type. Adding that feature will improve the warning.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes a typo in ieee80211_send_assoc(), net/mac80211/mlme.c.
The error is usage of a wrong member when building
the ie80211 management frame (it should be assoc_req, and not reassoc_req).
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we do not currently report HT rates (MCS index) in radiotap
header for HT rates, we should not claim the rate is present. The rate
octet itself is used as padding in this case, so only the it_present
flag needs to be removed in case of HT rates.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a STA roams back to the same AP before the previous STA entry has
expired, a new STA entry is not added in mac80211. However, a Layer 2
Update frame still needs to be transmitted to update layer 2 devices
about the new location for the STA. Without this, switches may
continue to forward frames to the previous (now incorrect) port when
STA roams between APs.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds option for HT-enabled drivers to report HT rates
(HT20/HT40, short GI, MCS index) to mac80211. These rates are
currently not in the rate table, so the rate_idx is used to indicate
MCS index.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
HT management is done differently for AP and STA modes, unify
to just the ->config() callback since HT is fundamentally a
PHY property and cannot be per-BSS.
Rename enum nl80211_sec_chan_offset as nl80211_channel_type to denote
the channel type ( NO_HT, HT20, HT40+, HT40- ).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds signal strength and transmission bitrate
to the station_info of nl80211.
Signed-off-by: Henning Rogge <rogge@fgan.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The original message was unhelpful and extremely alarming to our poor
users, despite its charm. Make it less frightening.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel_accept() does not hold the module refcount of newsock->ops->owner,
so we need __module_get(newsock->ops->owner) code after call kernel_accept()
by hand.
In sunrpc, the module refcount is missing to hold. So this cause kernel panic.
Used following script to reproduct:
while [ 1 ];
do
mount -t nfs4 192.168.0.19:/ /mnt
touch /mnt/file
umount /mnt
lsmod | grep ipv6
done
This patch fixed the problem by add __module_get(newsock->ops->owner) to
kernel_accept(). So we do not need to used __module_get(newsock->ops->owner)
in every place when used kernel_accept().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 7035560287.
As pointed out by Mark McLoughlin IP_PKTINFO cmsg data is one
post-queueing user, so this optimization is not valid right
now.
Signed-off-by: David S. Miller <davem@davemloft.net>
And thus when we try to use 'ss -danemi' on these sockets that have no
ccid blocks (data collected using systemtap after I fixed the problem):
dccp_diag_get_info sk=0xffff8801220a3100, dp->dccps_hc_rx_ccid=0x0000000000000000, dp->dccps_hc_tx_ccid=0x0000000000000000
We get an OOPS:
mica.ghostprotocols.net login: BUG: unable to handle kernel NULL pointer
dereferenc0
IP: [<ffffffffa0136082>] dccp_diag_get_info+0x82/0xc0 [dccp_diag]
PGD 12106f067 PUD 122488067 PMD 0
Oops: 0000 [#1] PREEMPT
Fix is trivial, and 'ss -d' is working again:
[root@mica ~]# ss -danemi
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 0 *:5001 *:*
ino:7288 sk:220a3100ffff8801
mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
[root@mica ~]#
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GPRS TX flow control won't need to lock the underlying socket anymore.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A separate xmit lock class supports GPRS over a Phonet pipe over a TUN
device (type ARPHRD_NONE).
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also leave some room for more 802.11 types.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1.When no interface is specified in an IPV6_PKTINFO ancillary data
item, the interface specified in an IPV6_PKTINFO sticky optionis
is used.
RFC3542:
6.7. Summary of Outgoing Interface Selection
This document and [RFC-3493] specify various methods that affect the
selection of the packet's outgoing interface. This subsection
summarizes the ordering among those in order to ensure deterministic
behavior.
For a given outgoing packet on a given socket, the outgoing interface
is determined in the following order:
1. if an interface is specified in an IPV6_PKTINFO ancillary data
item, the interface is used.
2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
option, the interface is used.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When get receiving interface index while no message is received,
the the value seted with setsockopt() should be returned.
RFC 3542:
Issuing getsockopt() for the above options will return the sticky
option value i.e., the value set with setsockopt(). If no sticky
option value has been set getsockopt() will return the following
values:
- For the IPV6_PKTINFO option, it will return an in6_pktinfo
structure with ipi6_addr being in6addr_any and ipi6_ifindex being
zero.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are three reasons for me to add this support:
1.When no interface is specified in an IPV6_PKTINFO ancillary data
item, the interface specified in an IPV6_PKTINFO sticky optionis
is used.
RFC3542:
6.7. Summary of Outgoing Interface Selection
This document and [RFC-3493] specify various methods that affect the
selection of the packet's outgoing interface. This subsection
summarizes the ordering among those in order to ensure deterministic
behavior.
For a given outgoing packet on a given socket, the outgoing interface
is determined in the following order:
1. if an interface is specified in an IPV6_PKTINFO ancillary data
item, the interface is used.
2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
option, the interface is used.
2.When no IPV6_PKTINFO ancillary data is received,getsockopt() should
return the sticky option value which set with setsockopt().
RFC 3542:
Issuing getsockopt() for the above options will return the sticky
option value i.e., the value set with setsockopt(). If no sticky
option value has been set getsockopt() will return the following
values:
3.Make the setsockopt implementation POSIX compliant.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ethtool ops to enable and disable GRO. It also
makes GRO depend on RX checksum offload much the same as how TSO
depends on SG support.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the TCP-specific portion of GRO. The criterion for
merging is extremely strict (the TCP header must match exactly apart
from the checksum) so as to allow refragmentation. Otherwise this
is pretty much identical to LRO, except that we support the merging
of ECN packets.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the helper skb_gro_receive to merge packets for
GRO. The current method is to allocate a new header skb and then
chain the original packets to its frag_list. This is done to
make it easier to integrate into the existing GSO framework.
In future as GSO is moved into the drivers, we can undo this and
simply chain the original packets together.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GRO support for IPv4.
The criteria for merging is more stringent than LRO, in particular,
we require all fields in the IP header to be identical except for
the length, ID and checksum. In addition, the ID must form an
arithmetic sequence with a difference of one.
The ID requirement might seem overly strict, however, most hardware
TSO solutions already obey this rule. Linux itself also obeys this
whether GSO is in use or not.
In future we could relax this rule by storing the IDs (or rather
making sure that we don't drop them when pulling the aggregate
skb's tail).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
This is pretty similar to LRO except that this is protocol-independent.
Instead of holding packets in an lro_mgr structure, they're now held in
napi_struct.
For drivers that intend to use this, they can set the NETIF_F_GRO bit and
call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
The latter will call napi_receive_skb automatically. When napi_gro_receive
is used, the driver must either call napi_complete/napi_rx_complete, or
call napi_gro_flush in softirq context if the driver uses the primitives
__napi_complete/__napi_rx_complete.
Protocols will set the gro_receive and gro_complete function pointers in
order to participate in this scheme.
In addition to the packet, gro_receive will get a list of currently held
packets. Each packet in the list has a same_flow field which is non-zero
if it is a potential match for the new packet. For each packet that may
match, they also have a flush field which is non-zero if the held packet
must not be merged with the new packet.
Once gro_receive has determined that the new skb matches a held packet,
the held packet may be processed immediately if the new skb cannot be
merged with it. In this case gro_receive should return the pointer to
the existing skb in gro_list. Otherwise the new skb should be merged into
the existing packet and NULL should be returned, unless the new skb makes
it impossible for any further merges to be made (e.g., FIN packet) where
the merged skb should be returned.
Whenever the skb is merged into an existing entry, the gro_receive
function should set NAPI_GRO_CB(skb)->same_flow. Note that if an skb
merely matches an existing entry but can't be merged with it, then
this shouldn't be set.
If gro_receive finds it pointless to hold the new skb for future merging,
it should set NAPI_GRO_CB(skb)->flush.
Held packets will be flushed by napi_gro_flush which is called by
napi_complete and napi_rx_complete.
Currently held packets are stored in a singly liked list just like LRO.
The list is limited to a maximum of 8 entries. In future, this may be
expanded to use a hash table to allow more flows to be held for merging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows GSO to handle frag_list in a limited way for the
purposes of allowing packets merged by GRO to be refragmented on
output.
Most hardware won't (and aren't expected to) support handling GRO
frag_list packets directly. Therefore we will perform GSO in
software for those cases.
However, for drivers that can support it (such as virtual NICs) we
may not have to segment the packets at all.
Whether the added overhead of GRO/GSO is worthwhile for bridges
and routers when weighed against the benefit of potentially
increasing the MTU within the host is still an open question.
However, for the case of host nodes this is undoubtedly a win.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds limited support for handling frag_list packets in
skb_segment. The intention is to support GRO (Generic Receive Offload)
packets which will be constructed by chaining normal packets using
frag_list.
As such we require all frag_list members terminate on exact MSS
boundaries. This is checked using BUG_ON.
As there should only be one producer in the kernel of such packets,
namely GRO, this requirement should not be difficult to maintain.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
Phonet: keep TX queue disabled when the device is off
SCHED: netem: Correct documentation comment in code.
netfilter: update rwlock initialization for nat_table
netlabel: Compiler warning and NULL pointer dereference fix
e1000e: fix double release of mutex
IA64: HP_SIMETH needs to depend upon NET
netpoll: fix race on poll_list resulting in garbage entry
ipv6: silence log messages for locally generated multicast
sungem: improve ethtool output with internal pcs and serdes
tcp: tcp_vegas cong avoid fix
sungem: Make PCS PHY support partially work again.
dcbml_setnumtcs wasn't checking for the presence of the setnumtcs
function. Instead, it was checking for setstate which was a bug.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netem simulator is no longer limited by Linux timer resolution HZ.
Not since Patrick McHardy changed the QoS system to use hrtimer.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit e099a17357
(netfilter: netns nat: per-netns NAT table) renamed the
nat_table from __nat_table to nat_table without updating the
__RW_LOCK_UNLOCKED(__nat_table.lock).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes unneeded member (skbuff) from
ieee80211_ibss_add_sta() method in its declaration (in ieee80211_i.h)
and its callers (in rx.c and mlme.c)
This patch removes unneeded member from struct ieee80211_rx_data
in ieee80211_i.h.
(Originally posted as two patches. -- JWL)
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
No users, so no reason to have it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
AP mode operations are seriously affected if mac80211 runs through a
multi-second scan while the AP is trying to send Beacon frames on the
operation channel. While this could be implemented in a way that does
not cause too many problems, it is not very simple and will require
synchronization with Beacon frame scheduling in the drivers (scan one
channel at a time between Beacon frames). Furthermore, such scanning
takes quite a bit longer time and existing userspace applications
would be likely to timeout while waiting for the results.
For now, just refuse requests for new scans (SIOCSIWSCAN) when in AP
mode. In practice, this moves the rejection from iwl* drivers into
mac80211 to make it apply to every mac80211-based driver.
This issue shows up in associated stations getting disconnected when
something (e.g., Network Manager) requests a scan while the interface
is in AP mode. When doing this continuously (e.g., NM does it every 120
seconds), the network gets close to useless.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch replaces the newly introduced sta_notify_ps function,
which can be used to notify the driver about every power state
transition for all associated stations, by integrating its functionality
back into the original sta_notify callback.
Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make sure sparse checks endianness when run on mac80211/cfg80211.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's no driver that actually does fragmentation on the
device, and the callback is buggy (when it returns an error,
mac80211's fragmentation status is changed so reading the
frag threshold from userspace reads the new value despite
the error). Let's just remove it, if we really find some
hardware supporting it we can add it back later.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers will support this, obviously, but this forces them to
set it up properly.
(This includes the fix posted as "mac80211: fix ifmodes check" and
tested in wireless-testing by Hin-Tak and others. -- JWL)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix two small bugs with HT frequency setting:
* HT is accepted even when the driver is incapable
* HT40 is accepted when the driver cannot do 40 MHz
(both on the selected band)
Also simplify the code a little.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ieee80211_hw_config can return an error when the hardware
has rfkill enabled. A WARN_ON() is too harsh for this
failure as it is a valid scenario. Only comment this warning
as we would like to have it back when rfkill is integrated into
mac80211.
Also reintroduce propagation of error if ieee80211_hw_config fails
in ieee80211_config_beacon.
This patch partially reverts patch:
5f0387fc3337ca26f0745f945f550f0c3734960f
"mac80211: clean up ieee80211_hw_config errors"
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fix the two compiler warnings show below. Thanks to Geert Uytterhoeven for
finding and reporting the problem.
net/netlabel/netlabel_unlabeled.c:567: warning: 'entry' may be used
uninitialized in this function
net/netlabel/netlabel_unlabeled.c:629: warning: 'entry' may be used
uninitialized in this function
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch "don't call nf_log_packet in NFLOG module" make xt_NFLOG
dependant of nfnetlink_log. This patch forces the dependencies to fix
compilation in case only xt_NFLOG compilation was asked and modifies the
help message accordingly to the change.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This last patch makes the appropriate changes to use and propagate the
network namespace where needed in IPv6 multicast forwarding code.
This consists mainly in replacing all the remaining init_net occurences
with current netns pointer retrieved from sockets, net devices or
mfc6_caches depending on the routines' contexts.
Some routines receive a new 'struct net' parameter to propagate the current
netns:
* ip6mr_get_route
* ip6mr_cache_report
* ip6mr_cache_find
* ip6mr_cache_unresolved
* mif6_add/mif6_delete
* ip6mr_mfc_add/ip6mr_mfc_delete
* ip6mr_reg_vif
All the IPv6 multicast forwarding variables moved to struct netns_ipv6 by
the previous patches are now referenced in the correct namespace.
Changelog:
==========
* Take into account the net associated to mfc6_cache when matching entries in
mfc_unres_queue list.
* Call mroute_clean_tables() in ip6mr_net_exit() to free memory allocated
per-namespace.
* Call dev_net_set() in ip6mr_reg_vif() to initialize dev->nd_net
correctly.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Declare IPv6 multicast forwarding /proc/net entries per-namespace:
/proc/net/ip6_mr_vif
/proc/net/ip6_mr_cache
Changelog
=========
V2:
* In routine ipmr_mfc_seq_idx(), only match entries belonging to current
netns in mfc_unres_queue list.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare variable 'reg_vif_num' per-namespace, moves into struct netns_ipv6.
At the moment, this variable is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare IPv6 multicast forwarding variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv6.
At the moment, these variables are only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Declare variable cache_resolve_queue_len per-namespace: moves it into
struct netns_ipv6.
This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ip6mr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc6_cache.
Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.
At the moment, cache_resolve_queue_len is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Dynamically allocates IPv6 multicast forwarding cache, mfc6_cache_array,
and moves it to struct netns_ipv6.
At the moment, mfc6_cache_array is only referenced in init_net.
Replace 'ARRAY_SIZE(mfc6_cache_array)' with mfc6_cache_array size: MFC6_LINES.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch stores into struct mfc6_cache the network namespace each
mfc6_cache belongs to. The new member is mfc6_net.
mfc6_net is assigned at cache allocation and doesn't change during
the rest of the cache entry life.
This will help to retrieve the current netns around the IPv6 multicast
forwarding code.
At the moment, all mfc6_cache are allocated in init_net.
Changelog:
==========
* Use write_pnet()/read_pnet() to set and get mfc6_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Dynamically allocates interface table vif6_table and moves it to
struct netns_ipv6, and updates MIF_EXISTS() macro.
At the moment, vif6_table is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Preliminary work to make IPv6 multicast forwarding netns-aware.
Make IPv6 multicast forwarding mroute6_socket per-namespace,
moves it into struct netns_ipv6.
At the moment, mroute6_socket is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few months back a race was discused between the netpoll napi service
path, and the fast path through net_rx_action:
http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
A patch was submitted for that bug, but I think we missed a case.
Consider the following scenario:
INITIAL STATE
CPU0 has one napi_struct A on its poll_list
CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
napi_struct A that CPU0 has on its list
CPU0 CPU1
net_rx_action poll_napi
!list_empty (returns true) locks poll_lock for A
poll_one_napi
napi->poll
netif_rx_complete
__napi_complete
(removes A from poll_list)
list_entry(list->next)
In the above scenario, net_rx_action assumes that the per-cpu poll_list is
exclusive to that cpu. netpoll of course violates that, and because the netpoll
path can dequeue from the poll list, its possible for CPU0 to detect a non-empty
list at the top of the while loop in net_rx_action, but have it become empty by
the time it calls list_entry. Since the poll_list isn't surrounded by any other
structure, the returned data from that list_entry call in this situation is
garbage, and any number of crashes can result based on what exactly that garbage
is.
Given that its not fasible for performance reasons to place exclusive locks
arround each cpus poll list to provide that mutal exclusion, I think the best
solution is modify the netpoll path in such a way that we continue to guarantee
that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've
implemented the patch below. It adds an additional bit to the state field in
the napi_struct. When executing napi->poll from the netpoll_path, this bit will
be set. When a driver calls netif_rx_complete, if that bit is set, it will not
remove the napi_struct from the poll_list. That work will be saved for the next
iteration of net_rx_action.
I've tested this and it seems to work well. About the biggest drawback I can
see to it is the fact that it might result in an extra loop through
net_rx_action in the event that the device is actually contended for (i.e. the
netpoll path actually preforms all the needed work no the device, and the call
to net_rx_action winds up doing nothing, except removing the napi_struct from
the poll_list. However I think this is probably a small price to pay, given
that the alternative is a crash.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can skip WARN_ON() in htb_dequeue_tree() because there should be
always a similar warning from htb_lookup_leaf() earlier.
The first WARN_ON() in in htb_lookup_leaf() is changed to BUG_ON()
because most likly this should end with oops anyway.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
htb_id_find_next_upper() is usually called to find a class with next
id after some previously removed class, so let's move a check for
equality to the end: it's the least likely here.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes minor annoyance during transmission of unsolicited
neighbor advertisements from userspace to multicast addresses (as
far as I can see in RFC, this is allowed and the similar functionality
for IPv4 has been in arping for a long time).
Outgoing multicast packets get reinserted into local processing as if they
are received from the network. The machine thus sees its own NA and fills
the logs with error messages. This patch removes the message if NA has been
generated locally.
Signed-off-by: Jan Sembera <jsembera@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Stephen Rothwell points out, we don't want 'sock' here but
rather we really do want 'sk'.
This local var is protected by all sorts of bluetooth debugging
kconfig vars, but BT_DBG() is just a straight pr_debug() call
which is unconditional.
pr_debug() evaluates it's args only if either DEBUG or
CONFIG_DYNAMIC_PRINTK_DEBUG is defined.
Solving this inside of the BT_DBG() macro is non-trivial since
it's varargs. And these ifdefs are ugly.
So, just mark this 'sk' thing __maybe_unused and kill the ifdefs.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses a book-keeping issue in tcp_vegas.c. At present
tcp_vegas does separate book-keeping of cwnd based on packet sequence
numbers. A mismatch can develop between this book-keeping and
tp->snd_cwnd due, for example, to delayed acks acking multiple
packets. When vegas transitions to reno operation (e.g. following
loss), then this mismatch leads to incorrect behaviour (akin to a cwnd
backoff). This seems mostly to affect operation at low cwnds where
delayed acking can lead to a significant fraction of cwnd being
covered by a single ack, leading to the book-keeping mismatch. This
patch modifies the congestion avoidance update to avoid the need for
separate book-keeping while leaving vegas congestion avoidance
functionally unchanged. A secondary advantage of this modification is
that the use of fixed-point (via V_PARAM_SHIFT) and 64 bit arithmetic
is no longer necessary, simplifying the code.
Some example test measurements with the patched code (confirming no functional
change in the congestion avoidance algorithm) can be seen at:
http://www.hamilton.ie/doug/vegaspatch/
Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
tproxy: fixe a possible read from an invalid location in the socket match
zd1211rw: use unaligned safe memcmp() in-place of compare_ether_addr()
mac80211: use unaligned safe memcmp() in-place of compare_ether_addr()
ipw2200: fix netif_*_queue() removal regression
iwlwifi: clean key table in iwl_clear_stations_table function
tcp: tcp_vegas ssthresh bug fix
can: omit received RTR frames for single ID filter lists
ATM: CVE-2008-5079: duplicate listen() on socket corrupts the vcc table
netx-eth: initialize per device spinlock
tcp: make urg+gso work for real this time
enc28j60: Fix sporadic packet loss (corrected again)
hysdn: fix writing outside the field on 64 bits
b1isa: fix b1isa_exit() to really remove registered capi controllers
can: Fix CAN_(EFF|RTR)_FLAG handling in can_filter
Phonet: do not dump addresses from other namespaces
netlabel: Fix a potential NULL pointer dereference
bnx2: Add workaround to handle missed MSI.
xfrm: Fix kernel panic when flush and dump SPD entries
This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, as it now is handled fully dynamically via feature negotiation
(i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
RFC 4341, 4.).
Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type
if (dccp_msk(sk)->dccpms_send_ack_vector)
/* ... */
with
if (dp->dccps_hc_rx_ackvec != NULL)
/* ... */
The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.
The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.
There was a FIXME to change the error code when dccp_ackvec_add() fails.
I removed this after finding out that:
* the check whether ackno < ISN is already made earlier,
* this Response is likely the 1st packet with an Ackno that the client gets,
* so when dccp_ackvec_add() fails, the reason is likely not a packet error.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updating the NDP count feature is handled automatically now:
* for CCID-2 it is disabled, since the code does not use NDP counts;
* for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.
Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).
This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.
At startup the initialisation of the NDP count feature uses the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.
The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs.
Also removed are the constructors for the TX CCID and the RX CCID, since the
switch "rx <-> non-rx" is done by the handler in minisocks.c (and the handler
is the only place in the code where CCIDs are loaded).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code removed by this patch is no longer referenced or used, the added
lines update documentation and copyrights.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This integrates feature-activation in the client:
1. When dccp_parse_options() fails, the reset code is already set; request_sent\
_state_process() currently overrides this with `Packet Error', which is not
intended - changed to use the reset code supplied by dccp_parse_options().
2. When feature negotiation fails, the socket should be marked as not usable,
so that the application is notified that an error occurred. This is achieved
by a new label 'unable_to_proceed': generating an error code of `Aborted',
setting the socket state to CLOSED, returning with ECOMM in sk_err.
3. Avoids parsing the Ack twice in Respond state by not doing option processing
again in dccp_rcv_respond_partopen_state_process (as option processing has
already been done on the request_sock in dccp_check_req).
Since this addresses congestion-control initialisation, a corresponding
FIXME has been removed.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch integrates the activation of features at the end of negotiation
into the server-side code.
Note regarding the removal of 'const':
--------------------------------------
The 'const' attribute has been removed from 'dreq' since dccp_activate_values()
needs to operate on dreq's feature list. Part of the activation is to remove
those options from the list that have already been confirmed, hence it is not
purely read-only.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This first patch out of three replaces the hardcoded default settings with
initialisation code for the dynamic feature negotiation.
The patch also ensures that the client feature-negotiation queue is flushed
only when entering the OPEN state.
Since confirmed Change options are removed as soon as they are confirmed
(in the DCCP-Response), this ensures that Confirm options are retransmitted.
Note on retransmitting Confirm options:
---------------------------------------
Implementation experience showed that it is necessary to retransmit Confirm
options. Thanks to Leandro Melo de Sales who reported a bug in an earlier
revision of the patch set, resulting from not retransmitting these options.
As long as the client is in PARTOPEN, it needs to retransmit the Confirm
options for the Change options received on the DCCP-Response from the server.
Otherwise, if the packet containing the Confirm options gets dropped in the
network, the connection aborts due to undefined feature negotiation state.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last shoot of this series.
After I removing all directly reference of netdev->priv, I am killing
"priv" of "struct net_device" and fixing relative comments/docs.
Anyone will not be allowed to reference netdev->priv directly.
If you want to reference the memory of private data, use netdev_priv()
instead.
If the private data is not allocted when alloc_netdev(), use
netdev->ml_priv to point that memory after you creating that private
data.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIME_WAIT sockets need to be handled specially, and the socket match
casted inet_timewait_sock instances to inet_sock, which are not
compatible.
Handle this special case by checking sk->sk_state.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrect use of loose in wext.c
It should be 'lose', not 'loose'.
Signed-off-by: Nick Andrew <nick@nick-andrew.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since jiffies is unsigned long, the types get expanded into
that and after long enough time the difference will therefore
always be > 1 (and that probably happens near boot as well as
iirc the first jiffies wrap is scheduler close after boot to
find out problems related to that early).
This was originally noted by Bill Fink in Dec'07 but nobody
never ended fixing it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_minshall_update is not significant difference since it only
checks for not full-sized skb which is BUG'ed on the push_one
path anyway.
tcp_snd_test is tcp_nagle_test+tcp_cwnd_test+tcp_snd_wnd_test,
just the order changed slightly.
net/ipv4/tcp_output.c:
tcp_snd_test | -89
tcp_mss_split_point | -91
tcp_may_send_now | +53
tcp_cwnd_validate | -98
tso_fragment | -239
__tcp_push_pending_frames | -1340
tcp_push_one | -146
7 functions changed, 53 bytes added, 2003 bytes removed, diff: -1950
net/ipv4/tcp_output.c:
tcp_write_xmit | +1772
1 function changed, 1772 bytes added, diff: +1772
tcp_output.o.new:
8 functions changed, 1825 bytes added, 2003 bytes removed, diff: -178
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>