This also makes ndisc_opt_addr_data() and ndisc_fill_addr_option()
use ndisc_opt_addr_space().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM_REPLAY_SEQ, XFRM_REPLAY_OSEQ and XFRM_REPLAY_SEQ_MASK
were introduced years ago but actually never used.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
ipv6_addr_is_{multicast,ll_all_nodes,ll_all_routers,isatap}()
return boolean.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be able to implement NS response offloading (in
regular operation or while in WoWLAN) drivers need
to know the IPv6 addresses assigned to interfaces.
Implement an IPv6 notifier in mac80211 to call the
driver when addresses change.
Unlike for IPv4, implement it as a callback rather
than as a list in the BSS configuration, that is
more flexible.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Depending on the driver, having ARP filtering for
some addresses may be possible. Remove the logic
that tracks whether ARP filter is enabled or not
and give the driver the total number of addresses
instead of the length of the list so it can make
its own decision.
Reviewed-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Because of rt->n removal, we do not need neigh argument any more.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are hardwares which support offload of data packets
for example when auto ARP is enabled the hw will send
the ARP response. In such cases if WEP encryption is
configured the hw must know the default WEP key in order
to encrypt the packets correctly.
When hw_accel is enabled and encryption type is set to WEP,
the driver should get the default key index from mac80211.
Signed-off-by: Yoni Divinsky <yoni.divinsky@ti.com>
[cleanups, fixes, documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The hci_request function is blocking and cannot be called through the
usual per-HCI device workqueue (hdev->workqueue). While hci_request is
in progress any other work from the queue, including sending HCI
commands to the controller would be blocked and eventually cause the
hci_request call to time out.
This patch adds a second workqueue to be used by operations needing
hci_request and thereby avoiding issues with blocking other workqueue
users.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RTF_GATEWAY route, return rt->rt6i_gateway.
Otherwise, return 2nd argument (destination address).
This will be used by following patches which remove rt->n
dependency patches in ip6_dst_lookup_tail() and ip6_finish_output2().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function, which looks up neighbour entry for an IPv6 address
without touching refcnt, will be used for patches to remove
dependency on rt->n (neighbour entry in rt6_info).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to set/clear labels assigned to a conntrack
via ctnetlink.
To allow userspace to only alter specific bits, Pablo suggested to add
a new CTA_LABELS_MASK attribute:
The new set of active labels is then determined via
active = (active & ~mask) ^ changeset
i.e., the mask selects those bits in the existing set that should be
changed.
This follows the same method already used by MARK and CONNMARK targets.
Omitting CTA_LABELS_MASK is the same as setting all bits in CTA_LABELS_MASK
to 1: The existing set is replaced by the one from userspace.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.
Up to 128 labels are supported. Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.
Mapping of bit-identifier to label name is done in userspace.
The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix the 64bit optimized version of ipv6_prefix_equal to convert the
bitmask to network byte order only after the bit-shift.
The bug was introduced in:
3867517 ipv6: 64bit version of ipv6_prefix_equal().
Signed-off-by: Fabio Baltieri <fabio.baltieri@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the amount of memory usage limits for incomplete
IP fragments.
Arguing for new thresh high/low values:
High threshold = 4 MBytes
Low threshold = 3 MBytes
The fragmentation memory accounting code, tries to account for the
real memory usage, by measuring both the size of frag queue struct
(inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.
We want to be able to handle/hold-on-to enough fragments, to ensure
good performance, without causing incomplete fragments to hurt
scalability, by causing the number of inet_frag_queue to grow too much
(resulting longer searches for frag queues).
For IPv4, how much memory does the largest frag consume.
Maximum size fragment is 64K, which is approx 44 fragments with
MTU(1500) sized packets. Sizeof(struct ipq) is 200. A 1500 byte
packet results in a truesize of 2944 (not 2048 as I first assumed)
(44*2944)+200 = 129736 bytes
The current default high thresh of 262144 bytes, is obviously
problematic, as only two 64K fragments can fit in the queue at the
same time.
How many 64K fragment can we fit into 4 MBytes:
4*2^20/((44*2944)+200) = 32.34 fragment in queues
An attacker could send a separate/distinct fake fragment packets per
queue, causing us to allocate one inet_frag_queue per packet, and thus
attacking the hash table and its lists.
How many frag queue do we need to store, and given a current hash size
of 64, what is the average list length.
Using one MTU sized fragment per inet_frag_queue, each consuming
(2944+200) 3144 bytes.
4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length
An attack could send small fragments, the smallest packet I could send
resulted in a truesize of 896 bytes (I'm a little surprised by this).
4*2^20/(896+200) = 3827 frag queues -> 59 avg list length
When increasing these number, we also need to followup with
improvements, that is going to help scalability. Simply increasing
the hash size, is not enough as the current implementation does not
have a per hash bucket locking.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While a privileged program can open a raw socket, attach some
restrictive filter and drop its privileges (or send the socket to an
unprivileged program through some Unix socket), the filter can still
be removed or modified by the unprivileged program. This commit adds a
socket option to lock the filter (SO_LOCK_FILTER) preventing any
modification of a socket filter program.
This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even
root is not allowed change/drop the filter.
The state of the lock can be read with getsockopt(). No error is
triggered if the state is not changed. -EPERM is returned when a user
tries to remove the lock or to change/remove the filter while the lock
is active. The check is done directly in sk_attach_filter() and
sk_detach_filter() and does not affect only setsockopt() syscall.
Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3e4e4c1f ("ipv6: Introduce ip6_flow_hdr() to fill version,
tclass and flowlabel.) uses ntohl(), which should be htonl().
Found by Fengguang Wu <fengguang.wu@intel.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To ease further DFS development regarding interface combinations, use
the interface combinations structure to test for radar capabilities.
Drivers can specify which channel widths they support, and in which
modes. Right now only a single AP interface is allowed, but as the
DFS code evolves other combinations can be enabled.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The NL80211_ATTR_USE_MFP attribute was originally added for
NL80211_CMD_ASSOCIATE, but it is actually as useful (if not even more
useful) with NL80211_CMD_CONNECT, so process that attribute with the
connect command, too.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As per email discussion Jouni Malinen pointed out that:
"P2P message exchanges can be executed on the current operating channel
of any operation (both P2P and non-P2P station). These can be on 5 GHz
and even on 60 GHz (so yes, you _can_ do GO Negotiation on 60 GHz).
As an example, it would be possible to receive a GO Negotiation Request
frame on a 5 GHz only radio and then to complete GO Negotiation on that
band. This can happen both when connected to a P2P group (through client
discoverability mechanism) and when connected to a legacy AP (assuming
the station receive Probe Request frame from full scan in the beginning
of P2P device discovery)."
This means that P2P messages can be sent over different radio devices.
However, these should use the same P2P device address so it should be
able to provision this from user-space. This patch adds a parameter for
this to struct vif_params which should only be used during creation of
the P2P device interface.
Cc: Jouni Malinen <j@w1.fi>
Cc: Greg Goldman <ggoldman@broadcom.com>
Cc: Jithu Jance <jithu@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
[add error checking]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the nl80211_mesh_power_mode enumeration which holds possible
values for the mesh power mode. These modes are unknown, active,
light sleep and deep sleep.
Add power_mode entry to the mesh config structure to hold the
user-configured default mesh power mode. This value will be used
for new peer links.
Add the dot11MeshAwakeWindowDuration value to the mesh config.
The awake window is a duration in TU describing how long the STA
will stay awake after transmitting its beacon in PS mode.
Add access routines to:
- get/set local link-specific power mode (STA)
- get remote STA's link-specific power mode (STA)
- get remote STA's non-peer power mode (STA)
- get/set default mesh power mode (mesh config)
- get/set mesh awake window duration (mesh config)
All config changes may be done at mesh runtime and take effect
immediately.
Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Ivan Bezyazychnyy <ivan.bezyazychnyy@gmail.com>
Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>
[fix commit message line length, error handling in set station]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move the default mesh beacon interval and DTIM period to cfg80211
and make them accessible to nl80211. This enables setting both
values when joining an MBSS.
Previously the DTIM parameter was not set by mac80211 so the
driver's default value was used.
Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the driver's resume function can't completely
restore the configuration in the device, it returns
1 from the callback which will be treated like a HW
restart request, but done directly.
In this case, also call the driver's restart_complete()
function so it can finish the reconfiguration there.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When building the 80211 DocBook, scripts/kernel-doc reports
the following type of warnings:
Warning(include/net/cfg80211.h:334): No description found for return value of 'cfg80211_get_chandef_type'
These warnings are only reported when scripts/kernel-doc
runs in verbose mode.
To fix these use "Return:" to describe function return values.
Signed-off-by: Yacine Belkadi <yacine.belkadi.1@gmail.com>
[adjust for freq_reg_info() change]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Conflicts:
Documentation/networking/ip-sysctl.txt
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
Both conflicts were simply overlapping context.
A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
and struct net pointer is not provided in the call chain. His original
patch made use of current->nsproxy->net_ns to find the network namespace,
but this fails to work correctly for userspace code that makes use of
netlink sockets in different network namespaces. Instead, pass the
"struct net *" down along the call chain to where it is needed.
This version removes the ifb changes as Eric has submitted that patch
separately, but is otherwise identical to the previous version.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user is cxgb3 driver.
old_neigh is used to check device change, but it must not happen
on redirect. In this sense, we can remove old_neigh argument.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_prefix_equal() just casts its arguments and it is the only
user of __ipv6_prefix_equal().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce __ipv6_addr_diff64() to to find the first different
bit between two addresses on 64bit architectures.
32bit version is still available as __ipv6_addr_diff32(),
and __ipv6_addr_diff() automatically selects appropriate
version.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reg_notifier()'s return value need not be checked
as it is only supposed to do post regulatory work and
that should never fail. Any behaviour to regulatory
that needs to be considered before cfg80211 does work
to a driver should be specified by using the already
existing flags, the reg_notifier() just does post
processing should it find it needs to.
Also make lbs_reg_notifier static.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[move lbs_reg_notifier to not break compile]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move generalized version of ipv6_is_mld() to header,
and use it from ip6_mc_input().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
canqun zhang reported that we're hitting BUG_ON in the
nf_conntrack_destroy path when calling kfree_skb while
rmmod'ing the nf_conntrack module.
Currently, the nf_ct_destroy hook is being set to NULL in the
destroy path of conntrack.init_net. However, this is a problem
since init_net may be destroyed before any other existing netns
(we cannot assume any specific ordering while releasing existing
netns according to what I read in recent emails).
Thanks to Gao feng for initial patch to address this issue.
Reported-by: canqun zhang <canqunzhang@gmail.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Do not convert endian back and forth.
If the caller uses contant "mask" argument (and most callers do),
we can omit runtime endian conversion here.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each NFC adapter can have several links to different secure elements and
that property needs to be exported by the drivers.
A secure element link can be enabled and disabled, and card emulation will
be handled by the currently active one. Otherwise card emulation will be
host implemented.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some chips diverge from the HCI spec in their implementation of standard
features. This adds a new quirks parameter to
nfc_hci_allocate_device() to let the driver indicate its divergence.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When an adapter is removed, it will unregister itself from hci and/or
nfc core. In order to do that safely, work tasks must first be canceled
and prevented to be scheduled again, before the hci or nfc device can be
destroyed.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Using bit operations solves problems with multiple requests
and clearing state.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This patch removes srej_queue_next from include/net/bluetooth/l2cap.h as it
is not used.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
As suggested by David, udp6_csum_init() is too big to be inlined,
move it to ipv6 static library, net/ipv6/ip6_checksum.c.
And the generic csum_ipv6_magic() too.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
namespace aware. The reason behind this patch is to ease the testing
of ecn problems on the internet and allows applications to tune their
own use of ecn.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TX aggregation is stopped, there are a few
different cases:
- connection with the peer was dropped
- session stop was requested locally
- session stop was requested by the peer
- connection was dropped while a session is stopping
The behaviour in these cases should be different, if
the connection is dropped then the driver should drop
all frames, otherwise the frames may continue to be
transmitted, aggregated in the case of a locally
requested session stop or unaggregated in the case of
the peer requesting session stop.
Split these different cases so that the driver can
act accordingly; however, treat local and remote stop
the same way and ask the driver to not send frames as
aggregated packets any more.
In the case of connection drop, the stop callback the
driver is otherwise supposed to call is no longer
required.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
During suspend/resume channel contexts might be
iterated even if they haven't been re-added to
the driver, keep track of this and skip them in
iteration. Also use the new status for sanity
checks.
Also clarify the fact that during HW restart all
contexts are iterated over (thanks Eliad.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of returning an error and filling a pointer
return the pointer and an ERR_PTR value in error cases.
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will allow making freq_reg_info() lock-free.
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To simplify the locking and not require cfg80211_mutex
(which nl80211 uses to access the global regdomain) and
also to make it possible for drivers to access their
wiphy->regd safely, use RCU to protect these pointers.
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The channel bandwidth handling isn't really quite right,
it assumes that a 40 MHz channel is really two 20 MHz
channels, which isn't strictly true. This is the way the
regulatory database handling is defined right now though
so remove the logic to handle other channel widths.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
The following batch contains Netfilter fixes for 3.8-rc1. They are
a mixture of old bugs that have passed unnoticed (I'll pass these to
stable) and more fresh ones from the previous merge window, they are:
* Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd
showing up wrong address, from Bob Hockney.
* Fix a comment in nf_conntrack_ipv6, from Florent Fourcot.
* Fix a leak an error path in ctnetlink while creating an expectation,
from Jesper Juhl.
* Fix missing ICMP time exceeded in the IPv6 defragmentation code, from
Haibo Xi.
* Fix inconsistent handling of routing changes in MASQUERADE for the
new connections case, from Andrew Collins.
* Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to
crashes in the ixgbe driver (since it seems to access the transport
header with TSO enabled), from Mukund Jampala.
* Recover obsoleted NOTRACK target by including it into the CT and spot
a warning via printk about being obsoleted. Many people don't check the
scheduled to be removal file under Documentation, so we follow some
less agressive approach to kill this in a year or so. Spotted by Florian
Westphal, patch from myself.
* Fix race condition in xt_hashlimit that allows to create two or more
entries, from myself.
* Fix crash if the CT is used due to the recently added facilities to
consult the dying and unconfirmed conntrack lists, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.
That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt
What: xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When: April 2011
Why: Superseded by xt_CT
Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.
Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.
As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking fixes from David Miller:
1) Really fix tuntap SKB use after free bug, from Eric Dumazet.
2) Adjust SKB data pointer to point past the transport header before
calling icmpv6_notify() so that the headers are in the state which
that function expects. From Duan Jiong.
3) Fix ambiguities in the new tuntap multi-queue APIs. From Jason
Wang.
4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.
5) Don't destroy mutex after freeing up device private in mac802154,
fix also from Konstantin Khlebnikov.
6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.
7) SCTP HMAC kconfig rework, from Neil Horman.
8) Fix SCTP jprobes function signature, otherwise things explode, from
Daniel Borkmann.
9) Fix typo in ipv6-offload Makefile variable reference, from Simon
Arlott.
10) Don't fail USBNET open just because remote wakeup isn't supported,
from Oliver Neukum.
11) be2net driver bug fixes from Sathya Perla.
12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
Woodhouse.
13) Fix MTU changing regression in 8139cp driver, from John Greene.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
solos-pci: ensure all TX packets are aligned to 4 bytes
solos-pci: add firmware upgrade support for new models
solos-pci: remove superfluous debug output
solos-pci: add GPIO support for newer versions on Geos board
8139cp: Prevent dev_close/cp_interrupt race on MTU change
net: qmi_wwan: add ZTE MF880
drivers/net: Use of_match_ptr() macro in smsc911x.c
drivers/net: Use of_match_ptr() macro in smc91x.c
ipv6: addrconf.c: remove unnecessary "if"
bridge: Correctly encode addresses when dumping mdb entries
bridge: Do not unregister all PF_BRIDGE rtnl operations
use generic usbnet_manage_power()
usbnet: generic manage_power()
usbnet: handle PM failure gracefully
ksz884x: fix receive polling race condition
qlcnic: update driver version
qlcnic: fix unused variable warnings
net: fec: forbid FEC_PTP on SoCs that do not support
be2net: fix wrong frag_idx reported by RX CQ
be2net: fix be_close() to ensure all events are ack'ed
...
Pull user namespace changes from Eric Biederman:
"While small this set of changes is very significant with respect to
containers in general and user namespaces in particular. The user
space interface is now complete.
This set of changes adds support for unprivileged users to create user
namespaces and as a user namespace root to create other namespaces.
The tyranny of supporting suid root preventing unprivileged users from
using cool new kernel features is broken.
This set of changes completes the work on setns, adding support for
the pid, user, mount namespaces.
This set of changes includes a bunch of basic pid namespace
cleanups/simplifications. Of particular significance is the rework of
the pid namespace cleanup so it no longer requires sending out
tendrils into all kinds of unexpected cleanup paths for operation. At
least one case of broken error handling is fixed by this cleanup.
The files under /proc/<pid>/ns/ have been converted from regular files
to magic symlinks which prevents incorrect caching by the VFS,
ensuring the files always refer to the namespace the process is
currently using and ensuring that the ptrace_mayaccess permission
checks are always applied.
The files under /proc/<pid>/ns/ have been given stable inode numbers
so it is now possible to see if different processes share the same
namespaces.
Through the David Miller's net tree are changes to relax many of the
permission checks in the networking stack to allowing the user
namespace root to usefully use the networking stack. Similar changes
for the mount namespace and the pid namespace are coming through my
tree.
Two small changes to add user namespace support were commited here adn
in David Miller's -net tree so that I could complete the work on the
/proc/<pid>/ns/ files in this tree.
Work remains to make it safe to build user namespaces and 9p, afs,
ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
Kconfig guard remains in place preventing that user namespaces from
being built when any of those filesystems are enabled.
Future design work remains to allow root users outside of the initial
user namespace to mount more than just /proc and /sys."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
proc: Usable inode numbers for the namespace file descriptors.
proc: Fix the namespace inode permission checks.
proc: Generalize proc inode allocation
userns: Allow unprivilged mounts of proc and sysfs
userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
procfs: Print task uids and gids in the userns that opened the proc file
userns: Implement unshare of the user namespace
userns: Implent proc namespace operations
userns: Kill task_user_ns
userns: Make create_new_namespaces take a user_ns parameter
userns: Allow unprivileged use of setns.
userns: Allow unprivileged users to create new namespaces
userns: Allow setting a userns mapping to your current uid.
userns: Allow chown and setgid preservation
userns: Allow unprivileged users to create user namespaces.
userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
userns: fix return value on mntns_install() failure
vfs: Allow unprivileged manipulation of the mount namespace.
vfs: Only support slave subtrees across different user namespaces
vfs: Add a user namespace reference from struct mnt_namespace
...
In (d871bef netfilter: ctnetlink: dump entries from the dying and
unconfirmed lists), we assume that all conntrack objects are
inserted in any of the existing lists. However, template conntrack
objects were not. This results in hitting BUG_ON in the
destroy_conntrack path while removing a rule that uses the CT target.
This patch fixes the situation by adding the template lists, which
is where template conntrack objects reside now.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:
unreferenced object 0xffff88022e8a92c0 (size 1592):
comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
hex dump (first 32 bytes):
0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................
02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
[<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
[<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
[<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
[<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
[<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
[<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
[<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
[<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
[<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
[<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
[<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
[<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
[<ffffffff814cee68>] ip_rcv+0x217/0x267
[<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
[<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82
This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.
This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...
Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.
Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().
A similar approach is taken for dccp by calling dccp_done().
This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function ndisc_redirect_rcv(), the skb->data points to the transport
header, but function icmpv6_notify() need the skb->data points to the
inner IP packet. So before using icmpv6_notify() to propagate redirect,
change skb->data to point the inner IP packet that triggered the sending
of the Redirect, and introduce struct rd_msg to make it easy.
Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
Pull networking changes from David Miller:
1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.
2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.
3) Networking user namespace support from Eric W. Biederman.
4) tuntap/virtio-net multiqueue support by Jason Wang.
5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.
6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.
7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.
8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.
9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.
10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.
11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.
12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.
13) Add "oops_only" mode to netconsole, from Amerigo Wang.
14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.
15) Support PTP in the Tigon3 driver, from Matt Carlson.
16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.
17) Support per-association statistics in SCTP, from Michele
Baldessari.
And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...
These symbols were exported for bonding device by commit 305d552a
("bonding: send IPv6 neighbor advertisement on failover").
It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle
NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by
commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'").
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup changes from Tejun Heo:
"A lot of activities on cgroup side. The big changes are focused on
making cgroup hierarchy handling saner.
- cgroup_rmdir() had peculiar semantics - it allowed cgroup
destruction to be vetoed by individual controllers and tried to
drain refcnt synchronously. The vetoing never worked properly and
caused good deal of contortions in cgroup. memcg was the last
reamining user. Michal Hocko removed the usage and cgroup_rmdir()
path has been simplified significantly. This was done in a
separate branch so that the memcg people can base further memcg
changes on top.
- The above allowed cleaning up cgroup lifecycle management and
implementation of generic cgroup iterators which are used to
improve hierarchy support.
- cgroup_freezer updated to allow migration in and out of a frozen
cgroup and handle hierarchy. If a cgroup is frozen, all descendant
cgroups are frozen.
- netcls_cgroup and netprio_cgroup updated to handle hierarchy
properly.
- Various fixes and cleanups.
- Two merge commits. One to pull in memcg and rmdir cleanups (needed
to build iterators). The other pulled in cgroup/for-3.7-fixes for
device_cgroup fixes so that further device_cgroup patches can be
stacked on top."
Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)
* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
cgroup: update Documentation/cgroups/00-INDEX
cgroup_rm_file: don't delete the uncreated files
cgroup: remove subsystem files when remounting cgroup
cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
cgroup: warn about broken hierarchies only after css_online
cgroup: list_del_init() on removed events
cgroup: fix lockdep warning for event_control
cgroup: move list add after list head initilization
netprio_cgroup: allow nesting and inherit config on cgroup creation
netprio_cgroup: implement netprio[_set]_prio() helpers
netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
netprio_cgroup: reimplement priomap expansion
netprio_cgroup: shorten variable names in extend_netdev_table()
netprio_cgroup: simplify write_priomap()
netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
cgroup: remove obsolete guarantee from cgroup_task_migrate.
cgroup: add cgroup->id
cgroup, cpuset: remove cgroup_subsys->post_clone()
cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
...
With BQL being deployed, we can more likely have following behavior :
We dequeue a packet from qdisc in dequeue_skb(), then we realize target
tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
skb into gso_skb for later.
This shows in stats (tc -s qdisc dev eth0) as requeues.
Problem of these requeues is that high priority packets can not be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.
At 1Gbps speed, a full size TSO packet is 500 us of extra latency.
In some cases, we know that all packets dequeued from a qdisc are
for a particular and known txq :
- If device is non multi queue
- For all MQ/MQPRIO slave qdiscs
This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.
This indeed reduce latencies for high prio packets (or improve fairness
with sfq/fq_codel), and almost remove qdisc 'requeues'.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov reported packet drops for GRE packets since GRO support
was added.
There is a race in gro_cell_poll() because we call napi_complete()
without any synchronization with a concurrent gro_cells_receive()
Once bug was triggered, we queued packets but did not schedule NAPI
poll.
We can fix this issue using the spinlock protected the napi_skbs queue,
as we have to hold it to perform skb dequeue anyway.
As we open-code skb_dequeue(), we no longer need to mask IRQS, as both
producer and consumer run under BH context.
Bug added in commit c9e6bc644e (net: add gro_cells infrastructure)
Reported-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and get 3 dupacks.
Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retranmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
peer.transport_addr_list is currently only protected by sk_sock
which is inpractical to acquire for procfs dumping purposes.
This patch adds RCU protection allowing for the procfs readers to
enter RCU read-side critical sections.
Modification of the list continues to be serialized via sk_lock.
V2: Use list_del_rcu() in sctp_association_free() to be safe
Skip transports marked dead when dumping for procfs
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f0425beda4 "mac80211: retry sending
failed BAR frames later instead of tearing down aggr" caused regression
on rt2x00 hardware (connection hangs). This regression was fixed by
commit be03d4a45c "rt2x00: Don't let
mac80211 send a BAR when an AMPDU subframe fails". But the latter
commit caused yet another problem reported in
https://bugzilla.kernel.org/show_bug.cgi?id=42828#c22
After long discussion in this thread:
http://mid.gmane.org/20121018075615.GA18212@redhat.com
and testing various alternative solutions, which failed on one or other
setup, we have no other good fix for the issues like just revert both
mentioned earlier commits.
To do not affect other hardware which benefit from commit
f0425beda4, instead of reverting it,
introduce flag that when used will restore mac80211 behaviour before
the commit.
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[replaced link with mid.gmane.org that has message-id]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch advertise the MC_FORWARDING status for IPv4 and IPv6.
This field is readonly, only multicast engine in the kernel updates it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
* Remove limitation in the maximum number of supported sets in ipset.
Now ipset automagically increments the number of slots in the array
of sets by 64 new spare slots, from Jozsef Kadlecsik.
* Partially remove the generic queue infrastructure now that ip_queue
is gone. Its only client is nfnetlink_queue now, from Florian
Westphal.
* Add missing attribute policy checkings in ctnetlink, from Florian
Westphal.
* Automagically kill conntrack entries that use the wrong output
interface for the masquerading case in case of routing changes,
from Jozsef Kadlecsik.
* Two patches two improve ct object traceability. Now ct objects are
always placed in any of the existing lists. This allows us to dump
the content of unconfirmed and dying conntracks via ctnetlink as
a way to provide more instrumentation in case you suspect leaks,
from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current SCTP stack is lacking a mechanism to have per association
statistics. This is an implementation modeled after OpenSolaris'
SCTP_GET_ASSOC_STATS.
Userspace part will follow on lksctp if/when there is a general ACK on
this.
V4:
- Move ipackets++ before q->immediate.func() for consistency reasons
- Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
returning bogus RTO values
- return asoc->rto_min when max_obs_rto value has not changed
V3:
- Increase ictrlchunks in sctp_assoc_bh_rcv() as well
- Move ipackets++ to sctp_inq_push()
- return 0 when no rto updates took place since the last call
V2:
- Implement partial retrieval of stat struct to cope for future expansion
- Kill the rtxpackets counter as it cannot be precise anyway
- Rename outseqtsns to outofseqtsns to make it clearer that these are out
of sequence unexpected TSNs
- Move asoc->ipackets++ under a lock to avoid potential miscounts
- Fold asoc->opackets++ into the already existing asoc check
- Kill unneeded (q->asoc) test when increasing rtxchunks
- Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
- Don't count SHUTDOWNs as SACKs
- Move SCTP_GET_ASSOC_STATS to the private space API
- Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
future struct growth
- Move association statistics in their own struct
- Update idupchunks when we send a SACK with dup TSNs
- return min_rto in max_rto when RTO has not changed. Also return the
transport when max_rto last changed.
Signed-off: Michele Baldessari <michele@acksyn.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make code more readable by changing CONF_NO_FCS_RECV which is read
as "No L2CAP FCS option received" to CONF_RECV_NO_FCS which means
"Received L2CAP option NO_FCS". This flag really means that we have
received L2CAP FRAME CHECK SEQUENCE (FCS) OPTION with value "No FCS".
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Before starting quering remote AMP controllers make sure
that there is local active AMP controller.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Some comparisons needs to double negation(!!) in order to make the value
of the field boolean. Add it to the macro makes the code more readable.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
In order to authenticate and configure an incoming SCO connection, the
BT_DEFER_SETUP option was added. This option is intended to defer reply
to Connect Request on SCO sockets.
When a connection is requested, the listening socket is unblocked but
the effective connection setup happens only on first recv. Any send
between accept and recv fails with -ENOTCONN.
Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When the route changes (backup default route, VPNs) which affect a
masqueraded target, the packets were sent out with the outdated source
address. The patch addresses the issue by comparing the outgoing interface
directly with the masqueraded interface in the nat table.
Events are inefficient in this case, because it'd require adding route
events to the network core and then scanning the whole conntrack table
and re-checking the route for all entry.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We used to have several queueing backends, but nowadays only
nfnetlink_queue remains.
In light of this there doesn't seem to be a good reason to
support per-af registering -- just hook up nfnetlink_queue on module
load and remove it on unload.
This means that the userspace BIND/UNBIND_PF commands are now obsolete;
the kernel will ignore them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch modifies the conntrack subsystem so that all existing
allocated conntrack objects can be found in any of the following
places:
* the hash table, this is the typical place for alive conntrack objects.
* the unconfirmed list, this is the place for newly created conntrack objects
that are still traversing the stack.
* the dying list, this is where you can find conntrack objects that are dying
or that should die anytime soon (eg. once the destroy event is delivered to
the conntrackd daemon).
Thus, we make sure that we follow the track for all existing conntrack
objects. This patch, together with some extension of the ctnetlink interface
to dump the content of the dying and unconfirmed lists, will help in case
to debug suspected nf_conn object leaks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
# make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/inet_hashtables.o
...
net/ipv4/inet_hashtables.c:242:7: warning: restricted __portpair degrades to integer
net/ipv4/inet_hashtables.c:242:7: warning: restricted __addrpair degrades to integer
...
Move __portpair/__addrpair from include/net/inet_hashtables.h
to include/net/sock.h where we need them in struct sock_common
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether kernel accepts Router Advertisements.
However the condition itself is repeated in several code locations.
Unify it by introducing 'ipv6_accept_ra()' accessor.
Also, simplify the condition expression, making it more readable.
No semantic change.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68835aba4d (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.
This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.
Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.
Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.
Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.
The namespace check can also be done at last.
This slightly improves TCP/UDP lookup times.
IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.
With help from Ben Hutchings & Joe Perches.
Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes an unused parameter (src_net) from rtnl_create_link()
method and from the method single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not get any
parameter. This was done in commit d40156aa5e by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/exthdrs_core.c
Jesse Gross says:
====================
This series of improvements for 3.8/net-next contains four components:
* Support for modifying IPv6 headers
* Support for matching and setting skb->mark for better integration with
things like iptables
* Ability to recognize the EtherType for RARP packets
* Two small performance enhancements
The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge
conflicts. I left it as is but can do the merge if you want. The conflicts
are:
* ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
exthdrs_core.c. Both should stay.
* A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
after this patch. The IPVS user has two instances of the old constant
name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a BSS struct is updated, the IEs are currently
overwritten or freed. This can lead to races if some
other CPU is accessing the BSS struct and using the
IEs concurrently.
Fix this by always allocating the IEs in a new struct
that holds the data and length and protecting access
to this new struct with RCU.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a driver supports P2P GO powersave, allow it to
set the new feature flags for it and allow userspace
to configure the parameters for it. This can be done
at GO startup and later changed with SET_BSS.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add some information that we have about VHT to radiotap.
This at least lets one see the MCS and NSS information.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some of the chandef checking that we do in cfg80211
to check if a channel is supported or not is also
needed in mac80211, so rework that a bit and export
the functions that are needed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To achieve this, limit the number of retries to
31 (instead of 255) and use the three bits that
are then free for VHT flags.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add support for reporting and calculating VHT MCSes.
Note that I'm not completely sure that the bitrate
calculations are correct, nor that they can't be
simplified.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Convert mac80211 (and where necessary, some drivers a
little bit) to the new channel definition struct.
This will allow extending mac80211 for VHT, which is
currently restricted to channel contexts since there
are no drivers using that which makes it easier. As
I also don't care about VHT for drivers not using the
channel context API, I won't convert the previous API
to VHT support.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change nl80211 to support specifying a VHT (or HT)
using the control channel frequency (as before) and
new attributes for the channel width and first and
second center frequency. The old channel type is of
course still supported for HT.
Also change the cfg80211 channel definition struct
to support these by adding the relevant fields to
it (and removing the _type field.)
This also adds new helper functions:
- cfg80211_chandef_create to create a channel def
struct given the control channel and channel type,
- cfg80211_chandef_identical to check if two channel
definitions are identical
- cfg80211_chandef_compatible to check if the given
channel definitions are compatible, and return the
wider of the two
This isn't entirely complete, but that doesn't matter
until we have a driver using it. In particular, it's
missing
- regulatory checks on the usable bandwidth (if that
even makes sense)
- regulatory TX power (database can't deal with it)
- a proper channel compatibility calculation for the
new channel types
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of passing a channel pointer and channel type
to all functions and driver methods, pass a new channel
definition struct. Right now, this struct contains just
the control channel and channel type, but for VHT this
will change.
Also, add a small inline cfg80211_get_chandef_type() so
that drivers don't need to use the _type field of the
new structure all the time, which will change.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As mwifiex (and mac80211 in the software case) are the
only drivers actually implementing remain-on-channel
with channel type, userspace can't be relying on it.
This is the case, as it's used only for P2P operations
right now.
Rather than adding a flag to tell userspace whether or
not it can actually rely on it, simplify all the code
by removing the ability to use different channel types.
Leave only the validation of the attribute, so that if
we extend it again later (with the needed capability
flag), it can't break userspace sending invalid data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function cfg80211_get_p2p_attr() can fail and returns
a negative error code. However, the return type is unsigned
int. The largest positive number is determined by desired_len
variable in the function, which is u16. So changing the return
type to int to allow easy error checking. Also change the type
for the attribute to enum for improved type checking.
Signed-off-by: Arend van Spriel <arend@broadcom.com>
[fix indentation, don't use u8 attr variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Conflicts:
drivers/net/wireless/iwlwifi/pcie/tx.c
Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.
Signed-off-by: David S. Miller <davem@davemloft.net>
With priomap expansion no longer depending on knowing max id
allocated, netprio_cgroup can use cgroup->id insted of cs->prioidx.
Drop prioidx alloc/free logic and convert all uses to cgroup->id.
* In cgrp_css_alloc(), parent->id test is moved above @cs allocation
to simplify error path.
* In cgrp_css_free(), @cs assignment is made initialization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: David S. Miller <davem@davemloft.net>
Provide drivers with hooks to create debugfs files when
a new station is added. This would help drivers to take
advantage of mac80211's station list infrastructure and not maintain
tedious station management code internally.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
[ifdef inline wrapper functions]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the event that an association exceeds its max_retrans attempts, we should
send an ABORT chunk indicating that we are closing the assocation as a result.
Because of the nature of the error, its unlikely to be received, but its a nice
clean way to close the association if it does make it through, and it will give
anyone watching via tcpdump a clue as to what happened.
Change notes:
v2)
* Removed erroneous changes from sctp_make_violation_parmlen
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.
A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.
This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.
We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.
I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Read Local OOB Data command can take more than 1 second on some chips.
e.g. on CSR 0a12:0001 first call to Read Local OOB Data after reset
takes about 1300ms resulting in tx timeout error.
[27698.368655] Bluetooth: hci0 command 0x0c57 tx timeout
2012-10-31 15:53:36.178585 < HCI Command: Read Local OOB Data (0x03|0x0057) plen 0
2012-10-31 15:53:37.496996 > HCI Event: Command Complete (0x0e) plen 36
Read Local OOB Data (0x03|0x0057) ncmd 1
status 0x00
hash 0x92219d9b447f2aa9dc12dda2ae7bae6a
randomizer 0xb1948d0febe4ea38ce85c4e66313beba
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Drivers (e.g. wl12xx) might need to know the vif
to roc on (mainly in order to configure the
rx filters correctly).
Add the vif to the op params, and update the current
users (iwlwifi) to use the new api.
Signed-off-by: Eliad Peller <eliad@wizery.com>
[fix hwsim]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The NL80211_CMD_TDLS_OPER command was previously used only for userspace
request for the kernel code to perform TDLS operations. However, there
are also cases where the driver may need to request operations from
userspace, e.g., when using security on the AP path. Add a new cfg80211
function for generating a TDLS operation event for drivers to request a
new link to be set up (NL80211_TDLS_SETUP) or an existing link to be
torn down (NL80211_TDLS_TEARDOWN). Drivers can optionally use these
events, e.g., based on noticing data traffic being sent to a peer
station that is seen with good signal strength.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In some cases, in particular for experimentation, it
can be useful to be able to add vendor namespace data
to received frames in addition to the normal radiotap
data.
Allow doing this through mac80211 by adding fields to
the RX status descriptor that describe the data while
the data itself is prepended to the frame.
Also add some example code to hwsim, but don't enable
it because it doesn't use a proper OUI identifier.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Some comments misspell "registered"; this fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The user namespace which creates a new network namespace owns that
namespace and all resources created in it. This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives. Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.
This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace. We don't even have
a previous network namespace.
Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns. One version for when
we compile in network namespace suport and another stub for all other
occasions.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The user namespace which creates a new network namespace owns that
namespace and all resources created in it. This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives. Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.
This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace. We don't even have
a previous network namespace.
Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns. One version for when
we compile in network namespace suport and another stub for all other
occasions.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a callback for the HCI_LE_Set_Advertise_Enable command.
The callback is responsible for updating the HCI_LE_PERIPHERAL flag
updating as well as updating the advertising data flags field to
indicate undirected connectable advertising.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This patch adds support for setting basing LE advertising data. The
three elements supported for now are the advertising flags, the TX power
and the friendly name.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
The core specification defines 127 as the "not available" value (well,
"reserved" for BR/EDR and "not available" for LE - but essentially the
same). Therefore, instead of testing for 0 (which is in fact a valid
value) we should be using this invalid value to test if the tx_power is
available.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
The kernel uses some default metric when routes are managed. For example, a
static route added with a metric set to 0 is inserted in the kernel with
metric 1024 (IP6_RT_PRIO_USER).
It is useful for routing daemons to know these values, to be able to set routes
without interfering with what the kernel does.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the offload callbacks into its own structure.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sing GSO support is now separate, pull it out of the module
and make it its own init call.
Remove the cleanup functions as they are no longer called.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull TCPv6 offload functionality into its won file in preparation
for moving it out of the module.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch IPv6 protocol to using the new GRO/GSO calls and data.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch IPv4 code base to using the new GRO/GSO calls and data.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new data structure for IPv6 protocols that holds GRO/GSO
callbacks and a new array to track the protocols that register GRO/GSO.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new data structure for IPv4 protocols that holds GRO/GSO
callbacks and a new array to track the protocols that register GRO/GSO.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Included is a Bluetooth pull -- Gustavo says:
"These are the Bluetooth bits for inclusion in 3.8, there is basically one big
thing here which is the High Speed patches from Andrei, he did a lot of work on
A2MP and management of AMP devices. The rest are mostly clean up and bug
fixes."
Also included is an NFC pull -- Samuel says:
"With this one we have:
- pn544 p2p support.
- pn544 physical and HCI layers separation. We are getting the pn544 driver
ready to support non i2c physical layers.
- LLCP SNL (Service Name Lookup). This is the NFC p2p service discovery
protocol.
- LLCP datagram sockets (connection less) support.
- IDR library usage for NFC devices indexes assignement.
- NFC netlink extension for setting and getting LLCP link characteristics.
- Various code style fixes and cleanups spread over the pn533, LLCP, HCI and
pn544 code."
There are a couple of mac80211 pulls as well -- Johannes says:
"Please pull my mac80211-next tree to get the first round of new features
for 3.8. We have:
* finally, the mac80211 multi-channel work
* scan improvements:
- bg scan
- scan flush
- forced AP scan
* cfg80211 tracing
* a bit of new code to allow implementing SAE (secure authentication of
equals) in managed mode
Along with a few random improvements, features and fixes."
and...
"Please pull from mac80211-next (per below pull request) to get a few
updates. Most important is probably the fix for the WDS regression that
my previous pull request introduced. Other than that, I have some
tracing code, two mesh updates and a change to allow drivers to
calculate the AES CMAC subkeys without having to implement the GF_mulx
operation themselves."
On top of that are the usual updates to iwlwifi, ath9k, rt2x00,
brcmfmac, mwifiex, and a few others here and there. Of note is the
addition of the ar5523 driver, ported from an original FreeBSD driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
__IPTUNNEL_XMIT() is an ugly macro, convert it to a static
inline function, so make it more readable.
IPTUNNEL_XMIT() is unused, just remove it.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow drivers to indicate their mactime is at RX completion and adjust
for this in mac80211. Also rename the existing RX_FLAG_MACTIME_MPDU to
RX_FLAG_MACTIME_START to clarify its intent. Based on similar code by
Johannes Berg.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
[fix docs, atheros drivers]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The xfrm gc threshold value depends on ip_rt_max_size. This
value was set to INT_MAX with the routing cache removal patch,
so we start doing garbage collecting when we have INT_MAX/2
IPsec routes cached. Fix this by going back to the static
threshold of 1024 routes.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch prepares ipv6_find_hdr() function so that it could be
able to skip routing headers, where segements_left is 0. This is
required to handle multiple routing header case correctly when
changing IPv6 addresses.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net. Based upon a conflict resolution
patch posted by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
Open vSwitch will soon also use ipv6_find_hdr() so this moves it
out of Netfilter-specific code into a more common location.
Signed-off-by: Jesse Gross <jesse@nicira.com>
During hardware restart, all interfaces are iterated even
though they haven't been re-added to the driver, document
this behaviour. The same also happens during resume, which
is even more confusing since all of the interfaces were
previously removed from the driver. Make this optional so
drivers relying on the current behaviour can still use it,
but to let drivers that don't want this behaviour disable
it.
Also convert all API users, keeping the old semantics
except in hwsim, where the new normal ones are desired.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the driver requests a restart (reconfiguration) it
gets all the normal method calls, but can't really tell
why they're happening. Call a new restart_complete op
in the driver when the restart completes, so it could
keep its own state about the restart and clear it there.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6431cbc25f(Create a mechanism for upward inetpeer propagation into routes)
introduces these codes, but this mechanism is never enabled since
rt6i_peer_genid always is zero whether it is not assigned or assigned by
rt6_peer_genid(). After 5943634fc5 (ipv4: Maintain redirect and PMTU info
in struct rtable again), the ipv4 related codes of this mechanism has been
removed, I think we maybe able to remove them now.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While connected to a GO, parse the P2P NoA attribute
and pass the CT Window and opportunistic powersave
parameters to the driver.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Parsing the P2P attributes can be tricky as their
contents can be split across multiple (vendor) IEs.
Thus, it's not possible to parse them like IEs (by
returning a pointer to the data.) Instead, provide
a function that copies the attribute data into a
caller-provided buffer and returns the size needed
(useful in case the buffer was too small.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The commit:
commit 5e760230e4
Author: Johannes Berg <johannes.berg@intel.com>
Date: Fri Nov 4 11:18:17 2011 +0100
cfg80211: allow registering to beacons
allowed only a single process to register for beacon events
per wiphy. This breaks cases where a user may want two or
more VIFs on a wiphy and run a seperate hostapd process on
each vif.
This patch allows multiple beacon listeners, fixing the
regression.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This command triggers a new callback: set_mcast_rate(). It enables
the user to change the rate used to send multicast frames for vif
configured as IBSS or MESH_POINT
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As suggested by Eric, we could introduce a helper function
for ipv6 too, to avoid checking if rt is NULL before
dst_release().
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can save a test in ip_rt_put(), considering dst_release() accepts
a NULL parameter, and dst is first element in rtable.
Add a BUILD_BUG_ON() to catch any change that could break this
assertion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of points in the sctp_cmd_interpreter function treat the sctp_cmd_t arg as
a void pointer, even though they are written as various other types. Theres no
need for this as doing so just leads to possible type-punning issues that could
cause crashes, and if we remain type-consistent we can actually just remove the
void * member of the union entirely.
Change Notes:
v2)
* Dropped chunk that modified SCTP_NULL to create a marker pattern
should anyone try to use a SCTP_NULL() assigned sctp_arg_t, Assigning
to .zero provides the same effect and should be faster, per Vlad Y.
v3)
* Reverted part of V2, opting to use memset instead of .zero, so that
the entire union is initalized thus avoiding the i164 speculative load
problems previously encountered, per Dave M.. Also rewrote
SCTP_[NO]FORCE so as to use common infrastructure a little more
Signed-off-by: Neil Horman <nhorman@tuxdriver.com
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
For passive TCP connections using TCP_DEFER_ACCEPT facility,
we incorrectly increment req->retrans each time timeout triggers
while no SYNACK is sent.
SYNACK are not sent for TCP_DEFER_ACCEPT that were established (for
which we received the ACK from client). Only the last SYNACK is sent
so that we can receive again an ACK from client, to move the req into
accept queue. We plan to change this later to avoid the useless
retransmit (and potential problem as this SYNACK could be lost)
TCP_INFO later gives wrong information to user, claiming imaginary
retransmits.
Decouple req->retrans field into two independent fields :
num_retrans : number of retransmit
num_timeout : number of timeouts
num_timeout is the counter that is incremented at each timeout,
regardless of actual SYNACK being sent or not, and used to
compute the exponential timeout.
Introduce inet_rtx_syn_ack() helper to increment num_retrans
only if ->rtx_syn_ack() succeeded.
Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
when we re-send a SYNACK in answer to a (retransmitted) SYN.
Prior to this patch, we were not counting these retransmits.
Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
only if a synack packet was successfully queued.
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The actual parameter order of hci_get_route is (dst, src) and not (src,
dst). All current callers use the right order but the header file shows
the parameters in the wrong order.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Since we have started to use local_amp_id for local AMP
Controller Id it makes sense to rename ctrl_id to remote_amp_id
since it represents remote AMP controller Id.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving HCI Phylink Complete event run amp_physical_cfm
which initialize BR/EDR L2CAP channel associated with High Speed
link and run l2cap_physical_cfm which shall send L2CAP Create
Chan Request.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Disconnect logical link for high speed channel hs_hchan
associated with L2CAP channel chan.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
After physical link is created logical link needs to be created.
The process starts after L2CAP channel is created and L2CAP
Configuration Response with result PENDING is received.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This patch fixes sending an unnecessary HCI_Write_SSP_Mode command if
the command has already been sent as part of the default HCI init
sequence.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This patch adds a flag to be used for LE GAP Peripheral role. In this
role undirected advertising will be enabled and operations not allowed
in Peripheral role (such as scanning and initiating connections) will be
disallowed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
This patch adds missing feature test macros needed for various use cases
and also sorts the macros according to the feature bit location in the
feature mask (for easy lookup).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Even before channel contexts/multi-channel, having a
single global TX power limit was already problematic,
in particular if two managed interfaces connected to
two APs with different power constraints. The channel
context introduction completely broke this though and
in fact I had disabled TX power configuration there
for drivers using channel contexts.
Change everything to track TX power per interface so
that different user settings and different channel
maxima are treated correctly. Also continue tracking
the global TX power though for compatibility with
applications that attempt to configure the wiphy's
TX power globally.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The TX power setting is currently per wiphy (hardware
device) but with multi-channel capabilities that doesn't
make much sense any more.
Allow drivers (and mac80211) to advertise support for
per-interface TX power configuration. When the TX power
is configured for the wiphy, the wdev will be NULL and
the driver can still handle that, but when a wdev is
given the TX power can be set only for that wdev now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With this one we have:
- pn544 p2p support.
- pn544 physical and HCI layers separation. We are getting the pn544 driver
ready to support non i2c physical layers.
- LLCP SNL (Service Name Lookup). This is the NFC p2p service discovery
protocol.
- LLCP datagram sockets (connection less) support.
- IDR library usage for NFC devices indexes assignement.
- NFC netlink extension for setting and getting LLCP link characteristics.
- Various code style fixes and cleanups spread over the pn533, LLCP, HCI and
pn544 code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQjcPOAAoJEIqAPN1PVmxKLQUP/3rkHa4enQTyBMU326z+OE6B
MtY0Xg6C5U5mksNiqAelkKPMOYy3nuoLcx2sZ9ltgJfG6qCNSV273Hr109JFlhI+
qYAybb+VOb4MqcbkUcMFzO+CBGfcDaGhJ4ibjPAQCgBxIOjKqFzbLVZ4K/xf+GUj
eTEeEryJYb+jFt/RSMXzoHipEM30ROLa9VRvoyxL6m0/3uRrxK/NPVbOxPRHaJqa
/oNJxC2BLPb7t4V2iBOzqChFWP2tknmVWzCuI+X0484hH2BGhm4dUzlKXEu286U9
iMueUv0vx68IpKXgzTStgfeIzxA84ufVWV5d+rPo4iu4HbCth+Fsk6Lds2nqmGAn
QoEYhds0r2CrPnuWN7zP+cc/mXTDHDrPj8pf9iC/KdMSe2bUg0kHOyF08A9zLNh0
nOr2DSzYQ9IqgvWtrAUXdfR2QaFHFsb/+pXWKZsE8+PHxXR/sv7mGPD2OCR1TAqx
G1OmOtHHvnwTlxHpNhIOuH35gfi5HPvPCZ9in5fnN8n0eaX3JyZw96fk5QUqYJ3w
UQBAsagMnWRJUTUYYKyG+IjxkGrdNjmLRQQScYuKktZ07/PL4T5t3RpSjxzlMFwc
ImdOdPntHXI2d+m874yCRAhwdp6vTfpgJl4V4XuON13DPbVO6qBqtNGxhYMqJFWF
cxqrWdaPydmFKy/KLNwF
=OC0R
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-3.0
This is the first NFC pull request for 3.8
With this one we have:
- pn544 p2p support.
- pn544 physical and HCI layers separation. We are getting the pn544 driver
ready to support non i2c physical layers.
- LLCP SNL (Service Name Lookup). This is the NFC p2p service discovery
protocol.
- LLCP datagram sockets (connection less) support.
- IDR library usage for NFC devices indexes assignement.
- NFC netlink extension for setting and getting LLCP link characteristics.
- Various code style fixes and cleanups spread over the pn533, LLCP, HCI and
pn544 code.
A number of places in the mesh code don't check that
the frame data is present and in the skb header when
trying to access. Add those checks and the necessary
pskb_may_pull() calls. This prevents accessing data
that doesn't actually exist.
To do this, export ieee80211_get_mesh_hdrlen() to be
able to use it in mac80211.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
The following changeset contains updates for IPVS from Jesper Dangaard
Brouer that did not reach the previous merge window in time.
More specifically, updates to improve IPv6 support in IPVS. More
relevantly, some of the existing code performed wrong handling of the
extensions headers and better fragmentation handling.
Jesper promised more follow-up patches to refine this after this batch
hits net-next. Yet to come.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As a consequence the NFC device IDs won't be increasing all the time,
as IDR provides the first available ID.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The driver now has all HCI stuff isolated in one file, and all the
hardware link specifics in another. Writing a pn544 driver on top of
another hardware link is now just a matter of adding a new file for that
new hardware specifics.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Set the local general bytes and default value for NFCIP1
Target/Initiator registries if the protocol is NFC-DEP
Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Depending on the driver, a lot of setup may be
necessary to start operating as an AP, some of
which may fail. Add an explicit AP start driver
method to make such failures easier to handle,
and add an AP stop driver method for symmetry.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
sock_update_classid() assumes that the update operation always are
applied on the current task. sock_update_classid() needs to know on
which tasks to work on in order to be able to migrate task between
cgroups using the struct cgroup_subsys attach() callback.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Joe Perches <joe@perches.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <cgroups@vger.kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The classid type used throughout the kernel is u32.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <cgroups@vger.kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to
generate cookie values when establishing new connections via two build time
config options. Theres no real reason to make this a static selection. We can
add a sysctl that allows for the dynamic selection of these algorithms at run
time, with the default value determined by the corresponding crypto library
availability.
This comes in handy when, for example running a system in FIPS mode, where use
of md5 is disallowed, but SHA1 is permitted.
Note: This new sysctl has no corresponding socket option to select the cookie
hmac algorithm. I chose not to implement that intentionally, as RFC 6458
contains no option for this value, and I opted not to pollute the socket option
namespace.
Change notes:
v2)
* Updated subject to have the proper sctp prefix as per Dave M.
* Replaced deafult selection options with new options that allow
developers to explicitly select available hmac algs at build time
as per suggestion by Vlad Y.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the reading of the LE advertising channel TX power to
the HCI init sequence of LE-capable controllers. This data will be used
e.g. for inclusion in the advertising data packets when advertising is
enabled.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Channel moves are triggered by changes to the BT_CHANNEL_POLICY
sockopt when an ERTM or streaming-mode channel is connected.
Moves are only started if enable_hs is true.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
The move response command includes a result code indicating
"pending", "success", or "failure" status. A pending result is
received when the remote address is still setting up a physical link,
and will be followed by success or failure. On success, logical link
setup will proceed. On failure, the move is stopped. The receiver of
a move channel response must always follow up by sending a move
channel confirm command.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
On an AMP controller, hci_chan maps to a logical link. When a channel
is being moved, the logical link may or may not be connected already.
The hci_chan->state is used to determine the existance of a useable
logical link so the link can be either used or requested.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
An L2CAP channel using high speed continues to be associated with a
BR/EDR l2cap_conn, while also tracking an additional hci_conn
(representing a physical link on a high speed controller) and hci_chan
(representing a logical link). There may only be one physical link
between two high speed controllers. Each physical link may contain
several logical links, with each logical link representing a channel
with specific quality of service.
During a channel move, the destination channel id, current move state,
and role (initiator vs. responder) are tracked and used by the channel
move state machine. The ident value associated with a move request
must also be stored in order to use it in later move responses.
The active channel is stored in local_amp_id.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Expose a function for the AES-CMAC subkey calculation
to drivers. This is the first step of the AES-CMAC
cipher key setup and may be required for CMAC hardware
offloading.
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Each nexthop is added like a single route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considering as ECMP routes. They are linked together, through a list called
rt6i_siblings.
ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after
the other (in both case, the flag NLM_F_EXCL should not be set).
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix two build error introduced by commit 63dca2c0:
"ipvs: Fix faulty IPv6 extension header handling in IPVS"
First build error was fairly trivial and can occur, when
CONFIG_IP_VS_IPV6 is disabled.
The second build error was tricky, and can occur when deselecting
both all Netfilter and IPVS, but selecting CONFIG_IPV6. This is
caused by "kernel/sysctl_binary.c" including "net/ip_vs.h", which
includes "linux/netfilter_ipv6/ip6_tables.h" causing include
of "include/linux/netfilter/x_tables.h" which then cannot find
the typedef nf_hookfn.
Fix this by only including "linux/netfilter_ipv6/ip6_tables.h" in
case of CONFIG_IP_VS_IPV6 as its already used to guard the usage
of ipv6_find_hdr().
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pull updates from Jesper Dangaard Brouer for IPVS mostly targeted
to improve IPv6 support (7 commits):
ipvs: Trivial changes, use compressed IPv6 address in output
ipvs: IPv6 extend ICMPv6 handling for future types
ipvs: Use config macro IS_ENABLED()
ipvs: Fix faulty IPv6 extension header handling in IPVS
ipvs: Complete IPv6 fragment handling for IPVS
ipvs: API change to avoid rescan of IPv6 exthdr
ipvs: SIP fragment handling
The struct sock *other one seem to be unused. Grep and make do not object.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When assigning amp_mgr in hci_conn (type AMP_LINK) get also reference.
In hci_conn_del those references would be put for both conn types
AMP_LINK and ACL_LINK associated with amp_mgr.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Initialization of beacon transmission in IBSS mode depends
on whether a new BSS is being created or joined. When joining
an existing IBSS network, beaconing has to start only after
a TSF-sync has happened - this is explained in 11.1.4.
Introduce a new parameter in the BSS information structure to
indicate creator/joiner mode.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add an NL80211_SCAN_FLAG_FLUSH flag that causes old bss cache
entries to be flushed on scan completion. This is useful for
collecting guaranteed fresh scan/survey result (e.g. on resume).
For normal scan, flushing only happens on successful completion
of a scan; i.e. it does not happen if the scan is aborted.
For scheduled scan, previous scan results are flushed everytime
when we get new scan results.
This feature is enabled by default. Drivers can disable it by
unsetting the NL80211_FEATURE_SCAN_FLUSH flag.
Signed-off-by: Sam Leffler <sleffler@chromium.org>
Tested-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
[invert polarity of feature flag to account for old kernels]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a flags word to direct and scheduled scan requests; it will
be used for control of optional behaviours such as flushing the
bss cache prior to doing a scan.
Signed-off-by: Sam Leffler <sleffler@chromium.org>
Tested-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Save the AP's VHT capabilities (in managed
mode) and make them available to the driver
in the station information.
Unlike HT capabilities, they aren't restricted
to the common capabilities, so drivers must be
aware of their own capabilities.
Signed-off-by: Mahesh Palivela <maheshp@posedge.com>
[fix endian conversion bug ...]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
SAE extends Authentication frames with fields that are not information
elements. NL80211_ATTR_IE is not suitable for these, so introduce a new
attribute that can be used to specify the fields needed for SAE in
station mode.
Signed-off-by: Jouni Malinen <j@w1.fi>
[change to verify that SAE is only used with authenticate command]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers may need to iterate the active channel
contexts, export an iterator function to allow
that. To make it possible, use RCU-safe list
functions.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On each channel that the device is operating on, it
may need to listen using one or more chains depending
on the SMPS settings of the interfaces using it. The
previous channel context changes completely removed
this ability (before, it was available as the SMPS
mode).
Add per-context tracking of the required static and
dynamic RX chains and notify the driver on changes.
To achieve this, track the chains and SMPS mode used
on each virtual interface and update the channel
context whenever this changes.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce channel context driver methods. The channel
on a context channel is immutable, but the channel type
and other properties can change.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Channel context are the foundation for multi-channel
operation. They are are immutable and are re-created
(or re-used if other interfaces are bound to a certain
channel and a compatible channel type) on channel
switching.
This is an initial implementation and more features
will come in separate patches.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[some changes including RCU protection]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Avoid situation when we are on associate state in mac80211 and
on disassociate state in cfg80211. This can results on crash
during modules unload (like showed on this thread:
http://marc.info/?t=134373976300001&r=1&w=2) and possibly other
problems.
Reported-by: Pedro Francisco <pedrogfrancisco@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Check flags type in switch statement and handle new frame
type ACL_COMPLETE used for High Speed data over AMP.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When DEFER_SETUP is set defer() will trigger an authorization
request to the userspace.
l2cap_chan_no_defer() is meant to be used when one does not want to
support DEFER_SETUP (A2MP for example).
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Handle AMP link when setting up disconnect timeout.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
hci_chan will be identified by handle used in logical link creation
process. This handle is used in AMP ACL-U packet handle field.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When AMP_LINK timeouts execute HCI_OP_DISCONN_PHY_LINK as analog to
HCI_OP_DISCONNECT for ACL_LINK.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.
The returned route can be cached as follows:
- in NH exception: because the cached routes are not shared
with other destinations
- in FIB NH: when using gateway because all destinations for
NH share same gateway
As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two Flush Timeouts: one is old Flush Timeot Option
which is 2 octets and the second is Flush Timeout inside EFS
which is 4 octets long.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Maximum PDU size is defined by new BT Spec as 1492 octets.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Reviewed-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Add direction parameter to phylink_add since it is anyway set later.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Add ctrl_id parameter to amp_ctrl_add since we always set it
after function ctrl is created.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Pull networking changes from David Miller:
"The most important bit in here is the fix for input route caching from
Eric Dumazet, it's a shame we couldn't fully analyze this in time for
3.6 as it's a 3.6 regression introduced by the routing cache removal.
Anyways, will send quickly to -stable after you pull this in.
Other changes of note:
1) Fix lockdep splats in team and bonding, from Eric Dumazet.
2) IPV6 adds link local route even when there is no link local
address, from Nicolas Dichtel.
3) Fix ixgbe PTP implementation, from Jacob Keller.
4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.
5) MAC length computed improperly in VLAN demux, from Antonio
Quartulli."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
Remove noisy printks from llcp_sock_connect
tipc: prevent dropped connections due to rcvbuf overflow
silence some noisy printks in irda
team: set qdisc_tx_busylock to avoid LOCKDEP splat
bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
sctp: check src addr when processing SACK to update transport state
sctp: fix a typo in prototype of __sctp_rcv_lookup()
ipv4: add a fib_type to fib_info
can: mpc5xxx_can: fix section type conflict
can: peak_pcmcia: fix error return code
can: peak_pci: fix error return code
cxgb4: Fix build error due to missing linux/vmalloc.h include.
bnx2x: fix ring size for 10G functions
cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
ixgbe: add support for X540-AT1
ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
ixgbe: fix PTP ethtool timestamping function
ixgbe: (PTP) Fix PPS interrupt code
ixgbe: Fix PTP X540 SDP alignment code for PPS signal
...
Suppose we have an SCTP connection with two paths. After connection is
established, path1 is not available, thus this path is marked as inactive. Then
traffic goes through path2, but for some reasons packets are delayed (after
rto.max). Because packets are delayed, the retransmit mechanism will switch
again to path1. At this time, we receive a delayed SACK from path2. When we
update the state of the path in sctp_check_transmitted(), we do not take into
account the source address of the SACK, hence we update the wrong path.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit d2d68ba9fe (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.
This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.
David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.
With help from Julian Anastasov who provided very helpful
hints, reproduced here :
<quote>
Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:
broadcast 192.168.0.255 dev DEV proto kernel scope link src
192.168.0.1
192.168.0.0/24 dev DEV proto kernel scope link src 192.168.0.1
local 192.168.0.1 dev DEV proto kernel scope host src 192.168.0.1
The "dev DEV proto kernel scope link src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.
And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.
</quote>
Many thanks to Chris Clayton for his patience and help.
Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
Pull cgroup updates from Tejun Heo:
- xattr support added. The implementation is shared with tmpfs. The
usage is restricted and intended to be used to manage per-cgroup
metadata by system software. tmpfs changes are routed through this
branch with Hugh's permission.
- cgroup subsystem ID handling simplified.
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
cgroup: Assign subsystem IDs during compile time
cgroup: Do not depend on a given order when populating the subsys array
cgroup: Wrap subsystem selection macro
cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
cgroup: net_prio: Do not define task_netpioidx() when not selected
cgroup: net_cls: Do not define task_cls_classid() when not selected
cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
xattr: mark variable as uninitialized to make both gcc and smatch happy
fs: add missing documentation to simple_xattr functions
cgroup: add documentation on extended attributes usage
cgroup: rename subsys_bits to subsys_mask
cgroup: add xattr support
cgroup: revise how we re-populate root directory
xattr: extract simple_xattr code from tmpfs
Add GRO capability to IPv4 GRE tunnels, using the gro_cells
infrastructure.
Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and
checking GRO is building large packets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new include file (include/net/gro_cells.h), to bring GRO
(Generic Receive Offload) capability to tunnels, in a modular way.
Because tunnels receive path is lockless, and GRO adds a serialization
using a napi_struct, I chose to add an array of up to
DEFAULT_MAX_NUM_RSS_QUEUES cells, so that multi queue devices wont be
slowed down because of GRO layer.
skb_get_rx_queue() is used as selector.
In the future, we might add optional fanout capabilities, using rxhash
for example.
With help from Ben Hutchings who reminded me
netif_get_num_default_rss_queues() function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we skipped the merge window for 3.6-rc1 for the tty tree, everything
is now settled down and working properly, so we are ready for 3.7-rc1.
Here's the patchset, it's big, but the large changes are removing a
firmware file and adding a staging tty driver (it depended on the tty
core changes, so it's going through this tree instead of the staging
tree.)
All of these patches have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlBp36oACgkQMUfUDdst+yk4WgCdEy13hot8fI2Lqnc7W0LKu7GX
4p8AoLTjzrXhLosxdijskDQ9X1OtjrxU
=S5Ng
-----END PGP SIGNATURE-----
Merge tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY changes from Greg Kroah-Hartman:
"As we skipped the merge window for 3.6-rc1 for the tty tree,
everything is now settled down and working properly, so we are ready
for 3.7-rc1. Here's the patchset, it's big, but the large changes are
removing a firmware file and adding a staging tty driver (it depended
on the tty core changes, so it's going through this tree instead of
the staging tree.)
All of these patches have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fix up more-or-less trivial conflicts in
- drivers/char/pcmcia/synclink_cs.c:
tty NULL dereference fix vs tty_port_cts_enabled() helper function
- drivers/staging/{Kconfig,Makefile}:
add-add conflict (dgrp driver added close to other staging drivers)
- drivers/staging/ipack/devices/ipoctal.c:
"split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device"
* tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits)
tty/serial: Add kgdb_nmi driver
tty/serial/amba-pl011: Quiesce interrupts in poll_get_char
tty/serial/amba-pl011: Implement poll_init callback
tty/serial/core: Introduce poll_init callback
kdb: Turn KGDB_KDB=n stubs into static inlines
kdb: Implement disable_nmi command
kernel/debug: Mask KGDB NMI upon entry
serial: pl011: handle corruption at high clock speeds
serial: sccnxp: Make 'default' choice in switch last
serial: sccnxp: Remove mask termios caps for SW flow control
serial: sccnxp: Report actual baudrate back to core
serial: samsung: Add poll_get_char & poll_put_char
Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate
Powerpc 8xx CPM_UART maxidl should not depend on fifo size
Powerpc 8xx CPM_UART too many interrupts
Powerpc 8xx CPM_UART desynchronisation
serial: set correct baud_base for EXSYS EX-41092 Dual 16950
serial: omap: fix the reciever line error case
8250: blacklist Winbond CIR port
8250_pnp: do pnp probe before legacy probe
...
This patch removes core_data member from hci_dev struct as it is unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
John W. Linville says:
====================
Here is another batch of updates intended for 3.7...
Highlights include an hci_connect re-write in Bluetooth, HCI/LLC
layer separation in NFC, removal of the raw pn544 NFC driver, NFC LLCP
raw sockets support, improved IBSS auth frame handling in mac80211,
full-MAC AP mode notification support in mac80211, a lot of attention
paid to brcmfmac, and the usual level of updates to iwlwifi, ath9k,
mwifiex, and rt2x00, and various other updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the number of times we scan/skip the IPv6 exthdrs.
This patch contains a lot of API changes. This is done, to avoid
repeating the scan of finding the IPv6 headers, via ipv6_find_hdr(),
which is called by ip_vs_fill_iph_skb().
Finding the IPv6 headers is done as early as possible, and passed on
as a pointer "struct ip_vs_iphdr *" to the affected functions.
This patch reduce/removes 19 calls to ip_vs_fill_iph_skb().
Notice, I have choosen, not to change the API of function
pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
used by external schedulers, via {un,}register_ip_vs_scheduler.
Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
they do, they are only interested in iph->{s,d}addr.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
IPVS now supports fragmented packets, with support from nf_conntrack_reasm.c
Based on patch from: Hans Schillstrom.
IPVS do like conntrack i.e. use the skb->nfct_reasm
(i.e. when all fragments is collected, nf_ct_frag6_output()
starts a "re-play" of all fragments into the interrupted
PREROUTING chain at prio -399 (NF_IP6_PRI_CONNTRACK_DEFRAG+1)
with nfct_reasm pointing to the assembled packet.)
Notice, module nf_defrag_ipv6 must be loaded for this to work.
Report unhandled fragments, and recommend user to load nf_defrag_ipv6.
To handle fw-mark for fragments. Add a new IPVS hook into prerouting
chain at prio -99 (NF_IP6_PRI_NAT_DST+1) to catch fragments, and copy
fw-mark info from the first packet with an upper layer header.
IPv6 fragment handling should be the last thing on the IPVS IPv6
missing support list.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
IPv6 packets can contain extension headers, thus its wrong to assume
that the transport/upper-layer header, starts right after (struct
ipv6hdr) the IPv6 header. IPVS uses this false assumption, and will
write SNAT & DNAT modifications at a fixed pos which will corrupt the
message.
To fix this, proper header position must be found before modifying
packets. Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr()
to skip the exthdrs. It finds (1) the transport header offset, (2) the
protocol, and (3) detects if the packet is a fragment.
Note, that fragments in IPv6 is represented via an exthdr. Thus, this
is detected while skipping through the exthdrs.
This patch depends on commit 84018f55a:
"netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
This also adds a dependency to ip6_tables.
Originally based on patch from: Hans Schillstrom
kABI notes:
Changing struct ip_vs_iphdr is a potential minor kABI breaker,
because external modules can be compiled with another version of
this struct. This should not matter, as they would most-likely
be using a compiled-in version of ip_vs_fill_iphdr(). When
recompiled, they will notice ip_vs_fill_iphdr() no longer exists,
and they have to used ip_vs_fill_iph_skb() instead.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Cleanup patch.
Use the IS_ENABLED macro, instead of having to check
both the build and the module CONFIG_ option.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Have not converted the proc file output to compressed IPv6 addresses.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
It seems sk_init() has no value today and even does strange things :
# grep . /proc/sys/net/core/?mem_*
/proc/sys/net/core/rmem_default:212992
/proc/sys/net/core/rmem_max:131071
/proc/sys/net/core/wmem_default:212992
/proc/sys/net/core/wmem_max:131071
We can remove it completely.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux tunnels were written before RFC6040 and therefore never
implemented the corner case of ECN getting set in the outer header
and the inner header not being ready for it.
Section 4.2. Default Tunnel Egress Behaviour.
o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
propagate any other ECN codepoint onwards. This is because the
inner Not-ECT marking is set by transports that rely on dropped
packets as an indication of congestion and would not understand or
respond to any other ECN codepoint [RFC4774]. Specifically:
* If the inner ECN field is Not-ECT and the outer ECN field is
CE, the decapsulator MUST drop the packet.
* If the inner ECN field is Not-ECT and the outer ECN field is
Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
outgoing packet with the ECN field cleared to Not-ECT.
This patch moves the ECN decap logic out of the individual tunnels
into a common place.
It also adds logging to allow detecting broken systems that
set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header).
Overloads rx_frame_error to keep track of ECN related error.
Thanks to Chris Wright who caught this while reviewing the new VXLAN
tunnel.
This code was tested by injecting faulty logic in other end GRE
to send incorrectly encapsulated packets.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
batostr is not needed anymore since for printing Bluetooth
addresses we use %pMR specifier.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving HCI Command Status event for Accept Physical Link
execute HCI Write Remote AMP Assoc with data saved from A2MP Create
Physical Link Request.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving A2MP Create Physical Link message execute HCI
Accept Physical Link command to AMP controller.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Channel Selected event indicates that link information data is available.
Read it with Read Local AMP Assoc command. The data shall be sent in the
A2MP Create Physical Link Request.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When there is no remote AMP controller found fallback to normal
L2CAP sequence.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving HCI Command Status after HCI Create Physical Link
execute HCI Write Remote AMP Assoc command to AMP controller.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving A2MP Get AMP Assoc Response execute HCI Create Physical
Link to AMP controller. Define function which will run when receiving
HCI Command Status.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Function calculates AMP keys using hmac_sha256 helper. Calculated keys
are Generic AMP Link Key (gamp) and Dedicated AMP Link Key with
keyID "802b" for 802.11 PAL.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Choose which L2CAP connection to establish by checking support
for HS and remote side supported features.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Create remote AMP controllers structure. It is used to keep information
about discovered remote AMP controllers by A2MP protocol.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Define physical link structures. Physical links are represented by
hci_conn structure. For BR/EDR we use type ACL_LINK and for AMP
we use AMP_LINK.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving A2MP Get AMP Assoc Request execute Read Local AMP Assoc
HCI command to AMP controller. If the AMP Assoc data is larger than it
can fit to HCI event only fragment is read. When all fragments are read
send A2MP Get AMP Assoc Response.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
When receiving A2MP Get Info Request execute Read Local AMP Info HCI
command to AMP controller with function to be executed upon receiving
command complete event. Function will handle A2MP Get Info Response.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Create amp_mgr_list global list which will be used by different
hci devices to find amp_mgr.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Add a few definitions to hci.h
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
xmit callback provided by a driver encapsulates upper layers
data and sends it to the hardware. So, HCI does not know the
exact amount of data being sent and thus can't handle partially
sent frames properly.
Therefore, the driver must return 0 for completely sent frame or
negative for failure.
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Acked-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The previous shdlc HCI driver and its header are removed from the tree.
PN544 now registers directly with HCI and passes the name of the llc it
requires (shdlc).
HCI instantiation now allocates the required llc instance. The llc is
started when the HCI device is brought up.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is used by HCI drivers such as the one for the pn544 which require
communications between HCI and the chip to use shdlc.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is a passthrough llc. It can be used by HCI drivers that don't
need link layer control. HCI will then write directly to the driver, and
driver will deliver incoming frames directly to HCI without any
processing.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The LLC layer manages modules that control the link layer protocol (such
as shdlc) between HCI and an HCI driver. The driver must simply specify
the required llc when it registers with HCI.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This enables the completion callback to be called from a different
context, preventing a possible deadlock if the callback resulted in the
invocation of a nested call to the currently locked nfc_dev.
This is also more in line with the im_transceive nfc_ops for NFC Core or
NCI drivers which already behave asynchronously.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This method initiates execution of an HCI cmd. Result will be delivered
through an asynchronous callback.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NFC is using a number of custom ordered workqueues w/ WQ_MEM_RECLAIM.
WQ_MEM_RECLAIM is unnecessary unless NFC is gonna be used as transport
for storage device, and all use cases match one work item to one
ordered workqueue - IOW, there's no actual ordering going on at all
and using system_nrt_wq gives the same behavior.
There's nothing to be gained by using custom workqueues. Use
system_nrt_wq instead and drop all the custom ones.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
During NFC-DEP target activation, store the remote
general bytes to be used later in dep_link_up.
When dep_link_up is called, activate the NFC-DEP target,
and forward the remote general bytes.
When dep_link_down is called, deactivate the target.
Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
If initiator protocol is NFC-DEP, set the local general bytes
in nci_start_poll.
Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.
This page is used to build fragments for skbs.
Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)
But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page
Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.
This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.
(up to 32768 bytes per frag, thats order-3 pages on x86)
This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.
Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536
Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When taking SYNACK RTT samples for servers using TCP Fast Open, fix
the code to ensure that we only call tcp_valid_rtt_meas() after we
receive the ACK that completes the 3-way handshake.
Previously we were always taking an RTT sample in
tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
receive the SYN. So for TFO we must wait until tcp_rcv_state_process()
to take the RTT sample.
To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock()
before we set the snt_synack timestamp, since tcp_synack_rtt_meas()
already ensures that we only take a SYNACK RTT sample if snt_synack is
non-zero. To be careful, we only take a snt_synack timestamp when
a SYNACK transmit or retransmit succeeds.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for adding another spot where we compute the SYNACK
RTT, extract this code so that it can be shared.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"More bug fixes, nothing gets past these guys"
1) More kernel info leaks found by Mathias Krause, this time in the
IPSEC configuration layers.
2) When IPSEC policies change, we do not properly make sure that cached
routes (which could now be stale) throughout the system will be
revalidated. Fix this by generalizing the generation count
invalidation scheme used by ipv4. From Nicolas Dichtel.
3) When repairing TCP sockets, we need to allow to restore not just the
send window scale, but the receive one too. Extend the existing
interface to achieve this in a backwards compatible way. From
Andrey Vagin.
4) A fix for FCOE scatter gather feature validation erroneously caused
scatter gather to be disabled for things like AOE too. From Ed L
Cashin.
5) Several cases of mishandling of error pointers, from Mathias Krause,
Wei Yongjun, and Devendra Naga.
6) Fix gianfar build, from Richard Cochran.
7) CAP_NET_* failures should return -EPERM not -EACCES, from Zhao
Hongjiang.
8) Hardware reset fix in janz-ican3 CAN driver, from Ira W Snyder.
9) Fix oops during rmmod in ti_hecc CAN driver, from Marc Kleine-Budde.
10) The removal of the conditional compilation of the clk support code
in the stmmac driver broke things. This is because the interfaces
used are the ones that don't also perform the enable/disable of the
clk. Fix from Stefan Roese.
11) The QFQ packet scheduler can record out of range virtual start
times, resulting later in misbehavior and even crashes. Fix from
Paolo Valente.
12) If MSG_WAITALL is used with IOAT DMA under TCP, we can wedge the
receiver when the advertised receive window goes to zero. Detect
this case and force the processing of the IOAT DMA queue when it
happens to avoid getting stuck. Fix from Michal Kubecek.
13) batman-adv assumes that test_bit() returns only 0 or 1, but this is
not true for x86 (which returns -1 or 0, via the 'sbb' instruction).
Fix from Linus Lussing.
14) Fix small packet corruption in e1000, from Tushar Dave.
15) make_blackhole() in the IPSEC policy code can do one read unlock too
many, fix from Li RongQing.
16) The new tcp_try_coalesce() code introduced a bug in TCP URG
handling, fix from Eric Dumazet.
17) Fix memory leak in __netif_receive_skb() when doing zerocopy and
when hit an OOM condition. From Michael S Tsirkin.
18) netxen blindly deferences pdev->bus->self, which is not guarenteed
to be non-NULL. Fix from Nikolay Aleksandrov.
19) Fix a performance regression caused by mistakes in ipv6 checksum
validation in the bnx2x driver, fix from Michal Schmidt.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
net/stmmac: Use clk_prepare_enable and clk_disable_unprepare
net: change return values from -EACCES to -EPERM
net/irda: sh_sir: fix return value check in sh_sir_set_baudrate()
stmmac: fix return value check in stmmac_open_ext_timer()
gianfar: fix phc index build failure
ipv6: fix return value check in fib6_add()
bnx2x: remove false warning regarding interrupt number
can: ti_hecc: fix oops during rmmod
can: janz-ican3: fix support for older hardware revisions
net: do not disable sg for packets requiring no checksum
aoe: assert AoE packets marked as requiring no checksum
at91ether: return PTR_ERR if call to clk_get fails
xfrm_user: don't copy esn replay window twice for new states
xfrm_user: ensure user supplied esn replay window is valid
xfrm_user: fix info leak in copy_to_user_tmpl()
xfrm_user: fix info leak in copy_to_user_policy()
xfrm_user: fix info leak in copy_to_user_state()
xfrm_user: fix info leak in copy_to_user_auth()
net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200
tcp: restore rcv_wscale in a repair mode (v2)
...
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/
The problem is that RFC2460 requires an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment, if the defragmentation
times out.
"
If insufficient fragments are received to complete reassembly of a
packet within 60 seconds of the reception of the first-arriving
fragment of that packet, reassembly of that packet must be
abandoned and all the fragments that have been received for that
packet must be discarded. If the first fragment (i.e., the one
with a Fragment Offset of zero) has been received, an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment.
"
As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.
With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 3/4 fragmented
IPv6 UDP packet.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As pointed by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for Secure Simple Pairing with devices that have
KeyboardOnly as their IO capability. Such devices will cause a passkey
notification on our side and optionally also keypress notifications.
Without this patch some keyboards cannot be paired using the mgmt
interface.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
IPv6 dst should take care of rt_genid too. When a xfrm policy is inserted or
deleted, all dst should be invalidated.
To force the validation, dst entries should be created with ->obsolete set to
DST_OBSOLETE_FORCE_CHK. This was already the case for all functions calling
ip6_dst_alloc(), except for ip6_rt_copy().
As a consequence, we can remove the specific code in inet6_connection_sock.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since route cache deletion (89aef8921b), delay is no
more used. Remove it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In AP mode, when a station requests connection to an AP and if the
request is failed for particular reason, userspace is notified about the
failure through NL80211_CMD_CONN_FAILED command. Reason for the failure
is sent through the attribute NL80211_ATTR_CONN_FAILED_REASON.
Signed-off-by: Pandiyarajan Pitchaimuthu <c_ppitch@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The kerneldoc comment for .sched_scan_stop() callback describes a
driver_initiated flag, but the interface does not hold such a flag.
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Always store audit loginuids in type kuid_t.
Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.
Modify audit_get_loginuid to return a kuid_t.
Modify audit_set_loginuid to take a kuid_t.
Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.
Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This warning:
In file included from linux/include/linux/tcp.h:227:0,
from linux/include/linux/ipv6.h:221,
from linux/include/net/ipv6.h:16,
from linux/include/linux/sunrpc/clnt.h:26,
from linux/net/sunrpc/stats.c:22:
linux/include/net/sock.h: In function `sk_rmem_schedule':
linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
is seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.
Commit c76562b670 ("netvm: prevent a stream-specific deadlock")
accidentally replaced the "size" parameter of sk_rmem_schedule() with an
unsigned int. This changes the semantics of the comparison in the
return statement.
In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer. In addition, __sk_mem_schedule() takes a
signed integer for its "size" parameter, so there is an implicit type
conversion in sk_rmem_schedule() anyway.
Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are exactly
the same.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John W. Linville says:
====================
This is another batch of updates intended for the 3.7 stream.
There are not a lot of large items, but iwlwifi, mwifiex, rt2x00,
ath9k, and brcmfmac all get some attention. Wei Yongjun also provides
a series of small maintenance fixes.
This also includes a pull of the wireless tree in order to satisfy
some prerequisites for later patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/netfilter/nfnetlink_log.c
net/netfilter/xt_LOG.c
Rather easy conflict resolution, the 'net' tree had bug fixes to make
sure we checked if a socket is a time-wait one or not and elide the
logging code if so.
Whereas on the 'net-next' side we are calculating the UID and GID from
the creds using different interfaces due to the user namespace changes
from Eric Biederman.
Signed-off-by: David S. Miller <davem@davemloft.net>
WARNING: With this change it is impossible to load external built
controllers anymore.
In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is
set, corresponding subsys_id should also be a constant. Up to now,
net_prio_subsys_id and net_cls_subsys_id would be of the type int and
the value would be assigned during runtime.
By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN
to IS_ENABLED, all *_subsys_id will have constant value. That means we
need to remove all the code which assumes a value can be assigned to
net_prio_subsys_id and net_cls_subsys_id.
A close look is necessary on the RCU part which was introduces by
following patch:
commit f845172531
Author: Herbert Xu <herbert@gondor.apana.org.au> Mon May 24 09:12:34 2010
Committer: David S. Miller <davem@davemloft.net> Mon May 24 09:12:34 2010
cls_cgroup: Store classid in struct sock
Tis code was added to init_cgroup_cls()
/* We can't use rcu_assign_pointer because this is an int. */
smp_wmb();
net_cls_subsys_id = net_cls_subsys.subsys_id;
respectively to exit_cgroup_cls()
net_cls_subsys_id = -1;
synchronize_rcu();
and in module version of task_cls_classid()
rcu_read_lock();
id = rcu_dereference(net_cls_subsys_id);
if (id >= 0)
classid = container_of(task_subsys_state(p, id),
struct cgroup_cls_state, css)->classid;
rcu_read_unlock();
Without an explicit explaination why the RCU part is needed. (The
rcu_deference was fixed by exchanging it to rcu_derefence_index_check()
in a later commit, but that is a minor detail.)
So here is my pondering why it was introduced and why it safe to
remove it now. Note that this code was copied over to net_prio the
reasoning holds for that subsystem too.
The idea behind the RCU use for net_cls_subsys_id is to make sure we
get a valid pointer back from task_subsys_state(). task_subsys_state()
is just blindly accessing the subsys array and returning the
pointer. Obviously, passing in -1 as id into task_subsys_state()
returns an invalid value (out of lower bound).
So this code makes sure that only after module is loaded and the
subsystem registered, the id is assigned.
Before unregistering the module all old readers must have left the
critical section. This is done by assigning -1 to the id and issuing a
synchronized_rcu(). Any new readers wont call task_subsys_state()
anymore and therefore it is safe to unregister the subsystem.
The new code relies on the same trick, but it looks at the subsys
pointer return by task_subsys_state() (remember the id is constant
and therefore we allways have a valid index into the subsys
array).
No precautions need to be taken during module loading
module. Eventually, all CPUs will get a valid pointer back from
task_subsys_state() because rebind_subsystem() which is called after
the module init() function will assigned subsys[net_cls_subsys_id] the
newly loaded module subsystem pointer.
When the subsystem is about to be removed, rebind_subsystem() will
called before the module exit() function. In this case,
rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer
and then it calls synchronize_rcu(). All old readers have left by then
the critical section. Any new reader wont access the subsystem
anymore. At this point we are safe to unregister the subsystem. No
synchronize_rcu() call is needed.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
task_netprioidx() should not be defined in case the configuration is
CONFIG_NETPRIO_CGROUP=n. The reason is that in a following patch the
net_prio_subsys_id will only be defined if CONFIG_NETPRIO_CGROUP!=n.
When net_prio is not built at all any callee should only get an empty
task_netprioidx() without any references to net_prio_subsys_id.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
task_cls_classid() should not be defined in case the configuration is
CONFIG_NET_CLS_CGROUP=n. The reason is that in a following patch the
net_cls_subsys_id will only be defined if CONFIG_NET_CLS_CGROUP!=n.
When net_cls is not built at all a callee should only get an empty
task_cls_classid() without any references to net_cls_subsys_id.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
The only user of sock_update_classid() is net/socket.c which happens
to include cls_cgroup.h directly.
tj: Fix build breakage due to missing cls_cgroup.h inclusion in
drivers/net/tun.c reported in linux-next by Stephen.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark keys that might be used to receive management
frames so drivers can fall back on software crypto
for them if they don't support hardware offload.
As the new flag is only set correctly for RX keys
and the existing IEEE80211_KEY_FLAG_SW_MGMT flag
can only affect TX, also rename the latter to
IEEE80211_KEY_FLAG_SW_MGMT_TX.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Return code is not needed in hci_chan_del
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
hdev is allocated with kzalloc so zero initialization is not needed.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Since route cache deletion (89aef8921b), delay is no
more used. Remove it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace. Avoid that problem by
passing kuids and kgids instead.
- define struct scm_creds for use in scm_cookie and netlink_skb_parms
that holds uid and gid information in kuid_t and kgid_t.
- Modify scm_set_cred to fill out scm_creds by heand instead of using
cred_to_ucred to fill out struct ucred. This conversion ensures
userspace does not get incorrect uid or gid values to look at.
- Modify scm_recv to convert from struct scm_creds to struct ucred
before copying credential values to userspace.
- Modify __scm_send to populate struct scm_creds on in the scm_cookie,
instead of just copying struct ucred from userspace.
- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
into the NETLINK_CB.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull these fixes intended for 3.6. There are more commits
here than I would like -- I got a bit behind while I was stalking
Steven Rostedt in San Diego last week... I'll slow it down after this!
There are a couple of pulls here. One is from Johannes:
"Please pull (according to the below information) to get a few fixes.
* a fix to properly disconnect in the driver when authentication or
association fails
* a fix to prevent invalid information about mesh paths being reported
to userspace
* a memory leak fix in an nl80211 error path"
The other comes via Gustavo:
"A few updates for the 3.6 kernel. There are two btusb patches to add
more supported devices through the new USB_VENDOR_AND_INTEFACE_INFO()
macro and another one that add a new device id for a Sony Vaio laptop,
one fix for a user-after-free and, finally, two patches from Vinicius
to fix a issue in SMP pairing."
Along with those...
Arend van Spriel provides a fix for a use-after-free bug in brcmfmac.
Daniel Drake avoids a hang by not trying to touch the libertas hardware
duing suspend if it is already powered-down.
Felix Fietkau provides a batch of ath9k fixes that adress some
potential problems with power settings, as well as a fix to avoid a
potential interrupt storm.
Gertjan van Wingerde provides a register-width fix for rt2x00, and
a rt2x00 fix to prevent incorrectly detecting the rfkill status.
He also provides a device ID patch.
Hante Meuleman gives us three brcmfmac fixes, one that properly
initializes a command structure, one that fixes a race condition that
could lose usb requests, and one that removes some log spam.
Marc Kleine-Budde offers an rt2x00 fix for a voltage setting on some
specific devices.
Mohammed Shafi Shajakhan sent an ath9k fix to avoid a crash related to
using timers that aren't allocated when 2 wire bluetooth coexistence
hardware is in use.
Sergei Poselenov changes rt2800usb to do some validity checking for
received packets, avoiding crashes on an ARM Soc.
Stone Piao gives us an mwifiex fix for an incorrectly set skb length
value for a command buffer.
All of these are localized to their specific drivers, and relatively
small. The power-related patches from Felix are bigger than I would
like, but I merged them in consideration of their isolation to ath9k
and the sensitive nature of power settings in wireless devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull in mac80211.git to let the next patch apply
without conflicts, also resolving a hwsim conflict.
Conflicts:
drivers/net/wireless/mac80211_hwsim.c
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When adding a blackhole or a prohibit route, they were handling like classic
routes. Moreover, it was only possible to add this kind of routes by specifying
an interface.
Bug already reported here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498
Before the patch:
$ ip route add blackhole 2001::1/128
RTNETLINK answers: No such device
$ ip route add blackhole 2001::1/128 dev eth0
$ ip -6 route | grep 2001
2001::1 dev eth0 metric 1024
After:
$ ip route add blackhole 2001::1/128
$ ip -6 route | grep 2001
blackhole 2001::1 dev lo metric 1024 error -22
v2: wrong patch
v3: add a field fc_type in struct fib6_config to store RTN_* type
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ESN for esp is defined in RFC 4303. This RFC assumes that the
sequence number counters are always up to date. However,
this is not true if an async crypto algorithm is employed.
If the sequence number counters are not up to date on sequence
number check, we may incorrectly update the upper 32 bit of
the sequence number. This leads to a DOS.
We workaround this by comparing the upper sequence number,
(used for authentication) with the upper sequence number
computed after the async processing. We drop the packet
if these numbers are different.
To do this, we introduce a recheck function that does this
check in the ESN case.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state,
in addition to Recovery state. Retire the current rate-halving in CWR.
When losses are detected via ACKs in CWR state, the sender enters Recovery
state but the cwnd reduction continues and does not restart.
Rename and refactor cwnd reduction functions since both CWR and Recovery
use the same algorithm:
tcp_init_cwnd_reduction() is new and initiates reduction state variables.
tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery().
tcp_ends_cwnd_reduction() is previously tcp_complete_cwr().
The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(),
and the cwnd moderation inside tcp_enter_cwr() are removed. The unused
parameter, flag, in tcp_cwnd_reduction() is also removed.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -
1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled
2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes
3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket
4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes
5. supporting TCP_FASTOPEN socket option
6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock
7. supporting TCP's TFO cookie option
8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.
The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.
In addition, it includes the following:
1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.
2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.
"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.
"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.
3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for
the passive open side") changed the initRTO from 3secs to 1sec in
accordance to RFC6298 (former RFC2988bis). This reduced the time till
the last SYN retransmission packet gets sent from 93secs to 31secs.
RFC1122 is stating that the retransmission should be done for at least 3
minutes, but this seems to be quite high.
"However, the values of R1 and R2 may be different for SYN
and data segments. In particular, R2 for a SYN segment MUST
be set large enough to provide retransmission of the segment
for at least 3 minutes. The application can close the
connection (i.e., give up on the open attempt) sooner, of
course."
This patch increases the value of TCP_SYN_RETRIES to the value of 6,
providing a retransmission window of 63secs.
The comments for SYN and SYNACK retries have also been updated to
describe the current settings. The same goes for the documentation file
"Documentation/networking/ip-sysctl.txt".
Signed-off-by: Alexander Bergmann <alex@linlab.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.
This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.
Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.
Signed-off-by: Patrick McHardy <kaber@trash.net>
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.
This also makes it clear that it is checking the security of the link.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.
This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a broken build due to a missing header:
...
CC net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...
The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.
See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2) patch,
Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change since v1:
* Fixed inuse counters access spotted by Eric
In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.
[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(
Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:
* sklist
* prot inuse counters
The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>