Граф коммитов

63106 Коммитов

Автор SHA1 Сообщение Дата
Paul Durrant ed1f50c3a7 net: add skb_checksum_setup
This patch adds a function to set up the partial checksum offset for IP
packets (and optionally re-calculate the pseudo-header checksum) into the
core network code.
The implementation was previously private and duplicated between xen-netback
and xen-netfront, however it is not xen-specific and is potentially useful
to any network driver.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:24:19 -08:00
David S. Miller aef2b45fe4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Steffen Klassert says:

====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.

The version from net-next can be used, like it is done in linux-next.

1) Checkpatch cleanups, from Weilong Chen.

2) Fix lockdep complaints when pktgen is used with IPsec,
   from Fan Du.

3) Update pktgen to allow any combination of IPsec transport/tunnel mode
   and AH/ESP/IPcomp type, from Fan Du.

4) Make pktgen_dst_metrics static, Fengguang Wu.

5) Compile fix for pktgen when CONFIG_XFRM is not set,
   from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:14:25 -08:00
stephen hemminger 6daaf0de2f sctp: make sctp_addto_chunk_fixed local
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:42:30 -08:00
Andy Fleming 7614aba6a3 phylib: Add of_phy_attach
10G PHYs don't currently support running the state machine, which
is implicitly setup via of_phy_connect(). Therefore, it is necessary
to implement an OF version of phy_attach(), which does everything
except start the state machine.

Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:29:49 -08:00
Andy Fleming 257184d7cc phylib: Support attaching to generic 10g driver
phy_attach_direct() may now attach to a generic 10G driver. It can
also be used exactly as phy_connect_direct(), which will be useful
when using of_mdio, as phy_connect (and therefore of_phy_connect)
start the PHY state machine, which is currently irrelevant for 10G
PHYs.

Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:29:49 -08:00
Andy Fleming 898dd0bda3 phylib: introduce PHY_INTERFACE_MODE_XGMII for 10G PHY
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:29:48 -08:00
Andy Fleming efabdfb95a phylib: Add Clause 45 read/write functions
Need an extra parameter to read or write Clause 45 PHYs, so
need a different API with the extra parameter.

Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:29:48 -08:00
WANG Cong 7eb8896df0 net_sched: act: remove struct tcf_act_hdr
It is not necessary at all.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong 2519a602c2 net_sched: optimize tcf_match_indev()
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.

Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.

BTW, this will also save some bytes from the core struct of u32.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong 832d1d5bfa net_sched: add struct net pointer to tcf_proto_ops->dump
It will be needed by the next patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
WANG Cong ddafd34f41 net_sched: act: move idx_gen into struct tcf_hashinfo
There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
Hannes Frederic Sowa 8ed1dc44d3 ipv4: introduce hardened ip_no_pmtu_disc mode
This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
to be honored by protocols which do more stringent validation on the
ICMP's packet payload. This knob is useful for people who e.g. want to
run an unmodified DNS server in a namespace where they need to use pmtu
for TCP connections (as they are used for zone transfers or fallback
for requests) but don't want to use possibly spoofed UDP pmtu information.

Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
if the returned packet is in the window or if the association is valid.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:55 -08:00
Hannes Frederic Sowa f87c10a8aa ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing
While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.

We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.

I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.

Because someone might have written a software which does probe
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.

The reason for this change is, that path mtus information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.

Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:54 -08:00
Christoph Paasch 8a59359cb8 tcp: metrics: New netlink attribute for src IP and dumped in netlink reply
This patch adds a new netlink attribute for the source-IP and appends it
to the netlink reply. Now, iproute2 can have access to the source-IP.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:38:18 -05:00
David S. Miller 1a6c1e5bd2 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull these updates for the 3.14 stream!

For the mac80211 bits, Johannes says:

"Felix adds some helper functions for P2P NoA software tracking, Joe
fixes alignment (but as this apparently never caused issues I didn't
send it to 3.13), Kyeyoon/Jouni add QoS-mapping support (a Hotspot 2.0
feature), Weilong fixed a bunch of checkpatch errors and I get to play
fire-fighter or so and clean up other people's locking issues. I also
added nl80211 vendor-specific events, as we'd discussed at the wireless
summit."

For the iwlwifi bits, Emmanuel says:

"I have here a rework of the interrupt handling to meet RT kernel
requirements - basically we don't take any lock in the primary interrupt
handler. This gave me a good reason to clean things up a bit on the way.
There is also a fix of the QoS mapping along with a few workarounds for
hardware / firmware issues that are hard to hit.
Three fixes suggested by static analyzers, and other various stuff.
Most importantly, I update the Copyright note to include the new year."

For the bluetooth bits, Gustavo says:

"More patches to 3.14. The bulk of changes here is the 6LoWPAN support for
Bluetooth LE Devices. The commits that touches net/ieee802154/ are already
acked by David Miller. Other than that we have some RFCOMM fixes and
improvements plus fixes and clean ups all over the tree."

Beyond that, ath9k, brcmfmac, mwifiex, and wil6210 get their usual
level of attention.  The wl1251 driver gets a number of updates,
and there are a handful of other bits here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 14:53:33 -05:00
David S. Miller ef8570d859 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
This batch contains one single patch with the l2tp match
for xtables, from James Chapman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 14:50:02 -05:00
John W. Linville 235f939228 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/ieee802154/6lowpan.c
2014-01-10 10:59:40 -05:00
David S. Miller 751fcac19a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
nf_tables updates for net-next

The following patchset contains the following nf_tables updates,
mostly updates from Patrick McHardy, they are:

* Add the "inet" table and filter chain type for this new netfilter
  family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6
  rules, this should help to simplify the burden in the administration
  of dual stack firewalls. This also includes several patches to prepare
  the infrastructure for this new table and a new meta extension to
  match the layer 3 and 4 protocol numbers, from Patrick McHardy.

* Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used
  in NFPROTO_INET, as we don't certainly know which one would be used,
  also from Patrick McHardy.

* Do not allow to delete a table that contains sets, otherwise these
  sets become orphan, from Patrick McHardy.

* Hold a reference to the corresponding nf_tables family module when
  creating a table of that family type, to avoid the module deletion
  when in use, from Patrick McHardy.

* Update chain counters before setting the chain policy to ensure that
  we don't leave the chain in inconsistent state in case of errors (aka.
  restore chain atomicity). This also fixes a possible leak if it fails
  to allocate the chain counters if no counters are passed to be restored,
  from Patrick McHardy.

* Don't check for overflows in the table counter if we are just renaming
  a chain, from Patrick McHardy.

* Replay the netlink request after dropping the nfnl lock to load the
  module that supports provides a chain type, from Patrick.

* Fix chain type module references, from Patrick.

* Several cleanups, function renames, constification and code
  refactorizations also from Patrick McHardy.

* Add support to set the connmark, this can be used to set it based on
  the meta mark (similar feature to -j CONNMARK --restore), from
  Kristian Evensen.

* A couple of fixes to the recently added meta/set support and nft_reject,
  and fix missing chain type unregistration if we fail to register our
  the family table/filter chain type, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 21:36:01 -05:00
James Chapman 74f77a6b2b netfilter: introduce l2tp match extension
Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2
and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id
and session-id, the filtering decision can also include the L2TP
packet type (control or data), protocol version (2 or 3) and
encapsulation type (UDP or IP).

The most common use for this will likely be to filter L2TP data
packets of individual L2TP tunnels or sessions. While a u32 match can
be used, the L2TP protocol headers are such that field offsets differ
depending on bits set in the header, making rules for matching generic
L2TP connections cumbersome. This match extension takes care of all
that.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 21:36:39 +01:00
Patrick McHardy 3876d22dba netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain()
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:16 +01:00
Patrick McHardy fa2c1de0bb netfilter: nf_tables: minor nf_chain_type cleanups
Minor nf_chain_type cleanups:

- reorder struct to plug a hoe
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy 2a37d755b8 netfilter: nf_tables: constify chain type definitions and pointers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy baae3e62f3 netfilter: nf_tables: fix chain type module reference handling
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.

Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:14 +01:00
Kristian Evensen c4ede3d382 netfilter: nft_ct: Add support to set the connmark
This patch adds kernel support for setting properties of tracked
connections. Currently, only connmark is supported. One use-case
for this feature is to provide the same functionality as
-j CONNMARK --save-mark in iptables.

Some restructuring was needed to implement the set op. The new
structure follows that of nft_meta.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 19:07:44 +01:00
John W. Linville 300e5fd160 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-01-08 13:44:29 -05:00
Daniel Borkmann fd44b93cb5 net: skbuff: const-ify casts in skb_queue_* functions
We should const-ify comparisons on skb_queue_* inline helper
functions as their parameters are const as well, so lets not
drop that.

Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:34:00 -05:00
Patrick McHardy 4566bf2706 netfilter: nft_meta: add l4proto support
For L3-proto independant rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:31 +01:00
Patrick McHardy 124edfa9e0 netfilter: nf_tables: add nfproto support to meta expression
Needed by multi-family tables to distinguish IPv4 and IPv6 packets.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:30 +01:00
Patrick McHardy 1d49144c0a netfilter: nf_tables: add "inet" table for IPv4/IPv6
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:25 +01:00
Patrick McHardy 115a60b173 netfilter: nf_tables: add support for multi family tables
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07 23:55:46 +01:00
Patrick McHardy c9484874e7 netfilter: nf_tables: add hook ops to struct nft_pktinfo
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Jerry Chu bf5a755f5e net-gre-gro: Add GRE support to the GRO stack
This patch built on top of Commit 299603e837
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.

The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.

Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.

The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.

Note that commit 60769a5dcd "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.

I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.

All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)

An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.

The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).

1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%

1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%

1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%

1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%

The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).

2.1 ethtool -K gre1 gro on (hence will use gro_cells)

2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%

2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%

2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%

2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%

2.2 ethtool -K gre1 gro off

2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%

2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%

2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%

2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 16:21:31 -05:00
FX Le Bail 509aba3b0d IPv6: add the option to use anycast addresses as source addresses in echo reply
This change allows to follow a recommandation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 echo reply. This sysctl is false by default
  to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 15:51:39 -05:00
Eric Dumazet 438e38fadc gre_offload: statically build GRE offloading support
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.

This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 20:28:34 -05:00
David S. Miller 39b6b2992f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 19:48:38 -05:00
Thomas Graf 44da5ae5fb openvswitch: Drop user features if old user space attempted to create datapath
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:53:00 -08:00
Thomas Graf 43d4be9cb5 openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:53 -08:00
Thomas Graf af2806f8f9 net: Export skb_zerocopy() to zerocopy from one skb to another
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:42 -08:00
Thomas Graf bb9b18fb55 genl: Add genlmsg_new_unicast() for unicast message allocation
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:53 -08:00
David S. Miller 56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Eric Dumazet 996b175e39 tcp: out_of_order_queue do not use its lock
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:34:34 -05:00
Arend van Spriel 6e08d757b7 mmc: add SDIO identifiers for Broadcom WLAN devices
The SDIO identifier for Broadcom WLAN devices were defined in the
brcmfmac SDIO driver. Moving the definitions in MMC header file
seems common sense.

Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-06 15:48:06 -05:00
Vijay Subramanian d4b36210c2 net: pkt_sched: PIE AQM scheme
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.  "

Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions.  Naeem Khademi and Dave Taht independently contributed to ECN
support.

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 15:13:01 -05:00
David S. Miller 9aa28f2b71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>

====================
nftables updates for net-next

The following patchset contains nftables updates for your net-next tree,
they are:

* Add set operation to the meta expression by means of the select_ops()
  infrastructure, this allows us to set the packet mark among other things.
  From Arturo Borrero Gonzalez.

* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
  Borkmann.

* Add new queue expression to nf_tables. These comes with two previous patches
  to prepare this new feature, one to add mask in nf_tables_core to
  evaluate the queue verdict appropriately and another to refactor common
  code with xt_NFQUEUE, from Eric Leblond.

* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
  Eric Leblond.

* Add the reject expression to nf_tables, this adds the missing TCP RST
  support. It comes with an initial patch to refactor common code with
  xt_NFQUEUE, again from Eric Leblond.

* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
  Nazarewicz.

* Remove the nft_meta_target code, now that Arturo added the set operation
  to the meta expression, from me.

* Add help information for nf_tables to Kconfig, also from me.

* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
  available to other nf_tables objects, requested by Arturo, from me.

* Expose the table usage counter, so we can know how many chains are using
  this table without dumping the list of chains, from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:29:30 -05:00
Hannes Frederic Sowa 1e85c9b66d 8021q: make vlan_pcpu_stats visible without CONFIG_VLAN_8021Q
macvlan needs vlan_pcpu_stats so make it visible even if compiling
without VLAN_8021Q support. Otherwise a very long compiler error happens.

Fixes: cdf3e274cf ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-By: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:27:55 -05:00
David S. Miller 855404efae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:18:50 -05:00
Hauke Mehrtens b2395b8aea bcma: export bcma_find_core_unit()
This function is used to get a specific core when there is more than
one core of that specific type. This is used in bgmac to reset all GMAC
cores.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:25:19 -05:00
Li RongQing cdf3e274cf macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats
They are same, so unify them as one; since macvlan is a kind of vlan,
vlan_pcpu_stats should be a proper name for vlan and macvlan.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:11:33 -05:00
Li RongQing 8f84985fec net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:10:24 -05:00
David S. Miller 653864d9dd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and pci_regs.h.

Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.

Catherine adds routines in probe to populate/check PCI bus speed and width,
then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.

Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
return type for i40e_vsi_clear_rings().

Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.

Mitch provides the bulk of the changes, where he refactors the VF reset
code so that it works on real hardware.  Then does code cleanup by
calling existing functions to enable and disable queues for VFs and
remove unused functions.  Removes a unnecessary log messages that are
seen at every VF reset, for example complaining about disabling queues
that are already disabled.  Fixes an error return when the VF asks to
add an invalid MAC address and if the VF sends a bad message, make it
more informative about what is actually going on.

Jesse refactors the LED function to flash LED lights correctly.

v2:
 - removed patch 5 "i40e: add set settings and pauseparam" based on
   feedback from Ben Hutchings, will re-work that patch for later
   submission
 - Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
   address Or Gerlitz's questions and concerns.  This patch adds the
   implementation for the VXLAN ndo's and allows the hardware to do
   receive checksum offload for inner packets on the UDP ports that
   VXLAN notifies us about.
 - Added patch "i40e: using for_each_set_bit to simplify the code"
   from Wei Yongjun.  This patch uses for_each_set_bit() to simply
   the code.

v3:
 - fixed indentation issue in patch 11 based on feedback from
   Sergei Shtylyov.

Sorry for the delayed release of v4, it was delayed to the holidays.

v4:
 - Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
   while holding a spin lock in patch 6 by executing the AQ commands from
   a subtask.
 - Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
   option for i40e and wrapped appropriate code with the config option in
   patch 6.
 - Updated patch 7 based on the changes made in patch 6 in the above two
   bullets.

v5:
 - Added the patch to pci_regs.h based on David Miller's feedback to add
   PCI defines for speed and width
 - Updated patch 3 description to better explain the changes based on
   feedback from David Miller
 - Updated patch 4 to use the newly added defines to pci_regs.h instead
   of local defines
 - Updated patch 7 to use <net/vxlan.h> in the #include based on feedback
   from David Miller
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 19:50:35 -05:00