These defines are not used in the kernel, and they have duplicate
definitions under the RNDIS_* prefix.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 802_* network OIDs were duplicated, so let's merge them and
use the RNDIS_* prefixed definitions from the hyperV driver.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RNDIS protocol contains a vast number of Object ID:s (OIDs).
The current definitions had multiple definitions of these ID:s,
let's use the nicely RNDIS_*-prefixed defines from the HyperV
implementation, rename everywhere they're used, and copy+rename
the few that were missing from this list of objects.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RNDIS status codes are redefined with much stranged ifdeffery
and only one of these codes was used in the hyperv driver, and
there it is very clearly referring to the RNDIS variant, not some
other status. So clarify this by explictly using the RNDIS_*
prefixed status code in the hyperv drivera and delete the
duplicate defines.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a first step to consolidate the RNDIS implementations, break out
a common file with all the #defines and move it to <linux/rndis.h>.
This also deletes the immediate duplicated defines in the
<linux/rndis.h> file that yields a lot of compilation warnings.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The header file <linux/usb/rndis_host.h> used a number of #defines
that included the cpu_to_le32() macro to assure the result will be
in LE endianness. Inlining this into the code instead of using it
in the code definitions yields consolidation opportunities later
on as you will see in the following patches. The individual
drivers also used local defines - all are switched over to the
pattern of doing the conversion at the call sites instead.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.
http://queue.acm.org/detail.cfm?id=2209336
This AQM main input is no longer queue size in bytes or packets, but the
delay packets stay in (FIFO) queue.
As we don't have infinite memory, we still can drop packets in enqueue()
in case of massive load, but mean of CoDel is to drop packets in
dequeue(), using a control law based on two simple parameters :
target : target sojourn time (default 5ms)
interval : width of moving time window (default 100ms)
Based on initial work from Dave Taht.
Refactored to help future codel inclusion as a plugin for other linux
qdisc (FQ_CODEL, ...), like RED.
include/net/codel.h contains codel algorithm as close as possible than
Kathleen reference.
net/sched/sch_codel.c contains the linux qdisc specific glue.
Separate structures permit a memory efficient implementation of fq_codel
(to be sent as a separate work) : Each flow has its own struct
codel_vars.
timestamps are taken at enqueue() time with 1024 ns precision, allowing
a range of 2199 seconds in queue, and 100Gb links support. iproute2 uses
usec as base unit.
Selected packets are dropped, unless ECN is enabled and packets can get
ECN mark instead.
Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
tg3 drivers (BQL enabled).
Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
[ interval TIME ] [ ecn ]
qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
maxpacket 1514 ecn_mark 84399 drop_overlimit 0
CoDel must be seen as a base module, and should be used keeping in mind
there is still a FIFO queue. So a typical setup will probably need a
hierarchy of several qdiscs and packet classifiers to be able to meet
whatever constraints a user might have.
One possible example would be to use fq_codel, which combines Fair
Queueing and CoDel, in replacement of sfq / sfq_red.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an optimized boolean function to check if
2 ethernet addresses are the same.
This is to avoid any confusion about compare_ether_addr_64bits
returning an unsigned, and not being able to use the
compare_ether_addr_64bits function for sorting ala memcmp.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETHTOOL_GMODULEINFO returns a new struct ethtool_modinfo that will return the
type and size of plug-in module eeprom (such as SFP+) for parsing
by userland program.
ETHTOOL_GMODULEEEPROM returns the raw eeprom information
using the existing ethtool_eeprom structture to return the data
Signed-off-by: Stuart Hodgson <smhodgson@solarflare.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add a boolean function to check if 2 ethernet addresses
are the same.
This is to avoid any confusion about compare_ether_addr
returning an unsigned, and not being able to use the
compare_ether_addr function for sorting ala memcmp.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
can be used e.g. for ingress traffic policing or
to detect when a host/port consumes more bandwidth than expected.
This is done by optionally making cost to mean
"cost per 16-byte-chunk-of-data" instead of "cost per packet".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.
[ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the flags parameter to ipv6_find_hdr. This flags
allows us to:
* know if this is a fragment.
* stop at the AH header, so the information contained in that header
can be used for some specific packet handling.
This patch also adds the offset parameter for inspection of one
inner IPv6 header that is contained in error messages.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.
This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.
Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:
9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment
Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.
Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).
Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.
iptables -I PREROUTING -t raw -p tcp --dport 8888 \
-j CT --helper ftp
And you disabled the automatic helper assignment.
I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Conflicts:
drivers/net/ethernet/intel/e1000e/param.c
drivers/net/wireless/iwlwifi/iwl-agn-rx.c
drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
drivers/net/wireless/iwlwifi/iwl-trans.h
Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell. In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.
In e1000e a conflict arose in the validation code for settings of
adapter->itr. 'net-next' had more sophisticated logic so that
logic was used.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a somewhat generic framework for MDIO bus
multiplexers. It is modeled on the I2C multiplexer.
The multiplexer is needed if there are multiple PHYs with the same
address connected to the same MDIO bus adepter, or if there is
insufficient electrical drive capability for all the connected PHY
devices.
Conceptually it could look something like this:
------------------
| Control Signal |
--------+---------
|
--------------- --------+------
| MDIO MASTER |---| Multiplexer |
--------------- --+-------+----
| |
C C
h h
i i
l l
d d
| |
--------- A B ---------
| | | | | |
| PHY@1 +-------+ +---+ PHY@1 |
| | | | | |
--------- | | ---------
--------- | | ---------
| | | | | |
| PHY@2 +-------+ +---+ PHY@2 |
| | | |
--------- ---------
This framework configures the bus topology from device tree data. The
mechanics of switching the multiplexer is left to device specific
drivers.
The follow-on patch contains a multiplexer driven by GPIO lines.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add of_mdio_find_bus() which allows an mii_bus to be located given its
associated the device tree node.
This is needed by the follow-on patch to add a driver for MDIO bus
multiplexers.
The of_mdiobus_register() function is modified so that the device tree
node is recorded in the mii_bus. Then we can find it again by
iterating over all mdio_bus_class devices.
Because the OF device tree has now become an integral part of the
kernel, this can live in mdio_bus.c (which contains the needed
mdio_bus_class structure) instead of of_mdio.c.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neither compare_ether_addr() nor compare_ether_addr_64bits()
(as it can fall back to the former) have comparison semantics
like memcmp() where the sign of the return value indicates sort
order. We had a bug in the wireless code due to a blind memcmp
replacement because of this.
A cursory look suggests that the wireless bug was the only one
due to this semantic difference.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head. Instead of running all over the place doing that it would make
more sense to just make it a separate inline skb_end_offset(skb) that way
we can return the correct value without having gcc having to do all the
optimization to cancel out skb->head - skb->head.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With multiple cards is hard to figure out which port caused trouble
int the layer2 routines (e.g. got a timeout).
Now we have the informations in the log output.
Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
For certification test it is very useful to change the layer1
timer3 value on runtime.
Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be full preemptiv safe, we cannot handle a L2 timeout in the timer
context itself, we should do all actions via the D-channel thread.
Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Transfer padding was wrong for full-speed USB in ASIX driver, fix
from Ingo van Lil.
2) Propagate the negative packet offset fix into the PowerPC BPF JIT.
From Jan Seiffert.
3) dl2k driver's private ioctls were letting unprivileged tasks make
MII writes and other ugly bits like that. Fix from Jeff Mahoney.
4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund.
5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and
Julian Anastasov.
6) Fix races and sleeping in locked context bugs in drop_monitor, from
Neil Horman.
7) Fix link status indication in smsc95xx driver, from Paolo Pisati.
8) Fix bridge netfilter OOPS, from Peter Huang.
9) L2TP sendmsg can return on error conditions with the socket lock
held, oops. Fix from Sasha Levin.
10) udp_diag should return meaningful values for socket memory usage,
from Shan Wei.
11) Eric Dumazet is so awesome he gets his own section:
Socket memory cgroup code (I never should have applied those
patches, grumble...) made erroneous changes to
sk_sockets_allocated_read_positive(). It was changed to
use percpu_counter_sum_positive (which requires BH disabling)
instead of percpu_counter_read_positive (which does not).
Revert back to avoid crashes and lockdep warnings.
Adjust the default tcp_adv_win_scale and tcp_rmem[2] values
to fix throughput regressions. This is necessary as a result
of our more precise skb->truesize tracking.
Fix SKB leak in netem packet scheduler.
12) New device IDs for various bluetooth devices, from Manoj Iyer,
AceLan Kao, and Steven Harms.
13) Fix command completion race in ipw2200, from Stanislav Yakovlev.
14) Fix rtlwifi oops on unload, from Larry Finger.
15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver.
From Stephane Fillod.
16) ehea driver registers it's IRQ before all the necessary state is
setup, resulting in crashes. Fix from Thadeu Lima de Souza
Cascardo.
17) Fix PHY connection failures in davinci_emac driver, from Anatolij
Gustschin.
18) Missing break; in switch statement in bluetooth's
hci_cmd_complete_evt(). Fix from Szymon Janc.
19) Fix queue programming in iwlwifi, from Johannes Berg.
20) Interrupt throttling defaults not being actually programmed into the
hardware, fix from Jeff Kirsher and Ying Cai.
21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from
Benjamin Poirier.
22) Fix blind status block RX producer pointer deref in TG3 driver, from
Matt Carlson.
23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de
Souza Cascardo.
24) Fix crashes in 6lowpan, from Alexander Smirnov.
25) tcp_complete_cwr() needs to be careful to not rewind the CWND to
ssthresh if ssthresh has the "infinite" value. Fix from Yuchung
Cheng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
sungem: Fix WakeOnLan
tcp: change tcp_adv_win_scale and tcp_rmem[2]
net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
drop_monitor: prevent init path from scheduling on the wrong cpu
usbnet: fix failure handling in usbnet_probe
usbnet: fix leak of transfer buffer of dev->interrupt
ucc_geth: Add 16 bytes to max TX frame for VLANs
net: ucc_geth, increase no. of HW RX descriptors
netem: fix possible skb leak
sky2: fix receive length error in mixed non-VLAN/VLAN traffic
sky2: propogate rx hash when packet is copied
net: fix two typos in skbuff.h
cxgb3: Don't call cxgb_vlan_mode until q locks are initialized
ixgbe: fix calling skb_put on nonlinear skb assertion bug
ixgbe: Fix a memory leak in IEEE DCB
igbvf: fix the bug when initializing the igbvf
smsc75xx: enable mac to detect speed/duplex from phy
smsc75xx: declare smsc75xx's MII as GMII capable
smsc75xx: fix phy interrupt acknowledge
smsc75xx: fix phy init reset loop
...
This patch adds support for a skb_head_is_locked helper function. It is
meant to be used any time we are considering transferring the head from
skb->head to a paged frag. If the head is locked it means we cannot remove
the head from the skb so it must be copied or we must take the skb as a
whole.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed. When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.
While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf
Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.
ER is enabled by sysctl_tcp_early_retrans:
0: Disables ER
1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.
2: (Default) reduce dupthresh like mode 1. In addition, delay
entering fast recovery by RTT/4.
Note: mode 2 is implemented in the third part of this patch series.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ECN (Explicit Congestion Notification) marking capability to netem
tc qdisc add dev eth0 root netem drop 0.5 ecn
Instead of dropping packets, try to ECN mark them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove useless casts and rename variables for less confusion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.
The implementation is derived from ipv6/raw and ipv4/l2tp_ip.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Checkpatch warns about the use of __attribute__((packed)). So use the
recommended __packed syntax instead.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GRO can check if skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.
We 'upgrade' skb->head as a fragment in itself
This avoids the frag_list fallback, and permits to build true GRO skb
(one sk_buff and up to 16 fragments), using less memory.
This reduces number of cache misses when user makes its copy, since a
single sk_buff is fetched.
This is a followup of patch "net: allow skb->head to be a page fragment"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->head is currently allocated from kmalloc(). This is convenient but
has the drawback the data cannot be converted to a page fragment if
needed.
We have three spots were it hurts :
1) GRO aggregation
When a linear skb must be appended to another skb, GRO uses the
frag_list fallback, very inefficient since we keep all struct sk_buff
around. So drivers enabling GRO but delivering linear skbs to network
stack aren't enabling full GRO power.
2) splice(socket -> pipe).
We must copy the linear part to a page fragment.
This kind of defeats splice() purpose (zero copy claim)
3) TCP coalescing.
Recently introduced, this permits to group several contiguous segments
into a single skb. This shortens queue lengths and save kernel memory,
and greatly reduce probabilities of TCP collapses. This coalescing
doesnt work on linear skbs (or we would need to copy data, this would be
too slow)
Given all these issues, the following patch introduces the possibility
of having skb->head be a fragment in itself. We use a new skb flag,
skb->head_frag to carry this information.
build_skb() is changed to accept a frag_size argument. Drivers willing
to provide a page fragment instead of kmalloc() data will set a non zero
value, set to the fragment size.
Then, on situations we need to convert the skb head to a frag in itself,
we can check if skb->head_frag is set and avoid the copies or various
fallbacks we have.
This means drivers currently using frags could be updated to avoid the
current skb->head allocation and reduce their memory footprint (aka skb
truesize). (thats 512 or 1024 bytes saved per skb). This also makes
bpf/netfilter faster since the 'first frag' will be part of skb linear
part, no need to copy data.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJPnjtXAAoJEDeqqVYsXL0MNWkIAMYc5n06Ac/gdLY8xIIYD6GA
OzLd/NQ2Q4RqZJsX26OueX2ZjMRYDixBAmiw837K9PvrLUHGfMAaninNyzDgVZ3R
pbIK45OylIQrvMAl/R7ZzqihfxJjg5utqEUt1M2qv/ikkBJ8+zWAJTQKxRYJ1OAW
9QDhIP986hljNyZfDuqXBvA4XkngYeCO0WLjBOeYCseSzeV8vOLpmsD/ANHDcqA6
Ct4uM4KJWF4jHz3sN3w5T3morNwvh42onNcy++911HcH5Nc9bkOqBWQAZ5RtINp9
8rkUz3OtCCQpZz6ffF3ZXJaopuU3ELfP80t03OScr2T75SxLqLf6mFYVAcR7RpQ=
=nXrE
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of SAS and SATA fixes; there are one or two longstanding
bug fixes, but most of this is regression fixes."
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] libfc: update mfs boundry checking
[SCSI] Revert "[SCSI] libsas: fix sas port naming"
[SCSI] libsas: fix false positive 'device attached' conditions
[SCSI] libsas, libata: fix start of life for a sas ata_port
[SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
[SCSI] libsas: unify domain_device sas_rphy lifetimes
[SCSI] libsas: fix sas_get_port_device regression
[SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys
[SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
[SCSI] libata: Pass correct DMA device to scsi host
[SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue
More recent versions of the UEFI spec have added new attributes for
variables. Add them.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.
When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own. The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).
End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time. You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.
NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops. Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.
The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes). But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org # needed for systemd/autofs interaction fix
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here are a number of small USB fixes for 3.4-rc5.
Nothing major, as before, some USB gadget fixes. There's a crash fix
for a number of ASUS laptops on resume that had been reported by a
number of different people. We think the fix might also pertain to
other machines, as this was a BIOS bug, and they seem to travel to
different models and manufacturers quite easily. Other than that, some
other reported problems fixed as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk+dZ7kACgkQMUfUDdst+ylxwACdFS4V69bnBi2nw9spEYClh+FB
SToAoLUFsDiTDexIMIMWZkCyq+bqVLG+
=Pzvi
-----END PGP SIGNATURE-----
Merge tag 'usb-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are a number of small USB fixes for 3.4-rc5.
Nothing major, as before, some USB gadget fixes. There's a crash fix
for a number of ASUS laptops on resume that had been reported by a
number of different people. We think the fix might also pertain to
other machines, as this was a BIOS bug, and they seem to travel to
different models and manufacturers quite easily. Other than that,
some other reported problems fixed as well."
* tag 'usb-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: udc-core: fix incompatibility with dummy-hcd
usb: gadget: udc-core: fix wrong call order
USB: cdc-wdm: fix race leading leading to memory corruption
USB: EHCI: fix crash during suspend on ASUS computers
usb gadget: uvc: uvc_request_data::length field must be signed
usb: gadget: dummy: do not call pullup() on udc_stop()
usb: musb: davinci.c: add missing unregister
usb: musb: drop __deprecated flag
USB: gadget: storage gadgets send wrong error code for unknown commands
usb: otg: gpio_vbus: Add otg transceiver events and notifiers
Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6.
Support has been tested with and without hardware offloading. This
version fixes the L2TP over localhost issue with incorrect checksums
being reported.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing controversial, just another batch of fixes:
- Samsung/exynos fixes for more merge window fallout: build errors and
warnings mostly, but also some clock/device setup issues on exynos4/5
- PXA bug and warning fixes related to gpio and pinmux
- IRQ domain conversion bugfixes for U300 and MSM
- A regulator setup fix for U300
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPmzKwAAoJEIwa5zzehBx36jMP/3D/oE3AkqI/fhulDU2g8KmO
9/dVGOsdeKWqH14xcjc79l1V0aX5dtgqRIBQ/c13LetKhpyxY7J+ZVQml2c1jN7G
+h3DhmA+EzRUxXNuC8Iut6UgoITx+Rk8aGu6s1gD/osn9Ynm3B4+xxjhcEf0cZkD
aWq7GLipBa8uOd2ryrGftW1BS+tgf+Tcmc6yCxebbv5ARA5lJQRPuP+YgnYNyhzU
f7SCgkt6IpgcCcxAAzzgWLeFoPK0FfsOUXTkzLYOhJpOrkn82g5MWXDhe1tGddUr
JqJpYoIEwNzDcsEcImWDuoSFoaqgO6GiOU+8Y/tx6KNVnya9520JAoxBjU1EqgnV
+jHNHW9JaZs/oc7LbJ774NuV5YZ2XDhpZm1Cc0dBT2oDh2oFmeaXIt6cSA4jGYpm
4OP7BMDuxBdLlrKk8coDXCUT21HNsgRSkyx6Ic6dVNxvz86sKKE8FMj5m5c82GBk
n+1C+jCEs6rJi7e/ddCNMa6P0R9my5lq40S7WwOXDPw6YeetYPs5QYz7GJhM+n7z
f4uT20ggm1qmUWHVtAA2MiuTWlinSEFThRDS7+yRNioTNEfvcT5LmOTYt1UAtFQj
f9StJUgkBuoB+kuUNDKPtNqy5dT3y6d9L+9gbCIpalxwBOLs/s+0k8nHTYeu/lZ5
Ir56+pqGpr+7TCuBg28N
=V4y6
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Nothing controversial, just another batch of fixes:
- Samsung/exynos fixes for more merge window fallout: build errors
and warnings mostly, but also some clock/device setup issues on
exynos4/5
- PXA bug and warning fixes related to gpio and pinmux
- IRQ domain conversion bugfixes for U300 and MSM
- A regulator setup fix for U300"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: PXA2xx: MFP: fix potential direction bug
ARM: PXA2xx: MFP: fix bug with MFP_LPM_KEEP_OUTPUT
arm/sa1100: fix sa1100-rtc memory resource
ARM: pxa: fix gpio wakeup setting
ARM: SAMSUNG: add missing MMC_CAP2_BROKEN_VOLTAGE capability
ARM: EXYNOS: Fix compilation error when CONFIG_OF is not defined
ARM: EXYNOS: Fix resource on dev-dwmci.c
ARM: S3C24XX: Fix build warning for S3C2410_PM
ARM: mini2440_defconfig: Fix build error
ARM: msm: Fix gic irqdomain support
ARM: EXYNOS: Fix incorrect initialization of GIC
ARM: EXYNOS: use 'exynos4-sdhci' as device name for sdhci controllers
ARM: u300: bump all IRQ numbers by one
ARM: ux300: Fix unimplementable regulation constraints
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPmubjAAoJEEFnBt12D9kBPZ4P/2wrFqNy2+Fxatstmb1GKwqW
Yx0acIZkJK/gfZetFDSiTzM8bcxp/oz2lfqHE7xFq96s36wSZyO9B9Ry+YiD9x4g
wzN/Gb7UufnuSEEJo19E13sJxMXJ/4dVFHKtJnxSL3k9lC6nO/YkEn7C2OXE9gXy
d3a8H2jyGK9CPPHzQVxDP+9vYj8RpZcXz1M49H8dZ7UZblPfHlvPNzsclKbuTiFK
8zg16sX01BjvbawwMuAuJH/SGytKpM7WtlzDni/OplqX3CDjIoNZ8BEqDgXDVcS8
dwhmhPEf89v5ctWhaDHlskwAkL8oFjwFAHMZ7pjiIZ1aTxPPEKtyegVqPKoQwQmK
XCbjb9SqFNRXI2nIkbeX1xGHQS1wngRcfWJ13qLDEDi3d3aGLUJDYTtMHPrjvNnI
kzqbNC6fFuKZNOaDYMb1nqV3oWa8VLlDhXgEn5K8ZzBInZ/12LsqybRqImwLQnN0
flRCcOWApbHmJB64IAI1NzbG5Y6FxG53lwW+bwkw83W1fWw1WaUNNvJ8HBfDoN3G
QflSqY8TtRE5vGvTburRbDIF3YBeGsaPDTksIg2t82PZ1SabUfIhpWRbhzS6YvMH
NiKlgk88Q3VHLUF4TOxDThXZEJpL4VcJWOV0/Y4eSDNNhOtBKfGMoo9EBCfLwevM
q6EyapSjiDz0pu9jUN/P
=V4y2
-----END PGP SIGNATURE-----
Merge tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6
Pull misc SPI device driver bug fixes from Grant Likely.
* tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6:
spi/spi-bfin5xx: Fix flush of last bit after each spi transfer
spi/spi-bfin5xx: fix reversed if condition in interrupt mode
spi/spi_bfin_sport: drop bits_per_word from client data
spi/bfin_spi: drop bits_per_word from client data
spi/spi-bfin-sport: move word length setup to transfer handler
spi/bfin5xx: rename config macro name for bfin5xx spi controller driver
spi/pl022: Allow request for higher frequency than maximum possible
spi/bcm63xx: set master driver mode_bits.
spi/bcm63xx: don't use the stopping state
spi/bcm63xx: convert to the pump message infrastructure
spi/spi-ep93xx.c: use dma_transfer_direction instead of dma_data_direction
spi: fix spi.h kernel-doc warning
spi/pl022: Fix calculate_effective_freq()
spi/pl022: Fix range checking for bits per word
Fix kernel-doc warning in spi.h (copy/paste):
Warning(include/linux/spi/spi.h:365): No description found for parameter 'unprepare_transfer_hardware'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
In 3.3, gpio wakeup setting was broken. The call
enable_irq_wake() didn't set up the PXA gpio registers
(PWER, ...) anymore.
Fix it at least for pxa27x. The driver doesn't seem to be
used in pxa25x (weird ...), and the fix doesn't extend to
pxa3xx and pxa95x (which don't have a gpio_set_wake()
available).
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Merge fixes from Andrew Morton:
"13 fixes. The acerhdf patches aren't (really) fixes. But they've
been stuck in my tree for up to two years, sent to Matthew multiple
times and the developers are unhappy."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (13 patches)
mm: fix NULL ptr dereference in move_pages
mm: fix NULL ptr dereference in migrate_pages
revert "proc: clear_refs: do not clear reserved pages"
drivers/rtc/rtc-ds1307.c: fix BUG shown with lock debugging enabled
arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file
hugetlbfs: lockdep annotate root inode properly
acerhdf: lowered default temp fanon/fanoff values
acerhdf: add support for new hardware
acerhdf: add support for Aspire 1410 BIOS v1.3314
fs/buffer.c: remove BUG() in possible but rare condition
mm: fix up the vmscan stat in vmstat
epoll: clear the tfile_check_list on -ELOOP
mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma
Don't pick __u8/__u16 values directly from raw pointers, but instead use
an array of structures of code:value pairs. This is OK, since the buffer
we take options from is not an skb memory, but a user-to-kernel one.
For those options which don't require any value now, require this to be
zero (for potential future extension of this API).
v2: Changed tcp_repair_opt to use two __u32-s as spotted by David Laight.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>