UDP encapsulation is broken on IPv6. This is because the logic to resubmit
the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
the resubmit label is in the wrong position since we already get the
nexthdr value when performing decapsulation. In addition the skb pull is no
longer necessary either.
This changes the return value check to look for < 0, using it for the
nexthdr on the next iteration, and moves the resubmit label to the proper
location.
With these changes the v6 code now matches what we do in the v4 ip input
code wrt resubmitting when decapsulating.
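
The resubmit convention can be sketched in userspace as follows. This is
a minimal model, not the kernel code: the handler table, the protocol
numbers and the function names are hypothetical, and negating the return
value to obtain the next header is assumed by analogy with the v4 input
path.

```c
#include <stddef.h>

/* Hypothetical protocol numbers used only for this sketch. */
enum { PROTO_INNER = 4, PROTO_UDP = 17, PROTO_MAX = 32 };

typedef int (*proto_handler)(void);

/* Hypothetical UDP handler: decapsulates, then asks the caller to
 * resubmit the packet as PROTO_INNER by returning -PROTO_INNER. */
static int udp_encap_handler(void) { return -PROTO_INNER; }

/* Hypothetical inner handler: consumes the packet. */
static int inner_handler(void) { return 0; }

static proto_handler handlers[PROTO_MAX] = {
    [PROTO_INNER] = inner_handler,
    [PROTO_UDP] = udp_encap_handler,
};

/* Deliver a packet starting at nexthdr; returns how many handlers ran. */
static int deliver(int nexthdr)
{
    int calls = 0;

    for (;;) {
        int ret;

        if (nexthdr < 0 || nexthdr >= PROTO_MAX || !handlers[nexthdr])
            return calls;       /* no handler registered: drop */
        ret = handlers[nexthdr]();
        calls++;
        if (ret >= 0)
            return calls;       /* handler finished with the packet */
        nexthdr = -ret;         /* ret < 0: resubmit with the new nexthdr */
    }
}
```

Checking for ret > 0 instead would end the loop right after the UDP
handler, so the inner protocol would never be processed.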
Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: "Tom Herbert" <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory pointed to by idev->stats.icmpv6msgdev,
idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
read context without taking a reference on idev. For example, through
IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
waiting for an RCU grace period to elapse. This could lead to the
memory being written to after it has been freed.
Fix this by using call_rcu to free the memory used for stats, as well
as idev after an RCU grace period has elapsed.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'omap-for-v4.1/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Merge omap fixes for v4.1, urgent fix to avoid potential hardware damage, from Tony Lindgren:
Omap fixes for the -rc cycle, including a fix for potential hardware
breakage on BeagleBones:
- BeagleBones don't support RTC-only mode, it can cause hardware
damage if system-power-controller is specified without
ti,pmic-shutdown-controller
- Fix a recent regression to am3517 SoCs caused by the recent clock
move that was not noticed until now despite automated boot
testing
- Fix a regression for n900 touchscreen triggered by recent
input changes
- Fix compatible property for dm816x USB to avoid errors with
USB Ethernet
- Fix oops for omap3 when built with CONFIG_THUMB2_KERNEL
* tag 'omap-for-v4.1/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am335x-boneblack: disable RTC-only sleep to avoid hardware damage
ARM: dts: AM35xx: fix system control module clocks
ARM: dts: Fix n900 dts file to work around 4.1 touchscreen regression on n900
ARM: dts: Fix dm816x to use right compatible flag for MUSB
ARM: OMAP3: Fix booting with thumb2 kernel
Pull Intel IOMMU fix from David Woodhouse:
"This fixes an oops when attempting to enable 1:1 passthrough mode for
devices on which VT-d translation was disabled anyway.
It's actually a long-standing bug but recent changes (commit
18436afdc11a: "iommu/vt-d: Allow RMRR on graphics devices too") have
made it much easier to trigger with 'iommu=pt intel_iommu=igfx_off' on
the command line"
* git://git.infradead.org/intel-iommu:
iommu/vt-d: Fix passthrough mode with translation-disabled devices
Pull libata fixes from Tejun Heo:
"Two driver fixes. One is for an ahci_mvebu controller config bug and
the other fixes pata_octeon_cf build issue"
* 'for-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
pata_octeon_cf: fix broken build
ata: ahci_mvebu: Fix wrongly set base address for the MBus window setting
MODULE_DEVICE_TABLE is referring to wrong driver's table and breaks the
build. Fix that.
Cc: stable@vger.kernel.org
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Jaeden Amero says:
====================
net/phy: micrel: Center FLP timing at 16ms
In v2, we add an additional cleanup commit to make an array of strings
static const and to improve const correctness generally. We also no longer
unnecessarily initialize the result variable in
ksz9031_center_flp_timing().
In v3, we remove the unnecessary result variable from ksz9031_config_init()
introduced by a previous version of "net/phy: micrel: Center FLP timing at
16ms".
In v4, we modify the commit message of "net/phy: micrel: Center FLP timing
at 16ms" to replace the awkward quotation of the data sheet's programming
procedure with an explanation of why we program the FLP burst registers and
restart auto-negotiation where we do (config_init).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Link failures have been observed when using the KSZ9031 with HP 1810-8G
and HP 1910-8G network switches. Center the FLP timing at 16ms to help
avoid intermittent link failures.
From the KSZ9031RNX and KSZ9031MNX data sheets revision 2.2, section
"Auto-Negotiation Timing":
The KSZ9031[RNX or MNX] Fast Link Pulse (FLP) burst-to-burst
transmit timing for Auto-Negotiation defaults to 8ms. IEEE 802.3
Standard specifies this timing to be 16ms +/-8ms. Some PHY link
partners need to receive the FLP with 16ms centered timing;
otherwise, there can be intermittent link failures and long
link-up times.
The PHY data sheet recommends configuring the FLP burst registers after
power-up/reset and immediately thereafter restarting auto-negotiation, so
we center the FLP timing at 16ms and then restart auto-negotiation in the
config_init for KSZ9031.
Signed-off-by: Jaeden Amero <jaeden.amero@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some defines for a few pad skew related extended registers.
Specify which MMD Address (dev_addr) they are for.
Signed-off-by: Jaeden Amero <jaeden.amero@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a few places in this driver, we weren't using const where we could
have. Use const more.
In addition, change the arrays of strings in ksz9031_config_init() to be
not only const, but also static.
Signed-off-by: Jaeden Amero <jaeden.amero@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Alexander Duyck pointed out, given:
    struct tnode {
    ...
    struct key_vector kv[1];
    }
the kv[1] member of struct tnode is an array, so referencing it through
a NULL pointer will not crash the system, like this:
    struct tnode *p = NULL;
    struct key_vector *kv = p->kv;
As such p->kv doesn't actually dereference anything, it is simply a
means for getting the offset to the array from the pointer p.
This patch makes the code more regular so that it does not look odd
to people reading it.
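
The point that p->kv only computes an offset can be checked in userspace
with offsetof(). A minimal sketch; the full_children member is a
hypothetical stand-in for the elided members of struct tnode, and the
key_vector layout is likewise assumed.

```c
#include <stddef.h>
#include <stdint.h>

struct key_vector { unsigned long key; };   /* assumed layout */

struct tnode {
    long full_children;             /* stand-in for the elided members */
    struct key_vector kv[1];
};

/* Address of the kv array for a tnode located at base, computed by
 * arithmetic alone; no memory at base is read. */
static uintptr_t kv_address(uintptr_t base)
{
    return base + offsetof(struct tnode, kv);
}
```

With a base of 0 the result is just the offset of kv inside the
structure, which is why the NULL case does not fault.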
Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
API compliance scanning with coccinelle flagged:
./drivers/net/wan/dscc4.c:1036:1-33:
WARNING: timeout (10) seems HZ dependent
./drivers/net/wan/dscc4.c:554:2-34:
WARNING: timeout (10) seems HZ dependent
./drivers/net/wan/dscc4.c:599:2-34:
WARNING: timeout (10) seems HZ dependent
Numeric constants passed to schedule_timeout_*() make the effective
timeout HZ dependent which does not seem to be the intent here.
Fixed up by converting the constant to jiffies with msecs_to_jiffies(),
passing 100ms (assuming HZ==100 in the original code).
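
To see why a bare constant is HZ dependent, here is a simplified
userspace model of the conversion. It assumes HZ divides 1000 evenly and
is only a sketch of the idea, not the kernel's msecs_to_jiffies()
implementation; the _sketch names are hypothetical.

```c
/* Simplified model of msecs_to_jiffies() for HZ values that divide
 * 1000; rounds up so the timeout is never shorter than requested. */
static unsigned long msecs_to_jiffies_sketch(unsigned int ms, unsigned int hz)
{
    unsigned int ms_per_jiffy = 1000 / hz;

    return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/* Wall-clock duration in milliseconds of a raw jiffies count. */
static unsigned int jiffies_to_msecs_sketch(unsigned long j, unsigned int hz)
{
    return (unsigned int)(j * (1000 / hz));
}
```

A raw timeout of 10 jiffies lasts 100ms at HZ=100 but only 10ms at
HZ=1000, while converting 100ms yields 10, 25 and 100 jiffies at HZ=100,
250 and 1000 respectively.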
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
API compliance scanning with coccinelle flagged:
./drivers/net/wan/cosa.c:520:2-18: WARNING:
timeout (30) seems HZ dependent
Numeric constants passed to schedule_timeout() make the effective
timeout HZ dependent which makes little sense in a device probe.
Fixed up by converting the constant to jiffies with msecs_to_jiffies()
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix static checker warnings in the flow of system guid query.
Fixes: 707c4602cd ('net/mlx5_core: Add new query HCA vport commands')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver gets unregistered, a call to netif_napi_del() was
missing. This call was also missing in the error paths of
b44_init_one().
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
In big endian cases, macro cpu_to_le16 unfolds to __swab16 which
provides special case for constants. In little endian cases,
__constant_cpu_to_le16 and cpu_to_le16 expand directly to the
same expression. So, replace __constant_cpu_to_le16 with
cpu_to_le16 with the goal of getting rid of the definition of
__constant_cpu_to_le16 completely.
The semantic patch that performs this transformation is as follows:
@@expression x;@@
- __constant_cpu_to_le16(x)
+ cpu_to_le16(x)
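
For readers unfamiliar with the helper, the following userspace sketch
shows what a CPU-to-little-endian 16-bit conversion has to produce. It
is a portable model with a hypothetical name, not the kernel macro,
which is selected per architecture at compile time.

```c
#include <stdint.h>
#include <string.h>

/* Return a 16-bit value whose in-memory bytes are in little-endian
 * order, whatever the byte order of the host. */
static uint16_t cpu_to_le16_sketch(uint16_t x)
{
    const uint8_t bytes[2] = { (uint8_t)(x & 0xff), (uint8_t)(x >> 8) };
    uint16_t out;

    memcpy(&out, bytes, sizeof(out));
    return out;
}
```

Because the function is pure, an optimizing compiler can usually fold it
at build time when the argument is a constant, which is the property
that makes a separate __constant_ variant redundant.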
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.
Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.
Fixes: 03c57747a7 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the time_is_before_eq_jiffies() macro for time comparison.
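
The benefit of the macro over an open-coded comparison is that it stays
correct when the jiffies counter wraps. A userspace model of the idea,
with the current time passed in explicitly; the _sketch names are
hypothetical and the signed-difference trick mirrors how time_after_eq()
is commonly described.

```c
/* Wraparound-safe "a is at or after b": the unsigned difference is
 * interpreted as a signed value. */
static int time_after_eq_sketch(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Model of time_is_before_eq_jiffies(t): true when t is at or before
 * the current time now. */
static int time_is_before_eq_sketch(unsigned long t, unsigned long now)
{
    return time_after_eq_sketch(now, t);
}
```

A plain `t <= now` test would give the wrong answer for a timestamp
taken shortly before the counter wrapped; the signed difference does
not.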
Signed-off-by: Antonio Murdaca <antonio.murdaca@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS updates from Ralf Baechle:
"Eight fixes across arch/mips. Nothing stands particularly out nor is
complicated but fixes keep coming in at a higher than comfortable
rate"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: KVM: Do not sign extend on unsigned MMIO load
MIPS: BPF: Fix stack pointer allocation
MIPS: Loongson-3: Fix a cpu-hotplug issue in loongson3_ipi_interrupt()
MIPS: Fix enabling of DEBUG_STACKOVERFLOW
MIPS: c-r4k: Fix typo in probe_scache()
MIPS: Avoid an FPE exception in FCSR mask probing
MIPS: ath79: Add a missing new line in log message
MIPS: ralink: Fix clearing the illegal access interrupt
There are several places in the driver (all in control paths) where
coherent dma memory is being allocated using either dma_alloc_coherent()
or the deprecated pci_alloc_consistent(). All these calls should be
changed to use dma_zalloc_coherent() to avoid uninitialized fields in
data structures backed by this memory.
Reported-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The formula the driver currently uses to adjust the frequency is:
fe * diff = ppb * pc
Note:
fe: ENET ref clock frequency in Hz
diff = inc_corr - inc: difference between default increment and correction increment
ppb: parts per billion adjustment from base
pc: correction period (in number of fe clock cycles)
The correction increment will be used after N cycles of regular increments,
not every N cycles (with N being the correction period). For example, set ENET_ATCOR=4,
INC=8, INC_CORR=9, there will be 4 increments of 8 (ENET_ATINC[INC]) , followed by 1
increment of 9 (ENET_ATINC[INC_CORR]).
So, the correct formula is:
fe * diff = ppb * (pc + 1)
For ENET_ATCOR, a value 0 disables the correction counter and no corrections occur.
So, based on the original formula, set pc = pc > 1 ? pc - 1 : pc.
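
The arithmetic can be sketched as below. This is an illustration of the
formulas in this message, not the driver code: the function names are
hypothetical, and returning 0 when ppb is 0 is an assumption made here
to model "no correction".

```c
#include <stdint.h>

/* Time added over one full correction cycle: pc regular increments
 * followed by one corrected increment. */
static uint64_t cycle_increment(uint64_t inc, uint64_t inc_corr, uint64_t pc)
{
    return pc * inc + inc_corr;
}

/* Correction period for a given adjustment, derived from
 * fe * diff = ppb * (pc + 1). */
static uint64_t correction_period(uint64_t fe, uint64_t diff, uint64_t ppb)
{
    uint64_t pc;

    if (ppb == 0)
        return 0;           /* assumption: 0 disables the correction */
    pc = fe * diff / ppb;   /* value given by the original formula */
    return pc > 1 ? pc - 1 : pc;
}
```

With ENET_ATCOR=4, INC=8 and INC_CORR=9, one cycle spans five clock
ticks and adds 4 * 8 + 9 = 41. For a hypothetical fe of 125000000 Hz,
diff of 1 and ppb of 1000, the original formula gives 125000 and the
corrected period is 124999.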
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove sparse warnings:
drivers/net/ethernet/xilinx/ll_temac_main.c:65:16: warning: cast removes
address space of expression
drivers/net/ethernet/xilinx/ll_temac_main.c:70:9: warning: cast removes
address space of expression
drivers/net/ethernet/xilinx/ll_temac_main.c:127:16: warning: cast
removes address space of expression
drivers/net/ethernet/xilinx/ll_temac_main.c:137:9: warning: cast removes
address space of expression
drivers/net/ethernet/xilinx/ll_temac_main.c:409:3: warning: symbol
'temac_options' was not declared. Should it be static?
drivers/net/ethernet/xilinx/ll_temac_main.c:590:6: warning: symbol
'temac_adjust_link' was not declared. Should it be static?
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 and IPv6 share same implementation of get_cookie_sock(),
and there is no point inlining it.
We add tcp_ prefix to the common helper name and export it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump useful ring statistics along with interrupt status, software
maintained pointers and hardware registers to help troubleshoot TX queue
stalls.
When a timeout occurs, disable TX NAPI for the rings, dump their states
while interrupts are disabled, re-enable interrupts, NAPI and queue flow
control to help with the recovery.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool -S on a DSA interface can deadlock for some switches because
the same lock is taken twice. Use the register read function which
expects the lock to be already held.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 31888234b7 ("net: dsa: mv88e6xxx: Replace stats mutex with SMI mutex")
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC address of the soft-interface is used to initialise
the "non-purge" TT entry of each existing VLAN. Therefore
when the user invokes ndo_set_mac_address() all the
"non-purge" TT entries have to be updated, not only the one
belonging to the non-tagged network.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
The header files could not be built independently of each other. This
happened because headers didn't include the files for things they used.
This was problematic because the success of a build depended on
knowing the right order of local includes.
Also, source files were not explicitly including everything they used.
Instead they required that transitive includes always be stable. This is
problematic because some transitive includes are not obvious, depend on
config settings and may not be stable in the future.
The order of include blocks is:
* primary headers (main.h and the *.h file of a *.c file)
* global linux headers
* required local headers
* extra forward declarations for pointers in function/struct declarations
The only exceptions are linux/bitops.h and linux/if_ether.h in packet.h.
This header file is shared with userspace applications like batctl and must
therefore build together with userspace applications. The header
linux/bitops.h is not part of the uapi headers and linux/if_ether.h
conflicts with the musl implementation of netinet/if_ether.h. The
maintainers rejected the use of __KERNEL__ preprocessor checks and thus
these two headers are only in main.h. All files using packet.h first have
to include main.h to work correctly.
Reported-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
This API has to be used to let any routing protocol free
neighbor specific allocated resources
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Some mesh attributes are behind substructs in the
batadv_priv object and for this reason the name cannot be
used anymore to refer to them.
This patch allows specifying the variable name where the
attribute is stored inside batadv_priv instead of using the
name.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
An unoptimized version of the Jenkins one-at-a-time hash function is used
and partially copied all over the code wherever a hashtable is used.
Instead the optimized version shared between the whole kernel should be
used to reduce code duplication and use better optimized code.
Only the DAT code must use the old implementation because it is used as
distributed hash function which has to be common for all nodes.
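
For reference, the textbook Jenkins one-at-a-time function looks like
this. It is the generic published algorithm written as a standalone
sketch, not a copy of the batman-adv implementation or of the shared
kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Jenkins one-at-a-time hash over len bytes of key. */
static uint32_t one_at_a_time(const uint8_t *key, size_t len)
{
    uint32_t hash = 0;
    size_t i;

    /* Mix in one byte per iteration. */
    for (i = 0; i < len; i++) {
        hash += key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }

    /* Final avalanche. */
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}
```

A hash that is only used for local table lookups can be swapped freely,
whereas one that decides which node stores an entry must produce the
same value on every node, which is the constraint the DAT code is
under.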
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Allow programs to read/write skb->mark, tc_index fields and
((struct qdisc_skb_cb *)cb)->data.
mark and tc_index are generically useful in TC.
cb[0]-cb[4] are primarily used to pass arguments from one
program to another called via bpf_tail_call() which can
be seen in sockex3_kern.c example.
All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
mark, tc_index are writeable from tc_cls_act only.
cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
Add verifier tests and improve sample code.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
For ingress L2 header is already pulled, whereas for egress it's present.
This is known to program writers which are currently forced to use
BPF_LL_OFF workaround.
Since programs don't change skb internal pointers it is safe to do
pull/push right around invocation of the program and earlier taps and
later pt->func() will not be affected.
Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
around run_filter/BPF_PROG_RUN even if skb_shared.
This fix finally allows programs to use optimized LD_ABS/IND instructions
without BPF_LL_OFF for higher performance.
tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
          w/o JIT   w/JIT
before    20.5      23.6 Mpps
after     21.8      26.6 Mpps
Old programs with BPF_LL_OFF will still work as-is.
We can now undo most of the earlier workaround commit:
a166151cbe ("bpf: fix bpf helpers to use skb->mac_header relative offsets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the same reasons as in commit 12e25e1041 ("tcp: remove redundant
checks"), we can remove redundant checks done for timewait sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debug is printing the struct smt_header * address using
the %x format specifier. Fix it to use %p instead.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the Tx timer function runs in softirq context the driver needs
to call disable_irq_nosync instead of a disable_irq.
Reported-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix:
drivers/net/wan/dscc4.c: In function 'dscc4_open':
drivers/net/wan/dscc4.c:1049:25: warning: variable 'ppriv' set but not used
[-Wunused-but-set-variable]
This has been in there unused since 1da177e4c3 (Linux-2.6.12-rc2), so
simply remove it.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
rhashtable uses EXPORT_SYMBOL_GPL() without including linux/export.h
directly; it is only included indirectly through some other includes.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).
As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())
With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.
This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.
The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.
This new feature is available for both IPv4 and IPv6 (Thanks Neal)
Tested:
Wrote a test program and checked its behavior on IPv4 and IPv6.
strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() shows that the port is still 0 right after bind()
but properly allocated after connect().
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
IPv6 test :
socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.
lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets
lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66
Check that a given source port is indeed used by many different
connections :
lpaa23:~# ss -t src :40000 | head -10
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.2:40000 127.0.202.33:44983
ESTAB 0 0 127.0.0.2:40000 127.2.27.240:44983
ESTAB 0 0 127.0.0.2:40000 127.2.98.5:44983
ESTAB 0 0 127.0.0.2:40000 127.0.124.196:44983
ESTAB 0 0 127.0.0.2:40000 127.2.139.38:44983
ESTAB 0 0 127.0.0.2:40000 127.1.59.80:44983
ESTAB 0 0 127.0.0.2:40000 127.3.6.228:44983
ESTAB 0 0 127.0.0.2:40000 127.0.38.53:44983
ESTAB 0 0 127.0.0.2:40000 127.1.197.10:44983
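
From the application side, the sequence in the strace output above can
be sketched as follows. This is an illustrative userspace fragment, not
the test program used for this patch: the function name is hypothetical,
the fallback define assumes the option value from linux/in.h, and
IPPROTO_IP is used in place of SOL_IP because both have the same value
on Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24  /* assumed value from linux/in.h */
#endif

/* Bind a TCP socket to ip with port 0 and the new option set.
 * Returns the local port reported right after bind(), or -1 if any
 * call fails, for example on a kernel without the option. */
static int port_after_bind_no_port(const char *ip)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int one = 1;
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) == 1 &&
        setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &one, sizeof(one)) == 0 &&
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);
    close(fd);
    return port;
}
```

On a kernel with the option, the reported port should still be 0 at
this point, since the source port is only chosen once connect() knows
the full 4-tuple.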
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are 2 fixes for the driver core that resolve some reported issues,
one is a regression from 4.0, the other fixes a reported oops that has
been there since 3.19. Both have been in linux-next for a while with no
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two fixes for the driver core that resolve some reported
issues.
One is a regression from 4.0, the other fixes a reported oops that
has been there since 3.19.
Both have been in linux-next for a while with no problems"
* tag 'driver-core-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers/base: cacheinfo: handle absence of caches
drivers: of/base: move of_init to driver_init
Here are some IIO driver fixes to resolve reported issues, some ozwpan
fixes for some reported CVE problems, and a rtl8712 driver fix for a
reported regression.
All have been in linux-next successfully.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'staging-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging / IIO fixes from Greg KH:
"Here are some IIO driver fixes to resolve reported issues, some ozwpan
fixes for some reported CVE problems, and a rtl8712 driver fix for a
reported regression.
All have been in linux-next successfully"
* tag 'staging-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8712: fix stack dump
ozwpan: unchecked signed subtraction leads to DoS
ozwpan: divide-by-zero leading to panic
ozwpan: Use unsigned ints to prevent heap overflow
ozwpan: Use proper check to prevent heap overflow
iio: adc: twl6030-gpadc: Fix modalias
iio: adis16400: Fix burst transfer for adis16448
iio: adis16400: Fix burst mode
iio: adis16400: Compute the scan mask from channel indices
iio: adis16400: Use != channel indices for the two voltage channels
iio: adis16400: Report pressure channel scale
Here are a few TTY and Serial driver fixes for reported regressions and
crashes. All of these have been in linux-next with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are a few TTY and Serial driver fixes for reported regressions
and crashes.
All of these have been in linux-next with no reported problems"
* tag 'tty-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
n_tty: Fix auditing support for cannonical mode
serial: 8250_omap: provide complete custom startup & shutdown callbacks
n_tty: Fix calculation of size in canon_copy_from_read_buf
serial: imx: Fix DMA handling for IDLE condition aborts
serial/amba-pl011: Unconditionally poll for FIFO space before each TX char
Here are some USB and PHY driver fixes that resolve some reported
regressions. Also in here are some new device ids. All of the details
are in the shortlog and these patches have been in linux-next with no
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'usb-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and PHY driver fixes from Greg KH:
"Here are some USB and PHY driver fixes that resolve some reported
regressions. Also in here are some new device ids.
All of the details are in the shortlog and these patches have been in
linux-next with no problems"
* tag 'usb-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle
usb: renesas_usbhs: Don't disable the pipe if Control write status stage
usb: renesas_usbhs: Fix fifo unclear in usbhsf_prepare_pop
usb: gadget: f_fs: fix check in read operation
usb: musb: fix order of conditions for assigning end point operations
usb: gadget: f_uac1: check return code from config_ep_by_speed
usb: gadget: ffs: fix: Always call ffs_closed() in ffs_data_clear()
usb: gadget: g_ffs: Fix counting of missing_functions
usb: s3c2410_udc: correct reversed pullup logic
usb: dwc3: gadget: Fix incorrect DEPCMD and DGCMD status macros
usb: phy: tahvo: Pass the IRQF_ONESHOT flag
usb: phy: ab8500-usb: Pass the IRQF_ONESHOT flag
usb: renesas_usbhs: Revise the binding document about the dma-names
usb: host: xhci: add mutex for non-thread-safe data
usb: make module xhci_hcd removable
USB: serial: ftdi_sio: Add support for a Motion Tracker Development Board
usb: gadget: f_midi: fix segfault when reading empty id
phy: phy-rcar-gen2: Fix USBHS_UGSTS_LOCK value
phy: omap-usb2: invoke pm_runtime_disable on error path
phy: fix Kconfig dependencies
...
Stupid typo fix for v4.1. One of the IS_ENABLED() macro calls forgot the
CONFIG_ prefix. Only affects a tiny number of platforms, but still...
Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
Pull devicetree fix from Grant Likely:
"Stupid typo fix for v4.1. One of the IS_ENABLED() macro calls forgot
the CONFIG_ prefix. Only affects a tiny number of platforms, but
still..."
* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
of/dynamic: Fix test for PPC_PSERIES
Pull drm fixes from Dave Airlie:
"i915 has a bunch of fixes, and Russell found a bug in sysfs writing
handling that results in userspace getting stuck"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: fix writing to /sys/class/drm/*/status
drm/i915: Move WaBarrierPerformanceFixDisable:skl to skl code from chv code
drm/i915: Include G4X/VLV/CHV in self refresh status
drm/i915: Initialize HWS page address after GPU reset
drm/i915: Don't skip request retirement if the active list is empty
drm/i915/hsw: Fix workaround for server AUX channel clock divisor