WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
David S. Miller	394efd19d5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 13:48:30 -05:00
Eric Dumazet	74d332c13b	net: extend net_device allocation to vmalloc() Joby Poriyath provided a xen-netback patch to reduce the size of xenvif structure as some netdev allocation could fail under memory pressure/fragmentation. This patch is handling the problem at the core level, allowing any netdev structures to use vmalloc() if kmalloc() failed. As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Joby Poriyath <joby.poriyath@citrix.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-03 23:19:00 -05:00
Oliver Hartkopp	51b2f451b5	can: add broadcast manager documentation This patch adds documentation about the broadcast manager. It's based on Brian Thorne's initial patch http://marc.info/?l=linux-can&m=138119382015496&w=2 and Daniele Venzano's work http://brownhat.org/docs/socketcan.html . Signed-off-by: Brian Thorne <hardbyte@gmail.com> Cc: Daniele Venzano <linux@brownhat.org> Cc: Andre Naujoks <nautsch2@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2013-10-31 20:45:04 +01:00
Masanari Iida	c17cb8b55b	doc:net: Fix typo in Documentation/networking Correct spelling typo in Documentation/networking Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-30 17:10:20 -04:00
Randy Dunlap	ad86de802d	Documentation/networking: netdev-FAQ typo corrections Various typo fixes to netdev-FAQ.txt: - capitalize Linux - hyphenate dual-word adjectives - minor punctuation fixes Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-27 16:55:40 -04:00
Marek Lindner	bc58eeef74	batman-adv: update email address for Marek Lindner Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2013-10-19 14:53:48 +02:00
Simon Wunderlich	c679ff8fb2	batman-adv: update email address for Simon Wunderlich My university will stop email service for alumni in january 2014, please use my new e-mail address instead. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>	2013-10-19 14:53:41 +02:00
Simon Wunderlich	9f4980e68b	batman-adv: remove vis functionality This is replaced by a userspace program, we don't need this functionality to bloat the kernel. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2013-10-09 21:22:32 +02:00
Nikolay Aleksandrov	7a6afab1de	bonding: document the new xmit policy modes and update the changed ones Add new documentation for encap2+3 and encap3+4, also update the formula for the old modes due to the changes. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-03 15:36:38 -04:00
Neil Horman	7eacd03810	bonding: Make alb learning packet interval configurable running bonding in ALB mode requires that learning packets be sent periodically, so that the switch knows where to send responding traffic. However, depending on switch configuration, there may not be any need to send traffic at the default rate of 3 packets per second, which represents little more than wasted data. Allow the ALB learning packet interval to be made configurable via sysfs Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Acked-by: Veaceslav Falico <vfalico@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-15 22:20:44 -04:00
Jesse Brandeburg	1bff652941	i40e: include i40e in kernel proper This patch adds the changes for Kconfig, i40e.txt, MAINTAINERS, Kbuild and new i40e/Makefile to build i40e with the kernel. New driver build option is CONFIG_I40E Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> CC: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com> CC: e1000-devel@lists.sourceforge.net Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-09-11 02:28:40 -07:00
Daniel Borkmann	f212781082	net: ipv6: mld: document force_mld_version in ip-sysctl.txt Document force_mld_version parameter in ip-sysctl.txt. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:21 -04:00
Sonic Zhang	e2a240c7d3	driver:net:stmmac: Disable DMA store and forward mode if platform data force_thresh_dma_mode is set. Some synopsys ip implementation doesn't support DMA store and forward mode, such as BF60x. So, set force_thresh_dma_mode to use DMA thresholds only. Update document and devicetree as well. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:26:09 -04:00
Daniel Borkmann	7ec06da81d	net: packet: document available fanout policies Update documentation to add fanout policies that are available. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 16:43:29 -04:00
Eric Dumazet	95bd09eb27	tcp: TSO packets automatic sizing After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Tom Herbert <therbert@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:50:06 -04:00
Hannes Frederic Sowa	b800c3b966	ipv6: drop fragmented ndisc packets by default (RFC 6980) This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <fernando@gont.com.ar> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:32:08 -04:00
David S. Miller	5b2941b18d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A number of significant new features and optimizations for net-next/3.12. Highlights are: * "Megaflows", an optimization that allows userspace to specify which flow fields were used to compute the results of the flow lookup. This allows for a major reduction in flow setups (the major performance bottleneck in Open vSwitch) without reducing flexibility. * Converting netlink dump operations to use RCU, allowing for additional parallelism in userspace. * Matching and modifying SCTP protocol fields. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 22:11:18 -04:00
Jeff Kirsher	d7064f4c19	Documentation/networking/: Update Intel wired LAN driver documentation Updates the documentation to the Intel wired LAN drivers. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-27 16:05:26 -04:00
Andy Zhou	03f0d916aa	openvswitch: Mega flow implementation Add wildcarded flow support in kernel datapath. Wildcarded flow can improve OVS flow set up performance by avoid sending matching new flows to the user space program. The exact performance boost will largely dependent on wildcarded flow hit rate. In case all new flows hits wildcard flows, the flow set up rate is within 5% of that of linux bridge module. Pravin has made significant contributions to this patch. Including API clean ups and bug fixes. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2013-08-23 16:43:07 -07:00
David S. Miller	89d5e23210	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it did not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Comestic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 13:30:54 -07:00
Hannes Frederic Sowa	fc4eba58b4	ipv6: make unsolicited report intervals configurable for mld Commit `cab70040df` ("net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3") and `2690048c01` ("net: igmp: Allow user-space configuration of igmp unsolicited report interval") by William Manley made igmp unsolicited report intervals configurable per interface and corrected the interval of unsolicited igmpv3 report messages resendings to 1s. Same needs to be done for IPv6: MLDv1 (RFC2710 7.10.): 10 seconds MLDv2 (RFC3810 9.11.): 1 second Both intervals are configurable via new procfs knobs mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval. (also added .force_mld_version to ipv6_devconf_dflt to bring structs in line without semantic changes) v2: a) Joined documentation update for IPv4 and IPv6 MLD/IGMP unsolicited_report_interval procfs knobs. b) incorporate stylistic feedback from William Manley v3: a) add new DEVCONF_* values to the end of the enum (thanks to David Miller) Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: William Manley <william.manley@youview.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-13 17:05:04 -07:00
David S. Miller	71acc0ddd4	Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl" This reverts commit `cda5f98e36`. As per Vlad's request. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 13:09:41 -07:00
Daniel Borkmann	cda5f98e36	net: sctp: convert sctp_checksum_disable module param into sctp sysctl Get rid of the last module parameter for SCTP and make this configurable via sysctl for SCTP like all the rest of SCTP's configuration knobs. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-09 11:33:02 -07:00
Paul Gortmaker	49dfe76261	Documentation: add networking/netdev-FAQ.txt A collection of expectations and operational details about how networking development takes place in the context of the netdev mailing list. The content is meant to capture specific items that are unique to netdev workflow, and not re-document generic linux expectations that are already captured elsewhere. This was originally proposed[1] as a regular posting mailing list FAQ, but it probably is more universally accessible here in tree. [1] https://lwn.net/Articles/559211/ Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-31 17:36:29 -07:00
Florian Westphal	fd158d79d3	netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skb The module was "permanent", due to the special tproxy skb->destructor. Nowadays we have tcp early demux and its sock_edemux destructor in networking core which can be used instead. Thanks to early demux changes the input path now also handles "skb->sk is tw socket" correctly, so this no longer needs the special handling introduced with commit `d503b30bd6` (netfilter: tproxy: do not assign timewait sockets to skb->sk). Thus: - move assign_sock function to where its needed - don't prevent timewait sockets from being assigned to the skb - remove nf_tproxy_core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-07-31 16:39:40 +02:00
Hannes Frederic Sowa	5ad37d5dee	tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies \| If you want to test which effects syncookies have to your \| network connections you can set this knob to 2 to enable \| unconditionally generation of syncookies. Original idea and first implementation by Eric Dumazet. Cc: Florian Westphal <fw@strlen.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-30 16:15:18 -07:00
Eric Dumazet	c9bee3b7fd	tcp: TCP_NOTSENT_LOWAT socket option Idea of this patch is to add optional limitation of number of unsent bytes in TCP sockets, to reduce usage of kernel memory. TCP receiver might announce a big window, and TCP sender autotuning might allow a large amount of bytes in write queue, but this has little performance impact if a large part of this buffering is wasted : Write queue needs to be large only to deal with large BDP, not necessarily to cope with scheduling delays (incoming ACKS make room for the application to queue more bytes) For most workloads, using a value of 128 KB or less is OK to give applications enough time to react to POLLOUT events in time (or being awaken in a blocking sendmsg()) This patch adds two ways to set the limit : 1) Per socket option TCP_NOTSENT_LOWAT 2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets not using TCP_NOTSENT_LOWAT socket option (or setting a zero value) Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect. This changes poll()/select()/epoll() to report POLLOUT only if number of unsent bytes is below tp->nosent_lowat Note this might increase number of sendmsg()/sendfile() calls when using non blocking sockets, and increase number of context switches for blocking sockets. Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is defined as : Specify the minimum number of bytes in the buffer until the socket layer will pass the data to the protocol) Tested: netperf sessions, and watching /proc/net/protocols "memory" column for TCP With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458) lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y Using 128KB has no bad effect on the throughput or cpu usage of a single flow, although there is an increase of context switches. A bonus is that we hold socket lock for a shorter amount of time and should improve latencies of ACK processing. lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 412,514 context-switches 200.034645535 seconds time elapsed lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 2,675,818 context-switches 200.029651391 seconds time elapsed Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-By: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-24 17:54:48 -07:00
Daniel Borkmann	91705c61b5	net: sctp: trivial: update mailing list address The SCTP mailing list address to send patches or questions to is linux-sctp@vger.kernel.org and not lksctp-developers@lists.sourceforge.net anymore. Therefore, update all occurences. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-24 17:53:38 -07:00
Linus Torvalds	496322bc91	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit `0a4db187a9` ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...	2013-07-09 18:24:39 -07:00
Geert Uytterhoeven	b78ba72cda	Documentation: Fix references to defunct linux-net@vger.kernel.org linux-net@vger.kernel.org was replaced by netdev@oss.sgi.com was replaced by netdev@vger.kernel.org. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-09 12:42:19 -07:00
Linus Torvalds	80cc38b163	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ...	2013-07-04 11:40:58 -07:00
David S. Miller	0c1072ae02	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-03 14:55:13 -07:00
David S. Miller	4e144d3a80	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for net-next, they are: * Enforce policy to several nfnetlink subsystem, from Daniel Borkmann. * Use xt_socket to match the third packet (to perform simplistic socket-based stateful filtering), from Eric Dumazet. * Avoid large timeout for picked up from the middle TCP flows, from Florian Westphal. * Exclude IPVS from struct net if IPVS is disabled and removal of unnecessary included header file, from JunweiZhang. * Release SCTP connection immediately under load, to mimic current TCP behaviour, from Julian Anastasov. * Replace and enhance SCTP state machine, from Julian Anastasov. * Add tweak to reduce sync traffic in the presence of persistence, also from Julian Anastasov. * Add tweak for the IPVS SH scheduler not to reject connections directed to a server, choose a new one instead, from Alexander Frolkin. * Add support for sloppy TCP and SCTP modes, that creates state information on any packet, not only initial handshake packets, from Alexander Frolkin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-30 17:35:13 -07:00
Julian Anastasov	4d0c875dcc	ipvs: add sync_persist_mode flag Add sync_persist_mode flag to reduce sync traffic by syncing only persistent templates. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-06-26 18:01:46 +09:00
Veaceslav Falico	8599b52e14	bonding: add an option to fail when any of arp_ip_target is inaccessible Currently, we fail only when all of the ips in arp_ip_target are gone. However, in some situations we might need to fail if even one host from arp_ip_target becomes unavailable. All situations, obviously, rely on the idea that we need completely functional network, with all interfaces/addresses working correctly. One real world example might be: vlans on top on bond (hybrid port). If bond and vlans have ips assigned and we have their peers monitored via arp_ip_target - in case of switch misconfiguration (trunk/access port), slave driver malfunction or tagged/untagged traffic dropped on the way - we will be able to switch to another slave. Though any other configuration needs that if we need to have access to all arp_ip_targets. This patch adds this possibility by adding a new parameter - arp_all_targets (both as a module parameter and as a sysfs knob). It can be set to: 0 or any (the default) - which works exactly as it's working now - the slave is up if any of the arp_ip_targets are up. 1 or all - the slave is up if all of the arp_ip_targets are up. This parameter can be changed on the fly (via sysfs), and requires the mode to be active-backup and arp_validate to be enabled (it obeys the arp_validate config on which slaves to validate). Internally it's done through: 1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's an array of jiffies, meaning that slave->target_last_arp_rx[i] is the last time we've received arp from bond->params.arp_targets[i] on this slave. 2) If we successfully validate an arp from bond->params.arp_targets[i] in bond_validate_arp() - update the slave->target_last_arp_rx[i] with the current jiffies value. 3) When getting slave's last_rx via slave_last_rx(), we return the oldest time when we've received an arp from any address in bond->params.arp_targets[]. If the value of arp_all_targets == 0 - we still work the same way as before. Also, update the documentation to reflect the new parameter. v3->v4: Kill the forgotten rtnl_unlock(), rephrase the documentation part to be more clear, don't fail setting arp_all_targets if arp_validate is not set - it has no effect anyway but can be easier to set up. Also, print a warning if the last arp_ip_target is removed while the arp_interval is on, but not the arp_validate. v2->v3: Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(), use the same initialization value for target_last_arp_rx[] as is used for the default last_arp_rx, to avoid useless interface flaps. Also, instead of failing to remove the last arp_ip_target just print a warning - otherwise it might break existing scripts. v1->v2: Correctly handle adding/removing hosts in arp_ip_target - we need to shift/initialize all slave's target_last_arp_rx. Also, don't fail module loading on arp_all_targets misconfiguration, just disable it, and some minor style fixes. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-25 16:58:38 -07:00
Veaceslav Falico	d7d35c681f	bonding: doc: some details on backup slave arp validation Add some details to bonding documentation on how backup slave arp validation works. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-25 16:58:38 -07:00
Cong Wang	7623757661	doc: fix some syntax errors in netlink mmap sample code Cc: Patrick McHardy <kaber@trash.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-25 16:47:02 -07:00
Shan Wei	a3c910d2e7	tcp: doc : fix the syncookies default value syncookies is on for default since in commit `e994b7c901` (tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL). And fix a typo of CONFIG_SYN_COOKIES. Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-24 00:26:26 -07:00
Cong Wang	e3d73bcedf	net: add doc for ip_early_demux sysctl commit `6648bd7e0e` (ipv4: Add sysctl knob to control early socket demux) introduced such sysctl, but forgot to add doc into Documentation/networking/ip-sysctl.txt. This patch adds it. Basically I grab the doc from the description of commit `41063e9dd1` (ipv4: Early TCP socket demux.) and the above commit. Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-12 15:10:13 -07:00
Rami Rosen	e8b265e8ba	doc:networking: Fix default value (icmp_ignore_bogus_error_responses). This patch fixes icmp_ignore_bogus_error_responses default value to be 1 instead of FALSE. It is initialized to 1 in icmp_sk_init(). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-07 23:31:54 -07:00
Daniel Borkmann	d70a3f887a	doc: packet: simplify tpacket example code This patch simplifies the tpacket_v3 example code a bit by getting rid of unecessary macro wrappers, removing some debugging code so that it is more to the point, and also adds a header comment. Now this example code is the very minimum one needs to start from when dealing with tpacket_v3 and ~100 lines smaller than before. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-06-07 14:39:05 -07:00
Stefan Huber	3d9738aa41	Documentation/networking/ieee802154 fix a typo Corrected the words Accsess to Access. Signed-off-by: Stefan Huber <steffhip@googlemail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-06-05 16:24:59 +02:00
Anatol Pomozov	f884ab15af	doc: fix misspellings with 'codespell' tool Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-05-28 12:02:12 +02:00
Cong Wang	b1098bbe1b	bonding: remove ifenslave.c from kernel source As Stephen proposed: Since bonding supports configuration via iproute (netlink) and sysfs, I think it is time to purge the old ifenslave code out of Documentation/networking and update the documentation. Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-27 23:34:46 -07:00
Masanari Iida	3dd17edea0	doc:networking: Fix typo in documentation/networking Correct spelling typo Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-27 23:29:18 -07:00
Willem de Bruijn	191cb1f21a	rps: document flow limit in scaling.txt Explain the mechanism and API of the recently merged rps flow limit patch. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-23 18:54:44 -07:00
Daniel Borkmann	2940b26bec	packet: doc: update timestamping part Bring the timestamping section in sync with the implementation. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-25 01:22:22 -04:00
Patrick McHardy	5683264c39	netlink: add documentation for memory mapped I/O Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-19 14:58:36 -04:00
Giuseppe CAVALLARO	49cfbf675c	stmmac: review driver documentation This patch reviews the driver documentation file; for example, there were some new fields (in the driver module parameter section) and the ptp files were not documented. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-08 16:55:27 -04:00
Werner Almesberger	56aa091d60	ieee802154/nl-mac.c: make some MLME operations optional Check for NULL before calling the following operations from "struct ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req, and scan_req. This fixes a current oops where those functions are called but not implemented. It also updates the documentation to clarify that they are now optional by design. If a call to an unimplemented function is attempted, the kernel returns EOPNOTSUPP via netlink. The following operations are still required: get_phy, get_pan_id, get_short_addr, and get_dsn. Note that the places where this patch changes the initialization of "ret" should not affect the rest of the code since "ret" was always set (again) before returning its value. Signed-off-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-08 12:00:16 -04:00
Daniel Borkmann	4eb0614825	doc: packet: add minimal TPACKET_V3 example code Lost in space for a long time, but it finally came back to us from some ancient code tombs. This patch adds a minimal runnable example of Linux' packet mmap(2) from Chetan Loke's TPACKET_V3. Special thanks to David S. Miller, and also Eric Leblond and Victor Julien! Cc: Eric Leblond <eric@regit.org> Cc: Victor Julien <victor@inliniac.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-30 17:40:27 -04:00
Giuseppe CAVALLARO	94fbbbf894	stmmac: update the Doc and Version (PTP+SGMII) This patch updates the stmmac.txt file adding information related to the PTP and SGMII/RGMII supports. Also the patch updates the driver version to: March_2013. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-26 12:53:37 -04:00
Yuchung Cheng	e33099f96d	tcp: implement RFC5682 F-RTO This patch implements F-RTO (foward RTO recovery): When the first retransmission after timeout is acknowledged, F-RTO sends new data instead of old data. If the next ACK acknowledges some never-retransmitted data, then the timeout was spurious and the congestion state is reverted. Otherwise if the next ACK selectively acknowledges the new data, then the timeout was genuine and the loss recovery continues. This idea applies to recurring timeouts as well. While F-RTO sends different data during timeout recovery, it does not (and should not) change the congestion control. The implementaion follows the three steps of SACK enhanced algorithm (section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and 3 are in tcp_process_loss(). The basic version is not supported because SACK enhanced version also works for non-SACK connections. The new implementation is functionally in parity with the old F-RTO implementation except the one case where it increases undo events: In addition to the RFC algorithm, a spurious timeout may be detected without sending data in step 2, as long as the SACK confirms not all the original data are dropped. When this happens, the sender will undo the cwnd and perhaps enter fast recovery instead. This additional check increases the F-RTO undo events by 5x compared to the prior implementation on Google Web servers, since the sender often does not have new data to send for HTTP. Note F-RTO may detect spurious timeout before Eifel with timestamps does so. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:47:51 -04:00
Yuchung Cheng	9b44190dc1	tcp: refactor F-RTO The patch series refactor the F-RTO feature (RFC4138/5682). This is to simplify the loss recovery processing. Existing F-RTO was developed during the experimental stage (RFC4138) and has many experimental features. It takes a separate code path from the traditional timeout processing by overloading CA_Disorder instead of using CA_Loss state. This complicates CA_Disorder state handling because it's also used for handling dubious ACKs and undos. While the algorithm in the RFC does not change the congestion control, the implementation intercepts congestion control in various places (e.g., frto_cwnd in tcp_ack()). The new code implements newer F-RTO RFC5682 using CA_Loss processing path. F-RTO becomes a small extension in the timeout processing and interfaces with congestion control and Eifel undo modules. It lets congestion control (module) determines how many to send independently. F-RTO only chooses what to send in order to detect spurious retranmission. If timeout is found spurious it invokes existing Eifel undo algorithms like DSACK or TCP timestamp based detection. The first patch removes all F-RTO code except the sysctl_tcp_frto is left for the new implementation. Since CA_EVENT_FRTO is removed, TCP westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-21 11:47:50 -04:00
David S. Miller	61816596d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull in the 'net' tree to get Daniel Borkmann's flow dissector infrastructure change. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-20 12:46:26 -04:00
Julian Anastasov	0c12582fbc	ipvs: add backup_only flag to avoid loops Dmitry Akindinov is reporting for a problem where SYNs are looping between the master and backup server when the backup server is used as real server in DR mode and has IPVS rules to function as director. Even when the backup function is enabled we continue to forward traffic and schedule new connections when the current master is using the backup server as real server. While this is not a problem for NAT, for DR and TUN method the backup server can not determine if a request comes from client or from director. To avoid such loops add new sysctl flag backup_only. It can be needed for DR/TUN setups that do not need backup and director function at the same time. When the backup function is enabled we stop any forwarding and pass the traffic to the local stack (real server mode). The flag disables the director function when the backup function is enabled. For setups that enable backup function for some virtual services and director function for other virtual services there should be another more complex solution to support DR/TUN mode, may be to assign per-virtual service syncid value, so that we can differentiate the requests. Reported-by: Dmitry Akindinov <dimak@stalker.com> Tested-by: German Myzovsky <lawyer@sipnet.ru> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2013-03-19 21:21:51 +09:00
Christoph Paasch	1a2c6181c4	tcp: Remove TCPCT TCPCT uses option-number 253, reserved for experimental use and should not be used in production environments. Further, TCPCT does not fully implement RFC 6013. As a nice side-effect, removing TCPCT increases TCP's performance for very short flows: Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests for files of 1KB size. before this patch: average (among 7 runs) of 20845.5 Requests/Second after: average (among 7 runs) of 21403.6 Requests/Second Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-17 14:35:13 -04:00
Li RongQing	b66c66dc5c	Documentation: fix neigh/default/gc_thresh1 default value. The default value is 128, not 256 #grep gc_thresh1 net/ -rI net/decnet/dn_neigh.c: .gc_thresh1 = 128, net/ipv6/ndisc.c: .gc_thresh1 = 128, net/ipv4/arp.c: .gc_thresh1 = 128, Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-15 09:12:23 -04:00
Nandita Dukkipati	6ba8a3b19e	tcp: Tail loss probe (TLP) This patch series implement the Tail loss probe (TLP) algorithm described in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The first patch implements the basic algorithm. TLP's goal is to reduce tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occuring due to tail losses (losses at end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly RTO. In the absence of loss, there is no change in the connection state. PTO stands for probe timeout. It is a timer event indicating that an ACK is overdue and triggers a loss probe packet. The PTO value is set to max(2SRTT, 10ms) and is adjusted to account for delayed ACK timer when there is only one oustanding packet. TLP Algorithm On transmission of new data in Open state: -> packets_out > 1: schedule PTO in max(2SRTT, 10ms). -> packets_out == 1: schedule PTO in max(2RTT, 1.5RTT + 200ms) -> PTO = min(PTO, RTO) Conditions for scheduling PTO: -> Connection is in Open state. -> Connection is either cwnd limited or no new data to send. -> Number of probes per tail loss episode is limited to one. -> Connection is SACK enabled. When PTO fires: new_segment_exists: -> transmit new segment. -> packets_out++. cwnd remains same. no_new_packet: -> retransmit the last segment. Its ACK triggers FACK or early retransmit based recovery. ACK path: -> rearm RTO at start of ACK processing. -> reschedule PTO if need be. In addition, the patch includes a small variation to the Early Retransmit (ER) algorithm, such that ER and TLP together can in principle recover any N-degree of tail loss through fast recovery. TLP is controlled by the same sysctl as ER, tcp_early_retrans sysctl. tcp_early_retrans==0; disables TLP and ER. ==1; enables RFC5827 ER. ==2; delayed ER. ==3; TLP and delayed ER. [DEFAULT] ==4; TLP only. The TLP patch series have been extensively tested on Google Web servers. It is most effective for short Web trasactions, where it reduced RTOs by 15% and improved HTTP response time (average by 6%, 99th percentile by 10%). The transmitted probes account for <0.5% of the overall transmissions. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-12 08:30:34 -04:00
Jason Wang	f422d2a04f	net: docs: document multiqueue tuntap API Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-06 14:56:10 -05:00
Stephen Hemminger	ca2eb5679f	tcp: remove Appropriate Byte Count support TCP Appropriate Byte Count was added by me, but later disabled. There is no point in maintaining it since it is a potential source of bugs and Linux already implements other better window protection heuristics. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-05 14:51:16 -05:00
Jitendra Kalsaria	577ae39ddb	qlcnic: Updating copyright information. We recently refactored the driver source, this patch will take care of updating copyright date and adding it to newly added files. Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-04 21:08:48 -05:00
David S. Miller	b640bee6d9	Merge branch 'master' of git://1984.lsi.us.es/nf-next Pablo Neira Ayuso says: ==================== This batch contains netfilter updates for you net-next tree, they are: * The new connlabel extension for x_tables, that allows us to attach labels to each conntrack flow. The kernel implementation uses a bitmask and there's a file in user-space that maps the bits with the corresponding string for each existing label. By now, you can attach up to 128 overlapping labels. From Florian Westphal. * A new round of improvements for the netns support for conntrack. Gao feng has moved many of the initialization code of each module of the netns init path. He also made several code refactoring, that code looks cleaner to me now. * Added documentation for all possible tweaks for nf_conntrack via sysctl, from Jiri Pirko. * Cisco 7941/7945 IP phone support for our SIP conntrack helper, from Kevin Cernekee. * Missing header file in the snmp helper, from Stephen Hemminger. * Finally, a couple of fixes to resolve minor issues with these changes, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-27 00:56:10 -05:00
David S. Miller	930d52c012	Merge branch 'legacy-isa-delete' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Paul Gortmaker says: ==================== The Ethernet-HowTo was maintained for roughly 10 years, from 1993 to 2003. Fortunately sane hardware probing and auto detection (via PCI and ISA/PnP) largely made the document a relic of the past, hence it being abandoned a decade ago. However, there is one last useful thing that we can extract from the effort made in maintaining that document. We can use it to guide us with respect to what rare, experimental and/or super ancient 10Mbit ISA drivers don't make sense to maintain in-tree anymore. Nobody will argue that ISA is obsolete. Availability went away at about the time Pentium3 motherboards moved from 500MHz Slot1/SECC processors to the green 500MHz Socket 370 Pentium3 chips, at the turn of the century. In theory, it is possible that someone could still be running one of these 12+ year old P3 machines and want 3.9+ bleeding edge kernels (but unlikely). In light of the above (remote) possibility, we can defer the removal of some ISA network drivers that were highly popular and well tested. Typically that means the stuff more from the mid to late '90s, some with ISA PnP support, like the 3c509, the wd/SMC 8390 based stuff, PCnet/lance etc. But a lot of other drivers, typically from the early 1990s were for rare hardware, and experimental (to the point of requiring a cron job that would do a test ping, and then ifconfig down/up and/or a rmmod/insmod!). And some of these drivers (znet, and lp486e to name two) are physically tied to platforms with on motherboard ethernet -- of 486 machines that date from the early 1990s and can only have single digit amounts of memory. What I'd like to achieve here with this series, is to get rid of those old drivers that are no longer being used. In an earlier discussion where I'd proposed deleting a single driver, Alan suggested we instead dump all the historical stuff in one go, to make it "...immediately obvious where the break point is..."[1] and that it was "perfectly reasonable it (and a pile of other ISA cards) ought to be shown the door"[2]. So that is the goal here - make a clear line in the sand where the really ancient stuff finally gets kicked to the curb. Two old parallel port drivers are considered for removal here as well, since in early 386/486 ISA machines, the parallel port was typically found with the UARTS on the multi-I/O ISA controller card. These drivers also date from the early 1990's; parallel ports are no longer found on modern boards, and their performance was not even capable of 10% of 10Mbit bandwidth. Allow me a preemptive justification against the inevitable comments from well meaning bystanders who suggest "why not just leave all this alone?". Dead drivers cost us all if they are left in tree. If you think that is false, then please first consider: -every time you type "git status", you are checking to see if modifications have been made by you to all that dead code. -every time you type "git grep <regex>" you are searching through files which contain that dead code that simply does not interest you. -every time you build a "allyesconfig" and an "allmodconfig" (don't tell me you skip this step before submitting your changes to a maintainer), you waste CPU cycles building this dead code. -every time there is a tree wide API change, or cleanup, or file relocation, we pay the cost of updating dead code, or moving dead code. -daily regression tests (take linux-next as the most transparent example) spend time building (and possibly running) this dead code. -hard working people who regularly run auditing tools looking for lurking bugs (sparse/coverity/smatch/coccinelle) are wasting time checking for, and fixing bugs in this dead code. This last one is key. Please take a look at the git history for the files that are proposed for removal here. Look at the git history for any one of them ("git whatchanged --follow drivers/net/.../driver.c") Mentally sort the changes into two bins -- (1) the robotic tree-wide changes, and (2) the "look I found a real run-time bug while using this" category. You will see that category #2 is essentially empty. Further to that, realize that drivers don't simply disappear. We are not operating in the binary-only distribution space like other OS. All these drivers remain in the git history forever. If a person is an enthusiast for extreme legacy hardware, they are probably already customizing their kernel source and building it themselves to support such systems. Also keep in mind that they could still build the 3.8 kernel exactly as-is, and run it (or a 3.8.x stable variant of it) for several more years if they were really determined to cling to these old experimental ISA drivers for some reason. In summary, I hope that folks can be pragmatic about this, and not get swept up in nostalgia. Ask yourself whether it is realistic to expect a person would have a genuine use case where they would need to build a 3.9+ modern kernel and install it on some legacy hardware that has no option but to absolutely _require_ one of the drivers that are deleted here. The following series was created with --irreversible-delete for ease of review (it skips showing the content of files that are deleted); however the complete patches can be pulled as per below. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-22 14:47:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明	2724680bce	neigh: Keep neighbour cache entries if number of them is small enough. Since we have removed NCE (Neighbour Cache Entry) reference from routing entries, the only refcnt holders of an NCE are its timer (if running) and its owner table, in usual cases. As a result, neigh_periodic_work() purges NCEs over and over again even for gateways. It does not make sense to purge entries, if number of them is very small, so keep them. The minimum number of entries to keep is specified by gc_thresh1. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-22 14:25:28 -05:00
Paul Gortmaker	0ffd89e48f	drivers/net: delete Digital EtherWorks-3 support. This is another one that makes sense to target for obsolescence, since it (a)appeared pre-1995, and (b)was rather rare, and (c)did not really have any statistically significant active linux user base. Removing this ISA 10Mbit driver support is unlikely to be even noticed by the user base of 3.9+ linux kernels, especially when the documentation clearly indicates the vintage with this text: "...designed to work with all kernels > 1.1.33" Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-01-22 10:39:55 -05:00
Paul Gortmaker	1f1c7a5c1d	drivers/net: delete old DEC depca ISA drivers support. These are old ISA 10Mbit cards from the 1st 1/2 of the 1990s and required manual jumper settings in order to configure them. Here we remove them on the premise that they are no longer used in any modern 3.9+ kernels. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-01-22 10:39:55 -05:00
Paul Gortmaker	168e06ae26	drivers/net: delete old parallel port de600/de620 drivers The parallel port is largely replaced by USB, and even in the day where these drivers were current, the documented speed was less than 100kB/s. Let us not pretend that anyone cares about these drivers anymore, or worse - pretend that anyone is using them on a modern kernel. As a side bonus, this is the end of legacy parallel port ethernet, so we get to drop the whole chunk relating to that in the legacy Space.c file containing the non-PCI unified probe dispatch. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-01-22 10:39:49 -05:00
Paul Gortmaker	202dc3fc59	Documentation: remove obsolete networking/multicast.txt file The original intent of this file was to list limitations in drivers/hardware relating to multicast use, back when some modest hardware from the early 1990s did not support things we might take for granted today. I was intending to delete some now-gone MCA/token ring entries in this file, but once I opened it, I found it only contained information on the earliest (pre-2000) linux networking drivers. Checking the git history shows that the file hasn't been touched since 2005. Clearly nobody is actively consulting this file as a meaningful reference. Rather than add a "YES YES YES" line for all of the drivers we currently have, lets just take advantage of the fact that nobody is using the file to delete it. This has the side benefit of not having to do a line-by-line deletion of the file content as each older driver is expired. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-21 13:54:05 -05:00
Jiri Pirko	c9f9e0e159	netfilter: doc: add nf_conntrack sysctl api documentation I grepped through the code and picked bits about nf_conntrack sysctl api and put that into one documentation file. [ I have mangled this patch including comments from several grammar improvements proposed by Neal Murphy <neal.p.murphy@alum.wpi.edu>, any new grammar error is my mistake --pablo ] Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-01-21 12:50:06 +01:00
Vincent Bernat	d59577b6ff	sk-filter: Add ability to lock a socket filter program While a privileged program can open a raw socket, attach some restrictive filter and drop its privileges (or send the socket to an unprivileged program through some Unix socket), the filter can still be removed or modified by the unprivileged program. This commit adds a socket option to lock the filter (SO_LOCK_FILTER) preventing any modification of a socket filter program. This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even root is not allowed change/drop the filter. The state of the lock can be read with getsockopt(). No error is triggered if the state is not changed. -EPERM is returned when a user tries to remove the lock or to change/remove the filter while the lock is active. The check is done directly in sk_attach_filter() and sk_detach_filter() and does not affect only setsockopt() syscall. Signed-off-by: Vincent Bernat <bernat@luffy.cx> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-17 03:21:25 -05:00
David S. Miller	4b87f92259	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/networking/ip-sysctl.txt drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c Both conflicts were simply overlapping context. A build fix for qlcnic is in here too, simply removing the added devinit annotations which no longer exist. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-15 15:05:59 -05:00
Florian Fainelli	f9a8f83b04	net: phy: remove flags argument from phy_{attach, connect, connect_direct} The flags argument of the phy_{attach,connect,connect_direct} functions is then used to assign a struct phy_device dev_flags with its value. All callers but the tg3 driver pass the flag 0, which results in the underlying PHY drivers in drivers/net/phy/ not being able to actually use any of the flags they would set in dev_flags. This patch gets rid of the flags argument, and passes phydev->dev_flags to the internal PHY library call phy_attach_direct() such that drivers which actually modify a phy device dev_flags get the value preserved for use by the underlying phy driver. Acked-by: Kosta Zertsekel <konszert@marvell.com> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-14 15:11:50 -05:00
Paul Gortmaker	00494be454	networking/cs89x0.txt: delete stale information about hand patching Output of a git grep happened to make me look into this file, and I found instructions about how to hand patch (without using patch) the driver into the kernel tree. Since the driver has been a part of the mainline kernel for years, we can dump this whole section. Fortunately it doesn't even cause a renumbering of the sections to do so. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-11 16:52:26 -08:00
Vijay Subramanian	3d55b32370	doc: Clarify behavior when sysctl tcp_ecn = 1 Recent commit (commit `7e3a2dc529` doc: make the description of how tcp_ecn works more explicit and clear ) clarified the behavior of tcp_ecn sysctl variable but description is inconsistent. When requested by incoming conections, ECN is enabled with not just tcp_ecn = 2 but also with tcp_ecn = 1. This patch makes it clear that with tcp_ecn = 1, ECN is enabled when requested by incoming connections. Also fix spelling of 'incoming'. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-10 14:21:16 -08:00
Cong Wang	7265a6bbdd	netconsole: add IPv6 example in doc Update the netconsole document as well. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 17:56:10 -08:00
stephen hemminger	3b09adcb20	ip-sysctl: fix spelling errors Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 15:12:34 -08:00
Hannes Frederic Sowa	db2b620aa0	ipv6: document ndisc_notify in networking/ip-sysctl.txt I slipped in a new sysctl without proper documentation. I would like to make up for this now. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:35:38 -08:00
Jiri Pirko	9a57247f31	rtnl: expose carrier value with possibility to set it Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-28 15:24:18 -08:00
Rick Jones	d825da2ede	doc: Tighten-up and clarify description of tcp_fin_timeout The description for tcp_fin_timeout should be tigher and more clear. In addition to being tighter, we should make the spelling of the state name consistent with what utilities report, remove the now dated reference to 2.2 and put the default in the consistent place. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-10 17:14:28 -05:00
Shan Wei	5d248c491b	net: doc : use more suitable word 'unexpected' to replace 'secluded' 'secluded' is used to describe places, not suitable here. Suggested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 14:31:07 -05:00
Shan Wei	cc86802805	net: doc: add default value for neighbour parameters Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-05 16:01:28 -05:00
Rick Jones	7e3a2dc529	doc: make the description of how tcp_ecn works more explicit and clear Make the description of how tcp_ecn works a bit more explicit and clear. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-29 13:14:58 -05:00
Giuseppe CAVALLARO	f9e01b5565	stmmac: update the doc with new IRQ mitigation This patch updates the stmmac.txt adding some information about the new rx/tx mitigation schema adopted in the driver. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-26 17:22:12 -05:00
David S. Miller	24bc518a68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/tx.c Minor iwlwifi conflict in TX queue disabling between 'net', which removed a bogus warning, and 'net-next' which added some status register poking code. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-25 12:49:17 -05:00
Zhi Yong Wu	cc9b310165	vxlan: fix command usage in its doc Some commands don't work in its example doc. The patch will fix it. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-23 14:03:04 -05:00
David S. Miller	67f4efdce7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor line offset auto-merges. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-17 22:00:43 -05:00
Kirill Smelkov	73e212fc48	doc/net: Fix typo in netdev-features.txt Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-13 14:37:48 -05:00
Daniel Borkmann	d1ee40f960	doc: packet_mmap: update doc to implementation status This improves the packet_mmap.txt document in the following ways: * Add initial information about different TPACKET versions * Add initial information about packet fanout * Add pointer to BPF document (since this also could be of interest) * 'Fix' minor, rather cosmetic things Information partially taken from related commit messages. Reported-by: Ronny Meeus <ronny.meeus@gmail.com> Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Cc: Ulisses Alonso Camaró <uaca@alumni.uv.es> Cc: Johann Baudy <johann.baudy@gnu-log.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-09 16:45:49 -05:00
David S. Miller	f1e0b5b4f1	Included changes: - minimal fixes to the packet layout to avoid the __packed attribute when not needed - new packet type called UNICAST_4ADDR: in this packet it is possible to find both source and destination node (in the classic UNICAST header only the destination field exists). - a new feature: Distributed ARP Table (D.A.T.). It aims to reduce ARP lookups latency by means of a simil-DHT approach. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlCasB8ACgkQpGgxIkP9cwd2kACeOhRvj9EA2hWo1vKQUN6OvTi0 nOcAn08Rvyf+gUPAFQcA9HEUAr8kAebT =c3ey -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - minimal fixes to the packet layout to avoid the __packed attribute when not needed - new packet type called UNICAST_4ADDR: in this packet it is possible to find both source and destination node (in the classic UNICAST header only the destination field exists). - a new feature: Distributed ARP Table (D.A.T.). It aims to reduce ARP lookups latency by means of a simil-DHT approach.	2012-11-07 19:08:42 -05:00
Paul Chavent	5920cd3a41	packet: tx_ring: allow the user to choose tx data offset The tx data offset of packet mmap tx ring used to be : (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)) The problem is that, with SOCK_RAW socket, the payload (14 bytes after the beginning of the user data) is misaligned. This patch allows to let the user gives an offset for it's tx data if he desires. Set sock option PACKET_TX_HAS_OFF to 1, then specify in each frame of your tx ring tp_net for SOCK_DGRAM, or tp_mac for SOCK_RAW. Signed-off-by: Paul Chavent <paul.chavent@onera.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-07 18:54:30 -05:00
Antonio Quartulli	0e861a3c4f	batman-adv: Distributed ARP Table - add a new debug log level A new log level has been added to concentrate messages regarding DAT: ARP snooping, requests, response and DHT related messages. The new log level is named BATADV_DBG_DAT Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-11-07 20:00:18 +01:00
Neil Horman	3c68198e75	sctp: Make hmac algorithm selection for cookie generation dynamic Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to generate cookie values when establishing new connections via two build time config options. Theres no real reason to make this a static selection. We can add a sysctl that allows for the dynamic selection of these algorithms at run time, with the default value determined by the corresponding crypto library availability. This comes in handy when, for example running a system in FIPS mode, where use of md5 is disallowed, but SHA1 is permitted. Note: This new sysctl has no corresponding socket option to select the cookie hmac algorithm. I chose not to implement that intentionally, as RFC 6458 contains no option for this value, and I opted not to pollute the socket option namespace. Change notes: v2) * Updated subject to have the proper sctp prefix as per Dave M. * Replaced deafult selection options with new options that allow developers to explicitly select available hmac algs at build time as per suggestion by Vlad Y. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-26 02:22:18 -04:00
stephen hemminger	d342894c5d	vxlan: virtual extensible lan This is an implementation of Virtual eXtensible Local Area Network as described in draft RFC: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02 The driver integrates a Virtual Tunnel Endpoint (VTEP) functionality that learns MAC to IP address mapping. This implementation has not been tested only against the Linux userspace implementation using TAP, not against other vendor's equipment. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-01 18:39:45 -04:00
Jerry Chu	1046716368	tcp: TCP Fast Open Server - header & support functions This patch adds all the necessary data structure and support functions to implement TFO server side. It also documents a number of flags for the sysctl_tcp_fastopen knob, and adds a few Linux extension MIBs. In addition, it includes the following: 1. a new TCP_FASTOPEN socket option an application must call to supply a max backlog allowed in order to enable TFO on its listener. 2. A number of key data structures: "fastopen_rsk" in tcp_sock - for a big socket to access its request_sock for retransmission and ack processing purpose. It is non-NULL iff 3WHS not completed. "fastopenq" in request_sock_queue - points to a per Fast Open listener data structure "fastopen_queue" to keep track of qlen (# of outstanding Fast Open requests) and max_qlen, among other things. "listener" in tcp_request_sock - to point to the original listener for book-keeping purpose, i.e., to maintain qlen against max_qlen as part of defense against IP spoofing attack. 3. various data structure and functions, many in tcp_fastopen.c, to support server side Fast Open cookie operations, including /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-31 20:02:18 -04:00
Srinivas Kandagatla	d56631a66c	net:stmmac: Remove bus_id from mdio platform data. This patch removes bus_id from mdio platform data, The reason to remove bus_id is, stmmac mdio bus_id is always same as stmmac bus-id, so there is no point in passing this in different variable. Also stmmac ethernet driver connects to phy with bus_id passed its platform data. So, having single bus-id is much simpler. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-31 16:11:28 -04:00
Alex Bergmann	6c9ff979d1	tcp: Increase timeout for SYN segments Commit `9ad7c049` ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") changed the initRTO from 3secs to 1sec in accordance to RFC6298 (former RFC2988bis). This reduced the time till the last SYN retransmission packet gets sent from 93secs to 31secs. RFC1122 is stating that the retransmission should be done for at least 3 minutes, but this seems to be quite high. "However, the values of R1 and R2 may be different for SYN and data segments. In particular, R2 for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course." This patch increases the value of TCP_SYN_RETRIES to the value of 6, providing a retransmission window of 63secs. The comments for SYN and SYNACK retries have also been updated to describe the current settings. The same goes for the documentation file "Documentation/networking/ip-sysctl.txt". Signed-off-by: Alexander Bergmann <alex@linlab.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-31 15:42:10 -04:00
Simon Wunderlich	536a23f119	batman-adv: Add the backbone gateway list to debugfs This is especially useful if there are no claims yet, but we still want to know which gateways are using bridge loop avoidance in the network. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-08-23 14:02:41 +02:00
John Eaglesham	6b923cb718	bonding: support for IPv6 transmit hashing Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-22 22:49:30 -07:00
Dirk Gouders	6556bfde65	netconsole.txt: revision of examples for the receiver of kernel messages There are at least 4 implementations of netcat with the BSD-based being the only one that has to be used without the -p switch to specify the listening port. Jan Engelhardt suggested to add an example for socat(1). Signed-off-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:33 -07:00
Eric Dumazet	0c7462a235	ipv4: remove rt_cache_rebuild_count After IP route cache removal, rt_cache_rebuild_count is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:53:22 -07:00
Rick Jones	f8b72d36d2	net-next: minor cleanups for bonding documentation The section titled "Configuring Bonding for Maximum Throughput" is actually section twelve not thirteen, and there are a couple of words spelled incorrectly. Signed-off-by: Rick Jones <rick.jones2@hp.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:44:01 -07:00
Neil Horman	5aa93bcf66	sctp: Implement quick failover draft from tsvwg I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:13:46 -07:00
David S. Miller	c073cfc89f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A few bug fixes and small enhancements for net-next/3.6. ... Ansis Atteka (1): openvswitch: Do not send notification if ovs_vport_set_options() failed Ben Pfaff (1): openvswitch: Check gso_type for correct sk_buff in queue_gso_packets(). Jesse Gross (2): openvswitch: Enable retrieval of TCP flags from IPv6 traffic. openvswitch: Reset upper layer protocol info on internal devices. Leo Alterman (1): openvswitch: Fix typo in documentation. Pravin B Shelar (1): openvswitch: Check currect return value from skb_gso_segment() Raju Subramanian (1): openvswitch: Replace Nicira Networks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 16:16:34 -07:00
Leo Alterman	efaac3bf08	openvswitch: Fix typo in documentation. Signed-off-by: Leo Alterman <lalterman@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>	2012-07-20 14:51:07 -07:00
Yuchung Cheng	67da22d23f	net-tcp: Fast Open client - cookie-less mode In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 11:02:03 -07:00
Yuchung Cheng	cf60af03ca	net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed like connect() call. It returns similar errno like connect() if the TCP handshake fails. For non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if cookie is available. If cookie is not available, it transmits a data-less SYN packet with Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on connecting or connected socket will result in simlar errno like repeating connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 11:02:03 -07:00
stephen hemminger	8427b2acfd	bridge: update documentation references Update the references to bridge utilities and web pages to current locations Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 10:48:07 -07:00
Eric Dumazet	282f23c6ee	tcp: implement RFC 5961 3.2 Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s \| grep TCPChallengeACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 01:36:20 -07:00
Eric Dumazet	46d3ceabd8	tcp: TCP Small Queues This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-11 18:12:59 -07:00
Jon Mason	c0589fa78a	vxge/s2io: remove dead URLs URLs to neterion.com and s2io.com no longer resolve. Remove all references to these URLs in the driver source and documentation. Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 23:24:47 -07:00
Giuseppe CAVALLARO	0ec2ccd080	stmmac: update the driver Documentation and add EEE This patch updates the stmmac's documentation adding some missing files in the section used to describe the internal driver's structure. Also the patch adds a new section to describe the EEE support. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-01 03:34:49 -07:00
David S. Miller	c801e3cc19	ipv4: Clarify in docs that accept_local requires rp_filter. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-30 22:39:27 -07:00
Sjur Brændeland	5051c94bb3	Documentation/networking/caif: Update documentation Update drawing and remove description of old features. Add HSI and USB link layers to the drawing. Reported-by: Joerg Reisenweber <joerg.reisenweber@stericssion.com> Signed-off-by: Sjur Brændeland <sjur.brandeland@stericssion.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-25 16:44:12 -07:00
Oliver Hartkopp	ea53fe0c66	canfd: update documentation according to CAN FD extensions Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2012-06-19 21:40:26 +02:00
Martin Hundebøll	f8214865a5	batman-adv: Add get_ethtool_stats() support Added additional counters in a bat_stats structure, which are exported through the ethtool api. The counters are specific to batman-adv and includes: forwarded packets and bytes management packets and bytes (aggregated OGMs at this point) translation table packets New counters are added by extending "enum bat_counters" in types.h and adding corresponding descriptive string(s) to bat_counters_strings in soft-iface.c. Counters are increased by calling batadv_add_counter() and incremented by one by calling batadv_inc_counter(). Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2012-06-18 18:00:58 +02:00
Thomas Graf	d0daebc3d6	ipv4: Add interface option to enable routing of 127.0.0.0/8 Routing of 127/8 is tradtionally forbidden, we consider packets from that address block martian when routing and do not process corresponding ARP requests. This is a sane default but renders a huge address space practically unuseable. The RFC states that no address within the 127/8 block should ever appear on any network anywhere but it does not forbid the use of such addresses outside of the loopback device in particular. For example to address a pool of virtual guests behind a load balancer. This patch adds a new interface option 'route_localnet' enabling routing of the 127/8 address block and processing of ARP requests on a specific interface. Note that for the feature to work, the default local route covering 127/8 dev lo needs to be removed. Example: $ sysctl -w net.ipv4.conf.eth0.route_localnet=1 $ ip route del 127.0.0.0/8 dev lo table local $ ip addr add 127.1.0.1/16 dev eth0 $ ip route flush cache V2: Fix invalid check to auto flush cache (thanks davem) Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-12 15:25:46 -07:00
David S. Miller	c1864cfb80	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-06-06 15:06:41 -07:00
David S. Miller	9b97b84eb5	Merge branch 'master' of git://gitorious.org/linux-can/linux-can-next	2012-06-06 11:13:26 -07:00
Giuseppe CAVALLARO	3d2377144c	stmmac: update driver's doc Fixed the driver's documentation that was obsolete and didn't report new platform fields (recently added). Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-06 09:34:56 -07:00
Oliver Hartkopp	d6e640f976	can: update documentation wording error frames -> error messages As Heinz-Juergen Oertel pointed out 'CAN error frames' are a already defined term for the CAN protocol violation indication on the wire. To avoid confusion with the error messages created by CAN drivers available via CAN RAW sockets update the documentation and change the naming from 'error frames' to 'error messages' or 'error message frames'. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2012-05-23 22:55:49 +02:00
Linus Torvalds	e8650a0823	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "As usual, it's mostly typo fixes, redundant code elimination and some documentation updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits) edac, mips: don't change code that has been removed in edac/mips tree xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer lib: Change mail address of Oskar Schirmer net: Change mail address of Oskar Schirmer arm/m68k: Change mail address of Sebastian Hess i2c: Change mail address of Oskar Schirmer net: Fix tcp_build_and_update_options comment in struct tcp_sock atomic64_32.h: fix parameter naming mismatch Kconfig: replace "--- help ---" with "---help---" c2port: fix bogus Kconfig "default no" edac: Fix spelling errors. qla1280: Remove redundant NULL check before release_firmware() call remoteproc: remove redundant NULL check before release_firmware() qla2xxx: Remove redundant NULL check before release_firmware() call. aic94xx: Get rid of redundant NULL check before release_firmware() call tehuti: delete redundant NULL check before release_firmware() qlogic: get rid of a redundant test for NULL before call to release_firmware() bna: remove redundant NULL test before release_firmware() tg3: remove redundant NULL test before release_firmware() call typhoon: get rid of redundant conditional before all to release_firmware() ...	2012-05-22 19:22:50 -07:00
Paul Gortmaker	a5e371f61a	drivers/net: delete all code/drivers depending on CONFIG_MCA The support for CONFIG_MCA is being removed, since the 20 year old hardware simply isn't capable of meeting today's software demands on CPU and memory resources. This commit removes any MCA specific net drivers, and removes any MCA specific probe/support code from drivers that were doing a dual ISA/MCA role. Cc: "David S. Miller" <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-05-17 16:37:41 -04:00
alex.bluesman.smirnov@gmail.com	dd456d45d7	Documentation/networking/ieee802154: update MAC chapter Update the documentation according to latest changes. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-16 15:17:08 -04:00
Paul Gortmaker	ee446fd5e6	tokenring: delete all remaining driver support This represents the mass deletion of the of the tokenring support. It gets rid of: - the net/tr.c which the drivers depended on - the drivers/net component - the Kbuild infrastructure around it - any tokenring related CONFIG_ settings in any defconfigs - the tokenring headers in the include/linux dir - the firmware associated with the tokenring drivers. - any associated token ring documentation. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-05-15 20:23:16 -04:00
Sven Eckelmann	a77e8c61db	batman-adv: README cleanups - Add routing_algo - Remove date from README: The date has to be updated when a patch touches the README. Therefore, nearly every feature will modify this date. It can happens quite often that not only one feature is currently in development or waiting on the mailinglist. This creates merge conflicts when applying a patchset. The date itself doesn't provide any additional information when this file is only available in a release tarball or as part of a SCM repository. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-05-14 08:50:22 +02:00
Pablo Neira Ayuso	4981682cc1	netfilter: bridge: optionally set indev to vlan if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge netfilter removes the vlan header temporarily and then feeds the packet to ip(6)tables. When the new "bridge-nf-pass-vlan-input-device" sysctl is on (default off), then bridge netfilter will also set the in-interface to the vlan interface; if such an interface exists. This is needed to make iptables REDIRECT target work with "vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to match the vlan device name. Also update Documentation with current brnf default settings. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-05-08 19:36:47 +02:00
David S. Miller	0d6c4a2e46	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-07 23:35:40 -04:00
Eric Dumazet	b49960a05e	tcp: change tcp_adv_win_scale and tcp_rmem[2] tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4) In 2.6 kernels we (mis)accounted for typical MSS=1460 frame : 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated. With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise : 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skb are not good citizen anymore, because 1460 < 1728 (GRO can escape this problem because it build skbs with a too low truesize.) This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source. We should adjust the len/truesize ratio to 50% instead of 75% This patch : 1) changes tcp_adv_win_scale default to 1 instead of 2 2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning tcp receive window to reach same value than before. Note that same amount of kernel memory is consumed compared to 2.6 kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-02 21:08:58 -04:00
Yuchung Cheng	eed530b6c6	tcp: early retransmit This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-02 20:56:10 -04:00
Shan Wei	c60f6aa8ac	net: doc: merge /proc/sys/net/core/* documents into one place All parameter descriptions in /proc/sys/net/core/* now is separated two places. So, merge them into Documentation/sysctl/net.txt. Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-27 00:09:26 -04:00
Masanari Iida	c94bed8e19	Documentation: Fix typo in multiple files in Documentation Correct multiple spelling typo in Documentation. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Rob Landley <rob@landley.net> Reported-by: Anders Larsen <al@alarsen.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-04-16 14:37:13 +02:00
John W. Linville	7eab0f64a9	Merge branch 'master' into for-davem Conflicts: drivers/net/wireless/iwlwifi/iwl-testmode.c net/wireless/nl80211.c	2012-04-12 14:41:59 -04:00
John W. Linville	8065248069	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	2012-04-12 13:49:28 -04:00
Simon Wunderlich	9bf8e4d425	batman-adv: export claim tables through debugfs Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-04-11 14:28:59 +02:00
Simon Wunderlich	23721387c4	batman-adv: add basic bridge loop avoidance code This second version of the bridge loop avoidance for batman-adv avoids loops between the mesh and a backbone (usually a LAN). By connecting multiple batman-adv mesh nodes to the same ethernet segment a loop can be created when the soft-interface is bridged into that ethernet segment. A simple visualization of the loop involving the most common case - a LAN as ethernet segment: node1 <-- LAN --> node2 \| \| wifi <-- mesh --> wifi Packets from the LAN (e.g. ARP broadcasts) will circle forever from node1 or node2 over the mesh back into the LAN. With this patch, batman recognizes backbone gateways, nodes which are part of the mesh and backbone/LAN at the same time. Each backbone gateway "claims" clients from within the mesh to handle them exclusively. By restricting that only responsible backbone gateways may handle their claimed clients traffic, loops are effectively avoided. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>	2012-04-11 14:28:58 +02:00
Johannes Berg	24398e39c8	mac80211: set HT channel before association Changing the channel type during operation is confusing to some drivers and will be hard to handle in multi-channel scenarios. Instead of changing the channel, set it to the right HT channel before authenticating/associating and don't change it -- just update the 20/40 MHz restrictions in rate control as needed when changed by the AP. This also fixes a problem that Paul missed in his fix for the "regulatory makes us deaf" issue -- when we couldn't use 40 MHz we still associated saying we were using 40 MHz, which could in similarly broken APs make us never even connect successfully. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-04-10 14:54:07 -04:00
David S. Miller	06eb4eafbd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-04-10 14:30:45 -04:00
Ben Hutchings	e34fac1c2e	doc, net: Update ndo_start_xmit return type and values Commit `dc1f8bf68b` ('netdev: change transmit to limited range type') changed the required return type and `9a1654ba0b` ('net: Optimize hard_start_xmit() return checking') changed the valid numerical return values. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-06 02:43:13 -04:00
Ben Hutchings	de7aca16fd	doc, net: Remove instruction to set net_device::trans_start Commit `08baf56108` ('net: txq_trans_update() helper') made it unnecessary for most drivers to set net_device::trans_start (or netdev_queue::trans_start). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-06 02:43:13 -04:00
Ben Hutchings	b3cf65457f	doc, net: Update netdev operation names Commits `d314774cf2` ('netdev: network device operations infrastructure') and `008298231a` ('netdev: add more functions to netdevice ops') moved and renamed net device operation pointers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-06 02:43:12 -04:00
Ben Hutchings	04fd3d3515	doc, net: Update documentation of synchronisation for TX multiqueue Commits `e308a5d806` ('netdev: Add netdev->addr_list_lock protection.') and `e8a0464cc9` ('netdev: Allocate multiple queues for TX.') introduced more fine-grained locks. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-06 02:43:12 -04:00
Ben Hutchings	93b6a3adbd	doc, net: Remove obsolete reference to dev->poll Commit `bea3348eef` ('[NET]: Make NAPI polling independent of struct net_device objects.') removed the automatic disabling of NAPI polling by dev_close(), and drivers must now do this themselves. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-06 02:43:12 -04:00
Giuseppe CAVALLARO	cd7201f477	stmmac: MDC clock dynamically based on the csr clock input If a specific clk_csr value is passed from the platform this means that the CSR Clock Range selection cannot be changed at run-time and it is fixed (as reported in the driver documentation). Viceversa the driver will try to set the MDC clock dynamically according to the actual clock input. Signed-off-by: Deepak Sikri <deepak.sikri@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Reviewed-by: Francesco Virlinzi <francesco.virlinzi@st.com> Reviewed-by: David Laight <david.laight@aculab.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:39:24 -04:00
Deepak SIKRI	8327eb65e7	stmmac: re-work the internal GMAC DMA platf parameters This patch re-works the internal GMAC DMA parameters passed from the platform. In the past, we only passed the pbl but, with new core, other parameters can be passed and are mandatory on some platforms. New parameters are documented in stmmac.txt because this patch has an impact for many platforms. Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com> Signed-off-by: Vikas Manocha <vikas.manocha@st.com> Signed-off-by: Deepak Sikri <deepak.sikri@st.com> Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:39:24 -04:00
Deepak SIKRI	55f9a4d6fa	stmmac: Define CSUM offload engine Types This patch explicitly defines the CSUM offload engine type which need (not mandatory) to be passed from the platform code. STMMAC core supports two check sum offload engine types- Type-1 & Type-2. Also, there are STMMAC cores that do not have the check sum offload capabilities. The behaviour of Type-1 & Type-2 cores related to provision of checksum increases the packet length for Type-1 cores by 2, as the checksum is appended at the end of data packet and the same is made accountable in the DMA status. The STMMAC cores beyond Version-3.5 provide HW interface registers which allows the user to read the HW capabilities, while to support the previous cores the information related to HW capabilities has to be provided from the platform code. The Type-1 cores which do not have the HW register interface need this information. This patch also updates the driver's doc. Signed-off-by: Deepak Sikri <deepak.sikri@st.com> Hacked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:39:23 -04:00
Fernando Luis Vazquez Cao	5d6bd8619d	TCP: update ip_local_port_range documentation The explanation of ip_local_port_range in Documentation/networking/ip-sysctl.txt contains several factual errors: - The default value of ip_local_port_range does not depend on the amount of memory available in the system. - tcp_tw_recycle is not enabled by default. - 1024-4999 is not the default value. - Etc. Clean up the mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-03 17:38:55 -04:00
Lucas De Marchi	78286cdf05	Documentation: replace install commands with softdeps Install commands should not be used to specify soft dependencies among modules. When loading modules it's much better to have a softdep that modprobe knows what's being done than having to fork/exec another instance of modprobe to load the other module. By using a softdep user has also an option to remove the dependencies when removing the module (and if its refcount dropped to 0) Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-30 16:03:15 -07:00
Lucas De Marchi	970e248649	Documentation: remove references to /etc/modprobe.conf Usage of /etc/modprobe.conf file was deprecated by module-init-tools and is no longer parsed by new kmod tool. References to this file are replaced in Documentation, comments and Kconfig according to the context. There are also some references to the old /etc/modules.conf from 2.4 kernels that are being removed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-30 16:03:15 -07:00
Linus Torvalds	3556485f15	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates for 3.4 from James Morris: "The main addition here is the new Yama security module from Kees Cook, which was discussed at the Linux Security Summit last year. Its purpose is to collect miscellaneous DAC security enhancements in one place. This also marks a departure in policy for LSM modules, which were previously limited to being standalone access control systems. Chromium OS is using Yama, and I believe there are plans for Ubuntu, at least. This patchset also includes maintenance updates for AppArmor, TOMOYO and others." Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key rename. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits) AppArmor: Fix location of const qualifier on generated string tables TOMOYO: Return error if fails to delete a domain AppArmor: add const qualifiers to string arrays AppArmor: Add ability to load extended policy TOMOYO: Return appropriate value to poll(). AppArmor: Move path failure information into aa_get_name and rename AppArmor: Update dfa matching routines. AppArmor: Minor cleanup of d_namespace_path to consolidate error handling AppArmor: Retrieve the dentry_path for error reporting when path lookup fails AppArmor: Add const qualifiers to generated string tables AppArmor: Fix oops in policy unpack auditing AppArmor: Fix error returned when a path lookup is disconnected KEYS: testing wrong bit for KEY_FLAG_REVOKED TOMOYO: Fix mount flags checking order. security: fix ima kconfig warning AppArmor: Fix the error case for chroot relative path name lookup AppArmor: fix mapping of META_READ to audit and quiet flags AppArmor: Fix underflow in xindex calculation AppArmor: Fix dropping of allowed operations that are force audited AppArmor: Add mising end of structure test to caps unpacking ...	2012-03-21 13:25:04 -07:00

1 2 3 4 5 ...

761 Коммитов