WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Stephen Hemminger	25c4e53a4c	[PKTGEN]: use pr_debug Remove private debug macro and replace with standard version Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:28 -07:00
Eric Dumazet	fa438ccfdf	[NET]: Keep sk_backlog near sk_lock sk_backlog is a critical field of struct sock. (known famous words) It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(), tcp_v4_rcv(), sk_receive_skb(). It really makes sense to place it next to sk_lock, because sk_backlog is only used after sk_lock locked (and thus memory cache line in L1 cache). This should reduce cache misses and sk_lock acquisition time. (In theory, we could only move the head pointer near sk_lock, and leaving tail far away, because 'tail' is normally not so hot, but keep it simple :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:27 -07:00
Ilpo Järvinen	e317f6f69c	[TCP]: FRTO undo response falls back to ratehalving one if ECEd Undoing ssthresh is disabled in fastretrans_alert whenever FLAG_ECE is set by clearing prior_ssthresh. The clearing does not protect FRTO because FRTO operates before fastretrans_alert. Moving the clearing of prior_ssthresh earlier seems to be a suboptimal solution to the FRTO case because then FLAG_ECE will cause a second ssthresh reduction in try_to_open (the first occurred when FRTO was entered). So instead, FRTO falls back immediately to the rate halving response, which switches TCP to CA_CWR state preventing the latter reduction of ssthresh. If the first ECE arrived before the ACK after which FRTO is able to decide RTO as spurious, prior_ssthresh is already cleared. Thus no undoing for ssthresh occurs. Besides, FLAG_ECE should be set also in the following ACKs resulting in rate halving response that sees TCP is already in CA_CWR, which again prevents an extra ssthresh reduction on that round-trip. If the first ECE arrived before RTO, ssthresh has already been adapted and prior_ssthresh remains cleared on entry because TCP is in CA_CWR (the same applies also to a case where FRTO is entered more than once and ECE comes in the middle). High_seq must not be touched after tcp_enter_cwr because CWR round-trip calculation depends on it. I believe that after this patch, FRTO should be ECN-safe and even able to take advantage of synergy benefits. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:26 -07:00
Ilpo Järvinen	e01f9d7793	[TCP]: Complete icsk-to-local-variable change (in tcp_enter_cwr) A local variable for icsk was created but this change was missing. Spotted by Jarek Poplawski. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:25 -07:00
Ilpo Järvinen	3cfe3baaf0	[TCP]: Add two new spurious RTO responses to FRTO New sysctl tcp_frto_response is added to select amongst these responses: - Rate halving based; reuses CA_CWR state (default) - Very conservative; used to be the only one available (=1) - Undo cwr; undoes ssthresh and cwnd reductions (=2) The response with rate halving requires a new parameter to tcp_enter_cwr because FRTO has already reduced ssthresh and doing a second reduction there has to be prevented. In addition, to keep things nice on 80 cols screen, a local variable was added. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:23 -07:00
Ilpo Järvinen	c5e7af0df5	[TCP]: Correct reordering detection change (no FRTO case) The reordering detection must work also when FRTO has not been used at all which was the original intention of mine, just the expression of the idea was flawed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:22 -07:00
Eric Dumazet	54287cc178	[TCP]: Keep copied_seq, rcv_wup and rcv_next together. I noticed in oprofile study a cache miss in tcp_rcv_established() to read copied_seq. ffffffff80400a80 <tcp_rcv_established>: /* tcp_rcv_established total: 4034293 2.0400 */ 55493 0.0281 :ffffffff80400bc9: mov 0x4c8(%r12),%eax copied_seq 543103 0.2746 :ffffffff80400bd1: cmp 0x3e0(%r12),%eax rcv_nxt if (tp->copied_seq == tp->rcv_nxt && len - tcp_header_len <= tp->ucopy.len) { In this function, the cache line 0x4c0 -> 0x500 is used only for this reading 'copied_seq' field. rcv_wup and copied_seq should be next to rcv_nxt field, to lower number of active cache lines in hot paths. (tcp_rcv_established(), tcp_poll(), ...) As you suggested, I changed tcp_create_openreq_child() so that these fields are changed together, to avoid adding a new store buffer stall. Patch is 64bit friendly (no new hole because of alignment constraints) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:21 -07:00
Ilpo Järvinen	cf4c6bf83d	[TCP]: struct *sock argument renamed: sp -> sk In general, TCP code uses "sk" for struct sock pointer. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:20 -07:00
John Heffner	886236c124	[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:19 -07:00
Angelo P. Castellani	5ef814753e	[TCP] YeAH-TCP: algorithm implementation YeAH-TCP is a sender-side high-speed enabled TCP congestion control algorithm, which uses a mixed loss/delay approach to compute the congestion window. It's design goals target high efficiency, internal, RTT and Reno fairness, resilience to link loss while keeping network elements load as low as possible. For further details look here: http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf Signed-off-by: Angelo P. Castellani <angelo.castellani@gmail.con> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:18 -07:00
Ilpo Järvinen	4dc2665e36	[TCP]: SACK enhanced FRTO Implements the SACK-enhanced FRTO given in RFC4138 using the variant given in Appendix B. RFC4138, Appendix B: "This means that in order to declare timeout spurious, the TCP sender must receive an acknowledgment for non-retransmitted segment between SND.UNA and RecoveryPoint in algorithm step 3. RecoveryPoint is defined in conservative SACK-recovery algorithm [RFC3517]" The basic version of the FRTO algorithm can still be used also when SACK is enabled. To enabled SACK-enhanced version, tcp_frto sysctl is set to 2. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:16 -07:00
Ilpo Järvinen	288035f915	[TCP]: Prevent reordering adjustments during FRTO To be honest, I'm not too sure how the reord stuff works in the first place but this seems necessary. When FRTO has been active, the one and only retransmission could be unnecessary but the state and sending order might not be what the sacktag code expects it to be (to work correctly). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:15 -07:00
Ilpo Järvinen	66e93e45c0	[TCP] FRTO: Fake cwnd for ssthresh callback TCP without FRTO would be in Loss state with small cwnd. FRTO, however, leaves cwnd (typically) to a larger value which causes ssthresh to become too large in case RTO is triggered again compared to what conventional recovery would do. Because consecutive RTOs result in only a single ssthresh reduction, RTO+cumulative ACK+RTO pattern is required to trigger this event. A large comment is included for congestion control module writers trying to figure out what CA_EVENT_FRTO handler should do because there exists a remote possibility of incompatibility between FRTO and module defined ssthresh functions. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:14 -07:00
Ilpo Järvinen	d1a54c6a0a	[TCP] FRTO: Reverse RETRANS bit clearing logic Previously RETRANS bits were cleared on the entry to FRTO. We postpone that into tcp_enter_frto_loss, which is really the place were the clearing should be done anyway. This allows simplification of the logic from a clearing loop to the head skb clearing only. Besides, the other changes made in the previous patches to tcp_use_frto made it impossible for the non-SACKed FRTO to be entered if other than the head has been rexmitted. With SACK-enhanced FRTO (and Appendix B), however, there can be a number retransmissions in flight when RTO expires (same thing could happen before this patchset also with non-SACK FRTO). To not introduce any jumpiness into the packet counting during FRTO, instead of clearing RETRANS bits from skbs during entry, do it later on. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:13 -07:00
Ilpo Järvinen	46d0de4ed9	[TCP] FRTO: Entry is allowed only during (New)Reno like recovery This interpretation comes from RFC4138: "If the sender implements some loss recovery algorithm other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when earlier fast recovery is underway." I think the RFC means to say (especially in the light of Appendix B) that ...recovery is underway (not just fast recovery) or was underway when it was interrupted by an earlier (F-)RTO that hasn't yet been resolved (snd_una has not advanced enough). Thus, my interpretation is that whenever TCP has ever retransmitted other than head, basic version cannot be used because then the order assumptions which are used as FRTO basis do not hold. NewReno has only the head segment retransmitted at a time. Therefore, walk up to the segment that has not been SACKed, if that segment is not retransmitted nor anything before it, we know for sure, that nothing after the non-SACKed segment should be either. This assumption is valid because TCPCB_EVER_RETRANS does not leave holes but each non-SACKed segment is rexmitted in-order. Check for retrans_out > 1 avoids more expensive walk through the skb list, as we can know the result beforehand: F-RTO will not be allowed. SACKed skb can turn into non-SACked only in the extremely rare case of SACK reneging, in this case we might fail to detect retransmissions if there were them for any other than head. To get rid of that feature, whole rexmit queue would have to be walked (always) or FRTO should be prevented when SACK reneging happens. Of course RTO should still trigger after reneging which makes this issue even less likely to show up. And as long as the response is as conservative as it's now, nothing bad happens even then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:12 -07:00
Ilpo Järvinen	7c9a4a5b67	[TCP]: Prevent unrelated cwnd adjustment while using FRTO FRTO controls cwnd when it still processes the ACK input or it has just reverted back to conventional RTO recovery; the normal rules apply when FRTO has reverted to standard congestion control. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:11 -07:00
Ilpo Järvinen	94d0ea7786	[TCP] FRTO: frto_counter modulo-op converted to two assignments Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:10 -07:00
Ilpo Järvinen	52c63f1e86	[TCP]: Don't enter to fast recovery while using FRTO Because TCP is not in Loss state during FRTO recovery, fast recovery could be triggered by accident. Non-SACK FRTO is more robust than not yet included SACK-enhanced version (that can receiver high number of duplicate ACKs with SACK blocks during FRTO), at least with unidirectional transfers, but under extraordinary patterns fast recovery can be incorrectly triggered, e.g., Data loss+ACK losses => cumulative ACK with enough SACK blocks to meet sacked_out >= dupthresh condition). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:09 -07:00
Ilpo Järvinen	aa8b6a7ad1	[TCP] FRTO: Response should reset also snd_cwnd_cnt Since purpose is to reduce CWND, we prevent immediate growth. This is not a major issue nor is "the correct way" specified anywhere. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:08 -07:00
Ilpo Järvinen	95c4922bf9	[TCP] FRTO: fixes fallback to conventional recovery The FRTO detection did not care how ACK pattern affects to cwnd calculation of the conventional recovery. This caused incorrect setting of cwnd when the fallback becames necessary. The knowledge tcp_process_frto() has about the incoming ACK is now passed on to tcp_enter_frto_loss() in allowed_segments parameter that gives the number of segments that must be added to packets-in-flight while calculating the new cwnd. Instead of snd_una we use FLAG_DATA_ACKED in duplicate ACK detection because RFC4138 states (in Section 2.2): If the first acknowledgment after the RTO retransmission does not acknowledge all of the data that was retransmitted in step 1, the TCP sender reverts to the conventional RTO recovery. Otherwise, a malicious receiver acknowledging partial segments could cause the sender to declare the timeout spurious in a case where data was lost. If the next ACK after RTO is duplicate, we do not retransmit anything, which is equal to what conservative conventional recovery does in such case. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:07 -07:00
Ilpo Järvinen	6408d206c7	[TCP] FRTO: Ignore some uninteresting ACKs Handles RFC4138 shortcoming (in step 2); it should also have case c) which ignores ACKs that are not duplicates nor advance window (opposite dir data, winupdate). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:06 -07:00
Ilpo Järvinen	7b0eb22b1d	[TCP] FRTO: Use Disorder state during operation instead of Open Retransmission counter assumptions are to be changed. Forcing reason to do this exist: Using sysctl in check would be racy as soon as FRTO starts to ignore some ACKs (doing that in the following patches). Userspace may disable it at any moment giving nice oops if timing is right. frto_counter would be inaccessible from userspace, but with SACK enhanced FRTO retrans_out can include other than head, and possibly leaving it non-zero after spurious RTO, boom again. Luckily, solution seems rather simple: never go directly to Open state but use Disorder instead. This does not really change much, since TCP could anyway change its state to Disorder during FRTO using path tcp_fastretrans_alert -> tcp_try_to_open (e.g., when a SACK block makes ACK dubious). Besides, Disorder seems to be the state where TCP should be if not recovering (in Recovery or Loss state) while having some retransmissions in-flight (see tcp_try_to_open), which is exactly what happens with FRTO. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:05 -07:00
Ilpo Järvinen	7487c48c4f	[TCP] FRTO: Consecutive RTOs keep prior_ssthresh and ssthresh In case a latency spike causes more than one RTO, the later should not cause the already reduced ssthresh to propagate into the prior_ssthresh since FRTO declares all such RTOs spurious at once or none of them. In treating of ssthresh, we mimic what tcp_enter_loss() does. The previous state (in frto_counter) must be available until we have checked it in tcp_enter_frto(), and also ACK information flag in process_frto(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:04 -07:00
Ilpo Järvinen	30935cf4f9	[TCP] FRTO: Comment cleanup & improvement Moved comments out from the body of process_frto() to the head (preferred way; see Documentation/CodingStyle). Bonus: it's much easier to read in this compacted form. FRTO algorithm and implementation is described in greater detail. For interested reader, more information is available in RFC4138. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:03 -07:00
Ilpo Järvinen	bdaae17da8	[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c In addition, removed inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:02 -07:00
Ilpo Järvinen	9ead9a1d38	[TCP] FRTO: Separated response from FRTO detection algorithm FRTO spurious RTO detection algorithm (RFC4138) does not include response to a detected spurious RTO but can use different response algorithms. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:01 -07:00
Ilpo Järvinen	522e7548a9	[TCP] FRTO: Incorrectly clears TCPCB_EVER_RETRANS bit FRTO was slightly too brave... Should only clear TCPCB_SACKED_RETRANS bit. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:00 -07:00
Alexey Kuznetsov	1194ed0a3e	[NETLINK]: Infinite recursion in netlink. Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. The patch also makes some minimal cleanup: 1. Return something consistent (-ENOENT) when fib table is missing 2. Do not crash when queue is empty (does not happen, but yet) 3. Put result of lookup Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 13:07:28 -07:00
YOSHIFUJI Hideaki	a23cf14b16	IPv6: fix Routing Header Type 0 handling thinko Oops, thinko. The test for accempting a RH0 was exatly the wrong way around. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-24 19:26:06 -07:00
YOSHIFUJI Hideaki	0bcbc92629	[IPV6]: Disallow RH0 by default. A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-24 14:58:30 -07:00
Patrick McHardy	05d224468a	[XFRM]: beet: fix pseudo header length value draft-nikander-esp-beet-mode-07.txt is not entirely clear on how the length value of the pseudo header should be calculated, it states "The Header Length field contains the length of the pseudo header, IPv4 options, and padding in 8 octets units.", but also states "Length in octets (Header Len + 1) * 8". draft-nikander-esp-beet-mode-08-pre1.txt [1] clarifies this, the header length should not include the first 8 byte. This change affects backwards compatibility, but option encapsulation didn't work until very recently anyway. [1] http://users.piuha.net/jmelen/BEET/draft-nikander-esp-beet-mode-08-pre1.txt Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-23 22:39:02 -07:00
Stephen Hemminger	4d4d3d1e88	[TCP]: Congestion control initialization. Change to defer congestion control initialization. If setsockopt() was used to change TCP_CONGESTION before connection is established, then protocols that use sequence numbers to keep track of one RTT interval (vegas, illinois, ...) get confused. Change the init hook to be called after handshake. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-23 22:32:11 -07:00
Trond Myklebust	241c39b9ac	RPC: Fix the TCP resend semantics for NFSv4 Fix a regression due to the patch "NFS: disconnect before retrying NFSv4 requests over TCP" The assumption made in xprt_transmit() that the condition "req->rq_bytes_sent == 0 and request is on the receive list" should imply that we're dealing with a retransmission is false. Firstly, it may simply happen that the socket send queue was full at the time the request was initially sent through xprt_transmit(). Secondly, doing this for each request that was retransmitted implies that we disconnect and reconnect for _every_ request that happened to be retransmitted irrespective of whether or not a disconnection has already occurred. Fix is to move this logic into the call_status request timeout handler. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:30 -07:00
Denis Lunev	ac57b3a9ce	[NETLINK]: Don't attach callback to a going-away netlink socket There is a race between netlink_dump_start() and netlink_release() that can lead to the situation when a netlink socket with non-zero callback is freed. Here it is: CPU1: CPU2 netlink_release(): netlink_dump_start(): sk = netlink_lookup(); /* OK / netlink_remove(); spin_lock(&nlk->cb_lock); if (nlk->cb) { / false / ... } spin_unlock(&nlk->cb_lock); spin_lock(&nlk->cb_lock); if (nlk->cb) { / false / ... } nlk->cb = cb; spin_unlock(&nlk->cb_lock); ... sock_orphan(sk); / * proceed with releasing * the socket */ The proposal it to make sock_orphan before detaching the callback in netlink_release() and to check for the sock to be SOCK_DEAD in netlink_dump_start() before setting a new callback. Signed-off-by: Denis Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 17:05:58 -07:00
Olaf Kirch	bfb6709d0b	[IrDA]: Correctly handling socket error This patch fixes an oops first reported in mid 2006 - see http://lkml.org/lkml/2006/8/29/358 The cause of this bug report is that when an error is signalled on the socket, irda_recvmsg_stream returns without removing a local wait_queue variable from the socket's sk_sleep queue. This causes havoc further down the road. In response to this problem, a patch was made that invoked sock_orphan on the socket when receiving a disconnect indication. This is not a good fix, as this sets sk_sleep to NULL, causing applications sleeping in recvmsg (and other places) to oops. This is against the latest net-2.6 and should be considered for -stable inclusion. Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 15:07:22 -07:00
Vlad Yasevich	d0cf0d9940	[SCTP]: Do not interleave non-fragments when in partial delivery The way partial delivery is currently implemnted, it is possible to intereleave a message (either from another steram, or unordered) that is not part of partial delivery process. The only way to this is for a message to not be a fragment and be 'in order' or unorderd for a given stream. This will result in bypassing the reassembly/ordering queues where things live duing partial delivery, and the message will be delivered to the socket in the middle of partial delivery. This is a two-fold problem, in that: 1. the app now must check the stream-id and flags which it may not be doing. 2. this clearing partial delivery state from the association and results in ulp hanging. This patch is a band-aid over a much bigger problem in that we don't do stream interleave. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 14:16:09 -07:00
David S. Miller	fefaa75e04	[IPSEC] af_key: Fix thinko in pfkey_xfrm_policy2msg() Make sure to actually assign the determined mode to rq->sadb_x_ipsecrequest_mode. Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 14:16:07 -07:00
Linus Torvalds	80d74d5123	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [BRIDGE]: Unaligned access when comparing ethernet addresses [SCTP]: Unmap v4mapped addresses during SCTP_BINDX_REM_ADDR operation. [SCTP]: Fix assertion (!atomic_read(&sk->sk_rmem_alloc)) failed message [NET]: Set a separate lockdep class for neighbour table's proxy_queue [NET]: Fix UDP checksum issue in net poll mode. [KEY]: Fix conversion between IPSEC_MODE_xxx and XFRM_MODE_xxx. [NET]: Get rid of alloc_skb_from_cache	2007-04-17 16:51:32 -07:00
NeilBrown	30f3deeee8	knfsd: use a spinlock to protect sk_info_authunix sk_info_authunix is not being protected properly so the object that it points to can be cache_put twice, leading to corruption. We borrow svsk->sk_defer_lock to provide the protection. We should probably rename that lock to have a more generic name - later. Thanks to Gabriel for reporting this. Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: Gabriel Barazer <gabriel@oxeva.fr> Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-17 16:36:27 -07:00
Evgeny Kravtsunov	19bb3506e2	[BRIDGE]: Unaligned access when comparing ethernet addresses compare_ether_addr() implicitly requires that the addresses passed are 2-bytes aligned in memory. This is not true for br_stp_change_bridge_id() and br_stp_recalculate_bridge_id() in which one of the addresses is unsigned char *, and thus may not be 2-bytes aligned. Signed-off-by: Evgeny Kravtsunov <emkravts@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org>	2007-04-17 14:16:00 -07:00
Paolo Galtieri	0304ff8a2d	[SCTP]: Unmap v4mapped addresses during SCTP_BINDX_REM_ADDR operation. During the sctp_bindx() call to add additional addresses to the endpoint, any v4mapped addresses are converted and stored as regular v4 addresses. However, when trying to remove these addresses, the v4mapped addresses are not converted and the operation fails. This patch unmaps the addresses on during the remove operation as well. Signed-off-by: Paolo Galtieri <pgaltieri@mvista.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:42 -07:00
Tsutomu Fujii	ea2bc483ff	[SCTP]: Fix assertion (!atomic_read(&sk->sk_rmem_alloc)) failed message In current implementation, LKSCTP does receive buffer accounting for data in sctp_receive_queue and pd_lobby. However, LKSCTP don't do accounting for data in frag_list when data is fragmented. In addition, LKSCTP doesn't do accounting for data in reasm and lobby queue in structure sctp_ulpq. When there are date in these queue, assertion failed message is printed in inet_sock_destruct because sk_rmem_alloc of oldsk does not become 0 when socket is destroyed. Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:37 -07:00
Pavel Emelianov	c2ecba7171	[NET]: Set a separate lockdep class for neighbour table's proxy_queue Otherwise the following calltrace will lead to a wrong lockdep warning: neigh_proxy_process() `- lock(neigh_table->proxy_queue.lock); arp_redo /* via tbl->proxy_redo */ arp_process neigh_event_ns neigh_update skb_queue_purge `- lock(neighbor->arp_queue.lock); This is not a deadlock actually, as neighbor table's proxy_queue and the neighbor's arp_queue are different queues. Lockdep thinks there is a deadlock as both queues are initialized with skb_queue_head_init() and thus have a common class. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:31 -07:00
Aubrey.Li	5e7d7fa573	[NET]: Fix UDP checksum issue in net poll mode. In net poll mode, the current checksum function doesn't consider the kind of packet which is padded to reach a specific minimum length. I believe that's the problem causing my test case failed. The following patch fixed this issue. Signed-off-by: Aubrey.Li <aubreylee@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:26 -07:00
Kazunori MIYAZAWA	55569ce256	[KEY]: Fix conversion between IPSEC_MODE_xxx and XFRM_MODE_xxx. We should not blindly convert between IPSEC_MODE_xxx and XFRM_MODE_xxx just by incrementing / decrementing because the assumption is not true any longer. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Singed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-17 13:13:21 -07:00
Herbert Xu	b4dfa0b1fb	[NET]: Get rid of alloc_skb_from_cache Since this was added originally for Xen, and Xen has recently (~2.6.18) stopped using this function, we can safely get rid of it. Good timing too since this function has started to bit rot. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:16 -07:00
Linus Torvalds	b1847a041a	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Fix inline directive in pci_iommu.c [SPARC64]: Fix arg passing to compat_sys_ipc(). [SPARC]: Fix section mismatch warnings in pci.c and pcic.c [SUNRPC]: Make sure on-stack cmsg buffer is properly aligned. [SPARC]: avoid CHILD_MAX and OPEN_MAX constants [SPARC64]: Fix SBUS IOMMU allocation code.	2007-04-13 18:20:39 -07:00
David S. Miller	49688c8431	[NETFILTER] arp_tables: Fix unaligned accesses. There are two device string comparison loops in arp_packet_match(). The first one goes byte-by-byte but the second one tries to be clever and cast the string to a long and compare by longs. The device name strings in the arp table entries are not guarenteed to be aligned enough to make this value, so just use byte-by-byte for both cases. Based upon a report by <drraid@gmail.com>. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-13 16:37:54 -07:00
YOSHIFUJI Hideaki	612f09e849	[IPV6] SNMP: Fix {In,Out}NoRoutes statistics. A packet which is being discarded because of no routes in the forwarding path should not be counted as OutNoRoutes but as InNoRoutes. Additionally, on this occasion, a packet whose destinaion is not valid should be counted as InAddrErrors separately. Based on patch from Mitsuru Chinen <mitch@linux.vnet.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-13 16:18:02 -07:00
Joy Latten	661697f728	[IPSEC] XFRM_USER: kernel panic when large security contexts in ACQUIRE When sending a security context of 50+ characters in an ACQUIRE message, following kernel panic occurred. kernel BUG in xfrm_send_acquire at net/xfrm/xfrm_user.c:1781! cpu 0x3: Vector: 700 (Program Check) at [c0000000421bb2e0] pc: c00000000033b074: .xfrm_send_acquire+0x240/0x2c8 lr: c00000000033b014: .xfrm_send_acquire+0x1e0/0x2c8 sp: c0000000421bb560 msr: 8000000000029032 current = 0xc00000000fce8f00 paca = 0xc000000000464b00 pid = 2303, comm = ping kernel BUG in xfrm_send_acquire at net/xfrm/xfrm_user.c:1781! enter ? for help 3:mon> t [c0000000421bb650] c00000000033538c .km_query+0x6c/0xec [c0000000421bb6f0] c000000000337374 .xfrm_state_find+0x7f4/0xb88 [c0000000421bb7f0] c000000000332350 .xfrm_tmpl_resolve+0xc4/0x21c [c0000000421bb8d0] c0000000003326e8 .xfrm_lookup+0x1a0/0x5b0 [c0000000421bba00] c0000000002e6ea0 .ip_route_output_flow+0x88/0xb4 [c0000000421bbaa0] c0000000003106d8 .ip4_datagram_connect+0x218/0x374 [c0000000421bbbd0] c00000000031bc00 .inet_dgram_connect+0xac/0xd4 [c0000000421bbc60] c0000000002b11ac .sys_connect+0xd8/0x120 [c0000000421bbd90] c0000000002d38d0 .compat_sys_socketcall+0xdc/0x214 [c0000000421bbe30] c00000000000869c syscall_exit+0x0/0x40 --- Exception: c00 (System Call) at 0000000007f0ca9c SP (fc0ef8f0) is in userspace We are using size of security context from xfrm_policy to determine how much space to alloc skb and then putting security context from xfrm_state into skb. Should have been using size of security context from xfrm_state to alloc skb. Following fix does that Signed-off-by: Joy Latten <latten@austin.ibm.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-13 16:14:35 -07:00
Jerome Borsboom	279e172a58	[VLAN]: Allow VLAN interface on top of bridge interface When a VLAN interface is created on top of a bridge interface and netfilter is enabled to see the bridged packets, the packets can be corrupted when passing through the netfilter code. This is caused by the VLAN driver not setting the 'protocol' and 'nh' members of the sk_buff structure. In general, this is no problem as the VLAN interface is mostly connected to a physical ethernet interface which does not use the 'protocol' and 'nh' members. For a bridge interface, however, these members do matter. Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-13 16:12:47 -07:00
Andrew Morton	09fe3ef46c	[PKTGEN]: Add try_to_freeze() The pktgen module prevents suspend-to-disk. Fix. Acked-by: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-12 14:45:32 -07:00
Patrick McHardy	01102e7ca2	[NETFILTER]: ipt_ULOG: use put_unaligned Use put_unaligned to fix warnings about unaligned accesses. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-12 14:27:03 -07:00
David S. Miller	bc375ea7ef	[SUNRPC]: Make sure on-stack cmsg buffer is properly aligned. Based upon a report from Meelis Roos. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-12 13:35:59 -07:00
Jaroslav Kysela	50c9cc2e54	[NETFILTER]: ipt_CLUSTERIP: fix oops in checkentry function The clusterip_config_find_get() already increases entries reference counter, so there is no reason to do it twice in checkentry() callback. This causes the config to be freed before it is removed from the list, resulting in a crash when adding the next rule. Signed-off-by: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-10 13:26:48 -07:00
David S. Miller	15d33c070d	[TCP]: slow_start_after_idle should influence cwnd validation too For the cases that slow_start_after_idle are meant to deal with, it is almost a certainty that the congestion window tests will think the connection is application limited and we'll thus decrease the cwnd there too. This defeats the whole point of setting slow_start_after_idle to zero. So test it there too. We do not cancel out the entire tcp_cwnd_validate() function so that if the sysctl is changed we still have the validation state maintained. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-09 13:31:15 -07:00
Patrick McHardy	bb8a954f27	[NET_SCHED]: cls_tcindex: fix compatibility breakage Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed to expect and use a u16 value in 2.6.11, which broke compatibility on big endian machines. Change back to use int. Reported by Ole Reinartz <ole.reinartz@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-09 13:31:13 -07:00
David S. Miller	161980f4c6	[IPV6]: Revert recent change to rt6_check_dev(). This reverts `a0d78ebf3a` It causes pings to link-local addresses to fail. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-06 11:42:27 -07:00
Patrick McHardy	254d0d24e3	[XFRM]: beet: fix IP option decapsulation Beet mode looks for the beet pseudo header after the outer IP header, which is wrong since that is followed by the ESP header. Additionally it needs to adjust the packet length after removing the pseudo header and point the data pointer to the real data location. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 16:03:33 -07:00
Patrick McHardy	d4b1e84062	[XFRM]: beet: fix beet mode decapsulation Beet mode decapsulation fails to properly set up the skb pointers, which only works by accident in combination with CONFIG_NETFILTER, since in that case the skb is fixed up in xfrm4_input before passing it to the netfilter hooks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 15:59:41 -07:00
Patrick McHardy	04fef9893a	[XFRM]: beet: use IPOPT_NOP for option padding draft-nikander-esp-beet-mode-07.txt states "The padding MUST be filled with NOP options as defined in Internet Protocol [1] section 3.1 Internet header format.", so do that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 15:54:39 -07:00
Patrick McHardy	c5027c9a89	[XFRM]: beet: fix IP option encapsulation Beet mode calculates an incorrect value for the transport header location when IP options are present, resulting in encapsulation errors. The correct location is 4 or 8 bytes before the end of the original IP header, depending on whether the pseudo header is padded. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 15:54:02 -07:00
Herbert Xu	4c4d51a731	[IPSEC]: Reject packets within replay window but outside the bit mask Up until this point we've accepted replay window settings greater than 32 but our bit mask can only accomodate 32 packets. Thus any packet with a sequence number within the window but outside the bit mask would be accepted. This patch causes those packets to be rejected instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-05 00:07:39 -07:00
Mitsuru Chinen	60e5c16641	[IPv6]: Exclude truncated packets from InHdrErrors statistics Incoming trancated packets are counted as not only InTruncatedPkts but also InHdrErrors. They should be counted as InTruncatedPkts only. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-04 23:54:59 -07:00
Jean Delvare	75559c167b	[APPLETALK]: Fix a remotely triggerable crash When we receive an AppleTalk frame shorter than what its header says, we still attempt to verify its checksum, and trip on the BUG_ON() at the end of function atalk_sum_skb() because of the length mismatch. This has security implications because this can be triggered by simply sending a specially crafted ethernet frame to a target victim, effectively crashing that host. Thus this qualifies, I think, as a remote DoS. Here is the frame I used to trigger the crash, in npg format: <Appletalk Killer> { # Ethernet header ----- XX XX XX XX XX XX # Destination MAC 00 00 00 00 00 00 # Source MAC 00 1D # Length # LLC header ----- AA AA 03 08 00 07 80 9B # Appletalk # Appletalk header ----- 00 1B # Packet length (invalid) 00 01 # Fake checksum 00 00 00 00 # Destination and source networks 00 00 00 00 # Destination and source nodes and ports # Payload ----- 0C 0D 0E 0F 10 11 12 13 14 } The destination MAC address must be set to those of the victim. The severity is mitigated by two requirements: * The target host must have the appletalk kernel module loaded. I suspect this isn't so frequent. * AppleTalk frames are non-IP, thus I guess they can only travel on local networks. I am no network expert though, maybe it is possible to somehow encapsulate AppleTalk packets over IP. The bug has been reported back in June 2004: http://bugzilla.kernel.org/show_bug.cgi?id=2979 But it wasn't investigated, and was closed in July 2006 as both reporters had vanished meanwhile. This code was new in kernel 2.6.0-test5: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=7ab442d7e0a76402c12553ee256f756097cae2d2 And not modified since then, so we can assume that vanilla kernels 2.6.0-test5 and later, and distribution kernels based thereon, are affected. Note that I still do not know for sure what triggered the bug in the real-world cases. The frame could have been corrupted by the kernel if we have a bug hiding somewhere. But more likely, we are receiving the faulty frame from the network. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-04 23:52:46 -07:00
Adrian Bunk	418106d624	[PATCH] net/sunrpc/svcsock.c: fix a check The return value of kernel_recvmsg() should be assigned to "err", not compared with the random value of a never initialized "err" (and the "< 0" check wrongly always returned false since == comparisons never have a result < 0). Spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 21:12:47 -07:00
Eric W. Biederman	927498217c	[PATCH] net: Ignore sysfs network device rename bugs. The generic networking code ensures that no two networking devices have the same name, so there is no time except when sysfs has implementation bugs that device_rename when called from dev_change_name will fail. The current error handling for errors from device_rename in dev_change_name is wrong and results in an unusable and unrecoverable network device if device_rename is happens to return an error. This patch removes the buggy error handling. Which confines the mess when device_rename hits a problem to sysfs, instead of propagating it the rest of the network stack. Making linux a little more robust. Without this patch you can observe what happens when sysfs has a bug when CONFIG_SYSFS_DEPRECATED is not set and you attempt to rename a real network device to a name like (broken_parity_status, device, modalias, power, resource2, subsystem_vendor, class, driver, irq, msi_bus, resource, subsystem, uevent, config, enable, local_cpus, numa_node, resource0, subsystem_device, vendor) Greg has a patch that fixes the sysfs bugs but he doesn't trust it for a 2.6.21 timeframe. This patch which just ignores errors should be safe and it keeps the system from going completely wacky. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-04 08:51:52 -07:00
John Heffner	84565070e4	[TCP]: Do receiver-side SWS avoidance for rcvbuf < MSS. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:56:32 -07:00
YOSHIFUJI Hideaki	b59e139bbd	[IPv6]: Fix incorrect length check in rawv6_sendmsg() In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says: > From: Sridhar Samudrala <sri@us.ibm.com> > Date: Thu, 29 Mar 2007 14:17:28 -0700 > > > The check for length in rawv6_sendmsg() is incorrect. > > As len is an unsigned int, (len < 0) will never be TRUE. > > I think checking for IPV6_MAXPLEN(65535) is better. > > > > Is it possible to send ipv6 jumbo packets using raw > > sockets? If so, we can remove this check. > > I don't see why such a limitation against jumbo would exist, > does anyone else? > > Thanks for catching this Sridhar. A good compiler should simply > fail to compile "if (x < 0)" when 'x' is an unsigned type, don't > you think :-) Dave, we use "int" for returning value, so we should fix this anyway, IMHO; we should not allow len > INT_MAX. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:30:54 -07:00
Patrick McHardy	31ba548f96	[NET_SCHED]: cls_basic: fix memory leak in basic_destroy tp->root is not freed on destruction. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:30:52 -07:00
Steven Whitehouse	83886b6b63	[NET]: Change "not found" return value for rule lookup This changes the "not found" error return for the lookup function to -ESRCH so that it can be distinguished from the case where a rule or route resulting in -ENETUNREACH has been found during the search. It fixes a bug where if DECnet was compiled with routing support, but no routes were added to the routing table, it was failing to fall back to endnode routing. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:30:51 -07:00
Linus Torvalds	9415fddd99	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [IFB]: Fix crash on input device removal [BNX2]: Fix link interrupt problem.	2007-03-29 13:15:13 -07:00
Patrick McHardy	c01003c205	[IFB]: Fix crash on input device removal The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-29 11:46:52 -07:00
Jiri Kosina	cb3fecc2f2	[PATCH] bluetooth hid quirks: mightymouse quirk I have a bugreport that scrollwheel of bluetooth version of apple mightymouse doesn't work. The USB version of mightymouse works, as there is a quirk for handling scrollwheel in hid/usbhid for it. Now that bluetooth git tree is hooked to generic hid layer, it could easily use the quirks which are already present in generic hid parser, hid-input, etc. Below is a simple patch against bluetooth git tree, which adds quirk handling to current bluetooth hidp code, and sets quirk flags for device 0x05ac/0x030c, which is the bluetooth version of the apple mightymouse. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-29 08:22:24 -07:00
Linus Torvalds	c203b33d2e	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV	2007-03-28 14:00:27 -07:00
Linus Torvalds	4db43e677e	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: SUN3/3X Lance trivial fix improved mv643xx_eth: Fix use of uninitialized port_num field forcedeth: fix tx timeout forcedeth: fix nic poll qla3xxx: bugfix: Jumbo frame handling. qla3xxx: bugfix: Dropping interrupt under heavy network load. qla3xxx: bugfix: Multi segment sends were getting whacked. qla3xxx: bugfix: Add tx control block memset. atl1: remove unnecessary crc inversion myri10ge: correctly detect when TSO should be used [PATCH] WE-22 : prevent information leak on 64 bit [PATCH] wext: Add missing ioctls to 64<->32 conversion [PATCH] bcm43xx: Fix machine check on PPC for version 1 PHY [PATCH] bcm43xx: fix radio_set_tx_iq [PATCH] bcm43xx: Fix code for confusion between PHY revision and PHY version	2007-03-28 13:45:13 -07:00
Arnaldo Carvalho de Melo	39ebc0276b	[DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV We were only checking if there was enough space to put the int, but left len as specified by the (malicious) user, sigh, fix it by setting len to sizeof(val) and transfering just one int worth of data, the one asked for. Also check for negative len values. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-28 11:54:32 -07:00
Jeff Garzik	a9c87a10db	Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes	2007-03-28 02:21:18 -04:00
Herbert Xu	53aadcc909	[IPV6]: Set IF_READY if the device is up and has carrier We still need to set the IF_READY flag in ipv6_add_dev for the case where all addresses (including the link-local) are deleted and then recreated. In that case the IPv6 device too will be destroyed and then recreated. In order to prevent the original problem, we simply ensure that the device is up before setting IF_READY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-27 14:31:52 -07:00
Patrick McHardy	c38c83cb70	[NET_SCHED]: sch_htb/sch_hfsc: fix oops in qlen_notify During both HTB and HFSC class deletion the class is removed from the class hash before calling qdisc_tree_decrease_qlen. This makes the ->get operation in qdisc_tree_decrease_qlen fail, so it passes a NULL pointer to ->qlen_notify, causing an oops. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-27 14:04:24 -07:00
Jean Tourrilhes	c2805fbb86	[PATCH] WE-22 : prevent information leak on 64 bit Johannes Berg discovered that kernel space was leaking to userspace on 64 bit platform. He made a first patch to fix that. This is an improved version of his patch. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-03-27 14:10:26 -04:00
Robert P. J. Day	9b2f7bcf0e	[NET]: Remove dead net/sched/Makefile entry for sch_hpfq.o. Remove the worthless net/sched/Makefile entry for the non-existent source file sch_hpfq.c. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-26 16:20:34 -07:00
Robert Olsson	d562f1f8a9	[IPV4] fib_trie: Document locking. Paul E. McKenney writes: > Those of use who dive into networking only occasionally would much > appreciate this. ;-) No problem here... Acked-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (but trivial) Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-26 14:22:22 -07:00
Alexey Dobriyan	79f4f6428f	[NET]: Correct accept(2) recovery after sock_attach_fd() * d_alloc() in sock_attach_fd() fails leaving ->f_dentry of new file NULL * bail out to out_fd label, doing fput()/__fput() on new file * but __fput() assumes valid ->f_dentry and dereferences it Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-26 14:09:52 -07:00
Patrick McHardy	035832a280	[NET_SCHED]: Fix ingress locking Ingress queueing uses a seperate lock for serializing enqueue operations, but fails to properly protect itself against concurrent changes to the qdisc tree. Use queue_lock for now since the real fix it quite intrusive. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:12 -07:00
Patrick McHardy	d3fa76ee6b	[NET_SCHED]: cls_basic: fix NULL pointer dereference cls_basic doesn't allocate tp->root before it is linked into the active classifier list, resulting in a NULL pointer dereference when packets hit the classifier before its ->change function is called. Reported by Chris Madden <chris@reflexsecurity.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:11 -07:00
Adrian Bunk	c93a882ebe	[DCCP]: make dccp_write_xmit_timer() static again dccp_write_xmit_timer() needlessly became global. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:10 -07:00
David S. Miller	f11e6659ce	[IPV6]: Fix routing round-robin locking. As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons: 1) The round-robin list manipulation, as coded, goes like this (with read lock held): walk routes finding head and tail spin_lock(); rotate list using head and tail spin_unlock(); While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later onthat many people have been hitting. 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way. So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code when in. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:05 -07:00
Thomas Graf	a979101106	[DECNet] fib: Fix out of bound access of dn_fib_props[] Fixes a typo which caused fib_props[] to have the wrong size and makes sure the value used to index the array which is provided by userspace via netlink is checked to avoid out of bound access. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:04 -07:00
Thomas Graf	a0ee18b9b7	[IPv4] fib: Fix out of bound access of fib_props[] Fixes a typo which caused fib_props[] to have the wrong size and makes sure the value used to index the array which is provided by userspace via netlink is checked to avoid out of bound access. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:03 -07:00
Ralf Baechle	954b2e7f4c	[NET] AX.25 Kconfig and docs updates and fixes o The AX.25 Howto is unmaintained since several years. I've replaced it with a wiki at http://www.linux-ax25.org which provides more uptodate information. o Change default for AX25_DAMA_SLAVE to Y. AX25_DAMA_SLAVE only compiles in support for DAMA but doesn't activate it. I hope this gets Linux distributions to ship their AX.25 kernels with AX25_DAMA_SLAVE enabled. The price for this would be very small. o Delete historic changelog from comments, that's what SCM systems are meant to do. o ---help--- in Kconfig looks so yellingly eye insulting. Use just help. o Rewrite the commented out piece of old Linux 2.4 configuration language to Kconfig for consistency. o Fixup dependencies. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:02 -07:00
Alexey Kuznetsov	ecbb416939	[NET]: Fix neighbour destructor handling. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms etc. The device should guarantee that dead neighbor entries (neigh->dead != 0) do not get private part initialized, otherwise nobody will cleanup it. I think this is enough for ipoib which is the only user of this thing. Initialization private part of neighbor entries happens in ipib start_xmit routine, which is not reached when device is down. But it would be better to add explicit test for neigh->dead in any case. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:01 -07:00
Thomas Graf	e1701c68c1	[NET]: Fix fib_rules compatibility breakage Based upon a patch from Patrick McHardy. The fib_rules netlink attribute policy introduced in 2.6.19 broke userspace compatibilty. When specifying a rule with "from all" or "to all", iproute adds a zero byte long netlink attribute, but the policy requires all addresses to have a size equal to sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a validation error. Check attribute length of FRA_SRC/FRA_DST in the generic framework by letting the family specific rules implementation provide the length of an address. Report an error if address length is non zero but no address attribute is provided. Fix actual bug by checking address length for non-zero instead of relying on availability of attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:00 -07:00
Patrick Ringl	0fa7d868ca	[PATCH] fix typos in net/ieee80211/Kconfig This is just a QA / cosmetic fix .. [ "a modules" => "a module" ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-03-24 16:51:53 -07:00
Patrick McHardy	848c29fd64	[NETFILTER]: nat: avoid rerouting packets if only XFRM policy key changed Currently NAT not only reroutes packets in the OUTPUT chain when the routing key changed, but also if only the non-routing part of the IPsec policy key changed. This breaks ping -I since it doesn't use SO_BINDTODEVICE but IP_PKTINFO cmsg to specify the output device, and this information is lost. Only do full rerouting if the routing key changed, and just do a new policy lookup with the old route if only the ports changed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:30:29 -07:00
Patrick McHardy	ca8fbb859c	[NETFILTER]: nf_conntrack_netlink: add missing dependency on NF_NAT NF_CT_NETLINK=y, NF_NAT=m results in: LD .tmp_vmlinux1 net/built-in.o: dans la fonction « nfnetlink_parse_nat_proto »: nf_conntrack_netlink.c:(.text+0x28db9): référence indéfinie vers « nf_nat_proto_find_get » nf_conntrack_netlink.c:(.text+0x28dd6): référence indéfinie vers « nf_nat_proto_put » net/built-in.o: dans la fonction « ctnetlink_new_conntrack »: nf_conntrack_netlink.c:(.text+0x29959): référence indéfinie vers « nf_nat_setup_info » nf_conntrack_netlink.c:(.text+0x29b35): référence indéfinie vers « nf_nat_setup_info » nf_conntrack_netlink.c:(.text+0x29cf7): référence indéfinie vers « nf_nat_setup_info » nf_conntrack_netlink.c:(.text+0x29de2): référence indéfinie vers « nf_nat_setup_info » make: *** [.tmp_vmlinux1] Erreur 1 Reported by Kevin Baradon <kevin.baradon@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:29:57 -07:00
Dave Jones	b6f99a2119	[NET]: fix up misplaced inlines. Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:27:49 -07:00
Vlad Yasevich	289f42492c	[SCTP]: Correctly reset ssthresh when restarting association Reset ssthresh to the correct value (peer's a_rwnd) when restarting association. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:26:25 -07:00
Patrick McHardy	b19cbe2a16	[BRIDGE]: Fix fdb RCU race br_fdb_get use atomic_inc to increase the refcount of an element found on a RCU protected list, which can lead to the following race: CPU0 CPU1 br_fdb_get: rcu_read_lock __br_fdb_get: find element fdb_delete: hlist_del_rcu br_fdb_put br_fdb_put: atomic_dec_and_test call_rcu(fdb_rcu_free) br_fdb_get: atomic_inc rcu_read_unlock fdb_rcu_free: kmem_cache_free Use atomic_inc_not_zero instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:25:20 -07:00
Patrick McHardy	ec25615b9d	[NET]: Fix fib_rules dump race fib_rules_dump needs to use list_for_each_entry_rcu to protect against concurrent changes to the rules list. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:24:38 -07:00

1 2 3 4 5 ...

4416 Коммитов