WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Nicolas Dichtel	575659936f	sctp: fix a typo in prototype of __sctp_rcv_lookup() Just to avoid confusion when people only reads this prototype. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-04 15:53:48 -04:00
Linus Torvalds	aab174f0df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...	2012-10-02 20:25:04 -07:00
Al Viro	56b31d1c9f	unexport sock_map_fd(), switch to sock_alloc_file() Both modular callers of sock_map_fd() had been buggy; sctp one leaks descriptor and file if copy_to_user() fails, 9p one shouldn't be exposing file in the descriptor table at all. Switch both to sock_alloc_file(), export it, unexport sock_map_fd() and make it static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:08:50 -04:00
David S. Miller	b48b63a1f6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-15 11:43:53 -04:00
Wei Yongjun	54a2792423	sctp: use list_move_tail instead of list_del/list_add_tail Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-04 14:16:13 -04:00
Thomas Graf	4c3a5bdae2	sctp: Don't charge for data in sndbuf again when transmitting packet SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of sndbuf it will sleep in sctp_wait_for_sndbuf() and expects to be waken up by __sctp_write_space(). Buffer space charged via sctp_set_owner_w() is released in sctp_wfree() which calls __sctp_write_space() directly. Buffer space charged via skb_set_owner_w() is released via sock_wfree() which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set. sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets. Therefore if sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ... Charging for the data twice does not make sense in the first place, it leads to overcharging sndbuf by a factor 2. Therefore this patch only charges a single byte in wmem_alloc when transmitting an SCTP packet to ensure that the socket stays alive until the packet has been released. This means that control chunks are no longer accounted for in wmem_alloc which I believe is not a problem as skb->truesize will typically lead to overcharging anyway and thus compensates for any control overhead. Signed-off-by: Thomas Graf <tgraf@suug.ch> CC: Vlad Yasevich <vyasevic@redhat.com> CC: Neil Horman <nhorman@tuxdriver.com> CC: David Miller <davem@davemloft.net> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-03 13:24:13 -04:00
David S. Miller	e6acb38480	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace This is an initial merge in of Eric Biederman's work to start adding user namespace support to the networking. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-24 18:54:37 -04:00
Dan Carpenter	02644a1745	sctp: fix bogus if statement in sctp_auth_recv_cid() There is an extra semi-colon here, so we always return 0 instead of calling __sctp_auth_cid(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-16 13:36:29 -07:00
Ulrich Weber	6932f119bd	sctp: fix compile issue with disabled CONFIG_NET_NS struct seq_net_private has no struct net if CONFIG_NET_NS is not enabled Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-16 13:36:29 -07:00
Eric W. Biederman	e1fc3b14f9	sctp: Make sysctl tunables per net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:32:16 -07:00
Eric W. Biederman	f53b5b097e	sctp: Push struct net down into sctp_verify_ext_param Add struct net as a parameter to sctp_verify_param so it can be passed to sctp_verify_ext_param where struct net will be needed when the sctp tunables become per net tunables. Add struct net as a parameter to sctp_verify_init so struct net can be passed to sctp_verify_param. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:37 -07:00
Eric W. Biederman	24cb81a6a9	sctp: Push struct net down into all of the state machine functions There are a handle of state machine functions primarily those dealing with processing INIT packets where there is neither a valid endpoint nor a valid assoication from which to derive a struct net. Therefore add struct net * to the parameter list of sctp_state_fn_t and update all of the state machine functions. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:37 -07:00
Eric W. Biederman	e7ff4a7037	sctp: Push struct net down into sctp_in_scope struct net will be needed shortly when the tunables are made per network namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:37 -07:00
Eric W. Biederman	89bf3450cb	sctp: Push struct net down into sctp_transport_init Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:37 -07:00
Eric W. Biederman	55e26eb95a	sctp: Push struct net down to sctp_chunk_event_lookup This trickles up through sctp_sm_lookup_event up to sctp_do_sm and up further into sctp_primitiv_NAME before the code reaches places where struct net can be reliably found. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:37 -07:00
Eric W. Biederman	ebb7e95d93	sctp: Add infrastructure for per net sysctls Start with an empty sctp_net_table that will be populated as the various tunable sysctls are made per net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:37 -07:00
Eric W. Biederman	b01a24078f	sctp: Make the mib per network namespace Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:30:36 -07:00
Eric W. Biederman	bb2db45b54	sctp: Enable sctp in all network namespaces - Fix the sctp_af operations to work in all namespaces - Enable sctp socket creation in all network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:29:59 -07:00
Eric W. Biederman	13d782f6b4	sctp: Make the proc files per network namespace. - Convert all of the files under /proc/net/sctp to be per network namespace. - Don't print anything for /proc/net/sctp/snmp except in the initial network namespaces as the snmp counters still have to be converted to be per network namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:29:53 -07:00
Eric W. Biederman	632c928a6a	sctp: Move the percpu sockets counter out of sctp_proc_init The percpu sctp socket counter has nothing at all to do with the sctp proc files, and having it in the wrong initialization is confusing, and makes network namespace support a pain. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:17:26 -07:00
Eric W. Biederman	2ce9550350	sctp: Make the ctl_sock per network namespace - Kill sctp_get_ctl_sock, it is useless now. - Pass struct net where needed so net->sctp.ctl_sock is accessible. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:17:26 -07:00
Eric W. Biederman	4db67e8086	sctp: Make the address lists per network namespace - Move the address lists into struct net - Add per network namespace initialization and cleanup - Pass around struct net so it is everywhere I need it. - Rename all of the global variable references into references to the variables moved into struct net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:12:17 -07:00
Eric W. Biederman	4110cc255d	sctp: Make the association hashtable handle multiple network namespaces - Use struct net in the hash calculation - Use sock_net(association.base.sk) in the association lookups. - On receive calculate the network namespace from skb->dev. - Pass struct net from receive down to the functions that actually do the association lookup. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 22:44:12 -07:00
Eric W. Biederman	4cdadcbcb6	sctp: Make the endpoint hashtable handle multiple network namespaces - Use struct net in the hash calculation - Use sock_net(endpoint.base.sk) in the endpoint lookups. - On receive calculate the network namespace from skb->dev. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 22:44:12 -07:00
Eric W. Biederman	f1f4376307	sctp: Make the port hash table use struct net in it's key. - Add struct net into the port hash table hash calculation - Add struct net inot the struct sctp_bind_bucket so there is a memory of which network namespace a port is allocated in. No need for a ref count because sctp_bind_bucket only exists when there are sockets in the hash table and sockets can not change their network namspace, and sockets already ref count their network namespace. - Add struct net into the key comparison when we are testing to see if we have found the port hash table entry we are looking for. With these changes lookups in the port hash table becomes safe to use in multiple network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 22:44:12 -07:00
Eric W. Biederman	a7cb5a49bf	userns: Print out socket uids in a user namespace aware fashion. Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:48:06 -07:00
Mel Gorman	c76562b670	netvm: prevent a stream-specific deadlock This patch series is based on top of "Swap-over-NBD without deadlocking v15" as it depends on the same reservation of PF_MEMALLOC reserves logic. When a user or administrator requires swap for their application, they create a swap partition and file, format it with mkswap and activate it with swapon. In diskless systems this is not an option so if swap if required then swapping over the network is considered. The two likely scenarios are when blade servers are used as part of a cluster where the form factor or maintenance costs do not allow the use of disks and thin clients. The Linux Terminal Server Project recommends the use of the Network Block Device (NBD) for swap but this is not always an option. There is no guarantee that the network attached storage (NAS) device is running Linux or supports NBD. However, it is likely that it supports NFS so there are users that want support for swapping over NFS despite any performance concern. Some distributions currently carry patches that support swapping over NFS but it would be preferable to support it in the mainline kernel. Patch 1 avoids a stream-specific deadlock that potentially affects TCP. Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC reserves. Patch 3 adds three helpers for filesystems to handle swap cache pages. For example, page_file_mapping() returns page->mapping for file-backed pages and the address_space of the underlying swap file for swap cache pages. Patch 4 adds two address_space_operations to allow a filesystem to pin all metadata relevant to a swapfile in memory. Upon successful activation, the swapfile is marked SWP_FILE and the address space operation ->direct_IO is used for writing and ->readpage for reading in swap pages. Patch 5 notes that patch 3 is bolting filesystem-specific-swapfile-support onto the side and that the default handlers have different information to what is available to the filesystem. This patch refactors the code so that there are generic handlers for each of the new address_space operations. Patch 6 adds an API to allow a vector of kernel addresses to be translated to struct pages and pinned for IO. Patch 7 adds support for using highmem pages for swap by kmapping the pages before calling the direct_IO handler. Patch 8 updates NFS to use the helpers from patch 3 where necessary. Patch 9 avoids setting PF_private on PG_swapcache pages within NFS. Patch 10 implements the new swapfile-related address_space operations for NFS and teaches the direct IO handler how to manage kernel addresses. Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO where appropriate. Patch 12 fixes a NULL pointer dereference that occurs when using swap-over-NFS. With the patches applied, it is possible to mount a swapfile that is on an NFS filesystem. Swap performance is not great with a swap stress test taking roughly twice as long to complete than if the swap device was backed by NBD. This patch: netvm: prevent a stream-specific deadlock It could happen that all !SOCK_MEMALLOC sockets have buffered so much data that we're over the global rmem limit. This will prevent SOCK_MEMALLOC buffers from receiving data, which will prevent userspace from running, which is needed to reduce the buffered data. Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit. Once this change it applied, it is important that sockets that set SOCK_MEMALLOC do not clear the flag until the socket is being torn down. If this happens, a warning is generated and the tokens reclaimed to avoid accounting errors until the bug is fixed. [davem@davemloft.net: Warning about clearing SOCK_MEMALLOC] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Rik van Riel <riel@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Neil Brown <neilb@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:47 -07:00
David S. Miller	92101b3b2e	ipv4: Prepare for change of rt->rt_iif encoding. Use inet_iif() consistently, and for TCP record the input interface of cached RX dst in inet sock. rt->rt_iif is going to be encoded differently, so that we can legitimately cache input routes in the FIB info more aggressively. When the input interface is "use SKB device index" the rt->rt_iif will be set to zero. This forces us to move the TCP RX dst cache installation into the ipv4 specific code, and as well it should since doing the route caching for ipv6 is pointless at the moment since it is not inspected in the ipv6 input paths yet. Also, remove the unlikely on dst->obsolete, all ipv4 dsts have obsolete set to a non-zero value to force invocation of the check callback. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 16:36:26 -07:00
David S. Miller	5e9965c15b	Merge branch 'kill_rtcache' The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks. The routing cache works great for well behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered. What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former of which is controllable by external entitites. Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~%10. The general flow of this patch series is that first the routing cache is removed. We build a completely new rtable entry every lookup request. Next we make some simplifications due to the fact that removing the routing cache causes several members of struct rtable to become no longer necessary. Then we need to make some amends such that we can legally cache pre-constructed routes in the FIB nexthops. Firstly, we need to invalidate routes which are hit with nexthop exceptions. Secondly we have to change the semantics of rt->rt_gateway such that zero means that the destination is on-link and non-zero otherwise. Now that the preparations are ready, we start caching precomputed routes in the FIB nexthops. Output and input routes need different kinds of care when determining if we can legally do such caching or not. The details are in the commit log messages for those changes. The patch series then winds down with some more struct rtable simplifications and other tidy ups that remove unnecessary overhead. On a SPARC-T3 output route lookups are ~876 cycles. Input route lookups are ~1169 cycles with rpfilter disabled, and about ~1468 cycles with rpfilter enabled. These measurements were taken with the kbench_mod test module in the net_test_tools GIT tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net_test_tools.git That GIT tree also includes a udpflood tester tool and stresses route lookups on packet output. For example, on the same SPARC-T3 system we can run: time ./udpflood -l 10000000 10.2.2.11 with routing cache: real 1m21.955s user 0m6.530s sys 1m15.390s without routing cache: real 1m31.678s user 0m6.520s sys 1m25.140s Performance undoubtedly can easily be improved further. For example fib_table_lookup() performs a lot of excessive computations with all the masking and shifting, some of it conditionalized to deal with edge cases. Also, Eric's no-ref optimization for input route lookups can be re-instated for the FIB nexthop caching code path. I would be really pleased if someone would work on that. In fact anyone suitable motivated can just fire up perf on the loading of the test net_test_tools benchmark kernel module. I spend much of my time going: bash# perf record insmod ./kbench_mod.ko dst=172.30.42.22 src=74.128.0.1 iif=2 bash# perf report Thanks to helpful feedback from Joe Perches, Eric Dumazet, Ben Hutchings, and others. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 17:04:15 -07:00
Neil Horman	5aa93bcf66	sctp: Implement quick failover draft from tsvwg I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:13:46 -07:00
David S. Miller	f5b0a87436	net: Document dst->obsolete better. Add a big comment explaining how the field works, and use defines instead of magic constants for the values assigned to it. Suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 13:31:21 -07:00
David S. Miller	abaa72d7fd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c	2012-07-19 11:17:30 -07:00
David S. Miller	a6ff1a2f1e	Merge branch 'nexthop_exceptions' These patches implement the final mechanism necessary to really allow us to go without the route cache in ipv4. We need a place to have long-term storage of PMTU/redirect information which is independent of the routes themselves, yet does not get us back into a situation where we have to write to metrics or anything like that. For this we use an "next-hop exception" table in the FIB nexthops. The one thing I desperately want to avoid is having to create clone routes in the FIB trie for this purpose, because that is very expensive. However, I'm willing to entertain such an idea later if this current scheme proves to have downsides that the FIB trie variant would not have. In order to accomodate this any such scheme, we need to be able to produce a full flow key at PMTU/redirect time. That required an adjustment of the interface call-sites used to propagate these events. For a PMTU/redirect with a fully specified socket, we pass that socket and use it to produce the flow key. Otherwise we use a passed in SKB to formulate the key. There are two cases that need to be distinguished, ICMP message processing (in which case the IP header is at skb->data) and output packet processing (mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)). We also have to make the code able to handle the case where the dst itself passed into the dst_ops->{update_pmtu,redirect} method is invalidated. This matters for calls from sockets that have cached that route. We provide a inet{,6} helper function for this purpose, and edit SCTP specially since it caches routes at the transport rather than socket level. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 10:48:26 -07:00
David S. Miller	6700c2709c	net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 03:29:28 -07:00
Ioan Orghici	db28aafad9	sctp: fix sparse warning for sctp_init_cause_fixed Fix the following sparse warning: * symbol 'sctp_init_cause_fixed' was not declared. Should it be static? Signed-off-by: Ioan Orghici <ioanorghici@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 23:23:52 -07:00
Neil Horman	2eebc1e188	sctp: Fix list corruption resulting from freeing an association on a list A few days ago Dave Jones reported this oops: [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP [22766.295376] CPU 0 [22766.295384] Modules linked in: [22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74 [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000 [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000, [22766.387137] Stack: [22766.387140] ffff880147c03a10 [22766.387140] ffffffffa169f2b6 [22766.387140] ffff88013ed95728 [22766.387143] 0000000000000002 [22766.387143] 0000000000000000 [22766.387143] ffff880003fad062 [22766.387144] ffff88013c120000 [22766.387144] [22766.387145] Call Trace: [22766.387145] <IRQ> [22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp] [22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp] [22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp] [22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0 [22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210 [22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0 [22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80 [22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680 [22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320 [22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910 [22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910 [22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40 [22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0 [22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440 [22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0 [22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130 [22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e] [22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e] [22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e] [22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0 [22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130 [22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420 [22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30 [22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110 [22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0 [22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0 [22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f [22766.387283] <EOI> [22766.387284] [22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02 [22766.387307] [22766.387307] RIP [22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp] [22766.387311] RSP <ffff880147c039b0> [22766.387142] ffffffffa16ab120 [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]--- [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt It appears from his analysis and some staring at the code that this is likely occuring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list. Nominally I would think that an mibalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. the sctp command interpreter that process side effects has not way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table. I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init, which allows us to proerly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn alows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardles of what the associations state is on the hash list. I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes pointers to make that function work properly, so I fixed that up in a simmilar fashion. I attempted to test this using a virtual guest running the SCTP_RR test from netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is suprising). Given the trace above however, I think its likely that this is what we hit. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 22:32:26 -07:00
David S. Miller	02f3d4ce9e	sctp: Adjust PMTU updates to accomodate route invalidation. This adjusts the call to dst_ops->update_pmtu() so that we can transparently handle the fact that, in the future, the dst itself can be invalidated by the PMTU update (when we have non-host routes cached in sockets). Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 03:57:14 -07:00
David S. Miller	1ed5c48f23	net: Remove checks for dst_ops->redirect being NULL. No longer necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-12 00:41:25 -07:00
David S. Miller	ec18d9a269	ipv6: Add redirect support to all protocol icmp error handlers. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-12 00:25:15 -07:00
David S. Miller	55be7a9c60	ipv4: Add redirect support to all protocol icmp error handlers. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-11 21:27:49 -07:00
Neil Horman	ed10627725	sctp: refactor sctp_packet_append_chunk and clenup some memory leaks While doing some recent work on sctp sack bundling I noted that sctp_packet_append_chunk was pretty inefficient. Specifially, it was called recursively while trying to bundle auth and sack chunks. Because of that we call sctp_packet_bundle_sack and sctp_packet_bundle_auth a total of 4 times for every call to sctp_packet_append_chunk, knowing that at least 3 of those calls will do nothing. So lets refactor sctp_packet_bundle_auth to have an outer part that does the attempted bundling, and an inner part that just does the chunk appends. This saves us several calls per iteration that we just don't need. Also, noticed that the auth and sack bundling fail to free the chunks they allocate if the append fails, so make sure we add that in Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-08 23:56:16 -07:00
Neil Horman	4244854d22	sctp: be more restrictive in transport selection on bundled sacks It was noticed recently that when we send data on a transport, its possible that we might bundle a sack that arrived on a different transport. While this isn't a major problem, it does go against the SHOULD requirement in section 6.4 of RFC 2960: An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK, etc.) to the same destination transport address from which it received the DATA or control chunk to which it is replying. This rule should also be followed if the endpoint is bundling DATA chunks together with the reply chunk. This patch seeks to correct that. It restricts the bundling of sack operations to only those transports which have moved the ctsn of the association forward since the last sack. By doing this we guarantee that we only bundle outbound saks on a transport that has received a chunk since the last sack. This brings us into stricter compliance with the RFC. Vlad had initially suggested that we strictly allow only sack bundling on the transport that last moved the ctsn forward. While this makes sense, I was concerned that doing so prevented us from bundling in the case where we had received chunks that moved the ctsn on multiple transports. In those cases, the RFC allows us to select any of the transports having received chunks to bundle the sack on. so I've modified the approach to allow for that, by adding a state variable to each transport that tracks weather it has moved the ctsn since the last sack. This I think keeps our behavior (and performance), close enough to our current profile that I think we can do this without a sysctl knob to enable/disable it. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yaseivch <vyasevich@gmail.com> CC: David S. Miller <davem@davemloft.net> CC: linux-sctp@vger.kernel.org Reported-by: Michele Baldessari <michele@redhat.com> Reported-by: sorin serban <sserban@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-30 22:44:35 -07:00
Daniel Halperin	39d84a58ad	sctp: fix warning when compiling without IPv6 net/sctp/protocol.c: In function ‘sctp_addr_wq_timeout_handler’: net/sctp/protocol.c:676: warning: label ‘free_next’ defined but not used Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-19 00:26:26 -07:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Joe Perches	e87cc4728f	net: Convert net_ratelimit uses to net_<level>_ratelimited Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-15 13:45:03 -04:00
Nicolas Dichtel	e0268868ba	sctp: check cached dst before using it dst_check() will take care of SA (and obsolete field), hence IPsec rekeying scenario is taken into account. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yaseivch <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:15:47 -04:00
Eric Dumazet	f545a38f74	net: add a limit parameter to sk_add_backlog() sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use. No functional change expected in this patch, all callers still using the old sk_rcvbuf limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 22:28:28 -04:00
Eric W. Biederman	ec8f23ce0f	net: Convert all sysctl registrations to register_net_sysctl This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:30 -04:00
Eric W. Biederman	5dd3df105b	net: Move all of the network sysctls without a namespace into init_net. This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:17 -04:00
Eric Dumazet	95c9617472	net: cleanup unsigned to unsigned int Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-15 12:44:40 -04:00
Thomas Graf	acdd598536	sctp: Allow struct sctp_event_subscribe to grow without breaking binaries getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns an error if the user provides less bytes than the size of struct sctp_event_subscribe. Struct sctp_event_subscribe needs to be extended by an u8 for every new event or notification type that is added. This obviously makes getsockopt fail for binaries that are compiled against an older versions of <net/sctp/user.h> which do not contain all event types. This patch changes getsockopt behaviour to no longer return an error if not enough bytes are being provided by the user. Instead, it returns as much of sctp_event_subscribe as fits into the provided buffer. This leads to the new behavior that users see what they have been aware of at compile time. The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-04 18:05:02 -04:00
Benjamin Poirier	0343c5543b	sctp: Export sctp_do_peeloff lookup sctp_association within sctp_do_peeloff() to enable its use outside of the sctp code with minimal knowledge of the former. Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-08 13:52:08 -08:00
Linus Torvalds	98793265b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)	2012-01-08 13:21:22 -08:00
David S. Miller	abb434cb05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-23 17:13:56 -05:00
Thomas Graf	a76c0adf60	sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd When checking whether a DATA chunk fits into the estimated rwnd a full sizeof(struct sk_buff) is added to the needed chunk size. This quickly exhausts the available rwnd space and leads to packets being sent which are much below the PMTU limit. This can lead to much worse performance. The reason for this behaviour was to avoid putting too much memory pressure on the receiver. The concept is not completely irational because a Linux receiver does in fact clone an skb for each DATA chunk delivered. However, Linux also reserves half the available socket buffer space for data structures therefore usage of it is already accounted for. When proposing to change this the last time it was noted that this behaviour was introduced to solve a performance issue caused by rwnd overusage in combination with small DATA chunks. Trying to reproduce this I found that with the sk_buff overhead removed, the performance would improve significantly unless socket buffer limits are increased. The following numbers have been gathered using a patched iperf supporting SCTP over a live 1 Gbit ethernet network. The -l option was used to limit DATA chunk sizes. The numbers listed are based on the average of 3 test runs each. Default values have been used for sk_(r\|w)mem. Chunk Size Unpatched No Overhead ------------------------------------- 4 15.2 Kbit [!] 12.2 Mbit [!] 8 35.8 Kbit [!] 26.0 Mbit [!] 16 95.5 Kbit [!] 54.4 Mbit [!] 32 106.7 Mbit 102.3 Mbit 64 189.2 Mbit 188.3 Mbit 128 331.2 Mbit 334.8 Mbit 256 537.7 Mbit 536.0 Mbit 512 766.9 Mbit 766.6 Mbit 1024 810.1 Mbit 808.6 Mbit Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-20 13:58:37 -05:00
Xi Wang	2692ba61a8	sctp: fix incorrect overflow check on autoclose Commit `8ffd3208` voids the previous patches `f6778aab` and `810c0719` for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-19 16:25:46 -05:00
Eric Dumazet	dfd56b8b38	net: use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) \|\| defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-11 18:25:16 -05:00
David S. Miller	b3613118eb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2011-12-02 13:49:21 -05:00
Justin P. Mattock	42b2aa86c6	treewide: Fix typos in various parts of the kernel, and fix some comments. The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-12-02 14:57:31 +01:00
Xi Wang	c89304b8ea	sctp: better integer overflow check in sctp_auth_create_key() The check from commit `30c2235c` is incomplete and cannot prevent cases like key_len = 0x80000000 (INT_MAX + 1). In that case, the left-hand side of the check (INT_MAX - key_len), which is unsigned, becomes 0xffffffff (UINT_MAX) and bypasses the check. However this shouldn't be a security issue. The function is called from the following two code paths: 1) setsockopt() 2) sctp_auth_asoc_set_secret() In case (1), sca_keylength is never going to exceed 65535 since it's bounded by a u16 from the user API. As such, the key length will never overflow. In case (2), sca_keylength is computed based on the user key (1 short) and 2 * key_vector (3 shorts) for a total of 7 * USHRT_MAX, which still will not overflow. In other words, this overflow check is not really necessary. Just make it more correct. Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-29 15:51:03 -05:00
Alexey Dobriyan	4e3fd7a06d	net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
Michio Honda	34d2d89f2d	sctp: fasthandoff with ASCONF at server-node Retransmit chunks to newly confirmed destination when ASCONF and HEARTBEAT negotiation has success with a single-homed peer. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-08 15:11:30 -05:00
Michio Honda	ddc4bbee6e	sctp: fasthandoff with ASCONF at mobile-node Fast retransmission after changing the last address with ASCONF negotiation Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-08 15:11:29 -05:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Paul Gortmaker	bc3b2d7fb9	net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:30 -04:00
Eric Dumazet	b903d324be	ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAIT commit `66b13d99d9` (ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT) fixed IPv4 only. This part is for the IPv6 side, adding a tclass param to ip6_xmit() We alias tw_tclass and tw_tos, if socket family is INET6. [ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill TOS field, TCLASS is not taken into account ] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-27 00:44:35 -04:00
Eric Dumazet	87fb4b7b53	net: more accurate skb truesize skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, its time to take it into account for better memory accounting. This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place. At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before. Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But its a necessary step for a more accurate memory accounting. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-13 16:05:07 -04:00
David S. Miller	8decf86879	Merge branch 'master' of github.com:davem330/net Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c	2011-09-22 03:23:13 -04:00
Max Matveev	d5ccd49660	sctp: deal with multiple COOKIE_ECHO chunks Attempt to reduce the number of IP packets emitted in response to single SCTP packet (`2e3216cd`) introduced a complication - if a packet contains two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the socket while processing first COOKIE_ECHO and then loses the association and forgets to uncork the socket. To deal with the issue add new SCTP command which can be used to set association explictly. Use this new command when processing second COOKIE_ECHO chunk to restore the context for SCTP state machine. Signed-off-by: Max Matveev <makc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-16 17:17:22 -04:00
Michio Honda	6af29ccc22	sctp: Bundle HEAERTBEAT into ASCONF_ACK With this patch a HEARTBEAT chunk is bundled into the ASCONF-ACK for ADD IP ADDRESS, confirming the new destination as quickly as possible. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-24 19:41:13 -07:00
Michio Honda	f207c050fb	sctp: HEARTBEAT negotiation after ASCONF This patch fixes BUG that the ASCONF receiver transmits DATA chunks to the newly added UNCONFIRMED destination. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-24 19:41:09 -07:00
David S. Miller	033b1142f4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-21 13:38:42 -07:00
Michał Mirosław	b73c43f884	net: sctp: fix checksum marking for outgoing packets Packets to devices without NETIF_F_SCTP_CSUM (including NETIF_F_NO_CSUM) should be properly checksummed because the packets can be diverted or rerouted after construction. This still leaves packets diverted from NETIF_F_SCTP_CSUM-enabled devices with broken checksums. Fixing this needs implementing software offload fallback in networking core. For users of sctp_checksum_disable, skb->ip_summed should be left as CHECKSUM_NONE and not CHECKSUM_UNNECESSARY as per include/linux/skbuff.h. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-14 15:16:31 -07:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
Thomas Graf	cd4fcc704f	sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket Trigger user ABORT if application closes a socket which has data queued on the socket receive queue or chunks waiting on the reassembly or ordering queue as this would imply data being lost which defeats the point of a graceful shutdown. This behavior is already practiced in TCP. We do not check the input queue because that would mean to parse all chunks on it to look for unacknowledged data which seems too much of an effort. Control chunks or duplicated chunks may also be in the input queue and should not be stopping a graceful shutdown. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-08 09:53:08 -07:00
Thomas Graf	f8d9605243	sctp: Enforce retransmission limit during shutdown When initiating a graceful shutdown while having data chunks on the retransmission queue with a peer which is in zero window mode the shutdown is never completed because the retransmission error count is reset periodically by the following two rules: - Do not timeout association while doing zero window probe. - Reset overall error count when a heartbeat request has been acknowledged. The graceful shutdown will wait for all outstanding TSN to be acknowledged before sending the SHUTDOWN request. This never happens due to the peer's zero window not acknowledging the continuously retransmitted data chunks. Although the error counter is incremented for each failed retransmission, the receiving of the SACK announcing the zero window clears the error count again immediately. Also heartbeat requests continue to be sent periodically. The peer acknowledges these requests causing the error counter to be reset as well. This patch changes behaviour to only reset the overall error counter for the above rules while not in shutdown. After reaching the maximum number of retransmission attempts, the T5 shutdown guard timer is scheduled to give the receiver some additional time to recover. The timer is stopped as soon as the receiver acknowledges any data. The issue can be easily reproduced by establishing a sctp association over the loopback device, constantly queueing data at the sender while not reading any at the receiver. Wait for the window to reach zero, then initiate a shutdown by killing both processes simultaneously. The association will never be freed and the chunks on the retransmission queue will be retransmitted indefinitely. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-07 14:08:44 -07:00
Wei Yongjun	949123016a	sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe it We forgot to send up SCTP_SENDER_DRY_EVENT notification when user app subscribes to this event, and there is no data to be sent or retransmit. This is required by the Socket API and used by the DTLS/SCTP implementation. Reported-by: Michael Tüxen <Michael.Tuexen@lurchi.franken.de> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Tested-by: Robin Seggelmann <seggelmann@fh-muenster.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-07 04:10:26 -07:00
Eric Dumazet	f03d78db65	net: refine {udp\|tcp\|sctp}_mem limits Current tcp/udp/sctp global memory limits are not taking into account hugepages allocations, and allow 50% of ram to be used by buffers of a single protocol [ not counting space used by sockets / inodes ...] Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram per protocol, and a minimum of 128 pages. Heavy duty machines sysadmins probably need to tweak limits anyway. References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032 Reported-by: starlight <starlight@binnacle.cx> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-07 00:27:05 -07:00
Joe Perches	7fd71b1e07	sctp: Reduce switch/case indent Make the case labels the same indent as the switch. git diff -w shows useless break;s removed after returns and a comment added to an unnecessary default: break; because of a dubious gcc warning. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-01 16:11:16 -07:00
Joe Perches	ea11073387	net: Remove casts of void * Unnecessary casts of void * clutter the code. These are the remainder casts after several specific patches to remove netdev_priv and dev_priv. Done via coccinelle script: $ cat cast_void_pointer.cocci @@ type T; T pt; void pv; @@ - pt = (T *)pv; + pt = pv; Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>	2011-06-16 23:19:27 -04:00
Michio Honda	6d65e5eee6	sctp: kzalloc() error handling on deleting last address Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-11 15:53:45 -07:00
David S. Miller	5d0c90cf4d	sctp: Guard IPV6 specific code properly. Outside of net/sctp/ipv6.c, IPV6 specific code needs to be ifdef guarded. This fixes build failures with IPV6 disabled. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-06 13:05:55 -07:00
Michio Honda	8a07eb0a50	sctp: Add ASCONF operation on the single-homed host In this case, the SCTP association transmits an ASCONF packet including addition of the new IP address and deletion of the old address. This patch implements this functionality. In this case, the ASCONF chunk is added to the beginning of the queue, because the other chunks cannot be transmitted in this state. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 02:04:53 -07:00
Michio Honda	7dc04d7122	sctp: Add socket option operation for Auto-ASCONF. This patch allows the application to operate Auto-ASCONF on/off behavior via setsockopt() and getsockopt(). Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 02:04:53 -07:00
Michio Honda	dd51be0f54	sctp: Add sysctl support for Auto-ASCONF. This patch allows the system administrator to change default Auto-ASCONF on/off behavior via an sysctl value. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 02:04:53 -07:00
Michio Honda	9f7d653b67	sctp: Add Auto-ASCONF support (core). SCTP reconfigure the IP addresses in the association by using ASCONF chunks as mentioned in RFC5061. For example, we can start to use the newly configured IP address in the existing association. This patch implements automatic ASCONF operation in the SCTP stack with address events in the host computer, which is called auto_asconf. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 02:04:53 -07:00
Michio Honda	b1364104e3	sctp: Add ADD/DEL ASCONF handling at the receiver. This patch fixes the problem that the original code cannot delete the remote address where the corresponding transport is currently directed, even when the ASCONF is sent from the other address (this situation happens when the single-homed sender transmits ASCONF with ADD and DEL.) Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 02:04:52 -07:00
Wei Yongjun	a000c01e60	sctp: stop pending timers and purge queues when peer restart asoc If the peer restart the asoc, we should not only fail any unsent/unacked data, but also stop the T3-rtx, SACK, T4-rto timers, and teardown ASCONF queues. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-31 15:29:17 -07:00
Wei Yongjun	8b4472cc13	sctp: fix memory leak of the ASCONF queue when free asoc If an ASCONF chunk is outstanding, then the following ASCONF chunk will be queued for later transmission. But when we free the asoc, we forget to free the ASCONF queue at the same time, this will cause memory leak. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-25 17:55:32 -04:00
Dan Rosenberg	71338aa7d0	net: convert %p usage to %pK The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-24 01:13:12 -04:00
David S. Miller	5d41452166	sctp: Fix build failure. Commit `c182f90bc1` ("SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict()") and commit `1231f0baa5` ("net,rcu: convert call_rcu(sctp_local_addr_free) to kfree_rcu()"), happening in different trees, introduced a build failure. Simply make the SCTP race fix use kfree_rcu() too. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-21 02:10:23 -04:00
Linus Torvalds	06f4e926d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.	2011-05-20 13:43:21 -07:00
Jacek Luczak	c182f90bc1	SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict() During the sctp_close() call, we do not use rcu primitives to destroy the address list attached to the endpoint. At the same time, we do the removal of addresses from this list before attempting to remove the socket from the port hash As a result, it is possible for another process to find the socket in the port hash that is in the process of being closed. It then proceeds to traverse the address list to find the conflict, only to have that address list suddenly disappear without rcu() critical section. Fix issue by closing address list removal inside RCU critical section. Race can result in a kernel crash with general protection fault or kernel NULL pointer dereference: kernel: general protection fault: 0000 [#1] SMP kernel: RIP: 0010:[<ffffffffa02f3dde>] [<ffffffffa02f3dde>] sctp_bind_addr_conflict+0x64/0x82 [sctp] kernel: Call Trace: kernel: [<ffffffffa02f415f>] ? sctp_get_port_local+0x17b/0x2a3 [sctp] kernel: [<ffffffffa02f3d45>] ? sctp_bind_addr_match+0x33/0x68 [sctp] kernel: [<ffffffffa02f4416>] ? sctp_do_bind+0xd3/0x141 [sctp] kernel: [<ffffffffa02f5030>] ? sctp_bindx_add+0x4d/0x8e [sctp] kernel: [<ffffffffa02f5183>] ? sctp_setsockopt_bindx+0x112/0x4a4 [sctp] kernel: [<ffffffff81089e82>] ? generic_file_aio_write+0x7f/0x9b kernel: [<ffffffffa02f763e>] ? sctp_setsockopt+0x14f/0xfee [sctp] kernel: [<ffffffff810c11fb>] ? do_sync_write+0xab/0xeb kernel: [<ffffffff810e82ab>] ? fsnotify+0x239/0x282 kernel: [<ffffffff810c2462>] ? alloc_file+0x18/0xb1 kernel: [<ffffffff8134a0b1>] ? compat_sys_setsockopt+0x1a5/0x1d9 kernel: [<ffffffff8134aaf1>] ? compat_sys_socketcall+0x143/0x1a4 kernel: [<ffffffff810467dc>] ? sysenter_dispatch+0x7/0x32 Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-19 17:13:04 -04:00
Joe Perches	afd7614c00	sctp: sctp_sendmsg: Don't test known non-null sinfo It's already known non-null above. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 17:30:50 -04:00
Joe Perches	517aa0bcda	sctp: sctp_sendmsg: Don't initialize default_sinfo This variable only needs initialization when cmsgs.info is NULL. Use memset to ensure padding is also zeroed so kernel doesn't leak any data. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 17:30:49 -04:00
David S. Miller	902ebd3e0d	sctp: Remove rt->rt_src usage in sctp_v4_get_saddr() Flow key is available, so fetch it from there. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 13:32:47 -07:00
David S. Miller	7ef73bca73	sctp: Fix debug message args. I messed things up when I converted over to the transport flow, I passed the ipv4 address value instead of it's address. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 21:24:07 -07:00
David S. Miller	f1c0a276ea	sctp: Don't use rt->rt_{src,dst} in sctp_v4_xmit() Now we can pick it out of the transport's flow key. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:28:29 -07:00
David S. Miller	d9d8da805d	inet: Pass flowi to ->queue_xmit(). This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 15:28:28 -07:00
David S. Miller	8663c938ce	sctp: Store a flowi in transports to provide persistent keying. Several future simplifications are possible now because of this. For example, the sctp_addr unions can simply refer directly to the flowi information. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-08 14:05:14 -07:00
Lai Jiangshan	1231f0baa5	net,rcu: convert call_rcu(sctp_local_addr_free) to kfree_rcu() The rcu callback sctp_local_addr_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(sctp_local_addr_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-05-07 22:50:51 -07:00
David S. Miller	18a353f428	sctp: Use flowi4's {saddr,daddr} in sctp_v4_dst_saddr() and sctp_v4_get_dst() Instead of rt->rt_{src,dst} Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-03 20:55:05 -07:00
Vlad Yasevich	da0420bee2	sctp: clean up route lookup calls Change the call to take the transport parameter and set the cached 'dst' appropriately inside the get_dst() function calls. This will allow us in the future to clean up source address storage as well. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 13:14:06 -07:00
Vlad Yasevich	af1384703f	sctp: remove useless arguments from get_saddr() call There is no point in passing a destination address to a get_saddr() call. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 13:14:06 -07:00
Vlad Yasevich	9c6a02f41d	sctp: make sctp over IPv6 work with IPsec SCTP never called xfrm_output after it's v6 route lookups so that never really worked with ipsec. Additioanlly, we never passed port nubmers and protocol in the flowi, so any port based policies were never applied as well. Now that we can fixed ipv6 routing lookup code, using ip6_dst_lookup_flow() and pass port numbers. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 13:14:05 -07:00
Vlad Yasevich	9914ae3ca7	sctp: cache the ipv6 source after route lookup The ipv6 routing lookup does give us a source address, but instead of filling it into the dst, it's stored in the flowi. We can use that instead of going through the entire source address selection again. Also the useless ->dst_saddr member of sctp_pf is removed. And sctp_v6_dst_saddr() is removed, instead by introduce sctp_v6_to_addr(), which can be reused to cleanup some dup code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 13:14:04 -07:00
Weixing Shi	625034113b	sctp: fix sctp to work with ipv6 source address routing In the below test case, using the source address routing, sctp can not work. Node-A 1)ifconfig eth0 inet6 add 2001:1::1/64 2)ip -6 rule add from 2001:1::1 table 100 pref 100 3)ip -6 route add 2001:2::1 dev eth0 table 100 4)sctp_darn -H 2001:1::1 -P 250 -l & Node-B 1)ifconfig eth0 inet6 add 2001:2::1/64 2)ip -6 rule add from 2001:2::1 table 100 pref 100 3)ip -6 route add 2001:1::1 dev eth0 table 100 4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s root cause: Node-A and Node-B use the source address routing, and at begining, source address will be NULL,sctp will search the routing table by the destination address, because using the source address routing table, and the result dst_entry will be NULL. solution: walk through the bind address list to get the source address and then lookup the routing table again to get the correct dst_entry. Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 13:14:04 -07:00
Lucas De Marchi	e9c549998d	Revert wrong fixes for common misspellings These changes were incorrectly fixed by codespell. They were now manually corrected. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-04-26 23:31:11 -07:00
Eric Dumazet	b71d1d426d	inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-22 11:04:14 -07:00
Wei Yongjun	e1cdd553d4	sctp: implement event notification SCTP_SENDER_DRY_EVENT This patch implement event notification SCTP_SENDER_DRY_EVENT. SCTP Socket API Extensions: 6.1.9. SCTP_SENDER_DRY_EVENT When the SCTP stack has no more user data to send or retransmit, this notification is given to the user. Also, at the time when a user app subscribes to this event, if there is no data to be sent or retransmit, the stack will immediately send up this notification. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-21 10:35:44 -07:00
Wei Yongjun	ee916fd0fd	sctp: change auth event type name to SCTP_AUTHENTICATION_EVENT This patch change the auth event type name to SCTP_AUTHENTICATION_EVENT, which is based on API extension compliance. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-21 10:35:44 -07:00
Wei Yongjun	209ba424c2	sctp: implement socket option SCTP_GET_ASSOC_ID_LIST This patch Implement socket option SCTP_GET_ASSOC_ID_LIST. SCTP Socket API Extension: 8.2.6. Get the Current Identifiers of Associations (SCTP_GET_ASSOC_ID_LIST) This option gets the current list of SCTP association identifiers of the SCTP associations handled by a one-to-many style socket. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-21 10:35:43 -07:00
Wei Yongjun	4c6a6f4213	sctp: move chunk from retransmit queue to abandoned list If there is still data waiting to retransmit and remain in retransmit queue, while doing the next retransmit, if the chunk is abandoned, we should move it to abandoned list. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:06 -07:00
Wei Yongjun	92c73af58e	sctp: make heartbeat information in sctp_make_heartbeat() Make heartbeat information in sctp_make_heartbeat() instead of make it in sctp_sf_heartbeat() directly for common using. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:05 -07:00
Wei Yongjun	de6becdc08	sctp: fix to check the source address of COOKIE-ECHO chunk SCTP does not check whether the source address of COOKIE-ECHO chunk is the original address of INIT chunk or part of the any address parameters saved in COOKIE in CLOSED state. So even if the COOKIE-ECHO chunk is from any address but with correct COOKIE, the COOKIE-ECHO chunk still be accepted. If the COOKIE is not from a valid address, the assoc should not be established. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:05 -07:00
Shan Wei	85c5ed4e44	sctp: handle ootb packet in chunk order as defined Changed the order of processing SHUTDOWN ACK and COOKIE ACK refer to section 8.4:Handle "Out of the Blue" Packets. SHUTDOWN ACK chunk should be processed before processing "Stale Cookie" ERROR or a COOKIE ACK. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:04 -07:00
Vlad Yasevich	deb85a6ecc	sctp: bail from sctp_endpoint_lookup_assoc() if not bound The sctp_endpoint_lookup_assoc() function uses a port hash to lookup the association and then checks to see if any of them are on the current endpoint. However, if the current endpoint is not bound, there can't be any associations on it, thus we can bail early. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:03 -07:00
Vlad Yasevich	0b8f9e25b0	sctp: remove completely unsed EMPTY state SCTP does not SCTP_STATE_EMPTY and we can never be in that state. Remove useless code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:03 -07:00
Shan Wei	96ca468b86	sctp: check invalid value of length parameter in error cause RFC4960, section 3.3.7 said: If an endpoint receives an ABORT with a format error or no TCB is found, it MUST silently discard it. When an endpoint receives ABORT that parameter value is invalid, drop it. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:02 -07:00
Shan Wei	8a00be1c89	sctp: check parameter value of length in ERROR chunk When an endpoint receives ERROR that parameter value is invalid, send an ABORT to peer with a Protocol Violation error code. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:51:01 -07:00
Vlad Yasevich	c6ef006bf5	sctp: Release all routes when processing acks ADD_IP or DEL_IP When processing an ACK for ADD_IP parameter, we only release the routes on non-active transports. This can cause a wrong source address to be used. We can release the routes and cause new route lookups and source address selection so that new addresses can be used as source. Additionally, we don't need to lookup routes for all transports at the same time. We can let the transmit code path update the cached route when the transport actually sends something. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:22 -07:00
Vlad Yasevich	ee9cbaca7d	sctp: Allow bindx_del to accept 0 port We allow 0 port when adding new addresses. It only makes sence to allow 0 port when removing addresses. When removing the currently bound port will be used when the port in the address is set to 0. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:21 -07:00
Vlad Yasevich	f246a7b7c5	sctp: teach CACC algorithm about removed transports When we have have to remove a transport due to ASCONF, we move the data to a new active path. This can trigger CACC algorithm to not mark that data as missing when SACKs arrive. This is because the transport passed to the CACC algorithm is the one this data is sitting on, not the one it was sent on (that one may be gone). So, by sending the original transport (even if it's NULL), we may start marking data as missing. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:21 -07:00
Shan Wei	934253a7b4	sctp: use memdup_user to copy data from userspace Use common function to simply code. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:20 -07:00
Shan Wei	66009927f1	sctp: kill abandoned SCTP_CMD_TRANSMIT command Remove SCTP_CMD_TRANSMIT command as it never be used. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:19 -07:00
Shan Wei	6a435732ac	sctp: use common head of addr parameter to access member in addr-unrelated code The 'p' member of struct sctp_paramhdr is common part for IPv4 addr parameter and IPv6 addr parameter in union sctp_addr_param. For addr-related code, use specified addr parameter. Otherwise, use common header to access type/length member. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:19 -07:00
Shan Wei	33c7cfdbb0	sctp: fix the comment of sctp_sf_violation_paramlen() Update the comment about sctp_sf_violation_paramlen() to be more precise. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-19 21:45:18 -07:00
Wei Yongjun	9494c7c577	sctp: fix oops while removed transport still using as retran path Since we can not update retran path to unconfirmed transports, when we remove a peer, the retran path may not be update if the other transports are all unconfirmed, and we will still using the removed transport as the retran path. This may cause panic if retrasnmit happen. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-12 19:33:51 -07:00
Vlad Yasevich	25f7bf7d0d	sctp: fix oops when updating retransmit path with DEBUG on commit `fbdf501c93` sctp: Do no select unconfirmed transports for retransmissions Introduced the initial falt. commit `d598b166ce` sctp: Make sure we always return valid retransmit path Solved the problem, but forgot to change the DEBUG statement. Thus it was still possible to dereference a NULL pointer. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-12 19:33:50 -07:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Wei Yongjun	2cab86bee8	sctp: malloc enough room for asconf-ack chunk Sometime the ASCONF_ACK parameters can equal to the fourfold of ASCONF parameters, this only happend in some special case: ASCONF parameter is : Unrecognized Parameter (4 bytes) ASCONF_ACK parameter should be: Error Cause Indication parameter (8 bytes header) + Error Cause (4 bytes header) + Unrecognized Parameter (4bytes) Four 4bytes Unrecognized Parameters in ASCONF chunk will cause panic. Pid: 0, comm: swapper Not tainted 2.6.38-next+ #22 Bochs Bochs EIP: 0060:[<c0717eae>] EFLAGS: 00010246 CPU: 0 EIP is at skb_put+0x60/0x70 EAX: 00000077 EBX: c09060e2 ECX: dec1dc30 EDX: c09469c0 ESI: 00000000 EDI: de3c8d40 EBP: dec1dc58 ESP: dec1dc2c DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=dec1c000 task=c09aef20 task.ti=c0980000) Stack: c09469c0 e1894fa4 00000044 00000004 de3c8d00 de3c8d00 de3c8d44 de3c8d40 c09060e2 de25dd80 de3c8d40 dec1dc7c e1894fa4 dec1dcb0 00000040 00000004 00000000 00000800 00000004 00000004 dec1dce0 e1895a2b dec1dcb4 de25d960 Call Trace: [<e1894fa4>] ? sctp_addto_chunk+0x4e/0x89 [sctp] [<e1894fa4>] sctp_addto_chunk+0x4e/0x89 [sctp] [<e1895a2b>] sctp_process_asconf+0x32f/0x3d1 [sctp] [<e188d554>] sctp_sf_do_asconf+0xf8/0x173 [sctp] [<e1890b02>] sctp_do_sm+0xb8/0x159 [sctp] [<e18a2248>] ? sctp_cname+0x0/0x52 [sctp] [<e189392d>] sctp_assoc_bh_rcv+0xac/0xe3 [sctp] [<e1897d76>] sctp_inq_push+0x2d/0x30 [sctp] [<e18a21b2>] sctp_rcv+0x7a7/0x83d [sctp] [<c077a95c>] ? ipv4_confirm+0x118/0x125 [<c073a970>] ? nf_iterate+0x34/0x62 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194 [<c0747992>] ip_local_deliver_finish+0xf5/0x194 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194 [<c0747a6e>] NF_HOOK.clone.1+0x3d/0x44 [<c0747ab3>] ip_local_deliver+0x3e/0x44 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194 [<c074775c>] ip_rcv_finish+0x29f/0x2c7 [<c07474bd>] ? ip_rcv_finish+0x0/0x2c7 [<c0747a6e>] NF_HOOK.clone.1+0x3d/0x44 [<c0747cae>] ip_rcv+0x1f5/0x233 [<c07474bd>] ? ip_rcv_finish+0x0/0x2c7 [<c071dce3>] __netif_receive_skb+0x310/0x336 [<c07221f3>] netif_receive_skb+0x4b/0x51 [<e0a4ed3d>] cp_rx_poll+0x1e7/0x29c [8139cp] [<c072275e>] net_rx_action+0x65/0x13a [<c0445a54>] __do_softirq+0xa1/0x149 [<c04459b3>] ? __do_softirq+0x0/0x149 <IRQ> [<c0445891>] ? irq_exit+0x37/0x72 [<c040a7e9>] ? do_IRQ+0x81/0x95 [<c07b3670>] ? common_interrupt+0x30/0x38 [<c0428058>] ? native_safe_halt+0xa/0xc [<c040f5d7>] ? default_idle+0x58/0x92 [<c0408fb0>] ? cpu_idle+0x96/0xb2 [<c0797989>] ? rest_init+0x5d/0x5f [<c09fd90c>] ? start_kernel+0x34b/0x350 [<c09fd0cb>] ? i386_start_kernel+0xba/0xc1 Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-01 21:45:51 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
David S. Miller	a84b50ceb7	sctp: Pass __GFP_NOWARN to hash table allocation attempts. Like DCCP and other similar pieces of code, there are mechanisms here to try allocating smaller hash tables if the allocation fails. So pass in __GFP_NOWARN like the others do instead of emitting a scary message. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-30 17:51:36 -07:00
David S. Miller	4c9483b2fb	ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	9cce96df5b	net: Put fl4_* macros to struct flowi4 and use them again. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	9d6ec93801	ipv4: Use flowi4 in public route lookup interfaces. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:48 -08:00
David S. Miller	6281dcc94a	net: Make flowi ports AF dependent. Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:46 -08:00
David S. Miller	1d28f42c1b	net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:44 -08:00
Hagen Paul Pfeifer	efea2c6b2e	sctp: several declared/set but unused fixes Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-07 15:51:14 -08:00
David S. Miller	b23dd4fe42	ipv4: Make output route lookup return rtable directly. Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-02 14:31:35 -08:00
Eric Dumazet	eaefd1105b	net: add __rcu annotations to sk_wq and wq Add proper RCU annotations/verbs to sk_wq and wq members Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access) Fix sunrpc sk_sleep() abuse too Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-22 10:19:31 -08:00
Shan Wei	59ed5aba9c	sctp: fix compile warnings in sctp_tsnmap_num_gabs net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’: net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-20 11:10:15 -08:00
Jiri Bohac	2205a6ea93	sctp: fix reporting of unknown parameters commit `5fa782c2f5` re-worked the handling of unknown parameters. sctp_init_cause_fixed() can now return -ENOSPC if there is not enough tailroom in the error chunk skb. When this happens, the error header is not appended to the error chunk. In that case, the payload of the unknown parameter should not be appended either. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-19 19:06:55 -08:00
Shan Wei	4580ccc04d	sctp: user perfect name for Delayed SACK Timer option The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK, not SCTP_DELAYED_ACK. Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK, for making compatibility with existing applications. Reference: 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK) （http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25) Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 16:51:29 -08:00
David S. Miller	b4aa9e05a6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h drivers/vhost/vhost.c	2010-12-17 12:27:22 -08:00
Wei Yongjun	7d743b7e95	sctp: fix the return value of getting the sctp partial delivery point Get the sctp partial delivery point using SCTP_PARTIAL_DELIVERY_POINT socket option should return 0 if success, not -ENOTSUPP. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:48:44 -08:00
Wei Yongjun	40a010395c	SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address SCTP_SET_PEER_PRIMARY_ADDR does not accpet v4mapped address, using v4mapped address in SCTP_SET_PEER_PRIMARY_ADDR socket option will get -EADDRNOTAVAIL error if v4map is enabled. This patch try to fix it by mapping v4mapped address to v4 address if allowed. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 15:29:49 -08:00
David Shwatrz	8917a3c0b7	Fix a typo in datagram.c and sctp/socket.c. Hi, This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c Regards, David Shwartz Signed-off-by: David Shwartz <dshwatrz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 13:10:11 -08:00
Eric Dumazet	8d987e5c75	net: avoid limits overflow Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-10 12:12:00 -08:00
Linus Torvalds	5f05647dd8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David	2010-10-23 11:47:02 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
David S. Miller	21a180cda0	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/Kconfig net/ipv4/tcp_timer.c	2010-10-04 11:56:38 -07:00
David S. Miller	9a7241c21b	sctp: Fix break indentation in sctp_ioctl(). Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 22:14:37 -07:00
Dan Rosenberg	51e97a12be	sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac() The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids array and attempts to ensure that only a supported hmac entry is returned. The current code fails to do this properly - if the last id in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the id integer remains set after exiting the loop, and the address of an out-of-bounds entry will be returned and subsequently used in the parent function, causing potentially ugly memory corruption. This patch resets the id integer to 0 on encountering an invalid id so that NULL will be returned after finishing the loop if no valid ids are found. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:58:49 -07:00
Dan Rosenberg	d7e0d19aa0	sctp: prevent reading out-of-bounds memory Two user-controlled allocations in SCTP are subsequently dereferenced as sockaddr structs, without checking if the dereferenced struct members fall beyond the end of the allocated chunk. There doesn't appear to be any information leakage here based on how these members are used and additional checking, but it's still worth fixing. [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion] Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:58:48 -07:00
David S. Miller	e40051d134	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/qlcnic/qlcnic_init.c net/ipv4/ip_output.c	2010-09-27 01:03:03 -07:00
Eric Dumazet	a02cec2155	net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-23 14:33:39 -07:00
Vlad Yasevich	4bdab43323	sctp: Do not reset the packet during sctp_packet_config(). sctp_packet_config() is called when getting the packet ready for appending of chunks. The function should not touch the current state, since it's possible to ping-pong between two transports when sending, and that can result packet corruption followed by skb overlfow crash. Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-17 16:47:56 -07:00
David S. Miller	e548833df8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c	2010-09-09 22:27:33 -07:00
Joe Perches	123031c0ee	sctp: fix test for end of loop Add a list_has_sctp_addr function to simplify loop Based on a patches by Dan Carpenter and David Miller Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-09 15:00:29 -07:00
Diego Elio 'Flameeyes' Pettenò	65040c33ee	sctp: implement SIOCINQ ioctl() (take 3) This simple patch copies the current approach for SIOCINQ ioctl() from DCCP into SCTP so that the userland code working with SCTP can use a similar interface across different protocols to know how much space to allocate for a buffer. Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-08 13:08:05 -07:00
Eric Dumazet	db40980fcd	net: poll() optimizations No need to test twice sk->sk_shutdown & RCV_SHUTDOWN Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-06 18:48:45 -07:00
Joe Perches	145ce502e4	net/sctp: Use pr_fmt and pr_<level> Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to use do { print } while (0) guards. Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when lines were continued. Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt Add a missing newline in "Failed bind hash alloc" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-26 14:11:48 -07:00
Linus Torvalds	3cfc2c42c1	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits) Documentation: update broken web addresses. fix comment typo "choosed" -> "chosen" hostap:hostap_hw.c Fix typo in comment Fix spelling contorller -> controller in comments Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault fs/Kconfig: Fix typo Userpace -> Userspace Removing dead MACH_U300_BS26 drivers/infiniband: Remove unnecessary casts of private_data fs/ocfs2: Remove unnecessary casts of private_data libfc: use ARRAY_SIZE scsi: bfa: use ARRAY_SIZE drm: i915: use ARRAY_SIZE drm: drm_edid: use ARRAY_SIZE synclink: use ARRAY_SIZE block: cciss: use ARRAY_SIZE comment typo fixes: charater => character fix comment typos concerning "challenge" arm: plat-spear: fix typo in kerneldoc reiserfs: typo comment fix update email address ...	2010-08-04 15:31:02 -07:00
Eric Dumazet	1823e4c80e	snmp: add align parameter to snmp_mib_init() In preparation for 64bit snmp counters for some mibs, add an 'align' parameter to snmp_mib_init(), instead of assuming mibs only contain 'unsigned long' fields. Callers can use __alignof__(type) to provide correct alignment. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:17 -07:00
Uwe Kleine-König	421f91d21a	fix typos concerning "initiali[zs]e" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-06-16 18:05:05 +02:00
Changli Gao	d8d1f30b95	net-next: remove useless union keyword remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-10 23:31:35 -07:00
Eric Dumazet	bc10502dba	net: use __packed annotation cleanup patch. Use new __packed annotation in net/ and include/ (except netfilter) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-03 03:21:52 -07:00
Joe Perches	3fa21e07e6	net: Remove unnecessary returns from void function()s This patch removes from net/ (but not any netfilter files) all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ \| \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 23:23:14 -07:00
Wei Yongjun	2e3219b5c8	sctp: fix append error cause to ERROR chunk correctly commit `5fa782c2f5` sctp: Fix skb_over_panic resulting from multiple invalid \ parameter errors (CVE-2010-1173) (v4) cause 'error cause' never be add the the ERROR chunk due to some typo when check valid length in sctp_init_cause_fixed(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 22:51:58 -07:00
David S. Miller	6811d58fc1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/linux/if_link.h	2010-05-16 22:26:58 -07:00
Wei Yongjun	55fa0cfd7c	sctp: delete active ICMP proto unreachable timer when free transport transport may be free before ICMP proto unreachable timer expire, so we should delete active ICMP proto unreachable timer when transport is going away. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-16 00:46:22 -07:00
Amerigo Wang	e3826f1e94	net: reserve ports for applications using fixed port numbers (Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-15 23:28:40 -07:00
David S. Miller	278554bd65	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ath/ar9170/usb.c drivers/scsi/iscsi_tcp.c net/ipv4/ipmr.c	2010-05-12 00:05:35 -07:00
Vlad Yasevich	50b5d6ad63	sctp: Fix a race between ICMP protocol unreachable and connect() ICMP protocol unreachable handling completely disregarded the fact that the user may have locked the socket. It proceeded to destroy the association, even though the user may have held the lock and had a ref on the association. This resulted in the following: Attempt to release alive inet socket f6afcc00 ========================= [ BUG: held lock freed! ] ------------------------- somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held there! (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c 1 lock held by somenu/2672: #0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c stack backtrace: Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55 Call Trace: [<c1232266>] ? printk+0xf/0x11 [<c1038553>] debug_check_no_locks_freed+0xce/0xff [<c10620b4>] kmem_cache_free+0x21/0x66 [<c1185f25>] __sk_free+0x9d/0xab [<c1185f9c>] sk_free+0x1c/0x1e [<c1216e38>] sctp_association_put+0x32/0x89 [<c1220865>] __sctp_connect+0x36d/0x3f4 [<c122098a>] ? sctp_connect+0x13/0x4c [<c102d073>] ? autoremove_wake_function+0x0/0x33 [<c12209a8>] sctp_connect+0x31/0x4c [<c11d1e80>] inet_dgram_connect+0x4b/0x55 [<c11834fa>] sys_connect+0x54/0x71 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239 [<c1054026>] ? might_fault+0x42/0x7c [<c1054026>] ? might_fault+0x42/0x7c [<c11847ab>] sys_socketcall+0x6d/0x178 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c1002959>] syscall_call+0x7/0xb This was because the sctp_wait_for_connect() would aqcure the socket lock and then proceed to release the last reference count on the association, thus cause the fully destruction path to finish freeing the socket. The simplest solution is to start a very short timer in case the socket is owned by user. When the timer expires, we can do some verification and be able to do the release properly. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 00:56:07 -07:00
David S. Miller	f546061840	Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev Add missing linux/vmalloc.h include to net/sctp/probe.c Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-03 16:24:31 -07:00
David S. Miller	7ef527377b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-05-02 22:02:06 -07:00
Eric Dumazet	4381548237	net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-01 15:00:15 -07:00
Vlad Yasevich	0e3aef8d09	sctp: Tag messages that can be Nagle delayed at creation. When we create the sctp_datamsg and fragment the user data, we know exactly if we are sending full segments or not and how they might be bundled. During this time, we can mark messages a Nagle capable or not. This makes the check at transmit time much simpler. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	bfa0d9843a	sctp: Optimize computation of highest new tsn in SACK. Right now, if the highest tsn in the SACK doesn't change, we'll end up scanning the transmitted lists on the transports twice: once for locating the highest _new_ tsn, and once for actually tagging chunks as acked. This is a waste, since we can record the highest _new_ tsn at the same time as tagging chunks. Long ago this was not possible because we would try to mark chunks as missing at the same time as tagging them acked and this approach didn't work. Now that the two steps are separate, we can re-use the old approach. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	ea862c8d1f	sctp: correctly mark missing chunks in fast recovery According to RFC 4960 Section 7.2.4: If an endpoint is in Fast Recovery and a SACK arrives that advances the Cumulative TSN Ack Point, the miss indications are incremented for all TSNs reported missing in the SACK. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	6588337189	sctp: rwnd_press should be cumulative rwnd_press tracks the pressure on the recieve window. Every timer the receive buffer overlows, we truncate the receive window and then grow it back. However, if we don't track the cumulative presser, it's possible to reach a situation when receive buffer is empty, but rwnd stays truncated. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	cf9b4812e1	sctp: fast recovery algorithm is per association. SCTP fast recovery algorithm really applies per association and impacts all transports. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	b2cf9b6bd9	sctp: update transport initializations Right now, sctp transports are not fully initialized and when adding any new fields, they have to be explicitely initialized. This is prone to mistakes. So we switch to calling kzalloc() which makes things much simpler. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	d9efc2231b	sctp: Do not force T3 timer on fast retransmissions. We don't need to force the T3 timer any more and it's actually wrong to do as it causes too long of a delay. The timer will be started if one is not running, but if one is running, we leave it alone. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Vlad Yasevich	ae19c54866	sctp: remove 'resent' bit from the chunk The 'resent' bit is used to make sure that we don't update rto estimate based on retransmitted chunks. However, we already have the 'rto_pending' bit that we test when need to update rto, so 'resent' bit is just extra. Additionally, we currently have a bug in that we always set a 'resent' bit and thus rto estimate is only updated by Heartbeats. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Vlad Yasevich	d598b166ce	sctp: Make sure we always return valid retransmit path commit 4951feda0c60d1ef681f1a270afdd617924ab041 sctp: Do no select unconfirmed transports for retransmissions added code to make sure that we do not select unconfirmed paths for data transmission. This caused a problem when there are only 2 paths, 1 unconfirmed and 1 unreachable. In that case, the next retransmit path returned is NULL and that causes a kernel crash. The solution is to only change retransmit paths if we found one to use. Reported-by: Frank Schuster <frank.schuster01@web.de> Signed-off-b: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Dan Carpenter	b99a4d53a7	sctp: cleanup: remove duplicate assignment This assignment isn't needed because we did it earlier already. Also another reason to delete the assignment is because it triggers a Smatch warning about checking for NULL pointers after a dereference. Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Wei Yongjun	787a51a087	sctp: implement sctp association probing module This patch implement sctp association probing module, the module will be called sctp_probe. This module allows for capturing the changes to SCTP association state in response to incoming packets. It is used for debugging SCTP congestion control algorithms. Usage: $ modprobe sctp_probe [full=n] [port=n] [bufsize=n] $ cat /proc/net/sctpprobe The output format is: TIME ASSOC LPORT RPORT MTU RWND UNACK <REMOTE-ADDR STATE CWND SSTHRESH INFLIGHT PARTIAL_BYTES_ACKED MTU> ... The output will be like this: 9.226086 c4064c48 9000 8000 1500 53352 1 192.168.0.19 1 4380 54784 1252 0 1500 9.287195 c4064c48 9000 8000 1500 45144 5 192.168.0.19 1 5880 54784 6500 0 1500 9.289130 c4064c48 9000 8000 1500 42724 5 192.168.0.19 1 7380 54784 6500 0 1500 9.620332 c4064c48 9000 8000 1500 48284 4 192.168.0.19 1 8880 54784 5200 0 1500 ...... Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Shan Wei	ec7b951950	sctp: use sctp_chunk_is_data macro to decide a chunk is data chunk sctp_chunk_is_data macro is defined to decide that whether a chunk is data chunk or not. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Vlad Yasevich	fbdf501c93	sctp: Do no select unconfirmed transports for retransmissions An unconfirmed transport is one that we have not been able to reach since the beginning. There is no point in trying to retrasnmit data on those transports. Also, the specification forbids it due to security issues. Reported-by: Frank Schuster <frank.schuster01@web.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:39:26 -04:00
Wei Yongjun	bc4f841a05	sctp: fix to retranmit at least one DATA chunk While doing retranmit, if control chunk exists, such as FORWARD TSN chunk, and the DATA chunk can not be bundled with this control chunk because of PMTU limit, no DATA chunk will be retranmitted in the current implementation. This patch makes sure to retranmit at least one DATA chunk in this case. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:38:53 -04:00
Wei Yongjun	6429d3dc4b	sctp: missing set src and dest port while lookup output route While lookup the output route, we do not set the src and dest port. This will cause we got a wrong route if we had set the outbund transport to IPsec with src or dst port. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 21:42:44 -04:00
Wei Yongjun	bd69b981a3	sctp: assure at least one T3-rtx timer is running if a FORWARD TSN is sent PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP: C5) If a FORWARD TSN is sent, the sender MUST assure that at least one T3-rtx timer is running. So this patch fix to assure at least one T3-rtx timer is running if a FORWARD TSN is or will to sent. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 21:42:43 -04:00
Vlad Yasevich	c17b02b38a	sctp: send SHUTDOWN-ACK chunk back to the source. SHUTDOWN-ACK is alaways sent to the primary path at the first time, but should better transmit SHUTDOWN-ACK chunk to the same destination transport address from which it received the SHUTDOWN chunk. Based on the work from Wei Yongjun <yjwei@cn.fujitsu.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 21:42:43 -04:00
Vlad Yasevich	a5f4cea74f	sctp: Use correct address family in sctp_getsockopt_peer_addrs() The function should use the address family of the address when trying to determine the length of the structure. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 21:42:42 -04:00
Neil Horman	5fa782c2f5	sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4) Ok, version 4 Change Notes: 1) Minor cleanups, from Vlads notes Summary: Hey- Recently, it was reported to me that the kernel could oops in the following way: <5> kernel BUG at net/core/skbuff.c:91! <5> invalid operand: 0000 [#1] <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U) vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod <5> CPU: 0 <5> EIP: 0060:[<c02bff27>] Not tainted VLI <5> EFLAGS: 00010216 (2.6.9-89.0.25.EL) <5> EIP is at skb_over_panic+0x1f/0x2d <5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44 <5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40 <5> ds: 007b es: 007b ss: 0068 <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0) <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180 e0c2947d <5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004 df653490 <5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e 00000004 <5> Call Trace: <5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp] <5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp] <5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp] <5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp] <5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp] <5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp] <5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp] <5> [<c01555a4>] cache_grow+0x140/0x233 <5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp] <5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp] <5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp] <5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter] <5> [<c02d005e>] nf_iterate+0x40/0x81 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151 <5> [<c02d0362>] nf_hook_slow+0x83/0xb5 <5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e103e>] ip_rcv+0x334/0x3b4 <5> [<c02c66fd>] netif_receive_skb+0x320/0x35b <5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd] <5> [<c02c67a4>] process_backlog+0x6c/0xd9 <5> [<c02c690f>] net_rx_action+0xfe/0x1f8 <5> [<c012a7b1>] __do_softirq+0x35/0x79 <5> [<c0107efb>] handle_IRQ_event+0x0/0x4f <5> [<c01094de>] do_softirq+0x46/0x4d Its an skb_over_panic BUG halt that results from processing an init chunk in which too many of its variable length parameters are in some way malformed. The problem is in sctp_process_unk_param: if (NULL == errp) errp = sctp_make_op_error_space(asoc, chunk, ntohs(chunk->chunk_hdr->length)); if (errp) { sctp_init_cause(errp, SCTP_ERROR_UNKNOWN_PARAM, WORD_ROUND(ntohs(param.p->length))); sctp_addto_chunk(errp, WORD_ROUND(ntohs(param.p->length)), param.v); When we allocate an error chunk, we assume that the worst case scenario requires that we have chunk_hdr->length data allocated, which would be correct nominally, given that we call sctp_addto_chunk for the violating parameter. Unfortunately, we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error chunk, so the worst case situation in which all parameters are in violation requires chunk_hdr->length+(sizeof(sctp_errhdr_t)param_count) bytes of data. The result of this error is that a deliberately malformed packet sent to a listening host can cause a remote DOS, described in CVE-2010-1173: http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173 I've tested the below fix and confirmed that it fixes the issue. We move to a strategy whereby we allocate a fixed size error chunk and ignore errors we don't have space to report. Tested by me successfully Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 14:22:01 -07:00
Vlad Yasevich	c078669340	sctp: Fix oops when sending queued ASCONF chunks When we finish processing ASCONF_ACK chunk, we try to send the next queued ASCONF. This action runs the sctp state machine recursively and it's not prepared to do so. kernel BUG at kernel/timer.c:790! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/module/ipv6/initstate Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0 EIP is at add_timer+0xd/0x1b EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4 ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000) Stack: c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004 <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004 <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0 Call Trace: [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp] [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp] [<d1863386>] ? sctp_pname+0x0/0x1d [sctp] [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp] [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp] [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp] [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp] [<d1863334>] ? sctp_cname+0x0/0x52 [sctp] [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp] [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp] [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp] Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie> Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:34 -07:00
Wei Yongjun	a8170c35e7	sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set When calculating the INIT/INIT-ACK chunk length, we should not only account the length of parameters, but also the parameters zero padding length, such as AUTH HMACS parameter and CHUNKS parameter. Without the parameters zero padding length we may get following oops. skb_over_panic: text:ce2068d2 len:130 put:6 head:cac3fe00 data:cac3fe00 tail:0xcac3fe82 end:0xcac3fe80 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:127! invalid opcode: 0000 [#2] SMP last sysfs file: /sys/module/aes_generic/initstate Modules linked in: authenc ...... Pid: 4102, comm: sctp_darn Tainted: G D 2.6.34-rc2 #6 EIP: 0060:[<c0607630>] EFLAGS: 00010282 CPU: 0 EIP is at skb_over_panic+0x37/0x3e EAX: 00000078 EBX: c07c024b ECX: c07c02b9 EDX: cb607b78 ESI: 00000000 EDI: cac3fe7a EBP: 00000002 ESP: cb607b74 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process sctp_darn (pid: 4102, ti=cb607000 task=cabdc990 task.ti=cb607000) Stack: c07c02b9 ce2068d2 00000082 00000006 cac3fe00 cac3fe00 cac3fe82 cac3fe80 <0> c07c024b cac3fe7c cac3fe7a c0608dec ca986e80 ce2068d2 00000006 0000007a <0> cb8120ca ca986e80 cb812000 00000003 cb8120c4 ce208a25 cb8120ca cadd9400 Call Trace: [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp] [<c0608dec>] ? skb_put+0x2e/0x32 [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp] [<ce208a25>] ? sctp_make_init+0x279/0x28c [sctp] [<c0686a92>] ? apic_timer_interrupt+0x2a/0x30 [<ce1fdc0b>] ? sctp_sf_do_prm_asoc+0x2b/0x7b [sctp] [<ce202823>] ? sctp_do_sm+0xa0/0x14a [sctp] [<ce2133b9>] ? sctp_pname+0x0/0x14 [sctp] [<ce211d72>] ? sctp_primitive_ASSOCIATE+0x2b/0x31 [sctp] [<ce20f3cf>] ? sctp_sendmsg+0x7a0/0x9eb [sctp] [<c064eb1e>] ? inet_sendmsg+0x3b/0x43 [<c04244b7>] ? task_tick_fair+0x2d/0xd9 [<c06031e1>] ? sock_sendmsg+0xa7/0xc1 [<c0416afe>] ? smp_apic_timer_interrupt+0x6b/0x75 [<c0425123>] ? dequeue_task_fair+0x34/0x19b [<c0446abb>] ? sched_clock_local+0x17/0x11e [<c052ea87>] ? _copy_from_user+0x2b/0x10c [<c060ab3a>] ? verify_iovec+0x3c/0x6a [<c06035ca>] ? sys_sendmsg+0x186/0x1e2 [<c042176b>] ? __wake_up_common+0x34/0x5b [<c04240c2>] ? __wake_up+0x2c/0x3b [<c057e35c>] ? tty_wakeup+0x43/0x47 [<c04430f2>] ? remove_wait_queue+0x16/0x24 [<c0580c94>] ? n_tty_read+0x5b8/0x65e [<c042be02>] ? default_wake_function+0x0/0x8 [<c0604e0e>] ? sys_socketcall+0x17f/0x1cd [<c040264c>] ? sysenter_do_call+0x12/0x22 Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 ...... EIP: [<c0607630>] skb_over_panic+0x37/0x3e SS:ESP 0068:cb607b74 To reproduce: # modprobe sctp # echo 1 > /proc/sys/net/sctp/addip_enable # echo 1 > /proc/sys/net/sctp/auth_enable # sctp_test -H 3ffe:501:ffff💯20c:29ff:fe4d:f37e -P 800 -l # sctp_darn -H 3ffe:501:ffff💯20c:29ff:fe4d:f37e -P 900 -h 192.168.0.21 -p 800 -I -s -t sctp_darn ready to send... 3ffe:501:ffff💯20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.0.21 3ffe:501:ffff💯20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.1.21 3ffe:501:ffff💯20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> snd=10 ------------------------------------------------------------------ eth0 has addresses: 3ffe:501:ffff💯20c:29ff:fe4d:f37e and 192.168.0.21 eth1 has addresses: 192.168.1.21 ------------------------------------------------------------------ Reported-by: George Cheimonidis <gchimon@gmail.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:33 -07:00
Vlad Yasevich	81419d862d	sctp: per_cpu variables should be in bh_disabled section Since the change of the atomics to percpu variables, we now have to disable BH in process context when touching percpu variables. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:33 -07:00
Vlad Yasevich	0c42749cff	sctp: fix potential reference of a freed pointer When sctp attempts to update an assocition, it removes any addresses that were not in the updated INITs. However, the loop may attempt to refrence a transport with address after removing it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:32 -07:00
Wei Yongjun	561b1733a4	sctp: avoid irq lock inversion while call sk->sk_data_ready() sk->sk_data_ready() of sctp socket can be called from both BH and non-BH contexts, but the default sk->sk_data_ready(), sock_def_readable(), can not be used in this case. Therefore, we have to make a new function sctp_data_ready() to grab sk->sk_data_ready() with BH disabling. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.33-rc6 #129 --------------------------------------------------------- sctp_darn/1517 just changed the state of lock: (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80 but this lock took another, SOFTIRQ-unsafe lock in the past: (slock-AF_INET){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 1 lock held by sctp_darn/1517: #0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:31 -07:00
Eric Dumazet	c377411f24	net: sk_add_backlog() take rmem_alloc into account Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock. We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much. Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations. Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 15:13:20 -07:00
Eric Dumazet	aa39514516	net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t sk_sleep(struct sock sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 16:37:13 -07:00
Shan Wei	4e15ed4d93	net: replace ipfragok with skb->local_df As Herbert Xu said: we should be able to simply replace ipfragok with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function) has droped ipfragok and set local_df value properly. The patch kills the ipfragok parameter of .queue_xmit(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-15 23:36:37 -07:00
David S. Miller	871039f02f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c	2010-04-11 14:53:53 -07:00
Brian Haley	486f50ca79	SCTP: Change to use ipv6_addr_copy() Change SCTP IPv6 code to use ipv6_addr_copy() Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-03 15:10:21 -07:00
Hagen Paul Pfeifer	b68c92460d	sctp: eliminate useless code Remove duplicate declaration of symbol: struct hlist_node *node was already declared, the seconds declaration shadows the first one. CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-30 23:58:22 -07:00
YOSHIFUJI Hideaki / 吉藤英明	de7737e056	sctp: Use ipv6_addr_diff() in sctp_v6_addr_match_len(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-30 23:28:47 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
stephen hemminger	502a2ffd73	ipv6: convert idev_list to list macros Convert to list macro's for the list of addresses per interface in IPv6. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-20 15:45:09 -07:00
Linus Torvalds	d89b218b80	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits) bridge: ensure to unlock in error path in br_multicast_query(). drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages sky2: Avoid rtnl_unlock without rtnl_lock ipv6: Send netlink notification when DAD fails drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant ipconfig: Handle devices which take some time to come up. mac80211: Fix memory leak in ieee80211_if_write() mac80211: Fix (dynamic) power save entry ipw2200: use kmalloc for large local variables ath5k: read eeprom IQ calibration values correctly for G mode ath5k: fix I/Q calibration (for real) ath5k: fix TSF reset ath5k: use fixed antenna for tx descriptors libipw: split ieee->networks into small pieces mac80211: Fix sta_mtx unlocking on insert STA failure path rt2x00: remove KSEG1ADDR define from rt2x00soc.h net: add ColdFire support to the smc91x driver asix: fix setting mac address for AX88772 ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}. net: Fix dev_mc_add() ...	2010-03-13 14:50:18 -08:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Zhu Yi	a3a858ff18	net: backlog functions rename sk_add_backlog -> __sk_add_backlog sk_add_backlog_limited -> sk_add_backlog Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-05 13:34:03 -08:00
Zhu Yi	50b1a782f8	sctp: use limited socket backlog Make sctp adapt to the limited socket backlog change. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-05 13:34:01 -08:00
Alexey Dobriyan	dc4c2c3105	net: remove INIT_RCU_HEAD() usage call_rcu() will unconditionally reinitialize RCU head anyway. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-17 00:03:27 -08:00
Tejun Heo	7d720c3e4f	percpu: add __percpu sparse annotations to net Add __percpu sparse annotations to net. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_() users which used to cast the argument to (void ) are updated to cast it to (void __percpu *). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-16 23:05:38 -08:00
Daniel Mack	3ad2f3fbb9	tree-wide: Assorted spelling fixes In particular, several occurances of funny versions of 'success', 'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address', 'beginning', 'desirable', 'separate' and 'necessary' are fixed. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Joe Perches <joe@perches.com> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-09 11:13:56 +01:00
Alexey Dobriyan	5833929cc2	net: constify MIB name tables Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-23 01:21:27 -08:00
Julia Lawall	09cb47a2c6	net/sctp: Eliminate useless code The variable newinet is initialized twice to the same (side effect-free) expression. Drop one initialization. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) // <smpl> @forall@ idexpression x; identifier f!=ERR_PTR; @@ x = f(...) ... when != x ( x = f(...,<+...x...+>,...) \| x = f(...) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-21 02:43:20 -08:00
Andrew Morton	8ffd32083c	net/sctp/socket.c: squish warning net/sctp/socket.c: In function 'sctp_setsockopt_autoclose': net/sctp/socket.c:2090: warning: comparison is always false due to limited range of data type Cc: Andrei Pelinescu-Onciul <andrei@iptel.org> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-03 21:25:53 -08:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
Linus Torvalds	a252e749f1	sctp: fix compile error due to sysctl mismerge I messed up the merge in `d7fc02c7ba`, where the conflict in question wasn't just about CTL_UNNUMBERED being removed, but the 'strategy' field is too (sysctl handling is now done through the /proc interface, with no duplicate protocols for reading the data). Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-08 12:51:22 -08:00
Linus Torvalds	d7fc02c7ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c	2009-12-08 07:55:01 -08:00
Linus Torvalds	1557d33007	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...	2009-12-08 07:38:50 -08:00
Jiri Kosina	d014d04386	Merge branch 'for-next' into for-linus Conflicts: kernel/irq/chip.c	2009-12-07 18:36:35 +01:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Thadeu Lima de Souza Cascardo	94e2bd6888	tree-wide: fix some typos and punctuation in comments fix some typos and punctuation in comments Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:48 +01:00
Andrei Pelinescu-Onciul	810c07194f	sctp: fix sctp_setsockopt_autoclose compile warning Fix the following warning, when building on 64 bits: net/sctp/socket.c:2091: warning: large integer implicitly truncated to unsigned type Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-02 01:16:49 -08:00
Joe Perches	f64f9e7192	net: Move && and \|\| to end of previous line Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-29 16:55:45 -08:00
David S. Miller	9b963e5d0e	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/ieee802154/fakehard.c drivers/net/e1000e/ich8lan.c drivers/net/e1000e/phy.c drivers/net/netxen/netxen_nic_init.c drivers/net/wireless/ath/ath9k/main.c	2009-11-29 00:57:15 -08:00
Andrei Pelinescu-Onciul	5fdd4baef6	sctp: on T3_RTX retransmit all the in-flight chunks When retransmitting due to T3 timeout, retransmit all the in-flight chunks for the corresponding transport/path, including chunks sent less then 1 rto ago. This is the correct behaviour according to rfc4960 section 6.3.3 E3 and "Note: Any DATA chunks that were sent to the address for which the T3-rtx timer expired but did not fit in one MTU (rule E3 above) should be marked for retransmission and sent as soon as cwnd allows (normally, when a SACK arrives). ". This fixes problems when more then one path is present and the T3 retransmission of the first chunk that timeouts stops the T3 timer for the initial active path, leaving all the other in-flight chunks waiting forever or until a new chunk is transmitted on the same path and timeouts (and this will happen only if the cwnd allows sending new chunks, but since cwnd was dropped to MTU by the timeout => it will wait until the first heartbeat). Example: 10 packets in flight, sent at 0.1 s intervals on the primary path. The primary path is down and the first packet timeouts. The first packet is retransmitted on another path, the T3 timer for the primary path is stopped and cwnd is set to MTU. All the other 9 in-flight packets will not be retransmitted (unless more new packets are sent on the primary path which depend on cwnd allowing it, and even in this case the 9 packets will be retransmitted only after a new packet timeouts which even in the best case would be more then RTO). This commit reverts `d0ce92910b` and also removes the now unused transport->last_rto, introduced in `b6157d8e03`. p.s The problem is not only when multiple paths are there. It can happen in a single homed environment. If the application stops sending data, it possible to have a hung association. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-29 00:14:02 -08:00
Vlad Yasevich	4814326b59	sctp: prevent too-fast association id reuse We use the idr subsystem and always ask for an id at or above 1. This results in a id reuse when one association is terminated while another is created. To prevent re-use, we keep track of the last id returned and ask for that id + 1 as a base for each query. We let the idr spin lock protect this base id as well. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:54:01 -05:00
Andrei Pelinescu-Onciul	da85b7396f	sctp: fix integer overflow when setting the autoclose timer When setting the autoclose timeout in jiffies there is a possible integer overflow if the value in seconds is very large (e.g. for 2^22 s with HZ=1024). The problem appears even on 64-bit due to the integer promotion rules. The fix is just a cast to unsigned long. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:54:01 -05:00
Andrei Pelinescu-Onciul	f6778aab6c	sctp: limit maximum autoclose setsockopt value To avoid overflowing the maximum timer interval when transforming the autoclose interval from seconds to jiffies, limit the maximum autoclose value to MAX_SCHEDULE_TIMEOUT/HZ. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:54:01 -05:00
Neil Horman	d8dd15781d	sctp: Fix mis-ordering of user space data when multihoming in use Recently had a bug reported to me, in which the user was sending packets with a payload containing a sequence number. The packets were getting delivered in order according the chunk TSN values, but the sequence values in the payload were arriving out of order. At first I thought it must be an application error, but we eventually found it to be a problem on the transmit side in the sctp stack. The conditions for the error are that multihoming must be in use, and it helps if each transport has a different pmtu. The problem occurs in sctp_outq_flush. Basically we dequeue packets from the data queue, and attempt to append them to the orrered packet for a given transport. After we append a data chunk we add the trasport to the end of a list of transports to have their packets sent at the end of sctp_outq_flush. The problem occurs when a data chunks fills up a offered packet on a transport. The function that does the appending (sctp_packet_transmit_chunk), will try to call sctp_packet_transmit on the full packet, and then append the chunk to a new packet. This call to sctp_packet_transmit, sends that packet ahead of the others that may be queued in the transport_list in sctp_outq_flush. The result is that frames that were sent in one order from the user space sending application get re-ordered prior to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing of data payloads, even though tsn ordering is correct. The fix is to change where we assign a tsn. By doing this earlier, we are then free to place chunks in packets, whatever way we see fit and the protocol will make sure to do all the appropriate re-ordering on receive as is needed. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: William Reich <reich@ulticom.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:54:00 -05:00
Vlad Yasevich	46d5a80855	sctp: Update max.burst implementation Current implementation of max.burst ends up limiting new data during cwnd decay period. The decay is happening becuase the connection is idle and we are allowed to fill the congestion window. The point of max.burst is to limit micro-bursts in response to large acks. This still happens, as max.burst is still applied to each transmit opportunity. It will also apply if a very large send is made (greater then allowed by burst). Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:54:00 -05:00
Vlad Yasevich	245cba7e55	sctp: Remove useless last_time_used variable The transport last_time_used variable is rather useless. It was only used when determining if CWND needs to be updated due to idle transport. However, idle transport detection was based on a Heartbeat timer and last_time_used was not incremented when sending Heartbeats. As a result the check for cwnd reduction was always true. We can get rid of the variable and just base our cwnd manipulation on the HB timer (like the code comment sais). We also have to call into the cwnd manipulation function regardless of whether HBs are enabled or not. That way we will detect idle transports if the user has disabled Heartbeats. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:58 -05:00
Amerigo Wang	a242b41ded	sctp: remove deprecated SCTP_GET__OLD stuffs SCTP_GET__OLD stuffs are schedlued to be removed. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:58 -05:00
Andrei Pelinescu-Onciul	37051f7386	sctp: allow setting path_maxrxt independent of SPP_PMTUD_ENABLE Since draft-ietf-tsvwg-sctpsocket-15.txt, setting the SPP_MTUD_ENABLE flag when changing pathmaxrxt via the SCTP_PEER_ADDR_PARAMS setsockopt is not required any longer. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:57 -05:00
Vlad Yasevich	90f2f5318b	sctp: Update SWS avaoidance receiver side algorithm We currently send window update SACKs every time we free up 1 PMTU worth of data. That a lot more SACKs then necessary. Instead, we'll now send back the actuall window every time we send a sack, and do window-update SACKs when a fraction of the receive buffer has been opened. The fraction is controlled with a sysctl. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:57 -05:00
Vlad Yasevich	e0e9db178a	sctp: Select a working primary during sctp_connectx() When sctp_connectx() is used, we pick the first address as primary, even though it may not have worked. This results in excessive retransmits and poor performance. We should select the address that the association was established with. Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:57 -05:00
Vlad Yasevich	6383cfb3ed	sctp: Fix malformed "Invalid Stream Identifier" error The "Invalid Stream Identifier" error has a 16 bit reserved field at the end, thus making the parameter length be 8 bytes. We've never supplied that reserved field making wireshark tag the packet as malformed. Reported-by: Chris Dischino <cdischino@sonusnet.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:56 -05:00
Wei Yongjun	b93d647174	sctp: implement the sender side for SACK-IMMEDIATELY extension This patch implement the sender side for SACK-IMMEDIATELY extension. Section 4.1. Sender Side Considerations Whenever the sender of a DATA chunk can benefit from the corresponding SACK chunk being sent back without delay, the sender MAY set the I-bit in the DATA chunk header. Reasons for setting the I-bit include o The sender is in the SHUTDOWN-PENDING state. o The application requests to set the I-bit of the last DATA chunk of a user message when providing the user message to the SCTP implementation. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:56 -05:00
Wei Yongjun	6dc7694f9d	sctp: implement the receiver side for SACK-IMMEDIATELY extension This patch implement the receiver side for SACK-IMMEDIATELY extension: Section 4.2. Receiver Side Considerations On reception of an SCTP packet containing a DATA chunk with the I-bit set, the receiver SHOULD NOT delay the sending of the corresponding SACK chunk and SHOULD send it back immediately. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2009-11-23 15:53:53 -05:00
Eric W. Biederman	6d4561110a	sysctl: Drop & in front of every proc_handler. For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2009-11-18 08:37:40 -08:00
David S. Miller	a2bfbc072e	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/can/Kconfig	2009-11-17 00:05:02 -08:00
Vlad Yasevich	a78102e74e	sctp: Set socket source address when additing first transport Recent commits sctp: Get rid of an extra routing lookup when adding a transport and sctp: Set source addresses on the association before adding transports changed when routes are added to the sctp transports. As such, we didn't set the socket source address correctly when adding the first transport. The first transport is always the primary/active one, so when adding it, set the socket source address. This was causing regression failures in SCTP tests. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-13 19:56:52 -08:00
Vlad Yasevich	f9c67811eb	sctp: Fix regression introduced by new sctp_connectx api A new (unrealeased to the user) sctp_connectx api `c6ba68a266` sctp: support non-blocking version of the new sctp_connectx() API introduced a regression cought by the user regression test suite. In particular, the API requires the user library to re-allocate the buffer and could potentially trigger a SIGFAULT. This change corrects that regression by passing the original address buffer to the kernel unmodified, but still allows for a returned association id. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-13 19:56:51 -08:00

... 3 4 5 6 7 ...

1055 Коммитов