WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Ben Hutchings	c445477d74	net: RPS: Enable hardware acceleration of RFS Allow drivers for multiqueue hardware with flow filter tables to accelerate RFS. The driver must: 1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion IRQs (in queue order). This will provide a mapping from CPUs to the queues for which completions are handled nearest to them. 2. Implement net_device_ops::ndo_rx_flow_steer. This operation adds or replaces a filter steering the given flow to the given RX queue, if possible. 3. Periodically remove filters for which rps_may_expire_flow() returns true. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-24 14:53:01 -08:00
David S. Miller	5bdc22a565	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/sched/sch_hfsc.c net/sched/sch_htb.c net/sched/sch_tbf.c	2011-01-24 14:09:35 -08:00
David S. Miller	e92427b289	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2011-01-24 13:17:06 -08:00
Eric Dumazet	c506653d35	net: arp_ioctl() must hold RTNL Commit `941666c2e3` "net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()" introduced a regression, reported by Jamie Heilman. "arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert in pneigh_lookup() Removing RTNL requirement from arp_ioctl() was a mistake, just revert that part. Reported-by: Jamie Heilman <jamie@audible.transient.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-24 13:16:16 -08:00
Rusty Russell	577d6a7c3a	module: fix missing semicolons in MODULE macro usage You always needed them when you were a module, but the builtin versions of the macros used to be more lenient. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-01-24 14:32:54 +10:30
Eric Dumazet	23624935e0	net_sched: TCQ_F_CAN_BYPASS generalization Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq SFQ is special because it can have external classifiers, and in these cases, we cannot bypass queue discipline (packet could be dropped by classifier) without admin asking it, or further changes. Its worth doing this, especially for SFQ, avoiding dirtying memory in case no packets are already waiting in queue. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-21 16:26:09 -08:00
Eric Dumazet	bb134d2298	net: netif_setup_tc() is static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-21 13:08:27 -08:00
Eric Dumazet	9190b3b320	net_sched: accurate bytes/packets stats/rates In commit `44b8288308` (net_sched: pfifo_head_drop problem), we fixed a problem with pfifo_head drops that incorrectly decreased sch->bstats.bytes and sch->bstats.packets Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a previously enqueued packet, and bstats cannot be changed, so bstats/rates are not accurate (over estimated) This patch changes the qdisc_bstats updates to be done at dequeue() time instead of enqueue() time. bstats counters no longer account for dropped frames, and rates are more correct, since enqueue() bursts dont have effect on dequeue() rate. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 23:31:33 -08:00
Patrick McHardy	ffa934f192	rtnetlink: fix link attribute validation with IFLA_GROUP rtnl_group_changelink() is invoked by rtnl_newlink() before the link attributes have been validated. Additionally the group changes are performed even if NLM_F_CREATE is specified and a new link is created, while more reasonable semantics would be to set the group value on the newly created link. Fix both problems by moving the rtnl_group_changelink() invocation down to the handling of non-existant links without NLM_F_CREATE() and add a dev_set_group() call to rtnl_create_link(). Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Vlad Dogaru <ddvlad@rosedu.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 23:28:54 -08:00
David Rientjes	6a108a14fa	kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option is used to configure any non-standard kernel with a much larger scope than only small devices. This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes references to the option throughout the kernel. A new CONFIG_EMBEDDED option is added that automatically selects CONFIG_EXPERT when enabled and can be used in the future to isolate options that should only be considered for embedded systems (RISC architectures, SLOB, etc). Calling the option "EXPERT" more accurately represents its intention: only expert users who understand the impact of the configuration changes they are making should enable it. Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <david.woodhouse@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg KH <gregkh@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-20 17:02:05 -08:00
Eric Dumazet	f2eda47df4	ipv6: raw: rcu annotations Remove sparse warnings, using a function typedef to be able to use __rcu annotation on mh_filter pointer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:34 -08:00
Eric Dumazet	6193d2be29	neigh: __rcu annotations fix some minor issues and sparse (__rcu) warnings Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:34 -08:00
Eric Dumazet	753ea8e962	net: ipv6: sit: fix rcu annotations Fix minor __rcu annotations and remove sparse warnings Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:33 -08:00
Eric Dumazet	a2da570d62	net_sched: RCU conversion of stab This patch converts stab qdisc management to RCU, so that we can perform the qdisc_calculate_pkt_len() call before getting qdisc lock. This shortens the lock's held time in __dev_xmit_skb(). This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of cache misses and so reducing latencies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:32 -08:00
Eric Dumazet	fd245a4adb	net_sched: move TCQ_F_THROTTLED flag In commit `3711210576` (net: QDISC_STATE_RUNNING dont need atomic bit ops) I moved QDISC_STATE_RUNNING flag to __state container, located in the cache line containing qdisc lock and often dirtied fields. I now move TCQ_F_THROTTLED bit too, so that we let first cache line read mostly, and shared by all cpus. This should speedup HTB/CBQ for example. Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an "unsigned int" for __state container, reducing by 8 bytes Qdisc size. Introduce helpers to hide implementation details. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:32 -08:00
Eric Dumazet	817fb15dfd	net_sched: sfq: allow divisor to be a parameter SFQ currently uses a 1024 slots hash table, and its internal structure (sfq_sched_data) allocation needs order-1 page on x86_64 Allow tc command to specify a divisor value (hash table size), between 1 and 65536. If no value is provided, assume the 1024 default size. This allows admins to setup smaller (or bigger) SFQ for specific needs. This also brings back sfq_sched_data allocations to order-0 ones, saving 3KB per SFQ qdisc. Jesper uses ~55.000 SFQ in one machine, this patch should free 165 MB of memory. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:59:16 -08:00
Eric Dumazet	3fbd8758b0	net: dev_close_many() is static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Octavian Purdila <opurdila@ixiacom.com> Reviewed-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 16:55:30 -08:00
Eric Dumazet	bced94ed5e	netfilter: add a missing include in nf_conntrack_reasm.c After commit `ae90bdeaea` (netfilter: fix compilation when conntrack is disabled but tproxy is enabled) we have following warnings : net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol 'nf_ct_frag6_gather' was not declared. Should it be static? net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol 'nf_ct_frag6_output' was not declared. Should it be static? net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol 'nf_ct_frag6_init' was not declared. Should it be static? net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol 'nf_ct_frag6_cleanup' was not declared. Should it be static? Fix this including net/netfilter/ipv6/nf_defrag_ipv6.h Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-20 21:00:38 +01:00
Changli Gao	41a7cab6d3	netfilter: nf_nat: place conntrack in source hash after SNAT is done If SNAT isn't done, the wrong info maybe got by the other cts. As the filter table is after DNAT table, the packets dropped in filter table also bother bysource hash table. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-20 15:49:52 +01:00
Patrick McHardy	82d800d8e7	Merge branch 'connlimit' of git://dev.medozas.de/linux Conflicts: Documentation/feature-removal-schedule.txt Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-20 10:33:55 +01:00
Florian Westphal	28a51ba59a	netfilter: do not omit re-route check on NF_QUEUE verdict ret != NF_QUEUE only works in the "--queue-num 0" case; for queues > 0 the test should be '(ret & NF_VERDICT_MASK) != NF_QUEUE'. However, NF_QUEUE no longer DROPs the skb unconditionally if queueing fails (due to NF_VERDICT_FLAG_QUEUE_BYPASS verdict flag), so the re-route test should also be performed if this flag is set in the verdict. The full test would then look something like && ((ret & NF_VERDICT_MASK) == NF_QUEUE && (ret & NF_VERDICT_FLAG_QUEUE_BYPASS)) This is rather ugly, so just remove the NF_QUEUE test altogether. The only effect is that we might perform an unnecessary route lookup in the NF_QUEUE case. ip6table_mangle did not have such a check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-20 10:23:26 +01:00
David S. Miller	a07aa004c8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2011-01-20 00:06:15 -08:00
Eric Dumazet	cc7ec456f8	net_sched: cleanups Cleanup net/sched code to current CodingStyle and practices. Reduce inline abuse Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:12 -08:00
Alban Crequy	7180a03118	af_unix: coding style: remove one level of indentation in unix_shutdown() Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: Ian Molton <ian.molton@collabora.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:11 -08:00
John Fastabend	b8970f0bfc	net_sched: implement a root container qdisc sch_mqprio This implements a mqprio queueing discipline that by default creates a pfifo_fast qdisc per tx queue and provides the needed configuration interface. Using the mqprio qdisc the number of tcs currently in use along with the range of queues alloted to each class can be configured. By default skbs are mapped to traffic classes using the skb priority. This mapping is configurable. Configurable parameters, struct tc_mqprio_qopt { __u8 num_tc; __u8 prio_tc_map[TC_BITMASK + 1]; __u8 hw; __u16 count[TC_MAX_QUEUE]; __u16 offset[TC_MAX_QUEUE]; }; Here the count/offset pairing give the queue alignment and the prio_tc_map gives the mapping from skb->priority to tc. The hw bit determines if the hardware should configure the count and offset values. If the hardware bit is set then the operation will fail if the hardware does not implement the ndo_setup_tc operation. This is to avoid undetermined states where the hardware may or may not control the queue mapping. Also minimal bounds checking is done on the count/offset to verify a queue does not exceed num_tx_queues and that queue ranges do not overlap. Otherwise it is left to user policy or hardware configuration to create useful mappings. It is expected that hardware QOS schemes can be implemented by creating appropriate mappings of queues in ndo_tc_setup(). One expected use case is drivers will use the ndo_setup_tc to map queue ranges onto 802.1Q traffic classes. This provides a generic mechanism to map network traffic onto these traffic classes and removes the need for lower layer drivers to know specifics about traffic types. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:11 -08:00
John Fastabend	4f57c087de	net: implement mechanism for HW based QOS This patch provides a mechanism for lower layer devices to steer traffic using skb->priority to tx queues. This allows for hardware based QOS schemes to use the default qdisc without incurring the penalties related to global state and the qdisc lock. While reliably receiving skbs on the correct tx ring to avoid head of line blocking resulting from shuffling in the LLD. Finally, all the goodness from txq caching and xps/rps can still be leveraged. Many drivers and hardware exist with the ability to implement QOS schemes in the hardware but currently these drivers tend to rely on firmware to reroute specific traffic, a driver specific select_queue or the queue_mapping action in the qdisc. By using select_queue for this drivers need to be updated for each and every traffic type and we lose the goodness of much of the upstream work. Firmware solutions are inherently inflexible. And finally if admins are expected to build a qdisc and filter rules to steer traffic this requires knowledge of how the hardware is currently configured. The number of tx queues and the queue offsets may change depending on resources. Also this approach incurs all the overhead of a qdisc with filters. With the mechanism in this patch users can set skb priority using expected methods ie setsockopt() or the stack can set the priority directly. Then the skb will be steered to the correct tx queues aligned with hardware QOS traffic classes. In the normal case with single traffic class and all queues in this class everything works as is until the LLD enables multiple tcs. To steer the skb we mask out the lower 4 bits of the priority and allow the hardware to configure upto 15 distinct classes of traffic. This is expected to be sufficient for most applications at any rate it is more then the 8021Q spec designates and is equal to the number of prio bands currently implemented in the default qdisc. This in conjunction with a userspace application such as lldpad can be used to implement 8021Q transmission selection algorithms one of these algorithms being the extended transmission selection algorithm currently being used for DCB. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:10 -08:00
Vlad Dogaru	e7ed828f10	netlink: support setting devgroup parameters If a rtnetlink request specifies a negative or zero ifindex and has no interface name attribute, but has a group attribute, then the chenges are made to all the interfaces belonging to the specified group. Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:10 -08:00
Vlad Dogaru	cbda10fa97	net_device: add support for network device groups Net devices can now be grouped, enabling simpler manipulation from userspace. This patch adds a group field to the net_device structure, as well as rtnetlink support to query and modify it. Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:09 -08:00
Shan Wei	441c793a56	net: cleanup unused macros in net directory Clean up some unused macros in net/*. 1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE. 2. never be used since introduced to kernel. e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:20:04 -08:00
Linus Torvalds	1268afe676	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits) sctp: user perfect name for Delayed SACK Timer option net: fix can_checksum_protocol() arguments swap Revert "netlink: test for all flags of the NLM_F_DUMP composite" gianfar: Fix misleading indentation in startup_gfar() net/irda/sh_irda: return to RX mode when TX error net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan. USB CDC NCM: tx_fixup() race condition fix ns83820: Avoid bad pointer deref in ns83820_init_one(). ipv6: Silence privacy extensions initialization bnx2x: Update bnx2x version to 1.62.00-4 bnx2x: Fix AER setting for BCM57712 bnx2x: Fix BCM84823 LED behavior bnx2x: Mark full duplex on some external PHYs bnx2x: Fix BCM8073/BCM8727 microcode loading bnx2x: LED fix for BCM8727 over BCM57712 bnx2x: Common init will be executed only once after POR bnx2x: Swap BCM8073 PHY polarity if required iwlwifi: fix valid chain reading from EEPROM ath5k: fix locking in tx_complete_poll_work ath9k_hw: do PA offset calibration only on longcal interval ...	2011-01-19 20:25:45 -08:00
Shan Wei	4580ccc04d	sctp: user perfect name for Delayed SACK Timer option The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK, not SCTP_DELAYED_ACK. Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK, for making compatibility with existing applications. Reference: 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK) （http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25) Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 16:51:29 -08:00
Patrick McHardy	14f0290ba4	Merge branch 'master' of /repos/git/net-next-2.6	2011-01-19 23:51:37 +01:00
Eric Dumazet	d402786ea4	net: fix can_checksum_protocol() arguments swap commit `0363466866` (net offloading: Convert checksums to use centrally computed features.) mistakenly swapped can_checksum_protocol() arguments. This broke IPv6 on bnx2 for instance, on NIC without TCPv6 checksum offloads. Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 14:15:21 -08:00
David S. Miller	b8f3ab4290	Revert "netlink: test for all flags of the NLM_F_DUMP composite" This reverts commit `0ab03c2b14`. It breaks several things including the avahi daemon. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 13:34:20 -08:00
Patrick McHardy	f5c88f56b3	netfilter: nf_conntrack: fix lifetime display for disabled connections When no tstamp extension exists, ct_delta_time() returns -1, which is then assigned to an u64 and tested for negative values to decide whether to display the lifetime. This obviously doesn't work, use a s64 and merge the two minor functions into one. Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-19 19:10:49 +01:00
Jan Engelhardt	cc4fc02257	netfilter: xtables: connlimit revision 1 This adds destination address-based selection. The old "inverse" member is overloaded (memory-wise) with a new "flags" variable, similar to how J.Park did it with xt_string rev 1. Since revision 0 userspace only sets flag 0x1, no great changes are made to explicitly test for different revisions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2011-01-19 18:27:46 +01:00
Pablo Neira Ayuso	a992ca2a04	netfilter: nf_conntrack_tstamp: add flow-based timestamp extension This patch adds flow-based timestamping for conntracks. This conntrack extension is disabled by default. Basically, we use two 64-bits variables to store the creation timestamp once the conntrack has been confirmed and the other to store the deletion time. This extension is disabled by default, to enable it, you have to: echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp This patch allows to save memory for user-space flow-based loogers such as ulogd2. In short, ulogd2 does not need to keep a hashtable with the conntrack in user-space to know when they were created and destroyed, instead we use the kernel timestamp. If we want to have a sane IPFIX implementation in user-space, this nanosecs resolution timestamps are also useful. Other custom user-space applications can benefit from this via libnetfilter_conntrack. This patch modifies the /proc output to display the delta time in seconds since the flow start. You can also obtain the flow-start date by means of the conntrack-tools. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-19 16:00:07 +01:00
Eric Dumazet	80f8f1027b	net: filter: dont block softirqs in sk_run_filter() Packet filter (BPF) doesnt need to disable softirqs, being fully re-entrant and lock-less. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-18 21:33:05 -08:00
Alban Crequy	d6ae3bae3d	af_unix: implement socket filter Linux Socket Filters can already be successfully attached and detached on unix sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...). See: Documentation/networking/filter.txt But the filter was never used in the unix socket code so it did not work. This patch uses sk_filter() to filter buffers before delivery. This short program demonstrates the problem on SOCK_DGRAM. int main(void) { int i, j, ret; int sv[2]; struct pollfd fds[2]; char message = "Hello world!"; char buffer[64]; struct sock_filter ins[32] = {{0,},}; struct sock_fprog filter; socketpair(AF_UNIX, SOCK_DGRAM, 0, sv); for (i = 0 ; i < 2 ; i++) { fds[i].fd = sv[i]; fds[i].events = POLLIN; fds[i].revents = 0; } for(j = 1 ; j < 13 ; j++) { / Set a socket filter to truncate the message / memset(ins, 0, sizeof(ins)); ins[0].code = BPF_RET\|BPF_K; ins[0].k = j; filter.len = 1; filter.filter = ins; setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)); / send a message / send(sv[0], message, strlen(message) + 1, 0); / The filter should let the message pass but truncated. / poll(fds, 2, 0); / Receive the truncated message*/ ret = recv(sv[1], buffer, 64, 0); printf("received %d bytes, expected %d\n", ret, j); } for (i = 0 ; i < 2 ; i++) close(sv[i]); return 0; } Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: Ian Molton <ian.molton@collabora.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-18 21:33:05 -08:00
David S. Miller	a5db219f4c	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-01-18 16:28:31 -08:00
Jesse Gross	6ee400aafb	net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan. In netif_skb_features() we return only the features that are valid for vlans if we have a vlan packet. However, we should not mask out NETIF_F_HW_VLAN_TX since it enables transmission of vlan tags and is obviously valid. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-18 16:13:50 -08:00
Romain Francoise	2fdc1c8093	ipv6: Silence privacy extensions initialization When a network namespace is created (via CLONE_NEWNET), the loopback interface is automatically added to the new namespace, triggering a printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set. This is problematic for applications which use CLONE_NEWNET as part of a sandbox, like Chromium's suid sandbox or recent versions of vsftpd. On a busy machine, it can lead to thousands of useless "lo: Disabled Privacy Extensions" messages appearing in dmesg. It's easy enough to check the status of privacy extensions via the use_tempaddr sysctl, so just removing the printk seems like the most sensible solution. Signed-off-by: Romain Francoise <romain@orebokech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-18 16:13:49 -08:00
David S. Miller	f966a13f92	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2011-01-18 12:50:19 -08:00
Jiri Olsa	93557f53e1	netfilter: nf_conntrack: nf_conntrack snmp helper Adding support for SNMP broadcast connection tracking. The SNMP broadcast requests are now paired with the SNMP responses. Thus allowing using SNMP broadcasts with firewall enabled. Please refer to the following conversation: http://marc.info/?l=netfilter-devel&m=125992205006600&w=2 Patrick McHardy wrote: > > The best solution would be to add generic broadcast tracking, the > > use of expectations for this is a bit of abuse. > > The second best choice I guess would be to move the help() function > > to a shared module and generalize it so it can be used for both. This patch implements the "second best choice". Since the netbios-ns conntrack module uses the same helper functionality as the snmp, only one helper function is added for both snmp and netbios-ns modules into the new object - nf_conntrack_broadcast. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 18:12:24 +01:00
Eric Dumazet	94d117a1c7	netfilter: ipt_CLUSTERIP: remove "no conntrack!" When a packet is meant to be handled by another node of the cluster, silently drop it instead of flooding kernel log. Note : INVALID packets are also dropped without notice. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 16:27:56 +01:00
Patrick McHardy	a8fc0d9b34	Merge branch 'master' of git://dev.medozas.de/linux	2011-01-18 16:20:53 +01:00
Florian Westphal	94b27cc361	netfilter: allow NFQUEUE bypass if no listener is available If an skb is to be NF_QUEUE'd, but no program has opened the queue, the packet is dropped. This adds a v2 target revision of xt_NFQUEUE that allows packets to continue through the ruleset instead. Because the actual queueing happens outside of the target context, the 'bypass' flag has to be communicated back to the netfilter core. Unfortunately the only choice to do this without adding a new function argument is to use the target function return value (i.e. the verdict). In the NF_QUEUE case, the upper 16bit already contain the queue number to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e. we now have extra room for a new flag. If a hook issued a NF_QUEUE verdict, then the netfilter core will continue packet processing if the queueing hook returns -ESRCH (== "this queue does not exist") and the new NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value. Note: If the queue exists, but userspace does not consume packets fast enough, the skb will still be dropped. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 16:08:30 +01:00
Florian Westphal	f615df76ed	netfilter: reduce NF_VERDICT_MASK to 0xff NF_VERDICT_MASK is currently 0xffff. This is because the upper 16 bits are used to store errno (for NF_DROP) or the queue number (NF_QUEUE verdict). As there are up to 0xffff different queues available, there is no more room to store additional flags. At the moment there are only 6 different verdicts, i.e. we can reduce NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space. NF_VERDICT_BITS would then be reduced to 8, but because the value is exported to userspace, this might cause breakage; e.g.: e.g. 'queuenr = (1 << NF_VERDICT_BITS) \| NF_QUEUE' would now break. Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value to the 'userspace compat' section. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:52:14 +01:00
Florian Westphal	06cdb6349c	netfilter: nfnetlink_queue: do not free skb on error Move free responsibility from nf_queue to caller. This enables more flexible error handling; we can now accept the skb instead of freeing it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:28:38 +01:00
Florian Westphal	f158508618	netfilter: nfnetlink_queue: return error number to caller instead of returning -1 on error, return an error number to allow the caller to handle some errors differently. ECANCELED is used to indicate that the hook is going away and should be ignored. A followup patch will introduce more 'ignore this hook' conditions, (depending on queue settings) and will move kfree_skb responsibility to the caller. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:27:28 +01:00

1 2 3 4 5 ...

17846 Коммитов