Граф коммитов

617 Коммитов

Автор SHA1 Сообщение Дата
Harald Welte 6869c4d8e0 [NETFILTER]: reduce netfilter sk_buff enlargement
As discussed at netconf'05, we're trying to save every bit in sk_buff.
The patch below makes sk_buff 8 bytes smaller.  I did some basic
testing on my notebook and it seems to work.

The only real in-tree user of nfcache was IPVS, who only needs a
single bit.  Unfortunately I couldn't find some other free bit in
sk_buff to stuff that bit into, so I introduced a separate field for
them.  Maybe the IPVS guys can resolve that to further save space.

Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
alike are only 6 values defined), but unfortunately the bluetooth code
overloads pkt_type :(

The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just
came up with a way how to do it without any skb fields, so it's safe
to remove it.

- remove all never-implemented 'nfcache' code
- don't have ipvs code abuse 'nfcache' field. currently get's their own
  compile-conditional skb->ipvs_property field.  IPVS maintainers can
  decide to move this bit elswhere, but nfcache needs to die.
- remove skb->nfcache field to save 4 bytes
- move skb->nfctinfo into three unused bits to save further 4 bytes

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:31:04 -07:00
Harald Welte bf3a46aa9b [NETFILTER]: convert nfmark and conntrack mark to 32bit
As discussed at netconf'05, we convert nfmark and conntrack-mark to be
32bits even on 64bit architectures.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-29 15:29:31 -07:00
Patrick McHardy 06c7427021 [FIB_TRIE]: Don't ignore negative results from fib_semantic_match
When a semantic match occurs either success, not found or an error
(for matching unreachable routes/blackholes) is returned. fib_trie
ignores the errors and looks for a different matching route. Treat
results other than "no match" as success and end lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 22:06:09 -07:00
David S. Miller c1cc168442 [ROSE]: Fix typo in rose_route_frame() locking fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 14:55:32 -07:00
David S. Miller dc16aaf29d [ROSE]: Fix missing unlocks in rose_route_frame()
Noticed by Coverity checker.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:50:09 -07:00
David S. Miller d5d283751e [TCP]: Document non-trivial locking path in tcp_v{4,6}_get_port().
This trips up a lot of folks reading this code.
Put an unlikely() around the port-exhaustion test
for good measure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:49:54 -07:00
David S. Miller 89ebd197eb [TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail().
Intention of this bit is to force pushing of the existing
send queue when TCP_CORK or TCP_NODELAY state changes via
setsockopt().

But it's easy to create a situation where the bit never
clears.  For example, if the send queue starts empty:

1) set TCP_NODELAY
2) clear TCP_NODELAY
3) set TCP_CORK
4) do small write()

The current code will leave TCP_NAGLE_PUSH set after that
sequence.  Unconditionally clearing the bit when new data
is added via skb_entail() solves the problem.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:13:06 -07:00
Thomas Graf 0fbbeb1ba4 [PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()
qdisc_create_dflt() is missing to destroy the newly allocated
default qdisc if the initialization fails resulting in leaks
of all kinds. The only caller in mainline which may trigger
this bug is sch_tbf.c in tbf_create_dflt_qdisc().

Note: qdisc_create_dflt() doesn't fulfill the official locking
      requirements of qdisc_destroy() but since the qdisc could
      never be seen by the outside world this doesn't matter
      and it can stay as-is until the locking of pkt_sched
      is cleaned up.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:12:44 -07:00
Vlad Yasevich d2287f8441 [SCTP]: Add SENTINEL to SCTP MIB stats
Add SNMP_MIB_SENTINEL to the definition of the sctp_snmp_list so that
the output routine in proc correctly terminates.  This was causing some
problems running on ia64 systems.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:12:04 -07:00
Ralf Baechle 01d7dd0e9f [AX25]: UID fixes
o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer
   as the result.  Breaks ROSE completly and AX.25 if UID policy set to deny.

 o While the list structure of AX.25's UID to callsign mapping table was
   properly protected by a spinlock, it's elements were not refcounted
   resulting in a race between removal and usage of an element.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:11:45 -07:00
Ralf Baechle 53b924b31f [NET]: Fix socket bitop damage
The socket flag cleanups that went into 2.6.12-rc1 are basically oring
the flags of an old socket into the socket just being created.
Unfortunately that one was just initialized by sock_init_data(), so already
has SOCK_ZAPPED set.  As the result zapped sockets are created and all
incoming connection will fail due to this bug which again was carefully
replicated to at least AX.25, NET/ROM or ROSE.

In order to keep the abstraction alive I've introduced sock_copy_flags()
to copy the socket flags from one sockets to another and used that
instead of the bitwise copy thing.  Anyway, the idea here has probably
been to copy all flags, so sock_copy_flags() should be the right thing.
With this the ham radio protocols are usable again, so I hope this will
make it into 2.6.13.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:11:30 -07:00
Patrick McHardy 66a79a19a7 [NETFILTER]: Fix HW checksum handling in ip_queue/ip6_queue
The checksum needs to be filled in on output, after mangling a packet
ip_summed needs to be reset.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:10:35 -07:00
Dave Johnson 1344a41637 [IPV4]: Fix negative timer loop with lots of ipv4 peers.
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

Found this bug while doing some scaling testing that created 500K inet
peers.

peer_check_expire() in net/ipv4/inetpeer.c isn't using inet_peer_gc_mintime
correctly and will end up creating an expire timer with less than the
minimum duration, and even zero/negative if enough active peers are
present.

If >65K peers, the timer will be less than inet_peer_gc_mintime, and with
>70K peers, the timer duration will reach zero and go negative.

The timer handler will continue to schedule another zero/negative timer in
a loop until peers can be aged.  This can continue for at least a few
minutes or even longer if the peers remain active due to arriving packets
while the loop is occurring.

Bug is present in both 2.4 and 2.6.  Same patch will apply to both just
fine.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:10:15 -07:00
Herbert Xu c3a20692ca [RPC]: Kill bogus kmap in krb5
While I was going through the crypto users recently, I noticed this
bogus kmap in sunrpc.  It's totally unnecessary since the crypto
layer will do its own kmap before touching the data.  Besides, the
kmap is throwing the return value away.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:09:53 -07:00
Dmitry Yusupov 14869c3886 [TCP]: Do TSO deferral even if tail SKB can go out now.
If the tail SKB fits into the window, it is still
benefitical to defer until the goal percentage of
the window is available.  This give the application
time to feed more data into the send queue and thus
results in larger TSO frames going out.

Patch from Dmitry Yusupov <dima@neterion.com>.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-23 10:09:27 -07:00
Patrick McHardy 7e71af49d4 [NETFILTER]: Fix HW checksum handling in TCPMSS target
Most importantly, remove bogus BUG() in receive path.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20 17:40:41 -07:00
Patrick McHardy f93592ff4f [NETFILTER]: Fix HW checksum handling in ECN target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20 17:39:15 -07:00
Patrick McHardy fd841326d7 [NETFILTER]: Fix ECN target TCP marking
An incorrect check made it bail out before doing anything.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-20 17:38:40 -07:00
Herbert Xu 6fc8b9e7c6 [IPCOMP]: Fix false smp_processor_id warning
This patch fixes a false-positive from debug_smp_processor_id().

The processor ID is only used to look up crypto_tfm objects.
Any processor ID is acceptable here as long as it is one that is
iterated on by for_each_cpu().

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18 14:36:59 -07:00
Patrick McHardy cb94c62c25 [IPV4]: Fix DST leak in icmp_push_reply()
Based upon a bug report and initial patch by
Ollie Wild.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18 14:05:44 -07:00
Jay Vosburgh 001dd250c1 [TOKENRING]: Use interrupt-safe locking with rif_lock.
Change operations on rif_lock from spin_{un}lock_bh to
spin_{un}lock_irq{save,restore} equivalents.  Some of the
rif_lock critical sections are called from interrupt context via
tr_type_trans->tr_add_rif_info.  The TR NIC drivers call tr_type_trans
from their packet receive handlers.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-18 14:04:51 -07:00
Paul E. McKenney 1f07247de5 [DECNET]: Fix RCU race condition in dn_neigh_construct().
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:05:27 -07:00
Patrick McHardy bfd272b1ca [IPV6]: Fix SKB leak in ip6_input_finish()
Changing it to how ip_input handles should fix it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:04:22 -07:00
Herbert Xu 35d59efd10 [TCP]: Fix bug #5070: kernel BUG at net/ipv4/tcp_output.c:864
1) We send out a normal sized packet with TSO on to start off.
2) ICMP is received indicating a smaller MTU.
3) We send the current sk_send_head which needs to be fragmented
since it was created before the ICMP event.  The first fragment
is then sent out.

At this point the remaining fragment is allocated by tcp_fragment.
However, its size is padded to fit the L1 cache-line size therefore
creating tail-room up to 124 bytes long.

This fragment will also be sitting at sk_send_head.

4) tcp_sendmsg is called again and it stores data in the tail-room of
of the fragment.
5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment
since the packet as a whole exceeds the MTU.

At this point we have a packet that has data in the head area being
fed to tso_fragment which bombs out.

My take on this is that we shouldn't ever call tcp_fragment on a TSO
socket for a packet that is yet to be transmitted since this creates
a packet on sk_send_head that cannot be extended.

So here is a patch to change it so that tso_fragment is always used
in this case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:03:59 -07:00
Patrick McHardy 97077c4a98 [IPV6]: Fix raw socket hardware checksum failures
When packets hit raw sockets the csum update isn't done yet, do it manually.
Packets can also reach rawv6_rcv on the output path through
ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this
codepath isn't executed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-17 12:03:32 -07:00
Trond Myklebust 58fcb8df0b [PATCH] NFS: Ensure ACL xdr code doesn't overflow.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-16 08:52:11 -07:00
Matt Mackall d7b9dfc8ea [NETPOLL]: remove unused variable
Remove unused variable

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:28:05 -07:00
Matt Mackall 53fb95d3c1 [NETPOLL]: fix initialization/NAPI race
This fixes a race during initialization with the NAPI softirq
processing by using an RCU approach.

This race was discovered when refill_skbs() was added to
the setup code.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:27:43 -07:00
Ingo Molnar 2652076507 [NETPOLL]: pre-fill skb pool
we could do one thing (see the patch below): i think it would be useful 
to fill up the netlogging skb queue straight at initialization time.  
Especially if netpoll is used for dumping alone, the system might not be 
in a situation to fill up the queue at the point of crash, so better be 
a bit more prepared and keep the pipeline filled.

[ I've modified this to be called earlier - mpm ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:26:42 -07:00
Matt Mackall 0db1d6fc1e [NETPOLL]: add retry timeout
Add limited retry logic to netpoll_send_skb

Each time we attempt to send, decrement our per-device retry counter.
On every successful send, we reset the counter. 

We delay 50us between attempts with up to 20000 retries for a total of
1 second. After we've exhausted our retries, subsequent failed
attempts will try only once until reset by success.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:25:54 -07:00
Matt Mackall f0d3459d07 [NETPOLL]: netpoll_send_skb simplify
Minor netpoll_send_skb restructuring

Restructure to avoid confusing goto and move some bits out of the
retry loop.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:25:11 -07:00
Jeff Moyer a636e13579 [NETPOLL]: deadlock bugfix
This fixes an obvious deadlock in the netpoll code.  netpoll_rx takes the
npinfo->rx_lock.  netpoll_rx is also the only caller of arp_reply (through
__netpoll_rx).  As such, it is not necessary to take this lock.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:23:50 -07:00
Jeff Moyer 11513128bb [NETPOLL]: rx_flags bugfix
Initialize npinfo->rx_flags.  The way it stands now, this will have random
garbage, and so will incur a locking penalty even when an rx_hook isn't
registered and we are not active in the netpoll polling code.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11 19:23:04 -07:00
Herbert Xu b5da623ae9 [TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()
Well I've only found one potential cause for the assertion
failure in tcp_mark_head_lost.  First of all, this can only
occur if cnt > 1 since tp->packets_out is never zero here.
If it did hit zero we'd have much bigger problems.

So cnt is equal to fackets_out - reordering.  Normally
fackets_out is less than packets_out.  The only reason
I've found that might cause fackets_out to exceed packets_out
is if tcp_fragment is called from tcp_retransmit_skb with a
TSO skb and the current MSS is greater than the MSS stored
in the TSO skb.  This might occur as the result of an expiring
dst entry.

In that case, packets_out may decrease (line 1380-1381 in
tcp_output.c).  However, fackets_out is unchanged which means
that it may in fact exceed packets_out.

Previously tcp_retrans_try_collapse was the only place where
packets_out can go down and it takes care of this by decrementing
fackets_out.

So we should make sure that fackets_out is reduced by an appropriate
amount here as well.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10 18:32:36 -07:00
Steven Whitehouse 001ab02a8c [DECNET]: Use sk_stream_error function rather than DECnet's own
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10 11:32:57 -07:00
Andrew Morton d64d387372 [NET]: Fix memory leak in sys_{send,recv}msg() w/compat
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will
cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall!

This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a
new iovec structure.  Only the one from sys_sendmsg() is free'ed.

I wrote a simple test program to confirm this after identifying the
problem:

http://davej.org/programs/testsendmsg.c

Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as
it also calls verify_compat_iovec() but expects it to malloc internally.

[ I fixed that. -DaveM ]

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09 15:29:19 -07:00
David S. Miller 3501466941 [SUNRPC]: Fix nsec --> usec conversion.
We need to divide, not multiply.  While we're here,
use NSEC_PER_USEC instead of a magic constant.

Based upon a report from Josip Loncaric and a patch
by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09 14:57:12 -07:00
Linus Torvalds 92e52b2e82 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-08-08 16:06:01 -07:00
Heikki Orsila ca9334523c [IPV4]: Debug cleanup
Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux 
kernel 2.6.13-rc5. Also weird use of indentation is changed in some
places.

Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08 14:26:52 -07:00
Harald Welte 8b83bc77bf [PATCH] don't try to do any NAT on untracked connections
With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.

The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK effectively
prevents iteration of the 'nat' table, but doesn't prevent nat_packet()
to be executed.  Since nr_manips is gone in 'rustynat', nat_packet() now
implicitly thinks that it has to do NAT on the packet.

This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-08 11:48:28 -07:00
Herbert Xu 6fc0b4a7a7 [IPSEC]: Restrict socket policy loading to CAP_NET_ADMIN.
The interface needs much redesigning if we wish to allow
normal users to do this in some way.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-06 06:33:15 -07:00
Marcel Holtmann 576c7d858f [Bluetooth] Add direction and timestamp to stack internal events
This patch changes the direction to incoming and adds the timestamp
to all stack internal events.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:54 +02:00
Marcel Holtmann 66e8b6c31b [Bluetooth] Remove unused functions and cleanup symbol exports
This patch removes the unused bt_dump() function and it also removes
its BT_DMP macro. It also unexports the hci_dev_get(), hci_send_cmd()
and hci_si_event() functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:51 +02:00
Marcel Holtmann dcc365d8f2 [Bluetooth] Revert session reference counting fix
The fix for the reference counting problem of the signal DLC introduced
a race condition which leads to an oops. The reason for it is not fully
understood by now and so revert this fix, because the reference counting
problem is not crashing the RFCOMM layer and its appearance it rare.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06 12:36:42 +02:00
David S. Miller b7656e7f29 [IPV4]: Fix memory leak during fib_info hash expansion.
When we grow the tables, we forget to free the olds ones
up.

Noticed by Yan Zheng.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-05 04:12:48 -07:00
Herbert Xu b68e9f8572 [PATCH] tcp: fix TSO cwnd caching bug
tcp_write_xmit caches the cwnd value indirectly in cwnd_quota.  When
tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached
value becomes invalid.

This patch ensures that the cwnd value is always reread after each
tcp_transmit_skb call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 21:43:14 -07:00
David S. Miller 846998ae87 [PATCH] tcp: fix TSO sizing bugs
MSS changes can be lost since we preemptively initialize the tso_segs count
for an SKB before we %100 commit to sending it out.

So, by the time we send it out, the tso_size information can be stale due
to PMTU events.  This mucks up all of the logic in our send engine, and can
even result in the BUG() triggering in tcp_tso_should_defer().

Another problem we have is that we're storing the tp->mss_cache, not the
SACK block normalized MSS, as the tso_size.  That's wrong too.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04 21:43:14 -07:00
Denis Lunev f0098f7863 [NET] Fix too aggressive backoff in dst garbage collection
The bug is evident when it is seen once. dst gc timer was backed off,
when gc queue is not empty. But this means that timer quickly backs off,
if at least one destination remains in use. Normally, the bug is invisible,
because adding new dst entry to queue cancels the backoff. But it shots
deadly with destination cache overflow when new destinations are not released
for long time f.e. after an interface goes down.

The fix is to cancel backoff when something was released.

Signed-off-by: Denis Lunev <den@sw.ru>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:47:25 -07:00
Alexey Kuznetsov db44575f6f [NET]: fix oops after tunnel module unload
Tunnel modules used to obtain module refcount each time when
some tunnel was created, which meaned that tunnel could be unloaded
only after all the tunnels are deleted.

Since killing old MOD_*_USE_COUNT macros this protection has gone.
It is possible to return it back as module_get/put, but it looks
more natural and practically useful to force destruction of all
the child tunnels on module unload.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:46:44 -07:00
Harald Welte 1f494c0e04 [NETFILTER] Inherit masq_index to slave connections
masq_index is used for cleanup in case the interface address changes
(such as a dialup ppp link with dynamic addreses).  Without this patch,
slave connections are not evicted in such a case, since they don't inherit
masq_index.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:44:07 -07:00
Baruch Even d1b04c081e [NET]: Spelling mistakes threshoulds -> thresholds
Just simple spelling mistake fixes.

Signed-Off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30 17:41:59 -07:00
David S. Miller 6192b54b84 [NET]: Fix busy waiting in dev_close().
If the current task has signal_pending(), the loop we have
to wait for the __LINK_STATE_RX_SCHED bit to clear becomes
a pure busy-loop.

Fixed by using msleep() instead of the hand-crafted version.

Noticed by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-28 12:12:58 -07:00
Linus Torvalds 839c5d2511 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-07-27 16:37:59 -07:00
Jesper Juhl 77933d7276 [PATCH] clean up inline static vs static inline
`gcc -W' likes to complain if the static keyword is not at the beginning of
the declaration.  This patch fixes all remaining occurrences of "inline
static" up with "static inline" in the entire kernel tree (140 occurrences in
47 files).

While making this change I came across a few lines with trailing whitespace
that I also fixed up, I have also added or removed a blank line or two here
and there, but there are no functional changes in the patch.

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:20 -07:00
Olaf Hering 44456d37b5 [PATCH] turn many #if $undefined_string into #ifdef $undefined_string
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 16:26:08 -07:00
Matt Mackall 5e43db7730 [NET]: Move in_aton from net/ipv4/utils.c to net/core/utils.c
Move in_aton to allow netpoll and pktgen to work without the rest of
the IPv4 stack. Fix whitespace and add comment for the odd placement.

Delete now-empty net/ipv4/utils.c

Re-enable netpoll/netconsole without CONFIG_INET

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 15:24:42 -07:00
Nick Sillik 7cee432a22 [NETFILTER]: Fix -Wunder error in ip_conntrack_core.c
Signed-off-by: Nick Sillik <n.sillik@temple.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 14:46:03 -07:00
Kyle Moffett a77be819f9 [NET]: Fix setsockopt locking bug
On Sparc, SO_DONTLINGER support resulted in sock_reset_flag being 
called without lock_sock().

Signed-off-by: Kyle Moffett <mrmacman_g4@mac.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 14:22:30 -07:00
Hans-Juergen Tappe (SYSGO AG) eaa1c5d059 [IPV4]: Fix Kconfig syntax error
From: "Hans-Juergen Tappe (SYSGO AG)" <hjt@sysgo.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27 13:00:04 -07:00
Herbert Xu a4f1bac625 [XFRM]: Fix possible overflow of sock->sk_policy
Spotted by, and original patch by, Balazs Scheidler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-26 15:43:17 -07:00
Patrick McHardy 7686ee1ad9 [EMATCH]: Remove feature ifdefs in meta ematch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:44:23 -07:00
Cal Peake 227510c7f1 [IPV6]: fix implicit declaration of function `xfrm6_tunnel_unregister'
Signed-off-by: Cal Peake <cp@absolutedigital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24 19:30:06 -07:00
David S. Miller 261688d01e [PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}
More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 14:43:52 -07:00
Patrick McHardy d3984a6b6a [NETFILTER]: Fix ip6t_LOG MAC format
I broke this in the patch that consolidated MAC logging.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:52:47 -07:00
Patrick McHardy 74bb421da7 [NETFILTER]: Use correct byteorder in ICMP NAT
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:51:38 -07:00
Patrick McHardy 21f930e4ab [NETFILTER]: Wait until all references to ip_conntrack_untracked are dropped on unload
Fixes a crash when unloading ip_conntrack.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:51:03 -07:00
Patrick McHardy d04b4f8c1c [NETFILTER]: Fix potential memory corruption in NAT code (aka memory NAT)
The portptr pointing to the port in the conntrack tuple is declared static,
which could result in memory corruption when two packets of the same
protocol are NATed at the same time and one conntrack goes away.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:50:29 -07:00
Patrick McHardy 4c1217deeb [NETFILTER]: Fix deadlock in ip6_queue
Already fixed in ip_queue, ip6_queue was missed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22 12:49:30 -07:00
David S. Miller 28e212fb36 [PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch.
It won't exist any longer when we shrink the SKB in 2.6.14,
and we should kill this off before anyone in userspace starts
using it.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22 11:47:25 -07:00
Rusty Russell 4acdbdbe50 [NETFILTER]: ip_conntrack_expect_related must not free expectation
If a connection tracking helper tells us to expect a connection, and
we're already expecting that connection, we simply free the one they
gave us and return success.

The problem is that NAT helpers (eg. FTP) have to allocate the
expectation first (to see what port is available) then rewrite the
packet.  If that rewrite fails, they try to remove the expectation,
but it was freed in ip_conntrack_expect_related.

This is one example of a larger problem: having registered the
expectation, the pointer is no longer ours to use.  Reference counting
is needed for ctnetlink anyway, so introduce it now.

To have a single "put" path, we need to grab the reference to the
connection on creation, rather than open-coding it in the caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-21 13:14:46 -07:00
David S. Miller b72f6eccb0 [NET]: Fix tc_verd thinko in skb_clone()
It was overwriting the computer n->tc_verd value over
and over with skb->tc_verd, by mistake.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:13:54 -07:00
Patrick McHardy 0303770deb [NET]: Make ipip/ip6_tunnel independant of XFRM
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:03:34 -07:00
Stephen Hemminger c877efb207 [IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:01:51 -07:00
Adrian Bunk eb3f8f5e22 [NET]: BRIDGE_EBT_ARPREPLY must depend on INET
BRIDGE_EBT_ARPREPLY=y and INET=n results in the following compile error:

net/built-in.o: In function `ebt_target_reply':
ebt_arpreply.c:(.text+0x68fb9): undefined reference to `arp_send'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 14:00:13 -07:00
Patrick McHardy abaacad9bc [IPV4]: Don't select XFRM for ip_gre
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:59:17 -07:00
Patrick McHardy 6aef4fdfea [NET]: Only build flow.o if CONFIG_XFRM=y
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:58:40 -07:00
Jesper Juhl 88e9fa8a54 [ATM]: Trivial spelling fix patch for net/Kconfig
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:56:53 -07:00
Chas Williams 322361b371 [ATM]: allow bind() on point-to-multpoint svcs (from Martin Whitaker <martin_whitaker@ntlworld.com>)
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19 13:54:44 -07:00
David S. Miller 3f1c81ff10 [EMATCH]: Kill TCF_META_ID_TCCLASSID reference from meta ematch as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 17:10:55 -07:00
Adrian Bunk 6876f95f20 [IPV4]: fix IP_FIB_HASH kconfig warning
This patch fixes the following kconfig warning:
  net/ipv4/Kconfig:92:warning: defaults for choice values not supported

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:55:19 -07:00
Randy Dunlap 54208991e1 [NET]: Kconfig: NETCONSOLE and NETPOLL together
Put NETCONSOLE and NETPOLL options together since they are related.
This cuts down on the hassle of flipping back and forth between
the Networking menu and the Network drivers menu to change their
config settings.

Tested with menuconfig, gconfig, and xconfig.
gconfig has a small problem with this.  I think that it's
a bug in gconfig and I will take it up with Romain Lievin.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:45:12 -07:00
Sridhar Samudrala d1ad1ff299 [SCTP]: Fix potential null pointer dereference while handling an icmp error
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:44:10 -07:00
Christophe Lucas ee71a29eb5 [SCTP]: Audit return code of create_proc_*
From: Christophe Lucas <clucas@rotomalug.org>

Audit return of create_proc_* functions.

Signed-off-by: Christophe Lucas <clucas@rotomalug.org>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:38:07 -07:00
Victor Fusco 37da647d99 [NETLINK]: Fix "nocast type" warnings
From: Victor Fusco <victor@cetuc.puc-rio.br>

Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:35:43 -07:00
Thomas Graf 452f299da3 [PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeue
The current call to __qdisc_dequeue_head leads to a branch
misprediction for every loop iteration, the fact that the
most common priority is 2 makes this even worse.  This issue
has been brought up by Eric Dumazet <dada1@cosmosbay.com>
but unlike his solution which was to manually unroll the loop,
this approach preserves the possibility to increase the number
of bands at compile time. 

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:30:53 -07:00
Thomas Graf d7c7ed4dbc [PKT_SCHED]: Remove debugging leftover from textsearch ematch
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18 13:29:49 -07:00
Tommy Christensen f4637b55ba [VLAN]: Fix early vlan adding leads to not functional device
OK, I can see what's happening here. eth0 doesn't detect link-up until
after a few seconds, so when the vlan interface is opened immediately
after eth0 has been opened, it inherits the link-down state. Subsequently
the vlan interface is never properly activated and are thus unable to
transmit any packets.

dev->state bits are not supposed to be manipulated directly. Something
similar is probably needed for the netif_device_present() bit, although
I don't know how this is meant to work for a virtual device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 12:13:49 -07:00
Alexey Dobriyan ab611487d8 [NET]: __be'ify *_type_trans()
tr_type_trans(), hippi_type_trans() left as-is.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 12:08:43 -07:00
Phil Oester 84531c24f2 [NETFILTER]: Revert nf_reset change
Revert the nf_reset change that caused so much trouble, drop conntrack
references manually before packets are queued to packet sockets.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12 11:57:52 -07:00
Sam Ravnborg 6a2e9b738c [NET]: move config options out to individual protocols
Move the protocol specific config options out to the specific protocols.
With this change net/Kconfig now starts to become readable and serve as a
good basis for further re-structuring.

The menu structure is left almost intact, except that indention is
fixed in most cases. Most visible are the INET changes where several
"depends on INET" are replaced with a single ifdef INET / endif pair.

Several new files were created to accomplish this change - they are
small but serve the purpose that config options are now distributed
out where they belongs.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:13:56 -07:00
Sam Ravnborg d5950b4355 [NET]: add a top-level Networking menu to *config
Create a new top-level menu named "Networking" thus moving
net related options and protocol selection way from the drivers
menu and up on the top-level where they belong.

To implement this all architectures has to source "net/Kconfig" before
drivers/*/Kconfig in their Kconfig file. This change has been
implemented for all architectures.

Device drivers for ordinary NIC's are still to be found
in the Device Drivers section, but Bluetooth, IrDA and ax25
are located with their corresponding menu entries under the new
networking menu item.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:03:49 -07:00
Olaf Kirch 0b7f22aab4 [IPV4]: Prevent oops when printing martian source
In some cases, we may be generating packets with a source address that
qualifies as martian. This can happen when we're in the middle of setting
up the network, and netfilter decides to reject a packet with an RST.
The IPv4 routing code would try to print a warning and oops, because
locally generated packets do not have a valid skb->mac.raw pointer
at this point.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 21:01:42 -07:00
Julian Anastasov af9debd461 [IPVS]: Add and reorder bh locks after moving to keventd.
An addition to the last ipvs changes that move
update_defense_level/si_meminfo to keventd:

- ip_vs_random_dropentry now runs in process context and should use _bh
  locks to protect from softirqs

- update_defense_level still needs _bh locks after si_meminfo is called,
  for the same purpose

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:59:57 -07:00
Jesper Juhl f5b8adb4f5 [NET]: Trivial spelling fix patch for net/Kconfig
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:59:03 -07:00
Alexey Dobriyan 3182cd84f0 [SCTP]: __nocast annotations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11 20:57:47 -07:00
David S. Miller 79af02c253 [SCTP]: Use struct list_head for chunk lists, not sk_buff_head.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 21:47:49 -07:00
David S. Miller 9c05989bb2 [IPV6]: Fix warning in ip6_mc_msfilter.
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 21:44:39 -07:00
David L Stevens 84b42baef7 [IPV4]: fix IPv4 leave-group group matching
This patch fixes the multicast group matching for 
IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior
patch. Groups are identifiedby <group address,interface> and including
the interface address in the match will fail if a leave-group is done
by address when the join was done by index, or if different addresses
on the same interface are used in the join and leave.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:48:38 -07:00
David L Stevens 9951f036fe [IPV4]: (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & errno fix
1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state 
   multicast source filter APIs (IPv4 and IPv6)

2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be
   EADDRNOTAVAIL)

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:47:28 -07:00
David L Stevens 917f2f105e [IPV4]: multicast API "join" issues
1) In the full-state API when imsf_numsrc == 0
   errno should be "0", but returns EADDRNOTAVAIL

2) An illegal filter mode change
   errno should be EINVAL, but returns EADDRNOTAVAIL

3) Trying to do an any-source option without IP_ADD_MEMBERSHIP
   errno should be EINVAL, but returns EADDRNOTAVAIL

4) Adds comments for the less obvious error return values

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:45:16 -07:00
David L Stevens 8cdaaa15da [IPV4]: multicast API "join" issues
1) Changes IP_ADD_SOURCE_MEMBERSHIP and MCAST_JOIN_SOURCE_GROUP to ignore
   EADDRINUSE errors on a "courtesy join" -- prior membership or not
   is ok for these.

2) Adds "leave group" equivalence of (INCLUDE, empty) filters in the 
   delta-based API. Without this, mixing delta-based API calls that
   end in an (INCLUDE, empty) filter would not allow a subsequent
   regular IP_ADD_MEMBERSHIP. It also frees socket buffer memory that
   isn't needed for both the multicast group record and source filter.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:39:23 -07:00
David L Stevens ca9b907d14 [IPV4]: multicast API "join" issues
This patch corrects a few problems with the IP_ADD_MEMBERSHIP
socket option:

1) The existing code makes an attempt at reference counting joins when
   using the ip_mreqn/imr_ifindex interface. Joining the same group
   on the same socket is an error, whatever the API. This leads to
   unexpected results when mixing ip_mreqn by index with ip_mreqn by
   address, ip_mreq, or other API's. For example, ip_mreq followed by
   ip_mreqn of the same group will "work" while the same two reversed
   will not.
           Fixed to always return EADDRINUSE on a duplicate join and
   removed the (now unused) reference count in ip_mc_socklist.

2) The group-search list in ip_mc_join_group() is comparing a full 
   ip_mreqn structure and all of it must match for it to find the
   group. This doesn't correctly match a group that was joined with
   ip_mreq or ip_mreqn with an address (with or without an index). It
   also doesn't match groups that are joined by different addresses on
   the same interface. All of these are the same multicast group,
   which is identified by group address and interface index.
           Fixed the check to correctly match groups so we don't get
   duplicate group entries on the ip_mc_socklist.

3) The old code allocates a multicast address before searching for
   duplicates requiring it to free in various error cases. This
   patch moves the allocate until after the search and
   igmp_max_memberships check, so never a need to allocate, then free
   an entry.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:38:07 -07:00
Alexey Kuznetsov 4c866aa798 [IPV4]: Apply sysctl_icmp_echo_ignore_broadcasts to ICMP_TIMESTAMP as well.
This was the full intention of the original code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 17:34:46 -07:00
Victor Fusco 86a76caf87 [NET]: Fix sparse warnings
From: Victor Fusco <victor@cetuc.puc-rio.br>

Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:47 -07:00
David S. Miller b03efcfb21 [NET]: Transform skb_queue_len() binary tests into skb_queue_empty()
This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptyness as the test.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08 14:57:23 -07:00
KAMBAROV, ZAUR 7e8d7e3c9e [PATCH] coverity: sunrpc/xprt task null check
In __xprt_lock_write() we check to see if `task' is NULL, but in other places
we just go and dereference it.

`task' shouldn't be NULL anyway, so remove this test.

This defect was found automatically by Coverity Prevent, a static analysis
tool.

Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07 18:23:47 -07:00
David S. Miller 908a75c17a [TCP]: Never TSO defer under periods of congestion.
Congestion window recover after loss depends upon the fact
that if we have a full MSS sized frame at the head of the
send queue, we will send it.  TSO deferral can defeat the
ACK clocking necessary to exit cleanly from recovery.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:43:58 -07:00
Thomas Graf 63d886c96b [PKT_SCHED]: Blackhole queueing discipline
Useful in combination with classful qdiscs to drop or
temporary disable certain flows, e.g. one could block
specific ds flows with dsmark.

Unlike the noop qdisc it can be controlled by the user and
statistic accounting is done.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:29:16 -07:00
David S. Miller c1b4a7e695 [TCP]: Move to new TSO segmenting scheme.
Make TSO segment transmit size decisions at send time not earlier.

The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.

This is guided by tp->xmit_size_goal.  It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.

Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant.  These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.

A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side.  Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:24:38 -07:00
David S. Miller 0d9901df62 [TCP]: Break out send buffer expansion test.
This makes it easier to understand, and allows easier
tweaking of the heuristic later on.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:21:10 -07:00
David S. Miller cb83199a29 [TCP]: Do not call tcp_tso_acked() if no work to do.
In tcp_clean_rtx_queue(), if the TSO packet is not even partially
acked, do not waste time calling tcp_tso_acked().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:55 -07:00
David S. Miller a56476962e [TCP]: Kill bogus comment above tcp_tso_acked().
Everything stated there is out of data.  tcp_trim_skb()
does adjust the available socket send buffer space and
skb->truesize now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:41 -07:00
David S. Miller b4e26f5ea0 [TCP]: Fix send-side cpu utiliziation regression.
Only put user data purely to pages when doing TSO.

The extra page allocations cause two problems:

1) Add the overhead of the page allocations themselves.
2) Make us do small user copies when we get to the end
   of the TCP socket cache page.

It is still beneficial to purely use pages for TSO,
so we will do it for that case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:27 -07:00
David S. Miller aa93466bdf [TCP]: Eliminate redundant computations in tcp_write_xmit().
tcp_snd_test() is run for every packet output by a single
call to tcp_write_xmit(), but this is not necessary.

For one, the congestion window space needs to only be
calculated one time, then used throughout the duration
of the loop.

This cleanup also makes experimenting with different TSO
packetization schemes much easier.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:20:09 -07:00
David S. Miller 7f4dd0a943 [TCP]: Break out tcp_snd_test() into it's constituent parts.
tcp_snd_test() does several different things, use inline
functions to express this more clearly.

1) It initializes the TSO count of SKB, if necessary.
2) It performs the Nagle test.
3) It makes sure the congestion window is adhered to.
4) It makes sure SKB fits into the send window.

This cleanup also sets things up so that things like the
available packets in the congestion window does not need
to be calculated multiple times by packet sending loops
such as tcp_write_xmit().
    
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:54 -07:00
David S. Miller 55c97f3e99 [TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.
'nonagle' should be passed to the tcp_snd_test() function
as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the
tail of the write_queue.  This is because Nagle does not
apply to such frames since we cannot possibly tack more
data onto them.

However, while doing this __tcp_push_pending_frames() makes
all of the packets in the write_queue use this modified
'nonagle' value.

Fix the bug and simplify this function by just calling
tcp_write_xmit() directly if sk_send_head is non-NULL.

As a result, we can now make tcp_data_snd_check() just call
tcp_push_pending_frames() instead of the specialized
__tcp_data_snd_check().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:38 -07:00
David S. Miller a2e2a59c93 [TCP]: Fix redundant calculations of tcp_current_mss()
tcp_write_xmit() uses tcp_current_mss(), but some of it's callers,
namely __tcp_push_pending_frames(), already has this value available
already.

While we're here, fix the "cur_mss" argument to be "unsigned int"
instead of plain "unsigned".

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:23 -07:00
David S. Miller 92df7b518d [TCP]: tcp_write_xmit() tabbing cleanup
Put the main basic block of work at the top-level of
tabbing, and mark the TCP_CLOSE test with unlikely().

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:19:06 -07:00
David S. Miller a762a98007 [TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().
The tcp_cwnd_validate() function should only be invoked
if we actually send some frames, yet __tcp_push_pending_frames()
will always invoke it.  tcp_write_xmit() does the call for us,
so the call here can simply be removed.

Also, tcp_write_xmit() can be marked static.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:51 -07:00
David S. Miller f44b527177 [TCP]: Add missing skb_header_release() call to tcp_fragment().
When we add any new packet to the TCP socket write queue,
we must call skb_header_release() on it in order for the
TSO sharing checks in the drivers to work.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:34 -07:00
David S. Miller 84d3e7b957 [TCP]: Move __tcp_data_snd_check into tcp_output.c
It reimplements portions of tcp_snd_check(), so it
we move it to tcp_output.c we can consolidate it's
logic much easier in a later change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:18 -07:00
David S. Miller f6302d1d78 [TCP]: Move send test logic out of net/tcp.h
This just moves the code into tcp_output.c, no code logic changes are
made by this patch.

Using this as a baseline, we can begin to untangle the mess of
comparisons for the Nagle test et al.  We will also be able to reduce
all of the redundant computation that occurs when outputting data
packets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:18:03 -07:00
David S. Miller fc6415bcb0 [TCP]: Fix quick-ack decrementing with TSO.
On each packet output, we call tcp_dec_quickack_mode()
if the ACK flag is set.  It drops tp->ack.quick until
it hits zero, at which time we deflate the ATO value.

When doing TSO, we are emitting multiple packets with
ACK set, so we should decrement tp->ack.quick that many
segments.

Note that, unlike this case, tcp_enter_cwr() should not
take the tcp_skb_pcount(skb) into consideration.  That
function, one time, readjusts tp->snd_cwnd and moves
into TCP_CA_CWR state.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:17:45 -07:00
David S. Miller c65f7f00c5 [TCP]: Simplify SKB data portion allocation with NETIF_F_SG.
The ideal and most optimal layout for an SKB when doing
scatter-gather is to put all the headers at skb->data, and
all the user data in the page array.

This makes SKB splitting and combining extremely simple,
especially before a packet goes onto the wire the first
time.

So, when sk_stream_alloc_pskb() is given a zero size, make
sure there is no skb_tailroom().  This is achieved by applying
SKB_DATA_ALIGN() to the header length used here.

Next, make select_size() in TCP output segmentation use a
length of zero when NETIF_F_SG is true on the outgoing
interface.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:17:25 -07:00
David Chau 52609c0b56 [NET]: improve readability of dev_set_promiscuity() in net/core/dev.c
A trivial patch to improve the readability of dev_set_promiscuity()
in net/core/dev.c. New code does exactly the same thing as original
code.

Signed-off-by: David Chau <ddcc@mit.edu>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:11:06 -07:00
Robert Olsson 2f36895aa7 [IPV4]: More broken memory allocation fixes for fib_trie
Below a patch to preallocate memory when doing resize of trie (inflate halve)
If preallocations fails it just skips the resize of this tnode for this time.

The oops we got when killing bgpd (with full routing) is now gone. 
Patrick memory patch is also used.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:02:40 -07:00
Thomas Graf db1322b801 [DECNET]: Fix memset overflow on 64bit archs while dumping decnet routing rules
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:01:25 -07:00
Eric Dumazet bb1d23b026 [IPV4]: Bug fix in rt_check_expire()
- rt_check_expire() fixes (an overflow occured if size of the hash
  was >= 65536)

reminder of the bugfix:

The rt_check_expire() has a serious problem on machines with large
route caches, and a standard HZ value of 1000.

With default values, ie ip_rt_gc_interval = 60*HZ = 60000 ;

the loop count :

     for (t = ip_rt_gc_interval << rt_hash_log; t >= 0;


overflows (t is a 31 bit value) as soon rt_hash_log is >= 16  (65536
slots in route cache hash table).

In this case, rt_check_expire() does nothing at all

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 15:00:32 -07:00
Eric Dumazet 424c4b70cc [IPV4]: Use the fancy alloc_large_system_hash() function for route hash table
- rt hash table allocated using alloc_large_system_hash() function.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:58:19 -07:00
Eric Dumazet 22c047ccbc [NET]: Hashed spinlocks in net/ipv4/route.c
- Locking abstraction
- Spinlocks moved out of rt hash table : Less memory (50%) used by rt 
  hash table. it's a win even on UP.
- Sizing of spinlocks table depends on NR_CPUS

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:55:24 -07:00
Patrick McHardy f0e36f8cee [IPV4]: Handle large allocations in fib_trie
Inflating a node a couple of times makes it exceed the 128k kmalloc limit.
Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:44:55 -07:00
Herbert Xu e2ed4052aa [IPV6]: Makes IPv6 rcv registration happen last during initialisation.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:41:20 -07:00
Herbert Xu 30e224d76f [IPV4]: Fix crash in ip_rcv while booting related to netconsole
Makes IPv4 ip_rcv registration happen last in af_inet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:40:10 -07:00
Thomas Graf 023e09a767 [PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation
Current behaviour is to not report an error if a rate
estimator is created together with a qdisc and the
configuration of the rate estimator is bogus. This leads
to unexpected behaviour because the user is not notified.

New behaviour is to report the error and let the whole
qdisc creation operation fail so the user is able to fix
his mistake.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:15:53 -07:00
Thomas Graf 3d54b82fdf [PKT_SCHED]: Cleanup qdisc creation and alignment macros
Adds qdisc_alloc() to share code between qdisc_create()
and qdisc_create_dflt(). Hides the qdisc alignment behind
macros and makes use of them.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:15:09 -07:00
Thomas Graf e176fe8954 [NET]: Remove unused security member in sk_buff
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:12:44 -07:00
Patrick McHardy 3154e540e3 [NET]: net/core/filter.c: make len cover the entire packet
As suggested by Herbert Xu:

Since we don't require anything to be in the linear packet range
anymore make len cover the entire packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:10:40 -07:00
Patrick McHardy 0b05b2a49e [NET]: Consolidate common code in net/core/filter.c
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:10:21 -07:00
Patrick McHardy 6935d46c2d [NET]: Remove redundant code in net/core/filter.c
skb_header_pointer handles linear and non-linear data, no need to handle
linear data again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05 14:08:57 -07:00
Patrick McHardy 9666dae510 [NETFILTER]: Fix connection tracking bug in 2.6.12
In 2.6.12 we started dropping the conntrack reference when a packet
leaves the IP layer. This broke connection tracking on a bridge,
because bridge-netfilter defers calling some NF_IP_* hooks to the bridge
layer for locally generated packets going out a bridge, where the
conntrack reference is no longer available. This patch keeps the
reference in this case as a temporary solution, long term we will
remove the defered hook calling. No attempt is made to drop the
reference in the bridge-code when it is no longer needed, tc actions
could already have sent the packet anywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 16:04:44 -07:00
Denis Vlasenko ff593c592a [NET]: Micro optimization in eth_header()
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:49:06 -07:00
YOSHIFUJI Hideaki 7fe40f73d7 [IPV6]: remove more unused IPV6_AUTHHDR things.
Remove two more unused IPV6_AUTHHDR option things, 
which I failed to remove them last time,
plus, mark IPV6_AUTHHDR obsolete.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:46:24 -07:00
Neil Horman fb3d89498d [IPVS]: Close race conditions on ip_vs_conn_tab list modification
In an smp system, it is possible for an connection timer to expire, calling
ip_vs_conn_expire while the connection table is being flushed, before
ct_write_lock_bh is acquired.

Since the list iterator loop in ip_vs_con_flush releases and re-acquires the
spinlock (even though it doesn't re-enable softirqs), it is possible for the
expiration function to modify the connection list, while it is being traversed
in ip_vs_conn_flush.

The result is that the next pointer gets set to NULL, and subsequently
dereferenced, resulting in an oops.

Signed-off-by: Neil Horman <nhorman@redhat.com>
Acked-by: JulianAnastasov
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:40:02 -07:00
Robert Olsson f835e471b5 [IPV4]: Broken memory allocation in fib_trie
This should help up the insertion... but the resize is more crucial.
and complex and needs some thinking. 

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 15:00:39 -07:00
Vlad Yasevich 2f85a42964 [SCTP] Make init & delayed sack timeouts configurable by user.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:24:23 -07:00
Maxime Bizon 7a1af5d7bb [IPV4]: ipconfig.c: fix dhcp timeout behaviour
I think there is a small bug in ipconfig.c in case IPCONFIG_DHCP is set
and dhcp is used.

When a DHCPOFFER is received, ip address is kept until we get DHCPACK.
If no ack is received, ic_dynamic() returns negatively, but leaves the
offered ip address in ic_myaddr.

This makes the main loop in ip_auto_config() break and uses the maybe
incomplete configuration.

Not sure if it's the best way to do, but the following trivial patch
correct this. 

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:21:12 -07:00
Dietmar Eggemann 2c2910a401 [IPV4]: Snmpv2 Mib IP counter ipInAddrErrors support
I followed Thomas' proposal to see every martian destination as a case
where the ipInAddrErrors counter has to be incremented. There are
two advantages by doing so: (1) The relation between the ipInReceive
counter and all the other ipInXXX counters is more accurate in the
case the RTN_UNICAST code check fails and (2) it makes the code in
ip_route_input_slow easier.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:06:23 -07:00
YOSHIFUJI Hideaki ae9cda5d65 [IPV6]: Don't dump temporary addresses twice
Each IPv6 Temporary Address (w/ CONFIG_IPV6_PRIVACY) is dumped twice
to netlink.

Because temporary addresses are listed in idev->addr_list,
there's no need to dump idev->tempaddr separately.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 13:00:30 -07:00
Patrick McHardy 8a47077a0b [NETLINK]: Missing padding fields in dumped structures
Plug holes with padding fields and initialized them to zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:56:45 -07:00
Patrick McHardy 9ef1d4c7c7 [NETLINK]: Missing initializations in dumped data
Mostly missing initialization of padding fields of 1 or 2 bytes length,
two instances of uninitialized nlmsgerr->msg of 16 bytes length.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:55:30 -07:00
Patrick McHardy b3563c4fbf [NETLINK]: Clear padding in netlink messages
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:54:43 -07:00
Harald Welte 4095ebf1e6 [NETFILTER]: ipt_CLUSTERIP: fix ARP mangling
This patch adds mangling of ARP requests (in addition to replies),
since ARP caches are made from snooping both requests and replies.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:49:30 -07:00
David S. Miller 85c1937b26 [EBTABLES]: Fix thinkos in ebt_log.c
When converting over the skb_header_pointer(), I converted parts of
this module incorrectly.  Kill the 'u' union in ebt_log() and all the
bogus references to it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28 12:39:40 -07:00
pageexec 4da62fc70d [IPVS]: Fix for overflows
From: <pageexec@freemail.hu>

$subject was fixed in 2.4 already, 2.6 needs it as well.

The impact of the bugs is a kernel stack overflow and privilege escalation
from CAP_NET_ADMIN via the IP_VS_SO_SET_STARTDAEMON/IP_VS_SO_GET_DAEMON
ioctls.  People running with 'root=all caps' (i.e., most users) are not
really affected (there's nothing to escalate), but SELinux and similar
users should take it seriously if they grant CAP_NET_ADMIN to other users.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 16:00:19 -07:00
David S. Miller d470e3b483 [NETLINK]: Fix two socket hashing bugs.
1) netlink_release() should only decrement the hash entry
   count if the socket was actually hashed.

   This was causing hash->entries to underflow, which
   resulting in all kinds of troubles.

   On 64-bit systems, this would cause the following
   conditional to erroneously trigger:

	err = -ENOMEM;
	if (BITS_PER_LONG > 32 && unlikely(hash->entries >= UINT_MAX))
		goto err;

2) netlink_autobind() needs to propagate the error return from
   netlink_insert().  Otherwise, callers will not see the error
   as they should and thus try to operate on a socket with a zero pid,
   which is very bad.

   However, it should not propagate -EBUSY.  If two threads race
   to autobind the socket, that is fine.  This is consistent with the
   autobind behavior in other protocols.

   So bug #1 above, combined with this one, resulted in hangs
   on netlink_sendmsg() calls to the rtnetlink socket.  We'd try
   to do the user sendmsg() with the socket's pid set to zero,
   later we do a socket lookup using that pid (via the value we
   stashed away in NETLINK_CB(skb).pid), but that won't give us the
   user socket, it will give us the rtnetlink socket.  So when we
   try to wake up the receive queue, we dive back into rtnetlink_rcv()
   which tries to recursively take the rtnetlink semaphore.

Thanks to Jakub Jelink for providing backtraces.  Also, thanks to
Herbert Xu for supplying debugging patches to help track this down,
and also finding a mistake in an earlier version of this fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 15:31:51 -07:00
Robert Olsson 64053beeb5 [PKTGEN]: Fix random packet sizes causing panic
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 15:27:10 -07:00
Adrian Bunk 60fe740320 [TCP]: Let TCP_CONG_ADVANCED default to n
It doesn't seem to make much sense to let an "If unsure, say N." option 
default to y.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 15:21:15 -07:00
David S. Miller 6c3607676c [IPV4]: Fix thinko in TCP_CONG_BIC default.
Since it is tristate when we offer it as a choice, we should
definte it also as tristate when forcing it as the default.
Otherwise kconfig warns.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26 15:20:20 -07:00
Linus Torvalds 2031d0f586 Merge Christoph's freeze cleanup patch 2005-06-25 17:16:53 -07:00
Christoph Lameter 3e1d1d28d9 [PATCH] Cleanup patch for process freezing
1. Establish a simple API for process freezing defined in linux/include/sched.h:

   frozen(process)		Check for frozen process
   freezing(process)		Check if a process is being frozen
   freeze(process)		Tell a process to freeze (go to refrigerator)
   thaw_process(process)	Restart process
   frozen_process(process)	Process is frozen now

2. Remove all references to PF_FREEZE and PF_FROZEN from all
   kernel sources except sched.h

3. Fix numerous locations where try_to_freeze is manually done by a driver

4. Remove the argument that is no longer necessary from two function calls.

5. Some whitespace cleanup

6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
   cleared before setting PF_FROZEN, recalc_sigpending does not check
   PF_FROZEN).

This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!

Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25 17:10:13 -07:00
David S. Miller c54d7e03c3 [SUNRPC]: Fix {s,}size_t printf format strings in xprt.c
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 19:57:07 -07:00
David S. Miller a6484045fd [TCP]: Do not present confusing congestion control options by default.
Create TCP_CONG_ADVANCED option, akin to IP_ADVANCED_ROUTER, which
when disabled will bypass all of the congestion control Kconfig
options and leave the user with a safe default.

That safe default is currently BIC-TCP with new Reno as a fallback.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 18:07:51 -07:00
David S. Miller bb298ca3ce [IPV4]: Move FIB lookup algorithm choice under IP_ADVANCED_ROUTING
Most users need not be concerned with a complex choice of what
FIB lookup algorithm to use.  So give them the safe default of
IP_FIB_HASH if IP_ADVANCED_ROUTING is disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 17:50:53 -07:00
David S. Miller f7704347a7 [PKT_SCHED]: Make TEXTSEARCH* options only selected.
Do not present these confusing new options to the user
unless he picked some facility that makes use of it,
such as NET_EMATCH_TEXT.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24 17:39:03 -07:00
Linus Torvalds 59a49e3871 Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2005-06-24 00:31:46 -07:00
Adrian Bunk 52c1da3953 [PATCH] make various thing static
Another rollup of patches which give various symbols static scope

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 00:06:43 -07:00
NeilBrown 5ba266d632 [PATCH] knfsd: nfsd4: fix probe_callback
rpc_create_client was modified recently to do its own (synchronous) NULL ping
of the server.  We'd rather do that on our own, asynchronously, so that we
don't have to block the nfsd thread doing the probe, and so that setclientid
handling (hence, client mounts) can proceed normally whether the callback is
succesful or not.  (We can still function fine without the callback
channel--we just won't be able to give out delegations till it's verified to
work.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 00:06:30 -07:00
David S. Miller f2d368fa3e [PKT_SCHED]: Make NET_EMATCH_TEXT select TEXTSEARCh
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 23:55:41 -07:00
Thomas Graf d675c989ed [PKT_SCHED]: Packet classification based on textsearch (ematch)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 21:00:58 -07:00
Thomas Graf 3fc7e8a6d8 [NET]: skb_find_text() - Find a text pattern in skb data
Finds a pattern in the skb data according to the specified
textsearch configuration. Use textsearch_next() to retrieve
subsequent occurrences of the pattern. Returns the offset
to the first occurrence or UINT_MAX if no match was found.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 21:00:17 -07:00
Thomas Graf 677e90eda3 [NET]: Zerocopy sequential reading of skb data
Implements sequential reading for both linear and non-linear
skb data at zerocopy cost. The data is returned in chunks of
arbitary length, therefore random access is not possible.

Usage:
	from	 := 0
	to	 := 128
	state	 := undef
	data	 := undef
	len	 := undef
	consumed := 0

	skb_prepare_seq_read(skb, from, to, &state)
	while (len = skb_seq_read(consumed, &data, &state)) != 0 do
		/* do something with 'data' of length 'len' */
		if abort then
			/* abort read if we don't wait for
			 * skb_seq_read() to return 0 */
			skb_abort_seq_read(&state)
			return
		endif
		/* not necessary to consume all of 'len' */
		consumed += len
	done

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:59:51 -07:00
Stephen Hemminger 5f8ef48d24 [TCP]: Allow choosing TCP congestion control via sockopt.
Allow using setsockopt to set TCP congestion control to use on a per
socket basis.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:37:36 -07:00
Stephen Hemminger 51b0bdedb8 [NET]: Separate two usages of netdev_max_backlog.
Separate out the two uses of netdev_max_backlog. One controls the
upper bound on packets processed per softirq, the new name for this is
netdev_budget; the other controls the limit on packets queued via
netif_rx.

Increase the max_backlog default to account for faster processors.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:14:40 -07:00
Stephen Hemminger 31aa02c53c [NET]: Eliminate netif_rx massive packet drops.
Eliminate the throttling behaviour when the netif receive queue fills
because it behaves badly when using high speed networks under load.
The throttling cause multiple packet drops that cause TCP to go into
slow start mode. The same effective patch has been part of BIC TCP and
H-TCP as well as part of Web100.

The existing code drops 100's of packets when the queue fills;
this changes it to individual packet drop-tail. 

Signed-off-by: Stephen Hemmminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:12:48 -07:00
Stephen Hemminger 34008d8c63 [NET]: Remove obsolete netif_rx congestion sensing mechanism.
Remove the congestion sensing mechanism from netif_rx, and always
return either full or empty.  Almost no driver checks the return value
from netif_rx, and those that do only use it for debug messages.

The original design of netif_rx was to do flow control based on the
receive queue, but NAPI has supplanted this and no driver uses the
feedback.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:10:00 -07:00
Stephen Hemminger c1ebcdb8c4 [NET]: Remove obsolete fastroute stats.
Remove last vestiages of fastroute code that is no longer used.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 20:08:59 -07:00
John Heffner 0e57976b63 [TCP]: Add Scalable TCP congestion control module.
This patch implements Tom Kelly's Scalable TCP congestion control algorithm 
for the modular framework.

The algorithm has some nice scaling properties, and has been used a fair bit 
in research, though is known to have significant fairness issues, so it's not 
really suitable for general purpose use.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:29:07 -07:00
Baruch Even a7868ea68d [TCP]: Add H-TCP congestion control module.
H-TCP is a congestion control algorithm developed at the Hamilton Institute, by
Douglas Leith and Robert Shorten. It is extending the standard Reno algorithm
with mode switching is thus a relatively simple modification.

H-TCP is defined in a layered manner as it is still a research platform. The
basic form includes the modification of beta according to the ratio of maxRTT
to min RTT and the alpha=2*factor*(1-beta) relation, where factor is dependant
on the time since last congestion.

The other layers improve convergence by adding appropriate factors to alpha.

The following patch implements the H-TCP algorithm in it's basic form.

Signed-Off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:28:11 -07:00
Stephen Hemminger b87d8561d8 [TCP]: Add TCP Vegas congestion control module.
TCP Vegas code modified for the new TCP infrastructure.  
Vegas now uses microsecond resolution timestamps for 
better estimation of performance over higher speed links.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:27:19 -07:00
Daniele Lacamera 835b3f0c0d [TCP]: Add TCP Hybla congestion control module.
TCP Hybla congestion avoidance.

- "In heterogeneous networks, TCP connections that incorporate a
terrestrial or satellite radio link are greatly disadvantaged with
respect to entirely wired connections, because of their longer round
trip times (RTTs). To cope with this problem, a new TCP proposal, the
TCP Hybla, is presented and discussed in the paper[1]. It stems from an
analytical evaluation of the congestion window dynamics in the TCP
standard versions (Tahoe, Reno, NewReno), which suggests the necessary
modifications to remove the performance dependence on RTT.[...]"[1]

[1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for
heterogeneous networks",
International Journal of Satellite Communications and Networking
Volume 22, Issue 5 , Pages 547 - 566. September 2004.

Signed-off-by: Daniele Lacamera (root at danielinux.net)net
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:26:34 -07:00
John Heffner a628d29b56 [TCP]: Add High Speed TCP congestion control module.
Sally Floyd's high speed TCP congestion control.
This is useful for comparison and research.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:24:58 -07:00
Stephen Hemminger 8727076289 [TCP]: Add TCP Westwood congestion control module.
This is the existing 2.6.12 Westwood code moved from tcp_input
to the new congestion framework. A lot of the inline functions
have been eliminated to try and make it clearer.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:24:09 -07:00
Stephen Hemminger 83803034f4 [TCP]: Add TCP BIC congestion control module.
TCP BIC congestion control reworked to use the new congestion control 
infrastructure. This version is more up to date than the BIC
code in 2.6.12; it incorporates enhancements from BICTCP 1.1, 
to handle low latency links.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:23:25 -07:00
Stephen Hemminger 056ede6cfa [TCP]: Report congestion control algorithm in tcp_diag.
Enhancement to the tcp_diag interface used by the iproute2 ss command
to report the tcp congestion control being used by a socket.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:21:28 -07:00
Stephen Hemminger 7c99c909fa [TCP]: Change tcp_diag to use the existing __RTA_PUT() macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:20:36 -07:00
Stephen Hemminger 317a76f9a4 [TCP]: Add pluggable congestion control algorithm infrastructure.
Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules.  The legacy "new RENO" algorithm is used as a starting
point and fallback.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23 12:19:55 -07:00
Paulo Marques 543537bd92 [PATCH] create a kstrdup library function
This patch creates a new kstrdup library function and changes the "local"
implementations in several places to use this function.

Most of the changes come from the sound and net subsystems.  The sound part
had already been acknowledged by Takashi Iwai and the net part by David S.
Miller.

I left UML alone for now because I would need more time to read the code
carefully before making changes there.

Signed-off-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23 09:45:18 -07:00
Linus Torvalds 060de20e82 Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2005-06-22 23:11:50 -07:00
Shaun Pereira ebc3f64b86 [X25]: Fast select with no restriction on response
This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data".  It allows use of the Fast-Select-Acceptance
optional user facility for X.25.

This patch just implements fast select with no restriction on response
(NRR).  What this means (according to ITU-T Recomendation 10/96 section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data.  This patch allows such a response.  

The called DTE can also respond with a clear-request packet that contains
call-user-data.  However, this feature is currently not implemented by the
patch.

How is Fast Select Acceptance used?
By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance,  
After a listen socket in created and bound as follows
	socket(AF_X25, SOCK_SEQPACKET, 0);
	bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));
but before a listen system call is made, the following ioctl should be used.
	ioctl(call_soc,SIOCX25CALLACCPTAPPRV);
Now the listen system call can be made
	listen(call_soc, 4);
After this, an incoming-call packet will be accepted, but no call-accepted 
packet will be sent back until the following system call is made on the socket
that accepts the call
	ioctl(vc_soc,SIOCX25SENDCALLACCPT);
The network (or cisco xot router used for testing here) will allow the 
application server's call-user-data in the call-accepted packet, 
provided the call-request was made with Fast-select NRR.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:16:17 -07:00
Shaun Pereira cb65d506c3 [X25]: Selective sub-address matching with call user data.
From: Shaun Pereira <spereira@tusc.com.au>

This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router).  Details
are as follows.

Current state of module:

A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/ incoming call packet at the listening x.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.

If the server needs to choose to accept a particular call request/ incoming
call packet arriving at its listening x25 address, then the kernel has to
allow a match of call user data present in the call request packet with its
own.  This is required when multiple servers listen at the same x25 address
and device interface.  The kernel currently matches ALL call user data, if
present.

Current Changes:

This patch is a follow up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet, that the kernel will match.  By
default no call user data is matched, even if call user data is present. 
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel after which the passed number of octets will be matched. 
Otherwise the kernel behavior is exactly as the original implementation.

This patch also ensures that as is normally the case, no call user data
will be present in the Call accepted / call connected packet sent back to
the caller 

Future Changes on next patch:

There are cases however when call user data may be present in the call
accepted packet.  According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used.  My next patch will include this
fast select utility and the ability to send up to 128 octets call user data
in the call accepted packet provided the fast select facility is used.  I
am currently testing this, again with xot on linux and cisco.  

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>

(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:15:01 -07:00
James Lamanna 68d3187200 [EBTABLES]: vfree() checking cleanups
From: jlamanna@gmail.com

ebtables.c vfree() checking cleanups.

Signed-off by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:12:57 -07:00
Nishanth Aravamudan 285b3afefa [ATALK] aarp: replace schedule_timeout() with msleep()
From: Nishanth Aravamudan <nacc@us.ibm.com>

Use msleep() instead of schedule_timeout() to guarantee the task
delays as expected. The current code is not wrong, but it does not account for
early return due to signals, so I think msleep() should be appropriate.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:11:44 -07:00
Chuck Short 7abaa27c1c [IPV4]: Fix route.c gcc4 warnings
Signed-off by: Chuck Short <zulcss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:10:23 -07:00
Jeff Moyer fbeec2e155 [NETPOLL]: allow multiple netpoll_clients to register against one interface
This patch provides support for registering multiple netpoll clients to the
same network device.  Only one of these clients may register an rx_hook,
however.  In practice, this restriction has not been problematic.  It is
worth mentioning, though, that the current design can be easily extended to
allow for the registration of multiple rx_hooks.

The basic idea of the patch is that the rx_np pointer in the netpoll_info
structure points to the struct netpoll that has rx_hook filled in.  Aside
from this one case, there is no need for a pointer from the struct
net_device to an individual struct netpoll.

A lock is introduced to protect the setting and clearing of the np_rx
pointer.  The pointer will only be cleared upon netpoll client module
removal, and the lock should be uncontested.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:05:59 -07:00
Jeff Moyer 115c1d6e61 [NETPOLL]: Introduce a netpoll_info struct
This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll.  The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll;  and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.

The struct netpoll is now pointed to by the netpoll_info structure.  As
such, the previous behaviour of the code is preserved.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 22:05:31 -07:00
Eric Dumazet f31f5f0512 [NET]: dont use strlen() but the result from a prior sprintf()
Small patch to save an unecessary call to strlen() : sprintf() gave us
the length, just trust it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 14:32:51 -07:00
Chuck Lever ae3884621b [PATCH] RPC: kick off socket connect operations faster
Make the socket transport kick the event queue to start socket connects
 immediately.  This should improve responsiveness of applications that are
 sensitive to slow mount operations (like automounters).

 We are now also careful to cancel the connect worker before destroying
 the xprt.  This eliminates a race where xprt_destroy can finish before
 the connect worker is even allowed to run.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  Hard-code impossibly small connect timeout.

 Version: Fri, 29 Apr 2005 15:32:01 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:32 -04:00
Chuck Lever 20e5ac828d [PATCH] RPC: TCP reconnects are too slow
When the network layer reports a connection close, the RPC task
 waiting to reconnect should be notified so it can retry immediately
 instead of waiting for the normal connection establishment timeout.

 This reverts a change made in 2.6.6 as part of adding client support
 for RPC over TCP socket idle timeouts.

 Test-plan:
 Destructive testing with NFS over TCP mounts.

 Version: Fri, 29 Apr 2005 15:31:46 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:32 -04:00
Trond Myklebust 0f9dc2b168 [PATCH] RPC: Clean up socket autodisconnect
Cancel autodisconnect requests inside xprt_transmit() in order to avoid
 races.
 Use more efficient del_singleshot_timer_sync()

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:31 -04:00
Trond Myklebust 14b218a8e4 [PATCH] RPC: Ensure rpc calls respects the RPC_NOINTR flag
For internal purposes, the rpc_clnt_sigmask() call is replaced by
 a call to rpc_task_sigmask(), which ensures that the current task
 sigmask respects both the client cl_intr flag and the per-task NOINTR flag.

 Problem noted by Jiaying Zhang.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:30 -04:00
Andreas Gruenbacher 9ba02638e4 [PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a single port
The NFS and NFSACL programs run on the same RPC transport.  This patch adds
 support for this by converting svc_program into a chained list of programs
 (server-side).

 Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
 Signed-off-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Andrew Morton <akpm@osdl.org>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:22 -04:00
Andreas Gruenbacher bd8100e7ed [PATCH] RPC: Encode and decode arbitrary XDR arrays
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
 Acked-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:20 -04:00
Trond Myklebust 7e06b53d79 [PATCH] RPC: fix accounting bug in the case of a truncated RPC message
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:19 -04:00
Olaf Kirch e053d1ab62 [PATCH] RPC: Lazy RPC receive buffer allocation
Signed-off-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:19 -04:00
Andreas Gruenbacher 007e251f2b [PATCH] RPC: Allow multiple RPC client programs to share the same transport
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
 Acked-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:18 -04:00
Andreas Gruenbacher cdf477068e [PATCH] RPC: Return -EPFNOSUPPORT for RPC programs that are unavailable
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
 Signed-off-by: Olaf Kirch <okir@suse.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:17 -04:00
J. Bruce Fields 6a19275ada [PATCH] RPC: [PATCH] improve rpcauthauth_create error returns
Currently we return -ENOMEM for every single failure to create a new auth.
 This is actually accurate for auth_null and auth_unix, but for auth_gss it's a
 bit confusing.

 Allow rpcauth_create (and the ->create methods) to return errors.  With this
 patch, the user may sometimes see an EINVAL instead.  Whee.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:16 -04:00
J. Bruce Fields 438b6fdebf [PATCH] RPC: Don't fall back from krb5p to krb5i
We shouldn't be silently falling back from krb5p to krb5i.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:16 -04:00
Trond Myklebust 96651ab341 [PATCH] RPC: Shrink struct rpc_task by switching to wait_on_bit()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:07 -04:00
Trond Myklebust 5ee0ed7d3a [PATCH] RPC: Make rpc_create_client() probe server for RPC program+version support
Ensure that we don't create an RPC client without checking that the server
 does indeed support the RPC program + version that we are trying to set up.

 This enables us to immediately return an error to "mount" if it turns out
 that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:04 -04:00
Trond Myklebust 5b616f5d59 [PATCH] RPC: Make rpc_create_client() destroy the transport on failure.
This saves us a couple of lines of cleanup code for each call.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:03 -04:00
Trond Myklebust 334ccfd545 [PATCH] RPC: Ensure XDR iovec length is initialized correctly in call_header
Fix up call_header() so that it calls xdr_adjust_iovec().
 Fix calculation of the scratch buffer length in xdr_init_encode().

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:02 -04:00
Trond Myklebust d05fdb0cec [PATCH] RPC: Fix a race with rpc_restart_call()
If the task->tk_exit() wants to restart the RPC call after delaying
 then the current RPC code will clobber the timer by calling
 rpc_delete_timer() immediately after re-entering the loop in
 __rpc_execute().

 Problem noticed by Oleg Nesterov <oleg@tv-sign.ru>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22 16:07:01 -04:00
Harald Welte 5d927eb010 [NETFILTER]: Fix handling of ICMP packets (RELATED) in ipt_CLUSTERIP target.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22 12:37:50 -07:00
Kumar Gala b535420739 [PATCH] Fix extra double quote in IPV4 Kconfig
Kconfig option had an extra double quote at the end of the line
which was causing in warning when building.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22 10:40:39 -07:00
David S. Miller 90f66914c8 [IPV4]: Fix fib_trie.c's args to fib_dump_info().
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:43:28 -07:00
Patrick McHardy 047601bf7c [NETFILTER]: Fix ip6t_LOG sit tunnel logging
Sit tunnel logging is currently broken:

MAC=01:23:45:67:89:ab->01:23:45:47:89:ac TUNNEL=123.123.  0.123-> 12.123.  6.123

Apart from the broken IP address, MAC addresses are printed differently
for sit tunnels than for everything else.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:07:13 -07:00
Patrick McHardy 2715bcf9ef [NETFILTER]: Drop conntrack reference in ip_call_ra_chain()/ip_mr_input()
Drop reference before handing the packets to raw_rcv()

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:06:24 -07:00
Patrick McHardy 6150bacfec [NETFILTER]: Check TCP checksum in ipt_REJECT
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:03:46 -07:00
Keir Fraser e3be8ba792 [NETFILTER]: Avoid unncessary checksum validation in UDP connection tracking
Signed-off-by: Keir Fraser <Keir.Fraser@xl.cam.ac.uk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:03:23 -07:00
Patrick McHardy 97216c799a [NETFILTER]: Missing owner-field initialization in ip6table_raw
I missed this one when fixing up iptable_raw.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:03:01 -07:00
Phil Oester 1d3cdb41f5 [NETFILTER]: expectation timeouts are compulsory
Since expectation timeouts were made compulsory [1], there is no need to
check for them in ip_conntrack_expect_insert.

[1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018143.html

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:02:42 -07:00
Patrick McHardy e98231858b [NETFILTER]: Restore netfilter assumptions in IPv6 multicast
Netfilter assumes that skb->data == skb->nh.ipv6h

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:02:15 -07:00
Patrick McHardy 18b8afc771 [NETFILTER]: Kill nf_debug
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:01:57 -07:00
Patrick McHardy e45b1be8bc [NETFILTER]: Kill lockhelp.h
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 14:01:30 -07:00
David L Stevens c9e3e8b695 [IPV6]: multicast join and misc
Here is a simplified version of the patch to fix a bug in IPv6
multicasting. It:

1) adds existence check & EADDRINUSE error for regular joins
2) adds an exception for EADDRINUSE in the source-specific multicast
        join (where a prior join is ok)
3) adds a missing/needed read_lock on sock_mc_list; would've raced
        with destroying the socket on interface down without
4) adds a "leave group" in the (INCLUDE, empty) source filter case.
        This frees unneeded socket buffer memory, but also prevents
        an inappropriate interaction among the 8 socket options that
        mess with this. Some would fail as if in the group when you
        aren't really.

Item #4 had a locking bug in the last version of this patch; rather than
removing the idev->lock read lock only, I've simplified it to remove
all lock state in the path and treat it as a direct "leave group" call for
the (INCLUDE,empty) case it covers. Tested on an MP machine. :-)

Much thanks to HoerdtMickael <hoerdt@clarinet.u-strasbg.fr> who
reported the original bug.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 13:58:25 -07:00
Jamal Hadi Salim 0d51aa80a9 [IPV6]: V6 route events reported with wrong netlink PID and seq number
Essentially netlink at the moment always reports a pid and sequence of 0
always for v6 route activities. 
To understand the repurcassions of this look at:
http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html

While fixing this, i took the liberty to resolve the outstanding issue
of IPV6 routes inserted via ioctls to have the correct pids as well.

This patch tries to behave as close as possible to the v4 routes i.e
maintains whatever PID the socket issuing the command owns as opposed to
the process. That made the patch a little bulky.

I have tested against both netlink derived utility to add/del routes as
well as ioctl derived one. The Quagga folks have tested against quagga.
This fixes the problem and so far hasnt been detected to introduce any
new issues.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 13:51:04 -07:00
Robert Olsson 19baf839ff [IPV4]: Add LC-Trie FIB lookup algorithm.
Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21 12:43:18 -07:00
Robert Olsson 246955fe4c [NETLINK]: fib_lookup() via netlink
Below is a more generic patch to do fib_lookup via netlink. For others 
we should say that we discussed this as a way to verify route selection.
It's also possible there are others uses for this.

In short the fist half of struct fib_result_nl is filled in by caller 
and netlink call fills in the other half and returns it.

In case anyone is interested there is a corresponding user app to compare 
the full routing table this was used to test implementation of the LC-trie. 

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:36:39 -07:00
Alexey Dobriyan f6e276ee67 [ATALK]: endian annotations
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:32:05 -07:00
Herbert Xu dd87147eed [IPSEC]: Add XFRM_STATE_NOPMTUDISC flag
This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states.  It is
similar to the nopmtudisc on IPIP/GRE tunnels.  It only has an effect
on IPv4 tunnel mode states.  For these states, it will ensure that the
DF flag is always cleared.

This is primarily useful to work around ICMP blackholes.

In future this flag could also allow a larger MTU to be set within the
tunnel just like IPIP/GRE tunnels.  This could be useful for short haul
tunnels where temporary fragmentation outside the tunnel is desired over
smaller fragments inside the tunnel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:21:43 -07:00
Herbert Xu d094cd83c0 [IPSEC]: Add xfrm_state_afinfo->init_flags
This patch adds the xfrm_state_afinfo->init_flags hook which allows
each address family to perform any common initialisation that does
not require a corresponding destructor call.

It will be used subsequently to set the XFRM_STATE_NOPMTUDISC flag
in IPv4.

It also fixes up the error codes returned by xfrm_init_state.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:19:41 -07:00
Herbert Xu 72cb6962a9 [IPSEC]: Add xfrm_init_state
This patch adds xfrm_init_state which is simply a wrapper that calls
xfrm_get_type and subsequently x->type->init_state.  It also gets rid
of the unused args argument.

Abstracting it out allows us to add common initialisation code, e.g.,
to set family-specific flags.

The add_time setting in xfrm_user.c was deleted because it's already
set by xfrm_state_alloc.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:18:08 -07:00
Frank Filz 3f7a87d2fa [SCTP] sctp_connectx() API support
Implements sctp_connectx() as defined in the SCTP sockets API draft by
tunneling the request through a setsockopt().

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20 13:14:57 -07:00
David S. Miller 7df551254a [TCP]: Fix sysctl_tcp_low_latency
When enabled, this should disable UCOPY prequeue'ing altogether,
but it does not due to a missing test.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 23:01:10 -07:00
Jesper Juhl f7d7fc0322 [IPV4]: [4/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch changes the type of the third parameter 'length' of the 
raw_send_hdrinc() function from 'int' to 'size_t'.
This makes sense since this function is only ever called from one 
location, and the value passed as the third parameter in that location is 
itself of type size_t, so this makes the recieving functions parameter 
type match. Also, inside raw_send_hdrinc() the 'length' variable is 
used in comparisons with unsigned values and passed as parameter to 
functions expecting unsigned values (it's used in a single comparison with 
a signed value, but that one can never actually be negative so the patch 
also casts that one to size_t to stop gcc worrying, and it is passed in a 
single instance to memcpy_fromiovecend() which expects a signed int, but 
as far as I can see that's not a problem since the value of 'length' 
shouldn't ever exceed the value of a signed int).

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 23:00:34 -07:00
Jesper Juhl 93765d8a43 [IPV4]: [3/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch changes the type of the local variable 'i' in 
raw_probe_proto_opt() from 'int' to 'unsigned int'. The only use of 'i' in 
this function is as a counter in a for() loop and subsequent index into 
the msg->msg_iov[] array.
Since 'i' is compared in a loop to the unsigned variable msg->msg_iovlen 
gcc -W generates this warning : 

net/ipv4/raw.c:340: warning: comparison between signed and unsigned

Changing 'i' to unsigned silences this warning and is safe since the array 
index can never be negative anyway, so unsigned int is the logical type to 
use for 'i' and also enables a larger msg_iov[] array (but I don't know if 
that will ever matter).

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 23:00:15 -07:00
Jesper Juhl 926d4b8122 [IPV4]: [2/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch gets rid of the following gcc -W warning in net/ipv4/raw.c :

net/ipv4/raw.c:387: warning: comparison of unsigned expression < 0 is always false

Since 'len' is of type size_t it is unsigned and can thus never be <0, and 
since this is obvious from the function declaration just a few lines above 
I think it's ok to remove the pointless check for len<0.


Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 23:00:00 -07:00
Jesper Juhl 5418c6926f [IPV4]: [1/4] signed vs unsigned cleanup in net/ipv4/raw.c
This patch silences these two gcc -W warnings in net/ipv4/raw.c :

net/ipv4/raw.c:517: warning: signed and unsigned type in conditional expression
net/ipv4/raw.c:613: warning: signed and unsigned type in conditional expression

It doesn't change the behaviour of the code, simply writes the conditional 
expression with plain 'if()' syntax instead of '? :' , but since this 
breaks it into sepperate statements gcc no longer complains about having 
both a signed and unsigned value in the same conditional expression.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:59:45 -07:00
Thomas Graf 94df109a8c [PKT_SCHED]: noop/noqueue qdisc style cleanups
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:59:08 -07:00
Thomas Graf f87a9c3ddf [PKT_SCHED]: Cleanup pfifo_fast qdisc and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary initializers,
and simplifies the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:53 -07:00
Thomas Graf 321090e7a4 [PKT_SCHED]: Add and use prio2list() in the pfifo_fast qdisc
prio2list() returns the relevant sk_buff_head for the
band specified by the priority for a given skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:35 -07:00
Thomas Graf 821d24ae74 [PKT_SCHED]: Transform pfifo_fast to use generic queue management interface
Gives pfifo_fast a byte based backlog.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:15 -07:00
Thomas Graf 6fc8e84f4c [PKT_SCHED]: Cleanup fifo qdisc and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:58:00 -07:00
Thomas Graf aaae3013d1 [PKT_SCHED]: Transform fifo qdisc to use generic queue management interface
The simplicity of the fifo qdisc allows several qdisc operations to be
redirected to the relevant queue management function directly. Saves
a lot of code lines and gives the pfifo a byte based backlog.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:57:42 -07:00
Herbert Xu 1e061ab2e5 [SCTP]: Replace spin_lock_irqsave with spin_lock_bh
This patch replaces the spin_lock_irqsave call on the receive queue
lock in SCTP with spin_lock_bh.  Despite the proliferation of
spin_lock_irqsave calls in this stack, it is only entered from the
IPv4/IPv6 stack and user space.  That is, it is never entered from
hardirq context.

The call in question is only called from recvmsg which means that
IRQs aren't disabled.  Therefore it is safe to replace it with
spin_lock_bh.
 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:56:42 -07:00
Herbert Xu e0f9f8586a [IPV4/IPV6]: Replace spin_lock_irq with spin_lock_bh
In light of my recent patch to net/ipv4/udp.c that replaced the
spin_lock_irq calls on the receive queue lock with spin_lock_bh,
here is a similar patch for all other occurences of spin_lock_irq
on receive/error queue locks in IPv4 and IPv6.

In these stacks, we know that they can only be entered from user
or softirq context.  Therefore it's safe to disable BH only.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:56:18 -07:00
Jamal Hadi Salim 9ed19f339e [NETLINK]: Set correct pid for ioctl originating netlink events
This patch ensures that netlink events created as a result of programns
using ioctls (such as ifconfig, route etc) contains the correct PID of
those events.
 
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:51 -07:00
Jamal Hadi Salim e431b8c004 [NETLINK]: Explicit typing
This patch converts "unsigned flags" to use more explict types like u16
instead and incrementally introduces NLMSG_NEW().
 
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:31 -07:00
Thomas Graf 58b82150da [DECNET]: Remove unnecessary initilization of unused variable entries
This patch was supposed to be part of the neighbour tables related
patchset but apparently got lost.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-18 22:55:02 -07:00