All of the code considers ->dead as a hint that the cached policy
needs to get refreshed. The read side can just drop the read lock
without any side effects.
The write side needs to make sure that it's written only exactly
once. Only possible race is at xfrm_policy_kill(). This is fixed
by checking result of __xfrm_policy_unlink() when needed. It will
always succeed if the policy object is looked up from the hash
list (so some checks are removed), but it needs to be checked if
we are trying to unlink policy via a reference (appropriate
checks added).
Since policy->walk.dead is written exactly once, it no longer
needs to be protected with a write lock.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing check for policy direction verification. This is
especially important since without this xfrm_user may end up
deleting per-socket policy which is not allowed.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm state genid only needs to be matched against the copy
saved in xfrm_dst. So we don't need a global genid at all. In
fact, we don't even need to initialise it.
Based on observation by Timo Teräs.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle. Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.
The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.
Thanks to Jamal for find this problem.
Reported-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we clone the SP, we should also clone the mark.
Useful for socket based SPs.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ability for netlink userspace to manipulate the SPD
and manipulate the mark, retrieve it and get events with a defined
mark, etc.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ability for netlink userspace to manipulate the SAD
and manipulate the mark, retrieve it and get events with a defined
mark.
MIGRATE may be added later.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
pass mark to all SP lookups to prepare them for when we add code
to have them search.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
pass mark to all SA lookups to prepare them for when we add code
to have them search.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window2 and you see the flush event on window1.
With this fix, you still get prompt on window1 but no event on window2.
Thanks to Alexey Dobriyan for finding a bug in earlier version
when using pfkey to do the flushing.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window2 and you see the flush event on window1.
With this fix, you still get prompt on window1 but no event on window2.
Thanks to Alexey Dobriyan for finding a bug in earlier version
when using pfkey to do the flushing.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRMINHDRERROR counter is ambigous when validating forwarding
path. It makes it tricky to debug when you have both in and fwd
validation.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Alexey Dobriyan:
--------------------
setkey now takes several seconds to run this simple script
and it spits "recv: Resource temporarily unavailable" messages.
#!/usr/sbin/setkey -f
flush;
spdflush;
add A B ipcomp 44 -m tunnel -C deflate;
add B A ipcomp 45 -m tunnel -C deflate;
spdadd A B any -P in ipsec
ipcomp/tunnel/192.168.1.2-192.168.1.3/use;
spdadd B A any -P out ipsec
ipcomp/tunnel/192.168.1.3-192.168.1.2/use;
--------------------
Obviously applications want the events even when the table
is empty. So we cannot make this behavioral change.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add __percpu sparse annotations to net.
These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.
The macro and type tricks around snmp stats make things a bit
interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric's version fixed it for pfkey. This one is for xfrm user.
I thought about amortizing those two get_acqseq()s but it seems
reasonable to have two of these sequence spaces for the two different
interfaces.
cheers,
jamal
commit d5168d5addbc999c94aacda8f28a4a173756a72b
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Tue Feb 16 06:51:22 2010 -0500
xfrm: avoid spinlock in get_acqseq() used by xfrm user
This is in the same spirit as commit 28aecb9d77
by Eric Dumazet.
Use atomic_inc_return() in get_acqseq() to avoid taking a spinlock
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm_state_clone calls kfree instead of xfrm_state_put to free
a failed state. Depending on the state of the failed state, it
can cause leaks to things like module references.
All states should be freed by xfrm_state_put past the point of
xfrm_init_state.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Observed similar behavior on SPD as previouly seen on SAD flushing..
This fixes it.
cheers,
jamal
commit 428b20432dc31bc2e01a94cd451cf5a2c00d2bf4
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 05:49:38 2010 -0500
xfrm: Flushing empty SPD generates false events
To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SAD.
-On window1 "ip xfrm mon"
-on window2 issue "ip xfrm state flush"
You get prompt back in window1
and you see the flush event on window2.
With this fix, you still get prompt on window1 but no
event on window2.
I was tempted to return -ESRCH on window1 (which would
show "RTNETLINK answers: No such process") but didnt want
to change current behavior.
cheers,
jamal
commit 5f3dd4a772326166e1bcf54acc2391df00dc7ab5
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 04:41:36 2010 -0500
xfrm: Flushing empty SAD generates false events
To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some XFRM attributes were not going through basic validation.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.
Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.
Reorder GC threshold initialization so it'd be done before registering
XFRM policies.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"ip xfrm state|policy count" report SA/SP count from init_net,
not from netns of caller process.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the RFC4543 (GMAC) wrapper for GCM similar to the
existing RFC4106 wrapper. The main differences between GCM and GMAC are
the contents of the AAD and that the plaintext is empty for the latter.
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from
other parts of the xfrm code. Add xfrm.nlsk_stash a copy of xfrm.nlsk that
will never be set to NULL. This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
can not add camellia cipher algorithm when using "ip xfrm state" command.
Signed-off-by: Li Yewang <lyw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These algorithms use a truncation of 192/256 bits, as specified
in RFC4868.
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.
If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.
In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.
Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine
in the suspended state. This leads to the connectivity problems because
after resuming local machine thinks that the SAD entry is still valid, while
it has already been expired on the remote server.
The cause of this is very simple: the timeouts in the net/xfrm are bound to
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.
I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.
This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).
This patch is against 2.6.31.4. Please CC me.
Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The last users of skb_icv_walk are converted to ahash now,
so skb_icv_walk is unused and can be removed.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All instances of file_operations should be const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes hash collisions in cases where number
of entries have incrementing IP source and destination addresses
from single respective subnets (i.e. 192.168.0.1-172.16.0.1,
192.168.0.2-172.16.0.2, and so on.).
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up to use xfrm_addr_cmp() instead of compare addresses directly.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our CAST algorithm is called cast5, not cast128. Clearly nobody
has ever used it :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When kernel inserts a temporary SA for IKE, it uses the wrong hash
value for dst list. Two hash values were calcultated before: one with
source address and one with a wildcard source address.
Bug hinted by Junwei Zhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
spin_lock() should be spin_unlock() in xfrm_state_walk_done().
caused by:
commit 12a169e7d8
"ipsec: Put dumpers on the dump list"
Reported-by: Marc Milgram <mmilgram@redhat.com>
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The change to make xfrm_state objects hash on source address
broke the case where such source addresses are wildcarded.
Fix this by doing a two phase lookup, first with fully specified
source address, next using saddr wildcarded.
Reported-by: Nicolas Dichtel <nicolas.dichtel@dev.6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit fc8c7dc1b2.
As indicated by Jiri Klimes, this won't work. These numbers are
not only used the size validation, they are also used to locate
attributes sitting after the message.
Signed-off-by: David S. Miller <davem@davemloft.net>