Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.
In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've add __init annotations to clarifly things), so all can
actually be marked __ro_after_init.
This protects the data structure from accidental corruption.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of providing macros/inline functions to initialize
the families, make all users initialize them statically and
get rid of the macros.
This reduces the kernel code size by about 1.6k on x86-64
(with allyesconfig).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Static family IDs have never really been used, the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.
Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, which I also had to add code for so
it would reserve those IDs)
Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After backporting commit ee44b4bc05 ("dlm: use sctp 1-to-1 API")
series to a kernel with an older workqueue which didn't use RCU yet, it
was noticed that we are freeing the workqueues in dlm_lowcomms_stop()
too early as free_conn() will try to access that memory for canceling
the queued works if any.
This issue was introduced by commit 0d737a8cfd as before it such
attempt to cancel the queued works wasn't performed, so the issue was
not present.
This patch fixes it by simply inverting the free order.
Cc: stable@vger.kernel.org
Fixes: 0d737a8cfd ("dlm: fix race while closing connections")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
With the current kernel, `dlm_tool lockdebug` fails as below:
"dlm_tool lockdebug ED0BD86DCE724393918A1AE8FDBF1EE3
can't open /sys/kernel/debug/dlm/ED0BD86DCE724393918A1AE8FDBF1EE3:
Operation not permitted"
This is because table_open() depends on file->f_op to tell which
seq_file ops should be passed down. But, the original file ops in
file->f_op is replaced by "debugfs_full_proxy_file_operations" with
commit 49d200deaa ("debugfs: prevent access to removed files'
private data").
Currently, I can think up 2 solutions: 1st, replace
debugfs_create_file() with debugfs_create_file_unsafe();
2nd, make different table_open#() accordingly. The 1st one
is neat, but I don't thoroughly understand its risk. Maybe
someone has a better one.
Signed-off-by: Eric Ren <zren@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.
The Coccinelle semantic patch used to make this change is as follows:
@@
expression from,to,size,flag;
statement S;
@@
- to = \(kmalloc\|kzalloc\)(size,flag);
+ to = kmemdup(from,size,flag);
if (to==NULL || ...) S
- memcpy(to, from, size);
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This config option can be used to disable the
LOG_INFO recovery messages.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1ae1602de0 "configfs: switch ->default groups to a linked list"
left the NULL gps pointer behind after removing the kcalloc() call which
made it non-NULL. It also left the !gps check in place so make_cluster()
now fails with ENOMEM. Remove the remaining uses of the gps variable to
fix that.
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Previous changes introduced the use of socket error reporting
for dlm sockets. This set includes two fixes in how the
socket error callbacks are used.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW4ISmAAoJEDgbc8f8gGmqnWgQANnS13rXt4NxnRtNqzrEUmjk
NbvP4BxeWKinxhc9ObUnV0CzDllYa2A6Te2AageCQr/qhRKfnnaQJczJ39Xp+289
oSgJJOo4HGCPjshrq9BwGo8ufijUIc0MsT64TzeI3ww58b1eK2CLoC9uiLDyYwjM
Hw0PRXU/MAxzOWJIIWgkh78FQmx8fswOSNyK49/p3/INMVNFxn75bd+shtxUOuCp
50gmI6DG4gGJDK3vtIjZStJhW4lcaM3tjGZ9+mcLQF2PZK5zIeHSr3nEfzJ4Qwps
0p55JeiXdfff6RTrxqnJewc+xysmD9594wG0G0VqLsWaLWulDrHZMFsVg1J10frk
bk0WwLjsYG/wLVZtKRe+3QwOyqgTxx1Cea8ZYB2yMcFBkDYxFmh8a00kJcVuucnS
W+w4rhI9blk1cc4eHhnuBIi5m2jbelu4NPG5722ORtv+gNpBl2ptecqIjfuhr8xE
IIF5tnkZb8lBuLyhCmg8in2mKnY5aKSk3kuQ98rDXZUMCLT0PKG2ZNsXJjpX6G38
uQ+sB9rH6c5pIe0S3keS2f2Ly3V7gtBErA0otyxaq/XlxnJeFlLU5G1chHUeW8VP
qxhtjDShPuIA97MxE2GA3ehr7r3jbOb+8qOc13E9ygXDyXNN0tb4JIeB0beNLdRx
db8Lt+OR10IUawNyuFB9
=qvzY
-----END PGP SIGNATURE-----
Merge tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"Previous changes introduced the use of socket error reporting for dlm
sockets. This set includes two fixes in how the socket error
callbacks are used"
* tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
DLM: Save and restore socket callbacks properly
DLM: Replace nodeid_to_addr with kernel_getpeername
Replace the current NULL-terminated array of default groups with a linked
list. This gets rid of lots of nasty code to size and/or dynamically
allocate the array.
While we're at it also provide a conveniant helper to remove the default
groups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Felipe Balbi <balbi@kernel.org> [drivers/usb/gadget]
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
This patch fixes the problems with patch b3a5bbfd7.
1. It removes a return statement from lowcomms_error_report
because it needs to call the original error report in all paths
through the function.
2. All socket callbacks are saved and restored, not just the
sk_error_report, and that's done so with proper locking like
sunrpc does.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch replaces the call to nodeid_to_addr with a call to
kernel_getpeername. This avoids taking a spinlock because it may
potentially be called from a softirq context.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
it's "bugger off if we got ERR_PTR", not the other way round...
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
A _lot_ of ->write() instances were open-coding it; some are
converted to memdup_user_nul(), a lot more remain...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch is a cleanup to make following patch easier to
review.
Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()
To ease backports, we rename both constants.
Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI target updates from Nicholas Bellinger:
"This series contains HCH's changes to absorb configfs attribute
->show() + ->store() function pointer usage from it's original
tree-wide consumers, into common configfs code.
It includes usb-gadget, target w/ drivers, netconsole and ocfs2
changes to realize the improved simplicity, that now renders the
original include/target/configfs_macros.h CPP magic for fabric drivers
and others, unnecessary and obsolete.
And with common code in place, new configfs attributes can be added
easier than ever before.
Note, there are further improvements in-flight from other folks for
v4.5 code in configfs land, plus number of target fixes for post -rc1
code"
In the meantime, a new user of the now-removed old configfs API came in
through the char/misc tree in commit 7bd1d4093c ("stm class: Introduce
an abstraction for System Trace Module devices").
This merge resolution comes from Alexander Shishkin, who updated his stm
class tracing abstraction to account for the removal of the old
show_attribute and store_attribute methods in commit 517982229f
("configfs: remove old API") from this pull. As Alexander says about
that patch:
"There's no need to keep an extra wrapper structure per item and the
awkward show_attribute/store_attribute item ops are no longer needed.
This patch converts policy code to the new api, all the while making
the code quite a bit smaller and easier on the eyes.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>"
That patch was folded into the merge so that the tree should be fully
bisectable.
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
configfs: remove old API
ocfs2/cluster: use per-attribute show and store methods
ocfs2/cluster: move locking into attribute store methods
netconsole: use per-attribute show and store methods
target: use per-attribute show and store methods
spear13xx_pcie_gadget: use per-attribute show and store methods
dlm: use per-attribute show and store methods
usb-gadget/f_serial: use per-attribute show and store methods
usb-gadget/f_phonet: use per-attribute show and store methods
usb-gadget/f_obex: use per-attribute show and store methods
usb-gadget/f_uac2: use per-attribute show and store methods
usb-gadget/f_uac1: use per-attribute show and store methods
usb-gadget/f_mass_storage: use per-attribute show and store methods
usb-gadget/f_sourcesink: use per-attribute show and store methods
usb-gadget/f_printer: use per-attribute show and store methods
usb-gadget/f_midi: use per-attribute show and store methods
usb-gadget/f_loopback: use per-attribute show and store methods
usb-gadget/ether: use per-attribute show and store methods
usb-gadget/f_acm: use per-attribute show and store methods
usb-gadget/f_hid: use per-attribute show and store methods
...
This includes one simple fix to make posix locks
interruptible by signals in cases where a signal
handler is used.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWO3ZiAAoJEDgbc8f8gGmqXc8P/j1GcuiMyHbRjlyGgExYYuFG
l9eeoCAzUy4aE9PnN90ur/9FhYGQ3fEte4xaKDU7FMTsQIvNrKYdNAt6qWQdQI6M
shogDw6hXwXJ42IPS9fuj2dEPFD5wBoMYjEGowzdAvsMH3cROyN03hIWqSWTL3jI
UFSR7NjbnQwT8ZUAYcICEE5VerqsGzxGkaFF+V3fYASsZlALTlQkVHT5DQzCkXgq
0CkNhsMx5H4Ng9y/2dnhPj3y24NqbhdtLX4dkcKevMmHP5FJ/rEI82oizxPgJ8oZ
QlcSOUZatNuqLVAStecmsd5sH80/IDspnpMDxnQCKnioNq3x6YXXfhyv5CKB6Ahy
atA3SlYDACiZz5tydJ/97DJvvIrF2rUETPXk2Lobc972UU99r8zxCUah8xv4ThD/
DtuSkqNnTmXjMcTssHDqo/Kg16dZxpx+itxsWCEivfZm6EL1j5RAvZO5G04wMmry
D/FXDKT/FZR+xYDIg1FLc1uOMldeRbMWhb+zGfTAnYy0aH43oyePVddbC+lSuVfp
Pat2avXoovR59+7nhFk+s+xf3c8oKMoSwCZOso4OoVySRZQmE1dH6m6D5RLkgRHw
nTGggRRAAOsLoiYFtnCKXHkxUTGZWDJfwtv1OI1IABqdoSj5Px/4JAswWj1e5+dH
haCGlyvMDqK8ImaekGcZ
=m+FJ
-----END PGP SIGNATURE-----
Merge tag 'dlm-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm update from David Teigland:
"This includes one simple fix to make posix locks interruptible by
signals in cases where a signal handler is used"
* tag 'dlm-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: make posix locks interruptible
Replace wait_event_killable with wait_event_interruptible
so that a program waiting for a posix lock can be
interrupted by a signal. With the killable version,
a program was not interruptible by a signal if it
had a signal handler set for it, overriding the default
action of terminating the process.
Signed-off-by: Eric Ren <zren@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait(). This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
To simplify the configfs interface and remove boilerplate code that also
causes binary bloat.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Teigland <teigland@redhat.com
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This set mainly includes a change to the way the
dlm uses the SCTP API in the kernel, removing the
direct dependency on the sctp module. Other odd
SCTP-related fixes are also included. The other
notable fix is for a long standing regression in
the behavior of lock value blocks for user space
locks.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV5HwZAAoJEDgbc8f8gGmqoaQP/iz5zgKSjX0mOC3fz8BqXISk
85cKLPfsf0avDmGx6nkKp5wsmVDYkfrObkocvf7bOcemAuycuOmr9y22ZscNaAWM
vKLhTJQ0koAlZqhJmJx45w318BFY03RdDQmVKUnQHza9Ed7Uoa0CyR6jyuwBTuMP
gA9O6i6CezodtB8CLPySJa2znlt50CptLaJKj1V9/xCpBh7orwpihv4pBz8oH1lR
JXRj9hNEFy2+vk8Pce14fKmHgUROg5+y1V7jZeetpCbTxAAFOeFOL6EH28eWssbQ
YoWofcPugmOs9BDbnVZHf6+Y5xIaoiIylb2Q4/me4rjQfSmaiDbTZyqB4TtFrldF
BngaAJipmLQu8ELqQmwEMhZTAc/GsB60x1EcjrPVTKbW7pwsfVp2fPVV92a7koQe
prmz5rh8HCenrWuy3d4/EP7K+E4+W98ZXsDuym4pBNaoYwCPyvtWLa8kSqAdx47J
MNk/ak9ktP2NxsCs+EjCmP2hn2r+RTio6R2uCtKB2pdclfqOupIsYZkVdZERK5Ch
5+ALeVjHfxswFVRxGjbPQRs9x8ZclBydceAHgYbLQ2xDGRvTpQhnIyNLRXsZnkrD
t4mTokZG/GGgmWOscZ5nXOOGZt8SpX+UkICWWWbuy3dxuOK6al3lVeBcC0KW5Pki
KNHzcKrlGJJnCVr0nWTU
=iYRu
-----END PGP SIGNATURE-----
Merge tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set mainly includes a change to the way the dlm uses the SCTP API
in the kernel, removing the direct dependency on the sctp module.
Other odd SCTP-related fixes are also included.
The other notable fix is for a long standing regression in the
behavior of lock value blocks for user space locks"
* tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: print error from kernel_sendpage
dlm: fix lvb copy for user locks
dlm: sctp_accept_from_sock() can be static
dlm: fix reconnecting but not sending data
dlm: replace BUG_ON with a less severe handling
dlm: use sctp 1-to-1 API
dlm: fix not reconnecting on connecting error handling
dlm: fix race while closing connections
dlm: fix connection stealing if using SCTP
Print a dlm-specific error when a socket error occurs
when sending a dlm message.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
For a userland lock request, the previous and current
lock modes are used to decide when the lvb should be
copied back to the user. The wrong previous value was
used, so that it always matched the current value.
This caused the lvb to be copied back to the user in
the wrong cases.
Signed-off-by: David Teigland <teigland@redhat.com>
There are cases on which lowcomms_connect_sock() is called directly,
which caused the CF_WRITE_PENDING flag to not bet set upon reconnect,
specially on send_to_sock() error handling. On this last, the flag was
already cleared and no further attempt on transmitting would be done.
As dlm tends to connect when it needs to transmit something, it makes
sense to always mark this flag right after the connect.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
BUG_ON() is a severe action for this case, specially now that DLM with
SCTP will use 1 socket per association. Instead, we can just close the
socket on this error condition and return from the function.
Also move the check to an earlier stage as it won't change and thus we
can abort as soon as possible.
Although this issue was reported when still using SCTP with 1-to-many
API, this cleanup wouldn't be that simple back then because we couldn't
close the socket and making sure such event would cease would be hard.
And actually, previous code was closing the association, yet SCTP layer
is still raising the new data event. Probably a bug to be fixed in SCTP.
Reported-by: <tan.hu@zte.com.cn>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
needed but this causes it to use sctp_do_peeloff() to mimic an
kernel_accept() and this causes a symbol dependency on sctp module.
By switching it to 1-to-1 API we can avoid this dependency and also
reduce quite a lot of SCTP-specific code in lowcomms.c.
The caveat is that now DLM won't always use the same src port. It will
choose a random one, just like TCP code. This allows the peers to
attempt simultaneous connections, which now are handled just like for
TCP.
Even more sharing between TCP and SCTP code on DLM is possible, but it
is intentionally left for a later commit.
Note that for using nodes with this commit, you have to have at least
the early fixes on this patchset otherwise it will trigger some issues
on old nodes.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
If we don't clear that bit, lowcomms_connect_sock() will not schedule
another attempt, and no further attempt will be done.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
When a connection have issues DLM may need to close it. Therefore we
should also cancel pending workqueues for such connection at that time,
and not just when dlm is not willing to use this connection anymore.
Also, if we don't clear CF_CONNECT_PENDING flag, the error handling
routines won't be able to re-connect as lowcomms_connect_sock() will
check for it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
When using SCTP and accepting a new connection, DLM currently validates
if the peer trying to connect to it is one of the cluster nodes, but it
doesn't check if it already has a connection to it or not.
If it already had a connection, it will be overwritten, and the new one
will be used for writes, possibly causing the node to leave the cluster
due to communication breakage.
Still, one could DoS the node by attempting N connections and keeping
them open.
As said, but being explicit, both situations are only triggerable from
other cluster nodes, but are doable with only user-level perms.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
With well over 200+ users of this api, there are a mere 12 users that
actually checked the return value of this function. And all of them
really didn't do anything with that information as the system or module
was shutting down no matter what.
So stop pretending like it matters, and just return void from
misc_deregister(). If something goes wrong in the call, you will get a
WARNING splat in the syslog so you know how to fix up your driver.
Other than that, there's nothing that can go wrong.
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.
This makes the very common pattern of
if (genlmsg_end(...) < 0) { ... }
be a whole bunch of dead code. Many places also simply do
return nlmsg_end(...);
and the caller is expected to deal with it.
This also commonly (at least for me) causes errors, because it is very
common to write
if (my_function(...))
/* error condition */
and if my_function() does "return nlmsg_end()" this is of course wrong.
Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.
Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did
- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;
I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.
One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:
- unification of d_splice_alias()/d_materialize_unique()
- iov_iter rewrite
- killing a bunch of ->f_path.dentry users (and f_dentry macro).
Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.
Still not complete, but much closer now.
- crapectomy in lustre (dead code removal, mostly)
- "let's make seq_printf return nothing" preparations
- assorted cleanups and fixes
There _definitely_ will be more piles"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...
A process may exit, leaving an orphan lock in the lockspace.
This adds the capability for another process to acquire the
orphan lock. Acquiring the orphan just moves the lock from
the orphan list onto the acquiring process's list of locks.
An adopting process must specify the resource name and mode
of the lock it wants to adopt. If a matching lock is found,
the lock is moved to the caller's 's list of locks, and the
lkid of the lock is returned like the lkid of a new lock.
If an orphan with a different mode is found, then -EAGAIN is
returned. If no orphan lock is found on the resource, then
-ENOENT is returned. No async completion is used because
the result is immediately available.
Also, when orphans are purged, allow a zero nodeid to refer
to the local nodeid so the caller does not need to look up
the local nodeid.
Signed-off-by: David Teigland <teigland@redhat.com>
The flags are already converted to le when being sent,
but are not being converted back to cpu when received.
Signed-off-by: Neale Ferguson <neale@sinenomine.net>
Signed-off-by: David Teigland <teigland@redhat.com>
This argument is always NULL so don't pass it around.
[jlayton: remove dependencies on previous patches in series]
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
This fixes checkpatch warning:
WARNING: debugfs_remove(NULL) is safe this check is probably not required
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The connection struct with nodeid 0 is the listening socket,
not a connection to another node. The sctp resend function
was not checking that the nodeid was valid (non-zero), so it
would mistakenly get and resend on the listening connection
when nodeid was zero.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Replace seq_printf where possible. This patch also fixes the following
checkpatch warning "unnecessary whitespace before a quoted newline"
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace obsolete functions
simple_strtoul/kstrtouint
simple_strtol/kstrtoint
(kstr __must_check requires the right function to be applied)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.
Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.
And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.
So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.
Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
The log messages relating to the progress of recovery
are minimal and very often useful. Change these to
the KERN_INFO level so they are always available.
Signed-off-by: David Teigland <teigland@redhat.com>
Include appropriate header file fs/dlm/ast.h in fs/dlm/ast.c because it
contains function prototypes of some functions defined in fs/dlm/ast.c.
This also eliminates the following warning in fs/dlm/ast:
fs/dlm/ast.c:52:5: warning: no previous prototype for ‘dlm_add_lkb_callback’ [-Wmissing-prototypes]
fs/dlm/ast.c:113:5: warning: no previous prototype for ‘dlm_rem_lkb_callback’ [-Wmissing-prototypes]
fs/dlm/ast.c:174:6: warning: no previous prototype for ‘dlm_add_cb’ [-Wmissing-prototypes]
fs/dlm/ast.c:212:6: warning: no previous prototype for ‘dlm_callback_work’ [-Wmissing-prototypes]
fs/dlm/ast.c:267:5: warning: no previous prototype for ‘dlm_callback_start’ [-Wmissing-prototypes]
fs/dlm/ast.c:278:6: warning: no previous prototype for ‘dlm_callback_stop’ [-Wmissing-prototypes]
fs/dlm/ast.c:284:6: warning: no previous prototype for ‘dlm_callback_suspend’ [-Wmissing-prototypes]
fs/dlm/ast.c:292:6: warning: no previous prototype for ‘dlm_callback_resume’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: David Teigland <teigland@redhat.com>
We pass the freed "r" pointer back to the caller. It's harmless but it
upsets the static checkers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...