netlink kernel socket is protected by refcount, not RCU.
Its rcv path is neither protected by RCU. So the synchronize_net()
is just pointless.
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.
2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.
"int" being used as an array index needs to be sign-extended
to 64-bit before being used.
void f(long *p, int i)
{
g(p[i]);
}
roughly translates to
movsx rsi, esi
mov rdi, [rsi+...]
call g
MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.
Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:
static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}
And this function is used a lot, so those sign extensions add up.
Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]
However, overall balance is in negative direction:
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull audit updates from Paul Moore:
"Another relatively small pull request for v4.9 with just two patches.
The patch from Richard updates the list of features we support and
report back to userspace; this should have been sent earlier with the
rest of the v4.8 patches but it got lost in my inbox.
The second patch fixes a problem reported by our Android friends where
we weren't very consistent in recording PIDs"
* 'stable-4.9' of git://git.infradead.org/users/pcmoore/audit:
audit: add exclude filter extension to feature bitmap
audit: consistently record PIDs with task_tgid_nr()
Unfortunately we record PIDs in audit records using a variety of
methods despite the correct way being the use of task_tgid_nr().
This patch converts all of these callers, except for the case of
AUDIT_SET in audit_receive_msg() (see the comment in the code).
Reported-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull audit updates from Paul Moore:
"Six audit patches for 4.8.
There are a couple of style and minor whitespace tweaks for the logs,
as well as a minor fixup to catch errors on user filter rules, however
the major improvements are a fix to the s390 syscall argument masking
code (reviewed by the nice s390 folks), some consolidation around the
exclude filtering (less code, always a win), and a double-fetch fix
for recording the execve arguments"
* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
audit: fix a double fetch in audit_log_single_execve_arg()
audit: fix whitespace in CWD record
audit: add fields to exclude filter by reusing user filter
s390: ensure that syscall arguments are properly masked on s390
audit: fix some horrible switch statement style crimes
audit: fixup: log on errors from filter user rules
Pull audit fixes from Paul Moore:
"Two small patches to fix audit problems in 4.7-rcX: the first fixes a
potential kref leak, the second removes some header file noise.
The first is an important bug fix that really should go in before 4.7
is released, the second is not critical, but falls into the very-nice-
to-have category so I'm including in the pull request.
Both patches are straightforward, self-contained, and pass our
testsuite without problem"
* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_get_tty to reduce scope and kabi changes
audit: move calcs after alloc and check when logging set loginuid
The only users of audit_get_tty and audit_put_tty are internal to
audit, so move it out of include/linux/audit.h to kernel.h and create
a proper function rather than inlining it. This also reduces kABI
changes.
Suggested-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: line wrapped description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
RFE: add additional fields for use in audit filter exclude rules
https://github.com/linux-audit/audit-kernel/issues/5
Re-factor and combine audit_filter_type() with audit_filter_user() to
use audit_filter_user_rules() to enable the exclude filter to
additionally filter on PID, UID, GID, AUID, LOGINUID_SET, SUBJ_*.
The process of combining the similar audit_filter_user() and
audit_filter_type() functions, required inverting the meaning and
including the ALWAYS action of the latter.
Include audit_filter_user_rules() into audit_filter(), removing
unneeded logic in the process.
Keep the check to quit early if the list is empty.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: checkpatch.pl fixes - whitespace damage, wrapped description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull audit updates from Paul Moore:
"Four small audit patches for 4.7.
Two are simple cleanups around the audit thread management code, one
adds a tty field to AUDIT_LOGIN events, and the final patch makes
tty_name() usable regardless of CONFIG_TTY.
Nothing controversial, and it all passes our regression test"
* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
tty: provide tty_name() even without CONFIG_TTY
audit: add tty field to LOGIN event
audit: we don't need to __set_current_state(TASK_RUNNING)
audit: cleanup prune_tree_thread
The tty field was missing from AUDIT_LOGIN events.
Refactor code to create a new function audit_get_tty(), using it to
replace the call in audit_log_task_info() and to add it to
audit_log_set_loginuid(). Lock and bump the kref to protect it, adding
audit_put_tty() alias to decrement it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Remove the calls to __set_current_state() to mark the task as running
and do some related cleanup in wait_for_auditd() to limit the amount
of work we do when we aren't going to reschedule the current task.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Pull audit updates from Paul Moore:
"A small set of patches for audit this time; just three in total and
one is a spelling fix.
The two patches with actual content are designed to help prevent new
instances of auditd from displacing an existing, functioning auditd
and to generate a log of the attempt. Not to worry, dead/stuck auditd
instances can still be replaced by a new instance without problem.
Nothing controversial, and everything passes our regression suite"
* 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit:
audit: Fix typo in comment
audit: log failed attempts to change audit_pid configuration
audit: stop an old auditd being starved out by a new auditd
The audit_tty and audit_tty_log_passwd fields are actually bool
values, so merge into single memory location to access atomically.
NB: audit log operations may still occur after tty audit is disabled
which is consistent with the existing functionality
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
tty_audit_push() and tty_audit_push_current() perform identical
tasks; eliminate the tty_audit_push() implementation and the
tty_audit_push_current() name.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Failed attempts to change the audit_pid configuration are not presently
logged. One case is an attempt to starve an old auditd by starting up
a new auditd when the old one is still alive and active. The other
case is an attempt to orphan a new auditd when an old auditd shuts
down.
Log both as AUDIT_CONFIG_CHANGE messages with failure result.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Nothing prevents a new auditd starting up and replacing a valid
audit_pid when an old auditd is still running, effectively starving out
the old auditd since audit_pid no longer points to the old valid
auditd.
If no message to auditd has been attempted since auditd died
unnaturally or got killed, audit_pid will still indicate it is alive.
There isn't an easy way to detect if an old auditd is still running on
the existing audit_pid other than attempting to send a message to see
if it fails. An -ECONNREFUSED almost certainly means it disappeared
and can be replaced. Other errors are not so straightforward and may
indicate transient problems that will resolve themselves and the old
auditd will recover. Yet others will likely need manual intervention
for which a new auditd will not solve the problem.
Send a new message type (AUDIT_REPLACE) to the old auditd containing a
u32 with the PID of the new auditd. If the audit replace message
succeeds (or doesn't fail with certainty), fail to register the new
auditd and return an error (-EEXIST).
This is expected to make the patch preventing an old auditd orphaning a
new auditd redundant.
V3: Switch audit message type from 1000 to 1300 block.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull security subsystem updates from James Morris:
- EVM gains support for loading an x509 cert from the kernel
(EVM_LOAD_X509), into the EVM trusted kernel keyring.
- Smack implements 'file receive' process-based permission checking for
sockets, rather than just depending on inode checks.
- Misc enhancments for TPM & TPM2.
- Cleanups and bugfixes for SELinux, Keys, and IMA.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (41 commits)
selinux: Inode label revalidation performance fix
KEYS: refcount bug fix
ima: ima_write_policy() limit locking
IMA: policy can be updated zero times
selinux: rate-limit netlink message warnings in selinux_nlmsg_perm()
selinux: export validatetrans decisions
gfs2: Invalid security labels of inodes when they go invalid
selinux: Revalidate invalid inode security labels
security: Add hook to invalidate inode security labels
selinux: Add accessor functions for inode->i_security
security: Make inode argument of inode_getsecid non-const
security: Make inode argument of inode_getsecurity non-const
selinux: Remove unused variable in selinux_inode_init_security
keys, trusted: seal with a TPM2 authorization policy
keys, trusted: select hash algorithm for TPM2 chips
keys, trusted: fix: *do not* allow duplicate key options
tpm_ibmvtpm: properly handle interrupted packet receptions
tpm_tis: Tighten IRQ auto-probing
tpm_tis: Refactor the interrupt setup
tpm_tis: Get rid of the duplicate IRQ probing code
...
The functions consume_skb() and kfree_skb() test whether their argument
is NULL and then return immediately.
Thus the tests around their calls are not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[PM: tweak patch prefix]
Signed-off-by: Paul Moore <pmoore@redhat.com>
If the audit_backlog_limit is changed from a limited value to an
unlimited value (zero) while the queue was overflowed, wake up the
audit_backlog_wait queue to allow those processes to continue.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Should auditd spawn threads, allow all members of its thread group to
use the audit_backlog_limit reserves to bypass the queue limits too.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: minor upstream merge tweaks]
Signed-off-by: Paul Moore <pmoore@redhat.com>
After auditd has recovered from an overflowed queue, the first process
that doesn't use reserves to make it through the queue checks should
reset the audit backlog wait time to the configured value. After that,
there is no need to keep resetting it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Make the inode argument of the inode_getsecid hook non-const so that we
can use it to revalidate invalid security labels.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Variable rc in not required as it is just used for unchanged for return,
and return is always 0 in the function.
Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
[PM: fixed spelling errors in description]
Signed-off-by: Paul Moore <pmoore@redhat.com>
This patch makes audit_string_contains_control return bool to improve
readability due to this particular function only using either one or
zero as its return value.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
There are several reports of the kernel losing contact with auditd when
it is, in fact, still running. When this happens, kernel syslogs show:
"audit: *NO* daemon at audit_pid=<pid>"
although auditd is still running, and is apparently happy, listening on
the netlink socket. The pid in the "*NO* daemon" message matches the pid
of the running auditd process. Restarting auditd solves this.
The problem appears to happen randomly, and doesn't seem to be strongly
correlated to the rate of audit events being logged. The problem
happens fairly regularly (every few days), but not yet reproduced to
order.
On production kernels, BUG_ON() is a no-op, so any error will trigger
this.
Commit 34eab0a7cd ("audit: prevent an older auditd shutdown from
orphaning a newer auditd startup") eliminates one possible cause. This
isn't the case here, since the PID in the error message and the PID of
the running auditd match.
The primary expected cause of error here is -ECONNREFUSED when the audit
daemon goes away, when netlink_getsockbyportid() can't find the auditd
portid entry in the netlink audit table (or there is no receive
function). If -EPERM is returned, that situation isn't likely to be
resolved in a timely fashion without administrator intervention. In
both cases, reset the audit_pid. This does not rule out a race
condition. SELinux is expected to return zero since this isn't an INET
or INET6 socket. Other LSMs may have other return codes. Log the error
code for better diagnosis in the future.
In the case of -ENOMEM, the situation could be temporary, based on local
or general availability of buffers. -EAGAIN should never happen since
the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
-ERESTARTSYS and -EINTR are not expected since this kernel thread is not
expected to receive signals. In these cases (or any other unexpected
ones for now), report the error and re-schedule the thread, retrying up
to 5 times.
v2:
Removed BUG_ON().
Moved comma in pr_*() statements.
Removed audit_strerror() text.
Reported-by: Vipin Rathor <v.rathor@gmail.com>
Reported-by: <ctcard@hotmail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: applied rgb's fixup patch to correct audit_log_lost() format issues]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull audit update from Paul Moore:
"This is one of the larger audit patchsets in recent history,
consisting of eight patches and almost 400 lines of changes.
The bulk of the patchset is the new "audit by executable"
functionality which allows admins to set an audit watch based on the
executable on disk. Prior to this, admins could only track an
application by PID, which has some obvious limitations.
Beyond the new functionality we also have some refcnt fixes and a few
minor cleanups"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
fixup: audit: implement audit by executable
audit: implement audit by executable
audit: clean simple fsnotify implementation
audit: use macros for unset inode and device values
audit: make audit_del_rule() more robust
audit: fix uninitialized variable in audit_add_rule()
audit: eliminate unnecessary extra layer of watch parent references
audit: eliminate unnecessary extra layer of watch references
Clean up a number of places were casted magic numbers are used to represent
unset inode and device numbers in preparation for the audit by executable path
patch set.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: enclosed the _UNSET macros in parentheses for ./scripts/checkpatch]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull audit updates from Paul Moore:
"Four small audit patches for v4.2, all bug fixes. Only 10 lines of
change this time so very unremarkable, the patch subject lines pretty
much tell the whole story"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: Fix check of return value of strnlen_user()
audit: obsolete audit_context check is removed in audit_filter_rules()
audit: fix for typo in comment to function audit_log_link_denied()
lsm: rename duplicate labels in LSM_AUDIT_DATA_TASK audit message type
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Pull audit fixes from Paul Moore:
"Seven audit patches for v4.1, all bug fixes.
The largest, and perhaps most significant commit helps resolve some
memory pressure issues related to the inode cache and audit, there are
also a few small commits which help resolve some timing issues with
the audit log queue, and the rest fall into the always popular "code
clean-up" category.
In general, nothing really substantial, just a nice set of maintenance
patches"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: Remove condition which always evaluates to false
audit: reduce mmap_sem hold for mm->exe_file
audit: consolidate handling of mm->exe_file
audit: code clean up
audit: don't reset working wait time accidentally with auditd
audit: don't lose set wait time on first successful call to audit_log_start()
audit: move the tree pruning to a dedicated thread
After commit 3e1d0bb622 ("audit: Convert int limit
uses to u32"), by converting an int to u32, few conditions will always evaluate
to false.
These warnings were emitted during compilation:
kernel/audit.c: In function ‘audit_set_enabled’:
kernel/audit.c:347:2: warning: comparison of unsigned expression < 0 is always
false [-Wtype-limits]
if (state < AUDIT_OFF || state > AUDIT_LOCKED)
^
kernel/audit.c: In function ‘audit_receive_msg’:
kernel/audit.c:880:9: warning: comparison of unsigned expression < 0 is
always false [-Wtype-limits]
if (s.backlog_wait_time < 0 ||
The following patch removes those unnecessary conditions.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The mm->exe_file is currently serialized with mmap_sem (shared)
in order to both safely (1) read the file and (2) audit it via
audit_log_d_path(). Good users will, on the other hand, make use
of the more standard get_mm_exe_file(), requiring only holding
the mmap_sem to read the value, and relying on reference counting
to make sure that the exe file won't dissapear underneath us.
Additionally, upon NULL return of get_mm_exe_file, we also call
audit_log_format(ab, " exe=(null)").
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
This patch adds a audit_log_d_path_exe() helper function
to share how we handle auditing of the exe_file's path.
Used by both audit and auditsc. No functionality is changed.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
During a queue overflow condition while we are waiting for auditd to drain the
queue to make room for regular messages, we don't want a successful auditd that
has bypassed the queue check to reset the backlog wait time.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Copy the set wait time to a working value to avoid losing the set
value if the queue overflows.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull networking fixes from David Miller:
1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.
2) Fix receive checksum handling in enic driver, from Govindarajulu
Varadarajan.
3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
Herbert Xu. Also, add code to detect drivers that have this mistake
in the future.
4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.
5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
input path,f rom Nicolas Dichtel.
6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.
7) Fix double SKB free in vxlan driver, also from Pravin.
8) When we scrub a packet, which happens when we are switching the
context of the packet (namespace, etc.), we should reset the
secmark. From Thomas Graf.
9) ->ndo_gso_check() needs to do more than return true/false, it also
has to allow the driver to clear netdev feature bits in order for
the caller to be able to proceed properly. From Jesse Gross.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
netlink/genetlink: pass network namespace to bind/unbind
ne2k-pci: Add pci_disable_device in error handling
bonding: change error message to debug message in __bond_release_one()
genetlink: pass multicast bind/unbind to families
netlink: call unbind when releasing socket
netlink: update listeners directly when removing socket
genetlink: pass only network namespace to genl_has_listeners()
netlink: rename netlink_unbind() to netlink_undo_bind()
net: Generalize ndo_gso_check to ndo_features_check
net: incorrect use of init_completion fixup
neigh: remove next ptr from struct neigh_table
net: xilinx: Remove unnecessary temac_property in the driver
net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
openvswitch: fix odd_ptr_err.cocci warnings
Bluetooth: Fix accepting connections when not using mgmt
Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
brcmfmac: Do not crash if platform data is not populated
ipw2200: select CFG80211_WEXT
...
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.
To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.
Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull audit fixes from Paul Moore:
"Four patches to fix various problems with the audit subsystem, all are
fairly small and straightforward.
One patch fixes a problem where we weren't using the correct gfp
allocation flags (GFP_KERNEL regardless of context, oops), one patch
fixes a problem with old userspace tools (this was broken for a
while), one patch fixes a problem where we weren't recording pathnames
correctly, and one fixes a problem with PID based filters.
In general I don't think there is anything controversial with this
patchset, and it fixes some rather unfortunate bugs; the allocation
flag one can be particularly scary looking for users"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: restore AUDIT_LOGINUID unset ABI
audit: correctly record file names with different path name types
audit: use supplied gfp_mask from audit_buffer in kauditd_send_multicast_skb
audit: don't attempt to lookup PIDs when changing PID filtering audit rules
Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
audit_log_end(), which can come from any context (aka even a sleeping context)
GFP_KERNEL can't be used. Since the audit_buffer knows what context it should
use, pass that down and use that.
See: https://lkml.org/lkml/2014/12/16/542
BUG: sleeping function called from invalid context at mm/slab.c:2849
in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
2 locks held by sulogin/885:
#0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
#1: (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
Call Trace:
[<ffffffff916ba529>] dump_stack+0x50/0xa8
[<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
[<ffffffff910632a6>] __might_sleep+0x119/0x128
[<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
[<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
[<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
[<ffffffff914e2b62>] skb_copy+0x3e/0xa3
[<ffffffff910c263e>] audit_log_end+0x83/0x100
[<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
[<ffffffff91252a73>] common_lsm_audit+0x441/0x450
[<ffffffff9123c163>] slow_avc_audit+0x63/0x67
[<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
[<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
[<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
[<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
[<ffffffff911515e6>] install_exec_creds+0xe/0x79
[<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
[<ffffffff9115198e>] search_binary_handler+0x81/0x18c
[<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
[<ffffffff91153669>] do_execve+0x1f/0x21
[<ffffffff91153967>] SyS_execve+0x25/0x29
[<ffffffff916c61a9>] stub_execve+0x69/0xa0
Cc: stable@vger.kernel.org #v3.16-rc1
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull audit updates from Paul Moore:
"Two small patches from the audit next branch; only one of which has
any real significant code changes, the other is simply a MAINTAINERS
update for audit.
The single code patch is pretty small and rather straightforward, it
changes the audit "version" number reported to userspace from an
integer to a bitmap which is used to indicate the functionality of the
running kernel. This really doesn't have much impact on the kernel,
but it will make life easier for the audit userspace folks.
Thankfully we were still on a version number which allowed us to do
this without breaking userspace"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
audit: convert status version to a feature bitmap
audit: add Paul Moore to the MAINTAINERS entry
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.
This instruments might_sleep() checks to catch places that nest
blocking primitives - such as mutex usage in a wait loop. Such
bugs can result in hard to debug races/hangs.
Another category of invalid nesting that this facility will detect
is the calling of blocking functions from within schedule() ->
sched_submit_work() -> blk_schedule_flush_plug().
There's some potential for false positives (if secondary blocking
primitives themselves are not ready yet for this facility), but the
kernel will warn once about such bugs per bootup, so the warning
isn't much of a nuisance.
This feature comes with a number of fixes, for problems uncovered
with it, so no messages are expected normally.
- Another round of sched/numa optimizations and refinements, for
CONFIG_NUMA_BALANCING=y.
- Another round of sched/dl fixes and refinements.
Plus various smaller fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched: Add missing rcu protection to wake_up_all_idle_cpus
sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
sched/numa: Init numa balancing fields of init_task
sched/deadline: Remove unnecessary definitions in cpudeadline.h
sched/cpupri: Remove unnecessary definitions in cpupri.h
sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
sched/fair: Fix stale overloaded status in the busiest group finding logic
sched: Move p->nr_cpus_allowed check to select_task_rq()
sched/completion: Document when to use wait_for_completion_io_*()
sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
sched/fair: Kill task_struct::numa_entry and numa_group::task_list
sched: Refactor task_struct to use numa_faults instead of numa_* pointers
sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
sched/deadline: Reschedule from switched_from_dl() after a successful pull
sched/deadline: Push task away if the deadline is equal to curr during wakeup
sched/deadline: Add deadline rq status print
sched/deadline: Fix artificial overrun introduced by yield_task_dl()
sched/rt: Clean up check_preempt_equal_prio()
sched/core: Use dl_bw_of() under rcu_read_lock_sched()
sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
...
The version field defined in the audit status structure was found to have
limitations in terms of its expressibility of features supported. This is
distict from the get/set features call to be able to command those features
that are present.
Converting this field from a version number to a feature bitmap will allow
distributions to selectively backport and support certain features and will
allow upstream to be able to deprecate features in the future. It will allow
userspace clients to first query the kernel for which features are actually
present and supported. Currently, EINVAL is returned rather than EOPNOTSUP,
which isn't helpful in determining if there was an error in the command, or if
it simply isn't supported yet. Past features are not represented by this
bitmap, but their use may be converted to EOPNOTSUP if needed in the future.
Since "version" is too generic to convert with a #define, use a union in the
struct status, introducing the member "feature_bitmap" unionized with
"version".
Convert existing AUDIT_VERSION_* macros over to AUDIT_FEATURE_BITMAP*
counterparts, leaving the former for backwards compatibility.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: minor whitespace tweaks]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull audit fixes from Paul Moore:
"After he sent the initial audit pull request for 3.18, Eric asked me
to take over the management of the audit tree, hence this pull request
to fix a couple of problems with audit.
As you can see below, the changes are minimal: adding some whitespace
to a string so userspace parses it correctly, and fixing a problem
with audit's usage of fsnotify that was causing audit watch rules to
be lost. Neither of these patches were very controversial on the
mailing lists and they fix real problems, getting them into 3.18 would
be a good thing"
* 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit:
audit: keep inode pinned
audit: AUDIT_FEATURE_CHANGE message format missing delimiting space