By copying BPF related operation to uprobe processing path, this patch
allow users attach BPF programs to uprobes like what they are already
doing on kprobes.
After this patch, users are allowed to use PERF_EVENT_IOC_SET_BPF on a
uprobe perf event. Which make it possible to profile user space programs
and kernel events together using BPF.
Because of this patch, CONFIG_BPF_EVENTS should be selected by
CONFIG_UPROBE_EVENT to ensure trace_call_bpf() is compiled even if
KPROBE_EVENT is not set.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit e1abf2cc8d ("bpf: Fix the build on
BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable")
updated the building condition of bpf_trace.o from CONFIG_BPF_SYSCALL
to CONFIG_BPF_EVENTS, but the corresponding #ifdef controller in
trace_events.h for trace_call_bpf() was not changed. Which, in theory,
is incorrect.
With current Kconfigs, we can create a .config with CONFIG_BPF_SYSCALL=y
and CONFIG_BPF_EVENTS=n by unselecting CONFIG_KPROBE_EVENT and
selecting CONFIG_BPF_SYSCALL. With these options, trace_call_bpf() will
be defined as an extern function, but if anyone calls it a symbol missing
error will be triggered since bpf_trace.o was not built.
This patch changes the #ifdef controller for trace_call_bpf() from
CONFIG_BPF_SYSCALL to CONFIG_BPF_EVENTS. I'll show its correctness:
Before this patch:
BPF_SYSCALL BPF_EVENTS trace_call_bpf bpf_trace.o
y y normal compiled
n n inline not compiled
y n normal not compiled (incorrect)
n y impossible (BPF_EVENTS depends on BPF_SYSCALL)
After this patch:
BPF_SYSCALL BPF_EVENTS trace_call_bpf bpf_trace.o
y y normal compiled
n n inline not compiled
y n inline not compiled (fixed)
n y impossible (BPF_EVENTS depends on BPF_SYSCALL)
So this patch doesn't break anything. QED.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Code on the kprobe blacklist doesn't want unexpected int3
exceptions. It probably doesn't want unexpected debug exceptions
either. Be safe: disallow breakpoints in nokprobes code.
On non-CONFIG_KPROBES kernels, there is no kprobe blacklist. In
that case, disallow kernel breakpoints entirely.
It will be particularly important to keep hw breakpoints out of the
entry and NMI code once we move debug exceptions off the IST stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e14b152af99640448d895e3c2a8c2d5ee19a1325.1438312874.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The previous change documents that cleanup_return_instances()
can't always detect the dead frames, the stack can grow. But
there is one special case which imho worth fixing:
arch_uretprobe_is_alive() can return true when the stack didn't
actually grow, but the next "call" insn uses the already
invalidated frame.
Test-case:
#include <stdio.h>
#include <setjmp.h>
jmp_buf jmp;
int nr = 1024;
void func_2(void)
{
if (--nr == 0)
return;
longjmp(jmp, 1);
}
void func_1(void)
{
setjmp(jmp);
func_2();
}
int main(void)
{
func_1();
return 0;
}
If you ret-probe func_1() and func_2() prepare_uretprobe() hits
the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not
reported.
When we know that the new call is not chained, we can do the
more strict check. In this case "sp" points to the new ret-addr,
so every frame which uses the same "sp" must be dead. The only
complication is that arch_uretprobe_is_alive() needs to know was
it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum
and change prepare_uretprobe() to pass RP_CHECK_CALL only if
!chained.
Note: arch_uretprobe_is_alive() could also re-read *sp and check
if this word is still trampoline_vaddr. This could obviously
improve the logic, but I would like to avoid another
copy_from_user() especially in the case when we can't avoid the
false "alive == T" positives.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch/x86 doesn't care (so far), but as Pratyush Anand pointed
out other architectures might want why arch_uretprobe_is_alive()
was called and use different checks depending on the context.
Add the new argument to distinguish 2 callers.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134026.GA4779@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the x86 specific version of arch_uretprobe_is_alive()
helper. It returns true if the stack frame mangled by
prepare_uretprobe() is still on stack. So if it returns false,
we know that the probed function has already returned.
We add the new return_instance->stack member and change the
generic code to initialize it in prepare_uretprobe, but it
should be equally useful for other architectures.
TODO: this assumes that the probed application can't use
multiple stacks (say sigaltstack). We will try to improve
this logic later.
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134018.GA4766@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the new "weak" helper, arch_uretprobe_is_alive(), used by
the next patches. It should return true if this return_instance
is still valid. The arch agnostic version just always returns
true.
The patch exports "struct return_instance" for the architectures
which want to override this hook. We can also cleanup
prepare_uretprobe() if we pass the new return_instance to
arch_uretprobe_hijack_return_addr().
Tested-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Anton Arapov <arapov@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150721134016.GA4762@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Thomas Gleixner:
"This update contains:
- the manual revert of the SYSCALL32 changes which caused a
regression
- a fix for the MPX vma handling
- three fixes for the ioremap 'is ram' checks.
- PAT warning fixes
- a trivial fix for the size calculation of TLB tracepoints
- handle old EFI structures gracefully
This also contains a PAT fix from Jan plus a revert thereof. Toshi
explained why the code is correct"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/pat: Revert 'Adjust default caching mode translation tables'
x86/asm/entry/32: Revert 'Do not use R9 in SYSCALL32' commit
x86/mm: Fix newly introduced printk format warnings
mm: Fix bugs in region_is_ram()
x86/mm: Remove region_is_ram() call from ioremap
x86/mm: Move warning from __ioremap_check_ram() to the call site
x86/mm/pat, drivers/media/ivtv: Move the PAT warning and replace WARN() with pr_warn()
x86/mm/pat, drivers/infiniband/ipath: Replace WARN() with pr_warn()
x86/mm/pat: Adjust default caching mode translation tables
x86/fpu: Disable dependent CPU features on "noxsave"
x86/mpx: Do not set ->vm_ops on MPX VMAs
x86/mm: Add parenthesis for TLB tracepoint size calculation
efi: Handle memory error structures produced based on old versions of standard
double iteration list (one for registered ftrace ops, and one for
registered "global" ops), to just use one list. That simplified the code
but also broke the function tracing filtering on pid.
This updates the code to handle the filtering again with the new logic.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVsq1dAAoJEEjnJuOKh9ldfagH/39dixx3tMMElO7pGM9Q3DBE
WmJuGSOmeZA1O0balnfcV2SgegEGCSEP6sjBtyuYCfgcFDWyIdAvGrMLS/hKTOxR
pilaqyNQz7CROx1zco9gbu5pGwkSkcAfnqYzOg6IpJ0WHtiyH36GMe2wMu29u2VT
I7NgSC6ByA82N4pwvgetlUcIDcPTyrkkhGmGPBJGY+diKzeSSo8NlRbv3SNYs0ua
V072Oumu64RrZBMdn/Sb2pCF2hf6vhTXD6qS4dbpK/Rfnlblqer9SqUIx2kpg603
yDOmPY7wN9FJ94Te3EeXubLi0LqDJH4iPOndrRn1fYsgMbUpq1BlViF7W7Ajze8=
=LQ+S
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Back in 3.16 the ftrace code was redesigned and cleaned up to remove
the double iteration list (one for registered ftrace ops, and one for
registered "global" ops), to just use one list. That simplified the
code but also broke the function tracing filtering on pid.
This updates the code to handle the filtering again with the new
logic"
* tag 'trace-v4.2-rc2-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix breakage of set_ftrace_pid
Two trivial updates. I meant to send these much earlier, but I've been
preoccupied.
* Add MAINTAINERS entry for diskonchip g3 driver
* Fix an overlooked conflict in bitfield value assignments
The latter update is a bit overdue, but there's no reason to wait any longer.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVsuDvAAoJEFySrpd9RFgtqz4P/Rlls1oU8ah1+BOSJf9lksaf
qnYIH3Md7INnrfnfiFV0lfrYB3R3qFzHSksqZPco38x67jlRjMUy0qvEOrtFa8QY
KNK6KJkdb8gj4z0263thSTFJLBqpLpGLhd3yDebOACiDscQ67zrbWQ5ZuPOHPxQ1
IRDXjjrIlTnwUBe2Uig+vdyiOPU4eOkpPlpnpqEeEmCo/LlJrPikEdpONJN8fvXm
i1WMqJHBiC45ZQ2R9+rO4n+wj4AhbkueOeZcyUfzriUOZ32rU899RRpCiLv/WJ1P
HYNL3aDN0boizHFVQMrN1Qx9+T4rk5LcAHjZbFn5XUn+xOGnqVzLkUzkZ4lJwCFP
c+h0fyNLjNlvaaJ5ek4AcbwFA2ofIwJhNiGKCvufoxxGSdjdqIkXpo2UVu2CSjdI
vC3eX/eTz3aZIik0+2DkLozBcOGJfCD2ly0AIMu3QK80pOTuXljeb8HG+ypjFZKr
KerW5M3APuiWYgeemy01ojrkR64urS4Mq8e5qMtrqfk8vGgp0QxybxtCR+/hIcvH
IUinApew1Sh25Um8Dt1yothuvcU4df3olZoh/kcaw9I/+dwGsMYNkMKSC+g6e0uG
+AWtat/DrDI22Wef9TbbTu1gBBYiZgNjifmSTvBErDPMi0QFwCo6V7hrkWy3m99p
zwGbhfMDh9JzT52GDTnU
=9Q8t
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20150724' of git://git.infradead.org/linux-mtd
Pull MTD fixes from Brian Norris:
"Two trivial updates. I meant to send these much earlier, but I've
been preoccupied.
- Add MAINTAINERS entry for diskonchip g3 driver
- Fix an overlooked conflict in bitfield value assignments
The latter update is a bit overdue, but there's no reason to wait any
longer"
* tag 'for-linus-20150724' of git://git.infradead.org/linux-mtd:
mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict
MAINTAINERS: mtd: docg3: add docg3 maintainer
Pull libata fixes from Tejun Heo:
"A couple important fixes.
- A block layer change which removed restriction on max transfer size
led to silent data corruption on some devices. A new quirk is
added to restore the old size limit for the reported device. If it
gets reported on more devices, we might have to consider restoring
the restriction for ATA devices by default.
- There finally is a SSD which is confirmed to cause data corruption
on TRIM regardless of which flavor is used. A new quirk is added
and the device is blacklisted
- Other device-specific workarounds"
* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: Do not blacklist M510DC
libata: increase the timeout when setting transfer mode
libata: add ATA_HORKAGE_MAX_SEC_1024 to revert back to previous max_sectors limit
libata: force disable trim for SuperSSpeed S238
libata: add ATA_HORKAGE_NOTRIM
libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for HP 250GB SATA disk VB0250EAVER
ata: pmp: add quirk for Marvell 4140 SATA PMP
Commit 4104d326b6 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.
Add back the proper hooks to have pid filtering working once again.
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
caused old UEFI spec (< 2.3) versions of the memory error record
structure to be declared invalid - Tony Luck
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVpmyaAAoJEC84WcCNIz1Vy8sP/2k/io83aTzuePeJb2ub4TXn
/ZFA2jMQqKcZ69tr91F+zTeb/isA7+yijOzkJ4dO7HSfzsc8IWxujZf+iqGKQnpQ
JRq0zWfy3jXKnIE9CqDPEVRF0wkMgVIsowPTDVHhLeuy8R9LaF3KxO5ZM7FwPYAK
bAhZ8jYdw1DRQ0Vns4XD8B3j1GYe3BJ/ptAZCWoZ4Go3bxoU4VBsW7goZlVfcwg7
TY8mmwp7zoZS0frv3Ba42xGli9s3g4+8WJcWYVcYuB9NqKYhFjze2kmWZO68Le0o
3Vnppf3pYWE3YqgBsx8KlZ8XT0KwvPzc93XtW962+E8N603v8sbl6oy9gOe9KJEN
oDCH3TqTcFGcOwrVMgXgAHupXlHH1qHy0jevWVJ3mxsIyTNQN6fpTpIAaWRtmVW1
p9JTA62rTJ+bB7C1JXjVaLtLTBD/YnXqZM2z/O7zhomm1Myv+JrtphZ0MGb6cHqj
Db9OLU3SMONFsgp/FD4XDMz0BxpUxekvKHzzWL/PM8muN1O0RPhG/QE+m6P4007F
XtAb5oleKQawAmzzTyUN7gaRi2V4WI7+0BZ/Y9L9KnNZ01XX0LXgF/+nqdgfyqG+
lnWpuaEVePMsOPA2amtqY88AlRERZGjOuSbSO1NLjhHYzpVL2t+CuBJvDLfBGEc4
NtuxnN0bFL7RroIHIVQL
=kSV0
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull an EFI fix from Matt Fleming:
- Fix a bug in the Common Platform Error Record (CPER) driver that
caused old UEFI spec (< 2.3) versions of the memory error record
structure to be declared invalid. (Tony Luck)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Two families of fixes:
- Fix an FPU context related boot crash on newer x86 hardware with
larger context sizes than what most people test. To fix this
without ugly kludges or extensive reverts we had to touch core task
allocator, to allow x86 to determine the task size dynamically, at
boot time.
I've tested it on a number of x86 platforms, and I cross-built it
to a handful of architectures:
(warns) (warns)
testing x86-64: -git: pass ( 0), -tip: pass ( 0)
testing x86-32: -git: pass ( 0), -tip: pass ( 0)
testing arm: -git: pass ( 1359), -tip: pass ( 1359)
testing cris: -git: pass ( 1031), -tip: pass ( 1031)
testing m32r: -git: pass ( 1135), -tip: pass ( 1135)
testing m68k: -git: pass ( 1471), -tip: pass ( 1471)
testing mips: -git: pass ( 1162), -tip: pass ( 1162)
testing mn10300: -git: pass ( 1058), -tip: pass ( 1058)
testing parisc: -git: pass ( 1846), -tip: pass ( 1846)
testing sparc: -git: pass ( 1185), -tip: pass ( 1185)
... so I hope the cross-arch impact 'none', as intended.
(by Dave Hansen)
- Fix various NMI handling related bugs unearthed by the big asm code
rewrite and generally make the NMI code more robust and more
maintainable while at it. These changes are a bit late in the
cycle, I hope they are still acceptable.
(by Andy Lutomirski)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
x86/fpu, sched: Dynamically allocate 'struct fpu'
x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code
x86/nmi/64: Make the "NMI executing" variable more consistent
x86/nmi/64: Minor asm simplification
x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
x86/nmi/64: Reorder nested NMI checks
x86/nmi/64: Improve nested NMI comments
x86/nmi/64: Switch stacks on userspace NMI entry
x86/nmi/64: Remove asm code that saves CR2
x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.
Also optimize the x86 code a bit by caching task_struct_size.
Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).
Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already. This saves from doing an extra slab
allocation at fork().
The only real downside here is that we have to stick everything
and the end of the task_struct. But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, we set wrong gfp_mask to page_owner info in case of isolated
freepage by compaction and split page. It causes incorrect mixed
pageblock report that we can get from '/proc/pagetypeinfo'. This metric
is really useful to measure fragmentation effect so should be accurate.
This patch fixes it by setting correct information.
Without this patch, after kernel build workload is finished, number of
mixed pageblock is 112 among roughly 210 movable pageblocks.
But, with this fix, output shows that mixed pageblock is just 57.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch to my kernel.org alias instead of a badly named gmail address,
which I rarely use.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile). For example it can detect when formatting a pointer as a
number, like the issue fixed in commit a3fa71c40f ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f. for example commit 5ce1aca814
("reiserfs: fix __RASSERT format string").
To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/. These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 has a constant hugepage size, by setting HPAGE_SHIFT we also change
e.g. the pageblock_order, which should be independent in respect to
hugepage support.
With this patch every architecture is free to define how to check
for hugepage support.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's some staging and IIO driver fixes for 4.2-rc3.
Nothing major, the majority are IIO issues that were reported, with a
few other minor staging driver fixes. All have been in linux-next for a
while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlWpRFUACgkQMUfUDdst+yke/wCeJPDuYuOp/SFhyviMx9ojKIGF
kVIAoI1xDY4SwVExztXcqyKo6m0H2yZ4
=pxAa
-----END PGP SIGNATURE-----
Merge tag 'staging-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO driver fixes from Greg KH:
"Here's some staging and IIO driver fixes for 4.2-rc3.
Nothing major, the majority are IIO issues that were reported, with a
few other minor staging driver fixes. All have been in linux-next for
a while with no reported issues"
* tag 'staging-4.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (25 commits)
staging: vt6656: check ieee80211_bss_conf bssid not NULL
staging: vt6655: check ieee80211_bss_conf bssid not NULL
staging:lustre: remove irq.h from socklnd.h
staging: make board support depend on OF_IRQ and CLKDEV_LOOKUP
iio: tmp006: Check channel info on write
iio: sx9500: Add missing init in sx9500_buffer_pre{en,dis}able()
iio:light:ltr501: fix regmap dependency
iio:light:ltr501: fix variable in ltr501_init
iio: sx9500: fix bug in compensation code
iio: sx9500: rework error handling of raw readings
iio: magnetometer: mmc35240: fix available sampling frequencies
iio:light:stk3310: Fix REGMAP_I2C dependency
iio: light: STK3310: un-invert proximity values
iio:adc:cc10001_adc: fix Kconfig dependency
iio: light: tcs3414: Fix bug preventing to set integration time
iio:accel:bmc150-accel: fix counting direction
iio:light:cm3323: clear bitmask before set
iio: adc: at91_adc: allow to use full range of startup time
iio: DAC: ad5624r_spi: fix bit shift of output data value
iio: proximity: sx9500: Fix proximity value
...
hitting individual drivers and nothing else.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVqPoCAAoJEEEQszewGV1zccEP/2bGViiPxrlBV2plLlteE4sT
7zujWF3v1rkRZ1jr6tDafAthxtaL8WBY6O3rDWAVCZiVYATsYBGoh/Rg9EFwaKZ/
6i5aqVFqAo9aNOFVmKabjUs8YESRTB4Wo1Ccuju/n4PkL7sGM+fBuZw5R6J7R+GU
TpXMulHI4mDVjHuZWeUJ+GRnp6xmaTv5/PR1p1NC7D7zFulk6h4ZPYXJUhX8fhwC
6Pp0bgH8JEvlzphDLHziX41jSEz8Pr3vWk0WJTYCisdsNl6oklfCMBcCOgluZmgO
7v32RmxqXqBevhDgzd7sUUGjntmATb9kS4ZIcW6awX9Ia7TK55iG/pEd5pHfXAVV
j79ViWfWxWLH2kVPiBTKSe+nCtMSg8hEj5RtPxTkBdUOL4BBgAigyY1DW4OgKPv5
YOHiG0FN9UT/tNH8EmUNtnUW8k4a6LpLN+r+3Tgjbz1xgVcd/5wGlEzDm+TVnyMf
RBojHUDYI0taE/KpGl7RkhcKZoDNsrYK/M+pKw5DTD8Ca/hfjMoLvqAs0uKzWjc7
HDTtTTjAqFICNnf+yAaAKcjjpEp7r8xaj0c/cBIItIqG79vjLd+VnUvonjVzuTGP
icysBdY5J2+NWBgWeVotopGhGs4NOYNIDQjzUwBLUBFcAVzvhsvTRc9LoncZq5Bn
SWHIW/sZZ8hnWETNoz6q
=mTd/
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"This is a first set of GPIO fixes for the v4.2 series, all hitting
individual drivers and nothing else (except for a documentation
oneliner. I intended to send a request earlier but life intervened)"
* tag 'gpio-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: fix nested irqs rescheduling
gpio: omap: prevent module from being unloaded while in use
gpio: max732x: Add missing dev reference to gpiochip
gpio/xilinx: Use correct address when setting initial values.
gpio: zynq: Fix problem with unbalanced pm_runtime_enable
gpio: omap: add missed spin_unlock_irqrestore in omap_gpio_irq_type
gpio: brcmstb: fix null ptr dereference in driver remove
gpio: Remove double "base" in comment
Pull block fixes from Jens Axboe:
"A collection of fixes from the last few weeks that should go into the
current series. This contains:
- Various fixes for the per-blkcg policy data, fixing regressions
since 4.1. From Arianna and Tejun
- Code cleanup for bcache closure macros from me. Really just
flushing this out, it's been sitting in another branch for months
- FIELD_SIZEOF cleanup from Maninder Singh
- bio integrity oops fix from Mike
- Timeout regression fix for blk-mq from Ming Lei"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: set default timeout as 30 seconds
NVMe: Reread partitions on metadata formats
bcache: don't embed 'return' statements in closure macros
blkcg: fix blkcg_policy_data allocation bug
blkcg: implement all_blkcgs list
blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
block/blk-cgroup.c: free per-blkcg data when freeing the blkcg
block: use FIELD_SIZEOF to calculate size of a field
bio integrity: do not assume bio_integrity_pool exists if bioset exists
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVo5PmAAoJEAAOaEEZVoIVQkAP/iU8i/atra0YVACMckwLH0rV
OlMs66V1Ur/+3PNwnBAPAITIQTIokRcCUe+ChwlM5I0/N6sHb8b+qKqsc1cesSn4
rBIBXigjMTeBS4MZXYhCeo9oMPPRtTpKdZMGlh499wQcc39BkmRtYPeONQCaYovW
uDq4Mydbt3m92wJK3s2VNsAeNgGKsS7VNZkjQKFxsKSreFKz7NhDBab18lvqAC/9
1z4bqdM4I82uaDdecHiZu8EgTKzDN8wqxYwXJ6RmAtHDXn9r2aXOIwH9+nMGxXQF
DDBgiFb49moK1owJ9UUO3n6GR5HPmmlhshS426uJiODTbI5KlX+68kYQsTpcuRch
CjNBPtUxeDvqK+FRb1jCftA43tcRtqhLYrQ3lr+V4/UqWNZzH0xkrCozg1aP7yg2
XBhw+OWqLm7GyH51IdpRDKQi1hgX9QVp9s6XLhXf7R/o2Lsbyfehe31pJcgcjMbc
2QJiurbSK9+a89bwAn2xozMDOcIXyYAQyS2IBUMuNtCVo6vtsqmtYU+UEKJzoKph
BlwlMqIQyuT0P+jPjy4lxHmskz6I8ToykRS39RVtflS8JPrSAcJ3VVJHnabQcwA7
L1qrDbvaQ+nhLLoX7+zi0yqbLbdD5L+6WXJDaFQsK4XtF0c+hxxvoKCPg3vWOqt3
vAHDSy5Q8s94lsOvzcXC
=aj6S
-----END PGP SIGNATURE-----
Merge tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
"I had thought that I was going to get away without a pull request this
cycle. There was a NFSv4 file locking problem that cropped up that I
tried to fix in the NFSv4 code alone, but that fix has turned out to
be problematic. These patches fix this in the correct way.
Note that this touches some NFSv4 code as well. Ordinarily I'd wait
for Trond to ACK this, but he's on holiday right now and the bug is
rather nasty. So I suggest we merge this and if he raises issues with
it we can sort it out when he gets back"
Acked-by: Bruce Fields <bfields@fieldses.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
[ +1 to this series fixing a 100% reproducible slab corruption +
general protection fault in my nfs-root test environment. - Dan ]
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* tag 'locks-v4.2-1' of git://git.samba.org/jlayton/linux:
locks: inline posix_lock_file_wait and flock_lock_file_wait
nfs4: have do_vfs_lock take an inode pointer
locks: new helpers - flock_lock_inode_wait and posix_lock_inode_wait
locks: have flock_lock_file take an inode pointer instead of a filp
Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVou0MAAoJEOvOhAQsB9HWrTcQAIptAvZQldKDjt0C6B9WNJlX
YJ/HsoFPv3LXizBSH1T8OhMiV04b5bQoTxVIbZwbYHr7DCCRNhswKz+Yn4EGCw5v
PiyihxWPPYeXO9mtxS1cmuiQ78GlUl0ilNPkDI4NsQSISgrz4hj5dr3dZLj2Wcv1
pZqux5rrgJf06aoFD/7avS+e31HoLvglTrFnu0jdBs3G/lrjk0bjLX3PXFi2ACT7
eD20rRskpgOIQBk6AjL4Pqq1fpcWIRNwF5CCdgg4SaDP5W3kFhpanRKfHtovYbQ4
AC79M8SwDRGphNu0QJRtESeM9vpFNx0/9i+sbXRfbgbr/mDPbUCSaKvCdqDXVFHN
gp7u6bQKx1KoDfWCFM9KTF7KBST0cSk1YMUtWWaWWZp0f8mih9elpWja8anGlHe1
PlDY/BKGMpbrac1oVSzBmiT50b73C2BOrt9zAi/IrEwnQzmi2lvtnmYFLNhJhjjM
aSntD0OmSD8sJcjB98a/cxOrKsFEpt93C/sGSvZ8M4MWOziQQUtY2ryhNdH0utmL
QYqDaqoyuImvtwEBvidgVdNbOonNpBHljiPBgkDAtv6aMzxlrWrGGMaZlrWi0FDZ
S+tllpwGDXfAshZOmE0yZEwpQL7vjb3y31G+TrwKKXJvsczRaK5o4Mc8LELxqS/n
oAbZOMdu8Zy3ugTa5NDF
=TcF6
-----END PGP SIGNATURE-----
Merge tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull final init.h/module.h code relocation from Paul Gortmaker:
"With the release of 4.2-rc2 done, we should not be seeing any new code
added that gets upset by this small code move, and we've banked yet
another complete week of testing with this move in place on top of
4.2-rc1 via linux-next to ensure that remained true.
Given that, I'd like to put it in now so that people formulating new
work for 4.3-rc1 will be exposed to the ever so slightly stricter (but
sensible) requirements wrt. whether they are needing init.h vs.
module.h macros, even if they are not using linux-next.
The diffstat of the move is slightly asymmetrical due to needing to
leave behind a couple #ifdef in the old location and add the same ones
to the new location, but other than that, it is a 1:1 move, complete
with the module_init/exit trailing semicolon that we can't fix. That
is, until/unless someone does a tree-wide sed fix of all the
approximately 800 currently in tree users relying on it"
* tag 'module-final-v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
module: relocate module_init from init.h to module.h
Since no longer limiting max_sectors to BLK_DEF_MAX_SECTORS (commit 34b48db66e),
data corruption may occur on ST380013AS drive configured on 82801JI (ICH10 Family)
SATA controller. This patch will allow the driver to limit max_sectors as before
# cat /sys/block/sdb/queue/max_sectors_kb
512
I was able to double the max_sectors_kb value up to 16384 on linux-4.2.0-rc2
before seeing corruption, but seems safer to use previous limit. Without this
patch max_sectors_kb will be 32767.
tj: Minor comment update.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.19 and later
Fixes: 34b48db66e ("block: remove artifical max_hw_sectors cap")
Some devices lose data on TRIM whether queued or not. This patch adds
a horkage to disable TRIM.
tj: Collapsed unnecessary if() nesting.
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.
Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
* Fix a regression in hid sensors suspend time as a result of adding runtime
pm. The normal flow of waking up devices in order to go into suspend
(given the devices are normally suspended when not reading) to a regression
in suspend time on some laptops (reports of an additional 8 seconds).
Fix this by checking to see if a user action resulting in the wake up, and
make it a null operation if it didn't. Note that for hid sensors, there is
nothing useful to be done when moving into a full suspend from a runtime
suspend so they might as well be left alone.
* rochip_saradc: fix some missing MODULE_* data including the licence so that
the driver does not taint the kernel incorrectly and can build as a module.
* twl4030 - mark irq as oneshot as it always should have been.
* inv-mpu - write formats for attributes not specified, leading to miss
interpretation of the gyro scale channel when written.
* Proximity ABI clarification. This had snuck through as a mess. Some
drivers thought proximity went in one direction, some the other. We went
with the most common option, documented it and fixed up the drivers going
the other way. Fix for sx9500 included in this set.
* ad624r - fix a wrong shift in the output data.
* at91_adc - remove a false limit on the value of the STARTUP register
applied by too small a type for the device tree parameter.
* cm3323 - clear the bits when setting the integration time (otherwise
we can only ever set more bits in the relevant field).
* bmc150-accel - multiple triggers are registered, but on error were not being
unwound in the opposite order leading to removal of triggers that had not
yet successfully been registered (count down instead of up when unwinding).
* tcs3414 - ensure right part of val / val2 pair read so that the integration
time is not always 0.
* cc10001_adc - bug in kconfig dependency. Use of OR when AND was intended.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJVpA2CAAoJEFSFNJnE9BaIJGEP/2uXFSqOZhmJ9sGqICSgKj4W
OUjDtpXpKWxLC3Fypzvzg5u00R1t2likfvhXP2RsF5+/mqPb8NS6qzdcKMPCyxEl
5tvsmYHm9yxN8o3ZZqtgYaR4mYt5tH4dFZ9qHpbFeNAXGjTRrYhGbgmLnND0v7cA
5L1ABLls8ntoJ1aZZDsofwnmnUgclW5yqQZZ+huNkwaOpUQLke9tEL18cv+bsVgS
zw7j/t3oJVdcUB9OTB7T/sW9J0+W7XnXogATHksHdHh2cd5N7wh/EZ1bet69QUrI
PD2q2+0MUwyBMWDPveyWfm7XbS66lYxIRCmWZp+69Q1c/V91srhSPfh0kPcvHSQ1
Uzpba6oSFPlFyDAtXWhaSEBzjXaHwKBIQvIVYOKiE6JdrbsnSg4GHAcF8TMhGtwT
SDDgfb+cxOm6Vb4ws0+i15HMEiXpeK8AiJfHmLvau3OnA69/xzxHqqSg/oCO6/ES
IzoAMqIVEk3L5gu88qgnmWzmyWp1pyTf1u+Kr+gAXNdSF/b3wgVkGn2X9hNRQ4g/
XEFAD1PzarBnv1ce/HfWi+9/aUwv08nXHxBe4Yx3bfle2RUQKmFSbksVkCwu+ha3
E4jPs5Cf9MrHO4gbuFaZX2+bFYlMdbcEWVRcP/3CHUuxeixGwemOsak1L+J5h6+e
2xVEkGeywVdmDOsan1B8
=mBqw
-----END PGP SIGNATURE-----
Merge tag 'iio-fixes-for-4.2a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First set of IIO fixes for the 4.2 cycle.
* Fix a regression in hid sensors suspend time as a result of adding runtime
pm. The normal flow of waking up devices in order to go into suspend
(given the devices are normally suspended when not reading) to a regression
in suspend time on some laptops (reports of an additional 8 seconds).
Fix this by checking to see if a user action resulting in the wake up, and
make it a null operation if it didn't. Note that for hid sensors, there is
nothing useful to be done when moving into a full suspend from a runtime
suspend so they might as well be left alone.
* rochip_saradc: fix some missing MODULE_* data including the licence so that
the driver does not taint the kernel incorrectly and can build as a module.
* twl4030 - mark irq as oneshot as it always should have been.
* inv-mpu - write formats for attributes not specified, leading to miss
interpretation of the gyro scale channel when written.
* Proximity ABI clarification. This had snuck through as a mess. Some
drivers thought proximity went in one direction, some the other. We went
with the most common option, documented it and fixed up the drivers going
the other way. Fix for sx9500 included in this set.
* ad624r - fix a wrong shift in the output data.
* at91_adc - remove a false limit on the value of the STARTUP register
applied by too small a type for the device tree parameter.
* cm3323 - clear the bits when setting the integration time (otherwise
we can only ever set more bits in the relevant field).
* bmc150-accel - multiple triggers are registered, but on error were not being
unwound in the opposite order leading to removal of triggers that had not
yet successfully been registered (count down instead of up when unwinding).
* tcs3414 - ensure right part of val / val2 pair read so that the integration
time is not always 0.
* cc10001_adc - bug in kconfig dependency. Use of OR when AND was intended.
Pull networking fixes from David Miller:
1) Missing list head init in bluetooth hidp session creation, from Tedd
Ho-Jeong An.
2) Don't leak SKB in bridge netfilter error paths, from Florian
Westphal.
3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
Grall.
4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
Baechle.
5) Fix race between rhashtable resize events and table walks, from Phil
Sutter.
6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
Daniel Borkmann.
7) Missing security layer socket state initialization in tipc code,
from Stephen Smalley.
8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
Denys Vlasenko.
9) Missing minor_idr destroy on module unload on macvtap driver, from
Johannes Thumshirn.
10) Various pktgen kernel thread races, from Oleg Nesterov.
11) Fix races that can cause packets to be processed in the backlog even
after a device attached to that SKB has been fully unregistered.
From Julian Anastasov.
12) bcmgenet driver doesn't account packet drops vs. errors properly,
fix from Petri Gynther.
13) Array index validation and off by one fix in DSA layer from Florian
Fainelli
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
can: replace timestamp as unique skb attribute
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
can: c_can: Fix default pinmux glitch at init
can: rcar_can: unify error messages
can: rcar_can: print request_irq() error code
can: rcar_can: fix typo in error message
can: rcar_can: print signed IRQ #
can: rcar_can: fix IRQ check
net: dsa: Fix off-by-one in switch address parsing
net: dsa: Test array index before use
net: switchdev: don't abort unsupported operations
net: bcmgenet: fix accounting of packet drops vs errors
cdc_ncm: update specs URL
Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
bridge: mdb: allow the user to delete mdb entry if there's a querier
net: call rcu_read_lock early in process_backlog
net: do not process device backlog during unregistration
bridge: fix potential crash in __netdev_pick_tx()
net: axienet: Fix devm_ioremap_resource return value check
...
They just call file_inode and then the corresponding *_inode_file_wait
function. Just make them static inlines instead.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Allow callers to pass in an inode instead of a filp.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
Commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pull timer fixes from Thomas Gleixner:
"This update from the timer departement contains:
- A series of patches which address a shortcoming in the tick
broadcast code.
If the broadcast device is not available or an hrtimer emulated
broadcast device, some of the original assumptions lead to boot
failures. I rather plugged all of the corner cases instead of only
addressing the issue reported, so the change got a little larger.
Has been extensivly tested on x86 and arm.
- Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
- A regression fix for the imx clocksource driver
- An update to the new state callbacks mechanism for clockevents.
This is required to simplify the conversion, which will take place
in 4.3"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Prevent NULL pointer dereference
time: Get rid of do_posix_clock_monotonic_gettime
cris: Replace do_posix_clock_monotonic_gettime()
tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
tick/broadcast: Handle spurious interrupts gracefully
tick/broadcast: Check for hrtimer broadcast active early
tick/broadcast: Return busy when IPI is pending
tick/broadcast: Return busy if periodic mode and hrtimer broadcast
tick/broadcast: Move the check for periodic mode inside state handling
tick/broadcast: Prevent deep idle if no broadcast device available
tick/broadcast: Make idle check independent from mode and config
tick/broadcast: Sanity check the shutdown of the local clock_event
tick/broadcast: Prevent hrtimer recursion
clockevents: Allow set-state callbacks to be optional
clocksource/imx: Define clocksource for mx27
Pull irq fix from Thomas Gleixner:
"A single fix for a cpu hotplug race vs. interrupt descriptors:
Prevent irq setup/teardown across the cpu starting/dying parts of cpu
hotplug so that the starting/dying cpu has a stable view of the
descriptor space. This has been an issue for all architectures in the
cpu dying phase, where interrupts are migrated away from the dying
cpu. In the starting phase its mostly a x86 issue vs the vector space
update"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hotplug: Prevent alloc/free of irq descriptors during cpu up/down
Pull libnvdimm fixes from Dan Williams:
"1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
bug fixes (patches 1-6)
2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).
Granted these are slightly large for a -rc update. They have been
out for review in one form or another since the end of May and were
deferred from the merge window while we settled on the "PMEM API"
for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
and wmb_pmem).
Now that those apis are merged we implement them in the BLK driver
to guarantee that mmio aperture moves stay ordered with respect to
incoming read/write requests, and that writes are flushed through
those mmio-windows and platform-buffers to be persistent on media.
These pass the sub-system unit tests with the updates to
tools/testing/nvdimm, and have received a successful build-report from
the kbuild robot (468 configs).
With acks from Rafael for the touches to drivers/acpi/"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
nfit: add support for NVDIMM "latch" flag
nfit: update block I/O path to use PMEM API
tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
tools/testing/nvdimm: fix return code for unimplemented commands
tools/testing/nvdimm: mock ioremap_wt
pmem: add maintainer for include/linux/pmem.h
nfit: fix smatch "use after null check" report
nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
libnvdimm: smatch cleanups in __nd_ioctl
sparse: fix misplaced __pmem definition
A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
prima2 as well as a few arm64-specific DT fixes.
This series also includes a late to support a new Allwinner (sunxi)
SoC, but since it's rather simple and isolated to the
platform-specific code, it's included it for this -rc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVoCLHAAoJEFk3GJrT+8ZlDcQP/jVIDk0MvuvfeIsbgWw4Bhys
+ISmgdSTRwSAaI9oHp3ApNSOmq7QspqORdYsZinR6+Em1Seul5vvT9BN9bYAs4fP
Uefvcyo9YSgiKQCLVbOkWnp1pJIPq7BKSvfNco159N4vi6RX+4A4XRrHhEbdLkGa
OhKDnrh0TmbM5b2RkLXlMZR1vsBYEeKxpUlBe3FhKnXYo16yP9Aix2q6oMJBuf99
1kKNfp0DlGhBwkH+nqbUCgNi8OShFcIBrtR1X4fg7LjANEVNvE1Rv0yAJDzsz5hd
g8v2xWaB+ONY08c4NelMLu0ZpspMV+fmeDmTuYpvOEPSYWvGamqEZUsdFMe1Vurm
yqxIoMHSG71dW4SK35QtuvB5LJ/QPytaXidTBU4noFzTVqGaAsvZDHjbRh3YbFm1
3mB+l2oiWtS1zTOjNLK7fGpyWMZ5OKtKdIxMrDPdWR+IHQy7RDGomMIyT1KenrgF
FO2a/1l4CDHumWFAiDx/vyfAm/KSO9uB8p5XTNIdVqge+uT3dVpmwpVSrl9IGDZy
n0YCpqN94lqRR8tEZ+vzyK/zbaUN50t0xOIj3wQKqRUxQxWG//wuX8m1pYW3iJtB
q/CbgladY4jYcEZFcKoeddBBVzI7E/ntPfL54O36Ubv/BA3dSUZ8RHmYi97OzhLf
YuvxhEnO0zwLHvghibZk
=UEUK
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Kevin Hilman:
"A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
prima2 as well as a few arm64-specific DT fixes.
This series also includes a late to support a new Allwinner (sunxi)
SoC, but since it's rather simple and isolated to the
platform-specific code, it's included it for this -rc"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
arm: dts: vexpress: add missing CCI PMU device node to TC2
arm: dts: vexpress: describe all PMUs in TC2 dts
GICv3: Add ITS entry to THUNDER dts
arm64: dts: Add poweroff button device node for APM X-Gene platform
ARM: dts: am4372.dtsi: disable rfbi
ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
ARM: dts: am4372: Add emif node
Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
ARM: sunxi: Enable simplefb in the defconfig
ARM: Remove deprecated symbol from defconfig files
ARM: sunxi: Add Machine support for A33
ARM: sunxi: Introduce Allwinner H3 support
Documentation: sunxi: Update Allwinner SoC documentation
ARM: prima2: move to use REGMAP APIs for rtciobrg
ARM: dts: atlas7: add pinctrl and gpio descriptions
ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
memory: omap-gpmc: Fix parsing of devices
If there are no assigned devices, the guest PAT are not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.
This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).
V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e48453c386 ("block, cgroup: implement policy-specific per-blkcg
data") updated per-blkcg policy data to be dynamically allocated.
When a policy is registered, its policy data aren't created. Instead,
when the policy is activated on a queue, the policy data are allocated
if there are blkg's (blkcg_gq's) which are attached to a given blkcg.
This is buggy. Consider the following scenario.
1. A blkcg is created. No blkg's attached yet.
2. The policy is registered. No policy data is allocated.
3. The policy is activated on a queue. As the above blkcg doesn't
have any blkg's, it won't allocate the matching blkcg_policy_data.
4. An IO is issued from the blkcg and blkg is created and the blkcg
still doesn't have the matching policy data allocated.
With cfq-iosched, this leads to an oops.
It also doesn't free policy data on policy unregistration assuming
that freeing of all policy data on blkcg destruction should take care
of it; however, this also is incorrect.
1. A blkcg has policy data.
2. The policy gets unregistered but the policy data remains.
3. Another policy gets registered on the same slot.
4. Later, the new policy tries to allocate policy data on the previous
blkcg but the slot is already occupied and gets skipped. The
policy ends up operating on the policy data of the previous policy.
There's no reason to manage blkcg_policy_data lazily. The reason we
do lazy allocation of blkg's is that the number of all possible blkg's
is the product of cgroups and block devices which can reach a
surprising level. blkcg_policy_data is contrained by the number of
cgroups and shouldn't be a problem.
This patch makes blkcg_policy_data to be allocated for all existing
blkcg's on policy registration and freed on unregistration and removes
blkcg_policy_data handling from policy [de]activation paths. This
makes that blkcg_policy_data are created and removed with the policy
they belong to and fixes the above described problems.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e48453c386 ("block, cgroup: implement policy-specific per-blkcg data")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add all_blkcgs list goes through blkcg->all_blkcgs_node and is
protected by blkcg_pol_mutex. This will be used to fix
blkcg_policy_data allocation bug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull Ceph fixes from Sage Weil:
"There is a fix for CephFS and RBD when used within containers/namespaces,
and a fix for the address learning the client is supposed to do when
initially talking to the Ceph cluster.
There are also two patches updating MAINTAINERS. One breaks out the
common Ceph code shared by fs/ceph and drivers/block/rbd.c into a
separate entry with the appropriate maintainers listed. The second
adds a second reference to the github tree where the Ceph client
development takes place (before it is pushed to korg and then to you).
The goal here is to move closer to a situation where Ilya Dryomov or
one of the other maintainers can push things to you if I am
unavailable. Ilya has done most of the work preparing branches for
upstream recently; you should not be surprised to hear from him if I
am trapped in some internet-less wasteland or hit by a bus or
something. In the meantime, we'll work on getting him added to the
kernel web of trust"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
MAINTAINERS: add secondary tree for ceph modules
MAINTAINERS: update ceph entries
libceph: treat sockaddr_storage with uninitialized family as blank
libceph: enable ceph in a non-default network namespace
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net. Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.
This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
- Fix for an ACPI resources management regression introduced
during the 4.1 cycle (that unfortunately went into -stable)
effectively reverting the bad commit along with the recent
fixups on top of it and using an alternative approach to
address the underlying issue (Rafael J Wysocki).
- Fix for a memory leak and an incorrect return value in an
error code path in the ACPI LPSS (Low-Power Subsystem) driver
(Rafael J Wysocki).
- Fix for a leftover dangling pointer in an error code path in
the new wakeup IRQ support code (Rafael J Wysocki).
- Fix to prevent infinite loops (due to errors in other places)
from happening in the core generic PM domains support code
(Geert Uytterhoeven).
- Hibernation documentation update/clarification (Uwe Geuder).
- Support for _CLS-based device enumeration in the ACPI core
and in the ATA subsystem (Suravee Suthikulpanit).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJVnZFwAAoJEILEb/54YlRxuysQALZ3y6QAtVFWurwOLXg0Zwoy
qoBWlfCRCEKsiLQNe1sDK+dqmfi+ZXvsEqm4y/4uU8dsJdTv1MByaX0aQxP0qmOj
Lre9AGNq28JxtTmU2o00qjLW9d9WIQKJn1N51+qqGKTVoqijmEcVlz5xVnqoJmBQ
s0Af2kAzG7CLFtNzPI6RbOPMm7S0VEiI/WQhKZXWqGheppfDaRG+wl0pKPv95G73
7wCG+yklSi4j5u2Aox/x1samOy+c+S/l2VgU+Decnv0ccPlChYBCMOzWSJuFn5v7
qVEvUrFYc2ItQElzaqTWc1uEPzf0yg/S4pWJjLBADpDAUBQNXExmaQ2wLlzgR00/
K/cGUqCn0APcw4tjZjVlGvTHmJg6ivdeceQuLdgxGoGgvph9V9JlHD9zmzFLqAXl
3VCJiuHnT9QyfEcAkOhZUvO5WcX892fNpHYe/riLQOll9ODwoWCr+TofGfIvHKZp
1XQ1r68rBpsZnnb5JpYT8F/LgaKa4nWZboLroFWi9SZ0jYdH4U7k88C2bCYpaZGz
S07l4GEICSuTPHpQNzshzYw6lWJqoXPFQ5okqnZxwNDWAE30ZII0LQwasuupmuR4
S1fFi30YTI23tiDEZRWgVWbMPD++jZdQoflBZq5Lo8s4EfPD5yS+LtxACzktDlM0
fq6bcUUO85Weo+zNxEa3
=qomE
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"These are fixes on top of the previous PM+ACPI pull requests
(including one fix for a 4.1 regression) and two commits adding
_CLS-based device enumeration support to the ACPI core and the ATA
subsystem that waited for the latest ACPICA changes to be merged.
Specifics:
- Fix for an ACPI resources management regression introduced during
the 4.1 cycle (that unfortunately went into -stable) effectively
reverting the bad commit along with the recent fixups on top of it
and using an alternative approach to address the underlying issue
(Rafael J Wysocki).
- Fix for a memory leak and an incorrect return value in an error
code path in the ACPI LPSS (Low-Power Subsystem) driver (Rafael J
Wysocki).
- Fix for a leftover dangling pointer in an error code path in the
new wakeup IRQ support code (Rafael J Wysocki).
- Fix to prevent infinite loops (due to errors in other places) from
happening in the core generic PM domains support code (Geert
Uytterhoeven).
- Hibernation documentation update/clarification (Uwe Geuder).
- Support for _CLS-based device enumeration in the ACPI core and in
the ATA subsystem (Suravee Suthikulpanit)"
* tag 'pm+acpi-4.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / wakeirq: Avoid setting power.wakeirq too hastily
ata: ahci_platform: Add ACPI _CLS matching
ACPI / scan: Add support for ACPI _CLS device matching
PM / hibernate: clarify resume documentation
PM / Domains: Avoid infinite loops in attach/detach code
ACPI / LPSS: Fix up acpi_lpss_create_device()
ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage
this moves to general APIs, and all drivers will be changed based
on it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJVd+OkAAoJEDIv4aC191RhuWkP/2LzJBRoBP1WJmTOvR0Kofl7
4PG5Xi/+1NtSHubvTSJJX2kZ9I9FHemQAJba8pqZ0pgZkb+7t0dA2+G1mDi8SOr8
VyWlMotI8Im0PoUHXmt5GPHEI4E6WqyX1TQFUD4JM/Ln07mevH/ocjd94LN79eO6
tc1z8AEw1/d5MCcEI34FP+vNXyLOE/9hM7IsmBtf4Niq7VrO8SWS5y1tJBOz+d2v
HmZZYEP/WS2mX8E77cz89pf0+BkmhF6nj1zoFCmlLVxCXrlKncs+Uvnsb+n/k1BI
gA4UNOfSoA+k+h+1F/7BhyEABqYkEuBDYs9m5NsypL+WITaVnI9pmNWNAA7IZNTg
dEYoVKqpJX3lVPWJ0i0hvgK1xXNMKlbwFb6iV80upvUVNfzZs9JveSL97Iiy9WjI
LlRdCTlteBvgsNBJyVG//788VU5+dr8HXV1UCfm+drs3H43TudCnJ8AhF3JaToO+
wv045McJTUL/CI+WQiXGNlAoXDjprgjLSiXRyEKF1xNsem3u78Z5bpjGX5crpzgC
HIdnsxb39jCHCNVJLuK/eCJ9Oocpo9LdKU0wSqIx0GNAx+xdoPo5H8jmlE4bnSWo
9f/9784A1SI/tpc0mKRz/z3FEKiP0i3CTJfcmC9EAd4LWFYk54WgtlzPztRkE625
YQMHjM+cavYWpzzgnQ+B
=Beg0
-----END PGP SIGNATURE-----
Merge tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux into fixes
Merge "CSR SiRFSoC rtc iobrg move to regmap for 4.2" from Barry Song:
move CSR rtc iobrg read/write API to be regmap
this moves to general APIs, and all drivers will be changed based
on it.
* tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux:
ARM: prima2: move to use REGMAP APIs for rtciobrg
When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.
When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassignes the affinities because
interrupt descriptors might be freed underneath.
Example:
CPU1 CPU2
cpu_up/down
irq_desc = irq_to_desc(irq);
remove_from_radix_tree(desc);
raw_spin_lock(&desc->lock);
free(desc);
We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kind of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:
Prevent allocation and freeing of interrupt descriptors accross cpu
hotplug.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de