The bfq_find_set_group() function takes as input a blkcg (which represents
a cgroup) and retrieves the corresponding bfq_group, then it updates the
bfq internal group hierarchy (see comments inside the function for why
this is needed) and finally it returns the bfq_group.
In the hierarchy update cycle, the pointer holding the correct bfq_group
that has to be returned is mistakenly used to traverse the hierarchy
bottom to top, meaning that in each iteration it gets overwritten with the
parent of the current group. Since the update cycle stops at root's
children (depth = 2), the overwrite becomes a problem only if the blkcg
describes a cgroup at a hierarchy level deeper than that (depth > 2). In
this case the root's child that happens to be also an ancestor of the
correct bfq_group is returned. The main consequence is that processes
contained in a cgroup at depth greater than 2 are wrongly placed in the
group described above by BFQ.
This commits fixes this problem by using a different bfq_group pointer in
the update cycle in order to avoid the overwrite of the variable holding
the original group reference.
Reported-by: Kwon Je Oh <kwonje.oh2@gmail.com>
Signed-off-by: Carlo Nonato <carlo.nonato95@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the firmware is in a bad state or not initialized fully, sending
the DBGC_SUSPEND_RESUME command fails but we can still collect logs.
Instead of aborting the entire dump process, simply ignore the error.
By removing the last callpoint that was checking the return value, we
can also convert the function to return void.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Fixes: 576058330f ("iwlwifi: dbg: support debug recording suspend resume command")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200306151129.dcec37b2efd4.I8dcd190431d110a6a0e88095ce93591ccfb3d78d@changeid
The TLV offset is only used to read registers, while the offset used for
the FIFO addresses are hard coded in the driver and not given by the
TLV.
If we try to apply the TLV offset when reading the FIFOs, we'll read
from invalid addresses, causing the driver to hang.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Fixes: 8d7dea25ad ("iwlwifi: dbg_ini: implement Rx fifos dump")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200306151129.fbab869c26fa.I4ddac20d02f9bce41855a816aa6855c89bc3874e@changeid
We were erroneously checking the length of the tlv instead of checking
the pointer returned by kmemdup() when allocating dbg_conf_tlv[].
This was probably a typo. Fix it by checking the returned pointer
instead of the length.
Reported-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200306151128.06e00e6e980f.I9a890ce83493b79892a5f690d12016525317fa7e@changeid
The AP may set the LDPC capability only in HE (IEEE80211_HE_PHY_CAP1),
but we were checking it only in the HT capabilities.
If we don't use this capability when required, the DSP gets the wrong
configuration in HE and doesn't work properly.
Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Fixes: befebbb30a ("iwlwifi: rs: consider LDPC capability in case of HE")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200306151128.492d167c1a25.I1ad1353dbbf6c99ae57814be750f41a1c9f7f4ac@changeid
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description
mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
mm/z3fold.c: do not include rwlock.h directly
fat: fix uninit-memory access for partial initialized inode
mm: avoid data corruption on CoW fault into PFN-mapped VMA
mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
The existing tests attempt to check that JMP32 JSET ignores the upper
bits in the operand registers. However, the tests missed one such bug in
the x32 JIT that is only uncovered when a previous instruction pollutes
the upper 32 bits of the registers.
This patch adds a new test case that catches the bug by first executing
a 64-bit JSET to pollute the upper 32-bits of the temporary registers,
followed by a 32-bit JSET which should ignore the upper 32 bits.
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200305234416.31597-2-luke.r.nels@gmail.com
The current x32 BPF JIT is incorrect for JMP32 JSET BPF_X when the upper
32 bits of operand registers are non-zero in certain situations.
The problem is in the following code:
case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X:
...
/* and dreg_lo,sreg_lo */
EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
/* and dreg_hi,sreg_hi */
EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
/* or dreg_lo,dreg_hi */
EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
This code checks the upper bits of the operand registers regardless if
the BPF instruction is BPF_JMP32 or BPF_JMP64. Registers dreg_hi and
dreg_lo are not loaded from the stack for BPF_JMP32, however, they can
still be polluted with values from previous instructions.
The following BPF program demonstrates the bug. The jset64 instruction
loads the temporary registers and performs the jump, since ((u64)r7 &
(u64)r8) is non-zero. The jset32 should _not_ be taken, as the lower
32 bits are all zero, however, the current JIT will take the branch due
the pollution of temporary registers from the earlier jset64.
mov64 r0, 0
ld64 r7, 0x8000000000000000
ld64 r8, 0x8000000000000000
jset64 r7, r8, 1
exit
jset32 r7, r8, 1
mov64 r0, 2
exit
The expected return value of this program is 2; under the buggy x32 JIT
it returns 0. The fix is to skip using the upper 32 bits for jset32 and
compare the upper 32 bits for jset64 only.
All tests in test_bpf.ko and selftests/bpf/test_verifier continue to
pass with this change.
We found this bug using our automated verification tool, Serval.
Fixes: 69f827eb6e ("x32: bpf: implement jitting of JMP32")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200305234416.31597-1-luke.r.nels@gmail.com
Since commit 3bc3206e1c ("serial: fsl_lpuart: Remove the alias node
dependence") the port line number can also be allocated by IDA, but in
case of an error the ID will no be removed again. More importantly, any
ID will be freed in remove(), even if it wasn't allocated but instead
fetched by of_alias_get_id(). If it was not allocated by IDA there will
be a warning:
WARN(1, "ida_free called for id=%d which is not allocated.\n", id);
Move the ID allocation more to the end of the probe() so that we still
can use plain return in the first error cases.
Fixes: 3bc3206e1c ("serial: fsl_lpuart: Remove the alias node dependence")
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200303174306.6015-3-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit a659652f61.
This broke the earlycon on LS1021A processors because the order of the
earlycon_setup() functions were changed. Before the commit the normal
lpuart32_early_console_setup() was called. After the commit the
lpuart32_imx_early_console_setup() is called instead.
Fixes: a659652f61 ("tty: serial: fsl_lpuart: drop EARLYCON_DECLARE")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20200303174306.6015-2-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Apple devices the _CRS method returns an empty resource template, and
the resource settings are instead provided by the _DSM method. But
commit 33364d63c7 (serdev: Add ACPI
devices by ResourceSource field) changed the search for serdev devices
to require valid, non-empty resource template, thereby breaking Apple
devices and causing bluetooth devices to not be found.
This expands the check so that if we don't find a valid template, and
we're on an Apple machine, then just check for the device being an
immediate child of the controller and having a "baud" property.
Cc: <stable@vger.kernel.org> # 5.5
Fixes: 33364d63c7 ("serdev: Add ACPI devices by ResourceSource field")
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Link: https://lore.kernel.org/r/20200211194723.486217-1-ronald@innovation.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
save_stack_trace_tsk_reliable() is not the only function providing the
reliable stack traces anymore. Architecture might define ARCH_STACKWALK
which provides a newer stack walking interface and has
arch_stack_walk_reliable() function. Update the description accordingly.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/20200120154042.9934-1-mbenes@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit cd02cf1ace ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
fixed memory hotplug with debug_pagealloc enabled, where onlining a page
goes through page freeing, which removes the direct mapping. Some arches
don't like when the page is not mapped in the first place, so
generic_online_page() maps it first. This is somewhat wasteful, but
better than special casing page freeing fast paths.
The commit however missed that DEBUG_PAGEALLOC configured doesn't mean
it's actually enabled. One has to test debug_pagealloc_enabled() since
031bc5743f ("mm/debug-pagealloc: make debug-pagealloc boottime
configurable"), or alternatively debug_pagealloc_enabled_static() since
8e57f8acbb ("mm, debug_pagealloc: don't rely on static keys too early"),
but this is not done.
As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled
will crash:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000001ece13400b R2:000003fff7fd000b R3:000003fff7fcc007 S:000003fff7fd7000 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased)
Krnl PSW : 0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000800 0000400b00000000 0000000000000100
0000000000000001 0000000000000000 0000000000000002 0000000000000100
0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000
000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20
Krnl Code: 0000001ecd281b8e: ec17ffff00d8 ahik %r1,%r7,-1
0000001ecd281b94: ec111dbc0355 risbg %r1,%r1,29,188,3
>0000001ecd281b9e: 94fb5006 ni 6(%r5),251
0000001ecd281ba2: 41505008 la %r5,8(%r5)
0000001ecd281ba6: ec51fffc6064 cgrj %r5,%r1,6,1ecd281b9e
0000001ecd281bac: 1a07 ar %r0,%r7
0000001ecd281bae: ec03ff584076 crj %r0,%r3,4,1ecd281a5e
Call Trace:
[<0000001ecd281b9e>] __kernel_map_pages+0x166/0x188
[<0000001ecd4d9516>] online_pages_range+0xf6/0x128
[<0000001ecd2a8186>] walk_system_ram_range+0x7e/0xd8
[<0000001ecda28aae>] online_pages+0x2fe/0x3f0
[<0000001ecd7d02a6>] memory_subsys_online+0x8e/0xc0
[<0000001ecd7add42>] device_online+0x5a/0xc8
[<0000001ecd7d0430>] state_store+0x88/0x118
[<0000001ecd5b9f62>] kernfs_fop_write+0xc2/0x200
[<0000001ecd5064b6>] vfs_write+0x176/0x1e0
[<0000001ecd50676a>] ksys_write+0xa2/0x100
[<0000001ecda315d4>] system_call+0xd8/0x2c8
Fix this by checking debug_pagealloc_enabled_static() before calling
kernel_map_pages(). Backports for kernel before 5.5 should use
debug_pagealloc_enabled() instead. Also add comments.
Fixes: cd02cf1ace ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rwlock.h should not be included directly. Instead linux/splinlock.h
should be included. One thing it does is to break the RT build.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When get an error in the middle of reading an inode, some fields in the
inode might be still not initialized. And then the evict_inode path may
access those fields via iput().
To fix, this makes sure that inode fields are initialized.
Reported-by: syzbot+9d82b8de2992579da5d0@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/871rqnreqx.fsf@mail.parknet.co.jp
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Moyer has reported that one of xfstests triggers a warning when run
on DAX-enabled filesystem:
WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
...
wp_page_copy+0x98c/0xd50 (unreliable)
do_wp_page+0xd8/0xad0
__handle_mm_fault+0x748/0x1b90
handle_mm_fault+0x120/0x1f0
__do_page_fault+0x240/0xd70
do_page_fault+0x38/0xd0
handle_page_fault+0x10/0x30
The warning happens on failed __copy_from_user_inatomic() which tries to
copy data into a CoW page.
This happens because of race between MADV_DONTNEED and CoW page fault:
CPU0 CPU1
handle_mm_fault()
do_wp_page()
wp_page_copy()
do_wp_page()
madvise(MADV_DONTNEED)
zap_page_range()
zap_pte_range()
ptep_get_and_clear_full()
<TLB flush>
__copy_from_user_inatomic()
sees empty PTE and fails
WARN_ON_ONCE(1)
clear_page()
The solution is to re-try __copy_from_user_inatomic() under PTL after
checking that PTE is matches the orig_pte.
The second copy attempt can still fail, like due to non-readable PTE, but
there's nothing reasonable we can do about, except clearing the CoW page.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Justin He <Justin.He@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In set_pmd_migration_entry(), pmdp_invalidate() is used to change PMD
atomically. But the PMD is read before that with an ordinary memory
reading. If the THP (transparent huge page) is written between the PMD
reading and pmdp_invalidate(), the PMD dirty bit may be lost, and cause
data corruption. The race window is quite small, but still possible in
theory, so need to be fixed.
The race is fixed via using the return value of pmdp_invalidate() to get
the original content of PMD, which is a read/modify/write atomic
operation. So no THP writing can occur in between.
The race has been introduced when the THP migration support is added in
the commit 616b837153 ("mm: thp: enable thp migration in generic path").
But this fix depends on the commit d52605d7cb ("mm: do not lose dirty
and accessed bits in pmdp_invalidate()"). So it's easy to be backported
after v4.16. But the race window is really small, so it may be fine not
to backport the fix at all.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/20200220075220.2327056-1-ying.huang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
: A user reported a bug against a distribution kernel while running a
: proprietary workload described as "memory intensive that is not swapping"
: that is expected to apply to mainline kernels. The workload is
: read/write/modifying ranges of memory and checking the contents. They
: reported that within a few hours that a bad PMD would be reported followed
: by a memory corruption where expected data was all zeros. A partial
: report of the bad PMD looked like
:
: [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2)
: [ 5195.341184] ------------[ cut here ]------------
: [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
: ....
: [ 5195.410033] Call Trace:
: [ 5195.410471] [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930
: [ 5195.410716] [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
: [ 5195.410918] [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
: [ 5195.411200] [<ffffffff81098322>] task_work_run+0x72/0x90
: [ 5195.411246] [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2
: [ 5195.411494] [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40
: [ 5195.411739] [<ffffffff815e56af>] retint_user+0x8/0x10
:
: Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
: was a false detection. The bug does not trigger if automatic NUMA
: balancing or transparent huge pages is disabled.
:
: The bug is due a race in change_pmd_range between a pmd_trans_huge and
: pmd_nond_or_clear_bad check without any locks held. During the
: pmd_trans_huge check, a parallel protection update under lock can have
: cleared the PMD and filled it with a prot_numa entry between the transhuge
: check and the pmd_none_or_clear_bad check.
:
: While this could be fixed with heavy locking, it's only necessary to make
: a copy of the PMD on the stack during change_pmd_range and avoid races. A
: new helper is created for this as the check if quite subtle and the
: existing similar helpful is not suitable. This passed 154 hours of
: testing (usually triggers between 20 minutes and 24 hours) without
: detecting bad PMDs or corruption. A basic test of an autonuma-intensive
: workload showed no significant change in behaviour.
Although Mel withdrew the patch on the face of LKML comment
https://lkml.org/lkml/2017/4/10/922 the race window aforementioned is
still open, and we have reports of Linpack test reporting bad residuals
after the bad PMD warning is observed. In addition to that, bad
rss-counter and non-zero pgtables assertions are triggered on mm teardown
for the task hitting the bad PMD.
host kernel: mm/pgtable-generic.c:40: bad pmd 00000000b3152f68(8000000d2d2008e7)
....
host kernel: BUG: Bad rss-counter state mm:00000000b583043d idx:1 val:512
host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096
The issue is observed on a v4.18-based distribution kernel, but the race
window is expected to be applicable to mainline kernels, as well.
[akpm@linux-foundation.org: fix comment typo, per Rafael]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/20200216191800.22423-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert a problematic commit from the 5.3 development
cycle (Brendan Higgins).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl5iIf0SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxFMEQAIF33vTkF9Jaj2ZMqLr3L0dSmtnvxvxk
iYGpUIkFA9rIOrh+tBsgS3CekyiBAVNU+4Rmv5kWjVVwFNIRjJbFHK3CNF6IcKrN
/MGk4VJJwSjify0XjuZVFL3rQSURKkLAnyOug7oW7oorpxCVEGeIS9KCtQ9gsonb
NrlH/I43cM063LGz3ilBh80A6NvvzMSgkmoMyODSA8faVvrR7D87NkcGTdPge2q/
3qrI+kkBdmchYR/hLlZtC2HAEv2I+B49ewec9BN4cB8LCDxe68psEyLfxGTdNIWh
SNXL9DSboF8IbBKkg6CHAXztqUKJggIrjjvFUEOCvtf7oAKCYqshDw69FwnaFsTr
+V4jaBquqmvMb1saKUaB4Oh+puHpQRKWjXjgF6OSJzbxGA0WzxG3LPZoWKIlj9P6
Oz5f1QrE4RzVoQ0pA6vUTWWFPnPgo78KcmWoGzQlCnu4iLBYM3VdaBmU5a0gvWVB
24807O0/OzfQKcA1e5iHu7b0rgNaeJqCq0df60OPTtZq1OvYbp1xG8ogckMdsOmE
FQHvB92wVNyeQC2/urRrlF5VHxBmw3+O0G9diL5UFlH7haXvowGb8PlI1HNwEzMF
YTleQfdaKjI2GW9+56y5oKwAhAmHPUHkpPY19wj1zgPdDqaDuecLNxlfEwkYO3tY
g3HRhfyQcmJ0
=knd+
-----END PGP SIGNATURE-----
Merge tag 'devprop-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"Revert a problematic commit from the 5.3 development cycle (Brendan
Higgins)"
* tag 'devprop-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "software node: Simplify software_node_release() function"
Fix Sphinx format warinings in an ACPI fan document added
recently (Randy Dunlap).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl5iIZASHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxxfsQAI+MIL4xLajSLgxu+2vUPC6MZi2zwyM2
Gg8W1IquAVhcTO9RS1JgrLDeVPAE/CFWXGsm1S9V8zvlZMw1vtmwsp94beuKjJuS
SOS6YsQKBVsrT2oB3gTtJAN2098QDrTZcb7a0D1cmnHNpE5gmjau8YmnS/VNjHsn
QXmH/qDnrgr/J9JI3xc+3bALC+QetPjHZOKGJJG/AmH3bG8PZ1flEoCG6lvFBDWw
obIgfmOB2uU54SFbS9LMAWjN13XA7D8RV910J0SEyYmhzu4z3nix3TKyQPJnTn8W
mNKeTG3qX+Fim3FTKSfc0xhg2yO0pNXtohc+a3HGfZpWSi8Ek6VhVmv6A2WOajet
3X7NmhfhO4Q8sEcz2rrc7Lh2N/fjWXGXua8R68KLBtQ/X1xTnZx+0UmsNwFMLN1a
sg3aclKnsfOtn+OX3NXrbm+J1NA3SXZ/5MTdtUsHzac0t2/N8xMaKK0fXNElF1e4
n1bQhIMh/YnN/5FA304mGVqUZlCdZV+1AU7+Z/uRHS5TrVyNxPt5YFPugcGNBpX+
Yj7R9iq4OspTa+fZn7Hbe4TQSqPCok1gK7ZMew4sAyaqNZoENKOkuGgPMc1Za1nb
1Qi7gyinjz22PVL8TDvETyR6vpydBp+V0ZpcxgVdRykzms4/40AWdc7YGSD3gPIq
UwYqASaJdyR6
=OM5C
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI documentation fix from Rafael Wysocki:
"Fix Sphinx format warinings in an ACPI fan document added recently
(Randy Dunlap)"
* tag 'acpi-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation/admin-guide/acpi: fix fan_performance_states.rst warnings
dma-buf:
- fix memory leak
core:
- shmem object mmap fix.
ttm:
- Fix fence leak in ttm_buffer_object_transfer().
amdgpu:
- Gfx reset fix for gfx9, 10
- Fix for gfx10
- DP MST fix
- DCC fix
- Renoir power fixes
- Navi power fix
i915:
- Break up long lists of object reclaim with cond_resched()
- PSR probe fix
- TGL workarounds
- Selftest return value fix
- Drop timeline mutex while waiting for retirement
- Wait for OA configuration completion before writes to OA buffer
virtio:
- Fix resource id creation race in virtio.
- mmap fixes
sun4i:
- Fixes for sun4i VI layer format support.
kirin:
- kirin: Revert "Fix for hikey620 display offset problem"
exynos:
- fix a kernel oops problem in case that driver is loaded as module.
- fix a regulator warning issue when I2C DDC adapter cannot be gathered.
- print out an error message only in error case excepting -EPROBE_DEFER.
mediatek:
- overlay, cursor and gce fixes.
`
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJeYbXXAAoJEAx081l5xIa+QtoQAJNMM9HBThWZjX+j/+LX1MUg
xW6ZqOmYjL/FfViSEpASLJuIz9+JF6/FK8RtuzbcsoPesLhfGtsQiM83FN+6KKs8
4kUa+ahh4+xuz4Q1mPJWH4kb0NpmHhNbuN1z/6fQwwYUYS3ltL21vR/6KE8x2FXu
rOmrcTVjiMvDjc0d8uN0GBEBON7Sf7EG2wZ0EN2uNR0lEzBSYczXRioDBC9Pj/Va
MGc4WzULJppTyvbUxnbtHYKXZfOHvW5pcKAR3utJQi1oJRKmNjm76lzNr5r4t7v3
IEhpe5UfjY01CUhpBWfwUa4ZuESKS9gFNlHN6rqADrdngyn8zDBFFd07B3MCg651
VURASTcyw7yJgVAhMRyxjcj76GYYm7AX1vyN6tkbBcY+yt2ZYdCYQh5k17U2FYeR
fn/15SuRqa06p2+cHfyEVNQBwtZ2l10SyJHBTOMlgN5V0AH+itU+2t03SmhuEQaG
CeiOVJJtJFWyvvwzDeqGf6YqTTicyqciviRVrkLL2KlnoCzTamZosv8aWzKtQKEj
TKcDp5Toc0TK3EMtPYRwuXGkWr+0WB8PuNYmouhauPMm+cgHO2F4JtgJfPspbw+l
BOks64uQQNGhfyjfpcaWjIxnyC5dtGJoGhsiP8OQF38utUwZlkOh1/90hcUapocQ
E4FC8X8mAckSRce9/fca
=ckrX
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2020-03-06' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Weekly fixes round, looks like a few people woke up, got a bunch of
fixes across the drivers. Bit bigger than I'd like but they all seem
fine and hopefully it quiets down now.
sun4i, kirin, mediatek and exynos on the ARM side. virtio-gpu and core
have some mmap fixes, and there is a dma-buf leak. one ttm fence leak
is also fixed.
Otherwise it's mostly amdgpu and i915.
One of the i915 fixes is for a very long latency I was seeing (using
latencytop) running gnome-shell locally when using firefox and eating
nearly all my RAM, it really helps with desktop responsiveness esp
when firefox is chewing a lot.
dma-buf:
- fix memory leak
core:
- shmem object mmap fix.
ttm:
- Fix fence leak in ttm_buffer_object_transfer().
amdgpu:
- Gfx reset fix for gfx9, 10
- Fix for gfx10
- DP MST fix
- DCC fix
- Renoir power fixes
- Navi power fix
i915:
- Break up long lists of object reclaim with cond_resched()
- PSR probe fix
- TGL workarounds
- Selftest return value fix
- Drop timeline mutex while waiting for retirement
- Wait for OA configuration completion before writes to OA buffer
virtio:
- Fix resource id creation race in virtio.
- mmap fixes
sun4i:
- Fixes for sun4i VI layer format support.
kirin:
- kirin: Revert "Fix for hikey620 display offset problem"
exynos:
- fix a kernel oops problem in case that driver is loaded as module.
- fix a regulator warning issue when I2C DDC adapter cannot be gathered.
- print out an error message only in error case excepting -EPROBE_DEFER.
mediatek:
- overlay, cursor and gce fixes"
`
* tag 'drm-fixes-2020-03-06' of git://anongit.freedesktop.org/drm/drm: (38 commits)
drm/amdgpu/display: navi1x copy dcn watermark clock settings to smu resume from s3 (v2)
drm/amd/powerplay: map mclk to fclk for COMBINATIONAL_BYPASS case
drm/amd/powerplay: fix pre-check condition for setting clock range
drm/amd/display: fix dcc swath size calculations on dcn1
drm/amd/display: Clear link settings on MST disable connector
drm/amdgpu: disable 3D pipe 1 on Navi1x
drm/amdgpu: clean wptr on wb when gpu recovery
drm: kirin: Revert "Fix for hikey620 display offset problem"
drm/i915/gt: Drop the timeline->mutex as we wait for retirement
drm/i915/perf: Reintroduce wait on OA configuration completion
drm/sun4i: Fix DE2 VI layer format support
drm/sun4i: Add separate DE3 VI layer formats
drm/sun4i: de2/de3: Remove unsupported VI layer formats
drm/i915/selftests: Fix return in assert_mmap_offset()
drm/i915: Protect i915_request_await_start from early waits
drm/i915/tgl: Add Wa_1608008084
drm/i915/tgl: Add Wa_22010178259:tgl
drm/i915: Program MBUS with rmw during initialization
drm/i915/psr: Force PSR probe only after full initialization
drm/i915/gem: Break up long lists of object reclaim
...
Commit ee88f4ebe5 ("ALSA: mips: Use managed buffer allocation") removed
superfluous hw_params/hw_free callbacks, but forgot to remove them where
they were used.
Fixes: ee88f4ebe5 ("ALSA: mips: Use managed buffer allocation")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://lore.kernel.org/r/20200306105837.31523-1-tsbogend@alpha.franken.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Currently passive MPTCP socket can skip including the DACK
option - if the peer sends data before accept() completes.
The above happens because the msk 'can_ack' flag is set
only after the accept() call.
Such missing DACK option may cause - as per RFC spec -
unwanted fallback to TCP.
This change addresses the issue using the key material
available in the current subflow, if any, to create a suitable
dack option when msk ack seq is not yet available.
v1 -> v2:
- adavance the generated ack after the initial MPC packet
Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is similar to commit 674d9de02a ("NFC: Fix possible memory
corruption when handling SHDLC I-Frame commands") and commit d7ee81ad09
("NFC: nci: Add some bounds checking in nci_hci_cmd_received()") which
added range checks on "pipe".
The "pipe" variable comes skb->data[0] in nfc_hci_msg_rx_work().
It's in the 0-255 range. We're using it as the array index into the
hdev->pipes[] array which has NFC_HCI_MAX_PIPES (128) members.
Fixes: 118278f20a ("NFC: hci: Add pipes table to reference them with a tuple {gate, host}")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The emulated vbt doesn't tell its size correctly. According to the
intel_vbt_defs.h, vbt_header.vbt_size should the size of VBT (VBT Header,
BDB Header and data blocks), and bdb_header.bdb_size should be the size
of BDB (BDB Header and data blocks).
This patch fixes the issue and lets vbt provided by GVT-g pass the guest
i915's sanity test.
v2: refine the commit message. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200305131600.29640-1-tina.zhang@intel.com
When local NET_RX backlog is full due to traffic overrun,
peer veth tx_dropped counter increases. At that time, list
local veth stats, rx_dropped has double value of peer
tx_dropped, even bigger than transmit packets by peer.
In NET_RX softirq process, if any packet drop case happens,
it increases dev's rx_dropped counter and returns NET_RX_DROP.
At veth tx side, it records any error returned from peer netif_rx
into local dev tx_dropped counter.
In veth get stats process, it puts local dev rx_dropped and
peer dev tx_dropped into together as local rx_drpped value.
So that it shows double value of real dropped packets number in
this case.
This patch ignores peer tx_dropped when counting local rx_dropped,
since peer tx_dropped is duplicated to local rx_dropped at most cases.
Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Break up long lists of object reclaim with cond_resched()
- PSR probe fix
- TGL workarounds
- Selftest return value fix
- Drop timeline mutex while waiting for retirement
- Wait for OA configuration completion before writes to OA buffer
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFWWmW3ewYy4RJOWc05gHnSar7m8FAl5g1ecACgkQ05gHnSar
7m/y3A/8DTpsr/iCkp2z0RfTzl7n0Miw4DSJ53ma4xjV8oVUji2cZVQu3W+lb/4Q
VBvLt6LIGDwe/v0gi4IuJaBKnRjfi4UXntb4Np9k75Mol5FPKdGIgan38SSIv1f4
7a8a+EG773sYETHmHhbt5wHDTTfTNHrjoit/KnkeT0gRZaUlpcmLHKWBiLrwVwvO
NP4eOL++k7OzXH8+osk2C5oYxf3YeQn+nkt9qlnLqcLjW3ZNoXDlc5Zmhk1ZB7tx
Ij+LBM2jslpl8cKi2potrfw2W5Q28XzQt9AzAXQRCSR0CHqZMTlXxOTjhsMEh2pp
aDkvcrApi73lt6OsspxbflcmfwM5oTU+xSiIbRu07ZJ2FFC2PSMzGaaOZihjNIeU
XtdGs/95rl38qU42+0epImt/Wz8WAparNymUa1KAfe1XRpDg72EpCXJmESoxm15z
jRLZm4jvcxoYtet2fAk5W9nmaIzD7pLgO2cXHWYXMEpyw5Ts+BJyWnJFlfTIlOpL
qv9AjuryPgEVVN1/q/WZzD3GKeFeEODiPgZjx/OF/DXG7rkfKwLuecnoFzuHUbJh
YNJtE+2stqq6LkGsaS1pE6FkSsjNBvQ9z5EZaIBZKfWau1tRF38iON3e1VA+v78Z
z0diCC3pjRCjnilVbmiiF1HMiiftNBHRWGwK1tDklAm2GMcmf7U=
=kXvj
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-fixes-2020-03-05' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.6-rc5:
- Break up long lists of object reclaim with cond_resched()
- PSR probe fix
- TGL workarounds
- Selftest return value fix
- Drop timeline mutex while waiting for retirement
- Wait for OA configuration completion before writes to OA buffer
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87eeu7nl6z.fsf@intel.com
Second set of fixes for v5.6. Only two small fixes this time.
iwlwifi
* fix another initialisation regression with 3168 devices
mt76
* fix memory corruption with too many rx fragments
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJeYQGUAAoJEG4XJFUm622bJPoIAJ9lKMQWzHP7s/PouMw4qa3l
KyQHl0uZe6clEULYFi89vaUUzjmTZqsBkQXsKM+0cppxx1IxukLFtQw8UqgEWbpB
eSDajpB5tvt7gSvLjGLyXtxZzj7Bn5T0IdgcPLxg9lmp0EwW/ycbX/XTCDqZSSq0
IHkYKjRBoIhJ+JTZVYGPIk3GDwSb3PTVeT9C239Rj6Jhwou0p1vlopOjAsIr3dv9
PIf/+21wHTtAgVejxKSQV4qUrNT2/J27gXht3hLwkm8goQV01z/3z03bQhEpfvm5
CJr0hZAjLIR1T5RvYCqb41nhTFznlmrBYsozy1EtpMJGfGbNQ2XvRGZCU964lnk=
=5YYp
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-2020-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.6
Second set of fixes for v5.6. Only two small fixes this time.
iwlwifi
* fix another initialisation regression with 3168 devices
mt76
* fix memory corruption with too many rx fragments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We now ignore the "completion" event when using tx queue timestamping,
and only pay attention to the two (high and low) timestamp events. The
NIC will send a pair of timestamp events for every packet transmitted.
The current firmware may merge the completion events, and it is possible
that future versions may reorder the completion and timestamp events.
As such the completion event is not useful.
Without this patch in place a merged completion event on a queue with
timestamping will cause a "spurious TX completion" error. This affects
SFN8000-series adapters.
Signed-off-by: Tom Zhao <tzhao@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct_ops map cannot support map_freeze. Otherwise, a struct_ops
cannot be unregistered from the subsystem.
Fixes: 85d33df357 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200305013454.535397-1-kafai@fb.com
The current always succeed behavior in bpf_struct_ops_map_delete_elem()
is not ideal for userspace tool. It can be improved to return proper
error value.
If it is in TOBEFREE, it means unregistration has been already done
before but it is in progress and waiting for the subsystem to clear
the refcnt to zero, so -EINPROGRESS.
If it is INIT, it means the struct_ops has not been registered yet,
so -ENOENT.
Fixes: 85d33df357 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200305013447.535326-1-kafai@fb.com
Yonghong Song says:
====================
Commit 8b401f9ed2 ("bpf: implement bpf_send_signal() helper")
introduced bpf_send_signal() helper and Commit 8482941f09
("bpf: Add bpf_send_signal_thread() helper") added bpf_send_signal_thread()
helper. Both helpers try to send a signel to current process or thread.
When bpf_prog is called with scheduler rq_lock held, a deadlock
could happen since bpf_send_signal() and bpf_send_signal_thread()
will call group_send_sig_info() which may ultimately want to acquire
rq_lock() again. This happens in 5.2 and 4.16 based kernels in our
production environment with perf_sw_event.
In a different scenario, the following is also possible in the last kernel:
cpu 1:
do_task_stat <- holding sighand->siglock
...
task_sched_runtime <- trying to grab rq_lock
cpu 2:
__schedule <- holding rq_lock
...
do_send_sig_info <- trying to grab sighand->siglock
Commit eac9153f2b ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") has a similar issue with above
rq_lock() deadlock. This patch set addressed the issue
in a similar way. Patch #1 provided kernel solution and
Patch #2 added a selftest.
Changelogs:
v2 -> v3:
. simplify selftest send_signal_sched_switch().
The previous version has mmap/munmap inherited
from Song's reproducer. They are not necessary
in this context.
v1 -> v2:
. previous fix using task_work in nmi() is incorrect.
there is no nmi() involvement here. Using task_work
in all cases might be a solution. But decided to
use a similar fix as in Commit eac9153f2b.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Added one test, send_signal_sched_switch, to test bpf_send_signal()
helper triggered by sched/sched_switch tracepoint. This test can be used
to verify kernel deadlocks fixed by the previous commit. The test itself
is heavily borrowed from Commit eac9153f2b ("bpf/stackmap: Fix deadlock
with rq_lock in bpf_get_stack()").
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191105.2796601-1-yhs@fb.com
When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
#5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
#6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
#7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
#8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
#9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
#10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
#11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
#12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
#13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
#14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
#15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
#16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
#17 [ffffc9002219fc90] schedule at ffffffff81a3e219
#18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
#19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
#20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
#21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
#22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
#23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068
The above call stack is actually very similar to an issue
reported by Commit eac9153f2b ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.
The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153f2b, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
/* stress_test.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define THREAD_COUNT 1000
char *filename;
void *worker(void *p)
{
void *ptr;
int fd;
char *pptr;
fd = open(filename, O_RDONLY);
if (fd < 0)
return NULL;
while (1) {
struct timespec ts = {0, 1000 + rand() % 2000};
ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
usleep(1);
if (ptr == MAP_FAILED) {
printf("failed to mmap\n");
break;
}
munmap(ptr, 4096 * 64);
usleep(1);
pptr = malloc(1);
usleep(1);
pptr[0] = 1;
usleep(1);
free(pptr);
usleep(1);
nanosleep(&ts, NULL);
}
close(fd);
return NULL;
}
int main(int argc, char *argv[])
{
void *ptr;
int i;
pthread_t threads[THREAD_COUNT];
if (argc < 2)
return 0;
filename = argv[1];
for (i = 0; i < THREAD_COUNT; i++) {
if (pthread_create(threads + i, NULL, worker, NULL)) {
fprintf(stderr, "Error creating thread\n");
return 0;
}
}
for (i = 0; i < THREAD_COUNT; i++)
pthread_join(threads[i], NULL);
return 0;
}
and the following command:
1. run `stress_test /bin/ls` in one windown
2. hack bcc trace.py with the following change:
--- a/tools/trace.py
+++ b/tools/trace.py
@@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
__data.tgid = __tgid;
__data.pid = __pid;
bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
+ bpf_send_signal(10);
%s
%s
%s.perf_submit(%s, &__data, sizeof(__data));
3. in a different window run
./trace.py -p $(pidof stress_test) t:sched:sched_switch
The deadlock can be reproduced in our production system.
Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.
I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
[ 32.832450] -> #1 (&p->pi_lock){-.-.}:
[ 32.833100] _raw_spin_lock_irqsave+0x44/0x80
[ 32.833696] task_rq_lock+0x2c/0xa0
[ 32.834182] task_sched_runtime+0x59/0xd0
[ 32.834721] thread_group_cputime+0x250/0x270
[ 32.835304] thread_group_cputime_adjusted+0x2e/0x70
[ 32.835959] do_task_stat+0x8a7/0xb80
[ 32.836461] proc_single_show+0x51/0xb0
...
[ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
[ 32.840275] __lock_acquire+0x1358/0x1a20
[ 32.840826] lock_acquire+0xc7/0x1d0
[ 32.841309] _raw_spin_lock_irqsave+0x44/0x80
[ 32.841916] __lock_task_sighand+0x79/0x160
[ 32.842465] do_send_sig_info+0x35/0x90
[ 32.842977] bpf_send_signal+0xa/0x10
[ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
[ 32.844301] trace_call_bpf+0x115/0x270
[ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0
[ 32.845411] perf_trace_sched_switch+0x10f/0x180
[ 32.846014] __schedule+0x45d/0x880
[ 32.846483] schedule+0x5f/0xd0
...
[ 32.853148] Chain exists of:
[ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
[ 32.853148]
[ 32.854451] Possible unsafe locking scenario:
[ 32.854451]
[ 32.855173] CPU0 CPU1
[ 32.855745] ---- ----
[ 32.856278] lock(&rq->lock);
[ 32.856671] lock(&p->pi_lock);
[ 32.857332] lock(&rq->lock);
[ 32.857999] lock(&(&sighand->siglock)->rlock);
Deadlock happens on CPU0 when it tries to acquire &sighand->siglock
but it has been held by CPU1 and CPU1 tries to grab &rq->lock
and cannot get it.
This is not exactly the callstack in our production environment,
but sympotom is similar and both locks are using spin_lock_irqsave()
to acquire the lock, and both involves rq_lock. The fix to delay
sending signal when irq is disabled also fixed this issue.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
If secure_computing() rejected a system call, we were previously setting
the system call number to -1, to indicate to later code that the syscall
failed. However, if something (e.g. a user notification) was sleeping, and
received a signal, we may set a0 to -ERESTARTSYS and re-try the system call
again.
In this case, seccomp "denies" the syscall (because of the signal), and we
would set a7 to -1, thus losing the value of the system call we want to
restart.
Instead, let's return -1 from do_syscall_trace_enter() to indicate that the
syscall was rejected, so we don't clobber the value in case of -ERESTARTSYS
or whatever.
This commit fixes the user_notification_signal seccomp selftest on riscv to
no longer hang. That test expects the system call to be re-issued after the
signal, and it wasn't due to the above bug. Now that it is, everything
works normally.
Note that in the ptrace (tracer) case, the tracer can set the register
values to whatever they want, so we still need to keep the code that
handles out-of-bounds syscalls. However, we can drop the comment.
We can also drop syscall_set_nr(), since it is no longer used anywhere, and
the code that re-loads the value in a7 because of it.
Reported in: https://lore.kernel.org/bpf/CAEn-LTp=ss0Dfv6J00=rCAy+N78U2AmhqJNjfqjr2FDpPYjxEQ@mail.gmail.com/
Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
There was a recent change in blktrace.c that added a RCU protection to
`q->blk_trace` in order to fix a use-after-free issue during access.
However the change missed an edge case that can lead to dereferencing of
`bt` pointer even when it's NULL:
Coverity static analyzer marked this as a FORWARD_NULL issue with CID
1460458.
```
/kernel/trace/blktrace.c: 1904 in sysfs_blk_trace_attr_store()
1898 ret = 0;
1899 if (bt == NULL)
1900 ret = blk_trace_setup_queue(q, bdev);
1901
1902 if (ret == 0) {
1903 if (attr == &dev_attr_act_mask)
>>> CID 1460458: Null pointer dereferences (FORWARD_NULL)
>>> Dereferencing null pointer "bt".
1904 bt->act_mask = value;
1905 else if (attr == &dev_attr_pid)
1906 bt->pid = value;
1907 else if (attr == &dev_attr_start_lba)
1908 bt->start_lba = value;
1909 else if (attr == &dev_attr_end_lba)
```
Added a reassignment with RCU annotation to fix the issue.
Fixes: c780e86dd4 ("blktrace: Protect q->blk_trace with RCU")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8250_dw has been split to library part and the driver, the library
is being used by 8250_lpss, which represents Synosys DesignWare UART
(with optional Synopsys Designware DMA) enumerated by PCI.
Add missed above mentioned files to the database record for review.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20200305123108.41320-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the ability to reboot the HiFive Unleashed board via GPIO.
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
in this place, the function should return a
negative value and the PTR_ERR already returns
a negative,so return -PTR_ERR() is wrong.
Signed-off-by: tangbin <tangbin@cmss.chinamobile.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20200305013823.20976-1-tangbin@cmss.chinamobile.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When fibre port supports auto-negotiation, the IMP(Intelligent
Management Process) processes the speed of auto-negotiation
and the user's speed separately.
For below case, the port will get a not link up problem.
step 1: disables auto-negotiation and sets speed to A, then
the driver's MAC speed will be updated to A.
step 2: enables auto-negotiation and MAC gets negotiated
speed B, then the driver's MAC speed will be updated to B
through querying in periodical task.
step 3: MAC gets new negotiated speed A.
step 4: disables auto-negotiation and sets speed to B before
periodical task query new MAC speed A, the driver will ignore
the speed configuration.
This patch fixes it by skipping speed and duplex checking when
fibre port supports auto-negotiation.
Fixes: 22f48e24a2 ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My Netronome address is no longer active. I am no maintainer, but
get_maintainer.pl sometimes returns my name for a small number of
files (BPF-related). Add an entry to .mailmap for good measure.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200226171353.18982-1-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>