With HW tag-based KASAN, error checks are performed implicitly by the
load and store instructions in the memcpy implementation. A failed
check results in tag checks being disabled and execution will keep
going. As a result, under HW tag-based KASAN, prior to commit
1b0668be62 ("kasan: test: disable kmalloc_memmove_invalid_size for
HW_TAGS"), this memcpy would end up corrupting memory until it hits an
inaccessible page and causes a kernel panic.
This is a pre-existing issue that was revealed by commit 285133040e
("arm64: Import latest memcpy()/memmove() implementation") which changed
the memcpy implementation from using signed comparisons (incorrectly,
resulting in the memcpy being terminated early for negative sizes) to
using unsigned comparisons.
It is unclear how this could be handled by memcpy itself in a reasonable
way. One possibility would be to add an exception handler that would
force memcpy to return if a tag check fault is detected -- this would
make the behavior roughly similar to generic and SW tag-based KASAN.
However, this wouldn't solve the problem for asynchronous mode and also
makes memcpy behavior inconsistent with manually copying data.
This test was added as a part of a series that taught KASAN to detect
negative sizes in memory operations, see commit 8cceeff48f ("kasan:
detect negative size in memory operation function"). Therefore we
should keep testing for negative sizes with generic and SW tag-based
KASAN. But there is some value in testing small memcpy overflows, so
let's add another test with memcpy that does not destabilize the kernel
by performing out-of-bounds writes, and run it in all modes.
Link: https://linux-review.googlesource.com/id/I048d1e6a9aff766c4a53f989fb0c83de68923882
Link: https://lkml.kernel.org/r/20210910211356.3603758-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an object is allocated on a tail page of a multi-page slab, kasan
will get the wrong tag because page->s_mem is NULL for tail pages. I'm
not quite sure what the user-visible effect of this might be.
Link: https://lkml.kernel.org/r/20211001024105.3217339-1-willy@infradead.org
Fixes: 7f94ffbc4c ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shuah Khan reported:
| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
| it tries to allocate memory attempting to acquire spinlock in page
| allocation code while holding workqueue pool raw_spinlock.
|
| There are several instances of this problem when block layer tries
| to __queue_work(). Call trace from one of these instances is below:
|
| kblockd_mod_delayed_work_on()
| mod_delayed_work_on()
| __queue_delayed_work()
| __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
| insert_work()
| kasan_record_aux_stack()
| kasan_save_stack()
| stack_depot_save()
| alloc_pages()
| __alloc_pages()
| get_page_from_freelist()
| rm_queue()
| rm_queue_pcplist()
| local_lock_irqsave(&pagesets.lock, flags);
| [ BUG: Invalid wait context triggered ]
The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, it is not even possible to use either GFP_ATOMIC
nor GFP_NOWAIT in certain non-preemptive contexts, including
raw_spin_locks (see gfp.h and commmit ab00db216c).
Fix it by instructing stackdepot to not expand stack storage via
alloc_pages() in case it runs out by using
kasan_record_aux_stack_noalloc().
While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit).
For frequent calls from the same location, it therefore becomes
extremely unlikely that kasan_record_aux_stack_noalloc() fails.
Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
Link: https://lkml.kernel.org/r/20210913112609.2651084-7-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a variant of kasan_record_aux_stack() that does not do any
memory allocation through stackdepot. This will permit using it in
contexts that cannot allocate any memory.
Link: https://lkml.kernel.org/r/20210913112609.2651084-6-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add __stack_depot_save(), which provides more fine-grained control over
stackdepot's memory allocation behaviour, in case stackdepot runs out of
"stack slabs".
Normally stackdepot uses alloc_pages() in case it runs out of space;
passing can_alloc==false to __stack_depot_save() prohibits this, at the
cost of more likely failure to record a stack trace.
Link: https://lkml.kernel.org/r/20210913112609.2651084-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Taras Madan <tarasmadan@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "stackdepot, kasan, workqueue: Avoid expanding stackdepot
slabs when holding raw_spin_lock", v2.
Shuah Khan reported [1]:
| When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
| kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
| it tries to allocate memory attempting to acquire spinlock in page
| allocation code while holding workqueue pool raw_spinlock.
|
| There are several instances of this problem when block layer tries
| to __queue_work(). Call trace from one of these instances is below:
|
| kblockd_mod_delayed_work_on()
| mod_delayed_work_on()
| __queue_delayed_work()
| __queue_work() (rcu_read_lock, raw_spin_lock pool->lock held)
| insert_work()
| kasan_record_aux_stack()
| kasan_save_stack()
| stack_depot_save()
| alloc_pages()
| __alloc_pages()
| get_page_from_freelist()
| rm_queue()
| rm_queue_pcplist()
| local_lock_irqsave(&pagesets.lock, flags);
| [ BUG: Invalid wait context triggered ]
PROVE_RAW_LOCK_NESTING is pointing out that (on RT kernels) the locking
rules are being violated. More generally, memory is being allocated
from a non-preemptive context (raw_spin_lock'd c-s) where it is not
allowed.
To properly fix this, we must prevent stackdepot from replenishing its
"stack slab" pool if memory allocations cannot be done in the current
context: it's a bug to use either GFP_ATOMIC nor GFP_NOWAIT in certain
non-preemptive contexts, including raw_spin_locks (see gfp.h and commit
ab00db216c).
The only downside is that saving a stack trace may fail if: stackdepot
runs out of space AND the same stack trace has not been recorded before.
I expect this to be unlikely, and a simple experiment (boot the kernel)
didn't result in any failure to record stack trace from insert_work().
The series includes a few minor fixes to stackdepot that I noticed in
preparing the series. It then introduces __stack_depot_save(), which
exposes the option to force stackdepot to not allocate any memory.
Finally, KASAN is changed to use the new stackdepot interface and
provide kasan_record_aux_stack_noalloc(), which is then used by
workqueue code.
[1] https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
This patch (of 6):
<linux/stackdepot.h> refers to gfp_t, but doesn't include gfp.h.
Fix it by including <linux/gfp.h>.
Link: https://lkml.kernel.org/r/20210913112609.2651084-1-elver@google.com
Link: https://lkml.kernel.org/r/20210913112609.2651084-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Taras Madan <tarasmadan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not required at all, and having this causes a huge kernel rebuild as
soon as something in dax.h changes.
Link: https://lkml.kernel.org/r/20210921082253.1859794-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
TRANSPARENT_HUGEPAGE:
There are potential non-deterministic delays to an RT thread if a
critical memory region is not THP-aligned and a non-RT buffer is
located in the same hugepage-aligned region. It's also possible for an
unrelated thread to migrate pages belonging to an RT task incurring
unexpected page faults due to memory defragmentation even if
khugepaged is disabled.
Regular HUGEPAGEs are not affected by this can be used.
NUMA_BALANCING:
There is a non-deterministic delay to mark PTEs PROT_NONE to gather
NUMA fault samples, increased page faults of regions even if mlocked
and non-deterministic delays when migrating pages.
[Mel Gorman worded 99% of the commit description].
Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Link: https://lkml.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0ad9500e16 ("slub: prefetch next freelist pointer in
slab_alloc()") introduced prefetch_freepointer() because when other
cpu(s) freed objects into a page that current cpu owns, the freelist
link is hot on cpu(s) which freed objects and possibly very cold on
current cpu.
But if freelist link chain is hot on cpu(s) which freed objects, it's
better to invalidate that chain because they're not going to access
again within a short time.
So use prefetchw instead of prefetch. On supported architectures like
x86 and arm, it invalidates other copied instances of a cache line when
prefetching it.
Before:
Time: 91.677
Performance counter stats for 'hackbench -g 100 -l 10000':
1462938.07 msec cpu-clock # 15.908 CPUs utilized
18072550 context-switches # 12.354 K/sec
1018814 cpu-migrations # 696.416 /sec
104558 page-faults # 71.471 /sec
1580035699271 cycles # 1.080 GHz (54.51%)
2003670016013 instructions # 1.27 insn per cycle (54.31%)
5702204863 branch-misses (54.28%)
643368500985 cache-references # 439.778 M/sec (54.26%)
18475582235 cache-misses # 2.872 % of all cache refs (54.28%)
642206796636 L1-dcache-loads # 438.984 M/sec (46.87%)
18215813147 L1-dcache-load-misses # 2.84% of all L1-dcache accesses (46.83%)
653842996501 dTLB-loads # 446.938 M/sec (46.63%)
3227179675 dTLB-load-misses # 0.49% of all dTLB cache accesses (46.85%)
537531951350 iTLB-loads # 367.433 M/sec (54.33%)
114750630 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.37%)
630135543177 L1-icache-loads # 430.733 M/sec (46.80%)
22923237620 L1-icache-load-misses # 3.64% of all L1-icache accesses (46.76%)
91.964452802 seconds time elapsed
43.416742000 seconds user
1422.441123000 seconds sys
After:
Time: 90.220
Performance counter stats for 'hackbench -g 100 -l 10000':
1437418.48 msec cpu-clock # 15.880 CPUs utilized
17694068 context-switches # 12.310 K/sec
958257 cpu-migrations # 666.651 /sec
100604 page-faults # 69.989 /sec
1583259429428 cycles # 1.101 GHz (54.57%)
2004002484935 instructions # 1.27 insn per cycle (54.37%)
5594202389 branch-misses (54.36%)
643113574524 cache-references # 447.409 M/sec (54.39%)
18233791870 cache-misses # 2.835 % of all cache refs (54.37%)
640205852062 L1-dcache-loads # 445.386 M/sec (46.75%)
17968160377 L1-dcache-load-misses # 2.81% of all L1-dcache accesses (46.79%)
651747432274 dTLB-loads # 453.415 M/sec (46.59%)
3127124271 dTLB-load-misses # 0.48% of all dTLB cache accesses (46.75%)
535395273064 iTLB-loads # 372.470 M/sec (54.38%)
113500056 iTLB-load-misses # 0.02% of all iTLB cache accesses (54.35%)
628871845924 L1-icache-loads # 437.501 M/sec (46.80%)
22585641203 L1-icache-load-misses # 3.59% of all L1-icache accesses (46.79%)
90.514819303 seconds time elapsed
43.877656000 seconds user
1397.176001000 seconds sys
Link: https://lkml.org/lkml/2021/10/8/598=20
Link: https://lkml.kernel.org/r/20211011144331.70084-1-42.hyeyoo@gmail.com
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The defaults are determined based on object size and can go up to 30 for
objects smaller than 256 bytes. Before the previous patch changed the
accounting, this could have made cpu partial list contain up to 30
pages. After that patch, only up to 2 pages with default allocation
order.
Very short lists limit the usefulness of the whole concept of cpu
partial lists, so this patch aims at a more reasonable default under the
new accounting. The defaults are quadrupled, except for object size >=
PAGE_SIZE where it's doubled. This makes the lists grow up to 10 pages
in practice.
A quick test of booting a kernel under virtme with 4GB RAM and 8 vcpus
shows the following slab memory usage after boot:
Before previous patch (using page->pobjects):
Slab: 36732 kB
SReclaimable: 14836 kB
SUnreclaim: 21896 kB
After previous patch (using page->pages):
Slab: 34720 kB
SReclaimable: 13716 kB
SUnreclaim: 21004 kB
After this patch (using page->pages, higher defaults):
Slab: 35252 kB
SReclaimable: 13944 kB
SUnreclaim: 21308 kB
In the same setup, I also ran 5 times:
hackbench -l 16000 -g 16
Differences in time were in the noise, we can compare slub stats as
given by slabinfo -r skbuff_head_cache (the other cache heavily used by
hackbench, kmalloc-cg-512 looks similar). Negligible stats left out for
brevity.
Before previous patch (using page->pobjects):
Objects: 1408, Memory Total: 401408 Used : 304128
Slab Perf Counter Alloc Free %Al %Fr
--------------------------------------------------
Fastpath 469952498 5946606 91 1
Slowpath 42053573 506059465 8 98
Page Alloc 41093 41044 0 0
Add partial 18 21229327 0 4
Remove partial 20039522 36051 3 0
Cpu partial list 4686640 24767229 0 4
RemoteObj/SlabFrozen 16 124027841 0 24
Total 512006071 512006071
Flushes 18
Slab Deactivation Occurrences %
-------------------------------------------------
Slab empty 4993 0%
Deactivation bypass 24767229 99%
Refilled from foreign frees 21972674 88%
After previous patch (using page->pages):
Objects: 480, Memory Total: 131072 Used : 103680
Slab Perf Counter Alloc Free %Al %Fr
--------------------------------------------------
Fastpath 473016294 5405653 92 1
Slowpath 38989777 506600418 7 98
Page Alloc 32717 32701 0 0
Add partial 3 22749164 0 4
Remove partial 11371127 32474 2 0
Cpu partial list 11686226 23090059 2 4
RemoteObj/SlabFrozen 2 67541803 0 13
Total 512006071 512006071
Flushes 3
Slab Deactivation Occurrences %
-------------------------------------------------
Slab empty 227 0%
Deactivation bypass 23090059 99%
Refilled from foreign frees 27585695 119%
After this patch (using page->pages, higher defaults):
Objects: 896, Memory Total: 229376 Used : 193536
Slab Perf Counter Alloc Free %Al %Fr
--------------------------------------------------
Fastpath 473799295 4980278 92 0
Slowpath 38206776 507025793 7 99
Page Alloc 32295 32267 0 0
Add partial 11 23291143 0 4
Remove partial 5815764 31278 1 0
Cpu partial list 18119280 23967320 3 4
RemoteObj/SlabFrozen 10 76974794 0 15
Total 512006071 512006071
Flushes 11
Slab Deactivation Occurrences %
-------------------------------------------------
Slab empty 989 0%
Deactivation bypass 23967320 99%
Refilled from foreign frees 32358473 135%
As expected, memory usage dropped significantly with change of
accounting, increasing the defaults increased it, but not as much. The
number of page allocation/frees dropped significantly with the new
accounting, but didn't increase with the higher defaults.
Interestingly, the number of fasthpath allocations increased, as well as
allocations from the cpu partial list, even though it's shorter.
Link: https://lkml.kernel.org/r/20211012134651.11258-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With CONFIG_SLUB_CPU_PARTIAL enabled, SLUB keeps a percpu list of
partial slabs that can be promoted to cpu slab when the previous one is
depleted, without accessing the shared partial list. A slab can be
added to this list by 1) refill of an empty list from get_partial_node()
- once we really have to access the shared partial list, we acquire
multiple slabs to amortize the cost of locking, and 2) first free to a
previously full slab - instead of putting the slab on a shared partial
list, we can more cheaply freeze it and put it on the per-cpu list.
To control how large a percpu partial list can grow for a kmem cache,
set_cpu_partial() calculates a target number of free objects on each
cpu's percpu partial list, and this can be also set by the sysfs file
cpu_partial.
However, the tracking of actual number of objects is imprecise, in order
to limit overhead from cpu X freeing an objects to a slab on percpu
partial list of cpu Y. Basically, the percpu partial slabs form a
single linked list, and when we add a new slab to the list with current
head "oldpage", we set in the struct page of the slab we're adding:
page->pages = oldpage->pages + 1; // this is precise
page->pobjects = oldpage->pobjects + (page->objects - page->inuse);
page->next = oldpage;
Thus the real number of free objects in the slab (objects - inuse) is
only determined at the moment of adding the slab to the percpu partial
list, and further freeing doesn't update the pobjects counter nor
propagate it to the current list head. As Jann reports [1], this can
easily lead to large inaccuracies, where the target number of objects
(up to 30 by default) can translate to the same number of (empty) slab
pages on the list. In case 2) above, we put a slab with 1 free object
on the list, thus only increase page->pobjects by 1, even if there are
subsequent frees on the same slab. Jann has noticed this in practice
and so did we [2] when investigating significant increase of kmemcg
usage after switching from SLAB to SLUB.
While this is no longer a problem in kmemcg context thanks to the
accounting rewrite in 5.9, the memory waste is still not ideal and it's
questionable whether it makes sense to perform free object count based
control when object counts can easily become so much inaccurate. So
this patch converts the accounting to be based on number of pages only
(which is precise) and removes the page->pobjects field completely.
This is also ultimately simpler.
To retain the existing set_cpu_partial() heuristic, first calculate the
target number of objects as previously, but then convert it to target
number of pages by assuming the pages will be half-filled on average.
This assumption might obviously also be inaccurate in practice, but
cannot degrade to actual number of pages being equal to the target
number of objects.
We could also skip the intermediate step with target number of objects
and rewrite the heuristic in terms of pages. However we still have the
sysfs file cpu_partial which uses number of objects and could break
existing users if it suddenly becomes number of pages, so this patch
doesn't do that.
In practice, after this patch the heuristics limit the size of percpu
partial list up to 2 pages. In case of a reported regression (which
would mean some workload has benefited from the previous imprecise
object based counting), we can tune the heuristics to get a better
compromise within the new scheme, while still avoid the unexpectedly
long percpu partial lists.
[1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/
[2] https://lore.kernel.org/all/2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz/
==========
Evaluation
==========
Mel was kind enough to run v1 through mmtests machinery for netperf
(localhost) and hackbench and, for most significant results see below.
So there are some apparent regressions, especially with hackbench, which
I think ultimately boils down to having shorter percpu partial lists on
average and some benchmarks benefiting from longer ones. Monitoring
slab usage also indicated less memory usage by slab. Based on that, the
following patch will bump the defaults to allow longer percpu partial
lists than after this patch.
However the goal is certainly not such that we would limit the percpu
partial lists to 30 pages just because previously a specific alloc/free
pattern could lead to the limit of 30 objects translate to a limit to 30
pages - that would make little sense. This is a correctness patch, and
if a workload benefits from larger lists, the sysfs tuning knobs are
still there to allow that.
Netperf
2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads per socket), 384GB RAM
TCP-RR:
hmean before 127045.79 after 121092.94 (-4.69%, worse)
stddev before 2634.37 after 1254.08
UDP-RR:
hmean before 166985.45 after 160668.94 ( -3.78%, worse)
stddev before 4059.69 after 1943.63
2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads per socket), 512GB RAM
TCP-RR:
hmean before 84173.25 after 76914.72 ( -8.62%, worse)
UDP-RR:
hmean before 93571.12 after 96428.69 ( 3.05%, better)
stddev before 23118.54 after 16828.14
2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads per socket), 64GB RAM
TCP-RR:
hmean before 49984.92 after 48922.27 ( -2.13%, worse)
stddev before 6248.15 after 4740.51
UDP-RR:
hmean before 61854.31 after 68761.81 ( 11.17%, better)
stddev before 4093.54 after 5898.91
other machines - within 2%
Hackbench
(results before and after the patch, negative % means worse)
2-socket AMD EPYC 7713 (64 cores, 128 threads per core), 256GB RAM
hackbench-process-sockets
Amean 1 0.5380 0.5583 ( -3.78%)
Amean 4 0.7510 0.8150 ( -8.52%)
Amean 7 0.7930 0.9533 ( -20.22%)
Amean 12 0.7853 1.1313 ( -44.06%)
Amean 21 1.1520 1.4993 ( -30.15%)
Amean 30 1.6223 1.9237 ( -18.57%)
Amean 48 2.6767 2.9903 ( -11.72%)
Amean 79 4.0257 5.1150 ( -27.06%)
Amean 110 5.5193 7.4720 ( -35.38%)
Amean 141 7.2207 9.9840 ( -38.27%)
Amean 172 8.4770 12.1963 ( -43.88%)
Amean 203 9.6473 14.3137 ( -48.37%)
Amean 234 11.3960 18.7917 ( -64.90%)
Amean 265 13.9627 22.4607 ( -60.86%)
Amean 296 14.9163 26.0483 ( -74.63%)
hackbench-thread-sockets
Amean 1 0.5597 0.5877 ( -5.00%)
Amean 4 0.7913 0.8960 ( -13.23%)
Amean 7 0.8190 1.0017 ( -22.30%)
Amean 12 0.9560 1.1727 ( -22.66%)
Amean 21 1.7587 1.5660 ( 10.96%)
Amean 30 2.4477 1.9807 ( 19.08%)
Amean 48 3.4573 3.0630 ( 11.41%)
Amean 79 4.7903 5.1733 ( -8.00%)
Amean 110 6.1370 7.4220 ( -20.94%)
Amean 141 7.5777 9.2617 ( -22.22%)
Amean 172 9.2280 11.0907 ( -20.18%)
Amean 203 10.2793 13.3470 ( -29.84%)
Amean 234 11.2410 17.1070 ( -52.18%)
Amean 265 12.5970 23.3323 ( -85.22%)
Amean 296 17.1540 24.2857 ( -41.57%)
2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads
per socket), 384GB RAM
hackbench-process-sockets
Amean 1 0.5760 0.4793 ( 16.78%)
Amean 4 0.9430 0.9707 ( -2.93%)
Amean 7 1.5517 1.8843 ( -21.44%)
Amean 12 2.4903 2.7267 ( -9.49%)
Amean 21 3.9560 4.2877 ( -8.38%)
Amean 30 5.4613 5.8343 ( -6.83%)
Amean 48 8.5337 9.2937 ( -8.91%)
Amean 79 14.0670 15.2630 ( -8.50%)
Amean 110 19.2253 21.2467 ( -10.51%)
Amean 141 23.7557 25.8550 ( -8.84%)
Amean 172 28.4407 29.7603 ( -4.64%)
Amean 203 33.3407 33.9927 ( -1.96%)
Amean 234 38.3633 39.1150 ( -1.96%)
Amean 265 43.4420 43.8470 ( -0.93%)
Amean 296 48.3680 48.9300 ( -1.16%)
hackbench-thread-sockets
Amean 1 0.6080 0.6493 ( -6.80%)
Amean 4 1.0000 1.0513 ( -5.13%)
Amean 7 1.6607 2.0260 ( -22.00%)
Amean 12 2.7637 2.9273 ( -5.92%)
Amean 21 5.0613 4.5153 ( 10.79%)
Amean 30 6.3340 6.1140 ( 3.47%)
Amean 48 9.0567 9.5577 ( -5.53%)
Amean 79 14.5657 15.7983 ( -8.46%)
Amean 110 19.6213 21.6333 ( -10.25%)
Amean 141 24.1563 26.2697 ( -8.75%)
Amean 172 28.9687 30.2187 ( -4.32%)
Amean 203 33.9763 34.6970 ( -2.12%)
Amean 234 38.8647 39.3207 ( -1.17%)
Amean 265 44.0813 44.1507 ( -0.16%)
Amean 296 49.2040 49.4330 ( -0.47%)
2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads
per socket), 512GB RAM
hackbench-process-sockets
Amean 1 0.5027 0.5017 ( 0.20%)
Amean 4 1.1053 1.2033 ( -8.87%)
Amean 7 1.8760 2.1820 ( -16.31%)
Amean 12 2.9053 3.1810 ( -9.49%)
Amean 21 4.6777 4.9920 ( -6.72%)
Amean 30 6.5180 6.7827 ( -4.06%)
Amean 48 10.0710 10.5227 ( -4.48%)
Amean 79 16.4250 17.5053 ( -6.58%)
Amean 110 22.6203 24.4617 ( -8.14%)
Amean 141 28.0967 31.0363 ( -10.46%)
Amean 172 34.4030 36.9233 ( -7.33%)
Amean 203 40.5933 43.0850 ( -6.14%)
Amean 234 46.6477 48.7220 ( -4.45%)
Amean 265 53.0530 53.9597 ( -1.71%)
Amean 296 59.2760 59.9213 ( -1.09%)
hackbench-thread-sockets
Amean 1 0.5363 0.5330 ( 0.62%)
Amean 4 1.1647 1.2157 ( -4.38%)
Amean 7 1.9237 2.2833 ( -18.70%)
Amean 12 2.9943 3.3110 ( -10.58%)
Amean 21 4.9987 5.1880 ( -3.79%)
Amean 30 6.7583 7.0043 ( -3.64%)
Amean 48 10.4547 10.8353 ( -3.64%)
Amean 79 16.6707 17.6790 ( -6.05%)
Amean 110 22.8207 24.4403 ( -7.10%)
Amean 141 28.7090 31.0533 ( -8.17%)
Amean 172 34.9387 36.8260 ( -5.40%)
Amean 203 41.1567 43.0450 ( -4.59%)
Amean 234 47.3790 48.5307 ( -2.43%)
Amean 265 53.9543 54.6987 ( -1.38%)
Amean 296 60.0820 60.2163 ( -0.22%)
1-socket Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 cores, 8 threads),
32 GB RAM
hackbench-process-sockets
Amean 1 1.4760 1.5773 ( -6.87%)
Amean 3 3.9370 4.0910 ( -3.91%)
Amean 5 6.6797 6.9357 ( -3.83%)
Amean 7 9.3367 9.7150 ( -4.05%)
Amean 12 15.7627 16.1400 ( -2.39%)
Amean 18 23.5360 23.6890 ( -0.65%)
Amean 24 31.0663 31.3137 ( -0.80%)
Amean 30 38.7283 39.0037 ( -0.71%)
Amean 32 41.3417 41.6097 ( -0.65%)
hackbench-thread-sockets
Amean 1 1.5250 1.6043 ( -5.20%)
Amean 3 4.0897 4.2603 ( -4.17%)
Amean 5 6.7760 7.0933 ( -4.68%)
Amean 7 9.4817 9.9157 ( -4.58%)
Amean 12 15.9610 16.3937 ( -2.71%)
Amean 18 23.9543 24.3417 ( -1.62%)
Amean 24 31.4400 31.7217 ( -0.90%)
Amean 30 39.2457 39.5467 ( -0.77%)
Amean 32 41.8267 42.1230 ( -0.71%)
2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads
per socket), 64GB RAM
hackbench-process-sockets
Amean 1 1.0347 1.0880 ( -5.15%)
Amean 4 1.7267 1.8527 ( -7.30%)
Amean 7 2.6707 2.8110 ( -5.25%)
Amean 12 4.1617 4.3383 ( -4.25%)
Amean 21 7.0070 7.2600 ( -3.61%)
Amean 30 9.9187 10.2397 ( -3.24%)
Amean 48 15.6710 16.3923 ( -4.60%)
Amean 79 24.7743 26.1247 ( -5.45%)
Amean 110 34.3000 35.9307 ( -4.75%)
Amean 141 44.2043 44.8010 ( -1.35%)
Amean 172 54.2430 54.7260 ( -0.89%)
Amean 192 60.6557 60.9777 ( -0.53%)
hackbench-thread-sockets
Amean 1 1.0610 1.1353 ( -7.01%)
Amean 4 1.7543 1.9140 ( -9.10%)
Amean 7 2.7840 2.9573 ( -6.23%)
Amean 12 4.3813 4.4937 ( -2.56%)
Amean 21 7.3460 7.5350 ( -2.57%)
Amean 30 10.2313 10.5190 ( -2.81%)
Amean 48 15.9700 16.5940 ( -3.91%)
Amean 79 25.3973 26.6637 ( -4.99%)
Amean 110 35.1087 36.4797 ( -3.91%)
Amean 141 45.8220 46.3053 ( -1.05%)
Amean 172 55.4917 55.7320 ( -0.43%)
Amean 192 62.7490 62.5410 ( 0.33%)
Link: https://lkml.kernel.org/r/20211012134651.11258-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Jann Horn <jannh@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit f227f0faf6 ("slub: fix unreclaimable slab stat for bulk
free"), the check for free nonslab page is replaced by VM_BUG_ON_PAGE,
which only check with CONFIG_DEBUG_VM enabled, but this config may
impact performance, so it only for debug.
Commit 0937502af7 ("slub: Add check for kfree() of non slab objects.")
add the ability, which should be needed in any configs to catch the
invalid free, they even could be potential issue, eg, memory corruption,
use after free and double free, so replace VM_BUG_ON_PAGE to
WARN_ON_ONCE, add object address printing to help use to debug the
issue.
Link: https://lkml.kernel.org/r/20210930070214.61499-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rienjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These lines are useless, so remove them.
Link: https://lkml.kernel.org/r/20210930034845.2539-1-shi_lei@massclouds.com
Fixes: 10befea91b ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Shi Lei <shi_lei@massclouds.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not all files in the kernel should include mm.h. Migrating callers from
kmalloc to kvmalloc is easier if the kvmalloc functions are in slab.h.
[akpm@linux-foundation.org: move the new kvrealloc() also]
[akpm@linux-foundation.org: drivers/hwmon/occ/p9_sbe.c needs slab.h]
Link: https://lkml.kernel.org/r/20210622215757.3525604-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel doc validator complains:
Function parameter or member 'p' not described in 'prepend_name'
Excess function parameter 'buffer' description in 'prepend_name'
Link: https://lkml.kernel.org/r/20211011005614.26189-1-justin.he@arm.com
Fixes: ad08ae5865 ("d_path: introduce struct prepend_buffer")
Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fallthrough comment for an ignored cmpxchg() return value produces a
harmless warning with 'make W=1':
fs/posix_acl.c: In function 'get_acl':
fs/posix_acl.c:127:36: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
127 | /* fall through */ ;
| ^
Simplify it as a step towards a clean W=1 build. As all architectures
define cmpxchg() as a statement expression these days, it is no longer
necessary to evaluate its return code, and the if() can just be droped.
Link: https://lkml.kernel.org/r/20210927102410.1863853-1-arnd@kernel.org
Link: https://lore.kernel.org/all/20210322132103.qiun2rjilnlgztxe@wittgenstein/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2_zero_range_for_truncate() can try to zero pages beyond current
inode size despite the fact that underlying blocks should be already
zeroed out and writeback will skip writing such pages anyway. Avoid the
pointless work.
Link: https://lkml.kernel.org/r/20211025151332.11301-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ocfs2: Truncate data corruption fix".
As further testing has shown, commit 5314454ea3 ("ocfs2: fix data
corruption after conversion from inline format") didn't fix all the data
corruption issues the customer started observing after 6dbf7bb555
("fs: Don't invalidate page buffers in block_write_full_page()") This
time I have tracked them down to two bugs in ocfs2 truncation code.
One bug (truncating page cache before clearing tail cluster and setting
i_size) could cause data corruption even before 6dbf7bb555, but before
that commit it needed a race with page fault, after 6dbf7bb555 it
started to be pretty deterministic.
Another bug (zeroing pages beyond old i_size) used to be harmless
inefficiency before commit 6dbf7bb555. But after commit 6dbf7bb555
in combination with the first bug it resulted in deterministic data
corruption.
Although fixing only the first problem is needed to stop data
corruption, I've fixed both issues to make the code more robust.
This patch (of 2):
ocfs2_truncate_file() did unmap invalidate page cache pages before
zeroing partial tail cluster and setting i_size. Thus some pages could
be left (and likely have left if the cluster zeroing happened) in the
page cache beyond i_size after truncate finished letting user possibly
see stale data once the file was extended again. Also the tail cluster
zeroing was not guaranteed to finish before truncate finished causing
possible stale data exposure. The problem started to be particularly
easy to hit after commit 6dbf7bb555 "fs: Don't invalidate page buffers
in block_write_full_page()" stopped invalidation of pages beyond i_size
from page writeback path.
Fix these problems by unmapping and invalidating pages in the page cache
after the i_size is reduced and tail cluster is zeroed out.
Link: https://lkml.kernel.org/r/20211025150008.29002-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20211025151332.11301-1-jack@suse.cz
Fixes: ccd979bdbc ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable ret is being assigned a value that is never read, it is
updated later on with a different value. The assignment is redundant
and can be removed.
Addresses-Coverity: ("Unused value")
Link: https://lkml.kernel.org/r/20211007233452.30815-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allocate and free struct ocfs2_journal in ocfs2_journal_init and
ocfs2_journal_shutdown. Init and release of system inodes references
the journal so reorder calls to make sure they work correctly.
Link: https://lkml.kernel.org/r/20211009145006.3478-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reference counting issue happens in two exception handling paths of
ocfs2_replay_truncate_records(). When executing these two exception
handling paths, the function forgets to decrease the refcount of handle
increased by ocfs2_start_trans(), causing a refcount leak.
Fix this issue by using ocfs2_commit_trans() to decrease the refcount of
handle in two handling paths.
Link: https://lkml.kernel.org/r/20210908102055.10168-1-cymi20@fudan.edu.cn
Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If opps.file is in DOS format, faulting instruction cannot be printed:
/ # ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
/ # ./scripts/decodecode < oops.file
[ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
aarch64-linux-gnu-strip: '/tmp/tmp.5Y9eybnnSi.o': No such file
aarch64-linux-gnu-objdump: '/tmp/tmp.5Y9eybnnSi.o': No such file
All code
========
0: d0002881 adrp x1, 0x512000
4: 912f9c21 add x1, x1, #0xbe7
8: 94067e68 bl 0x19f9a8
c: d2800001 mov x1, #0x0 // #0
10: b900003f str wzr, [x1]
Code starting with the faulting instruction
===========================================
Background: The compilation environment is Ubuntu, and the test
environment is Windows. Most logs are generated in the Windows
environment. In this way, CR (carriage return) will inevitably appear,
which will affect the use of decodecode in the Ubuntu environment.
The repaired effect is as follows:
/ # ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
/ # ./scripts/decodecode < oops.file
[ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
All code
========
0: d0002881 adrp x1, 0x512000
4: 912f9c21 add x1, x1, #0xbe7
8: 94067e68 bl 0x19f9a8
c: d2800001 mov x1, #0x0 // #0
10:* b900003f str wzr, [x1] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: b900003f str wzr, [x1]
Link: https://lkml.kernel.org/r/20211008064712.926-1-weidonghui@allwinnertech.com
Signed-off-by: weidonghui <weidonghui@allwinnertech.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Marc Zyngier <maz@misterjones.org>
Cc: Will Deacon <will@kernel.org>
Cc: Rabin Vincent <rabin@rab.in>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If both "mistake" version and "correction" version are the same, a
warning message is created by checkpatch which is impossible to fix.
But it was noticed that Colan Ian King created a commit e6c0a0889b
("ALSA: aloop: Fix spelling mistake "synchronization" ->
"synchronization"") which suggests that this spelling mistake was fixed
by replacing the word "synchronization" with itself. But the actual
diff shows that the mistake in the code was "sychronization". It is
rather likely that the "mistake" in spelling.txt should have been the
latter.
Link: https://lkml.kernel.org/r/20210926065529.6880-1-sven@narfation.org
Fixes: 2e74c9433ba8 ("scripts/spelling.txt: add more spellings to spelling.txt")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of the more common spelling mistakes and typos that I've found
while fixing up spelling mistakes in the kernel in the past few months.
Link: https://lkml.kernel.org/r/20210907072941.7033-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix compilation of callchain related code on powerpc with gcc11+.
- Fix PERF_SAMPLE_WEIGHT_STRUCT support in 'perf script'
- Check session->header.env.arch before using it, fixing a segmentation fault.
- Suppress 'rm dlfilter' build messages.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYX7NvgAKCRCyPKLppCJ+
J0kmAQCAuh2Pt2eL+KUat/RgHXG+on5EkFfqlHVzebjmTv3/nwD/eyKF5AcgvTfb
DGIntW5QskCeAal05g9Po1+xPdU2aQU=
=3oio
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix compilation of callchain related code on powerpc with gcc11+
- Fix PERF_SAMPLE_WEIGHT_STRUCT support in 'perf script'
- Check session->header.env.arch before using it, fixing a segmentation
fault
- Suppress 'rm dlfilter' build messages
* tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support
perf callchain: Fix compilation on powerpc with gcc11+
perf script: Check session->header.env.arch before using it
perf build: Suppress 'rm dlfilter' build message
* Fixes for Xen emulator bugs showing up as debug kernel WARNs
* Fix another issue with SEV/ES string I/O VMGEXITs
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmF6uGIUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNRagf/Srvk9lNcRh4cEzsczErKMyr3xOqA
jgsTSqgl1ExJI9sBLMpVYBOFGILMaMSrhLPIltKPy0Bj/E+hw8WOQwPa44QjWlSD
MAUxO1Nryt9Luc2L8uSd1c//g4fr4V1BhOaumk1lM14Q8EDfQBcDIMI2ZKueMU1+
2Q+n8/AsG63jQIINwKNidof0dzRtbfcE30Wq/8QHttIPo5wt6l0YClOlOikqNY8N
5+WSQFmuutHIXftq5Jb/Ldn/+HVukWZyZOEVwLnBpM9uBvIubNgcEakqvxsaVtAn
FHdvnA+Bk99/Xuhl+wRLQo8ofzQIQ13RQv3HPArJAJv34oAJZx2rNObVlA==
=6ofB
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- Fixes for s390 interrupt delivery
- Fixes for Xen emulator bugs showing up as debug kernel WARNs
- Fix another issue with SEV/ES string I/O VMGEXITs
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Take srcu lock in post_kvm_run_save()
KVM: SEV-ES: fix another issue with string I/O VMGEXITs
KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()
KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
KVM: s390: clear kicked_mask before sleeping again
-F weight in perf script is broken.
# ./perf mem record
# ./perf script -F weight
Samples for 'dummy:HG' event do not have WEIGHT attribute set. Cannot
print 'weight' field.
The sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. They share the same space, weight. The
lower 32 bits are exactly the same for both sample type. The higher 32
bits may be different for different architecture. For a new kernel on
x86, the PERF_SAMPLE_WEIGHT_STRUCT is used. For an old kernel or other
ARCHs, the PERF_SAMPLE_WEIGHT is used.
With -F weight, current perf script will only check the input string
"weight" with the PERF_SAMPLE_WEIGHT sample type. Because the commit
ea8d0ed6ea ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT") didn't
update the PERF_SAMPLE_WEIGHT_STRUCT sample type for perf script. For a
new kernel on x86, the check fails.
Use PERF_SAMPLE_WEIGHT_TYPE, which supports both sample types, to
replace PERF_SAMPLE_WEIGHT
Fixes: ea8d0ed6ea ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT")
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Joe Mario <jmario@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lore.kernel.org/r/1632929894-102778-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Got following build fail on powerpc:
CC arch/powerpc/util/skip-callchain-idx.o
In function ‘check_return_reg’,
inlined from ‘check_return_addr’ at arch/powerpc/util/skip-callchain-idx.c:213:7,
inlined from ‘arch_skip_callchain_idx’ at arch/powerpc/util/skip-callchain-idx.c:265:7:
arch/powerpc/util/skip-callchain-idx.c:54:18: error: ‘dwarf_frame_register’ accessing 96 bytes \
in a region of size 64 [-Werror=stringop-overflow=]
54 | result = dwarf_frame_register(frame, ra_regno, ops_mem, &ops, &nops);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/util/skip-callchain-idx.c: In function ‘arch_skip_callchain_idx’:
arch/powerpc/util/skip-callchain-idx.c:54:18: note: referencing argument 3 of type ‘Dwarf_Op *’
In file included from /usr/include/elfutils/libdwfl.h:32,
from arch/powerpc/util/skip-callchain-idx.c:10:
/usr/include/elfutils/libdw.h:1069:12: note: in a call to function ‘dwarf_frame_register’
1069 | extern int dwarf_frame_register (Dwarf_Frame *frame, int regno,
| ^~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The dwarf_frame_register args changed with [1],
Updating ops_mem accordingly.
[1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=5621fe5443da23112170235dd5cac161e5c75e65
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Mark Wieelard <mjw@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20210928195253.1267023-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf.data is not written cleanly, we would like to process existing
data as much as possible (please see f_header.data.size == 0 condition
in perf_session__read_header). However, perf.data with partial data may
crash perf. Specifically, we see crash in 'perf script' for NULL
session->header.env.arch.
Fix this by checking session->header.env.arch before using it to determine
native_arch. Also split the if condition so it is easier to read.
Committer notes:
If it is a pipe, we already assume is a native arch, so no need to check
session->header.env.arch.
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211004053238.514936-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following build message:
rm dlfilters/dlfilter-test-api-v0.o
is unwanted.
The object file is being treated as an intermediate file and being
automatically removed. Mark the object file as .SECONDARY to prevent
removal and hence the message.
Requested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210930062849.110416-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Three small fixes, all in drivers, and one sizeable update to the UFS
driver to remove the HPB 2.0 feature that has been objected to by Jens
and Christoph. Although the UFS patch is large and last minute, it's
essentially the least intrusive way of resolving the objections in
time for the 5.15 release.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYX25RSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQSWAP48whPB
LUQx2tffOVmf18q5HA22mFXz/KTjgmvTVvNJtAEAn8OPN3OAnndPgBYDXAKhChI9
iMvhn3UwYGM+6DQ9bHo=
=qS2s
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small fixes, all in drivers, and one sizeable update to the UFS
driver to remove the HPB 2.0 feature that has been objected to by Jens
and Christoph.
Although the UFS patch is large and last minute, it's essentially the
least intrusive way of resolving the objections in time for the 5.15
release"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: ufshpb: Remove HPB2.0 flows
scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT
scsi: ufs: ufs-exynos: Correct timeout value setting registers
scsi: ibmvfc: Fix up duplicate response detection
to use the determine_rate instead of round_rate clk op by default. This
caused lots of problems on Rockchip SoCs because they heavily use
the composite clk code to model the clk tree.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmF8sUARHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSVDDhAAn4VSkWRS2mlwSWHCjrNc5XBVqTuzQX4k
pH8k8rwLYJgEMnzN0a62I5kPUYcx47ILX/5+64vHfUyxYQsuDWu3O+Uvn8N5q8iD
pXwH+Thac30DdCYoi7NRHmfKg0wFuMf4ExlLzfSP64OvPR4OeIz45BLo2KcnASy5
9KW/epH89fwUDpE01p9wI+nS6ItJ3qTvm9PUIOWqUOJz0ZkBjsIid2I6y7k/HgEk
sfjBq2swvxaF7cIuCA11vesJ8pFVSSRowHgHdGsHa3okrB/x0nsi2JW2ie2zt/VQ
EFkuhuiWfs2O5brjqZiQLgkeggEKkO+9V4o30s3mVFLx9hcEsKR7a3XaPOtXdlHh
XrnDKnspWV1HKRRceRMbNw2oxeaDcJ2RC5kXpAVJraGeMDjn1NyYMxCFOyuwY0q6
8Mb7M1dRuMCdjIrhBGss+u5wg/wEG/vUZF6M4LBZAPzgRmX6lKVqzZrLH+9M0xSJ
ey77No79nhntzPg57FZHED/ghL+5TNVyp9pO2rDcVcn1QjPGcNw3M9Fm/yfRPAJF
C+5mvW5aoc+B6DFuQrpKSfUaa2KgqFKBj2CcEST/fcAH+FNKe9EZ8fkkM0jCdqZo
nNK5CbsuW5HSCF+L5EALosru4krAFK/VNzXI7hzfraKOLymkw6h9j78F+Gi0qRzS
0IHipNdFIhA=
=5b/O
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One fix for the composite clk that broke when we changed this clk type
to use the determine_rate instead of round_rate clk op by default.
This caused lots of problems on Rockchip SoCs because they heavily use
the composite clk code to model the clk tree"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: composite: Also consider .determine_rate for rate + mux composites
* A fix to ensure the trap vector's address is aligned.
* A fix to avoid re-populating the KASAN shadow memory.
* A fix to allow kasan to build without warnings, which have recently
become errors.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmF8OSoTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiVQvD/9u7Dgx473QjQWhP4mgHe6QWhBHVOB3
3b4eJ39aSvpo7ODbw8ZpJ/SfHLM4UzB3/jjuIZW1tsM0TfwryOVPho5rqRugP5ho
8kHX4TS6eRz3wVxpBWaiJP8sSoeDkq3+SyVVA+vK4siFZ67RdzRbn+vt3JsF0B9M
jvyJJQrNxRvnY/yR6CWXSvyrzHPxfd0OPOHF5cYBFRnPK/bp3hSVhyTFUt17MCWE
SBOKkUU5zuWFqZxYxq8i+g+wd5HvMGOkm4NhpVtU88sgq3EVh6yCMDeFSTn54n2r
NIw356/nNLOeX+nRJP7vQZrKeU4lHrSSc8Kma1tuekgdV6mxbZbueJWrOzDAE7XM
0CAMwwF2tn+jjR0Q9LmtonjYggGzQuOF0djLyL6HFTbp3VD7I6J8HXtmD5K9vA+I
zspx7kHT6zkF/6XzTkDImkXSAJTlzxgeFKVYhHfSzd4oKviYrCTf92t3XZnO55Ug
settBaGeMjzi9uYaHjdeiOBJnZXbL1Yxvy4zDcK91Gwhsp4f3Q45Qc/90WtUOxIu
R91xrrEH4cmQIbPMH1UvuQ710wRfypv+AuzeYgufj1HAihNV2cs90m34y1uFzD6y
Ozgdu+x46ac1oC+iOrMygzli+OCqLKHrqBkDLI2Zd1pzBgJWXAuT/oxarOkv5p6P
Ot8yxTZ72qJxWA==
=BArU
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"These are pretty late, but they do fix concrete issues.
- ensure the trap vector's address is aligned.
- avoid re-populating the KASAN shadow memory.
- allow kasan to build without warnings, which have recently become
errors"
* tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix asan-stack clang build
riscv: Do not re-populate shadow memory with kasan_populate_early_shadow
riscv: fix misalgned trap vector base address
The Host Performance Buffer feature allows UFS read commands to carry the
physical media addresses along with the LBAs, thus allowing less internal
L2P-table switches in the device. HPB1.0 allowed a single LBA, while
HPB2.0 increases this capacity up to 255 blocks.
Carrying more than a single record, the read operation is no longer purely
of type "read" but a "hybrid" command: Writing the physical address to the
device in one operation and reading back the required payload in another.
The JEDEC HPB spec defines two commands for this operation:
HPB-WRITE-BUFFER (0x2) to write the physical addresses to device, and
HPB-READ to read the payload.
With the current HPB design the UFS driver has no alternative but to divide
the READ request into 2 separate commands: HPB-WRITE-BUFFER and HPB-READ.
This causes a great deal of aggravation to the block layer guys who
demanded that we completely revert the entire HPB driver regardless of the
huge amount of corporate effort already invested in it.
As a compromise, remove only the pieces that implement the 2.0
specification. This is done as a matter of urgency for the final 5.15
release.
Link: https://lore.kernel.org/r/20211030062301.248-1-avri.altman@wdc.com
Tested-by: Avri Altman <avri.altman@wdc.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Three commits fixing some issues introduced with the recent IOMMU changes we merged.
Thanks to: Alexey Kardashevskiy
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmF8e20THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG4GEAC6SAo82kdWOR5MEwZ+U9ht91KqzK/O
UYm5WLapGaWzKtS5zMZUzc3dTXFsbeSdPghfGp2eLpnkQkfDnltyGA4ERVunEQUg
soGXt0OXBfmt++D+yogMkURr9tYzZ+ssrdCEC6Vmuv35Tf/dsHA9upWCVVw4UOOv
w/RRR1uku6kup7NCX6TJZoUQSeAXISDhwk3LXF5jh/hqSyV3BV9yulHYs3J4WtFQ
zr2dsxXL5DvgM3cOmLBZbnmTYSmU1f0jtGhqEf/6Ar3ljLHZgIvLLWmZK1UyyMwv
4P7yFIBVObW0sBqGz/4K9p73l8MnVHORrBUe73OavZgrxaM2u0k/7JRj72txcbMw
KXNKvDoA9nambWV98OXt4HE2bInsporn4DSDSAwJZmuTcqltbFOCf47t0kpe66fB
ZC2IJCOfrol5ELZvmdAsuVWEucHkdPMPqz0ZB46E/givQf45RmfYnjvVKm8HTQls
aOivKNuP2VLXhHdeocly1adaDMaeMEouYw1p00VKhOW4GbfHCZs4IczDtqacHRMR
NB+C0awi6DDQ7WXCEYURte9iURn8owjJhMuIsQj2/SUHVfuQEeJL1llSzo9sa/f2
ONT0esDhKSuq74T1nmimUMOvAyBg/TKuwY87TJ9BKcJzqSj+oPuuw28d8HBi61qk
S9rG2jHto84HVw==
=R9Yq
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Three commits fixing some issues introduced with the recent IOMMU
changes we merged.
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
powerpc/pseries/iommu: Check if the default window in use before removing it
powerpc/pseries/iommu: Use correct vfree for it_map
- fix the return value check when parsing the ngpios property in gpio-xgs-iproc
- check the return value of bgpio_init() in gpio-mlxbf2
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmF8XHAACgkQEacuoBRx
13ImBg/+NT/XE6gU+vuXNrRhNRqYI1s3HRn/mviCXMnjS2OJGD5Pd6KOt4gWX+Sm
z8/+/4W9hwLRmd4oFF+4+Q073YHmNNBXYdQ/JZM37KDF88hHDcX675pBX19qJjKT
xWgAzF3LMDHgTtXUrvsf36e2tezkZcnN6Rda2xzEi+j9Tq+ZrePfD7V44YlY3ka6
Vu/+NODR2umYc6rcrvIb2hHqUBIQUDCmIkIAGFK4CwVdC+VxtWya3cB4L8JqtF4l
KMPIC9mO2jyAho0LufF/uBAlgNyeg/rfYpSYJo9pw882aKupxAMjI9OUT2CkwuKU
tCi4wsYZUas5CqULe7bNNoLT4kZNVbiauc/3pqew0P84T9Uus2I4CyxHSgU9sfLx
3767JdXwJPZPyJFVSkdvQTidVRVZEgRKTkV/mc8f3FBw6sAa3OGOQKoSeuhS8lii
BYdZ+9dulLHI3wlhfNSaY/mBdO0flLwsUZppOo5nDXm36Wz1jbbk1sApDFP6Ywsd
YHY5W5gyb84Qt7alQ2MbzhkFrr9dFKdfIGmLDK0dtPna8DM7Wy0GyLGiJKkUf/Kw
sP2SYGrNMP0s/pH1j4O3ULiVX1Fo0L7FgyycnWfUmxinE8Z/yzjbdAcxdO6SwYMy
iSxvuuT08wvnJ8tZih3+y3gnL+gAOhlu0xndfcXdSYDs2f31ZeQ=
=Sxrc
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix the return value check when parsing the ngpios property in
gpio-xgs-iproc
- check the return value of bgpio_init() in gpio-mlxbf2
* tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: mlxbf2.c: Add check for bgpio_init failure
gpio: xgs-iproc: fix parsing of ngpios property
Testing revealed a problem with how the reference tag was handled for
a WRITE_INSERT operation. The SCSI_PROT_REF_CHECK flag is not set when
the controller is asked to generate the protection information
(i.e. not DIX). And as a result the initial reference tag would not be
set in the WRITE_INSERT case.
Separate handling of the REF_CHECK and REF_INCREMENT flags to align
with both the DIX spec and the MPI implementation.
Link: https://lore.kernel.org/r/20211028034202.24225-1-martin.petersen@oracle.com
Fixes: b3e2c72af1 ("scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- tmio: Re-enable card irqs after a reset
- mtk-sd: Fixup probing of cqhci for crypto
- cqhci: Fix support for suspend/resume
- vub300: Fix control-message timeouts
- dw_mmc-exynos: Fix support for tuning
- winbond: Silences build errors on M68K
- sdhci-esdhc-imx: Fix support for tuning
- sdhci-pci: Read card detect from ACPI for Intel Merrifield
- sdhci: Fix eMMC support for Thundercomm TurboX CM2290
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAmF74zoXHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCmQ2Q//fF/GEc69vN2pP9lps/jN6pdT
Njd29UHonDf89krYYEoUd0MmwijndNC9HKIICM3XKRwzywweVDcgDGn0yJ/DPtma
vgZRsZG4KM1h6RPs8TbGS17aMO1ZGNvZZAiP4f6CQzQqJUGS2Iu8kTx+XeZmqkuF
IShAQB2X4CcThI7nZXPsaQVYFHgXY6L0MMxgZmvO36WDCHRij22xaEG5S4gmv825
l+P+z3KabAeIrpTU098EshvXp/TfmAxar9OSymxgQqDlD2t3eW1NwcexNOfa/aW0
yfQrMH/r37T6hDZoa7U27e7cA2Ma/y+A3c6J22siVEuiq/DQ3cUrRrHJvoYtz9uG
m98z9ZjosxJy3O09bKjUWgzYwVDywwZtWOmXgNNRQpm8Qa4R6+2ZRitst+H9iJ6L
m81Wx3iZK1I6Iak9a8palxOskHiD76IQ3V2xqbUP08O/DQSc+/460iuFqQ+yIybd
PvNp0PpeZ7iCGC4gzTjGpL+SXxgrqGIO+MRMPUaSOMAZuaiIoAmBn8qwWdUUxiwD
csmej1e64bDDctvHCg9PvZU2a8+TmQcdGxUTGB5pvf7mcJ5gSfN9AzKneqhWzw9J
eBJwRMjTqGrRkSBGzj7PUrEjISlkzb0rt0sa7bYmd/7nIh1XX+MP5v7I35162jC6
mPLTdFW+mxaNn6Nx2+E=
=SM2+
-----END PGP SIGNATURE-----
Merge tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- tmio: Re-enable card irqs after a reset
- mtk-sd: Fixup probing of cqhci for crypto
- cqhci: Fix support for suspend/resume
- vub300: Fix control-message timeouts
- dw_mmc-exynos: Fix support for tuning
- winbond: Silences build errors on M68K
- sdhci-esdhc-imx: Fix support for tuning
- sdhci-pci: Read card detect from ACPI for Intel Merrifield
- sdhci: Fix eMMC support for Thundercomm TurboX CM2290
* tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: tmio: reenable card irqs after the reset callback
mmc: mediatek: Move cqhci init behind ungate clock
mmc: cqhci: clear HALT state after CQE enable
mmc: vub300: fix control-message timeouts
mmc: dw_mmc: exynos: fix the finding clock sample value
mmc: winbond: don't build on M68K
mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning circuit
mmc: sdhci-pci: Read card detect from ACPI for Intel Merrifield
mmc: sdhci: Map more voltage level to SDHCI_POWER_330
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmF72q0ACgkQxWXV+ddt
WDvFOxAAkcryx2FP5aqaoMzBKfoCtMFHO3uAvm+rsMcglWe5kaXhBnHa2HPzoyEh
YqEx2TeXMTuA2I15bU8KV1RMhQzzRjC4NhdRqY6uaKAcKgON6sJlK5qsq2BnB+V3
nrue1jppM2Vv8wNzjMNeVETQNC7pmg29yQP/fvWaB36Yar2tyfyWDF11e42HR7cU
yLQUedg30WEayz3Mp6MTBF36h09WXQrZSs7Iwk1JMQbpxWcpn2CjXrO+vIZOMdvH
XZZsxBTNB8GJIaJlXssgsq3OP2wspK1lrVHNfi5PYtcZEaFrhkPaVB6enDfd41YV
zXwj1dnemCni9fh88gZprel9bLyB37dSVfIqq2Ly3hQbSAN4dmHIpxGwPSRIr+Hl
Bn3UfClHpAftbpd/Y77U7GgcYnkuRo3Bd4mGTF3ZuPDLVrf/QX5BlfGa2dmJYoml
NfBit7Ha4UrxLW6C8RC6fyEbLQxpNYFY55Ra0Tj0BBO/uhWiqtQGZwC/qbyPKfzN
YZFcPR6iTILoCHXNan3iZIuLeASMT0djgAtunXXf/BuFnxGfnOuqL3bKt2vojh3+
rsqpeIxSP/VklKv4JcP3axeLmUK6cA8/9dV2ES0M0Fc0o341jfh+AoVw0GleFeus
gXlDFPRJeE8yyXmjKyW4shctOczqoeMIq3umebXPP9R4jd/LU/g=
=YWGa
-----END PGP SIGNATURE-----
Merge tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Last minute fixes for crash on 32bit architectures when compression is
in use. It's a regression introduced in 5.15-rc and I'd really like
not let this into the final release, fixes via stable trees would add
unnecessary delay.
The problem is on 32bit architectures with highmem enabled, the pages
for compression may need to be kmapped, while the patches removed that
as we don't use GFP_HIGHMEM allocations anymore. The pages that don't
come from local allocation still may be from highmem. Despite being on
32bit there's enough such ARM machines in use so it's not a marginal
issue.
I did full reverts of the patches one by one instead of a huge one.
There's one exception for the "lzo" revert as there was an
intermediate patch touching the same code to make it compatible with
subpage. I can't revert that one too, so the revert in lzo.c is
manual. Qu Wenruo has worked on that with me and verified the changes"
* tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Revert "btrfs: compression: drop kmap/kunmap from lzo"
Revert "btrfs: compression: drop kmap/kunmap from zlib"
Revert "btrfs: compression: drop kmap/kunmap from zstd"
Revert "btrfs: compression: drop kmap/kunmap from generic helpers"
- Some bots have informed me that some of the ftrace functions kernel-doc
has formatting issues.
- Also, fix my snake instinct.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYXwAqxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qpl+AQDn8QTG2RN9+jWVTNkCcWNNQDKPi5Ij
QYquG/clrCVnwgEAtxUoC+ZAcnsadBm5gXwjzCi62aWfAb9apkTHb6fZAgc=
=H3GU
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing comment fixes from Steven Rostedt:
- Some bots have informed me that some of the ftrace functions
kernel-doc has formatting issues.
- Also, fix my snake instinct.
* tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix misspelling of "missing"
ftrace: Fix kernel-doc formatting issues
Pull crypto fix from Herbert Xu:
"Fix a build-time warning in x86/sm4"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/sm4 - Fix invalid section entry size
Merge misc fixes from Andrew Morton:
"11 patches.
Subsystems affected by this patch series: mm (memcg, memory-failure,
oom-kill, secretmem, vmalloc, hugetlb, damon, and tools), and ocfs2"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer
mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'
mm: khugepaged: skip huge page collapse for special files
mm, thp: bail out early in collapse_file for writeback page
mm/vmalloc: fix numa spreading for large hash tables
mm/secretmem: avoid letting secretmem_users drop to zero
ocfs2: fix race between searching chunks and release journal_head from buffer_head
mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap
mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
mm: hwpoison: remove the unnecessary THP check
memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
When calling this function, all the shadow memory is already populated
with kasan_early_shadow_pte which has PAGE_KERNEL protection.
kasan_populate_early_shadow write-protects the mapping of the range
of addresses passed in argument in zero_pte_populate, which actually
write-protects all the shadow memory mapping since kasan_early_shadow_pte
is used for all the shadow memory at this point. And then when using
memblock API to populate the shadow memory, the first write access to the
kernel stack triggers a trap. This becomes visible with the next commit
that contains a fix for asan-stack.
We already manually populate all the shadow memory in kasan_early_init
and we write-protect kasan_early_shadow_pte at the end of kasan_init
which makes the calls to kasan_populate_early_shadow superfluous so
we can remove them.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: e178d670f2 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.
Also, the "ftrace_is_dead()" had an incomplete description.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>