WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Mitchell Levy	4fa7bc1bbd	Merge fix/xsaves-lbr/5.15 into v5.15 * commit '46b414261e8193c1118924e0c62b773ad1747aff': (1884 commits) x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported Linux 5.15.167 udp: fix receiving fraglist GSO packets memcg: protect concurrent access to mem_cgroup_idr btrfs: fix race between direct IO write and fsync when using same fd net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket x86/mm: Fix PTI for i386 some more net: drop bad gso csum_start and offset in virtio_net_hdr gso: fix dodgy bit handling for GSO_UDP_L4 net: change maximum number of UDP segments to 128 net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation gpio: rockchip: fix OF node leak in probe() drm/i915/fence: Mark debug_fence_free() with __maybe_unused drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused ASoC: sunxi: sun4i-i2s: fix LRCLK polarity in i2s mode nvmet-tcp: fix kernel crash if commands allocation fails arm64: acpi: Harden get_cpu_for_acpi_id() against missing CPU entry arm64: acpi: Move get_cpu_for_acpi_id() to a header ACPI: processor: Fix memory leaks in error paths of processor_add() ACPI: processor: Return an error if acpi_processor_get_info() fails in processor_add() ...	2024-10-10 15:55:58 -07:00
Mitchell Levy	e704bade90	Merge feature/memory-reclaim/5.15 into v5.15 * commit 'fed46d1f99d22a5a9efd06da0bf5baf6a04045d8': selftests: cgroup: add a selftest for memory.reclaim selftests: cgroup: fix unsigned comparison with less than zero selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory selftests: cgroup: return -errno from cg_read()/cg_write() on failure memcg: introduce per-memcg reclaim interface	2024-10-10 15:55:57 -07:00
Yuri Benditovich	5c3e0ed810	net: change maximum number of UDP segments to 128 [ Upstream commit 1382e3b6a3500c245e5278c66d210c02926f804f ] The commit `fc8b2a6194` ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") adds check of potential number of UDP segments vs UDP_MAX_SEGMENTS in linux/virtio_net.h. After this change certification test of USO guest-to-guest transmit on Windows driver for virtio-net device fails, for example with packet size of ~64K and mss of 536 bytes. In general the USO should not be more restrictive than TSO. Indeed, in case of unreasonably small mss a lot of segments can cause queue overflow and packet loss on the destination. Limit of 128 segments is good for any practical purpose, with minimal meaningful mss of 536 the maximal UDP packet will be divided to ~120 segments. The number of segments for UDP packets is validated vs UDP_MAX_SEGMENTS also in udp.c (v4,v6), this does not affect quest-to-guest path but does affect packets sent to host, for example. It is important to mention that UDP_MAX_SEGMENTS is kernel-only define and not available to user mode socket applications. In order to request MSS smaller than MTU the applications just uses setsockopt with SOL_UDP and UDP_SEGMENT and there is no limitations on socket API level. Fixes: `fc8b2a6194` ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [5.15-stable: fix conflict with neighboring but unrelated code from e2a4392b61f6 ("udp: introduce udp->udp_flags") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-09-12 11:07:53 +02:00
Zenghui Yu	18e65173fe	kselftests: dmabuf-heaps: Ensure the driver name is null-terminated [ Upstream commit 291e4baf70019f17a81b7b47aeb186b27d222159 ] Even if a vgem device is configured in, we will skip the import_vgem_fd() test almost every time. TAP version 13 1..11 # Testing heap: system # ======================================= # Testing allocation and importing: ok 1 # SKIP Could not open vgem -1 The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver version information but leave the name field a non-null-terminated string. Terminate it properly to actually test against the vgem device. While at it, let's check the length of the driver name is exactly 4 bytes and return early otherwise (in case there is a name like "vgemfoo" that gets converted to "vgem\0" unexpectedly). Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240729024604.2046-1-yuzenghui@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-09-12 11:07:49 +02:00
Andreas Ziegler	bc152bd3c9	libbpf: Add NULL checks to bpf_object__{prev_map,next_map} [ Upstream commit cedc12c5b57f7efa6dbebfb2b140e8675f5a2616 ] In the current state, an erroneous call to bpf_object__find_map_by_name(NULL, ...) leads to a segmentation fault through the following call chain: bpf_object__find_map_by_name(obj = NULL, ...) -> bpf_object__for_each_map(pos, obj = NULL) -> bpf_object__next_map((obj = NULL), NULL) -> return (obj = NULL)->maps While calling bpf_object__find_map_by_name with obj = NULL is obviously incorrect, this should not lead to a segmentation fault but rather be handled gracefully. As __bpf_map__iter already handles this situation correctly, we can delegate the check for the regular case there and only add a check in case the prev or next parameter is NULL. Signed-off-by: Andreas Ziegler <ziegler.andreas@siemens.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240703083436.505124-1-ziegler.andreas@siemens.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-09-12 11:07:48 +02:00
Alexander Lobakin	ce0aa899c9	tools: move alignment-related macros to new <linux/align.h> commit 10a04ff09bcc39e0044190ffe9f00f998f13647c upstream. Currently, tools have ALIGN() macros scattered across the unrelated headers, as there are only 3 of them and they were added separately each time on an as-needed basis. Anyway, let's make it more consistent with the kernel headers and allow using those macros outside of the mentioned headers. Create <linux/align.h> inside the tools/ folder and include it where needed. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-09-04 13:23:37 +02:00
Simon Horman	38188b4d6d	tc-testing: don't access non-existent variable on exception [ Upstream commit a0c9fe5eecc97680323ee83780ea3eaf440ba1b7 ] Since commit `255c1c7279` ("tc-testing: Allow test cases to be skipped") the variable test_ordinal doesn't exist in call_pre_case(). So it should not be accessed when an exception occurs. This resolves the following splat: ... During handling of the above exception, another exception occurred: Traceback (most recent call last): File ".../tdc.py", line 1028, in <module> main() File ".../tdc.py", line 1022, in main set_operation_mode(pm, parser, args, remaining) File ".../tdc.py", line 966, in set_operation_mode catresults = test_runner_serial(pm, args, alltests) File ".../tdc.py", line 642, in test_runner_serial (index, tsr) = test_runner(pm, args, alltests) File ".../tdc.py", line 536, in test_runner res = run_one_test(pm, args, index, tidx) File ".../tdc.py", line 419, in run_one_test pm.call_pre_case(tidx) File ".../tdc.py", line 146, in call_pre_case print('test_ordinal is {}'.format(test_ordinal)) NameError: name 'test_ordinal' is not defined Fixes: `255c1c7279` ("tc-testing: Allow test cases to be skipped") Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20240815-tdc-test-ordinal-v1-1-0255c122a427@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-09-04 13:23:32 +02:00
Al Viro	5053581fe5	fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE commit 9a2fa1472083580b6c66bdaf291f591e1170123a upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, if all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-09-04 13:23:17 +02:00
Alexander Lobakin	0f00902172	bitmap: introduce generic optimized bitmap_size() commit a37fbe666c016fd89e4460d0ebfcea05baba46dc upstream. The number of times yet another open coded `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge. Some generic helper is long overdue. Add one, bitmap_size(), but with one detail. BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both divident and divisor are compile-time constants or when the divisor is not a pow-of-2. When it is however, the compilers sometimes tend to generate suboptimal code (GCC 13): 48 83 c0 3f add $0x3f,%rax 48 c1 e8 06 shr $0x6,%rax 48 8d 14 c5 00 00 00 00 lea 0x0(,%rax,8),%rdx %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does full division of `nbits + 63` by it and then multiplication by 8. Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC: 8d 50 3f lea 0x3f(%rax),%edx c1 ea 03 shr $0x3,%edx 81 e2 f8 ff ff 1f and $0x1ffffff8,%edx Now it shifts `nbits + 63` by 3 positions (IOW performs fast division by 8) and then masks bits[2:0]. bloat-o-meter: add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617) Clang does it better and generates the same code before/after starting from -O1, except that with the ALIGN() approach it uses %edx and thus still saves some bytes: add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520) Note that we can't expand DIV_ROUND_UP() by adding a check and using this approach there, as it's used in array declarations where expressions are not allowed. Add this helper to tools/ as well. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-09-04 13:23:16 +02:00
Matthieu Baerts (NGI0)	9d1f4ecc31	selftests: mptcp: join: check backup support in signal endp commit f833470c27832136d4416d8fc55d658082af0989 upstream. Before the previous commit, 'signal' endpoints with the 'backup' flag were ignored when sending the MP_JOIN. The MPTCP Join selftest has then been modified to validate this case: the "single address, backup" test, is now validating the MP_JOIN with a backup flag as it is what we expect it to do with such name. The previous version has been kept, but renamed to "single address, switch to backup" to avoid confusions. The "single address with port, backup" test is also now validating the MPJ with a backup flag, which makes more sense than checking the switch to backup with an MP_PRIO. The "mpc backup both sides" test is now validating that the backup flag is also set in MP_JOIN from and to the addresses used in the initial subflow, using the special ID 0. The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: `4596a2c1b7` ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> [ Conflicts in mptcp_join.sh because 'run_tests' helper has been modified in multiple commits that are not in this version, e.g. commit `e571fb09c8` ("selftests: mptcp: add speed env var") and commit `ae7bd9ccec` ("selftests: mptcp: join: option to execute specific tests"). Adaptations have been made to use the old way, similar to what is done around. Also in this version, there is no "single address with port, backup" subtest. Same for "mpc backup both sides". ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-19 05:45:49 +02:00
Matthieu Baerts (NGI0)	34558a433f	selftests: mptcp: join: validate backup in MPJ commit 935ff5bb8a1cfcdf8e60c8f5c794d0bbbc234437 upstream. A peer can notify the other one that a subflow has to be treated as "backup" by two different ways: either by sending a dedicated MP_PRIO notification, or by setting the backup flag in the MP_JOIN handshake. The selftests were previously monitoring the former, but not the latter. This is what is now done here by looking at these new MIB counters when validating the 'backup' cases: MPTcpExtMPJoinSynBackupRx MPTcpExtMPJoinSynAckBackupRx The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it will help to validate a new fix for an issue introduced by this commit ID. Fixes: `4596a2c1b7` ("mptcp: allow creating non-backup subflows") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> [ Conflicts in mptcp_join.sh because the check are done has changed, e.g. in commit `03668c65d1` ("selftests: mptcp: join: rework detailed report"), or commit `985de45923` ("selftests: mptcp: centralize stats dumping"), etc. Adaptations have been made to use the old way, similar to what is done just above. Also, in this version, some subtests are missing. Only the two using chk_prio_nr() have been modified. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-19 05:45:49 +02:00
Feng Tang	fcd4f3a9d9	clocksource: Scale the watchdog read retries automatically [ Upstream commit 2ed08e4bc53298db3f87b528cd804cb0cce066a9 ] On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled during boot time on about one out of 120 boot attempts: clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable tsc: Marking TSC unstable due to clocksource watchdog TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. clocksource: Switched to clocksource hpet The reason is that for platform with a large number of CPUs, there are sporadic big or huge read latencies while reading the watchog/clocksource during boot or when system is under stress work load, and the frequency and maximum value of the latency goes up with the number of online CPUs. The cCurrent code already has logic to detect and filter such high latency case by reading the watchdog twice and checking the two deltas. Due to the randomness of the latency, there is a low probabilty that the first delta (latency) is big, but the second delta is small and looks valid. The watchdog code retries the readouts by default twice, which is not necessarily sufficient for systems with a large number of CPUs. There is a command line parameter 'max_cswd_read_retries' which allows to increase the number of retries, but that's not user friendly as it needs to be tweaked per system. As the number of required retries is proportional to the number of online CPUs, this parameter can be calculated at runtime. Scale and enlarge the number of retries according to the number of online CPUs and remove the command line parameter completely. [ tglx: Massaged change log and comments ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jin Wang <jin1.wang@intel.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com Stable-dep-of: f2655ac2c06a ("clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:45 +02:00
Paul E. McKenney	d607bbc7f0	torture: Enable clocksource watchdog with "tsc=watchdog" [ Upstream commit `877a0e83c5` ] This commit tests the "tsc=watchdog" kernel boot parameter when running the clocksourcewd torture tests. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Stable-dep-of: f2655ac2c06a ("clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:45 +02:00
Yonghong Song	6f8dc63f8e	selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT [ Upstream commit 7015843afcaf68c132784c89528dfddc0005e483 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:37 +02:00
Andrii Nakryiko	ca667c3c90	libbpf: Fix no-args func prototype BTF dumping syntax [ Upstream commit 189f1a976e426011e6a5588f1d3ceedf71fe2965 ] For all these years libbpf's BTF dumper has been emitting not strictly valid syntax for function prototypes that have no input arguments. Instead of `int (blah)()` we should emit `int (blah)(void)`. This is not normally a problem, but it manifests when we get kfuncs in vmlinux.h that have no input arguments. Due to compiler internal specifics, we get no BTF information for such kfuncs, if they are not declared with proper `(void)`. The fix is trivial. We also need to adjust a few ancient tests that happily assumed `()` is correct. Fixes: `351131b51c` ("libbpf: add btf_dump API for BTF-to-C conversion") Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/bpf/20240712224442.282823-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:23 +02:00
Michael Ellerman	22cc7f013a	selftests/sigaltstack: Fix ppc64 GCC build commit 17c743b9da9e0d073ff19fd5313f521744514939 upstream. Building the sigaltstack test with GCC on 64-bit powerpc errors with: gcc -Wall sas.c -o /home/michael/linux/.build/kselftest/sigaltstack/sas In file included from sas.c:23: current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent" 22 \| #error "implement current_stack_pointer equivalent" \| ^~~~~ sas.c: In function ‘my_usr1’: sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’? 50 \| if (sp < (unsigned long)sstack \|\| \| ^~ This happens because GCC doesn't define __ppc__ for 64-bit builds, only 32-bit builds. Instead use __powerpc__ to detect powerpc builds, which is defined by clang and GCC for 64-bit and 32-bit builds. Fixes: `05107edc91` ("selftests: sigaltstack: fix -Wuninitialized") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240520062647.688667-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-19 05:45:20 +02:00
Alan Stern	f5c99f224e	tools/memory-model: Fix bug in lock.cat commit 4c830eef806679dc243e191f962c488dd9d00708 upstream. Andrea reported that the following innocuous litmus test: C T {} P0(spinlock_t *x) { int r0; spin_lock(x); spin_unlock(x); r0 = spin_is_locked(x); } gives rise to a nonsensical empty result with no executions: $ herd7 -conf linux-kernel.cfg T.litmus Test T Required States 0 Ok Witnesses Positive: 0 Negative: 0 Condition forall (true) Observation T Never 0 0 Time T 0.00 Hash=6fa204e139ddddf2cb6fa963bad117c0 The problem is caused by a bug in the lock.cat part of the LKMM. Its computation of the rf relation for RU (read-unlocked) events is faulty; it implicitly assumes that every RU event must read from either a UL (unlock) event in another thread or from the lock's initial state. Neither is true in the litmus test above, so the computation yields no possible executions. The lock.cat code tries to make up for this deficiency by allowing RU events outside of critical sections to read from the last po-previous UL event. But it does this incorrectly, trying to keep these rfi links separate from the rfe links that might also be needed, and passing only the latter to herd7's cross() macro. The problem is fixed by merging the two sets of possible rf links for RU events and using them all in the call to cross(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Andrea Parri <parri.andrea@gmail.com> Closes: https://lore.kernel.org/linux-arch/ZlC0IkzpQdeGj+a3@andrea/ Tested-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Andrea Parri <parri.andrea@gmail.com> Fixes: `15553dcbca` ("tools/memory-model: Add model support for spin_is_locked()") CC: <stable@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-19 05:45:15 +02:00
Mickaël Salaün	fc2ea3b5f7	selftests/landlock: Add cred_transfer test commit cc374782b6ca0fd634482391da977542443d3368 upstream. Check that keyctl(KEYCTL_SESSION_TO_PARENT) preserves the parent's restrictions. Fixes: `e1199815b4` ("selftests/landlock: Add user space tests") Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240724.Ood5aige9she@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-08-19 05:45:14 +02:00
Adrian Hunter	d856cb53b6	perf intel-pt: Fix exclude_guest setting [ Upstream commit b40934ae32232140e85dc7dc1c3ea0e296986723 ] In the past, the exclude_guest setting has had no effect on Intel PT tracing, but that may not be the case in the future. Set the flag correctly based upon whether KVM is using Intel PT "Host/Guest" mode, which is determined by the kvm_intel module parameter pt_mode: pt_mode=0 System-wide mode : host and guest output to host buffer pt_mode=1 Host/Guest mode : host/guest output to host/guest buffers respectively Fixes: `6e86bfdc4a` ("perf intel-pt: Support decoding of guest kernel") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240625104532.11990-3-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:02 +02:00
Adrian Hunter	d5f39d2b82	perf intel-pt: Fix aux_watermark calculation for 64-bit size [ Upstream commit 36b4cd990a8fd3f5b748883050e9d8c69fe6398d ] aux_watermark is a u32. For a 64-bit size, cap the aux_watermark calculation at UINT_MAX instead of truncating it to 32-bits. Fixes: `874fc35cdd` ("perf intel-pt: Use aux_watermark") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240625104532.11990-2-adrian.hunter@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:02 +02:00
Namhyung Kim	2f74f09fcc	perf report: Fix condition in sort__sym_cmp() [ Upstream commit cb39d05e67dc24985ff9f5150e71040fa4d60ab8 ] It's expected that both hist entries are in the same hists when comparing two. But the current code in the function checks one without dso sort key and other with the key. This would make the condition true in any case. I guess the intention of the original commit was to add '!' for the right side too. But as it should be the same, let's just remove it. Fixes: `69849fc5d2` ("perf hists: Move sort__has_dso into struct perf_hpp_list") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240621170528.608772-2-namhyung@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:45:01 +02:00
Amit Cohen	28dfdb7d67	selftests: forwarding: devlink_lib: Wait for udev events after reloading [ Upstream commit f67a90a0c8f5b3d0acc18f10650d90fec44775f9 ] Lately, an additional locking was added by commit c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()"). The locking protects dev_uevent() calling. This function is used to send messages from the kernel to user space. Uevent messages notify user space about changes in device states, such as when a device is added, removed, or changed. These messages are used by udev (or other similar user-space tools) to apply device-specific rules. After reloading devlink instance, udev events should be processed. This locking causes a short delay of udev events handling. One example for useful udev rule is renaming ports. 'forwading.config' can be configured to use names after udev rules are applied. Some tests run devlink_reload() and immediately use the updated names. This worked before the above mentioned commit was pushed, but now the delay of uevent messages causes that devlink_reload() returns before udev events are handled and tests fail. Adjust devlink_reload() to not assume that udev events are already processed when devlink reload is done, instead, wait for udev events to ensure they are processed before returning from the function. Without this patch: TESTS='rif_mac_profile' ./resource_scale.sh TEST: 'rif_mac_profile' 4 [ OK ] sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory Cannot find device "swp1" Cannot find device "swp2" TEST: setup_wait_dev (: Interface swp1 does not come up.) [FAIL] With this patch: $ TESTS='rif_mac_profile' ./resource_scale.sh TEST: 'rif_mac_profile' 4 [ OK ] TEST: 'rif_mac_profile' overflow 5 [ OK ] This is relevant not only for this test. Fixes: `bc7cbb1e9f` ("selftests: forwarding: Add devlink_lib.sh") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/89367666e04b38a8993027f1526801ca327ab96a.1720709333.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:44:59 +02:00
Geliang Tang	a0737beff6	selftests/bpf: Close fd in error path in drop_on_reuseport [ Upstream commit adae187ebedcd95d02f045bc37dfecfd5b29434b ] In the error path when update_lookup_map() fails in drop_on_reuseport in prog_tests/sk_lookup.c, "server1", the fd of server 1, should be closed. This patch fixes this by using "goto close_srv1" lable instead of "detach" to close "server1" in this case. Fixes: `0ab5539f85` ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point") Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/86aed33b4b0ea3f04497c757845cff7e8e621a2d.1720515893.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:44:59 +02:00
Donglin Peng	fb274d9c68	libbpf: Checking the btf_type kind when fixing variable offsets [ Upstream commit cc5083d1f3881624ad2de1f3cbb3a07e152cb254 ] I encountered an issue when building the test_progs from the repository [1]: $ pwd /work/Qemu/x86_64/linux-6.10-rc2/tools/testing/selftests/bpf/ $ make test_progs V=1 [...] ./tools/sbin/bpftool gen object ./ip_check_defrag.bpf.linked2.o ./ip_check_defrag.bpf.linked1.o libbpf: failed to find symbol for variable 'bpf_dynptr_slice' in section '.ksyms' Error: failed to link './ip_check_defrag.bpf.linked1.o': No such file or directory (2) [...] Upon investigation, I discovered that the btf_types referenced in the '.ksyms' section had a kind of BTF_KIND_FUNC instead of BTF_KIND_VAR: $ bpftool btf dump file ./ip_check_defrag.bpf.linked1.o [...] [2] DATASEC '.ksyms' size=0 vlen=2 type_id=16 offset=0 size=0 (FUNC 'bpf_dynptr_from_skb') type_id=17 offset=0 size=0 (FUNC 'bpf_dynptr_slice') [...] [16] FUNC 'bpf_dynptr_from_skb' type_id=82 linkage=extern [17] FUNC 'bpf_dynptr_slice' type_id=85 linkage=extern [...] For a detailed analysis, please refer to [2]. We can add a kind checking to fix the issue. [1] https://github.com/eddyz87/bpf/tree/binsort-btf-dedup [2] https://lore.kernel.org/all/0c0ef20c-c05e-4db9-bad7-2cbc0d6dfae7@oracle.com/ Fixes: `8fd27bf69b` ("libbpf: Add BPF static linker BTF and BTF.ext support") Signed-off-by: Donglin Peng <dolinux.peng@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240619122355.426405-1-dolinux.peng@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:44:57 +02:00
Ido Schimmel	d794f62614	mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors [ Upstream commit 75d8d7a63065b18df9555dbaab0b42d4c6f20943 ] ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and newer ASICs can share the same mask if their masks only differ in up to 8 consecutive bits. For example, consider the following filters: # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop The second filter can use the same mask as the first (dst_ip/24) with a delta of 1 bit. However, the above only works because the two filters have different values in the common unmasked part (dst_ip/24). When entries have the same value in the common unmasked part they create undesired collisions in the device since many entries now have the same key. This leads to firmware errors such as [1] and to a reduced scale. Fix by adjusting the hash table key to only include the value in the common unmasked part. That is, without including the delta bits. That way the driver will detect the collision during filter insertion and spill the filter into the circuit TCAM (C-TCAM). Add a test case that fails without the fix and adjust existing cases that check C-TCAM spillage according to the above limitation. [1] mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available)) Fixes: `c22291f7cf` ("mlxsw: spectrum: acl: Implement delta for ERP") Reported-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Tested-by: Alexander Zubkov <green@qrator.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:44:57 +02:00
Geliang Tang	4f44cb495c	selftests/bpf: Check length of recv in test_sockmap [ Upstream commit de1b5ea789dc28066cc8dc634b6825bd6148f38b ] The value of recv in msg_loop may be negative, like EWOULDBLOCK, so it's necessary to check if it is positive before accumulating it to bytes_recvd. Fixes: `16962b2404` ("bpf: sockmap, add selftests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/5172563f7c7b2a2e953cef02e89fc34664a7b190.1716446893.git.tanggeliang@kylinos.cn Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:44:56 +02:00
Geliang Tang	d0c8fb1b55	selftests/bpf: Fix prog numbers in test_sockmap [ Upstream commit 6c8d7598dfed759bf1d9d0322b4c2b42eb7252d8 ] bpf_prog5 and bpf_prog7 are removed from progs/test_sockmap_kern.h in commit `d79a32129b` ("bpf: Selftests, remove prints from sockmap tests"), now there are only 9 progs in it, not 11: SEC("sk_skb1") int bpf_prog1(struct __sk_buff skb) SEC("sk_skb2") int bpf_prog2(struct __sk_buff skb) SEC("sk_skb3") int bpf_prog3(struct __sk_buff skb) SEC("sockops") int bpf_sockmap(struct bpf_sock_ops skops) SEC("sk_msg1") int bpf_prog4(struct sk_msg_md msg) SEC("sk_msg2") int bpf_prog6(struct sk_msg_md msg) SEC("sk_msg3") int bpf_prog8(struct sk_msg_md msg) SEC("sk_msg4") int bpf_prog9(struct sk_msg_md msg) SEC("sk_msg5") int bpf_prog10(struct sk_msg_md *msg) This patch updates the array sizes of prog_fd[], prog_attach_type[] and prog_type[] from 11 to 9 accordingly. Fixes: `d79a32129b` ("bpf: Selftests, remove prints from sockmap tests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/9c10d9f974f07fcb354a43a8eca67acb2fafc587.1715926605.git.tanggeliang@kylinos.cn Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-08-19 05:44:55 +02:00
John Hubbard	119aa28dc2	selftests/vDSO: fix clang build errors and warnings [ Upstream commit 73810cd45b99c6c418e1c6a487b52c1e74edb20d ] When building with clang, via: make LLVM=1 -C tools/testing/selftests ...there are several warnings, and an error. This fixes all of those and allows these tests to run and pass. 1. Fix linker error (undefined reference to memcpy) by providing a local version of memcpy. 2. clang complains about using this form: if (g = h & 0xf0000000) ...so factor out the assignment into a separate step. 3. The code is passing a signed const char* to elf_hash(), which expects a const unsigned char *. There are several callers, so fix this at the source by allowing the function to accept a signed argument, and then converting to unsigned operations, once inside the function. 4. clang doesn't have __attribute__((externally_visible)) and generates a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem to require that attribute in order to build, run and pass tests here, so remove it. Reviewed-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Edward Liaw <edliaw@google.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-27 10:46:14 +02:00
Michael Ellerman	07fb3ed9f8	selftests/openat2: Fix build warnings on ppc64 [ Upstream commit 84b6df4c49a1cc2854a16937acd5fd3e6315d083 ] Fix warnings like: openat2_test.c: In function ‘test_openat2_flags’: openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] By switching to unsigned long long for u64 for ppc64 builds. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-27 10:46:08 +02:00
Dhananjay Ugwekar	a3c944359f	tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs [ Upstream commit 43cad521c6d228ea0c51e248f8e5b3a6295a2849 ] Update cpupower's P-State frequency calculation and reporting with AMD Family 1Ah+ processors, when using the acpi-cpufreq driver. This is due to a change in the PStateDef MSR layout in AMD Family 1Ah+. Tested on 4th and 5th Gen AMD EPYC system Signed-off-by: Ananth Narayan <Ananth.Narayan@amd.com> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-27 10:46:07 +02:00
Eduard Zingerman	e30bc19a9e	bpf: Allow reads from uninit stack commit `6715df8d5d` upstream. This commits updates the following functions to allow reads from uninitialized stack locations when env->allow_uninit_stack option is enabled: - check_stack_read_fixed_off() - check_stack_range_initialized(), called from: - check_stack_read_var_off() - check_helper_mem_access() Such change allows to relax logic in stacksafe() to treat STACK_MISC and STACK_INVALID in a same way and make the following stack slot configurations equivalent: \| Cached state \| Current state \| \| stack slot \| stack slot \| \|------------------+------------------\| \| STACK_INVALID or \| STACK_INVALID or \| \| STACK_MISC \| STACK_SPILL or \| \| \| STACK_MISC or \| \| \| STACK_ZERO or \| \| \| STACK_DYNPTR \| This leads to significant verification speed gains (see below). The idea was suggested by Andrii Nakryiko [1] and initial patch was created by Alexei Starovoitov [2]. Currently the env->allow_uninit_stack is allowed for programs loaded by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities. A number of test cases from verifier/.c were expecting uninitialized stack access to be an error. These test cases were updated to execute in unprivileged mode (thus preserving the tests). The test progs/test_global_func10.c expected "invalid indirect read from stack" error message because of the access to uninitialized memory region. This error is no longer possible in privileged mode. The test is updated to provoke an error "invalid indirect access to stack" because of access to invalid stack address (such error is not verified by progs/test_global_func.c series of tests). The following tests had to be removed because these can't be made unprivileged: - verifier/sock.c: - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value" BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode. - verifier/var_off.c: - "indirect variable-offset stack access, max_off+size > max_initialized" - "indirect variable-offset stack access, uninitialized" These tests verify that access to uninitialized stack values is detected when stack offset is not a constant. However, variable stack access is prohibited in unprivileged mode, thus these tests are no longer valid. * * * Here is veristat log comparing this patch with current master on a set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg and cilium BPF binaries (see [3]): $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log File Program States (A) States (B) States (DIFF) -------------------------- -------------------------- ---------- ---------- ---------------- bpf_host.o tail_handle_ipv6_from_host 349 244 -105 (-30.09%) bpf_host.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_lxc.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_sock.o cil_sock4_connect 70 48 -22 (-31.43%) bpf_sock.o cil_sock4_sendmsg 68 46 -22 (-32.35%) bpf_xdp.o tail_handle_nat_fwd_ipv4 1554 803 -751 (-48.33%) bpf_xdp.o tail_lb_ipv4 6457 2473 -3984 (-61.70%) bpf_xdp.o tail_lb_ipv6 7249 3908 -3341 (-46.09%) pyperf600_bpf_loop.bpf.o on_event 287 145 -142 (-49.48%) strobemeta.bpf.o on_event 15915 4772 -11143 (-70.02%) strobemeta_nounroll2.bpf.o on_event 17087 3820 -13267 (-77.64%) xdp_synproxy_kern.bpf.o syncookie_tc 21271 6635 -14636 (-68.81%) xdp_synproxy_kern.bpf.o syncookie_xdp 23122 6024 -17098 (-73.95%) -------------------------- -------------------------- ---------- ---------- ---------------- Note: I limited selection by states_pct<-30%. Inspection of differences in pyperf600_bpf_loop behavior shows that the following patch for the test removes almost all differences: - a/tools/testing/selftests/bpf/progs/pyperf.h + b/tools/testing/selftests/bpf/progs/pyperf.h @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args ctx) } if (event->pthread_match \|\| !pidData->use_tls) { - void frame_ptr; - FrameData frame; + void* frame_ptr = 0; + FrameData frame = {}; Symbol sym = {}; int cur_cpu = bpf_get_smp_processor_id(); W/o this patch the difference comes from the following pattern (for different variables): static bool get_frame_data(... FrameData frame ...) { ... bpf_probe_read_user(&frame->f_code, ...); if (!frame->f_code) return false; ... bpf_probe_read_user(&frame->co_name, ...); if (frame->co_name) ...; } int __on_event(struct bpf_raw_tracepoint_args ctx) { FrameData frame; ... get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback ... } SEC("raw_tracepoint/kfree_skb") int on_event(struct bpf_raw_tracepoint_args* ctx) { ... ret \|= __on_event(ctx); ret \|= __on_event(ctx); ... } With regards to value `frame->co_name` the following is important: - Because of the conditional `if (!frame->f_code)` each call to __on_event() produces two states, one with `frame->co_name` marked as STACK_MISC, another with it as is (and marked STACK_INVALID on a first call). - The call to bpf_probe_read_user() does not mark stack slots corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks these slots as BPF_MISC, this happens because of the following loop in the check_helper_call(): for (i = 0; i < meta.access_size; i++) { err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B, BPF_WRITE, -1, false); if (err) return err; } Note the size of the write, it is a one byte write for each byte touched by a helper. The BPF_B write does not lead to write marks for the target stack slot. - Which means that w/o this patch when second __on_event() call is verified `if (frame->co_name)` will propagate read marks first to a stack slot with STACK_MISC marks and second to a stack slot with STACK_INVALID marks and these states would be considered different. [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/ [3] git@github.com:anakryiko/cilium.git Suggested-by: Andrii Nakryiko <andrii@kernel.org> Co-developed-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-18 13:07:43 +02:00
Zijian Zhang	f48b0cd338	selftests: make order checking verbose in msg_zerocopy selftest [ Upstream commit 7d6d8f0c8b700c9493f2839abccb6d29028b4219 ] We find that when lock debugging is on, notifications may not come in order. Thus, we have order checking outputs managed by cfg_verbose, to avoid too many outputs in this case. Fixes: `07b65c5b31` ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-18 13:07:31 +02:00
Zijian Zhang	ab52b11416	selftests: fix OOM in msg_zerocopy selftest [ Upstream commit af2b7e5b741aaae9ffbba2c660def434e07aa241 ] In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg on a socket with MSG_ZEROCOPY flag, and it will recv the notifications until the socket is not writable. Typically, it will start the receiving process after around 30+ sendmsgs. However, as the introduction of commit `dfa2f04833` ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is always writable and does not get any chance to run recv notifications. The selftest always exits with OUT_OF_MEMORY because the memory used by opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a different value to trigger OOM on older kernels too. Thus, we introduce "cfg_notification_limit" to force sender to receive notifications after some number of sendmsgs. Fixes: `07b65c5b31` ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-18 13:07:31 +02:00
Len Brown	8786e47861	tools/power turbostat: Remember global max_die_id [ Upstream commit cda203388687aa075db6f8996c3c4549fa518ea8 ] This is necessary to gracefully handle sparse die_id's. no functional change Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-18 13:07:30 +02:00
Jose E. Marchesi	3364c2ed1c	bpf: Avoid uninitialized value in BPF_CORE_READ_BITFIELD [ Upstream commit 009367099eb61a4fc2af44d4eb06b6b4de7de6db ] [Changes from V1: - Use a default branch in the switch statement to initialize `val'.] GCC warns that `val' may be used uninitialized in the BPF_CRE_READ_BITFIELD macro, defined in bpf_core_read.h as: [...] unsigned long long val; \ [...] \ switch (__CORE_RELO(s, field, BYTE_SIZE)) { \ case 1: val = (const unsigned char )p; break; \ case 2: val = (const unsigned short )p; break; \ case 4: val = (const unsigned int )p; break; \ case 8: val = (const unsigned long long )p; break; \ } \ [...] val; \ } \ This patch adds a default entry in the switch statement that sets `val' to zero in order to avoid the warning, and random values to be used in case __builtin_preserve_field_info returns unexpected values for BPF_FIELD_BYTE_SIZE. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240508101313.16662-1-jose.marchesi@oracle.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-18 13:07:29 +02:00
Adrian Hunter	e24d9a5c73	perf script: Show also errors for --insn-trace option [ Upstream commit d4a98b45fbe6d06f4b79ed90d0bb05ced8674c23 ] The trace could be misleading if trace errors are not taken into account, so display them also by adding the itrace "e" option. Note --call-trace and --call-ret-trace already add the itrace "e" option. Fixes: `b585ebdb59` ("perf script: Add --insn-trace for instruction decoding") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:35 +02:00
Changbin Du	69c5f3ca16	perf: script: add raw\|disasm arguments to --insn-trace option [ Upstream commit 6750ba4b6442fa5ea4bf5c0e4b4ff8b0249ef71d ] Now '--insn-trace' accept a argument to specify the output format: - raw: display raw instructions. - disasm: display mnemonic instructions (if capstone is installed). $ sudo perf script --insn-trace=raw ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7 ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa $ sudo perf script --insn-trace=disasm ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-5-changbin.du@huawei.com Stable-dep-of: d4a98b45fbe6 ("perf script: Show also errors for --insn-trace option") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:35 +02:00
Kunwu Chan	a5bd59e048	kselftest: arm64: Add a null pointer check [ Upstream commit 80164282b3620a3cb73de6ffda5592743e448d0e ] There is a 'malloc' call, which can be unsuccessful. This patch will add the malloc failure checking to avoid possible null dereference and give more information about test fail reasons. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240423082102.2018886-1-chentao@kylinos.cn Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:26 +02:00
Yonghong Song	ec874fb27f	selftests/bpf: Fix flaky test btf_map_in_map/lookup_update [ Upstream commit 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5 ] Recently, I frequently hit the following test failure: [root@arch-fb-vm1 bpf]# ./test_progs -n 33/1 test_lookup_update:PASS:skel_open 0 nsec [...] test_lookup_update:PASS:sync_rcu 0 nsec test_lookup_update:FAIL:map1_leak inner_map1 leaked! #33/1 btf_map_in_map/lookup_update:FAIL #33 btf_map_in_map:FAIL In the test, after map is closed and then after two rcu grace periods, it is assumed that map_id is not available to user space. But the above assumption cannot be guaranteed. After zero or one or two rcu grace periods in different siturations, the actual freeing-map-work is put into a workqueue. Later on, when the work is dequeued, the map will be actually freed. See bpf_map_put() in kernel/bpf/syscall.c. By using workqueue, there is no ganrantee that map will be actually freed after a couple of rcu grace periods. This patch removed such map leak detection and then the test can pass consistently. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240322061353.632136-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:25 +02:00
Alessandro Carminati (Red Hat)	f4258833ff	selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh [ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ] In some systems, the netcat server can incur in delay to start listening. When this happens, the test can randomly fail in various points. This is an example error message: # ip gre none gso # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000 # test basic connectivity # Ncat: Connection refused. The issue stems from a race condition between the netcat client and server. The test author had addressed this problem by implementing a sleep, which I have removed in this patch. This patch introduces a function capable of sleeping for up to two seconds. However, it can terminate the waiting period early if the port is reported to be listening. Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:25 +02:00
YonglongLi	b4697a762d	mptcp: pm: update add_addr counters after connect commit 40eec1795cc27b076d49236649a29507c7ed8c2d upstream. The creation of new subflows can fail for different reasons. If no subflow have been created using the received ADD_ADDR, the related counters should not be updated, otherwise they will never be decremented for events related to this ID later on. For the moment, the number of accepted ADD_ADDR is only decremented upon the reception of a related RM_ADDR, and only if the remote address ID is currently being used by at least one subflow. In other words, if no subflow can be created with the received address, the counter will not be decremented. In this case, it is then important not to increment pm.add_addr_accepted counter, and not to modify pm.accept_addr bit. Note that this patch does not modify the behaviour in case of failures later on, e.g. if the MP Join is dropped or rejected. The "remove invalid addresses" MP Join subtest has been modified to validate this case. The broadcast IP address is added before the "valid" address that will be used to successfully create a subflow, and the limit is decreased by one: without this patch, it was not possible to create the last subflow, because: - the broadcast address would have been accepted even if it was not usable: the creation of a subflow to this address results in an error, - the limit of 2 accepted ADD_ADDR would have then been reached. Fixes: `01cacb00b3` ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in pm_netlink.c because commit `12a18341b5` ("mptcp: send ADD_ADDR echo before create subflows") is not present in this version, and it changes the context, but not the block that needs to be moved. Conflicts in the selftests, because many features modifying the whole file have been added later, e.g. commit `ae7bd9ccec` ("selftests: mptcp: join: option to execute specific tests"). The same modifications have been reported to the old code: simply moving one line, and changing the limits. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 09:14:23 +02:00
YonglongLi	9c2ed72112	mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID commit 6a09788c1a66e3d8b04b3b3e7618cc817bb60ae9 upstream. The RmAddr MIB counter is supposed to be incremented once when a valid RM_ADDR has been received. Before this patch, it could have been incremented as many times as the number of subflows connected to the linked address ID, so it could have been 0, 1 or more than 1. The "RmSubflow" is incremented after a local operation. In this case, it is normal to tied it with the number of subflows that have been actually removed. The "remove invalid addresses" MP Join subtest has been modified to validate this case. A broadcast IP address is now used instead: the client will not be able to create a subflow to this address. The consequence is that when receiving the RM_ADDR with the ID attached to this broadcast IP address, no subflow linked to this ID will be found. Fixes: `7a7e52e38a` ("mptcp: add RM_ADDR related mibs") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: YonglongLi <liyonglong@chinatelecom.cn> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in pm_netlink.c because the context has changed later in multiple commits linked to new features, e.g. commit `86e39e0448` ("mptcp: keep track of local endpoint still available for each msk"), commit `a88c9e4969` ("mptcp: do not block subflows creation on errors") and commit `3ad14f54bd` ("mptcp: more accurate MPC endpoint tracking"), but the independent lines that needed to be modified were still there. Conflicts in the selftests, because many features modifying the whole file have been added later, e.g. commit `ae7bd9ccec` ("selftests: mptcp: join: option to execute specific tests"). The same modifications have been reported to the old code: simply changing the IP address and add a new comment. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 09:14:23 +02:00
Matthias Goergens	f571c8ab18	hugetlb_encode.h: fix undefined behaviour (34 << 26) commit `710bb68c2e` upstream. Left-shifting past the size of your datatype is undefined behaviour in C. The literal 34 gets the type `int`, and that one is not big enough to be left shifted by 26 bits. An `unsigned` is long enough (on any machine that has at least 32 bits for their ints.) For uniformity, we mark all the literals as unsigned. But it's only really needed for HUGETLB_FLAG_ENCODE_16GB. Thanks to Randy Dunlap for an initial review and suggestion. Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 09:14:23 +02:00
Steven Rostedt (Google)	367ef3c865	tracing/selftests: Fix kprobe event name test for .isra. functions commit 23a4b108accc29a6125ed14de4a044689ffeda78 upstream. The kprobe_eventname.tc test checks if a function with .isra. can have a kprobe attached to it. It loops through the kallsyms file for all the functions that have the .isra. name, and checks if it exists in the available_filter_functions file, and if it does, it uses it to attach a kprobe to it. The issue is that kprobes can not attach to functions that are listed more than once in available_filter_functions. With the latest kernel, the function that is found is: rapl_event_update.isra.0 # grep rapl_event_update.isra.0 /sys/kernel/tracing/available_filter_functions rapl_event_update.isra.0 rapl_event_update.isra.0 It is listed twice. This causes the attached kprobe to it to fail which in turn fails the test. Instead of just picking the function function that is found in available_filter_functions, pick the first one that is listed only once in available_filter_functions. Cc: stable@vger.kernel.org Fixes: `604e354823` ("selftests/ftrace: Select an existing function in kprobe_eventname test") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 09:14:20 +02:00
Dev Jain	fdba4fbe5e	selftests/mm: compaction_test: fix bogus test success on Aarch64 [ Upstream commit d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 ] Patch series "Fixes for compaction_test", v2. The compaction_test memory selftest introduces fragmentation in memory and then tries to allocate as many hugepages as possible. This series addresses some problems. On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since compaction_index becomes 0, which is less than 3, due to no division by zero exception being raised. We fix that by checking for division by zero. Secondly, correctly set the number of hugepages to zero before trying to set a large number of them. Now, consider a situation in which, at the start of the test, a non-zero number of hugepages have been already set (while running the entire selftests/mm suite, or manually by the admin). The test operates on 80% of memory to avoid OOM-killer invocation, and because some memory is already blocked by hugepages, it would increase the chance of OOM-killing. Also, since mem_free used in check_compaction() is the value before we set nr_hugepages to zero, the chance that the compaction_index will be small is very high if the preset nr_hugepages was high, leading to a bogus test success. This patch (of 3): Currently, if at runtime we are not able to allocate a huge page, the test will trivially pass on Aarch64 due to no exception being raised on division by zero while computing compaction_index. Fix that by checking for nr_hugepages == 0. Anyways, in general, avoid a division by zero by exiting the program beforehand. While at it, fix a typo, and handle the case where the number of hugepages may overflow an integer. Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com Fixes: `bd67d5c15c` ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-07-05 09:14:13 +02:00
Muhammad Usama Anjum	9df7bb7090	selftests/mm: conform test to TAP format output [ Upstream commit 9a21701edc41465de56f97914741bfb7bfc2517d ] Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240101083614.1076768-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:13 +02:00
Dev Jain	68fdfb1dfe	selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages [ Upstream commit 9ad665ef55eaad1ead1406a58a34f615a7c18b5e ] Currently, the test tries to set nr_hugepages to zero, but that is not actually done because the file offset is not reset after read(). Fix that using lseek(). Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com Fixes: `bd67d5c15c` ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-07-05 09:14:13 +02:00
Friedrich Vock	e5138f43c9	bpf: Fix potential integer overflow in resolve_btfids [ Upstream commit 44382b3ed6b2787710c8ade06c0e97f5970a47c8 ] err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit at least on 64-bit platforms. If symbols_patch is called on a binary between 2-4GB in size, the result will be negative when cast to a 32-bit integer, which the code assumes means an error occurred. This can wrongly trigger build failures when building very large kernel images. Fixes: `fbbb68de80` ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240514070931.199694-1-friedrich.vock@gmx.de Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:50 +02:00
Ian Rogers	6dbeee1608	libsubcmd: Fix parse-options memory leak [ Upstream commit 230a7a71f92212e723fa435d4ca5922de33ec88a ] If a usage string is built in parse_options_subcommand, also free it. Fixes: `901421a5bd` ("perf tools: Remove subcmd dependencies on strbuf") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240509052015.1914670-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:40 +02:00
Nikolay Aleksandrov	d449edd806	selftests: net: bridge: increase IGMP/MLD exclude timeout membership interval [ Upstream commit 06080ea23095afe04a2cb7a8d05fab4311782623 ] When running the bridge IGMP/MLD selftests on debug kernels we can get spurious errors when setting up the IGMP/MLD exclude timeout tests because the membership interval is just 3 seconds and the setup has 2 seconds of sleep plus various validations, the one second that is left is not enough. Increase the membership interval from 3 to 5 seconds to make room for the setup validation and 2 seconds of sleep. Fixes: `34d7ecb3d4` ("selftests: net: bridge: update IGMP/MLD membership interval value") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-06-16 13:39:33 +02:00

1 2 3 4 5 ...

27698 Коммитов