-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI
g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG
2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc=
=EjJQ
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc,
from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.
7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
to start the timer with absolute expiration value instead of relative
one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- 'perf lock contention' improvements:
- Add -o/--lock-owner option:
$ sudo ./perf lock contention -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
The owner is unknown in most cases. Filtering only for the mutex locks, it
will more likely get the owners.
- -S/--callstack-filter is to limit display entries having the given
string in the callstack
$ sudo ./perf lock contention -abv -S net sleep 1
...
contended total wait max wait avg wait type caller
5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d
0xffffffffa5dd1c60 _raw_spin_lock+0x30
0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d
0xffffffffa5cd8267 ip6_finish_output2+0x2c7
0xffffffffa5cdac14 ip6_finish_output+0x1d4
0xffffffffa5cdb477 ip6_xmit+0x457
0xffffffffa5d1fd17 inet6_csk_xmit+0xd7
0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a
0xffffffffa5c6467d tcp_keepalive_timer+0x2fd
Please note that to have the -b option (BPF) working above one has to build
with BUILD_BPF_SKEL=1.
- Add more 'perf test' entries to test these new features.
- Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
- Add support for retire latency feature (pipeline stall of a instruction
compared to the previous one, in cycles) present on some Intel processors.
- Add 'perf c2c' report option to show false sharing with adjacent cachelines, to
be used in machines with cacheline prefetching, where accesses to a cacheline
brings the next one too.
- Skip 'perf test bpf' when the required kernel-debuginfo package isn't installed.
perf script:
- Add 'cgroup' field for 'perf script' output:
$ perf record --all-cgroups -- true
$ perf script -F comm,pid,cgroup
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
- Add support for showing branch speculation information in 'perf
script' and in the 'perf report' raw dump (-D).
perf record:
- Fix 'perf record' segfault with --overwrite and --max-size.
Intel PT:
- Add support for synthesizing "cycle" events from Intel PT traces as we
support "instruction" events when Intel PT CYC packets are available. This
enables much more accurate profiles than when using the regular 'perf record -e
cycles' (the default) when the workload lasts for very short periods (<10ms).
- .plt symbol handling improvements, better handling IBT (in the past
MPX) done in the context of decoding Intel PT processor traces, IFUNC
symbols on x86_64, static executables, understanding .plt.got symbols on
x86_64.
- Add a 'perf test' to test symbol resolution, part of the .plt
improvements series, this tests things like symbol size in contexts
where only the symbol start is available (kallsyms), etc.
- Better handle auxtrace/Intel PT data when using pipe mode (perf record sleep 1|perf report).
- Fix symbol lookup with kcore with multiple segments match stext,
getting the symbol resolution to just show DSOs as unknown.
ARM:
- Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
Macrocell v4).
- Ensure ARM64 CoreSight timestamps don't go backwards.
- Document that ARM64 SPE (Statistical Profiling Extension) is used with 'perf c2c/mem'.
- Add raw decoding for ARM64 SPEv1.2 previous branch address.
- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1, TLB,
cache, branch, PE utilization and instruction mix metrics.
- Update decoder code for OpenCSD version 1.4, on ARM64 systems.
- Fix command line auto-complete of CPU events on aarch64.
perf test/bench:
- Switch basic BPF filtering test to use syscall tracepoint to avoid the
variable number of probes inserted when using the previous probe point
(do_epoll_wait) that happens on different CPU architectures.
- Fix DWARF unwind test by adding non-inline to expected function in a
backtrace.
- Use 'grep -c' where the longer form 'grep | wc -l' was being used.
- Add getpid and execve benchmarks to 'perf bench syscall'.
Miscellaneous:
- Avoid d3-flame-graph package dependency in 'perf script flamegraph',
making this feature more generally available.
- Add JSON metric events to present CPI stall cycles in Power10.
- Assorted improvements/refactorings on the JSON metrics parsing code.
Build:
- Fix 'perf probe' and 'perf test' when libtraceevent isn't linked, as
several tests use tracepoints, those should be skipped.
- More fallout fixes for the removal of tools/lib/traceevent/.
- Fix build error when linking with libpfm.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY/YzGgAKCRCyPKLppCJ+
J98CAP4/GD3E86Dk+S+w5FmPEHuBKootuZ3pHOqCnXLiyKFZqgEAs9TWOg9KVKGh
io9cLluMjzfRwQrND8cpn3VfXxWvVAQ=
=L1qh
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"Miscellaneous:
- Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
- Add support for retire latency feature (pipeline stall of a
instruction compared to the previous one, in cycles) present on
some Intel processors.
- Add 'perf c2c' report option to show false sharing with adjacent
cachelines, to be used in machines with cacheline prefetching,
where accesses to a cacheline brings the next one too.
- Skip 'perf test bpf' when the required kernel-debuginfo package
isn't installed.
- Avoid d3-flame-graph package dependency in 'perf script flamegraph',
making this feature more generally available.
- Add JSON metric events to present CPI stall cycles in Power10.
- Assorted improvements/refactorings on the JSON metrics parsing
code.
perf lock contention:
- Add -o/--lock-owner option:
$ sudo ./perf lock contention -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
The owner is unknown in most cases. Filtering only for the
mutex locks, it will more likely get the owners.
- -S/--callstack-filter is to limit display entries having the given
string in the callstack:
$ sudo ./perf lock contention -abv -S net sleep 1
...
contended total wait max wait avg wait type caller
5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d
0xffffffffa5dd1c60 _raw_spin_lock+0x30
0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d
0xffffffffa5cd8267 ip6_finish_output2+0x2c7
0xffffffffa5cdac14 ip6_finish_output+0x1d4
0xffffffffa5cdb477 ip6_xmit+0x457
0xffffffffa5d1fd17 inet6_csk_xmit+0xd7
0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a
0xffffffffa5c6467d tcp_keepalive_timer+0x2fd
Please note that to have the -b option (BPF) working above one has
to build with BUILD_BPF_SKEL=1.
- Add more 'perf test' entries to test these new features.
perf script:
- Add 'cgroup' field for 'perf script' output:
$ perf record --all-cgroups -- true
$ perf script -F comm,pid,cgroup
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
- Add support for showing branch speculation information in 'perf
script' and in the 'perf report' raw dump (-D).
perf record:
- Fix 'perf record' segfault with --overwrite and --max-size.
perf test/bench:
- Switch basic BPF filtering test to use syscall tracepoint to avoid
the variable number of probes inserted when using the previous
probe point (do_epoll_wait) that happens on different CPU
architectures.
- Fix DWARF unwind test by adding non-inline to expected function in
a backtrace.
- Use 'grep -c' where the longer form 'grep | wc -l' was being used.
- Add getpid and execve benchmarks to 'perf bench syscall'.
Intel PT:
- Add support for synthesizing "cycle" events from Intel PT traces as
we support "instruction" events when Intel PT CYC packets are
available. This enables much more accurate profiles than when using
the regular 'perf record -e cycles' (the default) when the workload
lasts for very short periods (<10ms).
- .plt symbol handling improvements, better handling IBT (in the past
MPX) done in the context of decoding Intel PT processor traces,
IFUNC symbols on x86_64, static executables, understanding .plt.got
symbols on x86_64.
- Add a 'perf test' to test symbol resolution, part of the .plt
improvements series, this tests things like symbol size in contexts
where only the symbol start is available (kallsyms), etc.
- Better handle auxtrace/Intel PT data when using pipe mode (perf
record sleep 1|perf report).
- Fix symbol lookup with kcore with multiple segments match stext,
getting the symbol resolution to just show DSOs as unknown.
ARM:
- Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
Macrocell v4).
- Ensure ARM64 CoreSight timestamps don't go backwards.
- Document that ARM64 SPE (Statistical Profiling Extension) is used
with 'perf c2c/mem'.
- Add raw decoding for ARM64 SPEv1.2 previous branch address.
- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1,
TLB, cache, branch, PE utilization and instruction mix metrics.
- Update decoder code for OpenCSD version 1.4, on ARM64 systems.
- Fix command line auto-complete of CPU events on aarch64.
Build:
- Fix 'perf probe' and 'perf test' when libtraceevent isn't linked,
as several tests use tracepoints, those should be skipped.
- More fallout fixes for the removal of tools/lib/traceevent/.
- Fix build error when linking with libpfm"
* tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits)
perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc
perf intel-pt: Synthesize cycle events
perf c2c: Add report option to show false sharing in adjacent cachelines
perf record: Fix segfault with --overwrite and --max-size
perf stat: Avoid merging/aggregating metric counts twice
perf tools: Fix perf tool build error in util/pfm.c
perf tools: Fix auto-complete on aarch64
perf lock contention: Support old rw_semaphore type
perf lock contention: Add -o/--lock-owner option
perf lock contention: Fix to save callstack for the default modified
perf test bpf: Skip test if kernel-debuginfo is not present
perf probe: Update the exit error codes in function try_to_find_probe_trace_event
perf script: Fix missing Retire Latency fields option documentation
perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT
perf test x86: Support the retire_lat (Retire Latency) sample_type check
perf test bpf: Check for libtraceevent support
perf script: Support Retire Latency
perf report: Support Retire Latency
perf lock contention: Support filters for different aggregation
...
The following three uapi headers:
tools/arch/arm64/include/uapi/asm/bpf_perf_event.h
tools/arch/s390/include/uapi/asm/bpf_perf_event.h
tools/arch/s390/include/uapi/asm/ptrace.h
were introduced in commit 618e165b2a ("selftests/bpf: sync kernel headers
and introduce arch support in Makefile"), they are not used any more after
commit 720f228e8d ("bpf: fix broken BPF selftest build"), so remove them.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1676533861-27508-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
where possible, when supporting a debug registers swap feature for
SEV-ES guests
- Add support for AMD's version of eIBRS called Automatic IBRS which is
a set-and-forget control of indirect branch restriction speculation
resources on privilege change
- Add support for a new x86 instruction - LKGS - Load kernel GS which is
part of the FRED infrastructure
- Reset SPEC_CTRL upon init to accomodate use cases like kexec which
rediscover
- Other smaller fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe
VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t
aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE
E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk
GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45
isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9
u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn
A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg
s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew
eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t
g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR
RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o=
=v/ZC
-----END PGP SIGNATURE-----
Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Cache the AMD debug registers in per-CPU variables to avoid MSR
writes where possible, when supporting a debug registers swap feature
for SEV-ES guests
- Add support for AMD's version of eIBRS called Automatic IBRS which is
a set-and-forget control of indirect branch restriction speculation
resources on privilege change
- Add support for a new x86 instruction - LKGS - Load kernel GS which
is part of the FRED infrastructure
- Reset SPEC_CTRL upon init to accomodate use cases like kexec which
rediscover
- Other smaller fixes and cleanups
* tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd: Cache debug register values in percpu variables
KVM: x86: Propagate the AMD Automatic IBRS feature to the guest
x86/cpu: Support AMD Automatic IBRS
x86/cpu, kvm: Add the SMM_CTL MSR not present feature
x86/cpu, kvm: Add the Null Selector Clears Base feature
x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf
x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature
KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code
x86/cpu, kvm: Add support for CPUID_80000021_EAX
x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h>
x86/gsseg: Use the LKGS instruction if available for load_gs_index()
x86/gsseg: Move load_gs_index() to its own new header file
x86/gsseg: Make asm_load_gs_index() take an u16
x86/opcode: Add the LKGS instruction to x86-opcode-map
x86/cpufeature: Add the CPU feature bit for LKGS
x86/bugs: Reset speculation control settings on init
x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
This commit adds the execve syscall benchmark, more syscall benchmarks
can be added in the future.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1668052208-14047-5-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit adds a simple getpgid syscall benchmark, more syscall
benchmarks can be added in the future.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1668052208-14047-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the current code, there is only a basic syscall benchmark via
getppid, this is not enough. Introduce bench_syscall_common() so that we
can add more syscalls to benchmark.
This is preparation for later patch, no functionality change.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1668052208-14047-3-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is better to keep list sorted by number in unistd_{32,64}.h,
so that we can add more syscall number to a proper position.
This is preparation for later patch, no functionality change.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1668052208-14047-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
decb17aeb8 ("KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations")
07e39e60bb ("arm64: Add Cortex-715 CPU part definition")
8ec8490a19 ("arm64: Fix bit-shifting UB in the MIDR_CPU_MODEL() macro")
That addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/asm/cputype.h' differs from latest version at 'arch/arm64/include/asm/cputype.h'
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/Y8fvEGCGn+227qW0@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
9cb1096f85 ("KVM: arm64: Enable ring-based dirty memory tracking")
That doesn't result in any changes in tooling (built on a Libre Computer
Firefly ROC-RK3399-PC-V1.1-A running Ubuntu 22.04), only addresses this
perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/Y8fmIT5PIfGaZuwa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
8aff460f21 ("KVM: x86: Add a VALID_MASK for the flags in kvm_msr_filter_range")
c1340fe359 ("KVM: x86: Add a VALID_MASK for the flag in kvm_msr_filter")
be83794210 ("KVM: x86: Disallow the use of KVM_MSR_FILTER_DEFAULT_ALLOW in the kernel")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Y8VR5wSAkd2A0HxS@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add the instruction opcode used by LKGS to x86-opcode-map.
Opcode number is per public FRED draft spec v3.0.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112072032.35626-3-xin3.li@intel.com
Add the CPU feature bit for LKGS (Load "Kernel" GS).
LKGS instruction is introduced with Intel FRED (flexible return and
event delivery) specification. Search for the latest FRED spec in most
search engines with this search pattern:
site:intel.com FRED (flexible return and event delivery) specification
LKGS behaves like the MOV to GS instruction except that it loads
the base address into the IA32_KERNEL_GS_BASE MSR instead of the
GS segment’s descriptor cache, which is exactly what Linux kernel
does to load a user level GS base. Thus, with LKGS, there is no
need to SWAPGS away from the kernel GS base.
[ mingo: Minor tweaks to the description. ]
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230112072032.35626-2-xin3.li@intel.com
- Don't stop building perf if python setuptools isn't installed, just
disable the affected perf feature.
- Remove explicit reference to python 2.x devel files, that warning is
about python-devel, no matter what version, being unavailable and thus
disabling the linking with libpython.
- Don't use -Werror=switch-enum when building the python support that
handles libtraceevent enumerations, as there is no good way to test
if some specific enum entry is available with the libtraceevent
installed on the system.
- Introduce 'perf lock contention' --type-filter and --lock-filter, to
filter by lock type and lock name:
$ sudo ./perf lock record -a -- ./perf bench sched messaging
$ sudo ./perf lock contention -E 5 -Y spinlock
contended total wait max wait avg wait type caller
802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62
13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14
12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27
114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5
83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e
$ sudo ./perf lock contention -l
contended total wait max wait avg wait address symbol
57 1.11 ms 42.83 us 19.54 us ffff9f4140059000
15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock
1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state
1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0
$ sudo ./perf lock contention -L jiffies_lock,rcu_state
contended total wait max wait avg wait type caller
15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93
1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb
$ sudo ./perf lock contention -L ffff9f4140059000
contended total wait max wait avg wait type caller
38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50
11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39
8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5
- Fix splitting CC into compiler and options when checking if a option
is present in clang to build the python binding, needed in systems
such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c".
- Refresh metris and events for Intel systems: alderlake. alderlake-n,
bonnell, broadwell, broadwellde, broadwellx, cascadelakex,
elkhartlake, goldmont, goldmontplus, haswell, haswellx, icelake,
icelakex, ivybridge, ivytown, jaketown, knightslanding, meteorlake,
nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake,
skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp,
westmereex.
- Add vendor events files (JSON) for AMD Zen 4, from sections 2.1.15.4
"Core Performance Monitor Counters", 2.1.15.5 "L3 Cache Performance
Monitor Counter"s and Section 7.1 "Fabric Performance Monitor Counter
(PMC) Events" in the Processor Programming Reference (PPR) for AMD
Family 19h Model 11h Revision B1 processors.
This constitutes events which capture op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB activity,
L3 cache activity and data bandwidth for various links and interfaces in
the Data Fabric.
- Also, from the same PPR are metrics taken from Section 2.1.15.2
"Performance Measurement", including pipeline utilization, which are
new to Zen 4 processors and useful for finding performance bottlenecks
by analyzing activity at different stages of the pipeline.
- Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and
'srcfile' sort keys performance by postponing calling the external
addr2line utility to the collapse phase of histogram bucketing.
- Fix 'perf test' "all PMU test" to skip parametrized events, that
requires setting up and are not supported by this test.
- Update tools/ copies of kernel headers: features, disabled-features,
fscrypt.h, i915_drm.h, msr-index.h, power pc syscall table and kvm.h.
- Add .DELETE_ON_ERROR special Makefile target to clean up partially
updated files on error.
- Simplify the mksyscalltbl script for arm64 by avoiding to run the host
compiler to create the syscall table, do it all just with the shell
script.
- Further fixes to honour quiet mode (-q).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY6SJ+gAKCRCyPKLppCJ+
J5JSAQCSokw2lsIqelDfoBfOQcMwah4ogW1vuO5KiepHgGOjuwD/d+65IxFIRA/h
tJjAtq4fReyi4u4eTc1aLgUwFh7V0ws=
=rneN
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
"perf tools fixes and improvements:
- Don't stop building perf if python setuptools isn't installed, just
disable the affected perf feature.
- Remove explicit reference to python 2.x devel files, that warning
is about python-devel, no matter what version, being unavailable
and thus disabling the linking with libpython.
- Don't use -Werror=switch-enum when building the python support that
handles libtraceevent enumerations, as there is no good way to test
if some specific enum entry is available with the libtraceevent
installed on the system.
- Introduce 'perf lock contention' --type-filter and --lock-filter,
to filter by lock type and lock name:
$ sudo ./perf lock record -a -- ./perf bench sched messaging
$ sudo ./perf lock contention -E 5 -Y spinlock
contended total wait max wait avg wait type caller
802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62
13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14
12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27
114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5
83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e
$ sudo ./perf lock contention -l
contended total wait max wait avg wait address symbol
57 1.11 ms 42.83 us 19.54 us ffff9f4140059000
15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock
1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state
1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0
$ sudo ./perf lock contention -L jiffies_lock,rcu_state
contended total wait max wait avg wait type caller
15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93
1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb
$ sudo ./perf lock contention -L ffff9f4140059000
contended total wait max wait avg wait type caller
38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50
11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39
8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5
- Fix splitting CC into compiler and options when checking if a
option is present in clang to build the python binding, needed in
systems such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c".
- Refresh metris and events for Intel systems: alderlake.
alderlake-n, bonnell, broadwell, broadwellde, broadwellx,
cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
haswellx, icelake, icelakex, ivybridge, ivytown, jaketown,
knightslanding, meteorlake, nehalemep, nehalemex, sandybridge,
sapphirerapids, silvermont, skylake, skylakex, snowridgex,
tigerlake, westmereep-dp, westmereep-sp, westmereex.
- Add vendor events files (JSON) for AMD Zen 4, from sections
2.1.15.4 "Core Performance Monitor Counters", 2.1.15.5 "L3 Cache
Performance Monitor Counter"s and Section 7.1 "Fabric Performance
Monitor Counter (PMC) Events" in the Processor Programming
Reference (PPR) for AMD Family 19h Model 11h Revision B1
processors.
This constitutes events which capture op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB
activity, L3 cache activity and data bandwidth for various links
and interfaces in the Data Fabric.
- Also, from the same PPR are metrics taken from Section 2.1.15.2
"Performance Measurement", including pipeline utilization, which
are new to Zen 4 processors and useful for finding performance
bottlenecks by analyzing activity at different stages of the
pipeline.
- Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and
'srcfile' sort keys performance by postponing calling the external
addr2line utility to the collapse phase of histogram bucketing.
- Fix 'perf test' "all PMU test" to skip parametrized events, that
requires setting up and are not supported by this test.
- Update tools/ copies of kernel headers: features,
disabled-features, fscrypt.h, i915_drm.h, msr-index.h, power pc
syscall table and kvm.h.
- Add .DELETE_ON_ERROR special Makefile target to clean up partially
updated files on error.
- Simplify the mksyscalltbl script for arm64 by avoiding to run the
host compiler to create the syscall table, do it all just with the
shell script.
- Further fixes to honour quiet mode (-q)"
* tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits)
perf python: Fix splitting CC into compiler and options
perf scripting python: Don't be strict at handling libtraceevent enumerations
perf arm64: Simplify mksyscalltbl
perf build: Remove explicit reference to python 2.x devel files
perf vendor events amd: Add Zen 4 mapping
perf vendor events amd: Add Zen 4 metrics
perf vendor events amd: Add Zen 4 uncore events
perf vendor events amd: Add Zen 4 core events
perf vendor events intel: Refresh westmereex events
perf vendor events intel: Refresh westmereep-sp events
perf vendor events intel: Refresh westmereep-dp events
perf vendor events intel: Refresh tigerlake metrics and events
perf vendor events intel: Refresh snowridgex events
perf vendor events intel: Refresh skylakex metrics and events
perf vendor events intel: Refresh skylake metrics and events
perf vendor events intel: Refresh silvermont events
perf vendor events intel: Refresh sapphirerapids metrics and events
perf vendor events intel: Refresh sandybridge metrics and events
perf vendor events intel: Refresh nehalemex events
perf vendor events intel: Refresh nehalemep events
...
To pick up the changes in:
97fa21f65c ("x86/resctrl: Move MSR defines into msr-index.h")
7420ae3bb9 ("x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPB")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2022-12-20 14:28:40.893794072 -0300
+++ after 2022-12-20 14:28:54.831993914 -0300
@@ -266,6 +266,7 @@
[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
[0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
+ [0xc0000200 - x86_64_specific_MSRs_offset] = "IA32_MBA_BW_BASE",
[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
[0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/lkml/Y6HyTOGRNvKfCVe4@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes:
- Fix potential null-ptr-deref in start_task()
- Fix kgdb console on serial port
- Add missing FORCE prerequisites in Makefile
- Drop PMD_SHIFT from calculation in pgtable.h
Enhancements:
- Implement a wrapper to align madvise() MADV_* constants with other
architectures
- If machine supports running MPE/XL, show the MPE model string
Cleanups:
- Drop duplicate kgdb console code
- Indenting fixes in setup_cmdline()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCY6B/cgAKCRD3ErUQojoP
X85pAQCC6YpSYON3KZRfABeiDTRCKcGm72p7JQRnyj88XCq6ZAEA40T2qpRpjoYi
NaXr28mxHFYh4Z0c5Y7K5EuFTT7gAA4=
=e2Jd
-----END PGP SIGNATURE-----
Merge tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"There is one noteable patch, which allows the parisc kernel to use the
same MADV_xxx constants as the other architectures going forward. With
that change only alpha has one entry left (MADV_DONTNEED is 6 vs 4 on
others) which is different. To prevent an ABI breakage, a wrapper is
included which translates old MADV values to the new ones, so existing
userspace isn't affected. Reason for that patch is, that some
applications wrongly used the standard MADV_xxx values even on some
non-x86 platforms and as such those programs failed to run correctly
on parisc (examples are qemu-user, tor browser and boringssl).
Then the kgdb console and the LED code received some fixes, and some
0-day warnings are now gone. Finally, the very last compile warning
which was visible during a kernel build is now fixed too (in the vDSO
code).
The majority of the patches are tagged for stable series and in
summary this patchset is quite small and drops more code than it adds:
Fixes:
- Fix potential null-ptr-deref in start_task()
- Fix kgdb console on serial port
- Add missing FORCE prerequisites in Makefile
- Drop PMD_SHIFT from calculation in pgtable.h
Enhancements:
- Implement a wrapper to align madvise() MADV_* constants with other
architectures
- If machine supports running MPE/XL, show the MPE model string
Cleanups:
- Drop duplicate kgdb console code
- Indenting fixes in setup_cmdline()"
* tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Show MPE/iX model string at bootup
parisc: Add missing FORCE prerequisites in Makefile
parisc: Move pdc_result struct to firmware.c
parisc: Drop locking in pdc console code
parisc: Drop duplicate kgdb_pdc console
parisc: Fix locking in pdc_iodc_print() firmware call
parisc: Drop PMD_SHIFT from calculation in pgtable.h
parisc: Align parisc MADV_XXX constants with all other architectures
parisc: led: Fix potential null-ptr-deref in start_task()
parisc: Fix inconsistent indenting in setup_cmdline()
To pick the changes from:
5e85c4ebf2 ("x86: KVM: Advertise AVX-IFMA CPUID to user space")
af2872f622 ("x86: KVM: Advertise AMX-FP16 CPUID to user space")
6a19d7aa58 ("x86: KVM: Advertise CMPccXADD CPUID to user space")
aaa65d17ee ("x86/tsx: Add a feature bit for TSX control MSR support")
b1599915f0 ("x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit")
16a7fe3728 ("KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest")
80e4c1cd42 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")
7df548840c ("x86/bugs: Add "unknown" reporting for MMIO Stale Data")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiaxi Chen <jiaxi.chen@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/Y6CD%2FIcEbDW5X%2FpN@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
15e15d64bd ("x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Cc: Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/Y6B2w3WqifB%2FV70T@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adjust some MADV_XXX constants to be in sync what their values are on
all other platforms. There is currently no reason to have an own
numbering on parisc, but it requires workarounds in many userspace
sources (e.g. glibc, qemu, ...) - which are often forgotten and thus
introduce bugs and different behaviour on parisc.
A wrapper avoids an ABI breakage for existing userspace applications by
translating any old values to the new ones, so this change allows us to
move over all programs to the new ABI over time.
Signed-off-by: Helge Deller <deller@gmx.de>
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
* Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
* Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d:
"Fix a number of issues with MTE, such as races on the tags being
initialised vs the PG_mte_tagged flag as well as the lack of support
for VM_SHARED when KVM is involved. Patches from Catalin Marinas and
Peter Collingbourne").
* Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
* Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
* Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
* Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
* Second batch of the lazy destroy patches
* First batch of KVM changes for kernel virtual != physical address support
* Removal of a unused function
x86:
* Allow compiling out SMM support
* Cleanup and documentation of SMM state save area format
* Preserve interrupt shadow in SMM state save area
* Respond to generic signals during slow page faults
* Fixes and optimizations for the non-executable huge page errata fix.
* Reprogram all performance counters on PMU filter change
* Cleanups to Hyper-V emulation and tests
* Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
running on top of a L1 Hyper-V hypervisor)
* Advertise several new Intel features
* x86 Xen-for-KVM:
** Allow the Xen runstate information to cross a page boundary
** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
** Add support for 32-bit guests in SCHEDOP_poll
* Notable x86 fixes and cleanups:
** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
** Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
** Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
** Advertise (on AMD) that the SMM_CTL MSR is not supported
** Remove unnecessary exports
Generic:
* Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
* Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
* Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
* Introduce actual atomics for clear/set_bit() in selftests
* Add support for pinning vCPUs in dirty_log_perf_test.
* Rename the so called "perf_util" framework to "memstress".
* Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress tests.
* Add a common ucall implementation; code dedup and pre-work for running
SEV (and beyond) guests in selftests.
* Provide a common constructor and arch hook, which will eventually be
used by x86 to automatically select the right hypercall (AMD vs. Intel).
* A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
breakpoints, stage-2 faults and access tracking.
* x86-specific selftest changes:
** Clean up x86's page table management.
** Clean up and enhance the "smaller maxphyaddr" test, and add a related
test to cover generic emulation failure.
** Clean up the nEPT support checks.
** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
** Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
Documentation:
* Remove deleted ioctls from documentation
* Clean up the docs for the x86 MSR filter.
* Various fixes
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
=QAYX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on (see merge
commit 382b5b87a97d: "Fix a number of issues with MTE, such as
races on the tags being initialised vs the PG_mte_tagged flag as
well as the lack of support for VM_SHARED when KVM is involved.
Patches from Catalin Marinas and Peter Collingbourne").
- Merge the pKVM shadow vcpu state tracking that allows the
hypervisor to have its own view of a vcpu, keeping that state
private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB
pages only) as a prefix of the oncoming support for 4kB and 16kB
pages.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address
support
- Removal of a unused function
x86:
- Allow compiling out SMM support
- Cleanup and documentation of SMM state save area format
- Preserve interrupt shadow in SMM state save area
- Respond to generic signals during slow page faults
- Fixes and optimizations for the non-executable huge page errata
fix.
- Reprogram all performance counters on PMU filter change
- Cleanups to Hyper-V emulation and tests
- Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
guest running on top of a L1 Hyper-V hypervisor)
- Advertise several new Intel features
- x86 Xen-for-KVM:
- Allow the Xen runstate information to cross a page boundary
- Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
- Add support for 32-bit guests in SCHEDOP_poll
- Notable x86 fixes and cleanups:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
a few years back when eliminating unnecessary barriers when
switching between vmcs01 and vmcs02.
- Clean up vmread_error_trampoline() to make it more obvious that
params must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL
irrespective of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM
incorrectly thinking a guest needs TSC scaling when running on a
CPU with a constant TSC, but no hardware-enumerated TSC
frequency.
- Advertise (on AMD) that the SMM_CTL MSR is not supported
- Remove unnecessary exports
Generic:
- Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix build errors that occur in certain setups (unsure exactly what
is unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
- Introduce actual atomics for clear/set_bit() in selftests
- Add support for pinning vCPUs in dirty_log_perf_test.
- Rename the so called "perf_util" framework to "memstress".
- Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress
tests.
- Add a common ucall implementation; code dedup and pre-work for
running SEV (and beyond) guests in selftests.
- Provide a common constructor and arch hook, which will eventually
be used by x86 to automatically select the right hypercall (AMD vs.
Intel).
- A bunch of added/enabled/fixed selftests for ARM64, covering
memslots, breakpoints, stage-2 faults and access tracking.
- x86-specific selftest changes:
- Clean up x86's page table management.
- Clean up and enhance the "smaller maxphyaddr" test, and add a
related test to cover generic emulation failure.
- Clean up the nEPT support checks.
- Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
- Fix an ordering issue in the AMX test introduced by recent
conversions to use kvm_cpu_has(), and harden the code to guard
against similar bugs in the future. Anything that tiggers
caching of KVM's supported CPUID, kvm_cpu_has() in this case,
effectively hides opt-in XSAVE features if the caching occurs
before the test opts in via prctl().
Documentation:
- Remove deleted ioctls from documentation
- Clean up the docs for the x86 MSR filter.
- Various fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
KVM: x86: Add proper ReST tables for userspace MSR exits/flags
KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
KVM: arm64: selftests: Align VA space allocator with TTBR0
KVM: arm64: Fix benign bug with incorrect use of VA_BITS
KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
KVM: x86: Advertise that the SMM_CTL MSR is not supported
KVM: x86: remove unnecessary exports
KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
perf tools: Use dedicated non-atomic clear/set bit helpers
tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
KVM: arm64: selftests: Enable single-step without a "full" ucall()
KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
KVM: Remove stale comment about KVM_REQ_UNHALT
KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
KVM: Reference to kvm_userspace_memory_region in doc and comments
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
...
Highlights:
- Intel:
- PMC: Add support for Meteor Lake
- Intel On Demand: various updates
- ideapad-laptop:
- Add support for various Fn keys on new models
- Fix touchpad on/off handling in a generic way to avoid having
to add more and more quirks
- android-x86-tablets: Add support for 2 more X86 Android tablet models
- New Dell WMI DDV driver
- Miscellaneous cleanups and small bugfixes
The following is an automated git shortlog grouped by driver:
ACPI:
- battery: Pass battery hook pointer to hook callbacks
ISST:
- Fix typo in comments
Move existing HP drivers to a new hp subdir:
- Move existing HP drivers to a new hp subdir
dell:
- Add new dell-wmi-ddv driver
dell-ddv:
- Warn if ePPID has a suspicious length
- Improve buffer handling
huawei-wmi:
- remove unnecessary member
- fix return value calculation
- do not hard-code sizes
ideapad-laptop:
- Make touchpad_ctrl_via_ec a module option
- Stop writing VPCCMD_W_TOUCHPAD at probe time
- Send KEY_TOUCHPAD_TOGGLE on some models
- Only toggle ps2 aux port on/off on select models
- Do not send KEY_TOUCHPAD* events on probe / resume
- Refactor ideapad_sync_touchpad_state()
- support for more special keys in WMI
- Add new _CFG bit numbers for future use
- Revert "check for touchpad support in _CFG"
intel/pmc:
- Relocate Alder Lake PCH support
- Relocate Tiger Lake PCH support
- Relocate Ice Lake PCH support
- Relocate Cannon Lake Point PCH support
- Relocate Sunrise Point PCH support
- Move variable declarations and definitions to header and core.c
- Replace all the reg_map with init functions
intel/pmc/core:
- Add Meteor Lake support to pmc core driver
intel_scu_ipc:
- fix possible name leak in __intel_scu_ipc_register()
mxm-wmi:
- fix memleak in mxm_wmi_call_mx[ds|mx]()
platform/mellanox:
- mlxbf-pmc: Fix event typo
- Add BlueField-3 support in the tmfifo driver
platform/x86/amd:
- pmc: Add a workaround for an s0i3 issue on Cezanne
platform/x86/amd/pmf:
- pass the struct by reference
platform/x86/dell:
- alienware-wmi: Use sysfs_emit() instead of scnprintf()
platform/x86/intel:
- pmc: Fix repeated word in comment
platform/x86/intel/hid:
- Add module-params for 5 button array + SW_TABLET_MODE reporting
platform/x86/intel/sdsi:
- Add meter certificate support
- Support different GUIDs
- Hide attributes if hardware doesn't support
- Add Intel On Demand text
sony-laptop:
- Convert to use sysfs_emit_at() API
thinkpad_acpi:
- use strstarts()
- Fix max_brightness of thinklight
tools/arch/x86:
- intel_sdsi: Add support for reading meter certificates
- intel_sdsi: Add support for new GUID
- intel_sdsi: Read more On Demand registers
- intel_sdsi: Add Intel On Demand text
- intel_sdsi: Add support for reading state certificates
uv_sysfs:
- Use sysfs_emit() instead of scnprintf()
wireless-hotkey:
- use ACPI HID as phys
x86-android-tablets:
- Add Advantech MICA-071 extra button
- Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data
- Add Medion Lifetab S10346 data
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmOW+QgUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yAPwf/dAYLHiC2ox5YlNTLX2DvU+jOpeBv
W+EIx4oHQz1+O9jrWMLyvS9zTwTEAf6ANLiMP3damEvtJnB72ClgFITzlJAaB4zN
yj0SdxoBRMt6zDL2QwMkwitvb5kJonLfO2H7NsMwA6f0KP1X8sio3oVRAMMVwlzz
nwDKM/VBpuxmy+d880wRRoAkgRkTsPIOwBkYdo1525NU7kkTmtrMpgM+SXQsHTJn
TB9uQnyuiq5/znh3k1Qn+OGwXQezmGz2Fb76IcW5RzUQDew6n6b3kzILee5ddynT
Pa7/ibwpV+FtZjm2kS/l4tV+WPdA+s5TSWoq7Hz0jzBX9GdOORcMZmEneg==
=z16d
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- Intel:
- PMC: Add support for Meteor Lake
- Intel On Demand: various updates
- Ideapad-laptop:
- Add support for various Fn keys on new models
- Fix touchpad on/off handling in a generic way to avoid having to
add more and more quirks
- Android x86 tablets:
- Add support for two more X86 Android tablet models
- New Dell WMI DDV driver
- Miscellaneous cleanups and small bugfixes
* tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (52 commits)
platform/mellanox: mlxbf-pmc: Fix event typo
platform/x86: intel_scu_ipc: fix possible name leak in __intel_scu_ipc_register()
platform/x86: sony-laptop: Convert to use sysfs_emit_at() API
platform/x86/dell: alienware-wmi: Use sysfs_emit() instead of scnprintf()
platform/x86: uv_sysfs: Use sysfs_emit() instead of scnprintf()
platform/x86: mxm-wmi: fix memleak in mxm_wmi_call_mx[ds|mx]()
platform/x86: x86-android-tablets: Add Advantech MICA-071 extra button
platform/x86: x86-android-tablets: Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data
platform/x86: x86-android-tablets: Add Medion Lifetab S10346 data
platform/x86: wireless-hotkey: use ACPI HID as phys
platform/x86/intel/hid: Add module-params for 5 button array + SW_TABLET_MODE reporting
platform/x86: ideapad-laptop: Make touchpad_ctrl_via_ec a module option
platform/x86: ideapad-laptop: Stop writing VPCCMD_W_TOUCHPAD at probe time
platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models
platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models
platform/x86: ideapad-laptop: Do not send KEY_TOUCHPAD* events on probe / resume
platform/x86: ideapad-laptop: Refactor ideapad_sync_touchpad_state()
tools/arch/x86: intel_sdsi: Add support for reading meter certificates
tools/arch/x86: intel_sdsi: Add support for new GUID
tools/arch/x86: intel_sdsi: Read more On Demand registers
...
Convert {clear,set}_bit() to atomics as KVM's ucall implementation relies
on clear_bit() being atomic, they are defined in atomic.h, and the same
helpers in the kernel proper are atomic.
KVM's ucall infrastructure is the only user of clear_bit() in tools/, and
there are no true set_bit() users. tools/testing/nvdimm/ does make heavy
use of set_bit(), but that code builds into a kernel module of sorts, i.e.
pulls in all of the kernel's header and so is already getting the kernel's
atomic set_bit().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to
match the kernel nomenclature where test_and_set_bit() is atomic,
and __test_and_set_bit() provides the non-atomic variant.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-3-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The structure and content of the On Demand registers is based on the GUID
which is read from hardware through sysfs. Add support for decoding the
registers of a new GUID 0xF210D9EF.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20221119002343.1281885-9-david.e.box@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Intel Software Defined Silicon (SDSi) is now officially known as Intel
On Demand. Change text in tool to indicate this.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20221119002343.1281885-7-david.e.box@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Add x86 and generic implementations of atomic_test_and_set_bit() to allow
KVM selftests to atomically manage bitmaps.
Note, the generic version is taken from arch_test_and_set_bit() as of
commit 415d832497 ("locking/atomic: Make test_and_*_bit() ordered on
failure").
Signed-off-by: Peter Gonda <pgonda@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-5-seanjc@google.com
DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.
Unify and correct naming while at it.
Fixes: e4d0e84e49 ("x86/cpu/AMD: Make LFENCE a serializing instruction")
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To pick the changes from:
257449c6a5 ("x86/cpufeatures: Add LbrExtV2 feature bit")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/lkml/Y1g6vGPqPhOrXoaN@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We also need to add SYM_TYPED_FUNC_START() to util/include/linux/linkage.h
and update tools/perf/check_headers.sh to ignore the include cfi_types.h
line when checking if the kernel original files drifted from the copies
we carry.
This is to get the changes from:
ccace936ee ("x86: Add types to indirectly called assembly functions")
Addressing these tools/perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/lkml/Y1f3VRIec9EBgX6F@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
0e5d5ae837 ("arm64: Add AMPERE1 to the Spectre-BHB affected list")
That addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/asm/cputype.h' differs from latest version at 'arch/arm64/include/asm/cputype.h'
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: D Scott Phillips <scott@os.amperecomputing.com>
https://lore.kernel.org/lkml/Y1fy5GD7ZYvkeufv@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
b8d1d16360 ("x86/apic: Don't disable x2APIC if locked")
ca5b7c0d96 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2022-10-14 18:06:34.294561729 -0300
+++ after 2022-10-14 18:06:41.285744044 -0300
@@ -264,6 +264,7 @@
[0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
+ [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT",
[0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
[0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
[0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
$
Now one can trace systemwide asking to see backtraces to where that MSR
is being read/written, see this example with a previous update:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Although new details added into this header is currently used by kernel
only, tools copy needs to be in sync with kernel file to avoid
tools/perf/check-headers.sh warnings.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-3-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
7df548840c ("x86/bugs: Add "unknown" reporting for MMIO Stale Data")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Link: https://lore.kernel.org/lkml/YysTRji90sNn2p5f@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
ae3b1da954 ("KVM: arm64: Fix compile error due to sign extension")
That doesn't result in any changes in tooling (when built on x86), only
addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/all/YwOMCCc4E79FuvDe@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0.
The minimum supported versions of these tools for the build according to
Documentation/process/changes.rst are 5.1 and 11.0.0 respectively.
Remove the feature detection script, Kconfig option, and clean up some
fallback code that is no longer supported.
The removed script was also testing for a GCC specific bug that was
fixed in the 4.7 release.
Also remove workarounds for bpftrace using clang older than 9.0.0, since
other BPF backend fixes are required at this point.
Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637
Acked-by: Borislav Petkov <bp@suse.de>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To pick the changes in:
43bb9e000e ("KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific")
94dfc73e7c ("treewide: uapi: Replace zero-length arrays with flexible-array members")
bfbcc81bb8 ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior")
b172862241 ("KVM: x86: PIT: Preserve state of speaker port data bit")
ed2351174e ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")
That just rebuilds kvm-stat.c on x86, no change in functionality.
This silences these perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Cc: Chenyi Qiang <chenyi.qiang@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Link: https://lore.kernel.org/lkml/Yv6OMPKYqYSbUxwZ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
2f4073e08f ("KVM: VMX: Enable Notify VM exit")
That makes 'perf kvm-stat' aware of this new NOTIFY exit reason, thus
addressing the following perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tao Xu <tao3.xu@intel.com>
Link: http://lore.kernel.org/lkml/Yv6LavXMZ+njijpq@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
f5ecfee944 ("KVM: s390: resetting the Topology-Change-Report")
None of them trigger any changes in tooling, this time this is just to silence
these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Link: http://lore.kernel.org/lkml/YvzwMXzaIzOU4WAY@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Intel eIBRS machines do not sufficiently mitigate against RET
mispredictions when doing a VM Exit therefore an additional RSB,
one-entry stuffing is needed.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe
VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ
AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl
2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt
DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox
ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf
0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x
GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK
YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN
jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS
ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1
xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo=
=a8em
-----END PGP SIGNATURE-----
Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 eIBRS fixes from Borislav Petkov:
"More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET
mispredictions when doing a VM Exit therefore an additional RSB,
one-entry stuffing is needed"
* tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add LFENCE to RSB fill sequence
x86/speculation: Add RSB VM Exit protections
- Introduce 'perf lock contention' subtool, using new lock contention
tracepoints and using BPF for in kernel aggregation and then userspace
processing using the perf tooling infrastructure for resolving symbols, target
specification, etc.
Since the new lock contention tracepoints don't provide lock names, get up to
8 stack traces and display the first non-lock function symbol name as a caller:
$ perf lock report -F acquired,contended,avg_wait,wait_total
Name acquired contended avg wait total wait
update_blocked_a... 40 40 3.61 us 144.45 us
kernfs_fop_open+... 5 5 3.64 us 18.18 us
_nohz_idle_balance 3 3 2.65 us 7.95 us
tick_do_update_j... 1 1 6.04 us 6.04 us
ep_scan_ready_list 1 1 3.93 us 3.93 us
Supports the usual 'perf record' + 'perf report' workflow as well as a
BCC/bpftrace like mode where you start the tool and then press control+C to get
results:
$ sudo perf lock contention -b
^C
contended total wait max wait avg wait type caller
42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20
23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a
6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30
3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c
1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115
1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148
2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b
1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06
2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f
1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c
...
- Add new 'perf kwork' tool to trace time properties of kernel work (such as
softirq, and workqueue), uses eBPF skeletons to collect info in kernel space,
aggregating data that then gets processed by the userspace tool, e.g.:
# perf kwork report
Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
----------------------------------------------------------------------------------------------------
nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s |
amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s |
nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s |
nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s |
nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s |
nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s |
nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s |
nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s |
nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s |
nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s |
nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s |
enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s |
enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s |
--------------------------------------------------------------------------------------------------------
See commit log messages for more examples with extra options to limit the events time window, etc.
- Add support for new AMD IBS (Instruction Based Sampling) features:
With the DataSrc extensions, the source of data can be decoded among:
- Local L3 or other L1/L2 in CCX.
- A peer cache in a near CCX.
- Data returned from DRAM.
- A peer cache in a far CCX.
- DRAM address map with "long latency" bit set.
- Data returned from MMIO/Config/PCI/APIC.
- Extension Memory (S-Link, GenZ, etc - identified by the CS target
and/or address map at DF's choice).
- Peer Agent Memory.
- Support hardware tracing with Intel PT on guest machines, combining the
traces with the ones in the host machine.
- Add a "-m" option to 'perf buildid-list' to show kernel and modules
build-ids, to display all of the information needed to do external
symbolization of kernel stack traces, such as those collected by
bpf_get_stackid().
- Add arch TSC frequency information to perf.data file headers.
- Handle changes in the binutils disassembler function signatures in
perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).
- Fix building the perf perl binding with the newest gcc in distros such
as fedora rawhide, where some new warnings were breaking the build as
perf uses -Werror.
- Add 'perf test' entry for branch stack sampling.
- Add ARM SPE system wide 'perf test' entry.
- Add user space counter reading tests to 'perf test'.
- Build with python3 by default, if available.
- Add python converter script for the vendor JSON event files.
- Update vendor JSON files for alderlake, bonnell, broadwell, broadwellde,
broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding,
nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake,
skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp and westmereex.
- Add vendor JSON File for Intel meteorlake.
- Add Arm Cortex-A78C and X1C JSON vendor event files.
- Add workaround to symbol address reading from ELF files without phdr,
falling back to the previoous equation.
- Convert legacy map definition to BTF-defined in the perf BPF script test.
- Rework prologue generation code to stop using libbpf deprecated APIs.
- Add default hybrid events for 'perf stat' on x86.
- Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores).
- Prefer sampled CPU when exporting JSON in 'perf data convert'
- Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYuw6gwAKCRCyPKLppCJ+
J5+iAP0RL6sKMhzdkRjRYfG8CluJ401YaPHadzv5jxP8gOZz2gEAsuYDrMF9t1zB
4DqORfobdX9UQEJjP9oRltU73GM0swI=
=2/M0
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Introduce 'perf lock contention' subtool, using new lock contention
tracepoints and using BPF for in kernel aggregation and then
userspace processing using the perf tooling infrastructure for
resolving symbols, target specification, etc.
Since the new lock contention tracepoints don't provide lock names,
get up to 8 stack traces and display the first non-lock function
symbol name as a caller:
$ perf lock report -F acquired,contended,avg_wait,wait_total
Name acquired contended avg wait total wait
update_blocked_a... 40 40 3.61 us 144.45 us
kernfs_fop_open+... 5 5 3.64 us 18.18 us
_nohz_idle_balance 3 3 2.65 us 7.95 us
tick_do_update_j... 1 1 6.04 us 6.04 us
ep_scan_ready_list 1 1 3.93 us 3.93 us
Supports the usual 'perf record' + 'perf report' workflow as well as
a BCC/bpftrace like mode where you start the tool and then press
control+C to get results:
$ sudo perf lock contention -b
^C
contended total wait max wait avg wait type caller
42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20
23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a
6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30
3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c
1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115
1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148
2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b
1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06
2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f
1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c
...
- Add new 'perf kwork' tool to trace time properties of kernel work
(such as softirq, and workqueue), uses eBPF skeletons to collect info
in kernel space, aggregating data that then gets processed by the
userspace tool, e.g.:
# perf kwork report
Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
----------------------------------------------------------------------------------------------------
nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s |
amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s |
nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s |
nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s |
nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s |
nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s |
nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s |
nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s |
nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s |
nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s |
nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s |
enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s |
enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s |
--------------------------------------------------------------------------------------------------------
See commit log messages for more examples with extra options to limit
the events time window, etc.
- Add support for new AMD IBS (Instruction Based Sampling) features:
With the DataSrc extensions, the source of data can be decoded among:
- Local L3 or other L1/L2 in CCX.
- A peer cache in a near CCX.
- Data returned from DRAM.
- A peer cache in a far CCX.
- DRAM address map with "long latency" bit set.
- Data returned from MMIO/Config/PCI/APIC.
- Extension Memory (S-Link, GenZ, etc - identified by the CS target
and/or address map at DF's choice).
- Peer Agent Memory.
- Support hardware tracing with Intel PT on guest machines, combining
the traces with the ones in the host machine.
- Add a "-m" option to 'perf buildid-list' to show kernel and modules
build-ids, to display all of the information needed to do external
symbolization of kernel stack traces, such as those collected by
bpf_get_stackid().
- Add arch TSC frequency information to perf.data file headers.
- Handle changes in the binutils disassembler function signatures in
perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).
- Fix building the perf perl binding with the newest gcc in distros
such as fedora rawhide, where some new warnings were breaking the
build as perf uses -Werror.
- Add 'perf test' entry for branch stack sampling.
- Add ARM SPE system wide 'perf test' entry.
- Add user space counter reading tests to 'perf test'.
- Build with python3 by default, if available.
- Add python converter script for the vendor JSON event files.
- Update vendor JSON files for most Intel cores.
- Add vendor JSON File for Intel meteorlake.
- Add Arm Cortex-A78C and X1C JSON vendor event files.
- Add workaround to symbol address reading from ELF files without phdr,
falling back to the previoous equation.
- Convert legacy map definition to BTF-defined in the perf BPF script
test.
- Rework prologue generation code to stop using libbpf deprecated APIs.
- Add default hybrid events for 'perf stat' on x86.
- Add topdown metrics in the default 'perf stat' on the hybrid machines
(big/little cores).
- Prefer sampled CPU when exporting JSON in 'perf data convert'
- Fix ('perf stat CSV output linter') and ("Check branch stack
sampling") 'perf test' entries on s390.
* tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
perf stat: Refactor __run_perf_stat() common code
perf lock: Print the number of lost entries for BPF
perf lock: Add --map-nr-entries option
perf lock: Introduce struct lock_contention
perf scripting python: Do not build fail on deprecation warnings
genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO
perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test
perf parse-events: Break out tracepoint and printing
perf parse-events: Don't #define YY_EXTRA_TYPE
tools bpftool: Don't display disassembler-four-args feature test
tools bpftool: Fix compilation error with new binutils
tools bpf_jit_disasm: Don't display disassembler-four-args feature test
tools bpf_jit_disasm: Fix compilation error with new binutils
tools perf: Fix compilation error with new binutils
tools include: add dis-asm-compat.h to handle version differences
tools build: Don't display disassembler-four-args feature test
tools build: Add feature test for init_disassemble_info API changes
perf test: Add ARM SPE system wide test
perf tools: Rework prologue generation code
perf bpf: Convert legacy map definition to BTF-defined
...
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.
== Background ==
Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.
To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.
== Problem ==
Here's a simplification of how guests are run on Linux' KVM:
void run_kvm_guest(void)
{
// Prepare to run guest
VMRESUME();
// Clean up after guest runs
}
The execution flow for that would look something like this to the
processor:
1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()
Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:
* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
touched and Linux has to do a 32-entry stuffing.
* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
IBRS=1 shortly after VM exit, has a documented side effect of flushing
the RSB except in this PBRSB situation where the software needs to stuff
the last RSB entry "by hand".
IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.
However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.
Balanced CALL/RET instruction pairs such as in step #5 are not affected.
== Solution ==
The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.
However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.
Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.
The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.
In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.
There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.
[ bp: Massage, incorporate review comments from Andy Cooper. ]
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Hi Linus,
Please, pull the following treewide patch that replaces zero-length arrays
with flexible-array members in UAPI. This patch has been baking in
linux-next for 5 weeks now.
-fstrict-flex-arrays=3 is coming and we need to land these changes
to prevent issues like these in the short future:
../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0,
but the source string has length 2 (including NUL byte) [-Wfortify-source]
strcpy(de3->name, ".");
^
Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If
this breaks anything, we can use a union with a new member name.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmLoNdcACgkQRwW0y0cG
2zEVeg//QYJ3j2pbKt9zB6muO3SkrNoMPc5wpY/SITUeiDscukLvGzJG88eIZskl
NaEjbmacHmdlQrBkUdr10i1+hkb2zRd6/j42GIDXEhhKTMoT2UxJCBp47KSvd7VY
dKNLGsgQs3kwmmxLEGu6w6vywWpI5wxXTKWL1Q/RpUXoOnLmsMEbzKTjf12a1Edl
9gPNY+tMHIHyB0pGIRXDY/ZF5c+FcRFn6kKeMVzJL0bnX7FI4UmYe83k9ajEiLWA
MD3JAw/mNv2X0nizHHuQHIjtky8Pr+E8hKs5ni88vMYmFqeABsTw4R1LJykv/mYa
NakU1j9tHYTKcs2Ju+gIvSKvmatKGNmOpti/8RAjEX1YY4cHlHWNsigVbVRLqfo7
SKImlSUxOPGFS3HAJQCC9P/oZgICkUdD6sdLO1PVBnE1G3Fvxg5z6fGcdEuEZkVR
PQwlYDm1nlTuScbkgVSBzyU/AkntVMJTuPWgbpNo+VgSXWZ8T/U8II0eGrFVf9rH
+y5dAS52/bi6OP0la7fNZlq7tcPfNG9HJlPwPb1kQtuPT4m6CBhth/rRrDJwx8za
0cpJT75Q3CI0wLZ7GN4yEjtNQrlAeeiYiS4LMQ/SFFtg1KzvmYYVmWDhOf0+mMDA
f7bq4cxEg2LHwrhRgQQWowFVBu7yeiwKbcj9sybfA27bMqCtfto=
=8yMq
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull uapi flexible array update from Gustavo Silva:
"A treewide patch that replaces zero-length arrays with flexible-array
members in UAPI. This has been baking in linux-next for 5 weeks now.
'-fstrict-flex-arrays=3' is coming and we need to land these changes
to prevent issues like these in the short future:
fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source]
strcpy(de3->name, ".");
^
Since these are all [0] to [] changes, the risk to UAPI is nearly
zero. If this breaks anything, we can use a union with a new member
name"
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
* tag 'flexible-array-transformations-UAPI-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: uapi: Replace zero-length arrays with flexible-array members
To pick the changes from:
28a99e95f5 ("x86/amd: Use IBPB for firmware calls")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org
Link: https://lore.kernel.org/lkml/Yt6oWce9UDAmBAtX@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>