12960 Коммитов
Автор | SHA1 | Сообщение | Дата |
---|---|---|---|
Alexandre Truong | ab23692134 |
perf script: Use callchain_param_setup() instead of open coded equivalent
Refactoring script__setup_sample_type() by using callchain_param_setup() to replace the duplicate code for callchain parameter setting up. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211217154521.80603-4-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Alexandre Truong | 32bfa5bf71 |
perf machine: Add a mechanism to inject stack frames
Add a mechanism for platforms to inject stack frames for the leaf frame caller if there is enough information to determine a frame is missing from dwarf or other post processing mechanisms. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211217154521.80603-3-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Alexandre Truong | 7248e308a5 |
perf tools: Record ARM64 LR register automatically
On ARM64, automatically record the link register if the frame pointer mode is on. It will be used to do a dwarf unwind to find the caller of the leaf frame if the frame pointer was omitted. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211217154521.80603-2-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Carsten Haitzler | f8464e084d |
perf test: Use 3 digits for test numbering now we can have more tests
This is in preparation for adding more tests that will need the test number to be 3 digts so they align nicely in the output. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Link: http://lore.kernel.org/lkml/20211215160403.69264-3-carsten.haitzler@foss.arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
German Gomez | ff8752d761 |
perf arm-spe: Synthesize SPE instruction events
Synthesize instruction events for every ARM SPE record. Arm SPE implements a hardware-based sample period, and perf implements a software-based one. Add a warning message to inform the user of this. Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211216152404.52474-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Thomas Richter | a840974e96 |
perf test: Test 73 Sig_trap fails on s390
In Linux next commit |
|
Namhyung Kim | 9c5c605219 |
perf ftrace: Implement cpu and task filters in BPF
Honor cpu and task options to set up filters (by pid or tid) in the BPF program. For example, the following command will show latency of the mutex_lock for process 2570. # perf ftrace latency -b -T mutex_lock -p 2570 sleep 3 # DURATION | COUNT | GRAPH | 0 - 1 us | 675 | ############################## | 1 - 2 us | 9 | | 2 - 4 us | 0 | | 4 - 8 us | 0 | | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | Committer testing: Looking at faults on a firefox process: # strace -e bpf perf ftrace latency -b -p 1674378 -T __handle_mm_fault bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffee1fee740, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\08\0\0\08\0\0\0\t\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=89, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\f\0\0\0\f\0\0\0\7\0\0\0\1\0\0\0\0\0\0\20"..., btf_log_buf=NULL, btf_size=43, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\5\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=77, btf_log_size=0, btf_log_level=0}, 128) = -1 EINVAL (Invalid argument) bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0 \3\0\0 \3\0\0\306\3\0\0\0\0\0\0\0\0\0\2"..., btf_log_buf=NULL, btf_size=1790, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=32, max_entries=1, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=5, insns=0x7ffee1fee570, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffee1fee3c0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="test", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10000, map_flags=0, inner_map_fd=0, map_name="functime", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="cpu_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=36, map_flags=0, inner_map_fd=0, map_name="task_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 6 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_ARRAY, key_size=4, value_size=8, max_entries=22, map_flags=0, inner_map_fd=0, map_name="latency", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 7 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=12, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="func_lat.bss", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=32, btf_vmlinux_value_type_id=0}, 128) = 8 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=8, key=0x7ffee1fee580, value=0x7f01d940a000, flags=BPF_ANY}, 128) = 0 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=42, insns=0x1871f30, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_begin", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x18746a0, func_info_cnt=1, line_info_rec_size=16, line_info=0x1874550, line_info_cnt=20, attach_btf_id=0, attach_prog_fd=0}, 128) = 9 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=99, insns=0x18769b0, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_end", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x188a640, func_info_cnt=1, line_info_rec_size=16, line_info=0x188a660, line_info_cnt=20, attach_btf_id=0, attach_prog_fd=0}, 128) = 10 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=2, insns=0x7ffee1fee3c0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 12 bpf(BPF_LINK_CREATE, {link_create={prog_fd=12, target_fd=-1, attach_type=0x29 /* BPF_??? */, flags=0}}, 128) = -1 EINVAL (Invalid argument) ^Cstrace: Process 1702285 detached # DURATION | COUNT | GRAPH | 0 - 1 us | 109 | ################# | 1 - 2 us | 127 | ################### | 2 - 4 us | 36 | ##### | 4 - 8 us | 20 | ### | 8 - 16 us | 2 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 177f4eac7f |
perf ftrace: Add -b/--use-bpf option for latency subcommand
The -b/--use-bpf option is to use BPF to get latency info of kernel functions. It'd have better performance impact and I observed that latency of same function is smaller than before when using BPF. Committer testing: # strace -e bpf perf ftrace latency -b -T __handle_mm_fault -a sleep 1 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7fff51914e00, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\08\0\0\08\0\0\0\t\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=89, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\f\0\0\0\f\0\0\0\7\0\0\0\1\0\0\0\0\0\0\20"..., btf_log_buf=NULL, btf_size=43, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\5\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=77, btf_log_size=0, btf_log_level=0}, 128) = -1 EINVAL (Invalid argument) bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\350\2\0\0\350\2\0\0\353\2\0\0\0\0\0\0\0\0\0\2"..., btf_log_buf=NULL, btf_size=1515, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=32, max_entries=1, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=5, insns=0x7fff51914c30, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7fff51914a80, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="test", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10000, map_flags=0, inner_map_fd=0, map_name="functime", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="cpu_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="task_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 7 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_ARRAY, key_size=4, value_size=8, max_entries=22, map_flags=0, inner_map_fd=0, map_name="latency", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 8 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="func_lat.bss", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=30, btf_vmlinux_value_type_id=0}, 128) = 9 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=9, key=0x7fff51914c40, value=0x7f6e99be2000, flags=BPF_ANY}, 128) = 0 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=18, insns=0x11e4160, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_begin", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x11dfc50, func_info_cnt=1, line_info_rec_size=16, line_info=0x11e04c0, line_info_cnt=9, attach_btf_id=0, attach_prog_fd=0}, 128) = 10 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=99, insns=0x11ded70, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_end", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x11dfc70, func_info_cnt=1, line_info_rec_size=16, line_info=0x11f6e10, line_info_cnt=20, attach_btf_id=0, attach_prog_fd=0}, 128) = 11 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=2, insns=0x7fff51914a80, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 13 bpf(BPF_LINK_CREATE, {link_create={prog_fd=13, target_fd=-1, attach_type=0x29 /* BPF_??? */, flags=0}}, 128) = -1 EINVAL (Invalid argument) --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1699992, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 # DURATION | COUNT | GRAPH | 0 - 1 us | 52 | ################### | 1 - 2 us | 36 | ############# | 2 - 4 us | 24 | ######### | 4 - 8 us | 7 | ## | 8 - 16 us | 1 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | +++ exited with 0 +++ # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-5-namhyung@kernel.org [ Add missing util/cpumap.h include and removed unused 'fd' variable ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 53be502822 |
perf ftrace: Add 'latency' subcommand
The perf ftrace latency is to get a histogram of function execution time. Users should give a function name using -T option. This is implemented using function_graph tracer with the given function only. And it parses the output to extract the time. $ sudo perf ftrace latency -a -T mutex_lock sleep 1 # DURATION | COUNT | GRAPH | 0 - 1 us | 4596 | ######################## | 1 - 2 us | 1680 | ######### | 2 - 4 us | 1106 | ##### | 4 - 8 us | 546 | ## | 8 - 16 us | 562 | ### | 16 - 32 us | 1 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | Committer testing: Latency for the __handle_mm_fault kernel function, system wide for 1 second, see how one can go from the usual 'perf ftrace' output, now the same as for the 'perf ftrace trace' subcommand, to the new 'perf ftrace latency' subcommand: # perf ftrace -T __handle_mm_fault -a sleep 1 | wc -l 709 # perf ftrace -T __handle_mm_fault -a sleep 1 | wc -l 510 # perf ftrace -T __handle_mm_fault -a sleep 1 | head -20 # tracer: function # # entries-in-buffer/entries-written: 0/0 #P:32 # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | perf-exec-1685104 [007] 90638.894613: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894620: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894622: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894635: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894688: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894702: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894714: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894728: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894740: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894751: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894962: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894977: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894983: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894995: __handle_mm_fault <-handle_mm_fault # perf ftrace latency -T __handle_mm_fault -a sleep 1 # DURATION | COUNT | GRAPH | 0 - 1 us | 125 | ###### | 1 - 2 us | 249 | ############# | 2 - 4 us | 455 | ######################## | 4 - 8 us | 37 | # | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | a9b8ae8ae3 |
perf ftrace: Move out common code from __cmd_ftrace
The signal setup code and evlist__prepare_workload() can be used for other subcommands. Let's move them out of the __cmd_ftrace(). Then it doesn't need to pass argc and argv. On the other hand, select_tracer() is specific to the 'trace' subcommand so it'd better moving it into the __cmd_ftrace(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 416e15ad17 |
perf ftrace: Add 'trace' subcommand
This is a preparation to add more sub-commands for ftrace. The 'trace' subcommand does the same thing when no subcommand is given. Committer testing: The previous mode, i.e. no subcommand and the new 'perf ftrace trace' are equivalent: # perf ftrace -G check_preempt_curr sleep 0.00001 # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 25) | check_preempt_curr() { 25) | resched_curr() { 25) | native_smp_send_reschedule() { 25) | default_send_IPI_single_phys() { 25) 0.110 us | __default_send_IPI_dest_field(); 25) 0.490 us | } 25) 0.640 us | } 25) 0.850 us | } 25) 2.060 us | } # perf ftrace trace -G check_preempt_curr sleep 0.00001 # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 10) | check_preempt_curr() { 10) | resched_curr() { 10) | native_smp_send_reschedule() { 10) | default_send_IPI_single_phys() { 10) 0.080 us | __default_send_IPI_dest_field(); 10) 0.460 us | } 10) 0.610 us | } 10) 0.830 us | } 10) 2.020 us | } # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
German Gomez | 83869019c7 |
perf arch: Support register names from all archs
When reading a perf.data file with register values, there is a mismatch between the names and the values of the registers because the tool is built using only the register names from the local architecture. Reading a perf.data file that was recorded on ARM64, gives the following erroneous output on an X86 machine: # perf report -i perf_arm64.data -D [...] 24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0 ... user regs: mask 0x1ffffffff ABI 64-bit .... AX 0x0000ffffd1515817 .... BX 0x0000ffffd1515480 .... CX 0x0000aaaadabf6c80 .... DX 0x000000000000002e .... SI 0x0000000040100401 .... DI 0x0040600200000080 .... BP 0x0000ffffd1510e10 .... SP 0x0000000000000000 .... IP 0x00000000000000dd .... FLAGS 0x0000ffffd1510cd0 .... CS 0x0000000000000000 .... SS 0x0000000000000030 .... DS 0x0000ffffa569a208 .... ES 0x0000000000000000 .... FS 0x0000000000000000 .... GS 0x0000000000000000 .... R8 0x0000aaaad3de9650 .... R9 0x0000ffffa57397f0 .... R10 0x0000000000000001 .... R11 0x0000ffffa57fd000 .... R12 0x0000ffffd1515817 .... R13 0x0000ffffd1515480 .... R14 0x0000aaaadabf6c80 .... R15 0x0000000000000000 .... unknown 0x0000000000000001 .... unknown 0x0000000000000000 .... unknown 0x0000000000000000 .... unknown 0x0000000000000000 .... unknown 0x0000000000000000 .... unknown 0x0000ffffd1510d90 .... unknown 0x0000ffffa5739b90 .... unknown 0x0000ffffd1510d80 .... XMM0 0x0000ffffa57392c8 ... thread: perf-exec:43239 ...... dso: [kernel.kallsyms] As can be seen, the register names correspond to X86 registers, even though the perf.data file was recorded on an ARM64 system. After this patch, the output of the command displays the correct register names: # perf report -i perf_arm64.data -D [...] 24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0 ... user regs: mask 0x1ffffffff ABI 64-bit .... x0 0x0000ffffd1515817 .... x1 0x0000ffffd1515480 .... x2 0x0000aaaadabf6c80 .... x3 0x000000000000002e .... x4 0x0000000040100401 .... x5 0x0040600200000080 .... x6 0x0000ffffd1510e10 .... x7 0x0000000000000000 .... x8 0x00000000000000dd .... x9 0x0000ffffd1510cd0 .... x10 0x0000000000000000 .... x11 0x0000000000000030 .... x12 0x0000ffffa569a208 .... x13 0x0000000000000000 .... x14 0x0000000000000000 .... x15 0x0000000000000000 .... x16 0x0000aaaad3de9650 .... x17 0x0000ffffa57397f0 .... x18 0x0000000000000001 .... x19 0x0000ffffa57fd000 .... x20 0x0000ffffd1515817 .... x21 0x0000ffffd1515480 .... x22 0x0000aaaadabf6c80 .... x23 0x0000000000000000 .... x24 0x0000000000000001 .... x25 0x0000000000000000 .... x26 0x0000000000000000 .... x27 0x0000000000000000 .... x28 0x0000000000000000 .... x29 0x0000ffffd1510d90 .... lr 0x0000ffffa5739b90 .... sp 0x0000ffffd1510d80 .... pc 0x0000ffffa57392c8 ... thread: perf-exec:43239 ...... dso: [kernel.kallsyms] Tester comments: Athira reports: "Looks good to me. Tested this patchset in powerpc by capturing regs in powerpc and doing perf report to read the data from x86." Reported-by: Alexandre Truong <alexandre.truong@arm.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211207180653.1147374-4-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
German Gomez | d3b58af9a8 |
perf arm64: Rename perf_event_arm_regs for ARM64 registers
The registers for ARM and ARM64 are enumerated using two enums that have the same name. In order to be able to import both headers, the name of one can be replaced using the C preprocessor like so: #define perf_event_arm_regs perf_event_arm64_regs #include <asm/perf_regs.h> #undef perf_event_arm_regs This patch updates all imports of ARM64's perf_regs.h in order to prevent the naming collision. Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211207180653.1147374-3-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Leo Yan | 5d28a17c1c |
perf namespaces: Add helper nsinfo__is_in_root_namespace()
Refactors code for gathering PID infos, it creates the function nsinfo__get_nspid() to parse process 'status' node in folder '/proc'. Base on the refactoring, this patch introduces a new helper nsinfo__is_in_root_namespace(), it returns true when the caller runs in the root PID namespace. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20211212134721.1721245-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Miaoqian Lin | 8acf3793ea |
perf bpf-loader: Use IS_ERR_OR_NULL() to clean code and fix check
Use IS_ERR_OR_NULL() to make the code cleaner. Also if the priv is NULL, it's improper to call PTR_ERR(priv). Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Cc: unlisted-recipients Link: http://lore.kernel.org/lkml/20211212135613.20000-1-linmq006@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
James Clark | 7cc9680c4b |
perf cs-etm: Remove duplicate and incorrect aux size checks
There are two checks, one is for size when running without admin, but this one is covered by the driver and reported on in more detail here (builtin-record.c): pr_err("Permission error mapping pages.\n" "Consider increasing " "/proc/sys/kernel/perf_event_mlock_kb,\n" "or try again with a smaller value of -m/--mmap_pages.\n" "(current value: %u,%u)\n", This had the effect of artificially limiting the aux buffer size to a value smaller than what was allowed because perf_event_mlock_kb wasn't taken into account. The second is to check for a power of two, but this is covered here (evlist.c): pr_info("rounding mmap pages size to %s (%lu pages)\n", buf, pages); Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211208115435.610101-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Andrew Kilroy | 6732f10b11 |
perf vendor events: Rename arm64 arch std event files
A previous commit adds pmu events into the files armv8-common-and-microarch.json armv8-recommended.json that are actually specified in an armv9 reference supplement, not armv8. As such, naming the files with the armv8 prefix seems artificial. This patch renames the files to reflect that these two files are for arch std events regardless of whether they are defined in armv8 or armv9. Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211210123706.7490-3-andrew.kilroy@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Andrew Kilroy | 3987d65f45 |
perf vendor events: For the Arm Neoverse N2
Updates the common and microarch json file to add counters available in the Arm Neoverse N2 chip, but should also apply to other ArmV8 and ArmV9 cpus. Specified in ArmV8 architecture reference manual https://developer.arm.com/documentation/ddi0487/gb/?lang=en Some of the counters added to armv8-common-and-microarch.json are specified in the ArmV9 architecture reference manual supplement (issue A.a): https://developer.arm.com/documentation/ddi0608/aa The additional ArmV9 counters are TRB_WRAP TRCEXTOUT0 TRCEXTOUT1 TRCEXTOUT2 TRCEXTOUT3 CTI_TRIGOUT4 CTI_TRIGOUT5 CTI_TRIGOUT6 CTI_TRIGOUT7 This patch also adds files in pmu-events/arch/arm64/arm/neoverse-n2 for perf list to output the counter names in categories. Counters on the Neoverse N2 are stated in its reference manual: https://developer.arm.com/documentation/102099/0000 Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211210123706.7490-2-andrew.kilroy@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Salvatore Bonaccorso | 888569dbcd |
perf dlfilter: Drop unused variable
Compiling tools/perf/dlfilters/dlfilter-test-api-v0.c result in: checking for stdlib.h... dlfilters/dlfilter-test-api-v0.c: In function ‘filter_event’: dlfilters/dlfilter-test-api-v0.c:311:29: warning: unused variable ‘d’ [-Wunused-variable] 311 | struct filter_data *d = data; | So remove the variable now. Reviewed-by: German Gomez <german.gomez@arm.com> Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20211123211821.132924-1-carnil@debian.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | b0fde9c6e2 |
perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHT
Use total latency info in the SPE counter packet as sample weight so that we can see it in local_weight and (global) weight sort keys. Maybe we can use PERF_SAMPLE_WEIGHT_STRUCT to support ins_lat as well but I'm not sure which latency it matches. So just adding total latency first. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211201220855.1260688-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Sohaib Mohamed | f0a29c9647 |
perf bench: Use unbuffered output when pipe/tee'ing to a file
The output of 'perf bench' gets buffered when I pipe it to a file or to tee, in such a way that I can see it only at the end. E.g. $ perf bench internals synthesize -t < output comes out fine after each test run > $ perf bench internals synthesize -t | tee file.txt < output comes out only at the end of all tests > This patch resolves this issue for 'bench' and 'test' subcommands. See, also: $ perf bench mem all | tee file.txt $ perf bench sched all | tee file.txt $ perf bench internals all -t | tee file.txt $ perf bench internals all | tee file.txt Committer testing: It really gets staggered, i.e. outputs in bursts, when the buffer fills up and has to be drained to make up space for more output. Suggested-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Sohaib Mohamed <sohaib.amhmd@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211119061409.78004-1-sohaib.amhmd@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 39f054a98a |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Miaoqian Lin | 9937e8daab |
perf python: Fix NULL vs IS_ERR_OR_NULL() checking
The function trace_event__tp_format_id may return ERR_PTR(-ENOMEM). Use IS_ERR_OR_NULL to check tp_format. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: http://lore.kernel.org/lkml/20211211053856.19827-1-linmq006@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Adrian Hunter | 6665b8e483 |
perf intel-pt: Fix error timestamp setting on the decoder error path
An error timestamp shows the last known timestamp for the queue, but this
is not updated on the error path. Fix by setting it.
Fixes:
|
|
Adrian Hunter | a882cc9497 |
perf intel-pt: Fix missing 'instruction' events with 'q' option
FUP packets contain IP information, which makes them also an 'instruction'
event in 'hop' mode i.e. the itrace 'q' option. That wasn't happening, so
restructure the logic so that FUP events are added along with appropriate
'instruction' and 'branch' events.
Fixes:
|
|
Adrian Hunter | a32e6c5da5 |
perf intel-pt: Fix next 'err' value, walking trace
Code after label 'next:' in intel_pt_walk_trace() assumes 'err' is zero,
but it may not be, if arrived at via a 'goto'. Ensure it is zero.
Fixes:
|
|
Adrian Hunter | c79ee2b216 |
perf intel-pt: Fix state setting when receiving overflow (OVF) packet
An overflow (OVF packet) is treated as an error because it represents a
loss of trace data, but there is no loss of synchronization, so the packet
state should be INTEL_PT_STATE_IN_SYNC not INTEL_PT_STATE_ERR_RESYNC.
To support that, some additional variables must be reset, and the FUP
packet that may follow OVF is treated as an FUP event.
Fixes:
|
|
Adrian Hunter | 4c761d805b |
perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state type
intel_pt_fup_event() assumes it can overwrite the state type if there has
been an FUP event, but this is an unnecessary and unexpected constraint on
callers.
Fix by touching only the state type flags that are affected by an FUP
event.
Fixes:
|
|
Adrian Hunter | ad106a26ae |
perf intel-pt: Fix sync state when a PSB (synchronization) packet is found
When syncing, it may be that branch packet generation is not enabled at
that point, in which case there will not immediately be a control-flow
packet, so some packets before a control flow packet turns up, get
ignored. However, the decoder is in sync as soon as a PSB is found, so
the state should be set accordingly.
Fixes:
|
|
Adrian Hunter | 057ae59f5a |
perf intel-pt: Fix some PGE (packet generation enable/control flow packets) usage
Packet generation enable (PGE) refers to whether control flow (COFI) packets are being produced. PGE may be false even when branch-tracing is enabled, due to being out-of-context, or outside a filter address range. Fix some missing PGE usage. Fixes: |
|
German Gomez | c897899752 |
perf tools: Prevent out-of-bounds access to registers
The size of the cache of register values is arch-dependant (PERF_REGS_MAX). This has the potential of causing an out-of-bounds access in the function "perf_reg_value" if the local architecture contains less registers than the one the perf.data file was recorded on. Since the maximum number of registers is bound by the bitmask "u64 cache_mask", and the size of the cache when running under x86 systems is 64 already, fix the size to 64 and add a range-check to the function "perf_reg_value" to prevent out-of-bounds access. Reported-by: Alexandre Truong <alexandre.truong@arm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211201123334.679131-2-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Andrew Kilroy | 8ff4f20f3e |
perf vendor events arm64: Fix JSON indentation to 4 spaces standard
Correct indentation to 4 spaces, same as the other JSON files. Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20211203123525.31127-2-andrew.kilroy@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jin Yao | e69dc84282 |
perf stat: Support --cputype option for hybrid events
In previous patch, we have supported the syntax which enables the event on a specified pmu, such as: cpu_core/<event>/ cpu_atom/<event>/ While this syntax is not very easy for applying on a set of events or applying on a group. In following example, we have to explicitly assign the pmu prefix. # ./perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -- sleep 1 Performance counter stats for 'sleep 1': 1,158,545 cpu_core/cycles/ 1,003,113 cpu_core/instructions/ 1.002428712 seconds time elapsed A much easier way is: # ./perf stat --cputype core -e '{cycles,instructions}' -- sleep 1 Performance counter stats for 'sleep 1': 1,101,071 cpu_core/cycles/ 939,892 cpu_core/instructions/ 1.002363142 seconds time elapsed For this example, the '--cputype' enables the events from specified pmu (cpu_core). If '--cputype' conflicts with pmu prefix, '--cputype' is ignored. # ./perf stat --cputype core -e cycles,cpu_atom/instructions/ -a -- sleep 1 Performance counter stats for 'system wide': 21,003,407 cpu_core/cycles/ 367,886 cpu_atom/instructions/ 1.002203520 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210909062215.10278-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Uwe Kleine-König | ed17b19149 |
perf tools: Drop requirement for libstdc++.so for libopencsd check
It's possible to link against libopencsd_c_api without having
libstdc++.so available, only libstdc++.so.6.0.28 (or whatever version is
in use) needs to be available. The same holds true for libopencsd.so.
When -lstdc++ (or -lopencsd) is explicitly passed to the linker however
the .so file must be available.
So wrap adding the dependencies into a check for static linking that
actually requires adding them all. The same construct is already used
for some other tests in the same file to reduce dependencies in the
dynamic linking case.
Fixes:
|
|
Ian Rogers | 94dbfd6781 |
perf parse-events: Architecture specific leader override
Currently topdown events must appear after a slots event: $ perf stat -e '{slots,topdown-fe-bound}' /bin/true Performance counter stats for '/bin/true': 3,183,090 slots 986,133 topdown-fe-bound Reversing the events yields: $ perf stat -e '{topdown-fe-bound,slots}' /bin/true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-fe-bound). For metrics the order of events is determined by iterating over a hashmap, and so slots isn't guaranteed to be first which can yield this error. Change the set_leader in parse-events, called when a group is closed, so that rather than always making the first event the leader, if the slots event exists then it is made the leader. It is then moved to the head of the evlist otherwise it won't be opened in the correct order. The result is: $ perf stat -e '{topdown-fe-bound,slots}' /bin/true Performance counter stats for '/bin/true': 3,274,795 slots 1,001,702 topdown-fe-bound A problem with this approach is the slots event is identified by name, names can be overwritten like 'cpu/slots,name=foo/' and this causes the leader change to fail. The change also modifies and fixes mixed groups like, with the change: $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2 Performance counter stats for 'system wide': 5574985410 slots 971981616 instructions 1348461887 topdown-fe-bound 2.001263120 seconds time elapsed Without the change: $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2 Performance counter stats for 'system wide': <not counted> instructions <not counted> slots <not supported> topdown-fe-bound 2.006247990 seconds time elapsed Something that may be undesirable here is that the events are reordered in the output. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vineet Singh <vineet.singh@intel.com> Link: http://lore.kernel.org/lkml/20211130174945.247604-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | ecdcf630d7 |
perf evlist: Allow setting arbitrary leader
The leader of a group is the first, but allow it to be an arbitrary list member so that for Intel topdown events slots may always be the group leader. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vineet Singh <vineet.singh@intel.com> Link: http://lore.kernel.org/lkml/20211130174945.247604-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | 6b6b16b3bb |
perf metric: Reduce multiplexing with duration_time
It is common to use the same counters with and without duration_time. The ID sharing code treats duration_time as if it were a hardware event placed in the same group. This causes unnecessary multiplexing such as in the following example where l3_cache_access isn't shared: $ perf stat -M l3 -a sleep 1 Performance counter stats for 'system wide': 3,117,007 l3_cache_miss # 199.5 MB/s l3_rd_bw # 43.6 % l3_hits # 56.4 % l3_miss (50.00%) 5,526,447 l3_cache_access (50.00%) 5,392,435 l3_cache_access # 5389191.2 access/s l3_access_rate (50.00%) 1,000,601,901 ns duration_time 1.000601901 seconds time elapsed Fix this by placing duration_time in all groups unless metric sharing has been disabled on the command line: $ perf stat -M l3 -a sleep 1 Performance counter stats for 'system wide': 3,597,972 l3_cache_miss # 230.3 MB/s l3_rd_bw # 48.0 % l3_hits # 52.0 % l3_miss 6,914,459 l3_cache_access # 6909935.9 access/s l3_access_rate 1,000,654,579 ns duration_time 1.000654579 seconds time elapsed $ perf stat --metric-no-merge -M l3 -a sleep 1 Performance counter stats for 'system wide': 3,501,834 l3_cache_miss # 53.5 % l3_miss (24.99%) 6,548,173 l3_cache_access (24.99%) 3,417,622 l3_cache_miss # 45.7 % l3_hits (25.04%) 6,294,062 l3_cache_access (25.04%) 5,923,238 l3_cache_access # 5919688.1 access/s l3_access_rate (24.99%) 1,000,599,683 ns duration_time 3,607,486 l3_cache_miss # 230.9 MB/s l3_rd_bw (49.97%) 1.000599683 seconds time elapsed v2. Doesn't count duration_time in the metric_list_cmp function that sorts larger metrics first. Without this a metric with duration_time and an event is sorted the same as a metric with two events, possibly not allowing the first metric to share with the second. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211124015226.3317994-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Gang Li | b4515ad6e1 |
perf trace: Enable ignore_missing_thread for trace
perf already support ignore_missing_thread for -u/-p, but not yet applied to `perf trace`. This patch enables ignore_missing_thread for `perf trace`. Signed-off-by: Gang Li <ligang.bdlg@bytedance.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org Link: http://lkml.kernel.org/r/1513148513-6974-1-git-send-email-zhangmengting@huawei.com Link: http://lore.kernel.org/lkml/20211123074018.11406-1-ligang.bdlg@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Sandipan Das | 7a2e14962c |
perf docs: Update link to AMD documentation
This updates the link to documentation on AMD processors. The new link points to a page where users can find the Processor Programming Reference (PPR) documents for the family and model codes corresponding to processors they are using. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Robert Richter <rrichter@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20211123084613.243792-2-sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Sandipan Das | 4edb117e64 |
perf docs: Add info on AMD raw event encoding
AMD processors have events with event select codes and unit masks larger than a byte. The core PMU, for example, uses 12-bit event select codes split between bits 0-7 and 32-35 of the PERF_CTL MSRs as can be seen from /sys/bus/event_sources/devices/cpu/format/*. The Processor Programming Reference (PPR) lists the event codes as unified 12-bit hexadecimal values instead and the split between the bits is not apparent to someone who is not aware of the layout of the PERF_CTL MSRs. 8-bit event select codes continue to work as the layout matches that of the PERF_CTL MSRs i.e. bits 0-7 for event select and 8-15 for unit mask. This adds more details in the perf man pages about using /sys/bus/event_sources/devices/*/format/* for determining the correct raw event encoding scheme. E.g. the "op_cache_hit_miss.op_cache_hit" event with code 0x28f and umask 0x03 can be programmed using its symbolic name as: $ sudo perf --debug perf-event-open stat -e op_cache_hit_miss.op_cache_hit sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 size 128 config 0x20000038f sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ [...] One might use a simple eventsel+umask combination based on what the current man pages say and incorrectly program the event as: $ sudo perf --debug perf-event-open stat -e r0328f sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 size 128 config 0x328f sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ [...] When it should have been based on the format from sysfs: $ cat /sys/bus/event_source/devices/cpu/format/event config:0-7,32-35 $ sudo perf --debug perf-event-open stat -e r20000038f sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 size 128 config 0x20000038f sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ [...] Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Robert Richter <rrichter@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20211123084613.243792-1-sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Shunsuke Nakamura | 9a5b2d1afa |
libperf: Adopt perf_counts_values__scale() from tools/perf/util
Move perf_counts_values__scale() from tools/perf/util to tools/lib/perf so that it can be used with libperf. Committer notes: As noted by Jiri, use __s8 instead of s8 on the exported function. Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211109085831.3770594-2-nakamura.shun@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
John Garry | c77a78c291 |
tools build: Enable warnings through HOSTCFLAGS
The tools build system uses KBUILD_HOSTCFLAGS symbol for obvious purposes. However this is not set for anything under tools/ As such, host tools apps built have no compiler warnings enabled. Declare HOSTCFLAGS for perf tools build, and also use that symbol in declaration of host_c_flags. HOSTCFLAGS comes from EXTRA_WARNINGS, which is independent of target platform/arch warning flags. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Laura Abbott <labbott@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1635525041-151876-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | e9c08f7229 |
perf test sigtrap: Print errno string when failing
Helps a bit the user figuring out why it is failing: Before: $ perf test sigtrap 73: Sigtrap : FAILED! $ perf test -v sigtrap 73: Sigtrap : --- start --- test child forked, pid 3816772 FAILED sys_perf_event_open() test child finished with -1 ---- end ---- Sigtrap: FAILED! $ After: $ perf test sigtrap 73: Sigtrap : FAILED! $ perf test -v sigtrap 73: Sigtrap : --- start --- test child forked, pid 3816772 FAILED sys_perf_event_open(): Permission denied test child finished with -1 ---- end ---- Sigtrap: FAILED! $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kasan-dev@googlegroups.com Link: http://lore.kernel.org/lkml/YZOpSVOCXe0zWeRs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Marco Elver | 5504f67944 |
perf test sigtrap: Add basic stress test for sigtrap handling
Add basic stress test for sigtrap handling as a perf tool built-in test. This allows sanity checking the basic sigtrap functionality from within the perf tool. Committer notes: Reported that !root was getting -EPERM, applied a fixup from Marco to set .exclude_{hv,kernel} that made it work. Signed-off-by: Marco Elver <elver@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kasan-dev@googlegroups.com Link: http://lore.kernel.org/lkml/20211115112822.4077224-1-elver@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Song Liu | 5a897531e0 |
perf bpf_skel: Do not use typedef to avoid error on old clang
When building bpf_skel with clang-10, typedef causes confusions like:
libbpf: map 'prev_readings': unexpected def kind var.
Fix this by removing the typedef.
Fixes:
|
|
Song Liu | f7c4e85bcc |
perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
Arnaldo reported that building all his containers with BUILD_BPF_SKEL=1 to then make this the default he found problems in some distros where the system linux/bpf.h file was being used and lacked this: util/bpf_skel/bperf_leader.bpf.c:13:20: error: use of undeclared identifier 'BPF_F_PRESERVE_ELEMS' __uint(map_flags, BPF_F_PRESERVE_ELEMS); So use instead the vmlinux.h file generated by bpftool from BTF info. This fixed these as well, getting the build back working on debian:11, debian:experimental and ubuntu:21.10: In file included from In file included from util/bpf_skel/bperf_leader.bpf.cutil/bpf_skel/bpf_prog_profiler.bpf.c::33: : In file included from In file included from /usr/include/linux/bpf.h/usr/include/linux/bpf.h::1111: : /usr/include/linux/types.h/usr/include/linux/types.h::55::1010:: In file included from util/bpf_skel/bperf_follower.bpf.c:3fatal errorfatal error: : : In file included from /usr/include/linux/bpf.h:'asm/types.h' file not found11'asm/types.h' file not found: /usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found #include <asm/types.h>#include <asm/types.h> ^~~~~~~~~~~~~ ^~~~~~~~~~~~~ #include <asm/types.h> ^~~~~~~~~~~~~ 1 error generated. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/CF175681-8101-43D1-ABDB-449E644BE986@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | 4747395082 |
perf header: Fix memory leaks when processing feature headers
These leaks were found with leak sanitizer running "perf pipe recording and injection test". In pipe mode feat_fd may hold onto an events struct that needs freeing. When string features are processed they may overwrite an already created string, so free this before the overwrite. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211118201730.2302927-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | 1aa79e5773 |
perf test: Reset shadow counts before loading
Otherwise load counting is an average. Without this change duration_time in test_memory_bandwidth will alter its value if an earlier test contains duration_time. This patch fixes an issue that's introduced in the proposed patch: https://lore.kernel.org/lkml/20211124015226.3317994-1-irogers@google.com/ in perf test "Parse and process metrics". Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211128085810.4027314-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Thomas Richter | 6c481031c9 |
perf test: Fix 'Simple expression parser' test on arch without CPU die topology info
Some platforms do not have CPU die support, for example s390. Commit Cc: Ian Rogers <irogers@google.com> Fixes: |
|
Arnaldo Carvalho de Melo | 3d1d57debe |
tools build: Remove needless libpython-version feature check that breaks test-all fast path
Since |