Граф коммитов

13493 Коммитов

Автор SHA1 Сообщение Дата
Ian Rogers 8252e7917e perf test: Parse events tidy evlist_test
Remove two unused variables. Make structs const. Also fix the array
index (aka id) for the event software/r0x1a/.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220518042027.836799-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:04:55 -03:00
Ian Rogers b58eca408c perf test: Parse events tidy terms_test
Remove an unused variables. Make structs const. Fix checkpatch issue wrt
unsigned not being with an int.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220518042027.836799-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:04:46 -03:00
Ian Rogers 7312c36ce6 perf test: Basic mmap use skip
If opening the event fails for basic mmap with EACCES it is more
likely permission related that a true error. Mark the test as skip
in this case and add a skip reason.

Committer testing:

Before:

  $ perf test "mmap interface"
    4: Read samples using the mmap interface           : FAILED!
  $

After:

  $ perf test "mmap interface"
    4: Read samples using the mmap interface           : Skip (permissions)
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220518042027.836799-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:03:39 -03:00
Ian Rogers f9b10c82fa perf test: Use skip in openat syscall
Failures to open the tracepoint cause this test to fail, however,
typically such failures are permission related. Lower the failure to
just skipping the test in those cases and add a skip reason.

Committer testing:

Before:

  $ perf test "openat syscall"
    2: Detect openat syscall event                        : FAILED!
    3: Detect openat syscall event on all cpus            : FAILED!
  $

After:

  $ perf test "openat syscall"
    2: Detect openat syscall event                        : Skip (permissions)
    3: Detect openat syscall event on all cpus            : Skip (permissions)
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220518042027.836799-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:01:57 -03:00
Ian Rogers 740f8a8241 perf test: Use skip in vmlinux kallsyms
Currently failures in reading vmlinux or kallsyms result in a test
failure. However, the failure is typically permission related. Prefer to
flag these failures as skip.

Committer testing:

Before:

  $ perf test vmlinux
    1: vmlinux symtab matches kallsyms                 : FAILED!
  $

After:

  $ perf test vmlinux
    1: vmlinux symtab matches kallsyms                 : Skip
  $

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220518042027.836799-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:01:15 -03:00
Ian Rogers cfa5013a41 perf test: Skip reason for suites with 1 test
When a suite has just 1 subtest, the subtest number is given as -1 to
avoid indented printing. When this subtest number is seen for the skip
reason, use the reason of the first test.

Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220518042027.836799-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 10:00:43 -03:00
Ian Rogers 0b9462d0ac perf stat: Make use of index clearer with perf_counts
Try to disambiguate further when perf_counts is being accessed it is
with a cpu map index rather than a CPU.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220519032005.1273691-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:54:02 -03:00
Ian Rogers 54668a4ea0 perf bpf_counter: Tidy use of CPU map index
BPF counters are typically running across all CPUs and so the CPU map
index and CPU number are the same. There may be cases with offline CPUs
where this isn't the case and so ensure the cpu map index for
perf_counts is going to be a valid index by explicitly iterating over
the CPU map. This also makes it clearer that users of perf_counts are
using an index. Collapse some multiple uses of perf_counts into single
uses.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220519032005.1273691-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:53:06 -03:00
Ian Rogers 0dd9769f0c perf stat: Add stat record+report test
This would have caught:

  "Subject: Re: perf stat report segfaults"
  https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220519032005.1273691-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:51:55 -03:00
Namhyung Kim 7c3bcbdf44 perf lock: Add -t/--thread option for report
The -t option is to show per-thread lock stat like below:

  $ perf lock report -t -F acquired,contended,avg_wait

                Name   acquired  contended   avg wait (ns)

                perf     240569          9            5784
             swapper     106610         19             543
              :15789      17370          2           14538
        ContainerMgr       8981          6             874
               sleep       5275          1           11281
     ContainerThread       4416          4             944
     RootPressureThr       3215          5            1215
         rcu_preempt       2954          0               0
        ContainerMgr       2560          0               0
             unnamed       1873          0               0
     EventManager_De       1845          1             636
     futex-default-S       1609          0               0
  ...

Committer notes:

Add that option to the 'perf lock report' man page.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220521010811.932703-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:49:35 -03:00
Namhyung Kim 79d9333b85 perf lock: Do not discard broken lock stats
Currently it discards a lock_stat for a lock instance when there's a
broken lock_seq_stat in a single task for the lock.  But it also means
that the existing (and later) valid lock stat info for that lock will
be discarded as well.

This is not ideal since we can lose many valuable info because of a
single failure.  Actually those failures are indepent to the existing
stat.  So we can only discard the broken lock_seq_stat but keep the
valid lock_stat.

The discarded lock_seq_stat will be reallocated in a subsequent event
with SEQ_STATE_UNINITIALIZED which will be ignored until it see the
start of the next sequence.  So it should be ok just free it.

Before:

  $ perf lock report -F acquired,contended,avg_wait

  Warning:
  Processed 1401603 events and lost 18 chunks!

  Check IO/CPU overload!

                  Name   acquired  contended   avg wait (ns)

         rcu_read_lock     251225          0               0
   &(ei->i_block_re...       8731          0               0
   &sb->s_type->i_l...       8731          0               0
    hrtimer_bases.lock       5261          0               0
    hrtimer_bases.lock       2626          0               0
    hrtimer_bases.lock       1953          0               0
    hrtimer_bases.lock       1382          0               0
      cpu_hotplug_lock       1350          0               0
    hrtimer_bases.lock       1273          0               0
    hrtimer_bases.lock       1269          0               0
    hrtimer_bases.lock       1198          0               0
   ...

New:
                  Name   acquired  contended   avg wait (ns)

         rcu_read_lock     251225          0               0
   tk_core.seq.seqc...      54074          0               0
          &xa->xa_lock      17470          0               0
        &ei->i_es_lock      17464          0               0
       &ei->i_raw_lock       9391          0               0
   &mapping->privat...       8734          0               0
       &ei->i_data_sem       8731          0               0
   &(ei->i_block_re...       8731          0               0
   &sb->s_type->i_l...       8731          0               0
   jiffies_seq.seqc...       6953          0               0
        &mm->mmap_lock       6889          0               0
             balancing       5768          0               0
    hrtimer_bases.lock       5261          0               0
   ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220521010811.932703-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:47:41 -03:00
Leo Yan 12aeaaba08 perf c2c: Update documentation for store metric 'N/A'
The 'N/A' metric is added for store operations, update documentation to
reflect changes in the report table.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adam Li <adamli@amperemail.onmicrosoft.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Alyssa Ross <hi@alyssa.is>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220518055729.1869566-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:36:47 -03:00
Leo Yan 550b4d6f9a perf c2c: Add dimensions for 'N/A' metrics of store operation
Since now we have the statistics 'st_na' for store operations, add
dimensions for the 'N/A' (no available memory level) metrics and the
associated percentage calculation for the single cache line view.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adam Li <adamli@amperemail.onmicrosoft.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Alyssa Ross <hi@alyssa.is>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220518055729.1869566-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:36:34 -03:00
Leo Yan 9845063710 perf mem: Add stats for store operation with no available memory level
Sometimes we don't know memory store operations happen on exactly which
memory (or cache) level, the memory level flag is set to PERF_MEM_LVL_NA
in this case; a practical example is Arm SPE AUX trace sets this flag
for all store operations due to absent info for cache level.

This patch is to add a new item "st_na" in structure c2c_stats to add
statistics for store operations with no available cache level.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adam Li <adamli@amperemail.onmicrosoft.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Alyssa Ross <hi@alyssa.is>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220518055729.1869566-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:36:12 -03:00
Ian Rogers 508c9fbce0 perf build: Error for BPF skeletons without LIBBPF
LIBBPF requires LIBELF so doing "make BUILD_BPF_SKEL=1 NO_LIBELF=1"
fails with compiler errors about missing declarations. Similar could
happen if libbpf feature detection fails.

Prefer to error when BUILD_BPF_SKEL is enabled but LIBBPF isn't.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220520211826.1828180-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:33:39 -03:00
Arnaldo Carvalho de Melo 0869331fba Merge remote-tracking branch 'torvalds/master' into perf/core
To get the rest of 5.18.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-23 09:32:49 -03:00
Chengdong Li 51d0bf99b8 perf session: Fix Intel LBR callstack entries and nr print message
When generating callstack information from branch_stack(Intel LBR), the
actual number of callstack entry should be bigger than the number of
branch_stack, for example:

	branch_stack records:
		B() -> C()
		A() -> B()
	converted callstack records should be:
		C()
		B()
		A()
though, the number of callstack equals
to the number of branch stack plus 1.

This patch fixes above issue in branch_stack__printf(). For example,

	# echo 'scale=2000; 4*a(1)' > cmd
	# perf record --call-graph lbr bc -l < cmd

Before applying this patch, `perf script -D` output:

	1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0
	... LBR call chain: nr:8
	.....  0: fffffffffffffe00
	.....  1: 000000000040a410
	.....  2: 000000000040573c
	.....  3: 0000000000408650
	.....  4: 00000000004022f2
	.....  5: 00000000004015f5
	.....  6: 00007f5ed6dcb553
	.....  7: 0000000000401698
	... FP chain: nr:2
	.....  0: fffffffffffffe00
	.....  1: 000000000040a6d8
	... branch callstack: nr:6    # which is not consistent with LBR records.
	.....  0: 000000000040a410
	.....  1: 0000000000408650    # ditto
	.....  2: 00000000004022f2
	.....  3: 00000000004015f5
	.....  4: 00007f5ed6dcb553
	.....  5: 0000000000401698
	 ... thread: bc:17990
	 ...... dso: /usr/bin/bc
	bc 17990 1220022.677386:     894172 cycles:
			  40a410 [unknown] (/usr/bin/bc)
			  40573c [unknown] (/usr/bin/bc)
			  408650 [unknown] (/usr/bin/bc)
			  4022f2 [unknown] (/usr/bin/bc)
			  4015f5 [unknown] (/usr/bin/bc)
		    7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so)
			  401698 [unknown] (/usr/bin/bc)

After applied:

	1220022677386876 0x2a40 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x4002): 17990/17990: 0x40a6d6 period: 894172 addr: 0
	... LBR call chain: nr:8
	.....  0: fffffffffffffe00
	.....  1: 000000000040a410
	.....  2: 000000000040573c
	.....  3: 0000000000408650
	.....  4: 00000000004022f2
	.....  5: 00000000004015f5
	.....  6: 00007f5ed6dcb553
	.....  7: 0000000000401698
	... FP chain: nr:2
	.....  0: fffffffffffffe00
	.....  1: 000000000040a6d8
	... branch callstack: nr:7
	.....  0: 000000000040a410
	.....  1: 000000000040573c
	.....  2: 0000000000408650
	.....  3: 00000000004022f2
	.....  4: 00000000004015f5
	.....  5: 00007f5ed6dcb553
	.....  6: 0000000000401698
	 ... thread: bc:17990
	 ...... dso: /usr/bin/bc
	bc 17990 1220022.677386:     894172 cycles:
			  40a410 [unknown] (/usr/bin/bc)
			  40573c [unknown] (/usr/bin/bc)
			  408650 [unknown] (/usr/bin/bc)
			  4022f2 [unknown] (/usr/bin/bc)
			  4015f5 [unknown] (/usr/bin/bc)
		    7f5ed6dcb553 __libc_start_main+0xf3 (/usr/lib64/libc-2.17.so)
			  401698 [unknown] (/usr/bin/bc)

Change from v1:
	- refined code style according to Jiri's review comments.

Signed-off-by: Chengdong Li <chengdongli@tencent.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: likexu@tencent.com
Link: https://lore.kernel.org/r/20220517015726.96131-1-chengdongli@tencent.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:56:24 -03:00
Athira Rajeev 8994e97be3 perf test bpf: Skip test if clang is not present
Perf BPF filter test fails in environment where "clang" is not
installed.

Test failure logs:

<<>>
 42: BPF filter                    :
 42.1: Basic BPF filtering         : Skip
 42.2: BPF pinning                 : FAILED!
 42.3: BPF prologue generation     : FAILED!
<<>>

Enabling verbose option provided debug logs which says clang/llvm needs
to be installed. Snippet of verbose logs:

<<>>
 42.2: BPF pinning                  :
 --- start ---
test child forked, pid 61423
ERROR:	unable to find clang.
Hint:	Try to install latest clang/llvm to support BPF.
        Check your $PATH

<<logs_here>>

Failed to compile test case: 'Basic BPF llvm compile'
Unable to get BPF object, fix kbuild first
test child finished with -1
 ---- end ----
BPF filter subtest 2: FAILED!
<<>>

Here subtests, "BPF pinning" and "BPF prologue generation" failed and
logs shows clang/llvm is needed. After installing clang, testcase
passes.

Reason on why subtest failure happens though logs has proper debug
information:

Main function __test__bpf calls test_llvm__fetch_bpf_obj by
passing 4th argument as true ( 4th arguments maps to parameter
"force" in test_llvm__fetch_bpf_obj ). But this will cause
test_llvm__fetch_bpf_obj to skip the check for clang/llvm.

Snippet of code part which checks for clang based on
parameter "force" in test_llvm__fetch_bpf_obj:

<<>>
if (!force && (!llvm_param.user_set_param &&
<<>>

Since force is set to "false", test won't get skipped and fails to
compile test case. The BPF code compilation needs clang, So pass the
fourth argument as "false" and also skip the test if reason for return
is "TEST_SKIP"

After the patch:

<<>>
 42: BPF filter                    :
 42.1: Basic BPF filtering         : Skip
 42.2: BPF pinning                 : Skip
 42.3: BPF prologue generation     : Skip
<<>>

Fixes: ba1fae431e ("perf test: Add 'perf test BPF'")
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lore.kernel.org/r/20220511115438.84032-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:54:21 -03:00
Athira Rajeev cfd7092c31 perf test session topology: Fix test to skip the test in guest environment
The session topology test fails in powerpc pSeries platform.

Test logs:

  <<>>
  Session topology : FAILED!
  <<>>

This testcases tests cpu topology by checking the core_id and socket_id
stored in perf_env from perf session. The data from perf session is
compared with the cpu topology information from
"/sys/devices/system/cpu/cpuX/topology" like core_id,
physical_package_id.

In case of virtual environment, detail like physical_package_id is
restricted to be exposed. Hence physical_package_id is set to -1. The
testcase fails on such platforms since socket_id can't be fetched from
topology info.

Skip the testcase in powerpc if physical_package_id returns -1.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>---
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220511114959.84002-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:52:33 -03:00
Thomas Richter f8ac1c4784 perf bench numa: Address compiler error on s390
The compilation on s390 results in this error:

  # make DEBUG=y bench/numa.o
  ...
  bench/numa.c: In function ‘__bench_numa’:
  bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated
              writing between 1 and 11 bytes into a region of size between
              10 and 20 [-Werror=format-truncation=]
  1749 |        snprintf(tname, sizeof(tname), "process%d:thread%d", p, t);
                                                               ^~
  ...
  bench/numa.c:1749:64: note: directive argument in the range
                 [-2147483647, 2147483646]
  ...
  #

The maximum length of the %d replacement is 11 characters because of the
negative sign.  Therefore extend the array by two more characters.

Output after:

  # make  DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o
  -rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o
  #

Fixes: 3aff8ba0a4 ("perf bench numa: Avoid possible truncation when using snprintf()")
Suggested-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:45:19 -03:00
Ian Rogers caaaa55477 perf test: Avoid shell test description infinite loop
for_each_shell_test() is already strict in expecting tests to be files
and executable. It is sometimes possible when it iterates over all files
that it finds one that is executable and lacks a newline character. When
this happens the loop never terminates as it doesn't check for EOF.

Add the EOF check to make this loop at least bounded by the file size.

If the description is returned as NULL then also skip the test.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220517204144.645913-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:45:19 -03:00
Kan Liang 01b28e4a58 perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
The X86 specific arch__intr_reg_mask() is to check whether the kernel
and hardware can collect XMM registers. But it doesn't work on some
hybrid platform.

Without the patch on ADL-N:

  $ perf record -I?
  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
  R11 R12 R13 R14 R15

The config of the test event doesn't contain the PMU information. The
kernel may fail to initialize it on the correct hybrid PMU and return
the wrong non-supported information.

Add the PMU information into the config for the hybrid platform. The
same register set is supported among different hybrid PMUs. Checking
the first available one is good enough.

With the patch on ADL-N:

  $ perf record -I?
  available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
  R11 R12 R13 R14 R15 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 XMM9
  XMM10 XMM11 XMM12 XMM13 XMM14 XMM15

Fixes: 6466ec14aa ("perf regs x86: Add X86 specific arch__intr_reg_mask()")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518145125.1494156-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:45:19 -03:00
Athira Rajeev 451ed8058c perf test: Fix "all PMU test" to skip hv_24x7/hv_gpci tests on powerpc
"perf all PMU test" picks the input events from "perf list --raw-dump
pmu" list and runs "perf stat -e" for each of the event in the list. In
case of powerpc, the PowerVM environment supports events from hv_24x7
and hv_gpci PMU which is of example format like below:

- hv_24x7/CPM_ADJUNCT_INST,domain=?,core=?/
- hv_gpci/event,partition_id=?/

The value for "?" needs to be filled in depending on system and
respective event. CPM_ADJUNCT_INST needs have core value and domain
value. hv_gpci event needs partition_id.  Similarly, there are other
events for hv_24x7 and hv_gpci having "?" in event format. Hence skip
these events on powerpc platform since values like partition_id, domain
is specific to system and event.

Fixes: 3d5ac9effc ("perf test: Workload test of all PMUs")
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Link: https://lore.kernel.org/r/20220520101236.17249-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-21 14:45:06 -03:00
Kan Liang e0e14cdff3 perf parse-events: Move slots event for the hybrid platform too
The commit 94dbfd6781 ("perf parse-events: Architecture specific
leader override") introduced a feature to reorder the slots event to
fulfill the restriction of the perf metrics topdown group. But the
feature doesn't work on the hybrid machine.

  $ perf stat -e "{cpu_core/instructions/,cpu_core/slots/,cpu_core/topdown-retiring/}" -a sleep 1

   Performance counter stats for 'system wide':

       <not counted>      cpu_core/instructions/
       <not counted>      cpu_core/slots/
     <not supported>      cpu_core/topdown-retiring/

         1.002871801 seconds time elapsed

A hybrid platform has a different PMU name for the core PMUs, while
current perf hard code the PMU name "cpu".

Introduce a new function to check whether the system supports the perf
metrics feature. The result is cached for the future usage.

For X86, the core PMU name always has "cpu" prefix.

With the patch:

  $ perf stat -e "{cpu_core/instructions/,cpu_core/slots/,cpu_core/topdown-retiring/}" -a sleep 1

   Performance counter stats for 'system wide':

          76,337,010      cpu_core/slots/
          10,416,809      cpu_core/instructions/
          11,692,372      cpu_core/topdown-retiring/

         1.002805453 seconds time elapsed

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518143900.1493980-5-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 11:13:37 -03:00
Kan Liang e7d1374ed5 perf parse-events: Support different format of the topdown event name
The evsel->name may have a different format for a topdown event, a pure
topdown name (e.g., topdown-fe-bound), or a PMU name + a topdown name
(e.g., cpu/topdown-fe-bound/). The cpu/topdown-fe-bound/ kind format
isn't supported by the arch_evlist__leader(). This format is a very
common format for a hybrid platform, which requires specifying the PMU
name for each event.

Without the patch,

  $ perf stat -e '{instructions,slots,cpu/topdown-fe-bound/}' -a sleep 1

   Performance counter stats for 'system wide':

       <not counted>      instructions
       <not counted>      slots
     <not supported>      cpu/topdown-fe-bound/

         1.003482041 seconds time elapsed

  Some events weren't counted. Try disabling the NMI watchdog:
          echo 0 > /proc/sys/kernel/nmi_watchdog
          perf stat ...
          echo 1 > /proc/sys/kernel/nmi_watchdog
  The events in group usually have to be from the same PMU. Try reorganizing the group.

With the patch,

  $ perf stat -e '{instructions,slots,cpu/topdown-fe-bound/}' -a sleep 1

  Performance counter stats for 'system wide':

         157,383,996      slots
          25,011,711      instructions
          27,441,686      cpu/topdown-fe-bound/

         1.003530890 seconds time elapsed

Fixes: bc355822f0 ("perf parse-events: Move slots only with topdown")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518143900.1493980-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 11:12:57 -03:00
Kan Liang e8f4f794d7 perf stat: Always keep perf metrics topdown events in a group
If any member in a group has a different cpu mask than the other
members, the current perf stat disables group. when the perf metrics
topdown events are part of the group, the below <not supported> error
will be triggered.

  $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1
  WARNING: grouped events cpus do not match, disabling group:
    anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ }

   Performance counter stats for 'system wide':

         141,465,174      slots
     <not supported>      topdown-retiring
       1,605,330,334      uncore_imc_free_running_0/dclk/

The perf metrics topdown events must always be grouped with a slots
event as leader.

Factor out evsel__remove_from_group() to only remove the regular events
from the group.

Remove evsel__must_be_in_group(), since no one use it anymore.

With the patch, the topdown events aren't broken from the group for the
splitting.

  $ perf stat -e "{slots,topdown-retiring,uncore_imc_free_running_0/dclk/}" -a sleep 1
  WARNING: grouped events cpus do not match, disabling group:
    anon group { slots, topdown-retiring, uncore_imc_free_running_0/dclk/ }

   Performance counter stats for 'system wide':

         346,110,588      slots
         124,608,256      topdown-retiring
       1,606,869,976      uncore_imc_free_running_0/dclk/

         1.003877592 seconds time elapsed

Fixes: a9a1790247 ("perf stat: Ensure group is defined on top of the same cpu mask")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518143900.1493980-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 11:11:13 -03:00
Kan Liang 39d5f412da perf evsel: Fixes topdown events in a weak group for the hybrid platform
The patch ("perf evlist: Keep topdown counters in weak group") fixes the
perf metrics topdown event issue when the topdown events are in a weak
group on a non-hybrid platform. However, it doesn't work for the hybrid
platform.

  $./perf stat -e '{cpu_core/slots/,cpu_core/topdown-bad-spec/,
  cpu_core/topdown-be-bound/,cpu_core/topdown-fe-bound/,
  cpu_core/topdown-retiring/,cpu_core/branch-instructions/,
  cpu_core/branch-misses/,cpu_core/bus-cycles/,cpu_core/cache-misses/,
  cpu_core/cache-references/,cpu_core/cpu-cycles/,cpu_core/instructions/,
  cpu_core/mem-loads/,cpu_core/mem-stores/,cpu_core/ref-cycles/,
  cpu_core/cache-misses/,cpu_core/cache-references/}:W' -a sleep 1

  Performance counter stats for 'system wide':

       751,765,068      cpu_core/slots/                        (84.07%)
   <not supported>      cpu_core/topdown-bad-spec/
   <not supported>      cpu_core/topdown-be-bound/
   <not supported>      cpu_core/topdown-fe-bound/
   <not supported>      cpu_core/topdown-retiring/
        12,398,197      cpu_core/branch-instructions/          (84.07%)
         1,054,218      cpu_core/branch-misses/                (84.24%)
       539,764,637      cpu_core/bus-cycles/                   (84.64%)
            14,683      cpu_core/cache-misses/                 (84.87%)
         7,277,809      cpu_core/cache-references/             (77.30%)
       222,299,439      cpu_core/cpu-cycles/                   (77.28%)
        63,661,714      cpu_core/instructions/                 (84.85%)
                 0      cpu_core/mem-loads/                    (77.29%)
        12,271,725      cpu_core/mem-stores/                   (77.30%)
       542,241,102      cpu_core/ref-cycles/                   (84.85%)
             8,854      cpu_core/cache-misses/                 (76.71%)
         7,179,013      cpu_core/cache-references/             (76.31%)

         1.003245250 seconds time elapsed

A hybrid platform has a different PMU name for the core PMUs, while
the current perf hard code the PMU name "cpu".

The evsel->pmu_name can be used to replace the "cpu" to fix the issue.
For a hybrid platform, the pmu_name must be non-NULL. Because there are
at least two core PMUs. The PMU has to be specified.
For a non-hybrid platform, the pmu_name may be NULL. Because there is
only one core PMU, "cpu". For a NULL pmu_name, we can safely assume that
it is a "cpu" PMU.

In case other PMUs also define the "slots" event, checking the PMU type
as well.

With the patch,

  $ perf stat -e '{cpu_core/slots/,cpu_core/topdown-bad-spec/,
  cpu_core/topdown-be-bound/,cpu_core/topdown-fe-bound/,
  cpu_core/topdown-retiring/,cpu_core/branch-instructions/,
  cpu_core/branch-misses/,cpu_core/bus-cycles/,cpu_core/cache-misses/,
  cpu_core/cache-references/,cpu_core/cpu-cycles/,cpu_core/instructions/,
  cpu_core/mem-loads/,cpu_core/mem-stores/,cpu_core/ref-cycles/,
  cpu_core/cache-misses/,cpu_core/cache-references/}:W' -a sleep 1

  Performance counter stats for 'system wide':

     766,620,266   cpu_core/slots/                                        (84.06%)
      73,172,129   cpu_core/topdown-bad-spec/ #    9.5% bad speculation   (84.06%)
     193,443,341   cpu_core/topdown-be-bound/ #    25.0% backend bound    (84.06%)
     403,940,929   cpu_core/topdown-fe-bound/ #    52.3% frontend bound   (84.06%)
     102,070,237   cpu_core/topdown-retiring/ #    13.2% retiring         (84.06%)
      12,364,429   cpu_core/branch-instructions/                          (84.03%)
       1,080,124   cpu_core/branch-misses/                                (84.24%)
     564,120,383   cpu_core/bus-cycles/                                   (84.65%)
          36,979   cpu_core/cache-misses/                                 (84.86%)
       7,298,094   cpu_core/cache-references/                             (77.30%)
     227,174,372   cpu_core/cpu-cycles/                                   (77.31%)
      63,886,523   cpu_core/instructions/                                 (84.87%)
               0   cpu_core/mem-loads/                                    (77.31%)
      12,208,782   cpu_core/mem-stores/                                   (77.31%)
     566,409,738   cpu_core/ref-cycles/                                   (84.87%)
          23,118   cpu_core/cache-misses/                                 (76.71%)
       7,212,602   cpu_core/cache-references/                             (76.29%)

       1.003228667 seconds time elapsed

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220518143900.1493980-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 11:09:41 -03:00
Ian Rogers 92d579ea32 perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT events
Stat events can come from disk and so need a degree of validation. They
contain a CPU which needs looking up via CPU map to access a counter.

Add the CPU to index translation, alongside validity checking.

Discussion thread:

  https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/

Fixes: 7ac0089d13 ("perf evsel: Pass cpu not cpu map index to synthesize")
Reported-by: Michael Petlan <mpetlan@redhat.com>
Suggested-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220519032005.1273691-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 10:54:06 -03:00
Arnaldo Carvalho de Melo 0ae065a5d2 perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
Avi Kivity reported a problem where the __weak
btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being
used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in
turn called back to btf__load_from_kernel_by_id(), resulting in an
endless loop.

Fix this by adding a feature test to check if
btf__load_from_kernel_by_id() is available when building perf with
LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old
btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id()
since at that time it didn't exist at all.

Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC
where we don't have btf__load_from_kernel_by_id() and thus its feature
test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID:

  $ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output
  test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’:
  test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration]
      6 |         return btf__load_from_kernel_by_id(20151128, NULL);
        |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors
  $

  $ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id
  00000000005ba180 T btf__load_from_kernel_by_id
  $

  $ objdump --disassemble=btf__load_from_kernel_by_id -S /tmp/build/perf-urgent/perf

  /tmp/build/perf-urgent/perf:     file format elf64-x86-64
  <SNIP>
  00000000005ba180 <btf__load_from_kernel_by_id>:
  #include "record.h"
  #include "util/synthetic-events.h"

  #ifndef HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID
  struct btf *btf__load_from_kernel_by_id(__u32 id)
  {
    5ba180:	55                   	push   %rbp
    5ba181:	48 89 e5             	mov    %rsp,%rbp
    5ba184:	48 83 ec 10          	sub    $0x10,%rsp
    5ba188:	64 48 8b 04 25 28 00 	mov    %fs:0x28,%rax
    5ba18f:	00 00
    5ba191:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
    5ba195:	31 c0                	xor    %eax,%eax
         struct btf *btf;
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
         int err = btf__get_from_id(id, &btf);
    5ba197:	48 8d 75 f0          	lea    -0x10(%rbp),%rsi
    5ba19b:	e8 a0 57 e5 ff       	call   40f940 <btf__get_from_id@plt>
    5ba1a0:	89 c2                	mov    %eax,%edx
  #pragma GCC diagnostic pop

         return err ? ERR_PTR(err) : btf;
    5ba1a2:	48 98                	cltq
    5ba1a4:	85 d2                	test   %edx,%edx
    5ba1a6:	48 0f 44 45 f0       	cmove  -0x10(%rbp),%rax
  }
  <SNIP>

Fixes: 218e7b775d ("perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions")
Reported-by: Avi Kivity <avi@scylladb.com>
Link: https://lore.kernel.org/linux-perf-users/f0add43b-3de5-20c5-22c4-70aff4af959f@scylladb.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/linux-perf-users/YobjjFOblY4Xvwo7@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20 09:45:41 -03:00
Ian Rogers 6a973e2919 perf test: Add basic stat and topdown group test
Add a basic stat test.

Add two tests of grouping behavior for topdown events. Topdown events
are special as they must be grouped with the slots event first.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220517052724.283874-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 12:02:30 -03:00
Ian Rogers d98079c05b perf evlist: Keep topdown counters in weak group
On Intel Icelake, topdown events must always be grouped with a slots
event as leader. When a metric is parsed a weak group is formed and
retried if perf_event_open fails. The retried events aren't grouped
breaking the slots leader requirement. This change modifies the weak
group "reset" behavior so that topdown events aren't broken from the
group for the retry.

  $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1

   Performance counter stats for 'system wide':

    47,867,188,483      slots                                                         (92.27%)
   <not supported>      topdown-bad-spec
   <not supported>      topdown-be-bound
   <not supported>      topdown-fe-bound
   <not supported>      topdown-retiring
     2,173,346,937      branch-instructions                                           (92.27%)
        10,540,253      branch-misses             #    0.48% of all branches          (92.29%)
        96,291,140      bus-cycles                                                    (92.29%)
         6,214,202      cache-misses              #   20.120 % of all cache refs      (92.29%)
        30,886,082      cache-references                                              (76.91%)
    11,773,726,641      cpu-cycles                                                    (84.62%)
    11,807,585,307      instructions              #    1.00  insn per cycle           (92.31%)
                 0      mem-loads                                                     (92.32%)
     2,212,928,573      mem-stores                                                    (84.69%)
    10,024,403,118      ref-cycles                                                    (92.35%)
        16,232,978      baclears.any                                                  (92.35%)
        23,832,633      ARITH.DIVIDER_ACTIVE                                          (84.59%)

       0.981070734 seconds time elapsed

After:

  $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1

   Performance counter stats for 'system wide':

       31040189283      slots                                                         (92.27%)
        8997514811      topdown-bad-spec          #     28.2% bad speculation         (92.27%)
       10997536028      topdown-be-bound          #     34.5% backend bound           (92.27%)
        4778060526      topdown-fe-bound          #     15.0% frontend bound          (92.27%)
        7086628768      topdown-retiring          #     22.2% retiring                (92.27%)
        1417611942      branch-instructions                                           (92.26%)
           5285529      branch-misses             #    0.37% of all branches          (92.28%)
          62922469      bus-cycles                                                    (92.29%)
           1440708      cache-misses              #    8.292 % of all cache refs      (92.30%)
          17374098      cache-references                                              (76.94%)
        8040889520      cpu-cycles                                                    (84.63%)
        7709992319      instructions              #    0.96  insn per cycle           (92.32%)
                 0      mem-loads                                                     (92.32%)
        1515669558      mem-stores                                                    (84.68%)
        6542411177      ref-cycles                                                    (92.35%)
           4154149      baclears.any                                                  (92.35%)
          20556152      ARITH.DIVIDER_ACTIVE                                          (84.59%)

       1.010799593 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220517052724.283874-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 12:01:18 -03:00
Adrian Hunter 75659c6fb5 perf scripts python: intel-pt-events.py: Print ptwrite value as a string if it is ASCII
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.

To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.

Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:

 $ echo -n "Hello" | od -t x8
 0000000 0000006f6c6c6548
 0000005
 $ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.016 MB perf.data ]
 $ perf script --itrace=ew intel-pt-events.py
 Intel PT Branch Trace, Power Events, Event Trace and PTWRITE
      Switch In   38524/38524 [001]     24166.044995916     0/0
           eg_ptw 38524/38524 [001]     24166.045380004   ptwrite  jmp                   IP: 0 payload: 0x6f6c6c6548 Hello     56532c7ce196 perf_emulate_ptwrite+0x16 (/home/ahunter/git/work/eg_ptw)
 End

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 11:56:15 -03:00
Adrian Hunter a5014310f7 perf script: Print Intel ptwrite value as a string if it is ASCII
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.

To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.

Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:

 $ echo -n "Hello" | od -t x8
 0000000 0000006f6c6c6548
 0000005
 $ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.016 MB perf.data ]
 $ perf script --itrace=ew
           eg_ptw 35563 [005] 18256.087338:     ptwrite:  IP: 0 payload: 0x6f6c6c6548 Hello      55e764db5196 perf_emulate_ptwrite+0x16 (/home/user/eg_ptw)
 $

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 11:56:02 -03:00
Adrian Hunter d7015e50a9 perf intel-pt: Add support for emulated ptwrite
ptwrite is an Intel x86 instruction that writes arbitrary values into an
Intel PT trace. It is not supported on all hardware, so provide an
alternative that makes use of TNT packets to convey the payload data.
TNT packets encode Taken/Not-taken conditional branch information, so
taking branches based on the payload value will encode the value into
the TNT packet. Refer to the changes to the documentation file
perf-intel-pt.txt in this patch for an example.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-17 11:55:49 -03:00
Arnaldo Carvalho de Melo df36d2572e perf bench breakpoint: Fix build on 32-bit arches
Cast pointers to unsigned long instead of to uint64_t to avoid this
problem on 32-bit arches:

  31     6.89 debian:experimental-x-mips    : FAIL gcc version 11.2.0 (Debian 11.2.0-18)
    bench/breakpoint.c: In function 'breakpoint_setup':
    bench/breakpoint.c:56:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
       56 |         attr.bp_addr = (uint64_t)addr;
          |                        ^
    cc1: all warnings being treated as errors
    make[3]: *** [/git/perf-5.18.0-rc7/tools/build/Makefile.build:139: bench] Error 2

Fixes: 68a6772f11 ("perf bench: Add breakpoint benchmarks")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/YoLq1nHx1doi+VWl@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-16 21:23:09 -03:00
Arnaldo Carvalho de Melo c5468a28ef Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes from perf/urgent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-16 10:14:59 -03:00
Adrian Hunter 843e5ba75e perf tools: Remove unused machines__find_host()
machines__find_host() does not exist. Remove declaration.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220513084459.6581-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-13 11:11:27 -03:00
Dmitry Vyukov 68a6772f11 perf bench: Add breakpoint benchmarks
Add 2 benchmarks:

1. Performance of thread creation/exiting in presence of breakpoints.
2. Performance of breakpoint modification in presence of threads.

The benchmarks capture use cases that we are interested in:
using inheritable breakpoints in large highly-threaded applications.

The benchmarks show significant slowdown imposed by breakpoints
(even when they don't fire).

Testing on Intel 8173M with 112 HW threads show:

  perf bench --repeat=56 breakpoint thread --breakpoints=0 --parallelism=56 --threads=20
        78.675000 usecs/op
  perf bench --repeat=56 breakpoint thread --breakpoints=4 --parallelism=56 --threads=20
     12967.135714 usecs/op

That's 165x slowdown due to presence of the breakpoints.

  perf bench --repeat=20000 breakpoint enable --passive=0 --active=0
         1.433250 usecs/op
  perf bench --repeat=20000 breakpoint enable --passive=224 --active=0
       585.318400 usecs/op
  perf bench --repeat=20000 breakpoint enable --passive=0 --active=111
       635.953000 usecs/op

That's 408x and 444x slowdown due to presence of threads.

Profiles show some overhead in toggle_bp_slot,
but also very high contention:

    90.83%  breakpoint-thre  [kernel.kallsyms]  [k] osq_lock
     4.69%  breakpoint-thre  [kernel.kallsyms]  [k] mutex_spin_on_owner
     2.06%  breakpoint-thre  [kernel.kallsyms]  [k] __reserve_bp_slot
     2.04%  breakpoint-thre  [kernel.kallsyms]  [k] toggle_bp_slot

    79.01%  breakpoint-enab  [kernel.kallsyms]  [k] smp_call_function_single
     9.94%  breakpoint-enab  [kernel.kallsyms]  [k] llist_add_batch
     5.70%  breakpoint-enab  [kernel.kallsyms]  [k] _raw_spin_lock_irq
     1.84%  breakpoint-enab  [kernel.kallsyms]  [k] event_function_call
     1.12%  breakpoint-enab  [kernel.kallsyms]  [k] send_call_function_single_ipi
     0.37%  breakpoint-enab  [kernel.kallsyms]  [k] generic_exec_single
     0.24%  breakpoint-enab  [kernel.kallsyms]  [k] __perf_event_disable
     0.20%  breakpoint-enab  [kernel.kallsyms]  [k] _perf_event_enable
     0.18%  breakpoint-enab  [kernel.kallsyms]  [k] toggle_bp_slot

Committer notes:

Fixup struct init for older compilers:

   3    32.90 alpine:3.5                    : FAIL clang version 3.8.1 (tags/RELEASE_381/final)
    bench/breakpoint.c:49:34: error: missing field 'size' initializer [-Werror,-Wmissing-field-initializers]
            struct perf_event_attr attr = {0};
                                            ^
    1 error generated.
   7    37.31 alpine:3.9                    : FAIL gcc version 8.3.0 (Alpine 8.3.0)
    bench/breakpoint.c:49:34: error: missing field 'size' initializer [-Werror,-Wmissing-field-initializers]
            struct perf_event_attr attr = {0};
                                            ^
    1 error generated.

Signed-off-by: Dmitriy Vyukov <dvyukov@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220505155745.1690906-1-dvyukov@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-13 11:00:38 -03:00
Namhyung Kim cad10ce366 perf annotate: Add --percent-limit option
Like in 'perf report' and 'perf top', Add this option to limit the
number of functions it displays based on the overhead value in percent.

This affects only stdio and stdio2 output modes.  Without this, it
shows very long disassembly lines for every function in the data
file.  If users don't want this behavior, they can set a value in
percent to suppress that.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220502232015.697243-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:37:55 -03:00
Adrian Hunter 7df319e5b3 perf auxtrace: Record whether an auxtrace mmap is needed
Add a flag needs_auxtrace_mmap to record whether an auxtrace mmap is
needed, in preparation for correctly determining whether or not an
auxtrace mmap is needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:27:19 -03:00
Adrian Hunter 8f111be643 libperf evlist: Add evsel as a parameter to ->idx()
Add evsel as a parameter to ->idx() in preparation for correctly
determining whether an auxtrace mmap is needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:26:48 -03:00
Adrian Hunter 6a7b8a5a30 libperf evlist: Remove ->idx() per_cpu parameter
Remove ->idx() per_cpu parameter because it isn't needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:25:57 -03:00
Adrian Hunter d205a3a665 perf auxtrace: Do not mix up mmap idx
The idx is with respect to evlist not evsel. That hasn't mattered because
they are the same at present. Prepare for that not being the case, which it
won't be when sideband tracking events are allowed on all CPUs even when
auxtrace is limited to selected CPUs.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:24:32 -03:00
Adrian Hunter 024b3b42ad perf auxtrace: Move evlist__enable_event_idx() to auxtrace.c
evlist__enable_event_idx() is used only by auxtrace. Move it to auxtrace.c
in preparation for making it even more auxtrace specific.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:23:34 -03:00
Adrian Hunter a40bb7518e perf evlist: Use libperf functions in evlist__enable_event_idx()
evlist__enable_event_idx() is used only for auxtrace events which are never
system_wide. Simplify by using libperf enable event functions.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220506122601.367589-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-10 14:22:47 -03:00
Ian Rogers 8586d2744f perf metrics: Don't add all tool events for sharing
Tool events are added to the set of events for parsing so that having a
tool event in a metric doesn't inhibit event sharing of events between
metrics.

All tool events were added but this meant unused tool events would be
counted. Reduce this set of tool events to just those present in the
overall metric list.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:18:48 -03:00
Ian Rogers 9aa09230f0 perf metrics: Support all tool events
Previously duration_time was hard coded, which was ok until commit
b03b89b350 ("perf stat: Add user_time and system_time events")
added additional tool events. Do for all tool events what was previously
done just for duration_time.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:16:49 -03:00
Ian Rogers 79932d161f perf evsel: Add tool event helpers
Convert to and from a string. Fix evsel__tool_name() as array is
off-by-1.  Support more than just duration_time as a metric-id.

Fixes: 75eafc970b ("perf list: Print all available tool events")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:13:00 -03:00
Ian Rogers 545a96c90f perf evsel: Constify a few arrays
Remove public definition of evsel__tool_names(). Not used outside
util/evsel.c.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:12:14 -03:00
Ian Rogers 17b3867d97 Revert "perf stat: Support metrics with hybrid events"
This reverts commit 60344f1a9a.

Hybrid metrics place a PMU at the end of the parse string. This is also
where tool events are placed. The behavior of the parse string isn't
clear and so revert the change for now.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220507053410.3798748-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-09 10:09:44 -03:00
Jeremy Linton 45fa7c3869 perf tests: Fix coresight `perf test` failure.
Currently the `perf test` always fails the coresight test like:

  89: Check Arm CoreSight trace data recording and synthesized samples: FAILED!

That is because the test_arm_coresight.sh is attempting to SIGINT the
parent but is using $$ rather than $PPID and it sigint's itself when
run under the perf test framework.

Since this is done in a trap clause it ends up returning a non zero
return.

Since $PPID is a bash ism and not all distros are linking /bin/sh to
bash, the alternative parent pid lookups are uglier than just dropping
the kill, and its not strictly needed, lets pick the simple solution and
drop the sigint.

Fixes: 133fe2e617 ("perf tests: Improve temp file cleanup in test_arm_coresight.sh")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/r/20220428151947.290146-1-jeremy.linton@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-08 21:48:49 -03:00
Ian Rogers 183d4f2d23 perf bench: Fix two numa NDEBUG warnings
BUG_ON is a no-op if NDEBUG is defined, otherwise it is an assert.
Compiling with NDEBUG yields:

  bench/numa.c: In function ‘bind_to_cpu’:
  bench/numa.c:314:1: error: control reaches end of non-void function [-Werror=return-type]
    314 | }
        | ^
  bench/numa.c: In function ‘bind_to_node’:
  bench/numa.c:367:1: error: control reaches end of non-void function [-Werror=return-type]
    367 | }
        | ^

Add return statements to cover this case.

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428202912.1056444-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-08 21:48:49 -03:00
Ian Rogers 280c36d26e perf test: Add skip to --per-thread test
As reported in:
https://lore.kernel.org/linux-perf-users/20220428122821.3652015-1-tmricht@linux.ibm.com/
the 'instructions:u' event may not be supported. Add a skip using 'perf
record'.

Switch some code away from pipe to make the failures clearer.

Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220505182505.3313191-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-06 15:15:05 -03:00
Ian Rogers 0255571a16 perf cpumap: Switch to using perf_cpu_map API
Switch some raw accesses to the cpu map to using the library API. This
can help with reference count checking. Some BPF cases switch from index
to CPU for consistency, this shouldn't matter as the CPU map is full.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220503041757.2365696-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-05 14:07:27 -03:00
Ian Rogers 1ce7fc6fd4 perf vendor events intel: Update CLX events to v1.15
Events are generated for CascadeLake Server v1.15 with events from:

  https://download.01.org/perfmon/CLX/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change updates descriptions, adds INST_DECODED.DECODERS and
corrects a counter mask in UOPS_RETIRED.TOTAL_CYCLES.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-04 12:52:04 -03:00
Zhengjun Xing 4e411ee400 perf vendor events intel: Add uncore event list for Sapphirerapids
Add JSON uncore events for Sapphirerapids to perf.

Based on JSON list v1.01:

https://download.01.org/perfmon/SPR/

Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220425132211.801228-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-03 11:52:00 -03:00
Zhengjun Xing 9061dffd5e perf vendor events intel: Update core event list for Sapphirerapids
Update JSON core events for Sapphirerapids to perf.

Based on JSON list v1.01:

https://download.01.org/perfmon/SPR/

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220425132211.801228-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-03 11:52:00 -03:00
James Clark 630af16eee perf tools: Use Python devtools for version autodetection rather than runtime
This fixes the issue where the build will fail if only the Python2
runtime is installed but the Python3 devtools are installed. Currently
the workaround is 'make PYTHON=python3'.

Fix it by autodetecting Python based on whether python[x]-config exists
rather than just python[x] because both are needed for the build. Then
-config is stripped to find the Python runtime.

Testing
=======

 * Auto detect links with Python3 when the v3 devtools are installed
   and only Python 2 runtime is installed
 * Auto detect links with Python2 when both devtools are installed
 * Sensible warning is printed if no Python devtools are installed
 * 'make PYTHON=x' still automatically sets PYTHON_CONFIG=x-config
 * 'make PYTHON=x' fails if x-config doesn't exist
 * 'make PYTHON=python3' overrides Python2 devtools
 * 'make PYTHON=python2' overrides Python3 devtools
 * 'make PYTHON_CONFIG=x-config' works
 * 'make PYTHON=x PYTHON_CONFIG=x' works
 * 'make PYTHON=missing' reports an error
 * 'make PYTHON_CONFIG=missing' reports an error

Fixes: 79373082fa ("perf python: Autodetect python3 binary")
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220309194313.3350126-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-03 11:23:44 -03:00
Ian Rogers 570c44a01b perf stat: Avoid printing cpus with no counters
perf_evlist's user_requested_cpus can contain CPUs not present in any
evsel's cpus, for example uncore counters. Avoid printing the prefix and
trailing \n until the first valid counter is encountered.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20220503041757.2365696-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-03 11:19:17 -03:00
Yang Jihong 4d27cf1d9d perf tools: Add missing headers needed by util/data.h
'struct perf_data' in util/data.h uses the "u64" data type, which is
defined in "linux/types.h".

If we only include util/data.h, the following compilation error occurs:

  util/data.h:38:3: error: unknown type name ‘u64’
     u64    version;
     ^~~

Solution: include "linux/types.h." to add the needed type definitions.

Fixes: 258031c017 ("perf header: Add DIR_FORMAT feature to describe directory data")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220429090539.212448-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-30 12:30:16 -03:00
Arnaldo Carvalho de Melo 3297e5547b Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes from perf/urgent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-30 12:23:24 -03:00
Namhyung Kim a5d20d42a2 perf symbol: Remove arch__symbols__fixup_end()
Now the generic code can handle kallsyms fixup properly so no need to
keep the arch-functions anymore.

Fixes: 3cf6a32f3f ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:51:40 -03:00
Namhyung Kim 8799ebce84 perf symbol: Update symbols__fixup_end()
Now arch-specific functions all do the same thing.  When it fixes the
symbol address it needs to check the boundary between the kernel image
and modules.  For the last symbol in the previous region, it cannot
know the exact size as it's discarded already.  Thus it just uses a
small page size (4096) and rounds it up like the last symbol.

Fixes: 3cf6a32f3f ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:51:33 -03:00
Namhyung Kim 838425f2de perf symbol: Pass is_kallsyms to symbols__fixup_end()
The symbol fixup is necessary for symbols in kallsyms since they don't
have size info.  So we use the next symbol's address to calculate the
size.  Now it's also used for user binaries because sometimes they miss
size for hand-written asm functions.

There's a arch-specific function to handle kallsyms differently but
currently it cannot distinguish kallsyms from others.  Pass this
information explicitly to handle it properly.  Note that those arch
functions will be moved to the generic function so I didn't added it to
the arch-functions.

Fixes: 3cf6a32f3f ("perf symbols: Fix symbol size calculation condition")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220416004048.1514900-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:51:20 -03:00
Timothy Hayes 3b9a8c8b9a perf test: Add perf_event_attr test for Arm SPE
Adds a perf_event_attr test for Arm SPE in which the presence of
physical addresses are checked when SPE unit is run with pa_enable=1.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220421165205.117662-4-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:40:49 -03:00
Timothy Hayes 7599b70a3c perf arm-spe: Fix SPE events with phys addresses
This patch corrects a bug whereby SPE collection is invoked with
pa_enable=1 but synthesized events fail to show physical addresses.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220421165205.117662-3-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:39:28 -03:00
Timothy Hayes 4e13f6706d perf arm-spe: Fix addresses of synthesized SPE events
This patch corrects a bug whereby synthesized events from SPE
samples are missing virtual addresses.

Fixes: 54f7815efe ("perf arm-spe: Fill address info for samples")
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Song Liu <songliubraving@fb.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220421165205.117662-2-timothy.hayes@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:39:14 -03:00
Ian Rogers 36c84190dc perf vendor events intel: Update WSM-EX events to v3
Events are generated for Westmere EX v3 with events from:

  https://download.01.org/perfmon/WSM-EX/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change updates descriptions.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:30:36 -03:00
Ian Rogers a0cb448978 perf vendor events intel: Update WSM-EP-SP events to v3
Events are generated for Westmere EP-SP v3 with events from:

  https://download.01.org/perfmon/WSM-EP-SP/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change updates descriptions.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:30:16 -03:00
Ian Rogers e14fd2ee6d perf vendor events intel: Update SKX events to v1.27
Events are generated for Skylake Server v1.27 with
events from:

  https://download.01.org/perfmon/SKX/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change updates descriptions, adds INST_DECODED.DECODERS and
corrects a counter mask in UOPS_RETIRED.TOTAL_CYCLES.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:29:56 -03:00
Ian Rogers 02c758d2aa perf vendor events intel: Update SKL events to v53
Events are generated for Skylake v53 with
events from:

  https://download.01.org/perfmon/SKL/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change updates descriptions, adds INST_DECODED.DECODERS and
corrects a counter mask in UOPS_RETIRED.TOTAL_CYCLES.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:29:38 -03:00
Ian Rogers 8ce185d496 perf vendor events intel: Update IVT events to v21
Events are generated for Ivytown v21 with events from:

  https://download.01.org/perfmon/IVT/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change fixes a spelling mistake in a description.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:29:12 -03:00
Ian Rogers a5043ed963 perf vendor events intel: Update ICL events to v1.13
Events are generated for Icelake v1.13 with events from:

  https://download.01.org/perfmon/ICL/

Using the scripts at:

  https://github.com/intel/event-converter-for-linux-perf/

This change updates descriptions and adds INST_DECODED.DECODERS.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220428075730.797727-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:28:40 -03:00
Thomas Richter 44900ce975 perf test: Fix test case 81 ("perf record tests") on s390x
perf test -F 81 ("perf record tests") -v fails on s390x on the
linux-next branch.

The test case is x86 specific can not be executed on s390x.  The test
case depends on x86 register names such as:

  ... | egrep -q 'available registers: AX BX CX DX ....'

Skip this test case on s390x.

Output before:

  # perf test -F 81
  81: perf record tests                       : FAILED!
  #

Output after:

  # perf test -F 81
  81: perf record tests                       : Skip
  #

Fixes: 24f378e660 ("perf test: Add basic perf record tests")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220428122821.3652015-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28 10:24:24 -03:00
Adrian Hunter de8fd13843 perf intel-pt: Fix timeless decoding with perf.data directory
Intel PT does not capture data in separate directories, so do not
use separate directory processing because it doesn't work for
timeless decoding. It also looks like it doesn't support one_mmap
handling.

Example:

  Before:

    # perf record --kcore -a -e intel_pt/tsc=0/k sleep 0.1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 1.799 MB perf.data ]
    # perf script --itrace=bep | head
    #

  After:

    # perf script --itrace=bep | head
    perf 21073 [000]              psb:  psb offs: 0                       ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms])
    perf 21073 [000]              cbr:  cbr: 45 freq: 4505 MHz (161%)     ffffffffaa68faf4 native_write_msr+0x4 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:                 0 [unknown] ([unknown]) => ffffffffaa68faf6 native_write_msr+0x6 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa68faf8 native_write_msr+0x8 ([kernel.kallsyms]) => ffffffffaa61aab0 pt_config_start+0x60 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa61aabd pt_config_start+0x6d ([kernel.kallsyms]) => ffffffffaa61b8ad pt_event_start+0x27d ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa61b8bb pt_event_start+0x28b ([kernel.kallsyms]) => ffffffffaa61ba60 pt_event_add+0x40 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa61ba76 pt_event_add+0x56 ([kernel.kallsyms]) => ffffffffaa880e86 event_sched_in+0xc6 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa880e9b event_sched_in+0xdb ([kernel.kallsyms]) => ffffffffaa880ea5 event_sched_in+0xe5 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa880eba event_sched_in+0xfa ([kernel.kallsyms]) => ffffffffaa880f96 event_sched_in+0x1d6 ([kernel.kallsyms])
    perf 21073 [000]          1       branches:k:  ffffffffaa880fc8 event_sched_in+0x208 ([kernel.kallsyms]) => ffffffffaa880ec0 event_sched_in+0x100 ([kernel.kallsyms])

Fixes: bb6be405c4 ("perf session: Load data directory files for analysis")
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220428093109.274641-1-adrian.hunter@intel.com
Cc: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linux-kernel@vger.kernel.org
2022-04-28 10:20:52 -03:00
Adrian Hunter 52cc784244 perf tools: Delete perf-with-kcore.sh script
It has been obsolete since the introduction of the 'perf record --kcore'
option.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20220427141946.269523-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-27 20:11:26 -03:00
Adrian Hunter 9e5e641045 perf intel-pt: Add link to the perf wiki's Intel PT page
Add an EXAMPLE section and link to the perf wiki's Intel PT page.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20220426133213.248475-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-26 14:32:29 -03:00
Arnaldo Carvalho de Melo e0c1b8f9eb Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes, such as the llvm one for ubuntu:22.04.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-24 07:50:49 -03:00
Zhengjun Xing d7e3c39708 perf stat: Support hybrid --topdown option
Since for cpu_core or cpu_atom, they have different topdown events
groups.

For cpu_core, --topdown equals to:

"{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,
  cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,
  cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,
  cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"

For cpu_atom, --topdown equals to:

"{cpu_atom/topdown-retiring/,cpu_atom/topdown-bad-spec/,
 cpu_atom/topdown-fe-bound/,cpu_atom/topdown-be-bound/}"

To simplify the implementation, on hybrid, --topdown is used
together with --cputype. If without --cputype, it uses cpu_core
topdown events by default.

  # ./perf stat --topdown -a  sleep 1
  WARNING: default to use cpu_core topdown events

   Performance counter stats for 'system wide':

              retiring      bad speculation       frontend bound        backend bound     heavy operations     light operations    branch mispredict       machine clears        fetch latency      fetch bandwidth         memory bound           Core bound
                  4.1%                 0.0%                 5.1%                90.8%                 2.3%                 1.8%                 0.0%                 0.0%                 4.2%                 0.9%                 9.9%                81.0%

         1.002624229 seconds time elapsed

  # ./perf stat --topdown -a --cputype atom  sleep 1

   Performance counter stats for 'system wide':

              retiring      bad speculation       frontend bound        backend bound
                 13.5%                 0.1%                31.2%                55.2%

         1.002366987 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-3-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-24 07:50:18 -03:00
Thomas Richter 5bb017d4b9 perf test: Fix error message for test case 71 on s390, where it is not supported
Test case 71 'Convert perf time to TSC' is not supported on s390.

Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.

The root cause is function evlist__open() called from
test__perf_time_to_tsc().  evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.

The PMU driver returns -ENOENT in this case. This error is leads to the
failure.

Fix this by returning TEST_SKIP on -ENOENT.

Output before:
 71: Convert perf time to TSC:
 71.1: TSC support:             Skip (This architecture does not support)
 71.2: Perf time to TSC:        FAILED!

Output after:
 71: Convert perf time to TSC:
 71.1: TSC support:             Skip (This architecture does not support)
 71.2: Perf time to TSC:        Skip (perf_read_tsc_conversion is not supported)

This also happens on an x86_64 virtual machine:
   # uname -m
   x86_64
   $ ./perf test -F 71
    71: Convert perf time to TSC  :
    71.1: TSC support             : Ok
    71.2: Perf time to TSC        : FAILED!
   $

Committer testing:

Continues to work on x86_64:

  $ perf test 71
   71: Convert perf time to TSC    :
   71.1: TSC support               : Ok
   71.2: Perf time to TSC          : Ok
  $

Fixes: 290fa68bdc ("perf test tsc: Fix error message when not supported")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: chengdongli@tencent.com
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220420062921.1211825-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 18:39:34 -03:00
Leo Yan ccb17caecf perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
Since commit bb30acae4c ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't report result if the PERF_SAMPLE_DATA_SRC bit is missed in sample
type.

The commit ffab487052 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue.  It adds PERF_SAMPLE_DATA_SRC
bit for Arm SPE event, this allows the perf data file generated by
kernel v5.18-rc1 or later version can be reported properly.

On the other hand, perf tool still fails to be backward compatibility
for a data file recorded by an older version's perf which contains Arm
SPE trace data.  This patch is a workaround in reporting phase, when
detects ARM SPE PMU event and without PERF_SAMPLE_DATA_SRC bit, it will
force to set the bit in the sample type and give a warning info.

Fixes: bb30acae4c ("perf report: Bail out --mem-mode if mem info is not available")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/20220414123201.842754-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 18:39:34 -03:00
Leo Yan c6d8df0106 perf script: Always allow field 'data_src' for auxtrace
If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:

  # perf script -F,+data_src
  Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.

This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.

Fixes: e55ed3423c ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220417114837.839896-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 18:39:34 -03:00
Guilherme Amadio d22588d73b perf clang: Fix header include for LLVM >= 14
The header TargetRegistry.h has moved in LLVM/clang 14.

Committer notes:

The problem as noticed when building in ubuntu:22.04:

    90    98.61 ubuntu:22.04                  : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
      util/c++/clang.cpp:23:10: fatal error: llvm/Support/TargetRegistry.h: No such file or directory
         23 | #include "llvm/Support/TargetRegistry.h"
            |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.

Fixed after applying this patch.

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Guilherme Amadio <amadio@gentoo.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://twitter.com/GuilhermeAmadio/status/1514970524232921088
Link: http://lore.kernel.org/lkml/Ylp0M/VYgHOxtcnF@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 18:39:34 -03:00
Zhengjun Xing 2c8e64514a perf stat: Merge event counts from all hybrid PMUs
For hybrid events, by default stat aggregates and reports the event counts
per pmu.

  # ./perf stat -e cycles -a  sleep 1

   Performance counter stats for 'system wide':

      14,066,877,268      cpu_core/cycles/
       6,814,443,147      cpu_atom/cycles/

         1.002760625 seconds time elapsed

Sometimes, it's also useful to aggregate event counts from all PMUs.
Create a new option '--hybrid-merge' to enable that behavior and report
the counts without PMUs.

  # ./perf stat -e cycles -a --hybrid-merge  sleep 1

   Performance counter stats for 'system wide':

      20,732,982,512      cycles

         1.002776793 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 14:23:35 -03:00
Zhengjun Xing 60344f1a9a perf stat: Support metrics with hybrid events
One metric such as 'Kernel_Utilization' may be from different PMUs and
consists of different events.

For core,
Kernel_Utilization = cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread

For atom,
Kernel_Utilization = cpu_clk_unhalted.core:k / cpu_clk_unhalted.core

The metric group string for core is:
'{cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.thread_p/metric-id=cpu_clk_unhalted.thread_p:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W#cpu_core'

The metric group string for atom is:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W#cpu_atom'

That means the group "{cpu_clk_unhalted.thread:k,cpu_clk_unhalted.thread}:W"
is from cpu_core PMU and the group "{cpu_clk_unhalted.core:k,cpu_clk_unhalted.core}"
is from cpu_atom PMU. And then next, check if the events in the group are
valid on that PMU. If one event is not valid on that PMU, the associated
group would be removed internally.

In this example, cpu_clk_unhalted.thread is valid on cpu_core and
cpu_clk_unhalted.core is valid on cpu_atom. So the checks for these two
groups are passed.

Before:

  # ./perf stat -M Kernel_Utilization -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
  anon group { CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD, CPU_CLK_UNHALTED.THREAD }

 Performance counter stats for 'system wide':

        17,639,501      cpu_atom/CPU_CLK_UNHALTED.CORE/ #     1.00 Kernel_Utilization
        17,578,757      cpu_atom/CPU_CLK_UNHALTED.CORE:k/
     1,005,350,226 ns   duration_time
        43,012,352      cpu_core/CPU_CLK_UNHALTED.THREAD_P:k/ #     0.99 Kernel_Utilization
        17,608,010      cpu_atom/CPU_CLK_UNHALTED.THREAD_P:k/
        43,608,755      cpu_core/CPU_CLK_UNHALTED.THREAD/
        17,630,838      cpu_atom/CPU_CLK_UNHALTED.THREAD/
     1,005,350,226 ns   duration_time

       1.005350226 seconds time elapsed

After:

  # ./perf stat -M Kernel_Utilization -a sleep 1

 Performance counter stats for 'system wide':

        17,981,895      CPU_CLK_UNHALTED.CORE [cpu_atom] #     1.00 Kernel_Utilization
        17,925,405      CPU_CLK_UNHALTED.CORE:k [cpu_atom]
     1,004,811,366 ns   duration_time
        41,246,425      CPU_CLK_UNHALTED.THREAD_P:k [cpu_core] #     0.99 Kernel_Utilization
        41,819,129      CPU_CLK_UNHALTED.THREAD [cpu_core]
     1,004,811,366 ns   duration_time

       1.004811366 seconds time elapsed

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 14:23:17 -03:00
Zhengjun Xing 17408e5904 perf vendor events intel: Add metrics for Alderlake
Add JSON metrics for Alderlake to perf.

It included both P-core and E-core metrics.

P-core metrics based on TMA 4.3-full (TMA_Metrics-full.csv)
E-core metrics based on E-core TMA 2.0 (E-core_TMA_Metrics.xlsx)

They are all downloaded from:
  https://download.01.org/perfmon/

Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220422065336.767582-1-zhengjun.xing@linux.intel.com
Cc: irogers@google.com
Cc: peterz@infradead.org
Cc: adrian.hunter@intel.com
Cc: alexander.shishkin@intel.com
Cc: acme@kernel.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Cc: mingo@redhat.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
2022-04-22 14:22:24 -03:00
Jiri Olsa 3a7ab60597 perf tools: Move libbpf init in libbpf_init function
Move the libbpf init code into a single function, so that we have a single
place doing that.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220422100025.1469207-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-22 14:02:15 -03:00
Florian Fischer 75eafc970b perf list: Print all available tool events
Introduce names for the new tool events 'user_time' and 'system_time'.

  $ perf list
  ...
  duration_time                                      [Tool event]
  user_time                                          [Tool event]
  system_time                                        [Tool event]
  ...

Committer testing:

Before:

  $ perf list | grep Tool
  duration_time                                      [Tool event]
  $

After:

  $ perf list | grep Tool
    duration_time                                    [Tool event]
    user_time                                        [Tool event]
    system_time                                      [Tool event]
  $

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/20220420174244.1741958-2-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 15:05:00 -03:00
Florian Fischer b03b89b350 perf stat: Add user_time and system_time events
It bothered me that during benchmarking using 'perf stat' (to collect
for example CPU cache events) I could not simultaneously retrieve the
times spend in user or kernel mode in a machine readable format.

When running 'perf stat' the output for humans contains the times
reported by rusage and wait4.

  $ perf stat -e cache-misses:u -- true

   Performance counter stats for 'true':

             4,206      cache-misses:u

       0.001113619 seconds time elapsed

       0.001175000 seconds user
       0.000000000 seconds sys

But 'perf stat's machine-readable format does not provide this information.

  $ perf stat -x, -e cache-misses:u -- true
  4282,,cache-misses:u,492859,100.00,,

I found no way to retrieve this information using the available events
while using machine-readable output.

This patch adds two new tool internal events 'user_time' and
'system_time', similarly to the already present 'duration_time' event.

Both events use the already collected rusage information obtained by
wait4 and tracked in the global ru_stats.

Examples presenting cache-misses and rusage information in both human
and machine-readable form:

  $ perf stat -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time .

   Performance counter stats for 'grep -q -r duration_time .':

        67,422,542 ns   duration_time:u
        50,517,000 ns   user_time:u
        16,839,000 ns   system_time:u
            30,937      cache-misses:u

       0.067422542 seconds time elapsed

       0.050517000 seconds user
       0.016839000 seconds sys

  $ perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time .
  72134524,ns,duration_time:u,72134524,100.00,,
  65225000,ns,user_time:u,65225000,100.00,,
  6865000,ns,system_time:u,6865000,100.00,,
  38705,,cache-misses:u,71189328,100.00,,

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-3-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:44:56 -03:00
Florian Fischer c735b0a521 perf stat: Introduce stats for the user and system rusage times
This is preparation for exporting rusage values as tool events.

Add new global stats tracking the values obtained via rusage.

For now only ru_utime and ru_stime are part of the tracked stats.

Both are stored as nanoseconds to be consistent with 'duration_time',
although the finest resolution the struct timeval data in rusage
provides are microseconds.

Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-2-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:38:41 -03:00
Martin Liška c60664dea7 perf tools: Print warning when HAVE_DEBUGINFOD_SUPPORT is not set and user tries to use debuginfod support
When one requests debuginfod, either via --debuginfod option, or with a
perf-config value, complain when perf is not built with it.

Signed-off-by: Martin Liška <mliska@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/35bae747-3951-dc3d-a66b-abf4cebcd9cb@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:36:36 -03:00
Martin Liška b8836c2a4d perf version: Add HAVE_DEBUGINFOD_SUPPORT to built-in features
The change adds debuginfod to ./perf -vv:

  ...
  debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
  ...

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/0d1c5ace-88e8-7102-1565-7c143f01a966@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-20 13:32:09 -03:00
Ian Rogers 87e0a30e9a perf vendor events intel: Update goldmont event topics
Apply topic updates from:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:38:28 -03:00
Ian Rogers f51c401f11 perf vendor events intel: Update goldmontplus event topics
Apply topic updates from:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:38:18 -03:00
Ian Rogers 8f1a69825f perf vendor events intel: Update elkhartlake event topics
Apply topic updates from:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-12-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:38:08 -03:00
Ian Rogers 44a4b9ad8e perf vendor events intel: Update westmereex event topics
Apply topic updates from:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:37:56 -03:00
Ian Rogers 7f2c72fa69 perf vendor events intel: Update westmereep-sp event topics
Apply topic updates from:
p
https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:37:43 -03:00
Ian Rogers a01174fc9e perf vendor events intel: Update westmereep-dp event topics
Apply topic updates from:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:37:32 -03:00
Ian Rogers 55ae1b759e perf vendor events intel: Update tremontx uncore and topics
Update the topic of BTCLEAR.ANY and add additional uncore event names
as per:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>1
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:37:22 -03:00
Ian Rogers 45d97cdd2f perf vendor events intel: Update tigerlake topic
Update the topic of ASSISTS.ANY as per:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:37:10 -03:00
Ian Rogers da578feb70 perf vendor events intel: Update nehalemep event topics
Apply topic updates from:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:36:59 -03:00
Ian Rogers 339ec95167 perf vendor events intel: Update SKX uncore
JSON uncore events are generated for Skylake Server for v1.26
with events from:

https://download.01.org/perfmon/SKX/

New event names are added, that match the original JSON names,
due to an update to:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:36:35 -03:00
Ian Rogers dd498d0804 perf vendor events intel: Update CLX uncore to v1.14
JSON uncore events are generated for CascadeLake Server for v1.14 with
events from:

https://download.01.org/perfmon/CLX/

New event names are added, that match the original JSON names,
due to an update to:

https://github.com/intel/event-converter-for-linux-perf/

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:36:15 -03:00
Ian Rogers 12c6385eeb perf vendor events intel: Add sapphirerapids events
Events were generated from 01.org using:

https://github.com/intel/event-converter-for-linux-perf

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:36:07 -03:00
Ian Rogers cbeee6caa4 perf vendor events intel: Fix icelakex cstate metrics
Apply cstate fix from:

https://github.com/intel/event-converter-for-linux-perf/

so that metrics for cstates that exist on the particular architecture
are generated. This corrects issues with metric testing.

Also correct topic of ASSISTS.ANY event.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:35:54 -03:00
Ian Rogers 2c77f36a9a perf vendor events intel: Fix icelake cstate metrics
Apply cstate fix from:

https://github.com/intel/event-converter-for-linux-perf/

so that metrics for cstates that exist on the particular architecture
are generated. This corrects issues with metric testing.

Also correct topic of ASSISTS.ANY event.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 12:35:03 -03:00
Leo Yan fdefc3750e perf mem: Print memory operation type
The memory operation types are not only for load and store, for easier
reviewing the memory operation type, this patch prints out it.

Before:
  ls 14753 [011]  3678.072400:  1    l1d-miss:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  l1d-access:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  tlb-access:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1      memory:  88000182 L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])

After:

  ls 14753 [011]  3678.072400:  1    l1d-miss:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  l1d-access:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1  tlb-access:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])
  ls 14753 [011]  3678.072400:  1      memory:  88000182 |OP LOAD|LVL L1 miss|SNP N/A|TLB Walker hit|LCK No|BLK  N/A ffffa7c22b4b2a00 [unknown] ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220417124524.901148-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-18 11:44:06 -03:00
Athira Rajeev f58faed7fb perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1K
The 'perf bench numa' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1

Snippet of code:

  <<>>
  perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
  Aborted (core dumped)
  <<>>

bind_to_node() uses "sched_getaffinity" to save the original cpumask and
this call is returning EINVAL ((invalid argument).

This happens because the default mask size in glibc is 1024.  To
overcome this 1024 CPUs mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros ie, use CPU_ALLOC to allocate
cpumask, CPU_ALLOC_SIZE for size.

Apart from fixing this for "orig_mask", apply same logic to "mask" as
well which is used to setaffinity so that mask size is large enough to
represent number of possible CPU's in the system.

sched_getaffinity is used in one more place in perf numa bench. It is in
"bind_to_cpu" function. Apply the same logic there also. Though
currently no failure is reported from there, it is ideal to change
getaffinity to work with such system configurations having CPU's more
than default mask size supported by glibc.

Also fix "sched_setaffinity" to use mask size which is large enough to
represent number of possible CPU's in the system.

Fixed all places where "bind_cpumask" which is part of "struct
thread_data" is used such that bind_cpumask works in all configuration.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14 09:15:10 -03:00
Athira Rajeev 8cb7a188ac perf bench: Fix numa testcase to check if CPU used to bind task is online
Perf numa bench test fails with error:

Testcase:

  ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp  1 --no-data_rand_walk

Failure snippet:

<<>>
  Running 'numa/mem' benchmark:

  # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"

  perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>

The Testcases uses CPU's 0 and 8. In function "parse_setup_cpu_list",
There is check to see if cpu number is greater than max cpu's possible
in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >=
g->p.nr_cpus) {".

But it could happen that system has say 48 CPU's, but only number of
online CPU's is 0-7. Other CPU's are offlined. Since "g->p.nr_cpus" is
48, so function will go ahead and set bit for CPU 8 also in cpumask (
td->bind_cpumask).

bind_to_cpumask function is called to set affinity using
sched_setaffinity and the cpumask. Since the CPU8 is not present, set
affinity will fail here with EINVAL.

Fix this issue by adding a check to make sure that, CPU's provided in
the input argument values are online before proceeding further and skip
the test. For this, include new helper function "is_cpu_online" in
"tools/perf/util/header.c".

Since "BIT(x)" definition will get included from header.h, remove
that from bench/numa.c

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14 09:13:41 -03:00
Ian Rogers 24f378e660 perf test: Add basic perf record tests
Test the --per-thread flag.

Test Intel machine state capturing.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.bayduraev@gmail.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220414014642.3308206-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14 09:10:12 -03:00
Alexey Bayduraev 23380e4d53 perf record: Fix per-thread option
Per-thread mode doesn't have specific CPUs for events, add checks for
this case.

Minor fix to a pr_debug by Ian Rogers <irogers@google.com> to avoid an
out of bound array access.

Fixes: 7954f71689 ("perf record: Introduce thread affinity and mmap masks")
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Alexey Bayduraev <alexey.bayduraev@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220414014642.3308206-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14 09:05:11 -03:00
James Clark 2adacd7f0a perf docs: Add man page entry for Arm SPE
The SPE integration in Perf has quite a few usability quirks that
can't be found by just reading the reference manual. So document this
and at the same time add a summary of the feature that is also hard to
find elsewhere.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Co-authored-by: Al Grant <al.grant@arm.com>
Co-authored-by: Luke Dare <Luke.Dare@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220413084021.2556142-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-14 08:54:03 -03:00
Lv Ruyi d73f5d14e0 perf stat: Fix error check return value of hashmap__new(), must use IS_ERR()
hashmap__new() returns ERR_PTR(-ENOMEM) when it fails, so we should use
IS_ERR() to check it in error handling path.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220413093302.2538128-1-lv.ruyi@zte.com.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-13 22:20:15 -03:00
Adrian Hunter f034fc50d3 perf tools: Fix misleading add event PMU debug message
Fix incorrect debug message:

   Attempting to add event pmu 'intel_pt' with '' that may result in
   non-fatal errors

which always appears with perf record -vv and intel_pt e.g.

    perf record -vv -e intel_pt//u uname

The message is incorrect because there will never be non-fatal errors.

Suppress the message if the PMU is 'selectable' i.e. meant to be
selected directly as an event.

Fixes: 4ac22b484d ("perf parse-events: Make add PMU verbose output clearer")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20220411061758.2458417-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-13 07:00:31 -03:00
Carsten Haitzler 41204da4c1 perf test: Shell - Limit to only run executable scripts in tests
'perf test''s shell runner will just run everything in the tests
directory (as long as it's not another directory or does not begin
with a dot), but sometimes you find files in there that are not shell
scripts - perf.data output for example if you do some testing and then
the next time you run perf test it tries to run these.

Check the files are executable so they are actually intended to be test
scripts and not just some "random junk" files there.

Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Link: http://lore.kernel.org/lkml/20220309122859.31487-1-carsten.haitzler@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-11 16:39:49 -03:00
Eelco Chaudron ae24e9b53d perf scripting python: Expose symbol offset and source information
This change adds the symbol offset to the data exported for each
call-chain entry. This can not be calculated from the script and
only the ip value, and no related mmap information.

In addition, also export the source file and line information, if
available, to avoid an external lookup if this information is needed.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/164554263724.752731.14651017093796049736.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-11 16:39:49 -03:00
Eric Lin 335f70faa2 perf jitdump: Add riscv64 support
This patch enables perf jitdump for riscv64 and was tested with V8 on
qemu rv64.

Qemu rv64:

  $ perf record -e cpu-clock -c 1000 -g -k mono ./d8_rv64 --perf-prof --no-write-protect-code-memory test.js
  $ perf inject -j -i perf.data -o perf.data.jitted
  $ perf report -i perf.data.jitted

Output:

  To display the perf.data header info, please use --header/--header-only options.

  Total Lost Samples: 0

  Samples: 87K of event 'cpu-clock'
  Event count (approx.): 87974000

  Children  Self   Command   Shared Object      Symbol

  ....
   0.28%    0.06%  d8_rv64   d8_rv64            [.] _ZN2v88internal6WasmJs7InstallEPNS0_7IsolateEb
   0.28%    0.00%  d8_rv64   d8_rv64            [.] _ZN2v88internal10ParserBaseINS0_6ParserEE22ParseLogicalExpressionEv
   0.28%    0.03%  d8_rv64   jitted-112-76.so   [.] Builtin:InterpreterEntryTrampoline
   0.12%    0.00%  d8_rv64   d8_rv64            [.] _ZN2v88internal19ContextDeserializer11DeserializeEPNS0_7IsolateENS0_6HandleINS0_13JSGlobalProxyEEENS_33DeserializeInternalFieldsCallbackE
   0.12%    0.01%  d8_rv64   jitted-112-651.so  [.] Builtin:CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit
  ....

Signed-off-by: Eric Lin <eric.lin@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: greentime.hu@sifive.com
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220406142606.18464-2-eric.lin@sifive.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-11 16:37:26 -03:00
Ian Rogers 940a445a90 perf annotate: Drop objdump stderr to avoid getting stuck waiting for stdout output
If objdump writes to stderr it can block waiting for it to be read. As
perf doesn't read stderr then progress stops with perf waiting for
stdout output.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lexi Shao <shaolexi@huawei.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220407230503.1265036-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:21:00 -03:00
Michael Petlan 3e6b43beb7 perf tools: Add external commands to list-cmds
The `perf --list-cmds` output prints only internal commands, although
there is no reason for that from users' perspective.

Adding the external commands to commands array with NULL function
pointer allows printing all perf commands while not changing the logic
of command handler selection.

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220404221541.30312-2-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:21:00 -03:00
Michael Petlan 0ff26efe92 perf docs: Add perf-iostat link to manpages
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220404221541.30312-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:20:59 -03:00
Denis Nikitin bc21e74d47 perf session: Remap buf if there is no space for event
If a perf event doesn't fit into remaining buffer space return NULL to
remap buf and fetch the event again.

Keep the logic to error out on inadequate input from fuzzing.

This fixes perf failing on ChromeOS (with 32b userspace):

  $ perf report -v -i perf.data
  ...
  prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data?
  Error:
  failed to process sample

Fixes: 57fc032ad6 ("perf session: Avoid infinite loop when seeing invalid header.size")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Denis Nikitin <denik@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:20:59 -03:00
Athira Rajeev 299687e18a perf bench: Fix epoll bench to correct usage of affinity for machines with #CPUs > 1K
The 'perf bench epoll' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench epoll all

Result snippet:
<<>>
Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.

perf: pthread_create: No such file or directory
<<>>

In epoll benchmarks (ctl, wait) pthread_create is invoked in do_threads
from respective bench_epoll_*  function. Though the logs shows direct
failure from pthread_create, the actual failure is from
"sched_setaffinity" returning EINVAL (invalid argument).

This happens because the default mask size in glibc is 1024. To overcome
this 1024 CPUs mask size limitation of cpu_set_t, change the mask size
using the CPU_*_S macros.

Patch addresses this by fixing all the epoll benchmarks to use CPU_ALLOC
to allocate cpumask, CPU_ALLOC_SIZE for size, and CPU_SET_S to set the
mask.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220406175113.87881-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Athira Rajeev c9c2a427dd perf bench: Fix futex bench to correct usage of affinity for machines with #CPUs > 1K
The 'perf bench futex' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench futex all

Failure snippet:
<<>>Running futex/hash benchmark...

perf: pthread_create: No such file or directory
<<>>

All the futex benchmarks (ie hash, lock-api, requeue, wake,
wake-parallel), pthread_create is invoked in respective bench_futex_*
function. Though the logs shows direct failure from pthread_create,
strace logs showed that actual failure is from  "sched_setaffinity"
returning EINVAL (invalid argument).

This happens because the default mask size in glibc is 1024. To overcome
this 1024 CPUs mask size limitation of cpu_set_t, change the mask size
using the CPU_*_S macros.

Patch addresses this by fixing all the futex benchmarks to use CPU_ALLOC
to allocate cpumask, CPU_ALLOC_SIZE for size, and CPU_SET_S to set the
mask.

Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220406175113.87881-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Adrian Hunter aeee9dc53c perf tools: Fix perf's libperf_print callback
eprintf() does not expect va_list as the type of the 4th parameter.

Use veprintf() because it does.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 428dab813a ("libperf: Merge libperf_set_print() into libperf_init()")
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220408132625.2451452-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
James Clark ffab487052 perf: arm-spe: Fix perf report --mem-mode
Since commit bb30acae4c ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't allow opening the file unless one of the events has
PERF_SAMPLE_DATA_SRC set.

SPE doesn't have this set even though synthetic memory data is generated
after it is decoded. Fix this issue by setting DATA_SRC on SPE events.
This has no effect on the data collected because the SPE driver doesn't
do anything with that flag and doesn't generate samples.

Fixes: bb30acae4c ("perf report: Bail out --mem-mode if mem info is not available")
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220408144056.1955535-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
James Clark fa7095c5c3 perf unwind: Don't show unwind error messages when augmenting frame pointer stack
Commit Fixes: b9f6fbb3b2 ("perf arm64: Inject missing frames when
using 'perf record --call-graph=fp'") intended to add a 'best effort'
DWARF unwind that improved the frame pointer stack in most scenarios.

It's expected that the unwind will fail sometimes, but this shouldn't be
reported as an error. It only works when the return address can be
determined from the contents of the link register alone.

Fix the error shown when the unwinder requires extra registers by adding
a new flag that suppresses error messages. This flag is not set in the
normal --call-graph=dwarf unwind mode so that behavior is not changed.

Fixes: b9f6fbb3b2 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220406145651.1392529-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Chengdong Li 290fa68bdc perf test tsc: Fix error message when not supported
By default `perf test tsc` does not return the error message when the
child process detected kernel does not support it. Instead, the child
process prints an error message to stderr, unfortunately stderr is
redirected to /dev/null when verbose <= 0.

This patch does:

- return TEST_SKIP to the parent process instead of TEST_OK when
  perf_read_tsc_conversion() is not supported.

- Add a new subtest of testing if TSC is supported on current
  architecture by moving exist code to a separate function.
  It avoids two places in test__perf_time_to_tsc() that return
  TEST_SKIP by doing this.

- Extend the test suite definition to contain above two subtests.
  Current test_suite and test_case structs do not support printing skip
  reason when the number of subtest less than 1. To print skip reason, it
  is necessary to extend current test suite definition.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chengdong Li <chengdongli@tencent.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: likexu@tencent.com
Link: https://lore.kernel.org/r/20220408084748.43707-1-chengdongli@tencent.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Arnaldo Carvalho de Melo 3a8a047586 perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
Using -ffat-lto-objects in the python feature test when building with
clang-13 results in:

  clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
  error: command '/usr/sbin/clang' failed with exit code 1
  cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
  make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1

Noticed when building on a docker.io/library/archlinux:base container.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Arnaldo Carvalho de Melo dd6e1fe91c perf python: Fix probing for some clang command line options
The clang compiler complains about some options even without a source
file being available, while others require one, so use the simple
tools/build/feature/test-hello.c file.

Then check for the "is not supported" string in its output, in addition
to the "unknown argument" already being looked for.

This was noticed when building with clang-13 where -ffat-lto-objects
isn't supported and since we were looking just for "unknown argument"
and not providing a source code to clang, was mistakenly assumed as
being available and not being filtered to set of command line options
provided to clang, leading to a build failure.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:29 -03:00
Arnaldo Carvalho de Melo 41caff459a tools build: Filter out options and warnings not supported by clang
These make the feature check fail when using clang, so remove them just
like is done in tools/perf/Makefile.config to build perf itself.

Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config
when building with clang is also necessary to avoid these warnings
turned into errors (-Werror):

    CC      /tmp/build/perf/util/scripting-engines/trace-event-perl.o
  In file included from util/scripting-engines/trace-event-perl.c:35:
  In file included from /usr/lib64/perl5/CORE/perl.h:4085:
  In file included from /usr/lib64/perl5/CORE/hv.h:659:
  In file included from /usr/lib64/perl5/CORE/hv_func.h:34:
  In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4:
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
  #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START {  \
                                       ^~~~~~~~~~
  /usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START'
  #   define STMT_START   (void)( /* gcc supports "({ STATEMENTS; })" */
                                ^
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
  #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START {  \
                                                  ^
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
      v ^= (v>>23);                       \
                                          ^
  /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here
      ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  /usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
  } STMT_END
    ^~~~~~~~
  /usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END'
  #   define STMT_END     )
                          ^

Please refer to the discussion on the Link: tag below, where Nathan
clarifies the situation:

<quote>
acme> And then get to the problems at the end of this message, which seem
acme> similar to the problem described here:
acme>
acme> From  Nathan Chancellor <>
acme> Subject	[PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO
acme>
acme> https://lkml.org/lkml/2020/9/1/135
acme>
acme> So perhaps in this case its better to disable that
acme> -Werror,-Wcompound-token-split-by-macro when building with clang?

Yes, I think that is probably the best solution. As far as I can tell,
at least in this file and context, the warning appears harmless, as the
"create a GNU C statement expression from two different macros" is very
much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS.
The warning is fixed in upstream Perl by just avoiding creating GNU C
statement expressions using STMT_START and STMT_END:

  https://github.com/Perl/perl5/issues/18780
  https://github.com/Perl/perl5/pull/18984

If I am reading the source code correctly, an alternative to disabling
the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it
seems like that might end up impacting more than just this site,
according to the issue discussion above.
</quote>

Based-on-a-patch-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YkxWcYzph5pC1EK8@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 12:34:16 -03:00
Tanu M 7e2022af79 perf python: Convert tracepoint.py example to python3
Convert the tracepoint.py file to python3 as many of the files in
tools/perf are already written in python3.

Committer testing:

  # export PYTHONPATH=/tmp/build/perf/python/
  # python3 ~acme/git/perf/tools/perf/python/tracepoint.py | head
  time 67394457376909 prev_comm=swapper/12 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=gnome-terminal- next_pid=3313 next_prio=120
  time 67394457807669 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457811859 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457824929 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457831899 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457842299 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457844179 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457853879 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457856339 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457865659 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  Traceback (most recent call last):
    File "/var/home/acme/git/perf/tools/perf/python/tracepoint.py", line 48, in <module>
      main()
    File "/var/home/acme/git/perf/tools/perf/python/tracepoint.py", line 37, in main
      print("time %u prev_comm=%s prev_pid=%d prev_prio=%d prev_state=0x%x ==> next_comm=%s next_pid=%d next_prio=%d" % (
  BrokenPipeError: [Errno 32] Broken pipe
  #

Signed-off-by: Tanu M <tanu235m@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/linux-perf-users/CAPS78prawYzRZnyhWjgOnGw4EwoswNwztvfZFdCOPOydFzVwzQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
Haowen Bai f717d89a2b perf evlist: Directly return instead of using local ret variable
Addresses this coccinelle warning:

  ./tools/perf/util/evlist.c:1333:5-8: Unneeded variable: "err". Return
  "- ENOMEM" on line 1358

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/1648432532-23151-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
Ian Rogers 0df6ade711 perf evlist: Rename cpus to user_requested_cpus
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
of all evsels.

For non-task targets, cpus is set to be cpus requested from the command
line, defaulting to all online cpus if no cpus are specified.

For an uncore event, all_cpus may be just CPU 0 or every online CPU.

This causes all_cpus to have fewer values than the cpus variable which
is confusing given the 'all' in the name.

To try to make the behavior clearer, rename cpus to user_requested_cpus
and add comments on the two struct variables.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
John Garry d4ff926592 perf tools: Stop depending on .git files for building PERF-VERSION-FILE
This essentially reverts commit c72e3f04b4 ("tools/perf/build:
Speed up git-version test on re-make") and commit 4e666cdb06
("perf tools: Fix dependency for version file creation")

In commit c72e3f04b4 ("tools/perf/build: Speed up git-version test
on re-make"), a makefile dependency on .git/HEAD was added. The
background is that running PERF-VERSION-FILE is relatively slow, and
commands like "git describe" are particularly slow.

In commit 4e666cdb06 ("perf tools: Fix dependency for version file
creation"), an additional dependency on .git/ORIG_HEAD was added, as
.git/HEAD may not change for "git reset --hard HEAD^" command. However,
depending on whether we're on a branch or not, a "git cherry-pick" may
not lead to the version being updated.

As discussed with the git community in [0], using git internal files for
dependencies is not reliable. Commit 4e666cdb06 also breaks some build
scenarios [1].

As mentioned, c72e3f04b4 ("tools/perf/build: Speed up git-version
test on re-make") was added to speed up the build. However in commit
7572733b84 ("perf tools: Fix version kernel tag") we removed the
call to "git describe", so just revert Makefile.perf back to same as pre
c72e3f04b4 ("tools/perf/build: Speed up git-version test on
re-make") and the build should not be so slow, as below:

Pre 7572733b8499:

  $> time util/PERF-VERSION-GEN
    PERF_VERSION = 5.17.rc8.g4e666cdb06ee

  real    0m0.110s
  user    0m0.091s
  sys     0m0.019s

Post 7572733b8499:

  $> time util/PERF-VERSION-GEN
    PERF_VERSION = 5.17.rc8.g7572733b8499

  real    0m0.039s
  user    0m0.036s
  sys     0m0.007s

[0] https://lore.kernel.org/git/87wngkpddp.fsf@igel.home/T/#m4a4dd6de52fdbe21179306cd57b3761eb07f45f8
[1] https://lore.kernel.org/linux-perf-users/20220329093120.4173283-1-matthieu.baerts@tessares.net/T/#u

Committer testing:

After a fresh rebuild using 'make -C tools/perf O=/tmp/build/perf install-bin':

  $ perf -v
  perf version 5.17.g162f9db407b6
  $ git log --oneline -1
  162f9db407b6a6e5 (HEAD -> perf/core) perf tools: Stop depending on .git files for building PERF-VERSION-FILE
  $

Now using a detached tarball, i.e. outside the kernel source tree:

  $ ls -la perf*tar
  ls: cannot access 'perf*tar': No such file or directory
  $ make perf-tar-src-pkg
    TAR
    PERF_VERSION = 5.17.g31d10b3ef133
  $ ls -la perf*tar
  -rw-r--r--. 1 acme acme 22241280 Mar 30 13:26 perf-5.17.0.tar
  $ mv perf-5.17.0.tar /tmp
  $ cd /tmp
  $ tar xf perf-5.17.0.tar
  $ cd perf-5.17.0/
  $ make -C tools/perf |& tail
    CC      util/pmu.o
    CC      util/pmu-flex.o
    CC      util/expr-flex.o
    CC      util/expr.o
    LD      util/scripting-engines/perf-in.o
    LD      util/intel-pt-decoder/perf-in.o
    LD      util/perf-in.o
    LD      perf-in.o
    LINK    perf
  make: Leaving directory '/tmp/perf-5.17.0/tools/perf'
  $ tools/perf/perf -v
  perf version 5.17.g31d10b3ef133
  $ pwd
  /tmp/perf-5.17.0
  $ cat PERF-VERSION-FILE
  #define PERF_VERSION "5.17.g31d10b3ef133"
  $

Fixes: 4e666cdb06 ("perf tools: Fix dependency for version file creation")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1648635774-14581-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
Arnaldo Carvalho de Melo 9a195da42f perf beauty: Update copy of linux/socket.h with the kernel sources
To pick the changes in:

  a6a6fe27ba ("net/smc: Dynamic control handshake limitation by socket options")

This automagically adds support for the SOL_MNC socket level:

  $ diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
  --- tools/perf/trace/beauty/include/linux/socket.h	2022-03-14 17:55:22.277148656 -0300
  +++ include/linux/socket.h	2022-03-27 19:12:48.908250063 -0300
  @@ -366,6 +366,7 @@
   #define SOL_XDP		283
   #define SOL_MPTCP	284
   #define SOL_MCTP	285
  +#define SOL_SMC		286

   /* IPX options */
   #define IPX_TYPE	1
  $ tools/perf/trace/beauty/socket.sh > before
  $ cp include/linux/socket.h tools/perf/trace/beauty/include/linux/socket.h
  $ tools/perf/trace/beauty/socket.sh > after
  $ diff -u before after
  --- before	2022-03-29 11:47:56.390258780 -0300
  +++ after	2022-03-29 11:48:03.158436189 -0300
  @@ -67,6 +67,7 @@
   	[283] = "XDP",
   	[284] = "MPTCP",
   	[285] = "MCTP",
  +	[286] = "SMC",
   };

   DEFINE_STRARRAY(socket_level, "SOL_");
  $

This will allow 'perf trace' to translate 286 into "SMC" as is done with
the other socket levels:

  # perf trace -e setsockopt --max-events 4
   344.916 ( 0.003 ms): Socket Thread/3816 setsockopt(fd: 168, level: TCP, optname: 5, optval: 0x7f5797b9c4f8, optlen: 4) = 0
   344.920 ( 0.002 ms): Socket Thread/3816 setsockopt(fd: 168, level: TCP, optname: 6, optval: 0x7f5797b9c4f4, optlen: 4) = 0
  1246.974 ( 0.010 ms): systemd-resolv/1128 setsockopt(fd: 22, level: IP, optname: 11, optval: 0x7ffc96cd7244, optlen: 4) = 0
  1246.986 ( 0.002 ms): systemd-resolv/1128 setsockopt(fd: 22, level: IP, optname: 8, optval: 0x7ffc96cd7264, optlen: 4) = 0

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
  diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: D. Wythe <alibuda@linux.alibaba.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkMdpzzjPu5VZtW3@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:34 -03:00
Arnaldo Carvalho de Melo 4d4d00dd32 perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:

  fba60b171a ("libbpf: Use IS_ERR_OR_NULL() in hashmap__free()")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
  diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mauricio Vásquez <mauricio@kinvolk.io>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkMb2SAIai2VeuUD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:34 -03:00
Ian Rogers 8a96f454f5 perf stat: Avoid SEGV if core.cpus isn't set
Passing NULL to perf_cpu_map__max doesn't make sense as there is no
valid max. Avoid this problem by null checking in
perf_stat_init_aggr_mode.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328062414.1893550-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:34 -03:00
Linus Torvalds b8321ed4a4 Kbuild updates for v5.18
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
    additional flags to be passed to user-space programs.
 
  - Fix missing fflush() bugs in Kconfig and fixdep
 
  - Fix a minor bug in the comment format of the .config file
 
  - Make kallsyms ignore llvm's local labels, .L*
 
  - Fix UAPI compile-test for cross-compiling with Clang
 
  - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
    particular version of LLVm, and LLVM=<prefix> form for using custom
    LLVM in a particular directory path.
 
  - Clean up Makefiles
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs
 IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/
 p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj
 ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa
 rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z
 ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF
 nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW
 /YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX
 oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2
 p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD
 ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1
 +LJMEpdZO0dFvpF+
 =84rW
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
   additional flags to be passed to user-space programs.

 - Fix missing fflush() bugs in Kconfig and fixdep

 - Fix a minor bug in the comment format of the .config file

 - Make kallsyms ignore llvm's local labels, .L*

 - Fix UAPI compile-test for cross-compiling with Clang

 - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
   particular version of LLVm, and LLVM=<prefix> form for using custom
   LLVM in a particular directory path.

 - Clean up Makefiles

* tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: Make $(LLVM) more flexible
  kbuild: add --target to correctly cross-compile UAPI headers with Clang
  fixdep: use fflush() and ferror() to ensure successful write to files
  arch: syscalls: simplify uapi/kapi directory creation
  usr/include: replace extra-y with always-y
  certs: simplify empty certs creation in certs/Makefile
  certs: include certs/signing_key.x509 unconditionally
  kallsyms: ignore all local labels prefixed by '.L'
  kconfig: fix missing '# end of' for empty menu
  kconfig: add fflush() before ferror() check
  kbuild: replace $(if A,A,B) with $(or A,B)
  kbuild: Add environment variables for userprogs flags
  kbuild: unify cmd_copy and cmd_shipped
2022-03-31 11:59:03 -07:00
Linus Torvalds 7b58b82b86 perf tools changes for v5.18: 1st batch
New features:
 
 perf ftrace:
 
 - Add -n/--use-nsec option to the 'latency' subcommand.
 
   Default: usecs:
 
   $ sudo perf ftrace latency -T dput -a sleep 1
   #   DURATION     |      COUNT | GRAPH                          |
        0 - 1    us |    2098375 | #############################  |
        1 - 2    us |         61 |                                |
        2 - 4    us |         33 |                                |
        4 - 8    us |         13 |                                |
        8 - 16   us |        124 |                                |
       16 - 32   us |        123 |                                |
       32 - 64   us |          1 |                                |
       64 - 128  us |          0 |                                |
      128 - 256  us |          1 |                                |
      256 - 512  us |          0 |                                |
 
   Better granularity with nsec:
 
   $ sudo perf ftrace latency -T dput -a -n sleep 1
   #   DURATION     |      COUNT | GRAPH                          |
        0 - 1    us |          0 |                                |
        1 - 2    ns |          0 |                                |
        2 - 4    ns |          0 |                                |
        4 - 8    ns |          0 |                                |
        8 - 16   ns |          0 |                                |
       16 - 32   ns |          0 |                                |
       32 - 64   ns |          0 |                                |
       64 - 128  ns |    1163434 | ##############                 |
      128 - 256  ns |     914102 | #############                  |
      256 - 512  ns |        884 |                                |
      512 - 1024 ns |        613 |                                |
        1 - 2    us |         31 |                                |
        2 - 4    us |         17 |                                |
        4 - 8    us |          7 |                                |
        8 - 16   us |        123 |                                |
       16 - 32   us |         83 |                                |
 
 perf lock:
 
 - Add -c/--combine-locks option to merge lock instances in the same class into
   a single entry.
 
   # perf lock report -c
                  Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)
 
         rcu_read_lock   251225         0            0              0            0            0
    hrtimer_bases.lock    39450         0            0              0            0            0
   &sb->s_type->i_l...    10301         1          662            662          662          662
      ptlock_ptr(page)    10173         2          701           1402          760          642
   &(ei->i_block_re...     8732         0            0              0            0            0
          &xa->xa_lock     8088         0            0              0            0            0
           &base->lock     6705         0            0              0            0            0
           &p->pi_lock     5549         0            0              0            0            0
   &dentry->d_lockr...     5010         4         1274           5097         1844          789
             &ep->lock     3958         0            0              0            0            0
 
 - Add -F/--field option to customize the list of fields to output:
 
   $ perf lock report -F contended,wait_max -k avg_wait
                   Name contended max wait(ns) avg wait(ns)
 
         slock-AF_INET6         1        23543        23543
      &lruvec->lru_lock         5        18317        11254
         slock-AF_INET6         1        10379        10379
             rcu_node_1         1         2104         2104
    &dentry->d_lockr...         1         1844         1844
    &dentry->d_lockr...         1         1672         1672
       &newf->file_lock        15         2279         1025
    &dentry->d_lockr...         1          792          792
 
 - Add --synth=no option for record, as there is no need to symbolize,
   lock names comes from the tracepoints.
 
 perf record:
 
 - Threaded recording, opt-in, via the new --threads command line option.
 
 - Improve AMD IBS (Instruction-Based Sampling) error handling messages.
 
 perf script:
 
 - Add 'brstackinsnlen' field (use it with -F) for branch stacks.
 
 - Output branch sample type in 'perf script'.
 
 perf report:
 
 - Add "addr_from" and "addr_to" sort dimensions.
 
 - Print branch stack entry type in 'perf report --dump-raw-trace'
 
 - Fix symbolization for chrooted workloads.
 
 Hardware tracing:
 
 Intel PT:
 
 - Add CFE (Control Flow Event) and EVD (Event Data) packets support.
 
 - Add MODE.Exec IFLAG bit support.
 
 Explanation about these features from the "Intel® 64 and IA-32 architectures
 software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C,
 3D, and 4" PDF at:
 
   https://cdrdv2.intel.com/v1/dl/getContent/671200
 
 At page 3951:
 
 <quote>
 32.2.4
 
 Event Trace is a capability that exposes details about the asynchronous
 events, when they are generated, and when their corresponding software
 event handler completes execution. These include:
 
 o Interrupts, including NMI and SMI, including the interrupt vector when
 defined.
 
 o Faults, exceptions including the fault vector.
 
 — Page faults additionally include the page fault address, when in context.
 
 o Event handler returns, including IRET and RSM.
 
 o VM exits and VM entries.¹
 
 — VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields.
 INIT and SIPI events.
 
 o TSX aborts, including the abort status returned for the RTM instructions.
 
 o Shutdown.
 
 Additionally, it provides indication of the status of the Interrupt Flag
 (IF), to indicate when interrupts are masked.
 </quote>
 
 ARM CoreSight:
 
 - Use advertised caps/min_interval as default sample_period on ARM spe.
 
 - Update deduction of TRCCONFIGR register for branch broadcast on ARM's CoreSight ETM.
 
 Vendor Events (JSON):
 
 Intel:
 
 - Update events and metrics for:
 
     Alderlake, Broadwell, Broadwell DE, BroadwellX, CascadelakeX, Elkhartlake,
     Bonnell, Goldmont, GoldmontPlus, Westmere EP-DP, Haswell, HaswellX,
     Icelake, IcelakeX, Ivybridge, Ivytown, Jaketown, Knights Landing,
     Nehalem EP, Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
     Tigerlake, TremontX, Westmere EP-SP, Westmere EX.
 
 ARM:
 
 - Add support for HiSilicon CPA PMU aliasing.
 
 perf stat:
 
 - Fix forked applications enablement of counters.
 
 - The 'slots' should only be printed on a different order than the one specified
   on the command line when 'topdown' events are present, fix it.
 
 Miscellaneous:
 
 - Sync msr-index, cpufeatures header files with the kernel sources.
 
 - Stop using some deprecated libbpf APIs in 'perf trace'.
 
 - Fix some spelling mistakes.
 
 - Refactor the maps pointers usage to pave the way for using refcount debugging.
 
 - Only offer the --tui option on perf top, report and annotate when perf was
   built with libslang.
 
 - Don't mention --to-ctf in 'perf data --help' when not linking with the required
   library, libbabeltrace.
 
 - Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci.
 
 - Enhance the matching of sub-commands abbreviations:
 	'perf c2c rec' -> 'perf c2c record'
 	'perf c2c recport -> error
 
 - Set build-id using build-id header on new mmap records.
 
 - Fix generation of 'perf --version' string.
 
 perf test:
 
 - Add test for the arm_spe event.
 
 - Add test to check unwinding using fame-pointer (fp) mode on arm64.
 
 - Make metric testing more robust in 'perf test'.
 
 - Add error message for unsupported branch stack cases.
 
 libperf:
 
 - Add API for allocating new thread map array.
 
 - Fix typo in perf_evlist__open() failure error messages in libperf tests.
 
 perf c2c:
 
 - Replace bitmap_weight() with bitmap_empty() where appropriate.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYj8viwAKCRCyPKLppCJ+
 J8K3AQDpN45P4/TWJxVWhZlvYzJtWDSboXHZJfmBiEd4Xu2zbwD7BFW02f1ATHPr
 dGBFXxRQQufBIqfE+OQXG59Awp1m8wE=
 =1l8S
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "New features:

  perf ftrace:

   - Add -n/--use-nsec option to the 'latency' subcommand.

     Default: usecs:

     $ sudo perf ftrace latency -T dput -a sleep 1
     #   DURATION     |      COUNT | GRAPH                          |
          0 - 1    us |    2098375 | #############################  |
          1 - 2    us |         61 |                                |
          2 - 4    us |         33 |                                |
          4 - 8    us |         13 |                                |
          8 - 16   us |        124 |                                |
         16 - 32   us |        123 |                                |
         32 - 64   us |          1 |                                |
         64 - 128  us |          0 |                                |
        128 - 256  us |          1 |                                |
        256 - 512  us |          0 |                                |

     Better granularity with nsec:

     $ sudo perf ftrace latency -T dput -a -n sleep 1
     #   DURATION     |      COUNT | GRAPH                          |
          0 - 1    us |          0 |                                |
          1 - 2    ns |          0 |                                |
          2 - 4    ns |          0 |                                |
          4 - 8    ns |          0 |                                |
          8 - 16   ns |          0 |                                |
         16 - 32   ns |          0 |                                |
         32 - 64   ns |          0 |                                |
         64 - 128  ns |    1163434 | ##############                 |
        128 - 256  ns |     914102 | #############                  |
        256 - 512  ns |        884 |                                |
        512 - 1024 ns |        613 |                                |
          1 - 2    us |         31 |                                |
          2 - 4    us |         17 |                                |
          4 - 8    us |          7 |                                |
          8 - 16   us |        123 |                                |
         16 - 32   us |         83 |                                |

  perf lock:

   - Add -c/--combine-locks option to merge lock instances in the same
     class into a single entry.

     # perf lock report -c
                    Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)

           rcu_read_lock   251225         0            0              0            0            0
      hrtimer_bases.lock    39450         0            0              0            0            0
     &sb->s_type->i_l...    10301         1          662            662          662          662
        ptlock_ptr(page)    10173         2          701           1402          760          642
     &(ei->i_block_re...     8732         0            0              0            0            0
            &xa->xa_lock     8088         0            0              0            0            0
             &base->lock     6705         0            0              0            0            0
             &p->pi_lock     5549         0            0              0            0            0
     &dentry->d_lockr...     5010         4         1274           5097         1844          789
               &ep->lock     3958         0            0              0            0            0

      - Add -F/--field option to customize the list of fields to output:

     $ perf lock report -F contended,wait_max -k avg_wait
                     Name contended max wait(ns) avg wait(ns)

           slock-AF_INET6         1        23543        23543
        &lruvec->lru_lock         5        18317        11254
           slock-AF_INET6         1        10379        10379
               rcu_node_1         1         2104         2104
      &dentry->d_lockr...         1         1844         1844
      &dentry->d_lockr...         1         1672         1672
         &newf->file_lock        15         2279         1025
      &dentry->d_lockr...         1          792          792

   - Add --synth=no option for record, as there is no need to symbolize,
     lock names comes from the tracepoints.

  perf record:

   - Threaded recording, opt-in, via the new --threads command line
     option.

   - Improve AMD IBS (Instruction-Based Sampling) error handling
     messages.

  perf script:

   - Add 'brstackinsnlen' field (use it with -F) for branch stacks.

   - Output branch sample type in 'perf script'.

  perf report:

   - Add "addr_from" and "addr_to" sort dimensions.

   - Print branch stack entry type in 'perf report --dump-raw-trace'

   - Fix symbolization for chrooted workloads.

  Hardware tracing:

  Intel PT:

   - Add CFE (Control Flow Event) and EVD (Event Data) packets support.

   - Add MODE.Exec IFLAG bit support.

     Explanation about these features from the "Intel® 64 and IA-32
     architectures software developer’s manual combined volumes: 1, 2A,
     2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:

        https://cdrdv2.intel.com/v1/dl/getContent/671200

     At page 3951:
      "32.2.4

       Event Trace is a capability that exposes details about the
       asynchronous events, when they are generated, and when their
       corresponding software event handler completes execution. These
       include:

        o Interrupts, including NMI and SMI, including the interrupt
          vector when defined.

        o Faults, exceptions including the fault vector.

           - Page faults additionally include the page fault address,
             when in context.

        o Event handler returns, including IRET and RSM.

        o VM exits and VM entries.¹

           - VM exits include the values written to the “exit reason”
             and “exit qualification” VMCS fields. INIT and SIPI events.

        o TSX aborts, including the abort status returned for the RTM
          instructions.

        o Shutdown.

       Additionally, it provides indication of the status of the
       Interrupt Flag (IF), to indicate when interrupts are masked"

  ARM CoreSight:

   - Use advertised caps/min_interval as default sample_period on ARM
     spe.

   - Update deduction of TRCCONFIGR register for branch broadcast on
     ARM's CoreSight ETM.

  Vendor Events (JSON):

  Intel:

   - Update events and metrics for: Alderlake, Broadwell, Broadwell DE,
     BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont,
     GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX,
     Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP,
     Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
     Tigerlake, TremontX, Westmere EP-SP, and Westmere EX.

  ARM:

   - Add support for HiSilicon CPA PMU aliasing.

  perf stat:

   - Fix forked applications enablement of counters.

   - The 'slots' should only be printed on a different order than the
     one specified on the command line when 'topdown' events are
     present, fix it.

  Miscellaneous:

   - Sync msr-index, cpufeatures header files with the kernel sources.

   - Stop using some deprecated libbpf APIs in 'perf trace'.

   - Fix some spelling mistakes.

   - Refactor the maps pointers usage to pave the way for using refcount
     debugging.

   - Only offer the --tui option on perf top, report and annotate when
     perf was built with libslang.

   - Don't mention --to-ctf in 'perf data --help' when not linking with
     the required library, libbabeltrace.

   - Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by
     array_size.cocci.

   - Enhance the matching of sub-commands abbreviations:
	'perf c2c rec' -> 'perf c2c record'
	'perf c2c recport -> error

   - Set build-id using build-id header on new mmap records.

   - Fix generation of 'perf --version' string.

  perf test:

   - Add test for the arm_spe event.

   - Add test to check unwinding using fame-pointer (fp) mode on arm64.

   - Make metric testing more robust in 'perf test'.

   - Add error message for unsupported branch stack cases.

  libperf:

   - Add API for allocating new thread map array.

   - Fix typo in perf_evlist__open() failure error messages in libperf
     tests.

  perf c2c:

   - Replace bitmap_weight() with bitmap_empty() where appropriate"

* tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits)
  perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
  perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
  perf tools: Enhance the matching of sub-commands abbreviations
  libperf tests: Fix typo in perf_evlist__open() failure error messages
  tools arm64: Import cputype.h
  perf lock: Add -F/--field option to control output
  perf lock: Extend struct lock_key to have print function
  perf lock: Add --synth=no option for record
  tools headers cpufeatures: Sync with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  perf stat: Fix forked applications enablement of counters
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  perf evsel: Make evsel__env() always return a valid env
  perf build-id: Fix spelling mistake "Cant" -> "Can't"
  perf header: Fix spelling mistake "could't" -> "couldn't"
  perf script: Add 'brstackinsnlen' for branch stacks
  perf parse-events: Move slots only with topdown
  perf ftrace latency: Update documentation
  perf ftrace latency: Add -n/--use-nsec option
  perf tools: Fix version kernel tag
  ...
2022-03-27 13:42:32 -07:00
Kim Phillips ab0809af0b perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
Improve the error message returned on failed perf_event_open() on AMD
systems when using IBS (Instruction-Based Sampling).

Output of executing 'perf record -e ibs_op// true' as a non root user
BEFORE this patch (perf will add the 'u' modifier at the end to exclude
kernel/hypervisor sampling):

  The sys_perf_event_open() syscall returned with 22 (Invalid argument)for event (ibs_op//u).
  /bin/dmesg | grep -i perf may provide additional information.

Output after:

  AMD IBS can't exclude kernel events.  Try running at a higher privilege level.

Output of executing 'sudo perf record -e ibs_op// true' BEFORE this patch:

  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (ibs_op//).
  /bin/dmesg | grep -i perf may provide additional information.

Output after:

  Error:
  Invalid event (ibs_op//) in per-thread mode, enable system wide with '-a'.

Folowing the suggestion:

  $ sudo perf record -a -e ibs_op// true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.664 MB perf.data (194 samples) ]
  $

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: João Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20220322221517.2510440-12-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 10:55:58 -03:00
Arnaldo Carvalho de Melo b58230de3c perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
The AMD IBS error message enhancements will use these, but we're not
using evsel__open_strerror() in the python binding so far.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 10:55:57 -03:00
Wei Li ae0f4eb34f perf tools: Enhance the matching of sub-commands abbreviations
We support short command 'rec*' for 'record' and 'rep*' for 'report' in
lots of sub-commands, but the matching is not quite strict currnetly.

It may be puzzling sometime, like we mis-type a 'recport' to report but
it will perform 'record' in fact without any message.

To fix this, add a check to ensure that the short cmd is valid prefix
of the real command.

Committer testing:

  [root@quaco ~]# perf c2c re sleep 1

   Usage: perf c2c {record|report}

      -v, --verbose         be more verbose (show counter open errors, etc)

  # perf c2c rec sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ]
  # perf c2c recport sleep 1

   Usage: perf c2c {record|report}

      -v, --verbose         be more verbose (show counter open errors, etc)

  # perf c2c record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ]
  # perf c2c records sleep 1

   Usage: perf c2c {record|report}

      -v, --verbose         be more verbose (show counter open errors, etc)

  #

Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rui Xiang <rui.xiang@huawei.com>
Link: http://lore.kernel.org/lkml/20220325092032.2956161-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 10:55:57 -03:00
Ali Saidi 1314376d49 tools arm64: Import cputype.h
Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/
for arm64 to make use of all the core-type definitions in perf.

Replace sysreg.h with the version already imported into tools/.

Committer notes:

Added an entry to tools/perf/check-headers.sh, so that we get notified
when the original file in the kernel sources gets modified.

Tester notes:

LGTM. I did the testing on both my x86 and Arm64 platforms, thanks for
the fixing up.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick.Forrington@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220324183323.31414-2-alisaidi@amazon.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-26 10:53:45 -03:00
Namhyung Kim 4bd9cab59f perf lock: Add -F/--field option to control output
The -F/--field option is to customize the list of fields to output:

  $ perf lock report -F contended,wait_max -k avg_wait
                  Name  contended   max wait (ns)   avg wait (ns)

        slock-AF_INET6          1           23543           23543
     &lruvec->lru_lock          5           18317           11254
        slock-AF_INET6          1           10379           10379
            rcu_node_1          1            2104            2104
   &dentry->d_lockr...          1            1844            1844
   &dentry->d_lockr...          1            1672            1672
      &newf->file_lock         15            2279            1025
   &dentry->d_lockr...          1             792             792

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220323230259.288494-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-25 15:24:50 -03:00
Namhyung Kim 64999e4402 perf lock: Extend struct lock_key to have print function
And use it to print output for each key field.  No functional change
intended and the output should be identical.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220323230259.288494-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-25 15:24:16 -03:00
Namhyung Kim 67b61f59a6 perf lock: Add --synth=no option for record
The perf lock command has nothing to symbolize and lock names come
from the tracepoint.  Moreover, kernel symbols are available even the
--synth=no option is given.

This will reduce the startup time by avoiding unnecessary synthesis.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220323230259.288494-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-25 15:24:03 -03:00
Linus Torvalds d710d370c4 s390 updates for the 5.18 merge window
- Raise minimum supported machine generation to z10, which comes with
   various cleanups and code simplifications (usercopy/spectre
   mitigation/etc).
 
 - Rework extables and get rid of anonymous out-of-line fixups.
 
 - Page table helpers cleanup. Add set_pXd()/set_pte() helper
   functions. Covert pte_val()/pXd_val() macros to functions.
 
 - Optimize kretprobe handling by avoiding extra kprobe on
   __kretprobe_trampoline.
 
 - Add support for CEX8 crypto cards.
 
 - Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
 
 - Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
   group sections which simplifies kpatch support.
 
 - Always use the packed stack layout and extend kernel unwinder tests.
 
 - Add sanity checks for ftrace code patching.
 
 - Add s390dbf debug log for the vfio_ap device driver.
 
 - Various virtual vs physical address confusion fixes.
 
 - Various small fixes and improvements all over the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmI94dsACgkQjYWKoQLX
 FBiaCggAm9xYJ06Qt9c+T9B7aA4Lt50w7Bnxqx1/Q7UHQQgDpkNhKzI1kt/xeKY4
 JgZQ9lJC4YRLlyfIVzffLI2DWGbl8BcTpuRWVLhPI5D2yHZBXr2ARe7IGFJueddy
 MVqU/r+U3H0r3obQeUc4TSrHtSRX7eQZWIoVuDU75b9fCniee/bmGZqs6yXPXXh4
 pTZQ/gsIhF/o6eBJLEXLjUAcIasxCk15GXWXmkaSwKHAhfYiintwGmtKqQ8etCvw
 17vdlTjA4ce+3ooD/hXGPa8TqeiGKsIB2Xr89x/48f1eJyp2zPJZ1ZvAUBHJBCNt
 b4sF4ql8303Lj7Be+LeqdlbXfa5PZg==
 =meZf
 -----END PGP SIGNATURE-----

Merge tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Raise minimum supported machine generation to z10, which comes with
   various cleanups and code simplifications (usercopy/spectre
   mitigation/etc).

 - Rework extables and get rid of anonymous out-of-line fixups.

 - Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
   Covert pte_val()/pXd_val() macros to functions.

 - Optimize kretprobe handling by avoiding extra kprobe on
   __kretprobe_trampoline.

 - Add support for CEX8 crypto cards.

 - Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.

 - Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
   group sections which simplifies kpatch support.

 - Always use the packed stack layout and extend kernel unwinder tests.

 - Add sanity checks for ftrace code patching.

 - Add s390dbf debug log for the vfio_ap device driver.

 - Various virtual vs physical address confusion fixes.

 - Various small fixes and improvements all over the code.

* tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
  s390/test_unwind: add kretprobe tests
  s390/kprobes: Avoid additional kprobe in kretprobe handling
  s390: convert ".insn" encoding to instruction names
  s390: assume stckf is always present
  s390/nospec: move to single register thunks
  s390: raise minimum supported machine generation to z10
  s390/uaccess: Add copy_from/to_user_key functions
  s390/nospec: align and size extern thunks
  s390/nospec: add an option to use thunk-extern
  s390/nospec: generate single register thunks if possible
  s390/pci: make zpci_set_irq()/zpci_clear_irq() static
  s390: remove unused expoline to BC instructions
  s390/irq: use assignment instead of cast
  s390/traps: get rid of magic cast for per code
  s390/traps: get rid of magic cast for program interruption code
  s390/signal: fix typo in comments
  s390/asm-offsets: remove unused defines
  s390/test_unwind: avoid build warning with W=1
  s390: remove .fixup section
  s390/bpf: encode register within extable entry
  ...
2022-03-25 10:01:34 -07:00
Thomas Richter d0a0a51149 perf stat: Fix forked applications enablement of counters
I have run into the following issue:

 # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7

 Performance counter stats for 'system wide':

                 0      new_pmu/INSTRUCTION_7/

       0.000366428 seconds time elapsed
 #

The new PMU for s390 counts the execution of certain CPU instructions.
The root cause is the extremely small run time of the mytest program. It
just executes some assembly instructions and then exits.

In above invocation the instruction is executed exactly one time (-c1
option). The PMU is expected to report this one time execution by a
counter value of one, but fails to do so in some cases, not all.

Debugging reveals the invocation of the child process is done
*before* the counter events are installed and enabled.

Tracing reveals that sometimes the child process starts and exits before
the event is installed on all CPUs. The more CPUs the machine has, the
more often this miscount happens.

Fix this by reversing the start of the work load after the events have
been installed on the specified CPUs. Now the comment also matches the
code.

Output after:

 # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7

 Performance counter stats for 'system wide':

                 1      new_pmu/INSTRUCTION_7/

       0.000366428 seconds time elapsed
 #

Now the correct result is reported rock solid all the time regardless
how many CPUs are online.

Reviewers notes:

Jiri:

Right, without -a the event has enable_on_exec so the race does not
matter, but it's a problem for system wide with fork.

Namhyung:

Agreed. Also we may move the enable_counters() and the clock code out of
the if block to be shared with the else block.

Fixes: acf2892270 ("perf stat: Use perf_evlist__prepare/start_workload()")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220317155346.577384-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-24 17:36:54 -03:00
Linus Torvalds 169e77764a Networking changes for 5.18.
Core
 ----
 
  - Introduce XDP multi-buffer support, allowing the use of XDP with
    jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
 
  - Speed up netns dismantling (5x) and lower the memory cost a little.
    Remove unnecessary per-netns sockets. Scope some lists to a netns.
    Cut down RCU syncing. Use batch methods. Allow netdev registration
    to complete out of order.
 
  - Support distinguishing timestamp types (ingress vs egress) and
    maintaining them across packet scrubbing points (e.g. redirect).
 
  - Continue the work of annotating packet drop reasons throughout
    the stack.
 
  - Switch netdev error counters from an atomic to dynamically
    allocated per-CPU counters.
 
  - Rework a few preempt_disable(), local_irq_save() and busy waiting
    sections problematic on PREEMPT_RT.
 
  - Extend the ref_tracker to allow catching use-after-free bugs.
 
 BPF
 ---
 
  - Introduce "packing allocator" for BPF JIT images. JITed code is
    marked read only, and used to be allocated at page granularity.
    Custom allocator allows for more efficient memory use, lower
    iTLB pressure and prevents identity mapping huge pages from
    getting split.
 
  - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
    the correct probe read access method, add appropriate helpers.
 
  - Convert the BPF preload to use light skeleton and drop
    the user-mode-driver dependency.
 
  - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
    its use as a packet generator.
 
  - Allow local storage memory to be allocated with GFP_KERNEL if called
    from a hook allowed to sleep.
 
  - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
    bits to come later).
 
  - Add unstable conntrack lookup helpers for BPF by using the BPF
    kfunc infra.
 
  - Allow cgroup BPF progs to return custom errors to user space.
 
  - Add support for AF_UNIX iterator batching.
 
  - Allow iterator programs to use sleepable helpers.
 
  - Support JIT of add, and, or, xor and xchg atomic ops on arm64.
 
  - Add BTFGen support to bpftool which allows to use CO-RE in kernels
    without BTF info.
 
  - Large number of libbpf API improvements, cleanups and deprecations.
 
 Protocols
 ---------
 
  - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
 
  - Adjust TSO packet sizes based on min_rtt, allowing very low latency
    links (data centers) to always send full-sized TSO super-frames.
 
  - Make IPv6 flow label changes (AKA hash rethink) more configurable,
    via sysctl and setsockopt. Distinguish between server and client
    behavior.
 
  - VxLAN support to "collect metadata" devices to terminate only
    configured VNIs. This is similar to VLAN filtering in the bridge.
 
  - Support inserting IPv6 IOAM information to a fraction of frames.
 
  - Add protocol attribute to IP addresses to allow identifying where
    given address comes from (kernel-generated, DHCP etc.)
 
  - Support setting socket and IPv6 options via cmsg on ping6 sockets.
 
  - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
    Define dscp_t and stop taking ECN bits into account in fib-rules.
 
  - Add support for locked bridge ports (for 802.1X).
 
  - tun: support NAPI for packets received from batched XDP buffs,
    doubling the performance in some scenarios.
 
  - IPv6 extension header handling in Open vSwitch.
 
  - Support IPv6 control message load balancing in bonding, prevent
    neighbor solicitation and advertisement from using the wrong port.
    Support NS/NA monitor selection similar to existing ARP monitor.
 
  - SMC
    - improve performance with TCP_CORK and sendfile()
    - support auto-corking
    - support TCP_NODELAY
 
  - MCTP (Management Component Transport Protocol)
    - add user space tag control interface
    - I2C binding driver (as specified by DMTF DSP0237)
 
  - Multi-BSSID beacon handling in AP mode for WiFi.
 
  - Bluetooth:
    - handle MSFT Monitor Device Event
    - add MGMT Adv Monitor Device Found/Lost events
 
  - Multi-Path TCP:
    - add support for the SO_SNDTIMEO socket option
    - lots of selftest cleanups and improvements
 
  - Increase the max PDU size in CAN ISOTP to 64 kB.
 
 Driver API
 ----------
 
  - Add HW counters for SW netdevs, a mechanism for devices which
    offload packet forwarding to report packet statistics back to
    software interfaces such as tunnels.
 
  - Select the default NIC queue count as a fraction of number of
    physical CPU cores, instead of hard-coding to 8.
 
  - Expose devlink instance locks to drivers. Allow device layer of
    drivers to use that lock directly instead of creating their own
    which always runs into ordering issues in devlink callbacks.
 
  - Add header/data split indication to guide user space enabling
    of TCP zero-copy Rx.
 
  - Allow configuring completion queue event size.
 
  - Refactor page_pool to enable fragmenting after allocation.
 
  - Add allocation and page reuse statistics to page_pool.
 
  - Improve Multiple Spanning Trees support in the bridge to allow
    reuse of topologies across VLANs, saving HW resources in switches.
 
  - DSA (Distributed Switch Architecture):
    - replay and offload of host VLAN entries
    - offload of static and local FDB entries on LAG interfaces
    - FDB isolation and unicast filtering
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - LAN937x T1 PHYs
    - Davicom DM9051 SPI NIC driver
    - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
    - Microchip ksz8563 switches
    - Netronome NFP3800 SmartNICs
    - Fungible SmartNICs
    - MediaTek MT8195 switches
 
  - WiFi:
    - mt76: MediaTek mt7916
    - mt76: MediaTek mt7921u USB adapters
    - brcmfmac: Broadcom BCM43454/6
 
  - Mobile:
    - iosm: Intel M.2 7360 WWAN card
 
 Drivers
 -------
 
  - Convert many drivers to the new phylink API built for split PCS
    designs but also simplifying other cases.
 
  - Intel Ethernet NICs:
    - add TTY for GNSS module for E810T device
    - improve AF_XDP performance
    - GTP-C and GTP-U filter offload
    - QinQ VLAN support
 
  - Mellanox Ethernet NICs (mlx5):
    - support xdp->data_meta
    - multi-buffer XDP
    - offload tc push_eth and pop_eth actions
 
  - Netronome Ethernet NICs (nfp):
    - flow-independent tc action hardware offload (police / meter)
    - AF_XDP
 
  - Other Ethernet NICs:
    - at803x: fiber and SFP support
    - xgmac: mdio: preamble suppression and custom MDC frequencies
    - r8169: enable ASPM L1.2 if system vendor flags it as safe
    - macb/gem: ZynqMP SGMII
    - hns3: add TX push mode
    - dpaa2-eth: software TSO
    - lan743x: multi-queue, mdio, SGMII, PTP
    - axienet: NAPI and GRO support
 
  - Mellanox Ethernet switches (mlxsw):
    - source and dest IP address rewrites
    - RJ45 ports
 
  - Marvell Ethernet switches (prestera):
    - basic routing offload
    - multi-chain TC ACL offload
 
  - NXP embedded Ethernet switches (ocelot & felix):
    - PTP over UDP with the ocelot-8021q DSA tagging protocol
    - basic QoS classification on Felix DSA switch using dcbnl
    - port mirroring for ocelot switches
 
  - Microchip high-speed industrial Ethernet (sparx5):
    - offloading of bridge port flooding flags
    - PTP Hardware Clock
 
  - Other embedded switches:
    - lan966x: PTP Hardward Clock
    - qca8k: mdio read/write operations via crafted Ethernet packets
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
    - enable RX PPDU stats in monitor co-exist mode
 
  - Intel WiFi (iwlwifi):
    - UHB TAS enablement via BIOS
    - band disablement via BIOS
    - channel switch offload
    - 32 Rx AMPDU sessions in newer devices
 
  - MediaTek WiFi (mt76):
    - background radar detection
    - thermal management improvements on mt7915
    - SAR support for more mt76 platforms
    - MBSSID and 6 GHz band on mt7915
 
  - RealTek WiFi:
    - rtw89: AP mode
    - rtw89: 160 MHz channels and 6 GHz band
    - rtw89: hardware scan
 
  - Bluetooth:
    - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
 
  - Microchip CAN (mcp251xfd):
    - multiple RX-FIFOs and runtime configurable RX/TX rings
    - internal PLL, runtime PM handling simplification
    - improve chip detection and error handling after wakeup
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
 IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
 FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
 AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
 KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
 K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
 cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
 krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
 dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
 =ELpe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "The sprinkling of SPI drivers is because we added a new one and Mark
  sent us a SPI driver interface conversion pull request.

  Core
  ----

   - Introduce XDP multi-buffer support, allowing the use of XDP with
     jumbo frame MTUs and combination with Rx coalescing offloads (LRO).

   - Speed up netns dismantling (5x) and lower the memory cost a little.
     Remove unnecessary per-netns sockets. Scope some lists to a netns.
     Cut down RCU syncing. Use batch methods. Allow netdev registration
     to complete out of order.

   - Support distinguishing timestamp types (ingress vs egress) and
     maintaining them across packet scrubbing points (e.g. redirect).

   - Continue the work of annotating packet drop reasons throughout the
     stack.

   - Switch netdev error counters from an atomic to dynamically
     allocated per-CPU counters.

   - Rework a few preempt_disable(), local_irq_save() and busy waiting
     sections problematic on PREEMPT_RT.

   - Extend the ref_tracker to allow catching use-after-free bugs.

  BPF
  ---

   - Introduce "packing allocator" for BPF JIT images. JITed code is
     marked read only, and used to be allocated at page granularity.
     Custom allocator allows for more efficient memory use, lower iTLB
     pressure and prevents identity mapping huge pages from getting
     split.

   - Make use of BTF type annotations (e.g. __user, __percpu) to enforce
     the correct probe read access method, add appropriate helpers.

   - Convert the BPF preload to use light skeleton and drop the
     user-mode-driver dependency.

   - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
     its use as a packet generator.

   - Allow local storage memory to be allocated with GFP_KERNEL if
     called from a hook allowed to sleep.

   - Introduce fprobe (multi kprobe) to speed up mass attachment (arch
     bits to come later).

   - Add unstable conntrack lookup helpers for BPF by using the BPF
     kfunc infra.

   - Allow cgroup BPF progs to return custom errors to user space.

   - Add support for AF_UNIX iterator batching.

   - Allow iterator programs to use sleepable helpers.

   - Support JIT of add, and, or, xor and xchg atomic ops on arm64.

   - Add BTFGen support to bpftool which allows to use CO-RE in kernels
     without BTF info.

   - Large number of libbpf API improvements, cleanups and deprecations.

  Protocols
  ---------

   - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.

   - Adjust TSO packet sizes based on min_rtt, allowing very low latency
     links (data centers) to always send full-sized TSO super-frames.

   - Make IPv6 flow label changes (AKA hash rethink) more configurable,
     via sysctl and setsockopt. Distinguish between server and client
     behavior.

   - VxLAN support to "collect metadata" devices to terminate only
     configured VNIs. This is similar to VLAN filtering in the bridge.

   - Support inserting IPv6 IOAM information to a fraction of frames.

   - Add protocol attribute to IP addresses to allow identifying where
     given address comes from (kernel-generated, DHCP etc.)

   - Support setting socket and IPv6 options via cmsg on ping6 sockets.

   - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
     Define dscp_t and stop taking ECN bits into account in fib-rules.

   - Add support for locked bridge ports (for 802.1X).

   - tun: support NAPI for packets received from batched XDP buffs,
     doubling the performance in some scenarios.

   - IPv6 extension header handling in Open vSwitch.

   - Support IPv6 control message load balancing in bonding, prevent
     neighbor solicitation and advertisement from using the wrong port.
     Support NS/NA monitor selection similar to existing ARP monitor.

   - SMC
      - improve performance with TCP_CORK and sendfile()
      - support auto-corking
      - support TCP_NODELAY

   - MCTP (Management Component Transport Protocol)
      - add user space tag control interface
      - I2C binding driver (as specified by DMTF DSP0237)

   - Multi-BSSID beacon handling in AP mode for WiFi.

   - Bluetooth:
      - handle MSFT Monitor Device Event
      - add MGMT Adv Monitor Device Found/Lost events

   - Multi-Path TCP:
      - add support for the SO_SNDTIMEO socket option
      - lots of selftest cleanups and improvements

   - Increase the max PDU size in CAN ISOTP to 64 kB.

  Driver API
  ----------

   - Add HW counters for SW netdevs, a mechanism for devices which
     offload packet forwarding to report packet statistics back to
     software interfaces such as tunnels.

   - Select the default NIC queue count as a fraction of number of
     physical CPU cores, instead of hard-coding to 8.

   - Expose devlink instance locks to drivers. Allow device layer of
     drivers to use that lock directly instead of creating their own
     which always runs into ordering issues in devlink callbacks.

   - Add header/data split indication to guide user space enabling of
     TCP zero-copy Rx.

   - Allow configuring completion queue event size.

   - Refactor page_pool to enable fragmenting after allocation.

   - Add allocation and page reuse statistics to page_pool.

   - Improve Multiple Spanning Trees support in the bridge to allow
     reuse of topologies across VLANs, saving HW resources in switches.

   - DSA (Distributed Switch Architecture):
      - replay and offload of host VLAN entries
      - offload of static and local FDB entries on LAG interfaces
      - FDB isolation and unicast filtering

  New hardware / drivers
  ----------------------

   - Ethernet:
      - LAN937x T1 PHYs
      - Davicom DM9051 SPI NIC driver
      - Realtek RTL8367S, RTL8367RB-VB switch and MDIO
      - Microchip ksz8563 switches
      - Netronome NFP3800 SmartNICs
      - Fungible SmartNICs
      - MediaTek MT8195 switches

   - WiFi:
      - mt76: MediaTek mt7916
      - mt76: MediaTek mt7921u USB adapters
      - brcmfmac: Broadcom BCM43454/6

   - Mobile:
      - iosm: Intel M.2 7360 WWAN card

  Drivers
  -------

   - Convert many drivers to the new phylink API built for split PCS
     designs but also simplifying other cases.

   - Intel Ethernet NICs:
      - add TTY for GNSS module for E810T device
      - improve AF_XDP performance
      - GTP-C and GTP-U filter offload
      - QinQ VLAN support

   - Mellanox Ethernet NICs (mlx5):
      - support xdp->data_meta
      - multi-buffer XDP
      - offload tc push_eth and pop_eth actions

   - Netronome Ethernet NICs (nfp):
      - flow-independent tc action hardware offload (police / meter)
      - AF_XDP

   - Other Ethernet NICs:
      - at803x: fiber and SFP support
      - xgmac: mdio: preamble suppression and custom MDC frequencies
      - r8169: enable ASPM L1.2 if system vendor flags it as safe
      - macb/gem: ZynqMP SGMII
      - hns3: add TX push mode
      - dpaa2-eth: software TSO
      - lan743x: multi-queue, mdio, SGMII, PTP
      - axienet: NAPI and GRO support

   - Mellanox Ethernet switches (mlxsw):
      - source and dest IP address rewrites
      - RJ45 ports

   - Marvell Ethernet switches (prestera):
      - basic routing offload
      - multi-chain TC ACL offload

   - NXP embedded Ethernet switches (ocelot & felix):
      - PTP over UDP with the ocelot-8021q DSA tagging protocol
      - basic QoS classification on Felix DSA switch using dcbnl
      - port mirroring for ocelot switches

   - Microchip high-speed industrial Ethernet (sparx5):
      - offloading of bridge port flooding flags
      - PTP Hardware Clock

   - Other embedded switches:
      - lan966x: PTP Hardward Clock
      - qca8k: mdio read/write operations via crafted Ethernet packets

   - Qualcomm 802.11ax WiFi (ath11k):
      - add LDPC FEC type and 802.11ax High Efficiency data in radiotap
      - enable RX PPDU stats in monitor co-exist mode

   - Intel WiFi (iwlwifi):
      - UHB TAS enablement via BIOS
      - band disablement via BIOS
      - channel switch offload
      - 32 Rx AMPDU sessions in newer devices

   - MediaTek WiFi (mt76):
      - background radar detection
      - thermal management improvements on mt7915
      - SAR support for more mt76 platforms
      - MBSSID and 6 GHz band on mt7915

   - RealTek WiFi:
      - rtw89: AP mode
      - rtw89: 160 MHz channels and 6 GHz band
      - rtw89: hardware scan

   - Bluetooth:
      - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)

   - Microchip CAN (mcp251xfd):
      - multiple RX-FIFOs and runtime configurable RX/TX rings
      - internal PLL, runtime PM handling simplification
      - improve chip detection and error handling after wakeup"

* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
  llc: fix netdevice reference leaks in llc_ui_bind()
  drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
  ice: don't allow to run ice_send_event_to_aux() in atomic ctx
  ice: fix 'scheduling while atomic' on aux critical err interrupt
  net/sched: fix incorrect vlan_push_eth dest field
  net: bridge: mst: Restrict info size queries to bridge ports
  net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
  drivers: net: xgene: Fix regression in CRC stripping
  net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
  net: dsa: fix missing host-filtered multicast addresses
  net/mlx5e: Fix build warning, detected write beyond size of field
  iwlwifi: mvm: Don't fail if PPAG isn't supported
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  netdevice: add missing dm_private kdoc
  net: bridge: mst: prevent NULL deref in br_mst_info_size()
  selftests: forwarding: Use same VRF for port and VLAN upper
  ...
2022-03-24 13:13:26 -07:00
Linus Torvalds 194dfe88d6 asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree:
 
  - The set_fs()/get_fs() infrastructure gets removed for good. This
    was already gone from all major architectures, but now we can
    finally remove it everywhere, which loses some particularly
    tricky and error-prone code.
    There is a small merge conflict against a parisc cleanup, the
    solution is to use their new version.
 
  - The nds32 architecture ends its tenure in the Linux kernel. The
    hardware is still used and the code is in reasonable shape, but
    the mainline port is not actively maintained any more, as all
    remaining users are thought to run vendor kernels that would never
    be updated to a future release.
    There are some obvious conflicts against changes to the removed
    files.
 
  - A series from Masahiro Yamada cleans up some of the uapi header
    files to pass the compile-time checks.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
 GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
 IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
 cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
 DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
 G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
 a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
 ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
 CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
 h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
 =vtCN
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "There are three sets of updates for 5.18 in the asm-generic tree:

   - The set_fs()/get_fs() infrastructure gets removed for good.

     This was already gone from all major architectures, but now we can
     finally remove it everywhere, which loses some particularly tricky
     and error-prone code. There is a small merge conflict against a
     parisc cleanup, the solution is to use their new version.

   - The nds32 architecture ends its tenure in the Linux kernel.

     The hardware is still used and the code is in reasonable shape, but
     the mainline port is not actively maintained any more, as all
     remaining users are thought to run vendor kernels that would never
     be updated to a future release.

   - A series from Masahiro Yamada cleans up some of the uapi header
     files to pass the compile-time checks"

* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
  nds32: Remove the architecture
  uaccess: remove CONFIG_SET_FS
  ia64: remove CONFIG_SET_FS support
  sh: remove CONFIG_SET_FS support
  sparc64: remove CONFIG_SET_FS support
  lib/test_lockup: fix kernel pointer check for separate address spaces
  uaccess: generalize access_ok()
  uaccess: fix type mismatch warnings from access_ok()
  arm64: simplify access_ok()
  m68k: fix access_ok for coldfire
  MIPS: use simpler access_ok()
  MIPS: Handle address errors for accesses above CPU max virtual user address
  uaccess: add generic __{get,put}_kernel_nofault
  nios2: drop access_ok() check from __put_user()
  x86: use more conventional access_ok() definition
  x86: remove __range_not_ok()
  sparc64: add __{get,put}_kernel_nofault()
  nds32: fix access_ok() checks in get/put_user
  uaccess: fix nios2 and microblaze get_user_8()
  sparc64: fix building assembly files
  ...
2022-03-23 18:03:08 -07:00