Commit 6320e693d9 ("powerpc/perf: Add support for caps under sysfs in
powerpc") added support for caps under sysfs in powerpc. This added caps
directory to: /sys/bus/event_source/devices/cpu/ for power8, power9,
power10 and generic compat PMU in respective PMU driver code.
For power10, it is added under "power10_pmu_attr_groups". But
for DD1 version, attr_groups are defined under dd1 array:
"power10_pmu_attr_groups_dd1". Since caps is not added for DD1,
it fails to include "cpu/caps" in DD1 model.
The issue was observed while booting power10 pseries with qemu version
6, but not observed with qemu version 7. This is because qemu version 7
uses a DD 2.0 CPU model.
Below is the trace log:
Can't update unknown attr grp name: cpu/caps^M
------------[ cut here ]------------^M
Failed to register pmu: cpu, reason -22^M
WARNING: CPU: 1 PID: 1 at kernel/events/core.c:13427 perf_event_sysfs_init+0xbc/0x108^M
Modules linked in:^M
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc2-00111-g6320e693d98c #148^M
NIP: c0000000020391f4 LR: c0000000020391f0 CTR: c0000000008c9c30^M
REGS: c0000000044c38c0 TRAP: 0700 Not tainted (5.19.0-rc2-00111-g6320e693d98c)^M
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 48000281 XER: 20040000^M
CFAR: c00000000013feac IRQMASK: 0 ^M
GPR00: c0000000020391f0 c0000000044c3b60 c00000000283db00 0000000000000027 ^M
GPR04: 80000000ffffe0a8 0000000000000000 0000000000000004 00000000fdcd0000 ^M
GPR08: 0000000000000027 c0000000ffe07e08 0000000000000001 0000000000000000 ^M
GPR12: c00000000035dd90 c0000000fffff300 c000000000012478 0000000000000000 ^M
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ^M
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ^M
GPR24: c000000002003480 0000000000000007 c0000000012a78d0 c000000001170a80 ^M
GPR28: c0000000026c4df8 c0000000026c4e68 0000000000000000 c0000000025a8628 ^M
NIP [c0000000020391f4] perf_event_sysfs_init+0xbc/0x108^M
LR [c0000000020391f0] perf_event_sysfs_init+0xb8/0x108^M
Call Trace:^M
[c0000000044c3b60] [c0000000020391f0] perf_event_sysfs_init+0xb8/0x108 (unreliable)^M
[c0000000044c3bf0] [c000000000011ec4] do_one_initcall+0x64/0x2d0^M
[c0000000044c3cd0] [c0000000020049fc] kernel_init_freeable+0x338/0x3e0^M
[c0000000044c3db0] [c0000000000124a0] kernel_init+0x30/0x1a0^M
[c0000000044c3e10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64^M
Instruction dump:^M
813f0038 2c090000 4180002c 7fe3fb78 4a3280c5 2c030000 7c651b78 41820018 ^M
e89f0030 7f63db78 4a106c59 60000000 <0fe00000> ebff0000 4bffffb4 39200001 ^M
---[ end trace 0000000000000000 ]---^M
Fix it by adding caps for dd1 attr_groups in power10 PMU driver.
Fixes: 6320e693d9 ("powerpc/perf: Add support for caps under sysfs in powerpc")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Update change log to mention qemu 7 DD2.0 CPU model]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220728163746.85062-1-atrajeev@linux.vnet.ibm.com
Currently the perf CPU backend drivers detect what CPU they're on using
cur_cpu_spec->oprofile_cpu_type.
Although that works, it's a bit crufty to be using oprofile related fields,
especially seeing as oprofile is more or less unused these days.
It also means perf is reliant on the fragile logic in setup_cpu_spec()
which detects when we're using a logical PVR and copies back the PMU
related fields from the raw CPU entry. So lets check the PVR directly.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[chleroy: Added power10 and fixed checkpatch issues]
Reviewed-and-tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-and-tested-By: Kajol Jain <kjain@linux.ibm.com> [For 24x7 side changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20c0ee7f99dbf0dbf8658df6b39f84753e6db1ef.1657204631.git.christophe.leroy@csgroup.eu
Add caps support under "/sys/bus/event_source/devices/<pmu>/"
for powerpc. This directory can be used to expose some of the
specific features that powerpc PMU supports to the user.
Example: pmu_name. The name of PMU registered will depend on
platform, say power9 or power10 or it could be Generic Compat
PMU.
Currently the only way to know which is the registered
PMU is from the dmesg logs. But clearing the dmesg will make it
difficult to know exact PMU backend used. And even extracting
from dmesg will be complicated, as we need to parse the dmesg
logs and add filters for pmu name. Whereas by exposing it via
caps will make it easy as we just need to directly read it from
the sysfs.
Add a caps directory to /sys/bus/event_source/devices/cpu/
for power8, power9, power10 and generic compat PMU in respective
PMU driver code. Update the pmu_name file under caps folder
in core-book3s using "attr_update".
The information exposed currently:
- pmu_name : Underlying PMU name from the driver
Example result with power9 pmu:
# ls /sys/bus/event_source/devices/cpu/caps
pmu_name
# cat /sys/bus/event_source/devices/cpu/caps/pmu_name
POWER9
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220520084630.15181-1-atrajeev@linux.vnet.ibm.com
When booting on a machine that uses the compat pmu driver we see this:
[ 0.071192] GENERIC_COMPAT performance monitor hardware support registered
Which is a bit shouty. Give it a nicer name.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220610044006.2095806-1-joel@jms.id.au
struct perf_event_attr supports exclude counting of idle task.
This is sent to kernel via perf_event_attr.exclude_idle and
in perf tool, user can use ":I" event modifier to enable this
for specific event.
Monitor Mode Control Register 2 (MMCR2) SPR has control bits
for each PMCs to freeze counting based on the Control Register
CTRL[RUN] state. CTRL[RUN] is not set when idle task is
running. Patch adds a check for event attr.exclude_idle to
set MMCR2[FCnWAIT] bit.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210429050208.266619-1-maddy@linux.ibm.com
commit 2c9ac51b85 ("powerpc/perf: Fix PMU callbacks to clear
pending PMI before resetting an overflown PMC") added a new
function "pmi_irq_pending" in hw_irq.h. This function is to check
if there is a PMI marked as pending in Paca (PACA_IRQ_PMI).This is
used in power_pmu_disable in a WARN_ON. The intention here is to
provide a warning if there is PMI pending, but no counter is found
overflown.
During some of the perf runs, below warning is hit:
WARNING: CPU: 36 PID: 0 at arch/powerpc/perf/core-book3s.c:1332 power_pmu_disable+0x25c/0x2c0
Modules linked in:
-----
NIP [c000000000141c3c] power_pmu_disable+0x25c/0x2c0
LR [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0
Call Trace:
[c000000baffcfb90] [c000000000141c8c] power_pmu_disable+0x2ac/0x2c0 (unreliable)
[c000000baffcfc10] [c0000000003e2f8c] perf_pmu_disable+0x4c/0x60
[c000000baffcfc30] [c0000000003e3344] group_sched_out.part.124+0x44/0x100
[c000000baffcfc80] [c0000000003e353c] __perf_event_disable+0x13c/0x240
[c000000baffcfcd0] [c0000000003dd334] event_function+0xc4/0x140
[c000000baffcfd20] [c0000000003d855c] remote_function+0x7c/0xa0
[c000000baffcfd50] [c00000000026c394] flush_smp_call_function_queue+0xd4/0x300
[c000000baffcfde0] [c000000000065b24] smp_ipi_demux_relaxed+0xa4/0x100
[c000000baffcfe20] [c0000000000cb2b0] xive_muxed_ipi_action+0x20/0x40
[c000000baffcfe40] [c000000000207c3c] __handle_irq_event_percpu+0x8c/0x250
[c000000baffcfee0] [c000000000207e2c] handle_irq_event_percpu+0x2c/0xa0
[c000000baffcff10] [c000000000210a04] handle_percpu_irq+0x84/0xc0
[c000000baffcff40] [c000000000205f14] generic_handle_irq+0x54/0x80
[c000000baffcff60] [c000000000015740] __do_irq+0x90/0x1d0
[c000000baffcff90] [c000000000016990] __do_IRQ+0xc0/0x140
[c0000009732f3940] [c000000bafceaca8] 0xc000000bafceaca8
[c0000009732f39d0] [c000000000016b78] do_IRQ+0x168/0x1c0
[c0000009732f3a00] [c0000000000090c8] hardware_interrupt_common_virt+0x218/0x220
This means that there is no PMC overflown among the active events
in the PMU, but there is a PMU pending in Paca. The function
"any_pmc_overflown" checks the PMCs on active events in
cpuhw->n_events. Code snippet:
<<>>
if (any_pmc_overflown(cpuhw))
clear_pmi_irq_pending();
else
WARN_ON(pmi_irq_pending());
<<>>
Here the PMC overflown is not from active event. Example: When we do
perf record, default cycles and instructions will be running on PMC6
and PMC5 respectively. It could happen that overflowed event is currently
not active and pending PMI is for the inactive event. Debug logs from
trace_printk:
<<>>
any_pmc_overflown: idx is 5: pmc value is 0xd9a
power_pmu_disable: PMC1: 0x0, PMC2: 0x0, PMC3: 0x0, PMC4: 0x0, PMC5: 0xd9a, PMC6: 0x80002011
<<>>
Here active PMC (from idx) is PMC5 , but overflown PMC is PMC6(0x80002011).
When we handle PMI interrupt for such cases, if the PMC overflown is
from inactive event, it will be ignored. Reference commit:
commit bc09c219b2 ("powerpc/perf: Fix finding overflowed PMC in interrupt")
Patch addresses two changes:
1) Fix 1 : Removal of warning ( WARN_ON(pmi_irq_pending()); )
We were printing warning if no PMC is found overflown among active PMU
events, but PMI pending in PACA. But this could happen in cases where
PMC overflown is not in active PMC. An inactive event could have caused
the overflow. Hence the warning is not needed. To know pending PMI is
from an inactive event, we need to loop through all PMC's which will
cause more SPR reads via mfspr and increase in context switch. Also in
existing function: perf_event_interrupt, already we ignore PMI's
overflown when it is from an inactive PMC.
2) Fix 2: optimization in clearing pending PMI.
Currently we check for any active PMC overflown before clearing PMI
pending in Paca. This is causing additional SPR read also. From point 1,
we know that if PMI pending in Paca from inactive cases, that is going
to be ignored during replay. Hence if there is pending PMI in Paca, just
clear it irrespective of PMC overflown or not.
In summary, remove the any_pmc_overflown check entirely in
power_pmu_disable. ie If there is a pending PMI in Paca, clear it, since
we are in pmu_disable. There could be cases where PMI is pending because
of inactive PMC ( which later when replayed also will get ignored ), so
WARN_ON could give false warning. Hence removing it.
Fixes: 2c9ac51b85 ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220522142256.24699-1-atrajeev@linux.vnet.ibm.com
Thresh compare bits for a event is used to program thresh compare
field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9).
When scheduling events as a group, all events in that group should
match value in threshold bits (like thresh compare, thresh control,
thresh select). Otherwise event open for the sibling events should fail.
But in the current code, incase thresh compare bits are not valid,
we are not failing in group_constraint function which can result
in invalid group schduling.
Fix the issue by returning -1 incase event is threshold and threshold
compare value is not valid.
Thresh control bits in the event code is used to program thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In below example,
the scheduling of group events PM_MRK_INST_CMPL (873534401e0) and
PM_THRESH_MET (8734340101ec) is expected to fail as both event
request different thresh control bits and invalid thresh compare value.
Result before the patch changes:
[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
Performance counter stats for 'sleep 1':
11,048 r8735340401e0
1,967 r8734340101ec
1.001354036 seconds time elapsed
0.001421000 seconds user
0.000000000 seconds sys
Result after the patch changes:
[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (r8735340401e0).
/bin/dmesg | grep -i perf may provide additional information.
Fixes: 78a16d9fc1 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506061015.43916-2-kjain@linux.ibm.com
Thresh compare bits for a event is used to program thresh compare
field in Monitor Mode Control Register A (MMCRA: 8-18 bits for power10).
When scheduling events as a group, all events in that group should
match value in threshold bits. Otherwise event open for the sibling
events should fail. But in the current code, incase thresh compare bits are
not valid, we are not failing in group_constraint function which can result
in invalid group schduling.
Fix the issue by returning -1 incase event is threshold and threshold
compare value is not valid in group_constraint function.
Patch also fixes the p10_thresh_cmp_val function to return -1,
incase threshold bits are not valid and changes corresponding check in
is_thresh_cmp_valid function to return false only when the thresh_cmp
value is less then 0.
Thresh control bits in the event code is used to program thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In below example,
the scheduling of group events PM_MRK_INST_CMPL (3534401e0) and
PM_THRESH_MET (34340101ec) is expected to fail as both event
request different thresh control bits.
Result before the patch changes:
[command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1
Performance counter stats for 'sleep 1':
8,482 r35340401e0
0 r34340101ec
1.001474838 seconds time elapsed
0.001145000 seconds user
0.000000000 seconds sys
Result after the patch changes:
[command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1
Performance counter stats for 'sleep 1':
<not counted> r35340401e0
<not supported> r34340101ec
1.001499607 seconds time elapsed
0.000204000 seconds user
0.000760000 seconds sys
Fixes: 82d2c16b35 ("powerpc/perf: Adds support for programming of Thresholding in P10")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506061015.43916-1-kjain@linux.ibm.com
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmJlxloeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGBoIH/3b1GGuTBq8XndVl
1EaCJe/3auE8cHklNpLyTWsQY7He9CcIe4b0fGmtkUlwqAE6E5fUPfqwzjjb3eux
MDPYYKbnjm/jA73dbTbr3lxMlc/caZpuRrwwuek0+vS0DLYhP917NmDvGX8q3l5U
84RCEHztrTmzOivS0BwNJV1XFpcqnODTDN4zNR43o9ZY9tVY4/OqL0+lcQIHM2Nh
6urEzWMMi+BRGaOqdgtt+NxmgKQNTRAkPan6FpJloRSxrOzl4LiYBKMfE2iyisND
91i1CGvOQjaQNIw9JYNvtSWGawMb+obTyCFHyj1Qm7LwD0VAZ+FQrvrdGz4rJrAY
Sq01XYs=
=Rl6l
-----END PGP SIGNATURE-----
Merge tag 'v5.18-rc4' into next
Merge master into next, to bring in commit 5f24d5a579 ("mm, hugetlb:
allow for "high" userspace addresses"), which is needed as a
prerequisite for the series converting powerpc to the generic mmap
logic.
Replace the 'unsigned' with 'unsigned int' which is more accurate.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
[chleroy: Fixed parenthesis alignment]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210729115252.40895-1-wangborong@cdjrlc.com
Build kernel with `C=2`:
arch/powerpc/perf/isa207-common.c:24:18: warning: symbol
'isa207_pmu_format_attr' was not declared. Should it be static?
arch/powerpc/perf/power9-pmu.c:101:5: warning: symbol 'p9_dd21_bl_ev'
was not declared. Should it be static?
arch/powerpc/perf/power9-pmu.c:115:5: warning: symbol 'p9_dd22_bl_ev'
was not declared. Should it be static?
Those symbols are used only in the files that define them so we declare
them as static to fix the warnings.
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200923071453.2540-1-wangwensheng4@huawei.com
The "read_bhrb" global symbol is only called under CONFIG_PPC64 of
arch/powerpc/perf/core-book3s.c but it is compiled for both 32 and 64 bit
anyway (and LLVM fails to link this on 32bit).
This fixes it by moving bhrb.o to obj64 targets.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220421025756.571995-1-aik@ozlabs.ru
When scheduling a group of events, there are constraint checks done to
make sure all events can go in a group. Example, one of the criteria is
that events in a group cannot use the same PMC. But platform specific
PMU supports alternative event for some of the event codes. During
perf_event_open(), if any event group doesn't match constraint check
criteria, further lookup is done to find alternative event.
By current design, the array of alternatives events in PMU code is
expected to be sorted by column 0. This is because in
find_alternative() the return criteria is based on event code
comparison. ie. "event < ev_alt[i][0])". This optimisation is there
since find_alternative() can be called multiple times. In power10 PMU
code, the alternative event array is not sorted properly and hence there
is breakage in finding alternative event.
To work with existing logic, fix the alternative event array to be
sorted by column 0 for power10-pmu.c
Results:
In case where an alternative event is not chosen when we could, events
will be multiplexed. ie, time sliced where it could actually run
concurrently.
Example, in power10 PM_INST_CMPL_ALT(0x00002) has alternative event,
PM_INST_CMPL(0x500fa). Without the fix, if a group of events with PMC1
to PMC4 is used along with PM_INST_CMPL_ALT, it will be time sliced
since all programmable PMC's are consumed already. But with the fix,
when it picks alternative event on PMC5, all events will run
concurrently.
Before:
# perf stat -e r00002,r100fc,r200fa,r300fc,r400fc
Performance counter stats for 'system wide':
328668935 r00002 (79.94%)
56501024 r100fc (79.95%)
49564238 r200fa (79.95%)
376 r300fc (80.19%)
660 r400fc (79.97%)
4.039150522 seconds time elapsed
With the fix, since alternative event is chosen to run on PMC6, events
will be run concurrently.
After:
# perf stat -e r00002,r100fc,r200fa,r300fc,r400fc
Performance counter stats for 'system wide':
23596607 r00002
4907738 r100fc
2283608 r200fa
135 r300fc
248 r400fc
1.664671390 seconds time elapsed
Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220419114828.89843-2-atrajeev@linux.vnet.ibm.com
When scheduling a group of events, there are constraint checks done to
make sure all events can go in a group. Example, one of the criteria is
that events in a group cannot use the same PMC. But platform specific
PMU supports alternative event for some of the event codes. During
perf_event_open(), if any event group doesn't match constraint check
criteria, further lookup is done to find alternative event.
By current design, the array of alternatives events in PMU code is
expected to be sorted by column 0. This is because in
find_alternative() the return criteria is based on event code
comparison. ie. "event < ev_alt[i][0])". This optimisation is there
since find_alternative() can be called multiple times. In power9 PMU
code, the alternative event array is not sorted properly and hence there
is breakage in finding alternative events.
To work with existing logic, fix the alternative event array to be
sorted by column 0 for power9-pmu.c
Results:
With alternative events, multiplexing can be avoided. That is, for
example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event,
PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be
programmed in a different PMC.
Before:
# perf stat -e r3e054,r300fc
Performance counter stats for 'system wide':
1057860 r3e054 (50.21%)
379 r300fc (49.79%)
0.944329741 seconds time elapsed
Since both the events are using PMC3 in this case, they are
multiplexed here.
After:
# perf stat -e r3e054,r300fc
Performance counter stats for 'system wide':
1006948 r3e054
182 r300fc
Fixes: 91e0bd1e62 ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220419114828.89843-1-atrajeev@linux.vnet.ibm.com
Hash faults are not resoved in NMI context, instead causing the access
to fail. This is done because perf interrupts can get backtraces
including walking the user stack, and taking a hash fault on those could
deadlock on the HPTE lock if the perf interrupt hits while the same HPTE
lock is being held by the hash fault code. The user-access for the stack
walking will notice the access failed and deal with that in the perf
code.
The reason to allow perf interrupts in is to better profile hash faults.
The problem with this is any hash fault on a kernel access that happens
in NMI context will crash, because kernel accesses must not fail.
Hard lockups, system reset, machine checks that access vmalloc space
including modules and including stack backtracing and symbol lookup in
modules, per-cpu data, etc could all run into this problem.
Fix this by disallowing perf interrupts in the hash fault code (the
direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs
to extend to the preload case). This simplifies the tricky logic in hash
faults and perf, at the cost of reduced profiling of hash faults.
perf can still latch addresses when interrupts are disabled, it just
won't get the stack trace at that point, so it would still find hot
spots, just sometimes with confusing stack chains.
An alternative could be to allow perf interrupts here but always do the
slowpath stack walk if we are in nmi context, but that slows down all
perf interrupt stack walking on hash though and it does not remove as
much tricky code.
Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com
Trace IMC (In-Memory collection counters) in powerpc is useful for
application level profiling.
For trace_imc, presently task context (task_ctx_nr) is set to
perf_hw_context. But perf_hw_context should only be used for CPU PMU.
See commit 2665784850 ("perf/core: Verify we have a single
perf_hw_context PMU").
So for trace_imc, even though it is per thread PMU, it is preferred to
use sw_context in order to be able to do application level monitoring.
Hence change the task_ctx_nr to use perf_sw_context.
Fixes: 012ae24484 ("powerpc/perf: Trace imc PMU functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Update subject & incorporate notes into change log, reflow comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com
power_pmu_wants_prompt_pmi() is used to decide if PMIs should be taken
promptly. This is valid only for ppc64 and is used only if
CONFIG_PPC_BOOK3S_64=y. Hence include the function under config check
for PPC64.
Fixes warning for 32-bit compilation:
arch/powerpc/perf/core-book3s.c:2455:6: warning: no previous prototype for 'power_pmu_wants_prompt_pmi'
2455 | bool power_pmu_wants_prompt_pmi(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 5a7745b96f ("powerpc/64s/perf: add power_pmu_wants_prompt_pmi to say whether perf wants PMIs to be soft-NMI")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move inside existing CONFIG_PPC64 ifdef block]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220114031355.87480-1-atrajeev@linux.vnet.ibm.com
Some functions defined in 'arch/powerpc/perf' are deserving of an
`__init` macro attribute. These functions are only called by other
initialization functions and therefore should inherit the attribute.
Also, change function declarations in header files to include `__init`.
Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-5-nick.child@ibm.com
Interrupt code enables MSR[EE] in some irq handlers while keeping local
irqs disabled via soft-mask, allowing PMI interrupts to be taken as
soft-NMI to improve profiling of irq handlers.
When perf is not enabled, there is no point to doing this, it's
additional overhead. So provide a function that can say if PMIs should
be taken promptly if possible.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-4-npiggin@gmail.com
The code represent memory/cache level data based on PERF_MEM_LVL_*
namespace, which is in the process of deprication in the favour of
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent cache/memory data based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields.
Add data source encodings to represent data coming from local
memory/Remote memory/distant memory and remote/distant cache hits.
In order to represent data coming from OpenCAPI cache/memory, we use
LVLNUM "PMEM" field which is used to present persistent memory accesses.
Result in power10 system with patch changes:
localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
# Overhead Samples Memory access Symbol Shared Object
# ........ ............ ........................ .......................... ................
#
29.46% 2331 L1 or L1 hit [.] __random libc-2.28.so
23.11% 2121 L1 or L1 hit [.] producer_populate_cache producer_consumer
18.56% 1758 L1 or L1 hit [.] __random_r libc-2.28.so
15.64% 1559 L2 or L2 hit [.] __random libc-2.28.so
.....
0.09% 5 Remote socket, same board Any cache hit [.] __random libc-2.28.so
0.07% 4 Remote socket, same board Any cache hit [.] __random libc-2.28.so
.....
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-5-kjain@linux.ibm.com
The code represent data coming from L1/L2/L3 cache hits based on
PERF_MEM_LVL_* namespace, which is in the process of deprecation in
the favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_}
fields.
Add data source encodings to represent L1/L2/L3 cache hits based on
newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_,HOPS_} fields for
power10 and older platforms
Result in power9 system without patch changes:
localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
# Overhead Samples Memory access Symbol Shared Object
# ........ ............ ........................ ................................. ................
#
29.51% 1 L2 hit [k] perf_event_exec [kernel.vmlinux]
27.05% 1 L1 hit [k] perf_ctx_unlock [kernel.vmlinux]
13.93% 1 L1 hit [k] vtime_delta [kernel.vmlinux]
13.11% 1 L1 hit [k] prepend_path.isra.11 [kernel.vmlinux]
8.20% 1 L1 hit [.] 00000038.plt_call.__GI_strlen libc-2.28.so
8.20% 1 L1 hit [k] perf_event_interrupt [kernel.vmlinux]
Result in power9 system with patch changes:
localhost:# ./perf mem report --sort="mem,sym,dso" --stdio
# Overhead Samples Memory access Symbol Shared Object
# ........ ............ ........................ .......................... ................
#
36.63% 1 L2 or L2 hit [k] perf_event_exec [kernel.vmlinux]
25.50% 1 L1 or L1 hit [k] vtime_delta [kernel.vmlinux]
13.12% 1 L1 or L1 hit [k] unmap_region [kernel.vmlinux]
12.62% 1 L1 or L1 hit [k] perf_sample_event_took [kernel.vmlinux]
6.93% 1 L1 or L1 hit [k] perf_ctx_unlock [kernel.vmlinux]
5.20% 1 L1 or L1 hit [.] __memcpy_power7 libc-2.28.so
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211206091749.87585-4-kjain@linux.ibm.com
Running perf fuzzer showed below in dmesg logs:
"Can't find PMC that caused IRQ"
This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. There are some corner cases
that clears the PMCs after PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow and thus leads to this message while replaying.
Case 1: PMU Interrupt happens during replay of other interrupts and
counter values gets cleared by PMU callbacks before replay:
During replay of interrupts like timer, __do_irq() and doorbell
exception, we conditionally enable interrupts via may_hard_irq_enable().
This could potentially create a window to generate a PMI. Since irq soft
mask is set to ALL_DISABLED, the PMI will get masked here. We could get
IPIs run before perf interrupt is replayed and the PMU events could
be deleted or stopped. This will change the PMU SPR values and resets
the counters. Snippet of ftrace log showing PMU callbacks invoked in
__do_irq():
<idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
<idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
<idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
<idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
<<>>
<idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
<idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
<idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
<idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
<idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
<idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
<idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
<idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
<idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
<idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
<idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
<idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
<<>>
<idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
<idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts
Case 2: PMI's masked during local_* operations, example local_add(). If
the local_add() operation happens within a local_irq_save(), replay of
PMI will be during local_irq_restore(). Similar to case 1, this could
also create a window before replay where PMU events gets deleted or
stopped.
Fix it by updating the PMU callback function power_pmu_disable() to
check for pending perf interrupt. If there is an overflown PMC and
pending perf interrupt indicated in paca, clear the PMI bit in paca to
drop that sample. Clearing of PMI bit is done in power_pmu_disable()
since disable is invoked before any event gets deleted/stopped. With
this fix, if there are more than one event running in the PMU, there is
a chance that we clear the PMI bit for the event which is not getting
deleted/stopped. The other events may still remain active. Hence to make
sure we don't drop valid sample in such cases, another check is added in
power_pmu_enable. This checks if there is an overflown PMC found among
the active events and if so enable back the PMI bit. Two new helper
functions are introduced to clear/set the PMI, ie
clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function
pmi_irq_pending() is introduced to give a warning if there is pending
PMI bit in paca, but no PMC is overflown.
Also there are corner cases which result in performance monitor
interrupts being triggered during power_pmu_disable(). This happens
since PMXE bit is not cleared along with disabling of other MMCR0 bits
in the pmu_disable. Such PMI's could leave the PMU running and could
trigger PMI again which will set MMCR0 PMAO bit. This could lead to
spurious interrupts in some corner cases. Example, a timer after
power_pmu_del() which will re-enable interrupts and triggers a PMI again
since PMAO bit is still set. But fails to find valid overflow since PMC
was cleared in power_pmu_del(). Fix that by disabling PMXE along with
disabling of other MMCR0 bits in power_pmu_disable().
We can't just replay PMI any time. Hence this approach is preferred
rather than replaying PMI before resetting overflown PMC. Patch also
documents core-book3s on a race condition which can trigger these PMC
messages during idle path in PowerNV.
Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
It can be useful in simulators (with very constrained environments)
to allow some PMCs to run from boot so they can be sampled directly
by a test harness, rather than having to run perf.
A previous change freezes counters at boot by default, so provide
a boot time option to un-freeze (plus a bit more flexibility).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-13-npiggin@gmail.com
- Enable STRICT_KERNEL_RWX for Freescale 85xx platforms.
- Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it to be disabled.
- Add support for out-of-line static calls on 32-bit.
- Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled.
- Fix boot hangs on e5500 due to stale value in ESR passed to do_page_fault().
- Fix several bugs on pseries in handling of device tree cache information for hotplugged
CPUs, and/or during partition migration.
- Various other small features and fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin, Andrew Donnellan,
Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le Goater, Christophe Leroy, Daniel
Axtens, Daniel Henrique Barboza, Denis Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo
A. R. Silva, Hari Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent
Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Naveen N.
Rao, Nicholas Piggin, Nick Desaulniers, Niklas Schnelle, Oliver O'Halloran, Rob Herring,
Russell Currey, Srikar Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant
Hegde, Wan Jiabing, Xiaoming Ni,
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGFDPoTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgNbsEACVczMVwMBEny5a7W1tqq1bnY9RFVw3
K+/rE7/FpSLX+RrwgoMkmqfPvfyc9WISVLlIQDKz4XhkBjaXv0+Y4OMsymuDCbxL
Qk7F1ff22UfLmPjjJk39gHJ8QZQqZk3wmFu2QzTO4yBZbz2SqqXFLxwyoLpZ0LJ8
pdGl51+bIsTkDJzrdkhX9X4AKS/fYyjbQxq5u7FS89ZCCs+KvzjLcDRo0GZYaOK/
hgDBa60DCCszL/9yjbh0ANZxmM2Z3+6AXkvAAXrtXzIGk4JzenZfiV+VEzmq8Tt0
UpWSsUEe7VgykMR3MTrL9G8op70PpKX6OMUPegJq4iRQ24h4mpFCK4oV9OMKJqpF
ifN9NO2ZZKOz1ke4l7Xe8SEHLX7rq5U/P7INh3AsKYNYwo6HkJhSPxiCBWUTlnZ3
OYoZ7czyO4gMPHWP92z4CoSiTYVBFuyhYexRcnQskg60TIwbr+lMXziRyPRGI8b6
taf2rD8eAiyUJnvkFUsyAHtYHpkSkuMeiVqY2CDQdh2SdtIFgwKzB2RjFL0gzaBZ
XP9RWD+HernGQAJSlIk7cVthont3JHklcKk+ohhDbsWzPeUJcz6t4ChtgRq0x4q4
Hpes1lsVfXpyxj5ouBK/E/t+diwPvBLM0dCcarQJE6ExgMzBC/C7Br7jCSgyVJA2
VhtcZaCYh+vRlQ==
=f7HE
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable STRICT_KERNEL_RWX for Freescale 85xx platforms.
- Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it
to be disabled.
- Add support for out-of-line static calls on 32-bit.
- Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled.
- Fix boot hangs on e5500 due to stale value in ESR passed to
do_page_fault().
- Fix several bugs on pseries in handling of device tree cache
information for hotplugged CPUs, and/or during partition migration.
- Various other small features and fixes.
Thanks to Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin,
Andrew Donnellan, Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le
Goater, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Denis
Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo A. R. Silva, Hari
Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent
Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Niklas
Schnelle, Oliver O'Halloran, Rob Herring, Russell Currey, Srikar
Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant
Hegde, Wan Jiabing, and Xiaoming Ni,
* tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (73 commits)
powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
powerpc/32e: Ignore ESR in instruction storage interrupt handler
powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
MAINTAINERS: Update powerpc KVM entry
powerpc/xmon: fix task state output
powerpc/44x/fsp2: add missing of_node_put
powerpc/dcr: Use cmplwi instead of 3-argument cmpli
KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
powerpc/security: Use a mutex for interrupt exit code patching
powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void
powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs
powerpc/book3e: Fix set_memory_x() and set_memory_nx()
powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
powerpc/bpf: Fix write protecting JIT code
selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh
powerpc/64s/interrupt: Fix check_return_regs_valid() false positive
powerpc/boot: Set LC_ALL=C in wrapper script
powerpc/64s: Default to 64K pages for 64 bit book3s
Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"
...
core:
- Allow ftrace to instrument parts of the perf core code
- Add a new mem_hops field to perf_mem_data_src which allows to represent
intra-node/package or inter-node/off-package details to prepare for
next generation systems which have more hieararchy within the
node/pacakge level.
tools:
- Update for the new mem_hops field in perf_mem_data_src
arch:
- A set of constraints fixes for the Intel uncore PMU
- The usual set of small fixes and improvements for x86 and PPC
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/GkQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaD8D/wLhXR8RxtF4W9HJmHA+5XFsPtg+isp
ZNU2kOs4gZskFx75NQaRv5ikA8y68TKdIx+NuQvRLYItaMveTToLSsJ55bfGMxIQ
JHqDvANUNxBmAACnbYQlqf9WgB0i/3fCUHY5lpmN0waKjaswz7WNpycv4ccShVZr
PKbgEjkeFBhplCqqOF0X5H3V+4q85+nZONm1iSNd4S7/3B6OCxOf1u78usL1bbtW
yJAMSuTeOVUZCJm7oVywKW/ZlCscT135aKr6xe5QTrjlPuRWzuLaXNezdMnMyoVN
HVv8a0ClACb8U5KiGfhvaipaIlIAliWJp2qoiNjrspDruhH6Yc+eNh1gUhLbtNpR
4YZR5jxv4/mS13kzMMQg00cCWQl7N4whPT+ZE9pkpshGt+EwT+Iy3U+v13wDfnnp
MnDggpWYGEkAck13t/T6DwC3qBIsVujtpiG+tt/ERbTxiuxi1ccQTGY3PDjtHV3k
tIMH5n7l4jEpfl8VmoSUgz/2h1MLZnQUWp41GXkjkaOt7uunQZen+nAwqpTm28KV
7U6U0h1q6r7HxOZRxkPPe4HSV+aBNH3H1LeNBfEd3hDCFGf6MY6vLow+2BE9ybk7
Y6LPbRqq0SN3sd5MND0ZvQEt5Zgol8CMlX+UKoLEEv7RognGbIxkgpK7exv5pC9w
nWj7TaMfpRzPgw==
=Oj0G
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"Core:
- Allow ftrace to instrument parts of the perf core code
- Add a new mem_hops field to perf_mem_data_src which allows to
represent intra-node/package or inter-node/off-package details to
prepare for next generation systems which have more hieararchy
within the node/pacakge level.
Tools:
- Update for the new mem_hops field in perf_mem_data_src
Arch:
- A set of constraints fixes for the Intel uncore PMU
- The usual set of small fixes and improvements for x86 and PPC"
* tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings
powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses
tools/perf: Add mem_hops field in perf_mem_data_src structure
perf: Add mem_hops field in perf_mem_data_src structure
perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line
perf/core: Allow ftrace for functions in kernel/event/core.c
perf/x86: Add new event for AUX output counter index
perf/x86: Add compiler barrier after updating BTS
perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints
perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints
perf/x86/intel/uncore: Fix Intel SPR IIO event constraints
perf/x86/intel/uncore: Fix Intel SPR CHA event constraints
perf/x86/intel/uncore: Fix Intel ICX IIO event constraints
perf/x86/intel/uncore: Fix invalid unit check
perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
Fix the data source encodings to represent L2.1/L3.1(another core's
L2/L3 on the same node) accesses properly for power10 and older
plaforms.
Add new macros(LEVEL/REM) which can be used to add mem_lvl_num and remote
field data inside perf_mem_data_src structure.
Result in power9 system with patch changes:
localhost:~/linux/tools/perf # ./perf mem report | grep Remote
0.01% 1 252 Remote core, same node L3 or L3 hit [.] 0x0000000000002dd0 producer_consumer [.] 0x00007fff7f25eb90
anon HitM N/A No N/A 0 0
0.01% 1 220 Remote core, same node L3 or L3 hit [.] 0x0000000000002dd0 producer_consumer [.] 0x00007fff77776d90
anon HitM N/A No N/A 0 0
0.01% 1 220 Remote core, same node L3 or L3 hit [.] 0x0000000000002dd0 producer_consumer [.] 0x00007fff817d9410
anon HitM N/A No N/A 0 0
Fixes: 79e96f8f93 ("powerpc/perf: Export memory hierarchy info to user space")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211006140654.298352-5-kjain@linux.ibm.com
On power9 and earlier platforms, the default event used for cyles and
instructions is PM_CYC (0x0001e) and PM_INST_CMPL (0x00002)
respectively. These events use two programmable PMCs and by default will
count irrespective of the run latch state (idle state). But since they
use programmable PMCs, these events can lead to multiplexing with other
events, because there are only 4 programmable PMCs. Hence in power10,
performance monitoring unit (PMU) driver uses performance monitor
counter 5 (PMC5) and performance monitor counter6 (PMC6) for counting
instructions and cycles.
Currently on power10, the event used for cycles is PM_RUN_CYC (0x600F4)
and instructions uses PM_RUN_INST_CMPL (0x500fa). But counting of these
events in idle state is controlled by the CC56RUN bit setting in Monitor
Mode Control Register0 (MMCR0). If the CC56RUN bit is zero, PMC5/6 will
not count when CTRL[RUN] (run latch) is zero. This could lead to missing
some counts if a thread is in idle state during system wide profiling.
To fix it, set the CC56RUN bit in MMCR0 for power10, which makes PMC5
and PMC6 count instructions and cycles regardless of the run latch
state. Since this change make PMC5/6 count as PM_INST_CMPL/PM_CYC,
rename the event code 0x600f4 as PM_CYC instead of PM_RUN_CYC and event
code 0x500fa as PM_INST_CMPL instead of PM_RUN_INST_CMPL. The changes
are only for PMC5/6 event codes and will not affect the behaviour of
PM_RUN_CYC/PM_RUN_INST_CMPL if progammed in other PMC's.
Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.cm>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Tweak change log wording for style and consistency]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007075121.28497-1-atrajeev@linux.vnet.ibm.com
Patch adds support to include Sampled Instruction Address Register
(SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended
registers. Update the definition of PERF_REG_PMU_MASK_300/31 and
PERF_REG_EXTENDED_MAX to include these SPR's.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007065505.27809-4-atrajeev@linux.vnet.ibm.com
Incase of random sampling, there can be scenarios where
Sample Instruction Address Register(SIAR) may not latch
to the sampled instruction and could result in
the value of 0. In these scenarios it is preferred to
return regs->nip. These corner cases are seen in the
previous generation (p9) also.
Patch adds the check for SIAR value along with regs_use_siar
and siar_valid checks so that the function will return
regs->nip incase SIAR is zero.
Patch drops the code under PPMU_P10_DD1 flag check
which handles SIAR 0 case only for Power10 DD1.
Fixes: 2ca13a4cc5 ("powerpc/perf: Use regs->nip when SIAR is zero")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818171556.36912-3-kjain@linux.ibm.com
Drop the case of returning 0 as instruction pointer since kernel
never executes at 0 and userspace almost never does either.
Fixes: e6878835ac ("powerpc/perf: Sample only if SIAR-Valid bit is set in P7+")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818171556.36912-2-kjain@linux.ibm.com
Minor optimization in the 'perf_instruction_pointer' function code by
making use of stack siar instead of mfspr.
Fixes: 75382aa72f ("powerpc/perf: Move code to select SIAR or pt_regs into perf_read_regs")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210818171556.36912-1-kjain@linux.ibm.com
H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in
the result buffer. Result buffer has specific format defined in the PAPR
specification. One of the fields is counter offset and width of the
counter data returned.
Counter data are returned in a unsigned char array in big endian byte
order. To get the final counter data, the values must be left shifted
byte at a time. But commit 220a0c609a ("powerpc/perf: Add support for
the hv gpci (get performance counter info) interface") made the shifting
bitwise and also assumed little endian order. Because of that, hcall
counters values are reported incorrectly.
In particular this can lead to counters go backwards which messes up the
counter prev vs now calculation and leads to huge counter value
reporting:
#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
-C 0 -I 1000
time counts unit events
1.000078854 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
2.000213293 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
3.000320107 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
4.000428392 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
5.000537864 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
6.000649087 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
7.000760312 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
8.000865218 16,448 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
9.000978985 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
10.001088891 16,384 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
11.001201435 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
12.001307937 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
Fix the shifting logic to correct match the format, ie. read bytes in
big endian order.
Fixes: e4f226b158 ("powerpc/perf/hv-gpci: Increase request buffer size")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com
If an interrupt is taken in kernel mode, always use SIAR for it rather than
looking at regs_sipr. This prevents samples piling up around interrupt
enable (hard enable or interrupt replay via soft enable) in PMUs / modes
where the PR sample indication is not in synch with SIAR.
This results in better sampling of interrupt entry and exit in particular.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
- A big series refactoring parts of our KVM code, and converting some to C.
- Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs.
- Support for the Microwatt soft-core.
- Optimisations to our interrupt return path on 64-bit.
- Support for userspace access to the NX GZIP accelerator on PowerVM on Power10.
- Enable KUAP and KUEP by default on 32-bit Book3S CPUs.
- Other smaller features, fixes & cleanups.
Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li,
Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique
Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley,
Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun
Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain,
YueHaibing, Zhang Jianhua, Zhen Lei.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDfFS4THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFxHEAC88NJ+Gz87LiTQFt6QjhziBaJUd+sY
uqADPRROr4P50O8PjYZbMi2qbXzOlLkZO4wJWX7jpZ1F9KmbPNqY2shD8h4ahyge
F/uqzBW1FXBJfnDEKdU2MzalkeTP+dwxLZyouUamjDCGNLFjOV4x/Fft5otOdXjO
k9uO6yoGyOkWYzjC+Y/irNPlIDDByB/+bD92Cb52Y2mXMDDEnx4JzbtkeJW+8udT
Sjn3bWzeL+dz5GehjMKwK4+SptNiyQGOgM8FwtnKUMvgzxv04DqCGjr9YC12L2Z7
VoFZc4GzVgtf8DZg4fJ3KG5aG2nH3Tui7jc9lUckdrxixDAZw5wSG7CQ39gFb/5+
7A4fEJk4Z3h5llibwxAZrC7wV8ZDDXn8oRFzRcOJjfxYaD+ohOOyWHIebwkdiXYx
nfYI7sBcScDLXeBvHtDra2GJpbFSpVL3S/QNhhi1vKVNrFSyAgbAybcVL2xPLZ6+
8Mh7A8xt+hf2bo9AXuYJDo9mwXWfg1093d0kT+AslcRhZioBk18c2AiZLIz0FzuL
Ua/e5FPb99x9LSdcZHvaAXBoHT2iTgDyCyDa3gkIesyuRX6ggHoFcVQuvdDcbJ9d
H8LK+Tahy1Y+E5b6KdtU8mDEGE+QG+CWLnwQ6YSCaL/MYgaFzNa32Jdj1fmztSBC
cttP43kHZ7ljTw==
=zo4d
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- A big series refactoring parts of our KVM code, and converting some
to C.
- Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on
some CPUs.
- Support for the Microwatt soft-core.
- Optimisations to our interrupt return path on 64-bit.
- Support for userspace access to the NX GZIP accelerator on PowerVM on
Power10.
- Enable KUAP and KUEP by default on 32-bit Book3S CPUs.
- Other smaller features, fixes & cleanups.
Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe
Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand,
Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe,
Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika
Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar
Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei.
* tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits)
powerpc: Only build restart_table.c for 64s
powerpc/64s: move ret_from_fork etc above __end_soft_masked
powerpc/64s/interrupt: clean up interrupt return labels
powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols
powerpc/64: enable MSR[EE] in irq replay pt_regs
powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
powerpc/64s: add a table of implicit soft-masked addresses
powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
powerpc/64e: fix CONFIG_RELOCATABLE build warnings
powerpc/64s: fix hash page fault interrupt handler
powerpc/4xx: Fix setup_kuep() on SMP
powerpc/32s: Fix setup_{kuap/kuep}() on SMP
powerpc/interrupt: Use names in check_return_regs_valid()
powerpc/interrupt: Also use exit_must_hard_disable() on PPC32
powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE
powerpc/ptrace: Refactor regs_set_return_{msr/ip}
powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip}
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
powerpc/pseries/vas: Include irqdomain.h
powerpc: mark local variables around longjmp as volatile
...
This changes generic-compat-pmu.c so that it only uses architected
events defined in Power ISA v3.0B, rather than event encodings which,
while common to all the IBM Power Systems implementations, are
nevertheless implementation-specific rather than architected. The
intention is that any CPU implementation designed to conform to Power
ISA v3.0B or later can use generic-compat-pmu.c.
In addition to the existing events for cycles and instructions, this
adds several other architected events, including alternative encodings
for some events. In order to make it possible to measure cycles and
instructions at the same time as each other, we set the CC5-6RUN bit
in MMCR0, which makes PMC5 and PMC6 count instructions and cycles
regardless of the run bit, so their events are now PM_CYC and
PM_INST_CMPL rather than PM_RUN_CYC and PM_RUN_INST_CMPL (the latter
are still available via other event codes).
Note that POWER9 has an erratum where one architected event
(PM_FLOP_CMPL, floating-point operations completed, code 0x100f4) does
not work correctly. Given that there is a specific PMU driver for P9
which will be used in preference to generic-compat-pmu.c, that is not
a real problem.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YJD7L9yeoxvxqeYi@thinks.paulus.ozlabs.org
On systems without any specific PMU driver support registered, running
perf record causes Oops.
The relevant portion from call trace:
BUG: Kernel NULL pointer dereference on read at 0x00000040
Faulting instruction address: 0xc0021f0c
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PREEMPT CMPCPRO
SAF3000 DIE NOTIFICATION
CPU: 0 PID: 442 Comm: null_syscall Not tainted 5.13.0-rc6-s3k-dev-01645-g7649ee3d2957 #5164
NIP: c0021f0c LR: c00e8ad8 CTR: c00d8a5c
NIP perf_instruction_pointer+0x10/0x60
LR perf_prepare_sample+0x344/0x674
Call Trace:
perf_prepare_sample+0x7c/0x674 (unreliable)
perf_event_output_forward+0x3c/0x94
__perf_event_overflow+0x74/0x14c
perf_swevent_hrtimer+0xf8/0x170
__hrtimer_run_queues.constprop.0+0x160/0x318
hrtimer_interrupt+0x148/0x3b0
timer_interrupt+0xc4/0x22c
Decrementer_virt+0xb8/0xbc
During perf record session, perf_instruction_pointer() is called to
capture the sample IP. This function in core-book3s accesses
ppmu->flags. If a platform specific PMU driver is not registered, ppmu
is set to NULL and accessing its members results in a crash. Fix this
crash by checking if ppmu is set.
Fixes: 2ca13a4cc5 ("powerpc/perf: Use regs->nip when SIAR is zero")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1623952506-1431-1-git-send-email-atrajeev@linux.vnet.ibm.com
Make our stack-walking code KASAN-safe by using __no_sanitize_address.
Generic code, arm64, s390 and x86 all make accesses unchecked for similar
sorts of reasons: when unwinding a stack, we might touch memory that KASAN
has marked as being out-of-bounds. In ppc64 KASAN development, I hit this
sometimes when checking for an exception frame - because we're checking
an arbitrary offset into the stack frame.
See commit 2095574632 ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4b ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c8 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c83664
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210614120907.1952321-1-dja@axtens.net
'struct ppc_inst' is an internal representation of an instruction, but
in-memory instructions are and will remain a table of 'u32' forever.
Replace all 'struct ppc_inst *' used for locating an instruction in
memory by 'u32 *'. This removes a lot of undue casts to 'struct
ppc_inst *'.
It also helps locating ab-use of 'struct ppc_inst' dereference.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix ppc_inst_next(), use u32 instead of unsigned int]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
arch/powerpc/Kbuild decend into arch/powerpc/perf/ only when
CONFIG_PERF_EVENTS is selected, so there is not need to take
CONFIG_PERF_EVENTS into account in arch/powerpc/perf/Makefile.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d37f61afca55b5b33787b643890e061ae1c18f5f.1620396045.git.christophe.leroy@csgroup.eu
Memory events (mem-loads and mem-stores) currently use the threshold
event selection as issue to finish. Power10 supports issue to complete
as part of thresholding which is more appropriate for mem-loads and
mem-stores. Hence fix the event code for memory events to use issue
to complete.
Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614840015-1535-1-git-send-email-atrajeev@linux.vnet.ibm.com
Sampled Instruction Event Register (SIER) field [46:48] identifies the
sampled instruction type. ISA v3.1 says value of 0b111 for this field as
reserved, but in POWER10 it denotes LARX/STCX type which will hopefully
be fixed in ISA v3.1 update.
Patch fixes the functions to handle type value 7 for CPU_FTR_ARCH_31.
Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Avoid reading mmcra until necessary, use early return to deindent if block]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614858937-1485-1-git-send-email-atrajeev@linux.vnet.ibm.com
Performance Monitoring Unit (PMU) registers in powerpc provides
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On ISA v3.1
platform, this information is exposed by sampling registers.
Patch adds kernel support to capture two of the cycle counters
as part of perf sample using the sample type:
PERF_SAMPLE_WEIGHT_STRUCT.
The power PMU function 'get_mem_weight' currently uses 64 bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain
64-bit or 32-bit value depending on the architexture support for
PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the
pipeline stage cycles info. Hence update the ppmu functions to work for
64-bit and 32-bit weight values.
If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
latency is stored in the low 32bits of perf_sample_weight structure.
Also for CPU_FTR_ARCH_31, capture the two cycle counter information in
two 16 bit fields of perf_sample_weight structure.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
Add platform specific attr.config value checks. Patch
includes checks for both power9 and power10.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408074504.248211-2-maddy@linux.ibm.com