- KCSAN enabled for arm64.
 
 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
 
 - Some more SVE clean-ups and refactoring in preparation for SME support
   (scalable matrix extensions).
 
 - BTI clean-ups (SYM_FUNC macros etc.)
 
 - arm64 atomics clean-up and codegen improvements.
 
 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal
   square root estimate).
 
 - Use SHA3 instructions to speed-up XOR.
 
 - arm64 unwind code refactoring/unification.
 
 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1
   (potentially set by a hypervisor; user-space already does this).
 
 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
 
 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHXNtYACgkQa9axLQDI
 XvHBGw/+OVGdbORxwrU+uRb7N6qIJkrW/mmM4x1KLo1i+REZLb8/VlXm0xC60FG+
 39x6FSVkRr+lLDfTqpQsOez5FpdsvOe9Fc4L3bwniDg+EPo7x65VmP2dw/Ae2q0i
 87xyWCczx5hFEPF/1sb1R1pm3bTXjeklBkdv+OXhwflLOwpCp1J8z8WJK8qJVFX6
 CmuE6Q4fDQr0ghl9Nf8DiAr20mHDh8wMKNUJOg4waaQOOCta6q1oJ3qfz6E9z1eW
 zEE3dfZgBCx7HCRc3KGgzT7H4Ces3BYvhBYP6bJRliVI88XdPiM4MfdGL4UIb27Q
 NLAdr+FVzk/YLzMHtxSfkT10nBqoOPWUTckLu9jIIl5cpBX73Wiz7jfzBvqFmC/y
 opSFMZ3lwQPM5WAPtAlZptA3GPPySeInVmvUgB7IQ+1Q1T1n8ri1y5hzTYC4Sc/g
 amJI1rXf1Al8+2zFBggr6Up+EOnfV9nAwrzLXkRlASsfmvY4dnVWg3NWfBqtEHAq
 VuZCecSgawxuSlpmJ4VGbLrBFaz18bn9EzujR5fFvi5Qcg1CMFOROi2+6IynopNV
 IS0R8j6fwgQPA5lcnNIPeJRRkQoqO4l8bPDzeXEny0BSw313EgBSo9aQtnjyIJbp
 BTuDHARKs+/NvDPvd8GQkxNPgwJnVOL9pdgNAolEu1/k7JtnIS0=
 =ecyi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - KCSAN enabled for arm64.

 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.

 - Some more SVE clean-ups and refactoring in preparation for SME
   support (scalable matrix extensions).

 - BTI clean-ups (SYM_FUNC macros etc.)

 - arm64 atomics clean-up and codegen improvements.

 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and
   reciprocal square root estimate).

 - Use SHA3 instructions to speed-up XOR.

 - arm64 unwind code refactoring/unification.

 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
   1 (potentially set by a hypervisor; user-space already does this).

 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.

 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64: Enable KCSAN
  kselftest/arm64: Add pidbench for floating point syscall cases
  arm64/fp: Add comments documenting the usage of state restore functions
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
  ...
This commit is contained in:
Linus Torvalds 2022-01-10 08:49:37 -08:00
Родитель a7ac314061 945409a6ef
Коммит 9b9e211360
86 изменённых файлов: 4186 добавлений и 1072 удалений

Просмотреть файл

@ -0,0 +1,106 @@
================================================
HiSilicon PCIe Performance Monitoring Unit (PMU)
================================================
On Hip09, HiSilicon PCIe Performance Monitoring Unit (PMU) could monitor
bandwidth, latency, bus utilization and buffer occupancy data of PCIe.
Each PCIe Core has a PMU to monitor multi Root Ports of this PCIe Core and
all Endpoints downstream these Root Ports.
HiSilicon PCIe PMU driver
=========================
The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe
Core id.::
/sys/bus/event_source/hisi_pcie<sicl>_<core>
PMU driver provides description of available events and filter options in sysfs,
see /sys/bus/event_source/devices/hisi_pcie<sicl>_<core>.
The "format" directory describes all formats of the config (events) and config1
(filter options) fields of the perf_event_attr structure. The "events" directory
describes all documented events shown in perf list.
The "identifier" sysfs file allows users to identify the version of the
PMU hardware device.
The "bus" sysfs file allows users to get the bus number of Root Ports
monitored by PMU.
Example usage of perf::
$# perf list
hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event]
hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event]
------------------------------------------
$# perf stat -e hisi_pcie0_0/rx_mwr_latency/
$# perf stat -e hisi_pcie0_0/rx_mwr_cnt/
$# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/
The current driver does not support sampling. So "perf record" is unsupported.
Also attach to a task is unsupported for PCIe PMU.
Filter options
--------------
1. Target filter
PMU could only monitor the performance of traffic downstream target Root Ports
or downstream target Endpoint. PCIe PMU driver support "port" and "bdf"
interfaces for users, and these two interfaces aren't supported at the same
time.
-port
"port" filter can be used in all PCIe PMU events, target Root Port can be
selected by configuring the 16-bits-bitmap "port". Multi ports can be selected
for AP-layer-events, and only one port can be selected for TL/DL-layer-events.
For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap
should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes),
bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101.
Example usage of perf::
$# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5
-bdf
"bdf" filter can only be used in bandwidth events, target Endpoint is selected
by configuring BDF to "bdf". Counter only counts the bandwidth of message
requested by target Endpoint.
For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
Example usage of perf::
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5
2. Trigger filter
Event statistics start when the first time TLP length is greater/smaller
than trigger condition. You can set the trigger condition by writing "trig_len",
and set the trigger mode by writing "trig_mode". This filter can only be used
in bandwidth events.
For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
means statistics start when TLP length > trigger condition, "trig_mode=1"
means start when TLP length < condition.
Example usage of perf::
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
3. Threshold filter
Counter counts when TLP length within the specified range. You can set the
threshold by writing "thr_len", and set the threshold mode by writing
"thr_mode". This filter can only be used in bandwidth events.
For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
counter counts when TLP length >= threshold, and "thr_mode=1" means counts
when TLP length < threshold.
Example usage of perf::
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5

Просмотреть файл

@ -905,6 +905,17 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
The default value is 8.
perf_user_access (arm64 only)
=================================
Controls user space access for reading perf event counters. When set to 1,
user space can read performance monitor counter registers directly.
The default value is 0 (access disabled).
See Documentation/arm64/perf.rst for more information.
pid_max
=======

Просмотреть файл

@ -275,6 +275,23 @@ infrastructure:
| SVEVer | [3-0] | y |
+------------------------------+---------+---------+
8) ID_AA64MMFR1_EL1 - Memory model feature register 1
+------------------------------+---------+---------+
| Name | bits | visible |
+------------------------------+---------+---------+
| AFP | [47-44] | y |
+------------------------------+---------+---------+
9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+------------------------------+---------+---------+
| Name | bits | visible |
+------------------------------+---------+---------+
| RPRES | [7-4] | y |
+------------------------------+---------+---------+
Appendix I: Example
-------------------

Просмотреть файл

@ -251,6 +251,14 @@ HWCAP2_ECV
Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001.
HWCAP2_AFP
Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001.
HWCAP2_RPRES
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
4. Unused AT_HWCAP bits
-----------------------

Просмотреть файл

@ -2,7 +2,10 @@
.. _perf_index:
=====================
====
Perf
====
Perf Event Attributes
=====================
@ -88,3 +91,76 @@ exclude_host. However when using !exclude_hv there is a small blackout
window at the guest entry/exit where host events are not captured.
On VHE systems there are no blackout windows.
Perf Userspace PMU Hardware Counter Access
==========================================
Overview
--------
The perf userspace tool relies on the PMU to monitor events. It offers an
abstraction layer over the hardware counters since the underlying
implementation is cpu-dependent.
Arm64 allows userspace tools to have access to the registers storing the
hardware counters' values directly.
This targets specifically self-monitoring tasks in order to reduce the overhead
by directly accessing the registers without having to go through the kernel.
How-to
------
The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu
registers is enabled and that the userspace has access to the relevant
information in order to use them.
In order to have access to the hardware counters, the global sysctl
kernel/perf_user_access must first be enabled:
.. code-block:: sh
echo 1 > /proc/sys/kernel/perf_user_access
It is necessary to open the event using the perf tool interface with config1:1
attr bit set: the sys_perf_event_open syscall returns a fd which can
subsequently be used with the mmap syscall in order to retrieve a page of memory
containing information about the event. The PMU driver uses this page to expose
to the user the hardware counter's index and other necessary data. Using this
index enables the user to access the PMU registers using the `mrs` instruction.
Access to the PMU registers is only valid while the sequence lock is unchanged.
In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
changed.
The userspace access is supported in libperf using the perf_evsel__mmap()
and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
an example.
About heterogeneous systems
---------------------------
On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
only be enabled when the tasks are pinned to a homogeneous subset of cores and
the corresponding PMU instance is opened by specifying the 'type' attribute.
The use of generic event types is not supported in this case.
Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
can be run using the perf tool to check that the access to the registers works
correctly from userspace:
.. code-block:: sh
perf test -v user
About chained events and counter sizes
--------------------------------------
The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
counter along with userspace access. The sys_perf_event_open syscall will fail
if a 64-bit counter is requested and the hardware doesn't support 64-bit
counters. Chained events are not supported in conjunction with userspace counter
access. If a 32-bit counter is requested on hardware with 64-bit counters, then
userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
'pmc_width' field in the user page will indicate the valid width of the counter
and should be used to mask the upper bits as needed.
.. Links
.. _tools/perf/arch/arm64/tests/user-events.c:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
.. _tools/lib/perf/tests/test-evsel.c:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c

Просмотреть файл

@ -255,7 +255,7 @@ prctl(PR_SVE_GET_VL)
vector length change (which would only normally be the case between a
fork() or vfork() and the corresponding execve() in typical use).
To extract the vector length from the result, and it with
To extract the vector length from the result, bitwise and it with
PR_SVE_VL_LEN_MASK.
Return value: a nonnegative value on success, or a negative value on error:

Просмотреть файл

@ -49,7 +49,7 @@ how the user addresses are used by the kernel:
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
``mremap()`` as these have the potential to alias with existing
user addresses.
user addresses.
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
incorrectly accept valid tagged pointers for the ``brk()``,

Просмотреть файл

@ -12,12 +12,14 @@ maintainers:
properties:
compatible:
const: arm,cmn-600
enum:
- arm,cmn-600
- arm,ci-700
reg:
items:
- description: Physical address of the base (PERIPHBASE) and
size (up to 64MB) of the configuration address space.
size of the configuration address space.
interrupts:
minItems: 1
@ -31,14 +33,23 @@ properties:
arm,root-node:
$ref: /schemas/types.yaml#/definitions/uint32
description: Offset from PERIPHBASE of the configuration
discovery node (see TRM definition of ROOTNODEBASE).
description: Offset from PERIPHBASE of CMN-600's configuration
discovery node (see TRM definition of ROOTNODEBASE). Not
relevant for newer CMN/CI products.
required:
- compatible
- reg
- interrupts
- arm,root-node
if:
properties:
compatible:
contains:
const: arm,cmn-600
then:
required:
- arm,root-node
additionalProperties: false

Просмотреть файл

@ -0,0 +1,70 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/perf/arm,smmu-v3-pmcg.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Arm SMMUv3 Performance Monitor Counter Group
maintainers:
- Will Deacon <will@kernel.org>
- Robin Murphy <robin.murphy@arm.com>
description: |
An SMMUv3 may have several Performance Monitor Counter Group (PMCG).
They are standalone performance monitoring units that support both
architected and IMPLEMENTATION DEFINED event counters.
properties:
$nodename:
pattern: "^pmu@[0-9a-f]*"
compatible:
oneOf:
- items:
- const: arm,mmu-600-pmcg
- const: arm,smmu-v3-pmcg
- const: arm,smmu-v3-pmcg
reg:
items:
- description: Register page 0
- description: Register page 1, if SMMU_PMCG_CFGR.RELOC_CTRS = 1
minItems: 1
interrupts:
maxItems: 1
msi-parent: true
required:
- compatible
- reg
anyOf:
- required:
- interrupts
- required:
- msi-parent
additionalProperties: false
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>
#include <dt-bindings/interrupt-controller/irq.h>
pmu@2b420000 {
compatible = "arm,smmu-v3-pmcg";
reg = <0x2b420000 0x1000>,
<0x2b430000 0x1000>;
interrupts = <GIC_SPI 80 IRQ_TYPE_EDGE_RISING>;
msi-parent = <&its 0xff0000>;
};
pmu@2b440000 {
compatible = "arm,smmu-v3-pmcg";
reg = <0x2b440000 0x1000>,
<0x2b450000 0x1000>;
interrupts = <GIC_SPI 81 IRQ_TYPE_EDGE_RISING>;
msi-parent = <&its 0xff0000>;
};

Просмотреть файл

@ -0,0 +1,63 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/perf/marvell-cn10k-tad.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Marvell CN10K LLC-TAD performance monitor
maintainers:
- Bhaskara Budiredla <bbudiredla@marvell.com>
description: |
The Tag-and-Data units (TADs) maintain coherence and contain CN10K
shared on-chip last level cache (LLC). The tad pmu measures the
performance of last-level cache. Each tad pmu supports up to eight
counters.
The DT setup comprises of number of tad blocks, the sizes of pmu
regions, tad blocks and overall base address of the HW.
properties:
compatible:
const: marvell,cn10k-tad-pmu
reg:
maxItems: 1
marvell,tad-cnt:
description: specifies the number of tads on the soc
$ref: /schemas/types.yaml#/definitions/uint32
marvell,tad-page-size:
description: specifies the size of each tad page
$ref: /schemas/types.yaml#/definitions/uint32
marvell,tad-pmu-page-size:
description: specifies the size of page that the pmu uses
$ref: /schemas/types.yaml#/definitions/uint32
required:
- compatible
- reg
- marvell,tad-cnt
- marvell,tad-page-size
- marvell,tad-pmu-page-size
additionalProperties: false
examples:
- |
tad {
#address-cells = <2>;
#size-cells = <2>;
tad_pmu@80000000 {
compatible = "marvell,cn10k-tad-pmu";
reg = <0x87e2 0x80000000 0x0 0x1000>;
marvell,tad-cnt = <1>;
marvell,tad-page-size = <0x1000>;
marvell,tad-pmu-page-size = <0x1000>;
};
};

Просмотреть файл

@ -1950,6 +1950,14 @@ There are some more advanced barrier functions:
For load from persistent memory, existing read memory barriers are sufficient
to ensure read ordering.
(*) io_stop_wc();
For memory accesses with write-combining attributes (e.g. those returned
by ioremap_wc(), the CPU may wait for prior accesses to be merged with
subsequent ones. io_stop_wc() can be used to prevent the merging of
write-combining memory accesses before this macro with those after it when
such wait has performance implications.
===============================
IMPLICIT KERNEL MEMORY BARRIERS
===============================

Просмотреть файл

@ -8615,8 +8615,10 @@ F: drivers/misc/hisi_hikey_usb.c
HISILICON PMU DRIVER
M: Shaokun Zhang <zhangshaokun@hisilicon.com>
M: Qi Liu <liuqi115@huawei.com>
S: Supported
W: http://www.hisilicon.com
F: Documentation/admin-guide/perf/hisi-pcie-pmu.rst
F: Documentation/admin-guide/perf/hisi-pmu.rst
F: drivers/perf/hisilicon

Просмотреть файл

@ -150,6 +150,8 @@ config ARM64
select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
# Some instrumentation may be unsound, hence EXPERT
select HAVE_ARCH_KCSAN if EXPERT
select HAVE_ARCH_KFENCE
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
@ -1545,6 +1547,12 @@ endmenu
menu "ARMv8.2 architectural features"
config AS_HAS_ARMV8_2
def_bool $(cc-option,-Wa$(comma)-march=armv8.2-a)
config AS_HAS_SHA3
def_bool $(as-instr,.arch armv8.2-a+sha3)
config ARM64_PMEM
bool "Enable support for persistent memory"
select ARCH_HAS_PMEM_API

Просмотреть файл

@ -58,6 +58,11 @@ stack_protector_prepare: prepare0
include/generated/asm-offsets.h))
endif
ifeq ($(CONFIG_AS_HAS_ARMV8_2), y)
# make sure to pass the newest target architecture to -march.
asm-arch := armv8.2-a
endif
# Ensure that if the compiler supports branch protection we default it
# off, this will be overridden if we are using branch protection.
branch-prot-flags-y += $(call cc-option,-mbranch-protection=none)

Просмотреть файл

@ -363,15 +363,15 @@ ST5( mov v4.16b, vctr.16b )
adr x16, 1f
sub x16, x16, x12, lsl #3
br x16
hint 34 // bti c
bti c
mov v0.d[0], vctr.d[0]
hint 34 // bti c
bti c
mov v1.d[0], vctr.d[0]
hint 34 // bti c
bti c
mov v2.d[0], vctr.d[0]
hint 34 // bti c
bti c
mov v3.d[0], vctr.d[0]
ST5( hint 34 )
ST5( bti c )
ST5( mov v4.d[0], vctr.d[0] )
1: b 2f
.previous

Просмотреть файл

@ -790,6 +790,16 @@ alternative_endif
.Lnoyield_\@:
.endm
/*
* Branch Target Identifier (BTI)
*/
.macro bti, targets
.equ .L__bti_targets_c, 34
.equ .L__bti_targets_j, 36
.equ .L__bti_targets_jc,38
hint #.L__bti_targets_\targets
.endm
/*
* This macro emits a program property note section identifying
* architecture features which require special handling, mainly for

Просмотреть файл

@ -44,11 +44,11 @@ __ll_sc_atomic_##op(int i, atomic_t *v) \
\
asm volatile("// atomic_" #op "\n" \
__LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ldxr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" stxr %w1, %w0, %2\n" \
" cbnz %w1, 1b\n") \
" prfm pstl1strm, %2\n" \
"1: ldxr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" stxr %w1, %w0, %2\n" \
" cbnz %w1, 1b\n") \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i)); \
}
@ -62,12 +62,12 @@ __ll_sc_atomic_##op##_return##name(int i, atomic_t *v) \
\
asm volatile("// atomic_" #op "_return" #name "\n" \
__LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ld" #acq "xr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" st" #rel "xr %w1, %w0, %2\n" \
" cbnz %w1, 1b\n" \
" " #mb ) \
" prfm pstl1strm, %2\n" \
"1: ld" #acq "xr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" st" #rel "xr %w1, %w0, %2\n" \
" cbnz %w1, 1b\n" \
" " #mb ) \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@ -84,12 +84,12 @@ __ll_sc_atomic_fetch_##op##name(int i, atomic_t *v) \
\
asm volatile("// atomic_fetch_" #op #name "\n" \
__LL_SC_FALLBACK( \
" prfm pstl1strm, %3\n" \
"1: ld" #acq "xr %w0, %3\n" \
" " #asm_op " %w1, %w0, %w4\n" \
" st" #rel "xr %w2, %w1, %3\n" \
" cbnz %w2, 1b\n" \
" " #mb ) \
" prfm pstl1strm, %3\n" \
"1: ld" #acq "xr %w0, %3\n" \
" " #asm_op " %w1, %w0, %w4\n" \
" st" #rel "xr %w2, %w1, %3\n" \
" cbnz %w2, 1b\n" \
" " #mb ) \
: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@ -143,11 +143,11 @@ __ll_sc_atomic64_##op(s64 i, atomic64_t *v) \
\
asm volatile("// atomic64_" #op "\n" \
__LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ldxr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" stxr %w1, %0, %2\n" \
" cbnz %w1, 1b") \
" prfm pstl1strm, %2\n" \
"1: ldxr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" stxr %w1, %0, %2\n" \
" cbnz %w1, 1b") \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i)); \
}
@ -161,12 +161,12 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v) \
\
asm volatile("// atomic64_" #op "_return" #name "\n" \
__LL_SC_FALLBACK( \
" prfm pstl1strm, %2\n" \
"1: ld" #acq "xr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" st" #rel "xr %w1, %0, %2\n" \
" cbnz %w1, 1b\n" \
" " #mb ) \
" prfm pstl1strm, %2\n" \
"1: ld" #acq "xr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" st" #rel "xr %w1, %0, %2\n" \
" cbnz %w1, 1b\n" \
" " #mb ) \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@ -176,19 +176,19 @@ __ll_sc_atomic64_##op##_return##name(s64 i, atomic64_t *v) \
#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\
static inline long \
__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \
__ll_sc_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \
{ \
s64 result, val; \
unsigned long tmp; \
\
asm volatile("// atomic64_fetch_" #op #name "\n" \
__LL_SC_FALLBACK( \
" prfm pstl1strm, %3\n" \
"1: ld" #acq "xr %0, %3\n" \
" " #asm_op " %1, %0, %4\n" \
" st" #rel "xr %w2, %1, %3\n" \
" cbnz %w2, 1b\n" \
" " #mb ) \
" prfm pstl1strm, %3\n" \
"1: ld" #acq "xr %0, %3\n" \
" " #asm_op " %1, %0, %4\n" \
" st" #rel "xr %w2, %1, %3\n" \
" cbnz %w2, 1b\n" \
" " #mb ) \
: "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \
: __stringify(constraint) "r" (i) \
: cl); \
@ -241,14 +241,14 @@ __ll_sc_atomic64_dec_if_positive(atomic64_t *v)
asm volatile("// atomic64_dec_if_positive\n"
__LL_SC_FALLBACK(
" prfm pstl1strm, %2\n"
"1: ldxr %0, %2\n"
" subs %0, %0, #1\n"
" b.lt 2f\n"
" stlxr %w1, %0, %2\n"
" cbnz %w1, 1b\n"
" dmb ish\n"
"2:")
" prfm pstl1strm, %2\n"
"1: ldxr %0, %2\n"
" subs %0, %0, #1\n"
" b.lt 2f\n"
" stlxr %w1, %0, %2\n"
" cbnz %w1, 1b\n"
" dmb ish\n"
"2:")
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
:
: "cc", "memory");

Просмотреть файл

@ -11,13 +11,13 @@
#define __ASM_ATOMIC_LSE_H
#define ATOMIC_OP(op, asm_op) \
static inline void __lse_atomic_##op(int i, atomic_t *v) \
static inline void __lse_atomic_##op(int i, atomic_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
" " #asm_op " %w[i], %[v]\n" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v)); \
" " #asm_op " %w[i], %[v]\n" \
: [v] "+Q" (v->counter) \
: [i] "r" (i)); \
}
ATOMIC_OP(andnot, stclr)
@ -25,19 +25,27 @@ ATOMIC_OP(or, stset)
ATOMIC_OP(xor, steor)
ATOMIC_OP(add, stadd)
static inline void __lse_atomic_sub(int i, atomic_t *v)
{
__lse_atomic_add(-i, v);
}
#undef ATOMIC_OP
#define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...) \
static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v) \
{ \
int old; \
\
asm volatile( \
__LSE_PREAMBLE \
" " #asm_op #mb " %w[i], %w[i], %[v]" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
" " #asm_op #mb " %w[i], %w[old], %[v]" \
: [v] "+Q" (v->counter), \
[old] "=r" (old) \
: [i] "r" (i) \
: cl); \
\
return i; \
return old; \
}
#define ATOMIC_FETCH_OPS(op, asm_op) \
@ -54,51 +62,46 @@ ATOMIC_FETCH_OPS(add, ldadd)
#undef ATOMIC_FETCH_OP
#undef ATOMIC_FETCH_OPS
#define ATOMIC_OP_ADD_RETURN(name, mb, cl...) \
static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
#define ATOMIC_FETCH_OP_SUB(name) \
static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v) \
{ \
u32 tmp; \
\
asm volatile( \
__LSE_PREAMBLE \
" ldadd" #mb " %w[i], %w[tmp], %[v]\n" \
" add %w[i], %w[i], %w[tmp]" \
: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
: "r" (v) \
: cl); \
\
return i; \
return __lse_atomic_fetch_add##name(-i, v); \
}
ATOMIC_OP_ADD_RETURN(_relaxed, )
ATOMIC_OP_ADD_RETURN(_acquire, a, "memory")
ATOMIC_OP_ADD_RETURN(_release, l, "memory")
ATOMIC_OP_ADD_RETURN( , al, "memory")
ATOMIC_FETCH_OP_SUB(_relaxed)
ATOMIC_FETCH_OP_SUB(_acquire)
ATOMIC_FETCH_OP_SUB(_release)
ATOMIC_FETCH_OP_SUB( )
#undef ATOMIC_OP_ADD_RETURN
#undef ATOMIC_FETCH_OP_SUB
#define ATOMIC_OP_ADD_SUB_RETURN(name) \
static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \
{ \
return __lse_atomic_fetch_add##name(i, v) + i; \
} \
\
static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
{ \
return __lse_atomic_fetch_sub(i, v) - i; \
}
ATOMIC_OP_ADD_SUB_RETURN(_relaxed)
ATOMIC_OP_ADD_SUB_RETURN(_acquire)
ATOMIC_OP_ADD_SUB_RETURN(_release)
ATOMIC_OP_ADD_SUB_RETURN( )
#undef ATOMIC_OP_ADD_SUB_RETURN
static inline void __lse_atomic_and(int i, atomic_t *v)
{
asm volatile(
__LSE_PREAMBLE
" mvn %w[i], %w[i]\n"
" stclr %w[i], %[v]"
: [i] "+&r" (i), [v] "+Q" (v->counter)
: "r" (v));
return __lse_atomic_andnot(~i, v);
}
#define ATOMIC_FETCH_OP_AND(name, mb, cl...) \
static inline int __lse_atomic_fetch_and##name(int i, atomic_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
" mvn %w[i], %w[i]\n" \
" ldclr" #mb " %w[i], %w[i], %[v]" \
: [i] "+&r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
: cl); \
\
return i; \
return __lse_atomic_fetch_andnot##name(~i, v); \
}
ATOMIC_FETCH_OP_AND(_relaxed, )
@ -108,69 +111,14 @@ ATOMIC_FETCH_OP_AND( , al, "memory")
#undef ATOMIC_FETCH_OP_AND
static inline void __lse_atomic_sub(int i, atomic_t *v)
{
asm volatile(
__LSE_PREAMBLE
" neg %w[i], %w[i]\n"
" stadd %w[i], %[v]"
: [i] "+&r" (i), [v] "+Q" (v->counter)
: "r" (v));
}
#define ATOMIC_OP_SUB_RETURN(name, mb, cl...) \
static inline int __lse_atomic_sub_return##name(int i, atomic_t *v) \
{ \
u32 tmp; \
\
asm volatile( \
__LSE_PREAMBLE \
" neg %w[i], %w[i]\n" \
" ldadd" #mb " %w[i], %w[tmp], %[v]\n" \
" add %w[i], %w[i], %w[tmp]" \
: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
: "r" (v) \
: cl); \
\
return i; \
}
ATOMIC_OP_SUB_RETURN(_relaxed, )
ATOMIC_OP_SUB_RETURN(_acquire, a, "memory")
ATOMIC_OP_SUB_RETURN(_release, l, "memory")
ATOMIC_OP_SUB_RETURN( , al, "memory")
#undef ATOMIC_OP_SUB_RETURN
#define ATOMIC_FETCH_OP_SUB(name, mb, cl...) \
static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
" neg %w[i], %w[i]\n" \
" ldadd" #mb " %w[i], %w[i], %[v]" \
: [i] "+&r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
: cl); \
\
return i; \
}
ATOMIC_FETCH_OP_SUB(_relaxed, )
ATOMIC_FETCH_OP_SUB(_acquire, a, "memory")
ATOMIC_FETCH_OP_SUB(_release, l, "memory")
ATOMIC_FETCH_OP_SUB( , al, "memory")
#undef ATOMIC_FETCH_OP_SUB
#define ATOMIC64_OP(op, asm_op) \
static inline void __lse_atomic64_##op(s64 i, atomic64_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
" " #asm_op " %[i], %[v]\n" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v)); \
" " #asm_op " %[i], %[v]\n" \
: [v] "+Q" (v->counter) \
: [i] "r" (i)); \
}
ATOMIC64_OP(andnot, stclr)
@ -178,19 +126,27 @@ ATOMIC64_OP(or, stset)
ATOMIC64_OP(xor, steor)
ATOMIC64_OP(add, stadd)
static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
{
__lse_atomic64_add(-i, v);
}
#undef ATOMIC64_OP
#define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...) \
static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
{ \
s64 old; \
\
asm volatile( \
__LSE_PREAMBLE \
" " #asm_op #mb " %[i], %[i], %[v]" \
: [i] "+r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
" " #asm_op #mb " %[i], %[old], %[v]" \
: [v] "+Q" (v->counter), \
[old] "=r" (old) \
: [i] "r" (i) \
: cl); \
\
return i; \
return old; \
}
#define ATOMIC64_FETCH_OPS(op, asm_op) \
@ -207,51 +163,46 @@ ATOMIC64_FETCH_OPS(add, ldadd)
#undef ATOMIC64_FETCH_OP
#undef ATOMIC64_FETCH_OPS
#define ATOMIC64_OP_ADD_RETURN(name, mb, cl...) \
static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
#define ATOMIC64_FETCH_OP_SUB(name) \
static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \
{ \
unsigned long tmp; \
\
asm volatile( \
__LSE_PREAMBLE \
" ldadd" #mb " %[i], %x[tmp], %[v]\n" \
" add %[i], %[i], %x[tmp]" \
: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
: "r" (v) \
: cl); \
\
return i; \
return __lse_atomic64_fetch_add##name(-i, v); \
}
ATOMIC64_OP_ADD_RETURN(_relaxed, )
ATOMIC64_OP_ADD_RETURN(_acquire, a, "memory")
ATOMIC64_OP_ADD_RETURN(_release, l, "memory")
ATOMIC64_OP_ADD_RETURN( , al, "memory")
ATOMIC64_FETCH_OP_SUB(_relaxed)
ATOMIC64_FETCH_OP_SUB(_acquire)
ATOMIC64_FETCH_OP_SUB(_release)
ATOMIC64_FETCH_OP_SUB( )
#undef ATOMIC64_OP_ADD_RETURN
#undef ATOMIC64_FETCH_OP_SUB
#define ATOMIC64_OP_ADD_SUB_RETURN(name) \
static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
{ \
return __lse_atomic64_fetch_add##name(i, v) + i; \
} \
\
static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
{ \
return __lse_atomic64_fetch_sub##name(i, v) - i; \
}
ATOMIC64_OP_ADD_SUB_RETURN(_relaxed)
ATOMIC64_OP_ADD_SUB_RETURN(_acquire)
ATOMIC64_OP_ADD_SUB_RETURN(_release)
ATOMIC64_OP_ADD_SUB_RETURN( )
#undef ATOMIC64_OP_ADD_SUB_RETURN
static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
{
asm volatile(
__LSE_PREAMBLE
" mvn %[i], %[i]\n"
" stclr %[i], %[v]"
: [i] "+&r" (i), [v] "+Q" (v->counter)
: "r" (v));
return __lse_atomic64_andnot(~i, v);
}
#define ATOMIC64_FETCH_OP_AND(name, mb, cl...) \
static inline long __lse_atomic64_fetch_and##name(s64 i, atomic64_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
" mvn %[i], %[i]\n" \
" ldclr" #mb " %[i], %[i], %[v]" \
: [i] "+&r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
: cl); \
\
return i; \
return __lse_atomic64_fetch_andnot##name(~i, v); \
}
ATOMIC64_FETCH_OP_AND(_relaxed, )
@ -261,61 +212,6 @@ ATOMIC64_FETCH_OP_AND( , al, "memory")
#undef ATOMIC64_FETCH_OP_AND
static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
{
asm volatile(
__LSE_PREAMBLE
" neg %[i], %[i]\n"
" stadd %[i], %[v]"
: [i] "+&r" (i), [v] "+Q" (v->counter)
: "r" (v));
}
#define ATOMIC64_OP_SUB_RETURN(name, mb, cl...) \
static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v) \
{ \
unsigned long tmp; \
\
asm volatile( \
__LSE_PREAMBLE \
" neg %[i], %[i]\n" \
" ldadd" #mb " %[i], %x[tmp], %[v]\n" \
" add %[i], %[i], %x[tmp]" \
: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \
: "r" (v) \
: cl); \
\
return i; \
}
ATOMIC64_OP_SUB_RETURN(_relaxed, )
ATOMIC64_OP_SUB_RETURN(_acquire, a, "memory")
ATOMIC64_OP_SUB_RETURN(_release, l, "memory")
ATOMIC64_OP_SUB_RETURN( , al, "memory")
#undef ATOMIC64_OP_SUB_RETURN
#define ATOMIC64_FETCH_OP_SUB(name, mb, cl...) \
static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \
{ \
asm volatile( \
__LSE_PREAMBLE \
" neg %[i], %[i]\n" \
" ldadd" #mb " %[i], %[i], %[v]" \
: [i] "+&r" (i), [v] "+Q" (v->counter) \
: "r" (v) \
: cl); \
\
return i; \
}
ATOMIC64_FETCH_OP_SUB(_relaxed, )
ATOMIC64_FETCH_OP_SUB(_acquire, a, "memory")
ATOMIC64_FETCH_OP_SUB(_release, l, "memory")
ATOMIC64_FETCH_OP_SUB( , al, "memory")
#undef ATOMIC64_FETCH_OP_SUB
static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
{
unsigned long tmp;

Просмотреть файл

@ -26,6 +26,14 @@
#define __tsb_csync() asm volatile("hint #18" : : : "memory")
#define csdb() asm volatile("hint #20" : : : "memory")
/*
* Data Gathering Hint:
* This instruction prevents merging memory accesses with Normal-NC or
* Device-GRE attributes before the hint instruction with any memory accesses
* appearing after the hint instruction.
*/
#define dgh() asm volatile("hint #6" : : : "memory")
#ifdef CONFIG_ARM64_PSEUDO_NMI
#define pmr_sync() \
do { \
@ -46,6 +54,7 @@
#define dma_rmb() dmb(oshld)
#define dma_wmb() dmb(oshst)
#define io_stop_wc() dgh()
#define tsb_csync() \
do { \

Просмотреть файл

@ -51,6 +51,7 @@ struct cpuinfo_arm64 {
u64 reg_id_aa64dfr1;
u64 reg_id_aa64isar0;
u64 reg_id_aa64isar1;
u64 reg_id_aa64isar2;
u64 reg_id_aa64mmfr0;
u64 reg_id_aa64mmfr1;
u64 reg_id_aa64mmfr2;

Просмотреть файл

@ -51,8 +51,8 @@ extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
extern void fpsimd_flush_task_state(struct task_struct *target);
extern void fpsimd_save_and_flush_cpu_state(void);
/* Maximum VL that SVE VL-agnostic software can transparently support */
#define SVE_VL_ARCH_MAX 0x100
/* Maximum VL that SVE/SME VL-agnostic software can transparently support */
#define VL_ARCH_MAX 0x100
/* Offset of FFR in the SVE register dump */
static inline size_t sve_ffr_offset(int vl)
@ -122,7 +122,7 @@ extern void fpsimd_sync_to_sve(struct task_struct *task);
extern void sve_sync_to_fpsimd(struct task_struct *task);
extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
extern int sve_set_vector_length(struct task_struct *task,
extern int vec_set_vector_length(struct task_struct *task, enum vec_type type,
unsigned long vl, unsigned long flags);
extern int sve_set_current_vl(unsigned long arg);

Просмотреть файл

@ -106,6 +106,8 @@
#define KERNEL_HWCAP_BTI __khwcap2_feature(BTI)
#define KERNEL_HWCAP_MTE __khwcap2_feature(MTE)
#define KERNEL_HWCAP_ECV __khwcap2_feature(ECV)
#define KERNEL_HWCAP_AFP __khwcap2_feature(AFP)
#define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES)
/*
* This yields a mask that user programs can use to figure out what

Просмотреть файл

@ -1,48 +1,43 @@
#ifndef __ASM_LINKAGE_H
#define __ASM_LINKAGE_H
#ifdef __ASSEMBLY__
#include <asm/assembler.h>
#endif
#define __ALIGN .align 2
#define __ALIGN_STR ".align 2"
#if defined(CONFIG_ARM64_BTI_KERNEL) && defined(__aarch64__)
/*
* Since current versions of gas reject the BTI instruction unless we
* set the architecture version to v8.5 we use the hint instruction
* instead.
*/
#define BTI_C hint 34 ;
/*
* When using in-kernel BTI we need to ensure that PCS-conformant assembly
* functions have suitable annotations. Override SYM_FUNC_START to insert
* a BTI landing pad at the start of everything.
* When using in-kernel BTI we need to ensure that PCS-conformant
* assembly functions have suitable annotations. Override
* SYM_FUNC_START to insert a BTI landing pad at the start of
* everything, the override is done unconditionally so we're more
* likely to notice any drift from the overridden definitions.
*/
#define SYM_FUNC_START(name) \
SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) \
BTI_C
bti c ;
#define SYM_FUNC_START_NOALIGN(name) \
SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE) \
BTI_C
bti c ;
#define SYM_FUNC_START_LOCAL(name) \
SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN) \
BTI_C
bti c ;
#define SYM_FUNC_START_LOCAL_NOALIGN(name) \
SYM_START(name, SYM_L_LOCAL, SYM_A_NONE) \
BTI_C
bti c ;
#define SYM_FUNC_START_WEAK(name) \
SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN) \
BTI_C
bti c ;
#define SYM_FUNC_START_WEAK_NOALIGN(name) \
SYM_START(name, SYM_L_WEAK, SYM_A_NONE) \
BTI_C
#endif
bti c ;
/*
* Annotate a function as position independent, i.e., safe to be called before

Просмотреть файл

@ -84,10 +84,12 @@ static inline void __dc_gzva(u64 p)
static inline void mte_set_mem_tag_range(void *addr, size_t size, u8 tag,
bool init)
{
u64 curr, mask, dczid_bs, end1, end2, end3;
u64 curr, mask, dczid, dczid_bs, dczid_dzp, end1, end2, end3;
/* Read DC G(Z)VA block size from the system register. */
dczid_bs = 4ul << (read_cpuid(DCZID_EL0) & 0xf);
dczid = read_cpuid(DCZID_EL0);
dczid_bs = 4ul << (dczid & 0xf);
dczid_dzp = (dczid >> 4) & 1;
curr = (u64)__tag_set(addr, tag);
mask = dczid_bs - 1;
@ -106,7 +108,7 @@ static inline void mte_set_mem_tag_range(void *addr, size_t size, u8 tag,
*/
#define SET_MEMTAG_RANGE(stg_post, dc_gva) \
do { \
if (size >= 2 * dczid_bs) { \
if (!dczid_dzp && size >= 2 * dczid_bs) {\
do { \
curr = stg_post(curr); \
} while (curr < end1); \

Просмотреть файл

@ -47,6 +47,10 @@ struct stack_info {
* @prev_type: The type of stack this frame record was on, or a synthetic
* value of STACK_TYPE_UNKNOWN. This is used to detect a
* transition from one stack to another.
*
* @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
* associated with the most recently encountered replacement lr
* value.
*/
struct stackframe {
unsigned long fp;
@ -59,9 +63,6 @@ struct stackframe {
#endif
};
extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
bool (*fn)(void *, unsigned long), void *data);
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl);
@ -146,7 +147,4 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
return false;
}
void start_backtrace(struct stackframe *frame, unsigned long fp,
unsigned long pc);
#endif /* __ASM_STACKTRACE_H */

Просмотреть файл

@ -182,6 +182,7 @@
#define SYS_ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
#define SYS_ID_AA64ISAR1_EL1 sys_reg(3, 0, 0, 6, 1)
#define SYS_ID_AA64ISAR2_EL1 sys_reg(3, 0, 0, 6, 2)
#define SYS_ID_AA64MMFR0_EL1 sys_reg(3, 0, 0, 7, 0)
#define SYS_ID_AA64MMFR1_EL1 sys_reg(3, 0, 0, 7, 1)
@ -771,6 +772,20 @@
#define ID_AA64ISAR1_GPI_NI 0x0
#define ID_AA64ISAR1_GPI_IMP_DEF 0x1
/* id_aa64isar2 */
#define ID_AA64ISAR2_RPRES_SHIFT 4
#define ID_AA64ISAR2_WFXT_SHIFT 0
#define ID_AA64ISAR2_RPRES_8BIT 0x0
#define ID_AA64ISAR2_RPRES_12BIT 0x1
/*
* Value 0x1 has been removed from the architecture, and is
* reserved, but has not yet been removed from the ARM ARM
* as of ARM DDI 0487G.b.
*/
#define ID_AA64ISAR2_WFXT_NI 0x0
#define ID_AA64ISAR2_WFXT_SUPPORTED 0x2
/* id_aa64pfr0 */
#define ID_AA64PFR0_CSV3_SHIFT 60
#define ID_AA64PFR0_CSV2_SHIFT 56
@ -889,6 +904,7 @@
#endif
/* id_aa64mmfr1 */
#define ID_AA64MMFR1_AFP_SHIFT 44
#define ID_AA64MMFR1_ETS_SHIFT 36
#define ID_AA64MMFR1_TWED_SHIFT 32
#define ID_AA64MMFR1_XNX_SHIFT 28

Просмотреть файл

@ -76,5 +76,7 @@
#define HWCAP2_BTI (1 << 17)
#define HWCAP2_MTE (1 << 18)
#define HWCAP2_ECV (1 << 19)
#define HWCAP2_AFP (1 << 20)
#define HWCAP2_RPRES (1 << 21)
#endif /* _UAPI__ASM_HWCAP_H */

Просмотреть файл

@ -22,6 +22,7 @@
#include <linux/irq_work.h>
#include <linux/memblock.h>
#include <linux/of_fdt.h>
#include <linux/libfdt.h>
#include <linux/smp.h>
#include <linux/serial_core.h>
#include <linux/pgtable.h>
@ -62,29 +63,22 @@ static int __init parse_acpi(char *arg)
}
early_param("acpi", parse_acpi);
static int __init dt_scan_depth1_nodes(unsigned long node,
const char *uname, int depth,
void *data)
static bool __init dt_is_stub(void)
{
/*
* Ignore anything not directly under the root node; we'll
* catch its parent instead.
*/
if (depth != 1)
return 0;
int node;
if (strcmp(uname, "chosen") == 0)
return 0;
fdt_for_each_subnode(node, initial_boot_params, 0) {
const char *name = fdt_get_name(initial_boot_params, node, NULL);
if (strcmp(name, "chosen") == 0)
continue;
if (strcmp(name, "hypervisor") == 0 &&
of_flat_dt_is_compatible(node, "xen,xen"))
continue;
if (strcmp(uname, "hypervisor") == 0 &&
of_flat_dt_is_compatible(node, "xen,xen"))
return 0;
return false;
}
/*
* This node at depth 1 is neither a chosen node nor a xen node,
* which we do not expect.
*/
return 1;
return true;
}
/*
@ -205,8 +199,7 @@ void __init acpi_boot_table_init(void)
* and ACPI has not been [force] enabled (acpi=on|force)
*/
if (param_acpi_off ||
(!param_acpi_on && !param_acpi_force &&
of_scan_flat_dt(dt_scan_depth1_nodes, NULL)))
(!param_acpi_on && !param_acpi_force && !dt_is_stub()))
goto done;
/*

Просмотреть файл

@ -225,6 +225,11 @@ static const struct arm64_ftr_bits ftr_id_aa64isar1[] = {
ARM64_FTR_END,
};
static const struct arm64_ftr_bits ftr_id_aa64isar2[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64ISAR2_RPRES_SHIFT, 4, 0),
ARM64_FTR_END,
};
static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV3_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_CSV2_SHIFT, 4, 0),
@ -325,6 +330,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
};
static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_AFP_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_ETS_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_TWED_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR1_XNX_SHIFT, 4, 0),
@ -637,6 +643,7 @@ static const struct __ftr_reg_entry {
ARM64_FTR_REG(SYS_ID_AA64ISAR0_EL1, ftr_id_aa64isar0),
ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1,
&id_aa64isar1_override),
ARM64_FTR_REG(SYS_ID_AA64ISAR2_EL1, ftr_id_aa64isar2),
/* Op1 = 0, CRn = 0, CRm = 7 */
ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
@ -933,6 +940,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
init_cpu_ftr_reg(SYS_ID_AA64DFR1_EL1, info->reg_id_aa64dfr1);
init_cpu_ftr_reg(SYS_ID_AA64ISAR0_EL1, info->reg_id_aa64isar0);
init_cpu_ftr_reg(SYS_ID_AA64ISAR1_EL1, info->reg_id_aa64isar1);
init_cpu_ftr_reg(SYS_ID_AA64ISAR2_EL1, info->reg_id_aa64isar2);
init_cpu_ftr_reg(SYS_ID_AA64MMFR0_EL1, info->reg_id_aa64mmfr0);
init_cpu_ftr_reg(SYS_ID_AA64MMFR1_EL1, info->reg_id_aa64mmfr1);
init_cpu_ftr_reg(SYS_ID_AA64MMFR2_EL1, info->reg_id_aa64mmfr2);
@ -1151,6 +1159,8 @@ void update_cpu_features(int cpu,
info->reg_id_aa64isar0, boot->reg_id_aa64isar0);
taint |= check_update_ftr_reg(SYS_ID_AA64ISAR1_EL1, cpu,
info->reg_id_aa64isar1, boot->reg_id_aa64isar1);
taint |= check_update_ftr_reg(SYS_ID_AA64ISAR2_EL1, cpu,
info->reg_id_aa64isar2, boot->reg_id_aa64isar2);
/*
* Differing PARange support is fine as long as all peripherals and
@ -1272,6 +1282,7 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
read_sysreg_case(SYS_ID_AA64MMFR2_EL1);
read_sysreg_case(SYS_ID_AA64ISAR0_EL1);
read_sysreg_case(SYS_ID_AA64ISAR1_EL1);
read_sysreg_case(SYS_ID_AA64ISAR2_EL1);
read_sysreg_case(SYS_CNTFRQ_EL0);
read_sysreg_case(SYS_CTR_EL0);
@ -2476,6 +2487,8 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE),
#endif /* CONFIG_ARM64_MTE */
HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV),
HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP),
HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES),
{},
};

Просмотреть файл

@ -95,6 +95,8 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_BTI] = "bti",
[KERNEL_HWCAP_MTE] = "mte",
[KERNEL_HWCAP_ECV] = "ecv",
[KERNEL_HWCAP_AFP] = "afp",
[KERNEL_HWCAP_RPRES] = "rpres",
};
#ifdef CONFIG_COMPAT
@ -391,6 +393,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
info->reg_id_aa64dfr1 = read_cpuid(ID_AA64DFR1_EL1);
info->reg_id_aa64isar0 = read_cpuid(ID_AA64ISAR0_EL1);
info->reg_id_aa64isar1 = read_cpuid(ID_AA64ISAR1_EL1);
info->reg_id_aa64isar2 = read_cpuid(ID_AA64ISAR2_EL1);
info->reg_id_aa64mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
info->reg_id_aa64mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
info->reg_id_aa64mmfr2 = read_cpuid(ID_AA64MMFR2_EL1);

Просмотреть файл

@ -77,17 +77,13 @@
.endm
SYM_CODE_START(ftrace_regs_caller)
#ifdef BTI_C
BTI_C
#endif
bti c
ftrace_regs_entry 1
b ftrace_common
SYM_CODE_END(ftrace_regs_caller)
SYM_CODE_START(ftrace_caller)
#ifdef BTI_C
BTI_C
#endif
bti c
ftrace_regs_entry 0
b ftrace_common
SYM_CODE_END(ftrace_caller)

Просмотреть файл

@ -966,8 +966,10 @@ SYM_CODE_START(__sdei_asm_handler)
mov sp, x1
mov x1, x0 // address to complete_and_resume
/* x0 = (x0 <= 1) ? EVENT_COMPLETE:EVENT_COMPLETE_AND_RESUME */
cmp x0, #1
/* x0 = (x0 <= SDEI_EV_FAILED) ?
* EVENT_COMPLETE:EVENT_COMPLETE_AND_RESUME
*/
cmp x0, #SDEI_EV_FAILED
mov_q x2, SDEI_1_0_FN_SDEI_EVENT_COMPLETE
mov_q x3, SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME
csel x0, x2, x3, ls

Просмотреть файл

@ -15,6 +15,7 @@
#include <linux/compiler.h>
#include <linux/cpu.h>
#include <linux/cpu_pm.h>
#include <linux/ctype.h>
#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/irqflags.h>
@ -406,12 +407,13 @@ static unsigned int find_supported_vector_length(enum vec_type type,
#if defined(CONFIG_ARM64_SVE) && defined(CONFIG_SYSCTL)
static int sve_proc_do_default_vl(struct ctl_table *table, int write,
static int vec_proc_do_default_vl(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
struct vl_info *info = &vl_info[ARM64_VEC_SVE];
struct vl_info *info = table->extra1;
enum vec_type type = info->type;
int ret;
int vl = get_sve_default_vl();
int vl = get_default_vl(type);
struct ctl_table tmp_table = {
.data = &vl,
.maxlen = sizeof(vl),
@ -428,7 +430,7 @@ static int sve_proc_do_default_vl(struct ctl_table *table, int write,
if (!sve_vl_valid(vl))
return -EINVAL;
set_sve_default_vl(find_supported_vector_length(ARM64_VEC_SVE, vl));
set_default_vl(type, find_supported_vector_length(type, vl));
return 0;
}
@ -436,7 +438,8 @@ static struct ctl_table sve_default_vl_table[] = {
{
.procname = "sve_default_vector_length",
.mode = 0644,
.proc_handler = sve_proc_do_default_vl,
.proc_handler = vec_proc_do_default_vl,
.extra1 = &vl_info[ARM64_VEC_SVE],
},
{ }
};
@ -629,7 +632,7 @@ void sve_sync_from_fpsimd_zeropad(struct task_struct *task)
__fpsimd_to_sve(sst, fst, vq);
}
int sve_set_vector_length(struct task_struct *task,
int vec_set_vector_length(struct task_struct *task, enum vec_type type,
unsigned long vl, unsigned long flags)
{
if (flags & ~(unsigned long)(PR_SVE_VL_INHERIT |
@ -640,33 +643,35 @@ int sve_set_vector_length(struct task_struct *task,
return -EINVAL;
/*
* Clamp to the maximum vector length that VL-agnostic SVE code can
* work with. A flag may be assigned in the future to allow setting
* of larger vector lengths without confusing older software.
* Clamp to the maximum vector length that VL-agnostic code
* can work with. A flag may be assigned in the future to
* allow setting of larger vector lengths without confusing
* older software.
*/
if (vl > SVE_VL_ARCH_MAX)
vl = SVE_VL_ARCH_MAX;
if (vl > VL_ARCH_MAX)
vl = VL_ARCH_MAX;
vl = find_supported_vector_length(ARM64_VEC_SVE, vl);
vl = find_supported_vector_length(type, vl);
if (flags & (PR_SVE_VL_INHERIT |
PR_SVE_SET_VL_ONEXEC))
task_set_sve_vl_onexec(task, vl);
task_set_vl_onexec(task, type, vl);
else
/* Reset VL to system default on next exec: */
task_set_sve_vl_onexec(task, 0);
task_set_vl_onexec(task, type, 0);
/* Only actually set the VL if not deferred: */
if (flags & PR_SVE_SET_VL_ONEXEC)
goto out;
if (vl == task_get_sve_vl(task))
if (vl == task_get_vl(task, type))
goto out;
/*
* To ensure the FPSIMD bits of the SVE vector registers are preserved,
* write any live register state back to task_struct, and convert to a
* non-SVE thread.
* regular FPSIMD thread. Since the vector length can only be changed
* with a syscall we can't be in streaming mode while reconfiguring.
*/
if (task == current) {
get_cpu_fpsimd_context();
@ -687,10 +692,10 @@ int sve_set_vector_length(struct task_struct *task,
*/
sve_free(task);
task_set_sve_vl(task, vl);
task_set_vl(task, type, vl);
out:
update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
update_tsk_thread_flag(task, vec_vl_inherit_flag(type),
flags & PR_SVE_VL_INHERIT);
return 0;
@ -698,20 +703,21 @@ out:
/*
* Encode the current vector length and flags for return.
* This is only required for prctl(): ptrace has separate fields
* This is only required for prctl(): ptrace has separate fields.
* SVE and SME use the same bits for _ONEXEC and _INHERIT.
*
* flags are as for sve_set_vector_length().
* flags are as for vec_set_vector_length().
*/
static int sve_prctl_status(unsigned long flags)
static int vec_prctl_status(enum vec_type type, unsigned long flags)
{
int ret;
if (flags & PR_SVE_SET_VL_ONEXEC)
ret = task_get_sve_vl_onexec(current);
ret = task_get_vl_onexec(current, type);
else
ret = task_get_sve_vl(current);
ret = task_get_vl(current, type);
if (test_thread_flag(TIF_SVE_VL_INHERIT))
if (test_thread_flag(vec_vl_inherit_flag(type)))
ret |= PR_SVE_VL_INHERIT;
return ret;
@ -729,11 +735,11 @@ int sve_set_current_vl(unsigned long arg)
if (!system_supports_sve() || is_compat_task())
return -EINVAL;
ret = sve_set_vector_length(current, vl, flags);
ret = vec_set_vector_length(current, ARM64_VEC_SVE, vl, flags);
if (ret)
return ret;
return sve_prctl_status(flags);
return vec_prctl_status(ARM64_VEC_SVE, flags);
}
/* PR_SVE_GET_VL */
@ -742,7 +748,7 @@ int sve_get_current_vl(void)
if (!system_supports_sve() || is_compat_task())
return -EINVAL;
return sve_prctl_status(0);
return vec_prctl_status(ARM64_VEC_SVE, 0);
}
static void vec_probe_vqs(struct vl_info *info,
@ -1107,7 +1113,7 @@ static void fpsimd_flush_thread_vl(enum vec_type type)
vl = get_default_vl(type);
if (WARN_ON(!sve_vl_valid(vl)))
vl = SVE_VL_MIN;
vl = vl_info[type].min_vl;
supported_vl = find_supported_vector_length(type, vl);
if (WARN_ON(supported_vl != vl))
@ -1213,7 +1219,8 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st, void *sve_state,
/*
* Load the userland FPSIMD state of 'current' from memory, but only if the
* FPSIMD state already held in the registers is /not/ the most recent FPSIMD
* state of 'current'
* state of 'current'. This is called when we are preparing to return to
* userspace to ensure that userspace sees a good register state.
*/
void fpsimd_restore_current_state(void)
{
@ -1244,7 +1251,9 @@ void fpsimd_restore_current_state(void)
/*
* Load an updated userland FPSIMD state for 'current' from memory and set the
* flag that indicates that the FPSIMD register contents are the most recent
* FPSIMD state of 'current'
* FPSIMD state of 'current'. This is used by the signal code to restore the
* register state when returning from a signal handler in FPSIMD only cases,
* any SVE context will be discarded.
*/
void fpsimd_update_current_state(struct user_fpsimd_state const *state)
{

Просмотреть файл

@ -7,10 +7,6 @@
* Ubuntu project, hibernation support for mach-dove
* Copyright (C) 2010 Nokia Corporation (Hiroshi Doyu)
* Copyright (C) 2010 Texas Instruments, Inc. (Teerth Reddy et al.)
* https://lkml.org/lkml/2010/6/18/4
* https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/027422.html
* https://patchwork.kernel.org/patch/96442/
*
* Copyright (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
*/
#define pr_fmt(x) "hibernate: " x

Просмотреть файл

@ -104,13 +104,15 @@ static void *kexec_page_alloc(void *arg)
{
struct kimage *kimage = (struct kimage *)arg;
struct page *page = kimage_alloc_control_pages(kimage, 0);
void *vaddr = NULL;
if (!page)
return NULL;
memset(page_address(page), 0, PAGE_SIZE);
vaddr = page_address(page);
memset(vaddr, 0, PAGE_SIZE);
return page_address(page);
return vaddr;
}
int machine_kexec_post_load(struct kimage *kimage)

Просмотреть файл

@ -5,10 +5,10 @@
* Copyright (C) 2015 ARM Limited
*/
#include <linux/perf_event.h>
#include <linux/stacktrace.h>
#include <linux/uaccess.h>
#include <asm/pointer_auth.h>
#include <asm/stacktrace.h>
struct frame_tail {
struct frame_tail __user *fp;
@ -132,30 +132,21 @@ void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
}
}
/*
* Gets called by walk_stackframe() for every stackframe. This will be called
* whist unwinding the stackframe and is like a subroutine return so we use
* the PC.
*/
static bool callchain_trace(void *data, unsigned long pc)
{
struct perf_callchain_entry_ctx *entry = data;
perf_callchain_store(entry, pc);
return true;
return perf_callchain_store(entry, pc) == 0;
}
void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
struct stackframe frame;
if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
/* We don't support guest os callchain now */
return;
}
start_backtrace(&frame, regs->regs[29], regs->pc);
walk_stackframe(current, &frame, callchain_trace, entry);
arch_stack_walk(callchain_trace, entry, current, regs);
}
unsigned long perf_instruction_pointer(struct pt_regs *regs)

Просмотреть файл

@ -285,15 +285,24 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
PMU_FORMAT_ATTR(event, "config:0-15");
PMU_FORMAT_ATTR(long, "config1:0");
PMU_FORMAT_ATTR(rdpmc, "config1:1");
static int sysctl_perf_user_access __read_mostly;
static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
{
return event->attr.config1 & 0x1;
}
static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
{
return event->attr.config1 & 0x2;
}
static struct attribute *armv8_pmuv3_format_attrs[] = {
&format_attr_event.attr,
&format_attr_long.attr,
&format_attr_rdpmc.attr,
NULL,
};
@ -362,7 +371,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
*/
#define ARMV8_IDX_CYCLE_COUNTER 0
#define ARMV8_IDX_COUNTER0 1
#define ARMV8_IDX_CYCLE_COUNTER_USER 32
/*
* We unconditionally enable ARMv8.5-PMU long event counter support
@ -374,18 +383,22 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
return (cpu_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_5);
}
static inline bool armv8pmu_event_has_user_read(struct perf_event *event)
{
return event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT;
}
/*
* We must chain two programmable counters for 64 bit events,
* except when we have allocated the 64bit cycle counter (for CPU
* cycles event). This must be called only when the event has
* a counter allocated.
* cycles event) or when user space counter access is enabled.
*/
static inline bool armv8pmu_event_is_chained(struct perf_event *event)
{
int idx = event->hw.idx;
struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
return !WARN_ON(idx < 0) &&
return !armv8pmu_event_has_user_read(event) &&
armv8pmu_event_is_64bit(event) &&
!armv8pmu_has_long_event(cpu_pmu) &&
(idx != ARMV8_IDX_CYCLE_COUNTER);
@ -718,6 +731,28 @@ static inline u32 armv8pmu_getreset_flags(void)
return value;
}
static void armv8pmu_disable_user_access(void)
{
write_sysreg(0, pmuserenr_el0);
}
static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
{
int i;
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
/* Clear any unused counters to avoid leaking their contents */
for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
if (i == ARMV8_IDX_CYCLE_COUNTER)
write_sysreg(0, pmccntr_el0);
else
armv8pmu_write_evcntr(i, 0);
}
write_sysreg(0, pmuserenr_el0);
write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
}
static void armv8pmu_enable_event(struct perf_event *event)
{
/*
@ -761,6 +796,14 @@ static void armv8pmu_disable_event(struct perf_event *event)
static void armv8pmu_start(struct arm_pmu *cpu_pmu)
{
struct perf_event_context *task_ctx =
this_cpu_ptr(cpu_pmu->pmu.pmu_cpu_context)->task_ctx;
if (sysctl_perf_user_access && task_ctx && task_ctx->nr_user)
armv8pmu_enable_user_access(cpu_pmu);
else
armv8pmu_disable_user_access();
/* Enable all counters */
armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
}
@ -878,13 +921,16 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
return ARMV8_IDX_CYCLE_COUNTER;
else if (armv8pmu_event_is_64bit(event) &&
armv8pmu_event_want_user_access(event) &&
!armv8pmu_has_long_event(cpu_pmu))
return -EAGAIN;
}
/*
* Otherwise use events counters
*/
if (armv8pmu_event_is_64bit(event) &&
!armv8pmu_has_long_event(cpu_pmu))
if (armv8pmu_event_is_chained(event))
return armv8pmu_get_chain_idx(cpuc, cpu_pmu);
else
return armv8pmu_get_single_idx(cpuc, cpu_pmu);
@ -900,6 +946,22 @@ static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
clear_bit(idx - 1, cpuc->used_mask);
}
static int armv8pmu_user_event_idx(struct perf_event *event)
{
if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event))
return 0;
/*
* We remap the cycle counter index to 32 to
* match the offset applied to the rest of
* the counter indices.
*/
if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
return ARMV8_IDX_CYCLE_COUNTER_USER;
return event->hw.idx;
}
/*
* Add an event filter to a given event.
*/
@ -996,6 +1058,25 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
if (armv8pmu_event_is_64bit(event))
event->hw.flags |= ARMPMU_EVT_64BIT;
/*
* User events must be allocated into a single counter, and so
* must not be chained.
*
* Most 64-bit events require long counter support, but 64-bit
* CPU_CYCLES events can be placed into the dedicated cycle
* counter when this is free.
*/
if (armv8pmu_event_want_user_access(event)) {
if (!(event->attach_state & PERF_ATTACH_TASK))
return -EINVAL;
if (armv8pmu_event_is_64bit(event) &&
(hw_event_id != ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
!armv8pmu_has_long_event(armpmu))
return -EOPNOTSUPP;
event->hw.flags |= PERF_EVENT_FLAG_USER_READ_CNT;
}
/* Only expose micro/arch events supported by this PMU */
if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
&& test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
@ -1104,6 +1185,43 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
return probe.present ? 0 : -ENODEV;
}
static void armv8pmu_disable_user_access_ipi(void *unused)
{
armv8pmu_disable_user_access();
}
static int armv8pmu_proc_user_access_handler(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (ret || !write || sysctl_perf_user_access)
return ret;
on_each_cpu(armv8pmu_disable_user_access_ipi, NULL, 1);
return 0;
}
static struct ctl_table armv8_pmu_sysctl_table[] = {
{
.procname = "perf_user_access",
.data = &sysctl_perf_user_access,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = armv8pmu_proc_user_access_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
{ }
};
static void armv8_pmu_register_sysctl_table(void)
{
static u32 tbl_registered = 0;
if (!cmpxchg_relaxed(&tbl_registered, 0, 1))
register_sysctl("kernel", armv8_pmu_sysctl_table);
}
static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
int (*map_event)(struct perf_event *event),
const struct attribute_group *events,
@ -1127,6 +1245,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
cpu_pmu->filter_match = armv8pmu_filter_match;
cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
cpu_pmu->name = name;
cpu_pmu->map_event = map_event;
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
@ -1136,6 +1256,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_CAPS] = caps ?
caps : &armv8_pmuv3_caps_attr_group;
armv8_pmu_register_sysctl_table();
return 0;
}
@ -1145,17 +1266,32 @@ static int armv8_pmu_init_nogroups(struct arm_pmu *cpu_pmu, char *name,
return armv8_pmu_init(cpu_pmu, name, map_event, NULL, NULL, NULL);
}
static int armv8_pmuv3_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_pmuv3",
armv8_pmuv3_map_event);
#define PMUV3_INIT_SIMPLE(name) \
static int name##_pmu_init(struct arm_pmu *cpu_pmu) \
{ \
return armv8_pmu_init_nogroups(cpu_pmu, #name, armv8_pmuv3_map_event);\
}
static int armv8_a34_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a34",
armv8_pmuv3_map_event);
}
PMUV3_INIT_SIMPLE(armv8_pmuv3)
PMUV3_INIT_SIMPLE(armv8_cortex_a34)
PMUV3_INIT_SIMPLE(armv8_cortex_a55)
PMUV3_INIT_SIMPLE(armv8_cortex_a65)
PMUV3_INIT_SIMPLE(armv8_cortex_a75)
PMUV3_INIT_SIMPLE(armv8_cortex_a76)
PMUV3_INIT_SIMPLE(armv8_cortex_a77)
PMUV3_INIT_SIMPLE(armv8_cortex_a78)
PMUV3_INIT_SIMPLE(armv9_cortex_a510)
PMUV3_INIT_SIMPLE(armv9_cortex_a710)
PMUV3_INIT_SIMPLE(armv8_cortex_x1)
PMUV3_INIT_SIMPLE(armv9_cortex_x2)
PMUV3_INIT_SIMPLE(armv8_neoverse_e1)
PMUV3_INIT_SIMPLE(armv8_neoverse_n1)
PMUV3_INIT_SIMPLE(armv9_neoverse_n2)
PMUV3_INIT_SIMPLE(armv8_neoverse_v1)
PMUV3_INIT_SIMPLE(armv8_nvidia_carmel)
PMUV3_INIT_SIMPLE(armv8_nvidia_denver)
static int armv8_a35_pmu_init(struct arm_pmu *cpu_pmu)
{
@ -1169,24 +1305,12 @@ static int armv8_a53_pmu_init(struct arm_pmu *cpu_pmu)
armv8_a53_map_event);
}
static int armv8_a55_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a55",
armv8_pmuv3_map_event);
}
static int armv8_a57_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a57",
armv8_a57_map_event);
}
static int armv8_a65_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a65",
armv8_pmuv3_map_event);
}
static int armv8_a72_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a72",
@ -1199,42 +1323,6 @@ static int armv8_a73_pmu_init(struct arm_pmu *cpu_pmu)
armv8_a73_map_event);
}
static int armv8_a75_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a75",
armv8_pmuv3_map_event);
}
static int armv8_a76_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a76",
armv8_pmuv3_map_event);
}
static int armv8_a77_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a77",
armv8_pmuv3_map_event);
}
static int armv8_a78_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cortex_a78",
armv8_pmuv3_map_event);
}
static int armv8_e1_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_neoverse_e1",
armv8_pmuv3_map_event);
}
static int armv8_n1_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_neoverse_n1",
armv8_pmuv3_map_event);
}
static int armv8_thunder_pmu_init(struct arm_pmu *cpu_pmu)
{
return armv8_pmu_init_nogroups(cpu_pmu, "armv8_cavium_thunder",
@ -1248,23 +1336,31 @@ static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
}
static const struct of_device_id armv8_pmu_of_device_ids[] = {
{.compatible = "arm,armv8-pmuv3", .data = armv8_pmuv3_init},
{.compatible = "arm,cortex-a34-pmu", .data = armv8_a34_pmu_init},
{.compatible = "arm,armv8-pmuv3", .data = armv8_pmuv3_pmu_init},
{.compatible = "arm,cortex-a34-pmu", .data = armv8_cortex_a34_pmu_init},
{.compatible = "arm,cortex-a35-pmu", .data = armv8_a35_pmu_init},
{.compatible = "arm,cortex-a53-pmu", .data = armv8_a53_pmu_init},
{.compatible = "arm,cortex-a55-pmu", .data = armv8_a55_pmu_init},
{.compatible = "arm,cortex-a55-pmu", .data = armv8_cortex_a55_pmu_init},
{.compatible = "arm,cortex-a57-pmu", .data = armv8_a57_pmu_init},
{.compatible = "arm,cortex-a65-pmu", .data = armv8_a65_pmu_init},
{.compatible = "arm,cortex-a65-pmu", .data = armv8_cortex_a65_pmu_init},
{.compatible = "arm,cortex-a72-pmu", .data = armv8_a72_pmu_init},
{.compatible = "arm,cortex-a73-pmu", .data = armv8_a73_pmu_init},
{.compatible = "arm,cortex-a75-pmu", .data = armv8_a75_pmu_init},
{.compatible = "arm,cortex-a76-pmu", .data = armv8_a76_pmu_init},
{.compatible = "arm,cortex-a77-pmu", .data = armv8_a77_pmu_init},
{.compatible = "arm,cortex-a78-pmu", .data = armv8_a78_pmu_init},
{.compatible = "arm,neoverse-e1-pmu", .data = armv8_e1_pmu_init},
{.compatible = "arm,neoverse-n1-pmu", .data = armv8_n1_pmu_init},
{.compatible = "arm,cortex-a75-pmu", .data = armv8_cortex_a75_pmu_init},
{.compatible = "arm,cortex-a76-pmu", .data = armv8_cortex_a76_pmu_init},
{.compatible = "arm,cortex-a77-pmu", .data = armv8_cortex_a77_pmu_init},
{.compatible = "arm,cortex-a78-pmu", .data = armv8_cortex_a78_pmu_init},
{.compatible = "arm,cortex-a510-pmu", .data = armv9_cortex_a510_pmu_init},
{.compatible = "arm,cortex-a710-pmu", .data = armv9_cortex_a710_pmu_init},
{.compatible = "arm,cortex-x1-pmu", .data = armv8_cortex_x1_pmu_init},
{.compatible = "arm,cortex-x2-pmu", .data = armv9_cortex_x2_pmu_init},
{.compatible = "arm,neoverse-e1-pmu", .data = armv8_neoverse_e1_pmu_init},
{.compatible = "arm,neoverse-n1-pmu", .data = armv8_neoverse_n1_pmu_init},
{.compatible = "arm,neoverse-n2-pmu", .data = armv9_neoverse_n2_pmu_init},
{.compatible = "arm,neoverse-v1-pmu", .data = armv8_neoverse_v1_pmu_init},
{.compatible = "cavium,thunder-pmu", .data = armv8_thunder_pmu_init},
{.compatible = "brcm,vulcan-pmu", .data = armv8_vulcan_pmu_init},
{.compatible = "nvidia,carmel-pmu", .data = armv8_nvidia_carmel_pmu_init},
{.compatible = "nvidia,denver-pmu", .data = armv8_nvidia_denver_pmu_init},
{},
};
@ -1287,7 +1383,7 @@ static int __init armv8_pmu_driver_init(void)
if (acpi_disabled)
return platform_driver_register(&armv8_pmu_driver);
else
return arm_pmu_acpi_probe(armv8_pmuv3_init);
return arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
}
device_initcall(armv8_pmu_driver_init)
@ -1301,6 +1397,14 @@ void arch_perf_update_userpage(struct perf_event *event,
userpg->cap_user_time = 0;
userpg->cap_user_time_zero = 0;
userpg->cap_user_time_short = 0;
userpg->cap_user_rdpmc = armv8pmu_event_has_user_read(event);
if (userpg->cap_user_rdpmc) {
if (event->hw.flags & ARMPMU_EVT_64BIT)
userpg->pmc_width = 64;
else
userpg->pmc_width = 32;
}
do {
rd = sched_clock_read_begin(&seq);

Просмотреть файл

@ -40,6 +40,7 @@
#include <linux/percpu.h>
#include <linux/thread_info.h>
#include <linux/prctl.h>
#include <linux/stacktrace.h>
#include <asm/alternative.h>
#include <asm/compat.h>
@ -439,34 +440,26 @@ static void entry_task_switch(struct task_struct *next)
/*
* ARM erratum 1418040 handling, affecting the 32bit view of CNTVCT.
* Assuming the virtual counter is enabled at the beginning of times:
*
* - disable access when switching from a 64bit task to a 32bit task
* - enable access when switching from a 32bit task to a 64bit task
* Ensure access is disabled when switching to a 32bit task, ensure
* access is enabled when switching to a 64bit task.
*/
static void erratum_1418040_thread_switch(struct task_struct *prev,
struct task_struct *next)
static void erratum_1418040_thread_switch(struct task_struct *next)
{
bool prev32, next32;
u64 val;
if (!IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040))
if (!IS_ENABLED(CONFIG_ARM64_ERRATUM_1418040) ||
!this_cpu_has_cap(ARM64_WORKAROUND_1418040))
return;
prev32 = is_compat_thread(task_thread_info(prev));
next32 = is_compat_thread(task_thread_info(next));
if (prev32 == next32 || !this_cpu_has_cap(ARM64_WORKAROUND_1418040))
return;
val = read_sysreg(cntkctl_el1);
if (!next32)
val |= ARCH_TIMER_USR_VCT_ACCESS_EN;
if (is_compat_thread(task_thread_info(next)))
sysreg_clear_set(cntkctl_el1, ARCH_TIMER_USR_VCT_ACCESS_EN, 0);
else
val &= ~ARCH_TIMER_USR_VCT_ACCESS_EN;
sysreg_clear_set(cntkctl_el1, 0, ARCH_TIMER_USR_VCT_ACCESS_EN);
}
write_sysreg(val, cntkctl_el1);
static void erratum_1418040_new_exec(void)
{
preempt_disable();
erratum_1418040_thread_switch(current);
preempt_enable();
}
/*
@ -490,7 +483,8 @@ void update_sctlr_el1(u64 sctlr)
/*
* Thread switching.
*/
__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
__notrace_funcgraph __sched
struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *next)
{
struct task_struct *last;
@ -501,7 +495,7 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
contextidr_thread_switch(next);
entry_task_switch(next);
ssbs_thread_switch(next);
erratum_1418040_thread_switch(prev, next);
erratum_1418040_thread_switch(next);
ptrauth_thread_switch_user(next);
/*
@ -528,30 +522,37 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
return last;
}
struct wchan_info {
unsigned long pc;
int count;
};
static bool get_wchan_cb(void *arg, unsigned long pc)
{
struct wchan_info *wchan_info = arg;
if (!in_sched_functions(pc)) {
wchan_info->pc = pc;
return false;
}
return wchan_info->count++ < 16;
}
unsigned long __get_wchan(struct task_struct *p)
{
struct stackframe frame;
unsigned long stack_page, ret = 0;
int count = 0;
struct wchan_info wchan_info = {
.pc = 0,
.count = 0,
};
stack_page = (unsigned long)try_get_task_stack(p);
if (!stack_page)
if (!try_get_task_stack(p))
return 0;
start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);
do {
if (unwind_frame(p, &frame))
goto out;
if (!in_sched_functions(frame.pc)) {
ret = frame.pc;
goto out;
}
} while (count++ < 16);
out:
put_task_stack(p);
return ret;
return wchan_info.pc;
}
unsigned long arch_align_stack(unsigned long sp)
@ -611,6 +612,7 @@ void arch_setup_new_exec(void)
current->mm->context.flags = mmflags;
ptrauth_thread_init_user();
mte_thread_init_user();
erratum_1418040_new_exec();
if (task_spec_ssb_noexec(current)) {
arch_prctl_spec_ctrl_set(current, PR_SPEC_STORE_BYPASS,

Просмотреть файл

@ -812,9 +812,9 @@ static int sve_set(struct task_struct *target,
/*
* Apart from SVE_PT_REGS_MASK, all SVE_PT_* flags are consumed by
* sve_set_vector_length(), which will also validate them for us:
* vec_set_vector_length(), which will also validate them for us:
*/
ret = sve_set_vector_length(target, header.vl,
ret = vec_set_vector_length(target, ARM64_VEC_SVE, header.vl,
((unsigned long)header.flags & ~SVE_PT_REGS_MASK) << 16);
if (ret)
goto out;

Просмотреть файл

@ -9,9 +9,9 @@
#include <linux/export.h>
#include <linux/ftrace.h>
#include <linux/kprobes.h>
#include <linux/stacktrace.h>
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
struct return_address_data {
unsigned int level;
@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
void *return_address(unsigned int level)
{
struct return_address_data data;
struct stackframe frame;
data.level = level + 2;
data.addr = NULL;
start_backtrace(&frame,
(unsigned long)__builtin_frame_address(0),
(unsigned long)return_address);
walk_stackframe(current, &frame, save_return_addr, &data);
arch_stack_walk(save_return_addr, &data, current, NULL);
if (!data.level)
return data.addr;

Просмотреть файл

@ -189,11 +189,16 @@ static void __init setup_machine_fdt(phys_addr_t dt_phys)
if (!dt_virt || !early_init_dt_scan(dt_virt)) {
pr_crit("\n"
"Error: invalid device tree blob at physical address %pa (virtual address 0x%p)\n"
"Error: invalid device tree blob at physical address %pa (virtual address 0x%px)\n"
"The dtb must be 8-byte aligned and must not exceed 2 MB in size\n"
"\nPlease check your bootloader.",
&dt_phys, dt_virt);
/*
* Note that in this _really_ early stage we cannot even BUG()
* or oops, so the least terrible thing to do is cpu_relax(),
* or else we could end-up printing non-initialized data, etc.
*/
while (true)
cpu_relax();
}
@ -232,12 +237,14 @@ static void __init request_standard_resources(void)
if (memblock_is_nomap(region)) {
res->name = "reserved";
res->flags = IORESOURCE_MEM;
res->start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
res->end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
} else {
res->name = "System RAM";
res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
}
res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
request_resource(&iomem_resource, res);

Просмотреть файл

@ -33,8 +33,8 @@
*/
void start_backtrace(struct stackframe *frame, unsigned long fp,
unsigned long pc)
static void start_backtrace(struct stackframe *frame, unsigned long fp,
unsigned long pc)
{
frame->fp = fp;
frame->pc = pc;
@ -63,7 +63,8 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
* records (e.g. a cycle), determined based on the location and fp value of A
* and the location (but not the fp value) of B.
*/
int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
static int notrace unwind_frame(struct task_struct *tsk,
struct stackframe *frame)
{
unsigned long fp = frame->fp;
struct stack_info info;
@ -141,8 +142,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
}
NOKPROBE_SYMBOL(unwind_frame);
void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
bool (*fn)(void *, unsigned long), void *data)
static void notrace walk_stackframe(struct task_struct *tsk,
struct stackframe *frame,
bool (*fn)(void *, unsigned long), void *data)
{
while (1) {
int ret;
@ -156,24 +158,20 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
}
NOKPROBE_SYMBOL(walk_stackframe);
static void dump_backtrace_entry(unsigned long where, const char *loglvl)
static bool dump_backtrace_entry(void *arg, unsigned long where)
{
char *loglvl = arg;
printk("%s %pSb\n", loglvl, (void *)where);
return true;
}
void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl)
{
struct stackframe frame;
int skip = 0;
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
if (regs) {
if (user_mode(regs))
return;
skip = 1;
}
if (regs && user_mode(regs))
return;
if (!tsk)
tsk = current;
@ -181,36 +179,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
if (!try_get_task_stack(tsk))
return;
if (tsk == current) {
start_backtrace(&frame,
(unsigned long)__builtin_frame_address(0),
(unsigned long)dump_backtrace);
} else {
/*
* task blocked in __switch_to
*/
start_backtrace(&frame,
thread_saved_fp(tsk),
thread_saved_pc(tsk));
}
printk("%sCall trace:\n", loglvl);
do {
/* skip until specified stack frame */
if (!skip) {
dump_backtrace_entry(frame.pc, loglvl);
} else if (frame.fp == regs->regs[29]) {
skip = 0;
/*
* Mostly, this is the case where this function is
* called in panic/abort. As exception handler's
* stack frame does not contain the corresponding pc
* at which an exception has taken place, use regs->pc
* instead.
*/
dump_backtrace_entry(regs->pc, loglvl);
}
} while (!unwind_frame(tsk, &frame));
arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
put_task_stack(tsk);
}
@ -221,8 +191,6 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
barrier();
}
#ifdef CONFIG_STACKTRACE
noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
void *cookie, struct task_struct *task,
struct pt_regs *regs)
@ -241,5 +209,3 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
walk_stackframe(task, &frame, consume_entry, cookie);
}
#endif

Просмотреть файл

@ -18,6 +18,7 @@
#include <linux/timex.h>
#include <linux/errno.h>
#include <linux/profile.h>
#include <linux/stacktrace.h>
#include <linux/syscore_ops.h>
#include <linux/timer.h>
#include <linux/irq.h>
@ -29,25 +30,25 @@
#include <clocksource/arm_arch_timer.h>
#include <asm/thread_info.h>
#include <asm/stacktrace.h>
#include <asm/paravirt.h>
static bool profile_pc_cb(void *arg, unsigned long pc)
{
unsigned long *prof_pc = arg;
if (in_lock_functions(pc))
return true;
*prof_pc = pc;
return false;
}
unsigned long profile_pc(struct pt_regs *regs)
{
struct stackframe frame;
unsigned long prof_pc = 0;
if (!in_lock_functions(regs->pc))
return regs->pc;
arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
start_backtrace(&frame, regs->regs[29], regs->pc);
do {
int ret = unwind_frame(NULL, &frame);
if (ret < 0)
return 0;
} while (in_lock_functions(frame.pc));
return frame.pc;
return prof_pc;
}
EXPORT_SYMBOL(profile_pc);

Просмотреть файл

@ -32,6 +32,7 @@ ccflags-y += -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) \
$(CC_FLAGS_LTO)
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
KCOV_INSTRUMENT := n

Просмотреть файл

@ -140,9 +140,12 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu)
return 1;
}
/*
* Guest access to SVE registers should be routed to this handler only
* when the system doesn't support SVE.
*/
static int handle_sve(struct kvm_vcpu *vcpu)
{
/* Until SVE is supported for guests: */
kvm_inject_undefined(vcpu);
return 1;
}

Просмотреть файл

@ -89,6 +89,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) $(CC_FLAGS_CFI)
# cause crashes. Just disable it.
GCOV_PROFILE := n
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n

Просмотреть файл

@ -52,10 +52,10 @@ int kvm_arm_init_sve(void)
* The get_sve_reg()/set_sve_reg() ioctl interface will need
* to be extended with multiple register slice support in
* order to support vector lengths greater than
* SVE_VL_ARCH_MAX:
* VL_ARCH_MAX:
*/
if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
kvm_sve_max_vl = SVE_VL_ARCH_MAX;
if (WARN_ON(kvm_sve_max_vl > VL_ARCH_MAX))
kvm_sve_max_vl = VL_ARCH_MAX;
/*
* Don't even try to make use of vector lengths that
@ -103,7 +103,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
* set_sve_vls(). Double-check here just to be sure:
*/
if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl() ||
vl > SVE_VL_ARCH_MAX))
vl > VL_ARCH_MAX))
return -EIO;
buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL_ACCOUNT);

Просмотреть файл

@ -1525,7 +1525,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
/* CRm=6 */
ID_SANITISED(ID_AA64ISAR0_EL1),
ID_SANITISED(ID_AA64ISAR1_EL1),
ID_UNALLOCATED(6,2),
ID_SANITISED(ID_AA64ISAR2_EL1),
ID_UNALLOCATED(6,3),
ID_UNALLOCATED(6,4),
ID_UNALLOCATED(6,5),

Просмотреть файл

@ -16,6 +16,7 @@
*/
SYM_FUNC_START_PI(clear_page)
mrs x1, dczid_el0
tbnz x1, #4, 2f /* Branch if DC ZVA is prohibited */
and w1, w1, #0xf
mov x2, #4
lsl x1, x2, x1
@ -25,5 +26,14 @@ SYM_FUNC_START_PI(clear_page)
tst x0, #(PAGE_SIZE - 1)
b.ne 1b
ret
2: stnp xzr, xzr, [x0]
stnp xzr, xzr, [x0, #16]
stnp xzr, xzr, [x0, #32]
stnp xzr, xzr, [x0, #48]
add x0, x0, #64
tst x0, #(PAGE_SIZE - 1)
b.ne 2b
ret
SYM_FUNC_END_PI(clear_page)
EXPORT_SYMBOL(clear_page)

Просмотреть файл

@ -38,9 +38,7 @@
* incremented by 256 prior to return).
*/
SYM_CODE_START(__hwasan_tag_mismatch)
#ifdef BTI_C
BTI_C
#endif
bti c
add x29, sp, #232
stp x2, x3, [sp, #8 * 2]
stp x4, x5, [sp, #8 * 4]

Просмотреть файл

@ -43,17 +43,23 @@ SYM_FUNC_END(mte_clear_page_tags)
* x0 - address to the beginning of the page
*/
SYM_FUNC_START(mte_zero_clear_page_tags)
and x0, x0, #(1 << MTE_TAG_SHIFT) - 1 // clear the tag
mrs x1, dczid_el0
tbnz x1, #4, 2f // Branch if DC GZVA is prohibited
and w1, w1, #0xf
mov x2, #4
lsl x1, x2, x1
and x0, x0, #(1 << MTE_TAG_SHIFT) - 1 // clear the tag
1: dc gzva, x0
add x0, x0, x1
tst x0, #(PAGE_SIZE - 1)
b.ne 1b
ret
2: stz2g x0, [x0], #(MTE_GRANULE_SIZE * 2)
tst x0, #(PAGE_SIZE - 1)
b.ne 2b
ret
SYM_FUNC_END(mte_zero_clear_page_tags)
/*

Просмотреть файл

@ -167,7 +167,7 @@ void xor_arm64_neon_5(unsigned long bytes, unsigned long *p1,
} while (--lines > 0);
}
struct xor_block_template const xor_block_inner_neon = {
struct xor_block_template xor_block_inner_neon __ro_after_init = {
.name = "__inner_neon__",
.do_2 = xor_arm64_neon_2,
.do_3 = xor_arm64_neon_3,
@ -176,6 +176,151 @@ struct xor_block_template const xor_block_inner_neon = {
};
EXPORT_SYMBOL(xor_block_inner_neon);
static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
{
uint64x2_t res;
asm(ARM64_ASM_PREAMBLE ".arch_extension sha3\n"
"eor3 %0.16b, %1.16b, %2.16b, %3.16b"
: "=w"(res) : "w"(p), "w"(q), "w"(r));
return res;
}
static void xor_arm64_eor3_3(unsigned long bytes, unsigned long *p1,
unsigned long *p2, unsigned long *p3)
{
uint64_t *dp1 = (uint64_t *)p1;
uint64_t *dp2 = (uint64_t *)p2;
uint64_t *dp3 = (uint64_t *)p3;
register uint64x2_t v0, v1, v2, v3;
long lines = bytes / (sizeof(uint64x2_t) * 4);
do {
/* p1 ^= p2 ^ p3 */
v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
vld1q_u64(dp3 + 0));
v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
vld1q_u64(dp3 + 2));
v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
vld1q_u64(dp3 + 4));
v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
vld1q_u64(dp3 + 6));
/* store */
vst1q_u64(dp1 + 0, v0);
vst1q_u64(dp1 + 2, v1);
vst1q_u64(dp1 + 4, v2);
vst1q_u64(dp1 + 6, v3);
dp1 += 8;
dp2 += 8;
dp3 += 8;
} while (--lines > 0);
}
static void xor_arm64_eor3_4(unsigned long bytes, unsigned long *p1,
unsigned long *p2, unsigned long *p3,
unsigned long *p4)
{
uint64_t *dp1 = (uint64_t *)p1;
uint64_t *dp2 = (uint64_t *)p2;
uint64_t *dp3 = (uint64_t *)p3;
uint64_t *dp4 = (uint64_t *)p4;
register uint64x2_t v0, v1, v2, v3;
long lines = bytes / (sizeof(uint64x2_t) * 4);
do {
/* p1 ^= p2 ^ p3 */
v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
vld1q_u64(dp3 + 0));
v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
vld1q_u64(dp3 + 2));
v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
vld1q_u64(dp3 + 4));
v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
vld1q_u64(dp3 + 6));
/* p1 ^= p4 */
v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));
/* store */
vst1q_u64(dp1 + 0, v0);
vst1q_u64(dp1 + 2, v1);
vst1q_u64(dp1 + 4, v2);
vst1q_u64(dp1 + 6, v3);
dp1 += 8;
dp2 += 8;
dp3 += 8;
dp4 += 8;
} while (--lines > 0);
}
static void xor_arm64_eor3_5(unsigned long bytes, unsigned long *p1,
unsigned long *p2, unsigned long *p3,
unsigned long *p4, unsigned long *p5)
{
uint64_t *dp1 = (uint64_t *)p1;
uint64_t *dp2 = (uint64_t *)p2;
uint64_t *dp3 = (uint64_t *)p3;
uint64_t *dp4 = (uint64_t *)p4;
uint64_t *dp5 = (uint64_t *)p5;
register uint64x2_t v0, v1, v2, v3;
long lines = bytes / (sizeof(uint64x2_t) * 4);
do {
/* p1 ^= p2 ^ p3 */
v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
vld1q_u64(dp3 + 0));
v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
vld1q_u64(dp3 + 2));
v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
vld1q_u64(dp3 + 4));
v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
vld1q_u64(dp3 + 6));
/* p1 ^= p4 ^ p5 */
v0 = eor3(v0, vld1q_u64(dp4 + 0), vld1q_u64(dp5 + 0));
v1 = eor3(v1, vld1q_u64(dp4 + 2), vld1q_u64(dp5 + 2));
v2 = eor3(v2, vld1q_u64(dp4 + 4), vld1q_u64(dp5 + 4));
v3 = eor3(v3, vld1q_u64(dp4 + 6), vld1q_u64(dp5 + 6));
/* store */
vst1q_u64(dp1 + 0, v0);
vst1q_u64(dp1 + 2, v1);
vst1q_u64(dp1 + 4, v2);
vst1q_u64(dp1 + 6, v3);
dp1 += 8;
dp2 += 8;
dp3 += 8;
dp4 += 8;
dp5 += 8;
} while (--lines > 0);
}
static int __init xor_neon_init(void)
{
if (IS_ENABLED(CONFIG_AS_HAS_SHA3) && cpu_have_named_feature(SHA3)) {
xor_block_inner_neon.do_3 = xor_arm64_eor3_3;
xor_block_inner_neon.do_4 = xor_arm64_eor3_4;
xor_block_inner_neon.do_5 = xor_arm64_eor3_5;
}
return 0;
}
module_init(xor_neon_init);
static void __exit xor_neon_exit(void)
{
}
module_exit(xor_neon_exit);
MODULE_AUTHOR("Jackie Liu <liuyun01@kylinos.cn>");
MODULE_DESCRIPTION("ARMv8 XOR Extensions");
MODULE_LICENSE("GPL");

Просмотреть файл

@ -140,15 +140,7 @@ SYM_FUNC_END(dcache_clean_pou)
* - start - kernel start address of region
* - end - kernel end address of region
*/
SYM_FUNC_START_LOCAL(__dma_inv_area)
SYM_FUNC_START_PI(dcache_inval_poc)
/* FALLTHROUGH */
/*
* __dma_inv_area(start, end)
* - start - virtual start address of region
* - end - virtual end address of region
*/
dcache_line_size x2, x3
sub x3, x2, #1
tst x1, x3 // end cache line aligned?
@ -167,7 +159,6 @@ SYM_FUNC_START_PI(dcache_inval_poc)
dsb sy
ret
SYM_FUNC_END_PI(dcache_inval_poc)
SYM_FUNC_END(__dma_inv_area)
/*
* dcache_clean_poc(start, end)
@ -178,19 +169,10 @@ SYM_FUNC_END(__dma_inv_area)
* - start - virtual start address of region
* - end - virtual end address of region
*/
SYM_FUNC_START_LOCAL(__dma_clean_area)
SYM_FUNC_START_PI(dcache_clean_poc)
/* FALLTHROUGH */
/*
* __dma_clean_area(start, end)
* - start - virtual start address of region
* - end - virtual end address of region
*/
dcache_by_line_op cvac, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(dcache_clean_poc)
SYM_FUNC_END(__dma_clean_area)
/*
* dcache_clean_pop(start, end)
@ -232,8 +214,8 @@ SYM_FUNC_END_PI(__dma_flush_area)
SYM_FUNC_START_PI(__dma_map_area)
add x1, x0, x1
cmp w2, #DMA_FROM_DEVICE
b.eq __dma_inv_area
b __dma_clean_area
b.eq __pi_dcache_inval_poc
b __pi_dcache_clean_poc
SYM_FUNC_END_PI(__dma_map_area)
/*
@ -245,6 +227,6 @@ SYM_FUNC_END_PI(__dma_map_area)
SYM_FUNC_START_PI(__dma_unmap_area)
add x1, x0, x1
cmp w2, #DMA_TO_DEVICE
b.ne __dma_inv_area
b.ne __pi_dcache_inval_poc
ret
SYM_FUNC_END_PI(__dma_unmap_area)

Просмотреть файл

@ -35,8 +35,8 @@ static unsigned long *pinned_asid_map;
#define ASID_FIRST_VERSION (1UL << asid_bits)
#define NUM_USER_ASIDS ASID_FIRST_VERSION
#define asid2idx(asid) ((asid) & ~ASID_MASK)
#define idx2asid(idx) asid2idx(idx)
#define ctxid2asid(asid) ((asid) & ~ASID_MASK)
#define asid2ctxid(asid, genid) ((asid) | (genid))
/* Get the ASIDBits supported by the current CPU */
static u32 get_cpu_asid_bits(void)
@ -50,10 +50,10 @@ static u32 get_cpu_asid_bits(void)
pr_warn("CPU%d: Unknown ASID size (%d); assuming 8-bit\n",
smp_processor_id(), fld);
fallthrough;
case 0:
case ID_AA64MMFR0_ASID_8:
asid = 8;
break;
case 2:
case ID_AA64MMFR0_ASID_16:
asid = 16;
}
@ -120,7 +120,7 @@ static void flush_context(void)
*/
if (asid == 0)
asid = per_cpu(reserved_asids, i);
__set_bit(asid2idx(asid), asid_map);
__set_bit(ctxid2asid(asid), asid_map);
per_cpu(reserved_asids, i) = asid;
}
@ -162,7 +162,7 @@ static u64 new_context(struct mm_struct *mm)
u64 generation = atomic64_read(&asid_generation);
if (asid != 0) {
u64 newasid = generation | (asid & ~ASID_MASK);
u64 newasid = asid2ctxid(ctxid2asid(asid), generation);
/*
* If our current ASID was active during a rollover, we
@ -183,7 +183,7 @@ static u64 new_context(struct mm_struct *mm)
* We had a valid ASID in a previous life, so try to re-use
* it if possible.
*/
if (!__test_and_set_bit(asid2idx(asid), asid_map))
if (!__test_and_set_bit(ctxid2asid(asid), asid_map))
return newasid;
}
@ -209,7 +209,7 @@ static u64 new_context(struct mm_struct *mm)
set_asid:
__set_bit(asid, asid_map);
cur_idx = asid;
return idx2asid(asid) | generation;
return asid2ctxid(asid, generation);
}
void check_and_switch_context(struct mm_struct *mm)
@ -300,13 +300,13 @@ unsigned long arm64_mm_context_get(struct mm_struct *mm)
}
nr_pinned_asids++;
__set_bit(asid2idx(asid), pinned_asid_map);
__set_bit(ctxid2asid(asid), pinned_asid_map);
refcount_set(&mm->context.pinned, 1);
out_unlock:
raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
asid &= ~ASID_MASK;
asid = ctxid2asid(asid);
/* Set the equivalent of USER_ASID_BIT */
if (asid && arm64_kernel_unmapped_at_el0())
@ -327,7 +327,7 @@ void arm64_mm_context_put(struct mm_struct *mm)
raw_spin_lock_irqsave(&cpu_asid_lock, flags);
if (refcount_dec_and_test(&mm->context.pinned)) {
__clear_bit(asid2idx(asid), pinned_asid_map);
__clear_bit(ctxid2asid(asid), pinned_asid_map);
nr_pinned_asids--;
}

Просмотреть файл

@ -10,9 +10,6 @@
#include <asm/asm-extable.h>
#include <asm/ptrace.h>
typedef bool (*ex_handler_t)(const struct exception_table_entry *,
struct pt_regs *);
static inline unsigned long
get_ex_fixup(const struct exception_table_entry *ex)
{

Просмотреть файл

@ -297,6 +297,8 @@ static void die_kernel_fault(const char *msg, unsigned long addr,
pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
addr);
kasan_non_canonical_hook(addr);
mem_abort_decode(esr);
show_pte(addr);
@ -813,11 +815,8 @@ void do_mem_abort(unsigned long far, unsigned int esr, struct pt_regs *regs)
if (!inf->fn(far, esr, regs))
return;
if (!user_mode(regs)) {
pr_alert("Unhandled fault at 0x%016lx\n", addr);
mem_abort_decode(esr);
show_pte(addr);
}
if (!user_mode(regs))
die_kernel_fault(inf->name, addr, esr, regs);
/*
* At this point we have an unrecognized fault type whose tag bits may

Просмотреть файл

@ -47,7 +47,7 @@ obj-y := cputable.o syscalls.o \
udbg.o misc.o io.o misc_$(BITS).o \
of_platform.o prom_parse.o firmware.o \
hw_breakpoint_constraints.o interrupt.o \
kdebugfs.o
kdebugfs.o stacktrace.o
obj-y += ptrace/
obj-$(CONFIG_PPC64) += setup_64.o \
paca.o nvram_64.o note.o
@ -116,7 +116,6 @@ obj-$(CONFIG_OPTPROBES) += optprobes.o optprobes_head.o
obj-$(CONFIG_KPROBES_ON_FTRACE) += kprobes-ftrace.o
obj-$(CONFIG_UPROBES) += uprobes.o
obj-$(CONFIG_PPC_UDBG_16550) += legacy_serial.o udbg_16550.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_SWIOTLB) += dma-swiotlb.o
obj-$(CONFIG_ARCH_HAS_DMA_SET_MASK) += dma-mask.o

Просмотреть файл

@ -139,12 +139,8 @@ unsigned long __get_wchan(struct task_struct *task)
return pc;
}
#ifdef CONFIG_STACKTRACE
noinline void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
struct task_struct *task, struct pt_regs *regs)
{
walk_stackframe(task, regs, consume_entry, cookie);
}
#endif /* CONFIG_STACKTRACE */

Просмотреть файл

@ -40,7 +40,7 @@ obj-y += sysinfo.o lgr.o os_info.o machine_kexec.o
obj-y += runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
obj-y += entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
obj-y += nospec-branch.o ipl_vmparm.o machine_kexec_reloc.o unwind_bc.o
obj-y += smp.o text_amode31.o
obj-y += smp.o text_amode31.o stacktrace.o
extra-y += head64.o vmlinux.lds
@ -55,7 +55,6 @@ compat-obj-$(CONFIG_AUDIT) += compat_audit.o
obj-$(CONFIG_COMPAT) += compat_linux.o compat_signal.o
obj-$(CONFIG_COMPAT) += $(compat-obj-y)
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_KPROBES) += kprobes_insn_page.o
obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o

Просмотреть файл

@ -2476,7 +2476,7 @@ static int x86_pmu_event_init(struct perf_event *event)
if (READ_ONCE(x86_pmu.attr_rdpmc) &&
!(event->hw.flags & PERF_X86_EVENT_LARGE_PEBS))
event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED;
event->hw.flags |= PERF_EVENT_FLAG_USER_READ_CNT;
return err;
}
@ -2510,7 +2510,7 @@ void perf_clear_dirty_counters(void)
static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
{
if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT))
return;
/*
@ -2531,7 +2531,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
{
if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT))
return;
if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
@ -2542,7 +2542,7 @@ static int x86_pmu_event_idx(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
if (!(hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
return 0;
if (is_metric_idx(hwc->idx))
@ -2725,7 +2725,7 @@ void arch_perf_update_userpage(struct perf_event *event,
userpg->cap_user_time = 0;
userpg->cap_user_time_zero = 0;
userpg->cap_user_rdpmc =
!!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED);
!!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
userpg->pmc_width = x86_pmu.cntval_bits;
if (!using_native_sched_clock() || !sched_clock_stable())

Просмотреть файл

@ -74,7 +74,7 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
#define PERF_X86_EVENT_PEBS_NA_HSW 0x0010 /* haswell style datala, unknown */
#define PERF_X86_EVENT_EXCL 0x0020 /* HT exclusivity on counter */
#define PERF_X86_EVENT_DYNAMIC 0x0040 /* dynamic alloc'd constraint */
#define PERF_X86_EVENT_RDPMC_ALLOWED 0x0080 /* grant rdpmc permission */
#define PERF_X86_EVENT_EXCL_ACCT 0x0100 /* accounted EXCL event */
#define PERF_X86_EVENT_AUTO_RELOAD 0x0200 /* use PEBS auto-reload */
#define PERF_X86_EVENT_LARGE_PEBS 0x0400 /* use large PEBS */

Просмотреть файл

@ -84,7 +84,7 @@ obj-$(CONFIG_IA32_EMULATION) += tls.o
obj-y += step.o
obj-$(CONFIG_INTEL_TXT) += tboot.o
obj-$(CONFIG_ISA_DMA_API) += i8237.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += stacktrace.o
obj-y += cpu/
obj-y += acpi/
obj-y += reboot.o

Просмотреть файл

@ -43,7 +43,7 @@ config ARM_CCN
config ARM_CMN
tristate "Arm CMN-600 PMU support"
depends on ARM64 || (COMPILE_TEST && 64BIT)
depends on ARM64 || COMPILE_TEST
help
Support for PMU events monitoring on the Arm CMN-600 Coherent Mesh
Network interconnect.
@ -139,6 +139,13 @@ config ARM_DMC620_PMU
Support for PMU events monitoring on the ARM DMC-620 memory
controller.
config MARVELL_CN10K_TAD_PMU
tristate "Marvell CN10K LLC-TAD PMU"
depends on ARM64 || (COMPILE_TEST && 64BIT)
help
Provides support for Last-Level cache Tag-and-data Units (LLC-TAD)
performance monitors on CN10K family silicons.
source "drivers/perf/hisilicon/Kconfig"
endmenu

Просмотреть файл

@ -14,3 +14,4 @@ obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o

Разница между файлами не показана из-за своего большого размера Загрузить разницу

Просмотреть файл

@ -47,6 +47,7 @@
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/msi.h>
#include <linux/of.h>
#include <linux/perf_event.h>
#include <linux/platform_device.h>
#include <linux/smp.h>
@ -75,6 +76,10 @@
#define SMMU_PMCG_CR 0xE04
#define SMMU_PMCG_CR_ENABLE BIT(0)
#define SMMU_PMCG_IIDR 0xE08
#define SMMU_PMCG_IIDR_PRODUCTID GENMASK(31, 20)
#define SMMU_PMCG_IIDR_VARIANT GENMASK(19, 16)
#define SMMU_PMCG_IIDR_REVISION GENMASK(15, 12)
#define SMMU_PMCG_IIDR_IMPLEMENTER GENMASK(11, 0)
#define SMMU_PMCG_CEID0 0xE20
#define SMMU_PMCG_CEID1 0xE28
#define SMMU_PMCG_IRQ_CTRL 0xE50
@ -83,6 +88,20 @@
#define SMMU_PMCG_IRQ_CFG1 0xE60
#define SMMU_PMCG_IRQ_CFG2 0xE64
/* IMP-DEF ID registers */
#define SMMU_PMCG_PIDR0 0xFE0
#define SMMU_PMCG_PIDR0_PART_0 GENMASK(7, 0)
#define SMMU_PMCG_PIDR1 0xFE4
#define SMMU_PMCG_PIDR1_DES_0 GENMASK(7, 4)
#define SMMU_PMCG_PIDR1_PART_1 GENMASK(3, 0)
#define SMMU_PMCG_PIDR2 0xFE8
#define SMMU_PMCG_PIDR2_REVISION GENMASK(7, 4)
#define SMMU_PMCG_PIDR2_DES_1 GENMASK(2, 0)
#define SMMU_PMCG_PIDR3 0xFEC
#define SMMU_PMCG_PIDR3_REVAND GENMASK(7, 4)
#define SMMU_PMCG_PIDR4 0xFD0
#define SMMU_PMCG_PIDR4_DES_2 GENMASK(3, 0)
/* MSI config fields */
#define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2)
#define MSI_CFG2_MEMATTR_DEVICE_nGnRE 0x1
@ -754,6 +773,41 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu)
dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
}
static bool smmu_pmu_coresight_id_regs(struct smmu_pmu *smmu_pmu)
{
return of_device_is_compatible(smmu_pmu->dev->of_node,
"arm,mmu-600-pmcg");
}
static void smmu_pmu_get_iidr(struct smmu_pmu *smmu_pmu)
{
u32 iidr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_IIDR);
if (!iidr && smmu_pmu_coresight_id_regs(smmu_pmu)) {
u32 pidr0 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR0);
u32 pidr1 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR1);
u32 pidr2 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR2);
u32 pidr3 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR3);
u32 pidr4 = readl(smmu_pmu->reg_base + SMMU_PMCG_PIDR4);
u32 productid = FIELD_GET(SMMU_PMCG_PIDR0_PART_0, pidr0) |
(FIELD_GET(SMMU_PMCG_PIDR1_PART_1, pidr1) << 8);
u32 variant = FIELD_GET(SMMU_PMCG_PIDR2_REVISION, pidr2);
u32 revision = FIELD_GET(SMMU_PMCG_PIDR3_REVAND, pidr3);
u32 implementer =
FIELD_GET(SMMU_PMCG_PIDR1_DES_0, pidr1) |
(FIELD_GET(SMMU_PMCG_PIDR2_DES_1, pidr2) << 4) |
(FIELD_GET(SMMU_PMCG_PIDR4_DES_2, pidr4) << 8);
iidr = FIELD_PREP(SMMU_PMCG_IIDR_PRODUCTID, productid) |
FIELD_PREP(SMMU_PMCG_IIDR_VARIANT, variant) |
FIELD_PREP(SMMU_PMCG_IIDR_REVISION, revision) |
FIELD_PREP(SMMU_PMCG_IIDR_IMPLEMENTER, implementer);
}
smmu_pmu->iidr = iidr;
}
static int smmu_pmu_probe(struct platform_device *pdev)
{
struct smmu_pmu *smmu_pmu;
@ -825,7 +879,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
return err;
}
smmu_pmu->iidr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_IIDR);
smmu_pmu_get_iidr(smmu_pmu);
name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "smmuv3_pmcg_%llx",
(res_0->start) >> SMMU_PMCG_PA_SHIFT);
@ -834,7 +888,8 @@ static int smmu_pmu_probe(struct platform_device *pdev)
return -EINVAL;
}
smmu_pmu_get_acpi_options(smmu_pmu);
if (!dev->of_node)
smmu_pmu_get_acpi_options(smmu_pmu);
/* Pick one CPU to be the preferred one to use */
smmu_pmu->on_cpu = raw_smp_processor_id();
@ -884,9 +939,18 @@ static void smmu_pmu_shutdown(struct platform_device *pdev)
smmu_pmu_disable(&smmu_pmu->pmu);
}
#ifdef CONFIG_OF
static const struct of_device_id smmu_pmu_of_match[] = {
{ .compatible = "arm,smmu-v3-pmcg" },
{}
};
MODULE_DEVICE_TABLE(of, smmu_pmu_of_match);
#endif
static struct platform_driver smmu_pmu_driver = {
.driver = {
.name = "arm-smmu-v3-pmcg",
.of_match_table = of_match_ptr(smmu_pmu_of_match),
.suppress_bind_attrs = true,
},
.probe = smmu_pmu_probe,

Просмотреть файл

@ -5,3 +5,12 @@ config HISI_PMU
help
Support for HiSilicon SoC L3 Cache performance monitor, Hydra Home
Agent performance monitor and DDR Controller performance monitor.
config HISI_PCIE_PMU
tristate "HiSilicon PCIE PERF PMU"
depends on PCI && ARM64
help
Provide support for HiSilicon PCIe performance monitoring unit (PMU)
RCiEP devices.
Adds the PCIe PMU into perf events system for monitoring latency,
bandwidth etc.

Просмотреть файл

@ -2,3 +2,5 @@
obj-$(CONFIG_HISI_PMU) += hisi_uncore_pmu.o hisi_uncore_l3c_pmu.o \
hisi_uncore_hha_pmu.o hisi_uncore_ddrc_pmu.o hisi_uncore_sllc_pmu.o \
hisi_uncore_pa_pmu.o
obj-$(CONFIG_HISI_PCIE_PMU) += hisi_pcie_pmu.o

Просмотреть файл

@ -0,0 +1,948 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* This driver adds support for PCIe PMU RCiEP device. Related
* perf events are bandwidth, latency etc.
*
* Copyright (C) 2021 HiSilicon Limited
* Author: Qi Liu <liuqi115@huawei.com>
*/
#include <linux/bitfield.h>
#include <linux/bitmap.h>
#include <linux/bug.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/perf_event.h>
#define DRV_NAME "hisi_pcie_pmu"
/* Define registers */
#define HISI_PCIE_GLOBAL_CTRL 0x00
#define HISI_PCIE_EVENT_CTRL 0x010
#define HISI_PCIE_CNT 0x090
#define HISI_PCIE_EXT_CNT 0x110
#define HISI_PCIE_INT_STAT 0x150
#define HISI_PCIE_INT_MASK 0x154
#define HISI_PCIE_REG_BDF 0xfe0
#define HISI_PCIE_REG_VERSION 0xfe4
#define HISI_PCIE_REG_INFO 0xfe8
/* Define command in HISI_PCIE_GLOBAL_CTRL */
#define HISI_PCIE_GLOBAL_EN 0x01
#define HISI_PCIE_GLOBAL_NONE 0
/* Define command in HISI_PCIE_EVENT_CTRL */
#define HISI_PCIE_EVENT_EN BIT_ULL(20)
#define HISI_PCIE_RESET_CNT BIT_ULL(22)
#define HISI_PCIE_INIT_SET BIT_ULL(34)
#define HISI_PCIE_THR_EN BIT_ULL(26)
#define HISI_PCIE_TARGET_EN BIT_ULL(32)
#define HISI_PCIE_TRIG_EN BIT_ULL(52)
/* Define offsets in HISI_PCIE_EVENT_CTRL */
#define HISI_PCIE_EVENT_M GENMASK_ULL(15, 0)
#define HISI_PCIE_THR_MODE_M GENMASK_ULL(27, 27)
#define HISI_PCIE_THR_M GENMASK_ULL(31, 28)
#define HISI_PCIE_TARGET_M GENMASK_ULL(52, 36)
#define HISI_PCIE_TRIG_MODE_M GENMASK_ULL(53, 53)
#define HISI_PCIE_TRIG_M GENMASK_ULL(59, 56)
#define HISI_PCIE_MAX_COUNTERS 8
#define HISI_PCIE_REG_STEP 8
#define HISI_PCIE_THR_MAX_VAL 10
#define HISI_PCIE_TRIG_MAX_VAL 10
#define HISI_PCIE_MAX_PERIOD (GENMASK_ULL(63, 0))
#define HISI_PCIE_INIT_VAL BIT_ULL(63)
struct hisi_pcie_pmu {
struct perf_event *hw_events[HISI_PCIE_MAX_COUNTERS];
struct hlist_node node;
struct pci_dev *pdev;
struct pmu pmu;
void __iomem *base;
int irq;
u32 identifier;
/* Minimum and maximum BDF of root ports monitored by PMU */
u16 bdf_min;
u16 bdf_max;
int on_cpu;
};
struct hisi_pcie_reg_pair {
u16 lo;
u16 hi;
};
#define to_pcie_pmu(p) (container_of((p), struct hisi_pcie_pmu, pmu))
#define GET_PCI_DEVFN(bdf) ((bdf) & 0xff)
#define HISI_PCIE_PMU_FILTER_ATTR(_name, _config, _hi, _lo) \
static u64 hisi_pcie_get_##_name(struct perf_event *event) \
{ \
return FIELD_GET(GENMASK(_hi, _lo), event->attr._config); \
} \
HISI_PCIE_PMU_FILTER_ATTR(event, config, 16, 0);
HISI_PCIE_PMU_FILTER_ATTR(thr_len, config1, 3, 0);
HISI_PCIE_PMU_FILTER_ATTR(thr_mode, config1, 4, 4);
HISI_PCIE_PMU_FILTER_ATTR(trig_len, config1, 8, 5);
HISI_PCIE_PMU_FILTER_ATTR(trig_mode, config1, 9, 9);
HISI_PCIE_PMU_FILTER_ATTR(port, config2, 15, 0);
HISI_PCIE_PMU_FILTER_ATTR(bdf, config2, 31, 16);
static ssize_t hisi_pcie_format_sysfs_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct dev_ext_attribute *eattr;
eattr = container_of(attr, struct dev_ext_attribute, attr);
return sysfs_emit(buf, "%s\n", (char *)eattr->var);
}
static ssize_t hisi_pcie_event_sysfs_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct perf_pmu_events_attr *pmu_attr =
container_of(attr, struct perf_pmu_events_attr, attr);
return sysfs_emit(buf, "config=0x%llx\n", pmu_attr->id);
}
#define HISI_PCIE_PMU_FORMAT_ATTR(_name, _format) \
(&((struct dev_ext_attribute[]){ \
{ .attr = __ATTR(_name, 0444, hisi_pcie_format_sysfs_show, \
NULL), \
.var = (void *)_format } \
})[0].attr.attr)
#define HISI_PCIE_PMU_EVENT_ATTR(_name, _id) \
PMU_EVENT_ATTR_ID(_name, hisi_pcie_event_sysfs_show, _id)
static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr, char *buf)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev));
return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->on_cpu));
}
static DEVICE_ATTR_RO(cpumask);
static ssize_t identifier_show(struct device *dev, struct device_attribute *attr, char *buf)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev));
return sysfs_emit(buf, "%#x\n", pcie_pmu->identifier);
}
static DEVICE_ATTR_RO(identifier);
static ssize_t bus_show(struct device *dev, struct device_attribute *attr, char *buf)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev));
return sysfs_emit(buf, "%#04x\n", PCI_BUS_NUM(pcie_pmu->bdf_min));
}
static DEVICE_ATTR_RO(bus);
static struct hisi_pcie_reg_pair
hisi_pcie_parse_reg_value(struct hisi_pcie_pmu *pcie_pmu, u32 reg_off)
{
u32 val = readl_relaxed(pcie_pmu->base + reg_off);
struct hisi_pcie_reg_pair regs = {
.lo = val,
.hi = val >> 16,
};
return regs;
}
/*
* Hardware counter and ext_counter work together for bandwidth, latency, bus
* utilization and buffer occupancy events. For example, RX memory write latency
* events(index = 0x0010), counter counts total delay cycles and ext_counter
* counts RX memory write PCIe packets number.
*
* As we don't want PMU driver to process these two data, "delay cycles" can
* be treated as an independent event(index = 0x0010), "RX memory write packets
* number" as another(index = 0x10010). BIT 16 is used to distinguish and 0-15
* bits are "real" event index, which can be used to set HISI_PCIE_EVENT_CTRL.
*/
#define EXT_COUNTER_IS_USED(idx) ((idx) & BIT(16))
static u32 hisi_pcie_get_real_event(struct perf_event *event)
{
return hisi_pcie_get_event(event) & GENMASK(15, 0);
}
static u32 hisi_pcie_pmu_get_offset(u32 offset, u32 idx)
{
return offset + HISI_PCIE_REG_STEP * idx;
}
static u32 hisi_pcie_pmu_readl(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset,
u32 idx)
{
u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx);
return readl_relaxed(pcie_pmu->base + offset);
}
static void hisi_pcie_pmu_writel(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, u32 idx, u32 val)
{
u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx);
writel_relaxed(val, pcie_pmu->base + offset);
}
static u64 hisi_pcie_pmu_readq(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, u32 idx)
{
u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx);
return readq_relaxed(pcie_pmu->base + offset);
}
static void hisi_pcie_pmu_writeq(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, u32 idx, u64 val)
{
u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx);
writeq_relaxed(val, pcie_pmu->base + offset);
}
static void hisi_pcie_pmu_config_filter(struct perf_event *event)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
u64 reg = HISI_PCIE_INIT_SET;
u64 port, trig_len, thr_len;
/* Config HISI_PCIE_EVENT_CTRL according to event. */
reg |= FIELD_PREP(HISI_PCIE_EVENT_M, hisi_pcie_get_real_event(event));
/* Config HISI_PCIE_EVENT_CTRL according to root port or EP device. */
port = hisi_pcie_get_port(event);
if (port)
reg |= FIELD_PREP(HISI_PCIE_TARGET_M, port);
else
reg |= HISI_PCIE_TARGET_EN |
FIELD_PREP(HISI_PCIE_TARGET_M, hisi_pcie_get_bdf(event));
/* Config HISI_PCIE_EVENT_CTRL according to trigger condition. */
trig_len = hisi_pcie_get_trig_len(event);
if (trig_len) {
reg |= FIELD_PREP(HISI_PCIE_TRIG_M, trig_len);
reg |= FIELD_PREP(HISI_PCIE_TRIG_MODE_M, hisi_pcie_get_trig_mode(event));
reg |= HISI_PCIE_TRIG_EN;
}
/* Config HISI_PCIE_EVENT_CTRL according to threshold condition. */
thr_len = hisi_pcie_get_thr_len(event);
if (thr_len) {
reg |= FIELD_PREP(HISI_PCIE_THR_M, thr_len);
reg |= FIELD_PREP(HISI_PCIE_THR_MODE_M, hisi_pcie_get_thr_mode(event));
reg |= HISI_PCIE_THR_EN;
}
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, hwc->idx, reg);
}
static void hisi_pcie_pmu_clear_filter(struct perf_event *event)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, hwc->idx, HISI_PCIE_INIT_SET);
}
static bool hisi_pcie_pmu_valid_requester_id(struct hisi_pcie_pmu *pcie_pmu, u32 bdf)
{
struct pci_dev *root_port, *pdev;
u16 rp_bdf;
pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pcie_pmu->pdev->bus), PCI_BUS_NUM(bdf),
GET_PCI_DEVFN(bdf));
if (!pdev)
return false;
root_port = pcie_find_root_port(pdev);
if (!root_port) {
pci_dev_put(pdev);
return false;
}
pci_dev_put(pdev);
rp_bdf = pci_dev_id(root_port);
return rp_bdf >= pcie_pmu->bdf_min && rp_bdf <= pcie_pmu->bdf_max;
}
static bool hisi_pcie_pmu_valid_filter(struct perf_event *event,
struct hisi_pcie_pmu *pcie_pmu)
{
u32 requester_id = hisi_pcie_get_bdf(event);
if (hisi_pcie_get_thr_len(event) > HISI_PCIE_THR_MAX_VAL)
return false;
if (hisi_pcie_get_trig_len(event) > HISI_PCIE_TRIG_MAX_VAL)
return false;
if (requester_id) {
if (!hisi_pcie_pmu_valid_requester_id(pcie_pmu, requester_id))
return false;
}
return true;
}
static bool hisi_pcie_pmu_cmp_event(struct perf_event *target,
struct perf_event *event)
{
return hisi_pcie_get_real_event(target) == hisi_pcie_get_real_event(event);
}
static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event)
{
struct perf_event *sibling, *leader = event->group_leader;
struct perf_event *event_group[HISI_PCIE_MAX_COUNTERS];
int counters = 1;
int num;
event_group[0] = leader;
if (!is_software_event(leader)) {
if (leader->pmu != event->pmu)
return false;
if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event))
event_group[counters++] = event;
}
for_each_sibling_event(sibling, event->group_leader) {
if (is_software_event(sibling))
continue;
if (sibling->pmu != event->pmu)
return false;
for (num = 0; num < counters; num++) {
if (hisi_pcie_pmu_cmp_event(event_group[num], sibling))
break;
}
if (num == counters)
event_group[counters++] = sibling;
}
return counters <= HISI_PCIE_MAX_COUNTERS;
}
static int hisi_pcie_pmu_event_init(struct perf_event *event)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
event->cpu = pcie_pmu->on_cpu;
if (EXT_COUNTER_IS_USED(hisi_pcie_get_event(event)))
hwc->event_base = HISI_PCIE_EXT_CNT;
else
hwc->event_base = HISI_PCIE_CNT;
if (event->attr.type != event->pmu->type)
return -ENOENT;
/* Sampling is not supported. */
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
if (!hisi_pcie_pmu_valid_filter(event, pcie_pmu))
return -EINVAL;
if (!hisi_pcie_pmu_validate_event_group(event))
return -EINVAL;
return 0;
}
static u64 hisi_pcie_pmu_read_counter(struct perf_event *event)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
u32 idx = event->hw.idx;
return hisi_pcie_pmu_readq(pcie_pmu, event->hw.event_base, idx);
}
static int hisi_pcie_pmu_find_related_event(struct hisi_pcie_pmu *pcie_pmu,
struct perf_event *event)
{
struct perf_event *sibling;
int idx;
for (idx = 0; idx < HISI_PCIE_MAX_COUNTERS; idx++) {
sibling = pcie_pmu->hw_events[idx];
if (!sibling)
continue;
if (!hisi_pcie_pmu_cmp_event(sibling, event))
continue;
/* Related events must be used in group */
if (sibling->group_leader == event->group_leader)
return idx;
else
return -EINVAL;
}
return idx;
}
static int hisi_pcie_pmu_get_event_idx(struct hisi_pcie_pmu *pcie_pmu)
{
int idx;
for (idx = 0; idx < HISI_PCIE_MAX_COUNTERS; idx++) {
if (!pcie_pmu->hw_events[idx])
return idx;
}
return -EINVAL;
}
static void hisi_pcie_pmu_event_update(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
u64 new_cnt, prev_cnt, delta;
do {
prev_cnt = local64_read(&hwc->prev_count);
new_cnt = hisi_pcie_pmu_read_counter(event);
} while (local64_cmpxchg(&hwc->prev_count, prev_cnt,
new_cnt) != prev_cnt);
delta = (new_cnt - prev_cnt) & HISI_PCIE_MAX_PERIOD;
local64_add(delta, &event->count);
}
static void hisi_pcie_pmu_read(struct perf_event *event)
{
hisi_pcie_pmu_event_update(event);
}
static void hisi_pcie_pmu_set_period(struct perf_event *event)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
local64_set(&hwc->prev_count, HISI_PCIE_INIT_VAL);
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_CNT, idx, HISI_PCIE_INIT_VAL);
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EXT_CNT, idx, HISI_PCIE_INIT_VAL);
}
static void hisi_pcie_pmu_enable_counter(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc)
{
u32 idx = hwc->idx;
u64 val;
val = hisi_pcie_pmu_readq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx);
val |= HISI_PCIE_EVENT_EN;
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, val);
}
static void hisi_pcie_pmu_disable_counter(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc)
{
u32 idx = hwc->idx;
u64 val;
val = hisi_pcie_pmu_readq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx);
val &= ~HISI_PCIE_EVENT_EN;
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, val);
}
static void hisi_pcie_pmu_enable_int(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc)
{
u32 idx = hwc->idx;
hisi_pcie_pmu_writel(pcie_pmu, HISI_PCIE_INT_MASK, idx, 0);
}
static void hisi_pcie_pmu_disable_int(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc)
{
u32 idx = hwc->idx;
hisi_pcie_pmu_writel(pcie_pmu, HISI_PCIE_INT_MASK, idx, 1);
}
static void hisi_pcie_pmu_reset_counter(struct hisi_pcie_pmu *pcie_pmu, int idx)
{
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, HISI_PCIE_RESET_CNT);
hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, HISI_PCIE_INIT_SET);
}
static void hisi_pcie_pmu_start(struct perf_event *event, int flags)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
u64 prev_cnt;
if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
return;
WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
hwc->state = 0;
hisi_pcie_pmu_config_filter(event);
hisi_pcie_pmu_enable_counter(pcie_pmu, hwc);
hisi_pcie_pmu_enable_int(pcie_pmu, hwc);
hisi_pcie_pmu_set_period(event);
if (flags & PERF_EF_RELOAD) {
prev_cnt = local64_read(&hwc->prev_count);
hisi_pcie_pmu_writeq(pcie_pmu, hwc->event_base, idx, prev_cnt);
}
perf_event_update_userpage(event);
}
static void hisi_pcie_pmu_stop(struct perf_event *event, int flags)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
hisi_pcie_pmu_event_update(event);
hisi_pcie_pmu_disable_int(pcie_pmu, hwc);
hisi_pcie_pmu_disable_counter(pcie_pmu, hwc);
hisi_pcie_pmu_clear_filter(event);
WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
hwc->state |= PERF_HES_STOPPED;
if (hwc->state & PERF_HES_UPTODATE)
return;
hwc->state |= PERF_HES_UPTODATE;
}
static int hisi_pcie_pmu_add(struct perf_event *event, int flags)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
int idx;
hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
/* Check all working events to find a related event. */
idx = hisi_pcie_pmu_find_related_event(pcie_pmu, event);
if (idx < 0)
return idx;
/* Current event shares an enabled counter with the related event */
if (idx < HISI_PCIE_MAX_COUNTERS) {
hwc->idx = idx;
goto start_count;
}
idx = hisi_pcie_pmu_get_event_idx(pcie_pmu);
if (idx < 0)
return idx;
hwc->idx = idx;
pcie_pmu->hw_events[idx] = event;
/* Reset Counter to avoid previous statistic interference. */
hisi_pcie_pmu_reset_counter(pcie_pmu, idx);
start_count:
if (flags & PERF_EF_START)
hisi_pcie_pmu_start(event, PERF_EF_RELOAD);
return 0;
}
static void hisi_pcie_pmu_del(struct perf_event *event, int flags)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
hisi_pcie_pmu_stop(event, PERF_EF_UPDATE);
pcie_pmu->hw_events[hwc->idx] = NULL;
perf_event_update_userpage(event);
}
static void hisi_pcie_pmu_enable(struct pmu *pmu)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(pmu);
int num;
for (num = 0; num < HISI_PCIE_MAX_COUNTERS; num++) {
if (pcie_pmu->hw_events[num])
break;
}
if (num == HISI_PCIE_MAX_COUNTERS)
return;
writel(HISI_PCIE_GLOBAL_EN, pcie_pmu->base + HISI_PCIE_GLOBAL_CTRL);
}
static void hisi_pcie_pmu_disable(struct pmu *pmu)
{
struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(pmu);
writel(HISI_PCIE_GLOBAL_NONE, pcie_pmu->base + HISI_PCIE_GLOBAL_CTRL);
}
static irqreturn_t hisi_pcie_pmu_irq(int irq, void *data)
{
struct hisi_pcie_pmu *pcie_pmu = data;
irqreturn_t ret = IRQ_NONE;
struct perf_event *event;
u32 overflown;
int idx;
for (idx = 0; idx < HISI_PCIE_MAX_COUNTERS; idx++) {
overflown = hisi_pcie_pmu_readl(pcie_pmu, HISI_PCIE_INT_STAT, idx);
if (!overflown)
continue;
/* Clear status of interrupt. */
hisi_pcie_pmu_writel(pcie_pmu, HISI_PCIE_INT_STAT, idx, 1);
event = pcie_pmu->hw_events[idx];
if (!event)
continue;
hisi_pcie_pmu_event_update(event);
hisi_pcie_pmu_set_period(event);
ret = IRQ_HANDLED;
}
return ret;
}
static int hisi_pcie_pmu_irq_register(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu)
{
int irq, ret;
ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
if (ret < 0) {
pci_err(pdev, "Failed to enable MSI vectors: %d\n", ret);
return ret;
}
irq = pci_irq_vector(pdev, 0);
ret = request_irq(irq, hisi_pcie_pmu_irq, IRQF_NOBALANCING | IRQF_NO_THREAD, DRV_NAME,
pcie_pmu);
if (ret) {
pci_err(pdev, "Failed to register IRQ: %d\n", ret);
pci_free_irq_vectors(pdev);
return ret;
}
pcie_pmu->irq = irq;
return 0;
}
static void hisi_pcie_pmu_irq_unregister(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu)
{
free_irq(pcie_pmu->irq, pcie_pmu);
pci_free_irq_vectors(pdev);
}
static int hisi_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
{
struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node, struct hisi_pcie_pmu, node);
if (pcie_pmu->on_cpu == -1) {
pcie_pmu->on_cpu = cpu;
WARN_ON(irq_set_affinity(pcie_pmu->irq, cpumask_of(cpu)));
}
return 0;
}
static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
{
struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node, struct hisi_pcie_pmu, node);
unsigned int target;
/* Nothing to do if this CPU doesn't own the PMU */
if (pcie_pmu->on_cpu != cpu)
return 0;
pcie_pmu->on_cpu = -1;
/* Choose a new CPU from all online cpus. */
target = cpumask_first(cpu_online_mask);
if (target >= nr_cpu_ids) {
pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
return 0;
}
perf_pmu_migrate_context(&pcie_pmu->pmu, cpu, target);
/* Use this CPU for event counting */
pcie_pmu->on_cpu = target;
WARN_ON(irq_set_affinity(pcie_pmu->irq, cpumask_of(target)));
return 0;
}
static struct attribute *hisi_pcie_pmu_events_attr[] = {
HISI_PCIE_PMU_EVENT_ATTR(rx_mwr_latency, 0x0010),
HISI_PCIE_PMU_EVENT_ATTR(rx_mwr_cnt, 0x10010),
HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_latency, 0x0210),
HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_cnt, 0x10210),
HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_latency, 0x0011),
HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_cnt, 0x10011),
HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_flux, 0x1005),
HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_time, 0x11005),
HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_flux, 0x2004),
HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_time, 0x12004),
NULL
};
static struct attribute_group hisi_pcie_pmu_events_group = {
.name = "events",
.attrs = hisi_pcie_pmu_events_attr,
};
static struct attribute *hisi_pcie_pmu_format_attr[] = {
HISI_PCIE_PMU_FORMAT_ATTR(event, "config:0-16"),
HISI_PCIE_PMU_FORMAT_ATTR(thr_len, "config1:0-3"),
HISI_PCIE_PMU_FORMAT_ATTR(thr_mode, "config1:4"),
HISI_PCIE_PMU_FORMAT_ATTR(trig_len, "config1:5-8"),
HISI_PCIE_PMU_FORMAT_ATTR(trig_mode, "config1:9"),
HISI_PCIE_PMU_FORMAT_ATTR(port, "config2:0-15"),
HISI_PCIE_PMU_FORMAT_ATTR(bdf, "config2:16-31"),
NULL
};
static const struct attribute_group hisi_pcie_pmu_format_group = {
.name = "format",
.attrs = hisi_pcie_pmu_format_attr,
};
static struct attribute *hisi_pcie_pmu_bus_attrs[] = {
&dev_attr_bus.attr,
NULL
};
static const struct attribute_group hisi_pcie_pmu_bus_attr_group = {
.attrs = hisi_pcie_pmu_bus_attrs,
};
static struct attribute *hisi_pcie_pmu_cpumask_attrs[] = {
&dev_attr_cpumask.attr,
NULL
};
static const struct attribute_group hisi_pcie_pmu_cpumask_attr_group = {
.attrs = hisi_pcie_pmu_cpumask_attrs,
};
static struct attribute *hisi_pcie_pmu_identifier_attrs[] = {
&dev_attr_identifier.attr,
NULL
};
static const struct attribute_group hisi_pcie_pmu_identifier_attr_group = {
.attrs = hisi_pcie_pmu_identifier_attrs,
};
static const struct attribute_group *hisi_pcie_pmu_attr_groups[] = {
&hisi_pcie_pmu_events_group,
&hisi_pcie_pmu_format_group,
&hisi_pcie_pmu_bus_attr_group,
&hisi_pcie_pmu_cpumask_attr_group,
&hisi_pcie_pmu_identifier_attr_group,
NULL
};
static int hisi_pcie_alloc_pmu(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu)
{
struct hisi_pcie_reg_pair regs;
u16 sicl_id, core_id;
char *name;
regs = hisi_pcie_parse_reg_value(pcie_pmu, HISI_PCIE_REG_BDF);
pcie_pmu->bdf_min = regs.lo;
pcie_pmu->bdf_max = regs.hi;
regs = hisi_pcie_parse_reg_value(pcie_pmu, HISI_PCIE_REG_INFO);
sicl_id = regs.hi;
core_id = regs.lo;
name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "hisi_pcie%u_core%u", sicl_id, core_id);
if (!name)
return -ENOMEM;
pcie_pmu->pdev = pdev;
pcie_pmu->on_cpu = -1;
pcie_pmu->identifier = readl(pcie_pmu->base + HISI_PCIE_REG_VERSION);
pcie_pmu->pmu = (struct pmu) {
.name = name,
.module = THIS_MODULE,
.event_init = hisi_pcie_pmu_event_init,
.pmu_enable = hisi_pcie_pmu_enable,
.pmu_disable = hisi_pcie_pmu_disable,
.add = hisi_pcie_pmu_add,
.del = hisi_pcie_pmu_del,
.start = hisi_pcie_pmu_start,
.stop = hisi_pcie_pmu_stop,
.read = hisi_pcie_pmu_read,
.task_ctx_nr = perf_invalid_context,
.attr_groups = hisi_pcie_pmu_attr_groups,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
return 0;
}
static int hisi_pcie_init_pmu(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu)
{
int ret;
pcie_pmu->base = pci_ioremap_bar(pdev, 2);
if (!pcie_pmu->base) {
pci_err(pdev, "Ioremap failed for pcie_pmu resource\n");
return -ENOMEM;
}
ret = hisi_pcie_alloc_pmu(pdev, pcie_pmu);
if (ret)
goto err_iounmap;
ret = hisi_pcie_pmu_irq_register(pdev, pcie_pmu);
if (ret)
goto err_iounmap;
ret = cpuhp_state_add_instance(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node);
if (ret) {
pci_err(pdev, "Failed to register hotplug: %d\n", ret);
goto err_irq_unregister;
}
ret = perf_pmu_register(&pcie_pmu->pmu, pcie_pmu->pmu.name, -1);
if (ret) {
pci_err(pdev, "Failed to register PCIe PMU: %d\n", ret);
goto err_hotplug_unregister;
}
return ret;
err_hotplug_unregister:
cpuhp_state_remove_instance_nocalls(
CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node);
err_irq_unregister:
hisi_pcie_pmu_irq_unregister(pdev, pcie_pmu);
err_iounmap:
iounmap(pcie_pmu->base);
return ret;
}
static void hisi_pcie_uninit_pmu(struct pci_dev *pdev)
{
struct hisi_pcie_pmu *pcie_pmu = pci_get_drvdata(pdev);
perf_pmu_unregister(&pcie_pmu->pmu);
cpuhp_state_remove_instance_nocalls(
CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node);
hisi_pcie_pmu_irq_unregister(pdev, pcie_pmu);
iounmap(pcie_pmu->base);
}
static int hisi_pcie_init_dev(struct pci_dev *pdev)
{
int ret;
ret = pcim_enable_device(pdev);
if (ret) {
pci_err(pdev, "Failed to enable PCI device: %d\n", ret);
return ret;
}
ret = pcim_iomap_regions(pdev, BIT(2), DRV_NAME);
if (ret < 0) {
pci_err(pdev, "Failed to request PCI mem regions: %d\n", ret);
return ret;
}
pci_set_master(pdev);
return 0;
}
static int hisi_pcie_pmu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct hisi_pcie_pmu *pcie_pmu;
int ret;
pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
if (!pcie_pmu)
return -ENOMEM;
ret = hisi_pcie_init_dev(pdev);
if (ret)
return ret;
ret = hisi_pcie_init_pmu(pdev, pcie_pmu);
if (ret)
return ret;
pci_set_drvdata(pdev, pcie_pmu);
return ret;
}
static void hisi_pcie_pmu_remove(struct pci_dev *pdev)
{
hisi_pcie_uninit_pmu(pdev);
pci_set_drvdata(pdev, NULL);
}
static const struct pci_device_id hisi_pcie_pmu_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, 0xa12d) },
{ 0, }
};
MODULE_DEVICE_TABLE(pci, hisi_pcie_pmu_ids);
static struct pci_driver hisi_pcie_pmu_driver = {
.name = DRV_NAME,
.id_table = hisi_pcie_pmu_ids,
.probe = hisi_pcie_pmu_probe,
.remove = hisi_pcie_pmu_remove,
};
static int __init hisi_pcie_module_init(void)
{
int ret;
ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE,
"AP_PERF_ARM_HISI_PCIE_PMU_ONLINE",
hisi_pcie_pmu_online_cpu,
hisi_pcie_pmu_offline_cpu);
if (ret) {
pr_err("Failed to setup PCIe PMU hotplug: %d\n", ret);
return ret;
}
ret = pci_register_driver(&hisi_pcie_pmu_driver);
if (ret)
cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE);
return ret;
}
module_init(hisi_pcie_module_init);
static void __exit hisi_pcie_module_exit(void)
{
pci_unregister_driver(&hisi_pcie_pmu_driver);
cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE);
}
module_exit(hisi_pcie_module_exit);
MODULE_DESCRIPTION("HiSilicon PCIe PMU driver");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Qi Liu <liuqi115@huawei.com>");

Просмотреть файл

@ -0,0 +1,429 @@
// SPDX-License-Identifier: GPL-2.0
/* Marvell CN10K LLC-TAD perf driver
*
* Copyright (C) 2021 Marvell
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#define pr_fmt(fmt) "tad_pmu: " fmt
#include <linux/module.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/of_device.h>
#include <linux/cpuhotplug.h>
#include <linux/perf_event.h>
#include <linux/platform_device.h>
#define TAD_PFC_OFFSET 0x0
#define TAD_PFC(counter) (TAD_PFC_OFFSET | (counter << 3))
#define TAD_PRF_OFFSET 0x100
#define TAD_PRF(counter) (TAD_PRF_OFFSET | (counter << 3))
#define TAD_PRF_CNTSEL_MASK 0xFF
#define TAD_MAX_COUNTERS 8
#define to_tad_pmu(p) (container_of(p, struct tad_pmu, pmu))
struct tad_region {
void __iomem *base;
};
struct tad_pmu {
struct pmu pmu;
struct tad_region *regions;
u32 region_cnt;
unsigned int cpu;
struct hlist_node node;
struct perf_event *events[TAD_MAX_COUNTERS];
DECLARE_BITMAP(counters_map, TAD_MAX_COUNTERS);
};
static int tad_pmu_cpuhp_state;
static void tad_pmu_event_counter_read(struct perf_event *event)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
u32 counter_idx = hwc->idx;
u64 prev, new;
int i;
do {
prev = local64_read(&hwc->prev_count);
for (i = 0, new = 0; i < tad_pmu->region_cnt; i++)
new += readq(tad_pmu->regions[i].base +
TAD_PFC(counter_idx));
} while (local64_cmpxchg(&hwc->prev_count, prev, new) != prev);
local64_add(new - prev, &event->count);
}
static void tad_pmu_event_counter_stop(struct perf_event *event, int flags)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
u32 counter_idx = hwc->idx;
int i;
/* TAD()_PFC() stop counting on the write
* which sets TAD()_PRF()[CNTSEL] == 0
*/
for (i = 0; i < tad_pmu->region_cnt; i++) {
writeq_relaxed(0, tad_pmu->regions[i].base +
TAD_PRF(counter_idx));
}
tad_pmu_event_counter_read(event);
hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
}
static void tad_pmu_event_counter_start(struct perf_event *event, int flags)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
u32 event_idx = event->attr.config;
u32 counter_idx = hwc->idx;
u64 reg_val;
int i;
hwc->state = 0;
/* Typically TAD_PFC() are zeroed to start counting */
for (i = 0; i < tad_pmu->region_cnt; i++)
writeq_relaxed(0, tad_pmu->regions[i].base +
TAD_PFC(counter_idx));
/* TAD()_PFC() start counting on the write
* which sets TAD()_PRF()[CNTSEL] != 0
*/
for (i = 0; i < tad_pmu->region_cnt; i++) {
reg_val = readq_relaxed(tad_pmu->regions[i].base +
TAD_PRF(counter_idx));
reg_val |= (event_idx & 0xFF);
writeq_relaxed(reg_val, tad_pmu->regions[i].base +
TAD_PRF(counter_idx));
}
}
static void tad_pmu_event_counter_del(struct perf_event *event, int flags)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
tad_pmu_event_counter_stop(event, flags | PERF_EF_UPDATE);
tad_pmu->events[idx] = NULL;
clear_bit(idx, tad_pmu->counters_map);
}
static int tad_pmu_event_counter_add(struct perf_event *event, int flags)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
int idx;
/* Get a free counter for this event */
idx = find_first_zero_bit(tad_pmu->counters_map, TAD_MAX_COUNTERS);
if (idx == TAD_MAX_COUNTERS)
return -EAGAIN;
set_bit(idx, tad_pmu->counters_map);
hwc->idx = idx;
hwc->state = PERF_HES_STOPPED;
tad_pmu->events[idx] = event;
if (flags & PERF_EF_START)
tad_pmu_event_counter_start(event, flags);
return 0;
}
static int tad_pmu_event_init(struct perf_event *event)
{
struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
if (!event->attr.disabled)
return -EINVAL;
if (event->attr.type != event->pmu->type)
return -ENOENT;
if (event->state != PERF_EVENT_STATE_OFF)
return -EINVAL;
event->cpu = tad_pmu->cpu;
event->hw.idx = -1;
event->hw.config_base = event->attr.config;
return 0;
}
static ssize_t tad_pmu_event_show(struct device *dev,
struct device_attribute *attr, char *page)
{
struct perf_pmu_events_attr *pmu_attr;
pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
return sysfs_emit(page, "event=0x%02llx\n", pmu_attr->id);
}
#define TAD_PMU_EVENT_ATTR(name, config) \
PMU_EVENT_ATTR_ID(name, tad_pmu_event_show, config)
static struct attribute *tad_pmu_event_attrs[] = {
TAD_PMU_EVENT_ATTR(tad_none, 0x0),
TAD_PMU_EVENT_ATTR(tad_req_msh_in_any, 0x1),
TAD_PMU_EVENT_ATTR(tad_req_msh_in_mn, 0x2),
TAD_PMU_EVENT_ATTR(tad_req_msh_in_exlmn, 0x3),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_in_any, 0x4),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_in_mn, 0x5),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_in_exlmn, 0x6),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_in_dss, 0x7),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_in_retry_dss, 0x8),
TAD_PMU_EVENT_ATTR(tad_dat_msh_in_any, 0x9),
TAD_PMU_EVENT_ATTR(tad_dat_msh_in_dss, 0xa),
TAD_PMU_EVENT_ATTR(tad_req_msh_out_any, 0xb),
TAD_PMU_EVENT_ATTR(tad_req_msh_out_dss_rd, 0xc),
TAD_PMU_EVENT_ATTR(tad_req_msh_out_dss_wr, 0xd),
TAD_PMU_EVENT_ATTR(tad_req_msh_out_evict, 0xe),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_any, 0xf),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_retry_exlmn, 0x10),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_retry_mn, 0x11),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_exlmn, 0x12),
TAD_PMU_EVENT_ATTR(tad_rsp_msh_out_mn, 0x13),
TAD_PMU_EVENT_ATTR(tad_snp_msh_out_any, 0x14),
TAD_PMU_EVENT_ATTR(tad_snp_msh_out_mn, 0x15),
TAD_PMU_EVENT_ATTR(tad_snp_msh_out_exlmn, 0x16),
TAD_PMU_EVENT_ATTR(tad_dat_msh_out_any, 0x17),
TAD_PMU_EVENT_ATTR(tad_dat_msh_out_fill, 0x18),
TAD_PMU_EVENT_ATTR(tad_dat_msh_out_dss, 0x19),
TAD_PMU_EVENT_ATTR(tad_alloc_dtg, 0x1a),
TAD_PMU_EVENT_ATTR(tad_alloc_ltg, 0x1b),
TAD_PMU_EVENT_ATTR(tad_alloc_any, 0x1c),
TAD_PMU_EVENT_ATTR(tad_hit_dtg, 0x1d),
TAD_PMU_EVENT_ATTR(tad_hit_ltg, 0x1e),
TAD_PMU_EVENT_ATTR(tad_hit_any, 0x1f),
TAD_PMU_EVENT_ATTR(tad_tag_rd, 0x20),
TAD_PMU_EVENT_ATTR(tad_dat_rd, 0x21),
TAD_PMU_EVENT_ATTR(tad_dat_rd_byp, 0x22),
TAD_PMU_EVENT_ATTR(tad_ifb_occ, 0x23),
TAD_PMU_EVENT_ATTR(tad_req_occ, 0x24),
NULL
};
static const struct attribute_group tad_pmu_events_attr_group = {
.name = "events",
.attrs = tad_pmu_event_attrs,
};
PMU_FORMAT_ATTR(event, "config:0-7");
static struct attribute *tad_pmu_format_attrs[] = {
&format_attr_event.attr,
NULL
};
static struct attribute_group tad_pmu_format_attr_group = {
.name = "format",
.attrs = tad_pmu_format_attrs,
};
static ssize_t tad_pmu_cpumask_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct tad_pmu *tad_pmu = to_tad_pmu(dev_get_drvdata(dev));
return cpumap_print_to_pagebuf(true, buf, cpumask_of(tad_pmu->cpu));
}
static DEVICE_ATTR(cpumask, 0444, tad_pmu_cpumask_show, NULL);
static struct attribute *tad_pmu_cpumask_attrs[] = {
&dev_attr_cpumask.attr,
NULL
};
static struct attribute_group tad_pmu_cpumask_attr_group = {
.attrs = tad_pmu_cpumask_attrs,
};
static const struct attribute_group *tad_pmu_attr_groups[] = {
&tad_pmu_events_attr_group,
&tad_pmu_format_attr_group,
&tad_pmu_cpumask_attr_group,
NULL
};
static int tad_pmu_probe(struct platform_device *pdev)
{
struct device_node *node = pdev->dev.of_node;
struct tad_region *regions;
struct tad_pmu *tad_pmu;
struct resource *res;
u32 tad_pmu_page_size;
u32 tad_page_size;
u32 tad_cnt;
int i, ret;
char *name;
tad_pmu = devm_kzalloc(&pdev->dev, sizeof(*tad_pmu), GFP_KERNEL);
if (!tad_pmu)
return -ENOMEM;
platform_set_drvdata(pdev, tad_pmu);
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res) {
dev_err(&pdev->dev, "Mem resource not found\n");
return -ENODEV;
}
ret = of_property_read_u32(node, "marvell,tad-page-size",
&tad_page_size);
if (ret) {
dev_err(&pdev->dev, "Can't find tad-page-size property\n");
return ret;
}
ret = of_property_read_u32(node, "marvell,tad-pmu-page-size",
&tad_pmu_page_size);
if (ret) {
dev_err(&pdev->dev, "Can't find tad-pmu-page-size property\n");
return ret;
}
ret = of_property_read_u32(node, "marvell,tad-cnt", &tad_cnt);
if (ret) {
dev_err(&pdev->dev, "Can't find tad-cnt property\n");
return ret;
}
regions = devm_kcalloc(&pdev->dev, tad_cnt,
sizeof(*regions), GFP_KERNEL);
if (!regions)
return -ENOMEM;
/* ioremap the distributed TAD pmu regions */
for (i = 0; i < tad_cnt && res->start < res->end; i++) {
regions[i].base = devm_ioremap(&pdev->dev,
res->start,
tad_pmu_page_size);
if (!regions[i].base) {
dev_err(&pdev->dev, "TAD%d ioremap fail\n", i);
return -ENOMEM;
}
res->start += tad_page_size;
}
tad_pmu->regions = regions;
tad_pmu->region_cnt = tad_cnt;
tad_pmu->pmu = (struct pmu) {
.module = THIS_MODULE,
.attr_groups = tad_pmu_attr_groups,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE |
PERF_PMU_CAP_NO_INTERRUPT,
.task_ctx_nr = perf_invalid_context,
.event_init = tad_pmu_event_init,
.add = tad_pmu_event_counter_add,
.del = tad_pmu_event_counter_del,
.start = tad_pmu_event_counter_start,
.stop = tad_pmu_event_counter_stop,
.read = tad_pmu_event_counter_read,
};
tad_pmu->cpu = raw_smp_processor_id();
/* Register pmu instance for cpu hotplug */
ret = cpuhp_state_add_instance_nocalls(tad_pmu_cpuhp_state,
&tad_pmu->node);
if (ret) {
dev_err(&pdev->dev, "Error %d registering hotplug\n", ret);
return ret;
}
name = "tad";
ret = perf_pmu_register(&tad_pmu->pmu, name, -1);
if (ret)
cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
&tad_pmu->node);
return ret;
}
static int tad_pmu_remove(struct platform_device *pdev)
{
struct tad_pmu *pmu = platform_get_drvdata(pdev);
cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
&pmu->node);
perf_pmu_unregister(&pmu->pmu);
return 0;
}
static const struct of_device_id tad_pmu_of_match[] = {
{ .compatible = "marvell,cn10k-tad-pmu", },
{},
};
static struct platform_driver tad_pmu_driver = {
.driver = {
.name = "cn10k_tad_pmu",
.of_match_table = of_match_ptr(tad_pmu_of_match),
.suppress_bind_attrs = true,
},
.probe = tad_pmu_probe,
.remove = tad_pmu_remove,
};
static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
{
struct tad_pmu *pmu = hlist_entry_safe(node, struct tad_pmu, node);
unsigned int target;
if (cpu != pmu->cpu)
return 0;
target = cpumask_any_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;
perf_pmu_migrate_context(&pmu->pmu, cpu, target);
pmu->cpu = target;
return 0;
}
static int __init tad_pmu_init(void)
{
int ret;
ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
"perf/cn10k/tadpmu:online",
NULL,
tad_pmu_offline_cpu);
if (ret < 0)
return ret;
tad_pmu_cpuhp_state = ret;
return platform_driver_register(&tad_pmu_driver);
}
static void __exit tad_pmu_exit(void)
{
platform_driver_unregister(&tad_pmu_driver);
cpuhp_remove_multi_state(tad_pmu_cpuhp_state);
}
module_init(tad_pmu_init);
module_exit(tad_pmu_exit);
MODULE_DESCRIPTION("Marvell CN10K LLC-TAD Perf driver");
MODULE_AUTHOR("Bhaskara Budiredla <bbudiredla@marvell.com>");
MODULE_LICENSE("GPL v2");

Просмотреть файл

@ -251,5 +251,16 @@ do { \
#define pmem_wmb() wmb()
#endif
/*
* ioremap_wc() maps I/O memory as memory with write-combining attributes. For
* this kind of memory accesses, the CPU may wait for prior accesses to be
* merged with subsequent ones. In some situation, such wait is bad for the
* performance. io_stop_wc() can be used to prevent the merging of
* write-combining memory accesses before this macro with those after it.
*/
#ifndef io_stop_wc
#define io_stop_wc do { } while (0)
#endif
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */

Просмотреть файл

@ -225,6 +225,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_ARM_HISI_L3_ONLINE,
CPUHP_AP_PERF_ARM_HISI_PA_ONLINE,
CPUHP_AP_PERF_ARM_HISI_SLLC_ONLINE,
CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE,
CPUHP_AP_PERF_ARM_L2X0_ONLINE,
CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,

Просмотреть файл

@ -129,6 +129,15 @@ struct hw_perf_event_extra {
int idx; /* index in shared_regs->regs[] */
};
/**
* hw_perf_event::flag values
*
* PERF_EVENT_FLAG_ARCH bits are reserved for architecture-specific
* usage.
*/
#define PERF_EVENT_FLAG_ARCH 0x0000ffff
#define PERF_EVENT_FLAG_USER_READ_CNT 0x80000000
/**
* struct hw_perf_event - performance event hardware details:
*/
@ -822,6 +831,7 @@ struct perf_event_context {
int nr_events;
int nr_active;
int nr_user;
int is_active;
int nr_stat;
int nr_freq;

Просмотреть файл

@ -8,22 +8,6 @@
struct task_struct;
struct pt_regs;
#ifdef CONFIG_STACKTRACE
void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
int spaces);
int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
unsigned int nr_entries, int spaces);
unsigned int stack_trace_save(unsigned long *store, unsigned int size,
unsigned int skipnr);
unsigned int stack_trace_save_tsk(struct task_struct *task,
unsigned long *store, unsigned int size,
unsigned int skipnr);
unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
unsigned int size, unsigned int skipnr);
unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
unsigned int filter_irq_stacks(unsigned long *entries, unsigned int nr_entries);
/* Internal interfaces. Do not use in generic code */
#ifdef CONFIG_ARCH_STACKWALK
/**
@ -76,8 +60,25 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
const struct pt_regs *regs);
#endif /* CONFIG_ARCH_STACKWALK */
#else /* CONFIG_ARCH_STACKWALK */
#ifdef CONFIG_STACKTRACE
void stack_trace_print(const unsigned long *trace, unsigned int nr_entries,
int spaces);
int stack_trace_snprint(char *buf, size_t size, const unsigned long *entries,
unsigned int nr_entries, int spaces);
unsigned int stack_trace_save(unsigned long *store, unsigned int size,
unsigned int skipnr);
unsigned int stack_trace_save_tsk(struct task_struct *task,
unsigned long *store, unsigned int size,
unsigned int skipnr);
unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
unsigned int size, unsigned int skipnr);
unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
unsigned int filter_irq_stacks(unsigned long *entries, unsigned int nr_entries);
#ifndef CONFIG_ARCH_STACKWALK
/* Internal interfaces. Do not use in generic code */
struct stack_trace {
unsigned int nr_entries, max_entries;
unsigned long *entries;

Просмотреть файл

@ -1808,6 +1808,8 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
list_add_rcu(&event->event_entry, &ctx->event_list);
ctx->nr_events++;
if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
ctx->nr_user++;
if (event->attr.inherit_stat)
ctx->nr_stat++;
@ -1999,6 +2001,8 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
event->attach_state &= ~PERF_ATTACH_CONTEXT;
ctx->nr_events--;
if (event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT)
ctx->nr_user--;
if (event->attr.inherit_stat)
ctx->nr_stat--;

Просмотреть файл

@ -8,6 +8,7 @@ CFLAGS_REMOVE_debugfs.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_report.o = $(CC_FLAGS_FTRACE)
CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
$(call cc-option,-mno-outline-atomics) \
-fno-stack-protector -DDISABLE_BRANCH_PROFILING
obj-y := core.o debugfs.o report.o

Просмотреть файл

@ -4,7 +4,7 @@
ARCH ?= $(shell uname -m 2>/dev/null || echo not)
ifneq (,$(filter $(ARCH),aarch64 arm64))
ARM64_SUBTARGETS ?= tags signal pauth fp mte bti
ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi
else
ARM64_SUBTARGETS :=
endif

1
tools/testing/selftests/arm64/abi/.gitignore поставляемый Normal file
Просмотреть файл

@ -0,0 +1 @@
syscall-abi

Просмотреть файл

@ -0,0 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
# Copyright (C) 2021 ARM Limited
TEST_GEN_PROGS := syscall-abi
include ../../lib.mk
$(OUTPUT)/syscall-abi: syscall-abi.c syscall-abi-asm.S

Просмотреть файл

@ -0,0 +1,240 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (C) 2021 ARM Limited.
//
// Assembly portion of the syscall ABI test
//
// Load values from memory into registers, invoke a syscall and save the
// register values back to memory for later checking. The syscall to be
// invoked is configured in x8 of the input GPR data.
//
// x0: SVE VL, 0 for FP only
//
// GPRs: gpr_in, gpr_out
// FPRs: fpr_in, fpr_out
// Zn: z_in, z_out
// Pn: p_in, p_out
// FFR: ffr_in, ffr_out
.arch_extension sve
.globl do_syscall
do_syscall:
// Store callee saved registers x19-x29 (80 bytes) plus x0 and x1
stp x29, x30, [sp, #-112]!
mov x29, sp
stp x0, x1, [sp, #16]
stp x19, x20, [sp, #32]
stp x21, x22, [sp, #48]
stp x23, x24, [sp, #64]
stp x25, x26, [sp, #80]
stp x27, x28, [sp, #96]
// Load GPRs x8-x28, and save our SP/FP for later comparison
ldr x2, =gpr_in
add x2, x2, #64
ldp x8, x9, [x2], #16
ldp x10, x11, [x2], #16
ldp x12, x13, [x2], #16
ldp x14, x15, [x2], #16
ldp x16, x17, [x2], #16
ldp x18, x19, [x2], #16
ldp x20, x21, [x2], #16
ldp x22, x23, [x2], #16
ldp x24, x25, [x2], #16
ldp x26, x27, [x2], #16
ldr x28, [x2], #8
str x29, [x2], #8 // FP
str x30, [x2], #8 // LR
// Load FPRs if we're not doing SVE
cbnz x0, 1f
ldr x2, =fpr_in
ldp q0, q1, [x2]
ldp q2, q3, [x2, #16 * 2]
ldp q4, q5, [x2, #16 * 4]
ldp q6, q7, [x2, #16 * 6]
ldp q8, q9, [x2, #16 * 8]
ldp q10, q11, [x2, #16 * 10]
ldp q12, q13, [x2, #16 * 12]
ldp q14, q15, [x2, #16 * 14]
ldp q16, q17, [x2, #16 * 16]
ldp q18, q19, [x2, #16 * 18]
ldp q20, q21, [x2, #16 * 20]
ldp q22, q23, [x2, #16 * 22]
ldp q24, q25, [x2, #16 * 24]
ldp q26, q27, [x2, #16 * 26]
ldp q28, q29, [x2, #16 * 28]
ldp q30, q31, [x2, #16 * 30]
1:
// Load the SVE registers if we're doing SVE
cbz x0, 1f
ldr x2, =z_in
ldr z0, [x2, #0, MUL VL]
ldr z1, [x2, #1, MUL VL]
ldr z2, [x2, #2, MUL VL]
ldr z3, [x2, #3, MUL VL]
ldr z4, [x2, #4, MUL VL]
ldr z5, [x2, #5, MUL VL]
ldr z6, [x2, #6, MUL VL]
ldr z7, [x2, #7, MUL VL]
ldr z8, [x2, #8, MUL VL]
ldr z9, [x2, #9, MUL VL]
ldr z10, [x2, #10, MUL VL]
ldr z11, [x2, #11, MUL VL]
ldr z12, [x2, #12, MUL VL]
ldr z13, [x2, #13, MUL VL]
ldr z14, [x2, #14, MUL VL]
ldr z15, [x2, #15, MUL VL]
ldr z16, [x2, #16, MUL VL]
ldr z17, [x2, #17, MUL VL]
ldr z18, [x2, #18, MUL VL]
ldr z19, [x2, #19, MUL VL]
ldr z20, [x2, #20, MUL VL]
ldr z21, [x2, #21, MUL VL]
ldr z22, [x2, #22, MUL VL]
ldr z23, [x2, #23, MUL VL]
ldr z24, [x2, #24, MUL VL]
ldr z25, [x2, #25, MUL VL]
ldr z26, [x2, #26, MUL VL]
ldr z27, [x2, #27, MUL VL]
ldr z28, [x2, #28, MUL VL]
ldr z29, [x2, #29, MUL VL]
ldr z30, [x2, #30, MUL VL]
ldr z31, [x2, #31, MUL VL]
ldr x2, =ffr_in
ldr p0, [x2, #0]
wrffr p0.b
ldr x2, =p_in
ldr p0, [x2, #0, MUL VL]
ldr p1, [x2, #1, MUL VL]
ldr p2, [x2, #2, MUL VL]
ldr p3, [x2, #3, MUL VL]
ldr p4, [x2, #4, MUL VL]
ldr p5, [x2, #5, MUL VL]
ldr p6, [x2, #6, MUL VL]
ldr p7, [x2, #7, MUL VL]
ldr p8, [x2, #8, MUL VL]
ldr p9, [x2, #9, MUL VL]
ldr p10, [x2, #10, MUL VL]
ldr p11, [x2, #11, MUL VL]
ldr p12, [x2, #12, MUL VL]
ldr p13, [x2, #13, MUL VL]
ldr p14, [x2, #14, MUL VL]
ldr p15, [x2, #15, MUL VL]
1:
// Do the syscall
svc #0
// Save GPRs x8-x30
ldr x2, =gpr_out
add x2, x2, #64
stp x8, x9, [x2], #16
stp x10, x11, [x2], #16
stp x12, x13, [x2], #16
stp x14, x15, [x2], #16
stp x16, x17, [x2], #16
stp x18, x19, [x2], #16
stp x20, x21, [x2], #16
stp x22, x23, [x2], #16
stp x24, x25, [x2], #16
stp x26, x27, [x2], #16
stp x28, x29, [x2], #16
str x30, [x2]
// Restore x0 and x1 for feature checks
ldp x0, x1, [sp, #16]
// Save FPSIMD state
ldr x2, =fpr_out
stp q0, q1, [x2]
stp q2, q3, [x2, #16 * 2]
stp q4, q5, [x2, #16 * 4]
stp q6, q7, [x2, #16 * 6]
stp q8, q9, [x2, #16 * 8]
stp q10, q11, [x2, #16 * 10]
stp q12, q13, [x2, #16 * 12]
stp q14, q15, [x2, #16 * 14]
stp q16, q17, [x2, #16 * 16]
stp q18, q19, [x2, #16 * 18]
stp q20, q21, [x2, #16 * 20]
stp q22, q23, [x2, #16 * 22]
stp q24, q25, [x2, #16 * 24]
stp q26, q27, [x2, #16 * 26]
stp q28, q29, [x2, #16 * 28]
stp q30, q31, [x2, #16 * 30]
// Save the SVE state if we have some
cbz x0, 1f
ldr x2, =z_out
str z0, [x2, #0, MUL VL]
str z1, [x2, #1, MUL VL]
str z2, [x2, #2, MUL VL]
str z3, [x2, #3, MUL VL]
str z4, [x2, #4, MUL VL]
str z5, [x2, #5, MUL VL]
str z6, [x2, #6, MUL VL]
str z7, [x2, #7, MUL VL]
str z8, [x2, #8, MUL VL]
str z9, [x2, #9, MUL VL]
str z10, [x2, #10, MUL VL]
str z11, [x2, #11, MUL VL]
str z12, [x2, #12, MUL VL]
str z13, [x2, #13, MUL VL]
str z14, [x2, #14, MUL VL]
str z15, [x2, #15, MUL VL]
str z16, [x2, #16, MUL VL]
str z17, [x2, #17, MUL VL]
str z18, [x2, #18, MUL VL]
str z19, [x2, #19, MUL VL]
str z20, [x2, #20, MUL VL]
str z21, [x2, #21, MUL VL]
str z22, [x2, #22, MUL VL]
str z23, [x2, #23, MUL VL]
str z24, [x2, #24, MUL VL]
str z25, [x2, #25, MUL VL]
str z26, [x2, #26, MUL VL]
str z27, [x2, #27, MUL VL]
str z28, [x2, #28, MUL VL]
str z29, [x2, #29, MUL VL]
str z30, [x2, #30, MUL VL]
str z31, [x2, #31, MUL VL]
ldr x2, =p_out
str p0, [x2, #0, MUL VL]
str p1, [x2, #1, MUL VL]
str p2, [x2, #2, MUL VL]
str p3, [x2, #3, MUL VL]
str p4, [x2, #4, MUL VL]
str p5, [x2, #5, MUL VL]
str p6, [x2, #6, MUL VL]
str p7, [x2, #7, MUL VL]
str p8, [x2, #8, MUL VL]
str p9, [x2, #9, MUL VL]
str p10, [x2, #10, MUL VL]
str p11, [x2, #11, MUL VL]
str p12, [x2, #12, MUL VL]
str p13, [x2, #13, MUL VL]
str p14, [x2, #14, MUL VL]
str p15, [x2, #15, MUL VL]
ldr x2, =ffr_out
rdffr p0.b
str p0, [x2, #0]
1:
// Restore callee saved registers x19-x30
ldp x19, x20, [sp, #32]
ldp x21, x22, [sp, #48]
ldp x23, x24, [sp, #64]
ldp x25, x26, [sp, #80]
ldp x27, x28, [sp, #96]
ldp x29, x30, [sp], #112
ret

Просмотреть файл

@ -0,0 +1,318 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2021 ARM Limited.
*/
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/auxv.h>
#include <sys/prctl.h>
#include <asm/hwcap.h>
#include <asm/sigcontext.h>
#include <asm/unistd.h>
#include "../../kselftest.h"
#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
#define NUM_VL ((SVE_VQ_MAX - SVE_VQ_MIN) + 1)
extern void do_syscall(int sve_vl);
static void fill_random(void *buf, size_t size)
{
int i;
uint32_t *lbuf = buf;
/* random() returns a 32 bit number regardless of the size of long */
for (i = 0; i < size / sizeof(uint32_t); i++)
lbuf[i] = random();
}
/*
* We also repeat the test for several syscalls to try to expose different
* behaviour.
*/
static struct syscall_cfg {
int syscall_nr;
const char *name;
} syscalls[] = {
{ __NR_getpid, "getpid()" },
{ __NR_sched_yield, "sched_yield()" },
};
#define NUM_GPR 31
uint64_t gpr_in[NUM_GPR];
uint64_t gpr_out[NUM_GPR];
static void setup_gpr(struct syscall_cfg *cfg, int sve_vl)
{
fill_random(gpr_in, sizeof(gpr_in));
gpr_in[8] = cfg->syscall_nr;
memset(gpr_out, 0, sizeof(gpr_out));
}
static int check_gpr(struct syscall_cfg *cfg, int sve_vl)
{
int errors = 0;
int i;
/*
* GPR x0-x7 may be clobbered, and all others should be preserved.
*/
for (i = 9; i < ARRAY_SIZE(gpr_in); i++) {
if (gpr_in[i] != gpr_out[i]) {
ksft_print_msg("%s SVE VL %d mismatch in GPR %d: %llx != %llx\n",
cfg->name, sve_vl, i,
gpr_in[i], gpr_out[i]);
errors++;
}
}
return errors;
}
#define NUM_FPR 32
uint64_t fpr_in[NUM_FPR * 2];
uint64_t fpr_out[NUM_FPR * 2];
static void setup_fpr(struct syscall_cfg *cfg, int sve_vl)
{
fill_random(fpr_in, sizeof(fpr_in));
memset(fpr_out, 0, sizeof(fpr_out));
}
static int check_fpr(struct syscall_cfg *cfg, int sve_vl)
{
int errors = 0;
int i;
if (!sve_vl) {
for (i = 0; i < ARRAY_SIZE(fpr_in); i++) {
if (fpr_in[i] != fpr_out[i]) {
ksft_print_msg("%s Q%d/%d mismatch %llx != %llx\n",
cfg->name,
i / 2, i % 2,
fpr_in[i], fpr_out[i]);
errors++;
}
}
}
return errors;
}
static uint8_t z_zero[__SVE_ZREG_SIZE(SVE_VQ_MAX)];
uint8_t z_in[SVE_NUM_PREGS * __SVE_ZREG_SIZE(SVE_VQ_MAX)];
uint8_t z_out[SVE_NUM_PREGS * __SVE_ZREG_SIZE(SVE_VQ_MAX)];
static void setup_z(struct syscall_cfg *cfg, int sve_vl)
{
fill_random(z_in, sizeof(z_in));
fill_random(z_out, sizeof(z_out));
}
static int check_z(struct syscall_cfg *cfg, int sve_vl)
{
size_t reg_size = sve_vl;
int errors = 0;
int i;
if (!sve_vl)
return 0;
/*
* After a syscall the low 128 bits of the Z registers should
* be preserved and the rest be zeroed or preserved.
*/
for (i = 0; i < SVE_NUM_ZREGS; i++) {
void *in = &z_in[reg_size * i];
void *out = &z_out[reg_size * i];
if (memcmp(in, out, SVE_VQ_BYTES) != 0) {
ksft_print_msg("%s SVE VL %d Z%d low 128 bits changed\n",
cfg->name, sve_vl, i);
errors++;
}
}
return errors;
}
uint8_t p_in[SVE_NUM_PREGS * __SVE_PREG_SIZE(SVE_VQ_MAX)];
uint8_t p_out[SVE_NUM_PREGS * __SVE_PREG_SIZE(SVE_VQ_MAX)];
static void setup_p(struct syscall_cfg *cfg, int sve_vl)
{
fill_random(p_in, sizeof(p_in));
fill_random(p_out, sizeof(p_out));
}
static int check_p(struct syscall_cfg *cfg, int sve_vl)
{
size_t reg_size = sve_vq_from_vl(sve_vl) * 2; /* 1 bit per VL byte */
int errors = 0;
int i;
if (!sve_vl)
return 0;
/* After a syscall the P registers should be preserved or zeroed */
for (i = 0; i < SVE_NUM_PREGS * reg_size; i++)
if (p_out[i] && (p_in[i] != p_out[i]))
errors++;
if (errors)
ksft_print_msg("%s SVE VL %d predicate registers non-zero\n",
cfg->name, sve_vl);
return errors;
}
uint8_t ffr_in[__SVE_PREG_SIZE(SVE_VQ_MAX)];
uint8_t ffr_out[__SVE_PREG_SIZE(SVE_VQ_MAX)];
static void setup_ffr(struct syscall_cfg *cfg, int sve_vl)
{
/*
* It is only valid to set a contiguous set of bits starting
* at 0. For now since we're expecting this to be cleared by
* a syscall just set all bits.
*/
memset(ffr_in, 0xff, sizeof(ffr_in));
fill_random(ffr_out, sizeof(ffr_out));
}
static int check_ffr(struct syscall_cfg *cfg, int sve_vl)
{
size_t reg_size = sve_vq_from_vl(sve_vl) * 2; /* 1 bit per VL byte */
int errors = 0;
int i;
if (!sve_vl)
return 0;
/* After a syscall the P registers should be preserved or zeroed */
for (i = 0; i < reg_size; i++)
if (ffr_out[i] && (ffr_in[i] != ffr_out[i]))
errors++;
if (errors)
ksft_print_msg("%s SVE VL %d FFR non-zero\n",
cfg->name, sve_vl);
return errors;
}
typedef void (*setup_fn)(struct syscall_cfg *cfg, int sve_vl);
typedef int (*check_fn)(struct syscall_cfg *cfg, int sve_vl);
/*
* Each set of registers has a setup function which is called before
* the syscall to fill values in a global variable for loading by the
* test code and a check function which validates that the results are
* as expected. Vector lengths are passed everywhere, a vector length
* of 0 should be treated as do not test.
*/
static struct {
setup_fn setup;
check_fn check;
} regset[] = {
{ setup_gpr, check_gpr },
{ setup_fpr, check_fpr },
{ setup_z, check_z },
{ setup_p, check_p },
{ setup_ffr, check_ffr },
};
static bool do_test(struct syscall_cfg *cfg, int sve_vl)
{
int errors = 0;
int i;
for (i = 0; i < ARRAY_SIZE(regset); i++)
regset[i].setup(cfg, sve_vl);
do_syscall(sve_vl);
for (i = 0; i < ARRAY_SIZE(regset); i++)
errors += regset[i].check(cfg, sve_vl);
return errors == 0;
}
static void test_one_syscall(struct syscall_cfg *cfg)
{
int sve_vq, sve_vl;
/* FPSIMD only case */
ksft_test_result(do_test(cfg, 0),
"%s FPSIMD\n", cfg->name);
if (!(getauxval(AT_HWCAP) & HWCAP_SVE))
return;
for (sve_vq = SVE_VQ_MAX; sve_vq > 0; --sve_vq) {
sve_vl = prctl(PR_SVE_SET_VL, sve_vq * 16);
if (sve_vl == -1)
ksft_exit_fail_msg("PR_SVE_SET_VL failed: %s (%d)\n",
strerror(errno), errno);
sve_vl &= PR_SVE_VL_LEN_MASK;
if (sve_vq != sve_vq_from_vl(sve_vl))
sve_vq = sve_vq_from_vl(sve_vl);
ksft_test_result(do_test(cfg, sve_vl),
"%s SVE VL %d\n", cfg->name, sve_vl);
}
}
int sve_count_vls(void)
{
unsigned int vq;
int vl_count = 0;
int vl;
if (!(getauxval(AT_HWCAP) & HWCAP_SVE))
return 0;
/*
* Enumerate up to SVE_VQ_MAX vector lengths
*/
for (vq = SVE_VQ_MAX; vq > 0; --vq) {
vl = prctl(PR_SVE_SET_VL, vq * 16);
if (vl == -1)
ksft_exit_fail_msg("PR_SVE_SET_VL failed: %s (%d)\n",
strerror(errno), errno);
vl &= PR_SVE_VL_LEN_MASK;
if (vq != sve_vq_from_vl(vl))
vq = sve_vq_from_vl(vl);
vl_count++;
}
return vl_count;
}
int main(void)
{
int i;
srandom(getpid());
ksft_print_header();
ksft_set_plan(ARRAY_SIZE(syscalls) * (sve_count_vls() + 1));
for (i = 0; i < ARRAY_SIZE(syscalls); i++)
test_one_syscall(&syscalls[i]);
ksft_print_cnts();
return 0;
}

1
tools/testing/selftests/arm64/fp/.gitignore поставляемый
Просмотреть файл

@ -1,3 +1,4 @@
fp-pidbench
fpsimd-test
rdvl-sve
sve-probe-vls

Просмотреть файл

@ -2,13 +2,15 @@
CFLAGS += -I../../../../../usr/include/
TEST_GEN_PROGS := sve-ptrace sve-probe-vls vec-syscfg
TEST_PROGS_EXTENDED := fpsimd-test fpsimd-stress \
TEST_PROGS_EXTENDED := fp-pidbench fpsimd-test fpsimd-stress \
rdvl-sve \
sve-test sve-stress \
vlset
all: $(TEST_GEN_PROGS) $(TEST_PROGS_EXTENDED)
fp-pidbench: fp-pidbench.S asm-utils.o
$(CC) -nostdlib $^ -o $@
fpsimd-test: fpsimd-test.o asm-utils.o
$(CC) -nostdlib $^ -o $@
rdvl-sve: rdvl-sve.o rdvl.o

Просмотреть файл

@ -0,0 +1,71 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (C) 2021 ARM Limited.
// Original author: Mark Brown <broonie@kernel.org>
//
// Trivial syscall overhead benchmark.
//
// This is implemented in asm to ensure that we don't have any issues with
// system libraries using instructions that disrupt the test.
#include <asm/unistd.h>
#include "assembler.h"
.arch_extension sve
.macro test_loop per_loop
mov x10, x20
mov x8, #__NR_getpid
mrs x11, CNTVCT_EL0
1:
\per_loop
svc #0
sub x10, x10, #1
cbnz x10, 1b
mrs x12, CNTVCT_EL0
sub x0, x12, x11
bl putdec
puts "\n"
.endm
// Main program entry point
.globl _start
function _start
_start:
puts "Iterations per test: "
mov x20, #10000
lsl x20, x20, #8
mov x0, x20
bl putdec
puts "\n"
// Test having never used SVE
puts "No SVE: "
test_loop
// Check for SVE support - should use hwcap but that's hard in asm
mrs x0, ID_AA64PFR0_EL1
ubfx x0, x0, #32, #4
cbnz x0, 1f
puts "System does not support SVE\n"
b out
1:
// Execute a SVE instruction
puts "SVE VL: "
rdvl x0, #8
bl putdec
puts "\n"
puts "SVE used once: "
test_loop
// Use SVE per syscall
puts "SVE used per syscall: "
test_loop "rdvl x0, #8"
// And we're done
out:
mov x0, #0
mov x8, #__NR_exit
svc #0

Просмотреть файл

@ -21,16 +21,37 @@
#include "../../kselftest.h"
#define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
#define FPSIMD_TESTS 5
#define EXPECTED_TESTS (VL_TESTS + FPSIMD_TESTS)
#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
/* <linux/elf.h> and <sys/auxv.h> don't like each other, so: */
#ifndef NT_ARM_SVE
#define NT_ARM_SVE 0x405
#endif
struct vec_type {
const char *name;
unsigned long hwcap_type;
unsigned long hwcap;
int regset;
int prctl_set;
};
static const struct vec_type vec_types[] = {
{
.name = "SVE",
.hwcap_type = AT_HWCAP,
.hwcap = HWCAP_SVE,
.regset = NT_ARM_SVE,
.prctl_set = PR_SVE_SET_VL,
},
};
#define VL_TESTS (((SVE_VQ_MAX - SVE_VQ_MIN) + 1) * 3)
#define FLAG_TESTS 2
#define FPSIMD_TESTS 3
#define EXPECTED_TESTS ((VL_TESTS + FLAG_TESTS + FPSIMD_TESTS) * ARRAY_SIZE(vec_types))
static void fill_buf(char *buf, size_t size)
{
int i;
@ -59,7 +80,8 @@ static int get_fpsimd(pid_t pid, struct user_fpsimd_state *fpsimd)
return ptrace(PTRACE_GETREGSET, pid, NT_PRFPREG, &iov);
}
static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
static struct user_sve_header *get_sve(pid_t pid, const struct vec_type *type,
void **buf, size_t *size)
{
struct user_sve_header *sve;
void *p;
@ -80,7 +102,7 @@ static struct user_sve_header *get_sve(pid_t pid, void **buf, size_t *size)
iov.iov_base = *buf;
iov.iov_len = sz;
if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov))
if (ptrace(PTRACE_GETREGSET, pid, type->regset, &iov))
goto error;
sve = *buf;
@ -96,17 +118,18 @@ error:
return NULL;
}
static int set_sve(pid_t pid, const struct user_sve_header *sve)
static int set_sve(pid_t pid, const struct vec_type *type,
const struct user_sve_header *sve)
{
struct iovec iov;
iov.iov_base = (void *)sve;
iov.iov_len = sve->size;
return ptrace(PTRACE_SETREGSET, pid, NT_ARM_SVE, &iov);
return ptrace(PTRACE_SETREGSET, pid, type->regset, &iov);
}
/* Validate setting and getting the inherit flag */
static void ptrace_set_get_inherit(pid_t child)
static void ptrace_set_get_inherit(pid_t child, const struct vec_type *type)
{
struct user_sve_header sve;
struct user_sve_header *new_sve = NULL;
@ -118,9 +141,10 @@ static void ptrace_set_get_inherit(pid_t child)
sve.size = sizeof(sve);
sve.vl = sve_vl_from_vq(SVE_VQ_MIN);
sve.flags = SVE_PT_VL_INHERIT;
ret = set_sve(child, &sve);
ret = set_sve(child, type, &sve);
if (ret != 0) {
ksft_test_result_fail("Failed to set SVE_PT_VL_INHERIT\n");
ksft_test_result_fail("Failed to set %s SVE_PT_VL_INHERIT\n",
type->name);
return;
}
@ -128,35 +152,39 @@ static void ptrace_set_get_inherit(pid_t child)
* Read back the new register state and verify that we have
* set the flags we expected.
*/
if (!get_sve(child, (void **)&new_sve, &new_sve_size)) {
ksft_test_result_fail("Failed to read SVE flags\n");
if (!get_sve(child, type, (void **)&new_sve, &new_sve_size)) {
ksft_test_result_fail("Failed to read %s SVE flags\n",
type->name);
return;
}
ksft_test_result(new_sve->flags & SVE_PT_VL_INHERIT,
"SVE_PT_VL_INHERIT set\n");
"%s SVE_PT_VL_INHERIT set\n", type->name);
/* Now clear */
sve.flags &= ~SVE_PT_VL_INHERIT;
ret = set_sve(child, &sve);
ret = set_sve(child, type, &sve);
if (ret != 0) {
ksft_test_result_fail("Failed to clear SVE_PT_VL_INHERIT\n");
ksft_test_result_fail("Failed to clear %s SVE_PT_VL_INHERIT\n",
type->name);
return;
}
if (!get_sve(child, (void **)&new_sve, &new_sve_size)) {
ksft_test_result_fail("Failed to read SVE flags\n");
if (!get_sve(child, type, (void **)&new_sve, &new_sve_size)) {
ksft_test_result_fail("Failed to read %s SVE flags\n",
type->name);
return;
}
ksft_test_result(!(new_sve->flags & SVE_PT_VL_INHERIT),
"SVE_PT_VL_INHERIT cleared\n");
"%s SVE_PT_VL_INHERIT cleared\n", type->name);
free(new_sve);
}
/* Validate attempting to set the specfied VL via ptrace */
static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
static void ptrace_set_get_vl(pid_t child, const struct vec_type *type,
unsigned int vl, bool *supported)
{
struct user_sve_header sve;
struct user_sve_header *new_sve = NULL;
@ -166,10 +194,10 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
*supported = false;
/* Check if the VL is supported in this process */
prctl_vl = prctl(PR_SVE_SET_VL, vl);
prctl_vl = prctl(type->prctl_set, vl);
if (prctl_vl == -1)
ksft_exit_fail_msg("prctl(PR_SVE_SET_VL) failed: %s (%d)\n",
strerror(errno), errno);
ksft_exit_fail_msg("prctl(PR_%s_SET_VL) failed: %s (%d)\n",
type->name, strerror(errno), errno);
/* If the VL is not supported then a supported VL will be returned */
*supported = (prctl_vl == vl);
@ -178,9 +206,10 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
memset(&sve, 0, sizeof(sve));
sve.size = sizeof(sve);
sve.vl = vl;
ret = set_sve(child, &sve);
ret = set_sve(child, type, &sve);
if (ret != 0) {
ksft_test_result_fail("Failed to set VL %u\n", vl);
ksft_test_result_fail("Failed to set %s VL %u\n",
type->name, vl);
return;
}
@ -188,12 +217,14 @@ static void ptrace_set_get_vl(pid_t child, unsigned int vl, bool *supported)
* Read back the new register state and verify that we have the
* same VL that we got from prctl() on ourselves.
*/
if (!get_sve(child, (void **)&new_sve, &new_sve_size)) {
ksft_test_result_fail("Failed to read VL %u\n", vl);
if (!get_sve(child, type, (void **)&new_sve, &new_sve_size)) {
ksft_test_result_fail("Failed to read %s VL %u\n",
type->name, vl);
return;
}
ksft_test_result(new_sve->vl = prctl_vl, "Set VL %u\n", vl);
ksft_test_result(new_sve->vl = prctl_vl, "Set %s VL %u\n",
type->name, vl);
free(new_sve);
}
@ -209,7 +240,7 @@ static void check_u32(unsigned int vl, const char *reg,
}
/* Access the FPSIMD registers via the SVE regset */
static void ptrace_sve_fpsimd(pid_t child)
static void ptrace_sve_fpsimd(pid_t child, const struct vec_type *type)
{
void *svebuf = NULL;
size_t svebufsz = 0;
@ -219,17 +250,18 @@ static void ptrace_sve_fpsimd(pid_t child)
unsigned char *p;
/* New process should start with FPSIMD registers only */
sve = get_sve(child, &svebuf, &svebufsz);
sve = get_sve(child, type, &svebuf, &svebufsz);
if (!sve) {
ksft_test_result_fail("get_sve: %s\n", strerror(errno));
ksft_test_result_fail("get_sve(%s): %s\n",
type->name, strerror(errno));
return;
} else {
ksft_test_result_pass("get_sve(FPSIMD)\n");
ksft_test_result_pass("get_sve(%s FPSIMD)\n", type->name);
}
ksft_test_result((sve->flags & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD,
"Set FPSIMD registers\n");
"Set FPSIMD registers via %s\n", type->name);
if ((sve->flags & SVE_PT_REGS_MASK) != SVE_PT_REGS_FPSIMD)
goto out;
@ -243,9 +275,9 @@ static void ptrace_sve_fpsimd(pid_t child)
p[j] = j;
}
if (set_sve(child, sve)) {
ksft_test_result_fail("set_sve(FPSIMD): %s\n",
strerror(errno));
if (set_sve(child, type, sve)) {
ksft_test_result_fail("set_sve(%s FPSIMD): %s\n",
type->name, strerror(errno));
goto out;
}
@ -257,16 +289,20 @@ static void ptrace_sve_fpsimd(pid_t child)
goto out;
}
if (memcmp(fpsimd, &new_fpsimd, sizeof(*fpsimd)) == 0)
ksft_test_result_pass("get_fpsimd() gave same state\n");
ksft_test_result_pass("%s get_fpsimd() gave same state\n",
type->name);
else
ksft_test_result_fail("get_fpsimd() gave different state\n");
ksft_test_result_fail("%s get_fpsimd() gave different state\n",
type->name);
out:
free(svebuf);
}
/* Validate attempting to set SVE data and read SVE data */
static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
static void ptrace_set_sve_get_sve_data(pid_t child,
const struct vec_type *type,
unsigned int vl)
{
void *write_buf;
void *read_buf = NULL;
@ -281,8 +317,8 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
data_size = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
write_buf = malloc(data_size);
if (!write_buf) {
ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
data_size, vl);
ksft_test_result_fail("Error allocating %d byte buffer for %s VL %u\n",
data_size, type->name, vl);
return;
}
write_sve = write_buf;
@ -306,23 +342,26 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
/* TODO: Generate a valid FFR pattern */
ret = set_sve(child, write_sve);
ret = set_sve(child, type, write_sve);
if (ret != 0) {
ksft_test_result_fail("Failed to set VL %u data\n", vl);
ksft_test_result_fail("Failed to set %s VL %u data\n",
type->name, vl);
goto out;
}
/* Read the data back */
if (!get_sve(child, (void **)&read_buf, &read_sve_size)) {
ksft_test_result_fail("Failed to read VL %u data\n", vl);
if (!get_sve(child, type, (void **)&read_buf, &read_sve_size)) {
ksft_test_result_fail("Failed to read %s VL %u data\n",
type->name, vl);
goto out;
}
read_sve = read_buf;
/* We might read more data if there's extensions we don't know */
if (read_sve->size < write_sve->size) {
ksft_test_result_fail("Wrote %d bytes, only read %d\n",
write_sve->size, read_sve->size);
ksft_test_result_fail("%s wrote %d bytes, only read %d\n",
type->name, write_sve->size,
read_sve->size);
goto out_read;
}
@ -349,7 +388,8 @@ static void ptrace_set_sve_get_sve_data(pid_t child, unsigned int vl)
check_u32(vl, "FPCR", write_buf + SVE_PT_SVE_FPCR_OFFSET(vq),
read_buf + SVE_PT_SVE_FPCR_OFFSET(vq), &errors);
ksft_test_result(errors == 0, "Set and get SVE data for VL %u\n", vl);
ksft_test_result(errors == 0, "Set and get %s data for VL %u\n",
type->name, vl);
out_read:
free(read_buf);
@ -358,7 +398,9 @@ out:
}
/* Validate attempting to set SVE data and read SVE data */
static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
static void ptrace_set_sve_get_fpsimd_data(pid_t child,
const struct vec_type *type,
unsigned int vl)
{
void *write_buf;
struct user_sve_header *write_sve;
@ -376,8 +418,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
data_size = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
write_buf = malloc(data_size);
if (!write_buf) {
ksft_test_result_fail("Error allocating %d byte buffer for VL %u\n",
data_size, vl);
ksft_test_result_fail("Error allocating %d byte buffer for %s VL %u\n",
data_size, type->name, vl);
return;
}
write_sve = write_buf;
@ -395,16 +437,17 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
fill_buf(write_buf + SVE_PT_SVE_FPSR_OFFSET(vq), SVE_PT_SVE_FPSR_SIZE);
fill_buf(write_buf + SVE_PT_SVE_FPCR_OFFSET(vq), SVE_PT_SVE_FPCR_SIZE);
ret = set_sve(child, write_sve);
ret = set_sve(child, type, write_sve);
if (ret != 0) {
ksft_test_result_fail("Failed to set VL %u data\n", vl);
ksft_test_result_fail("Failed to set %s VL %u data\n",
type->name, vl);
goto out;
}
/* Read the data back */
if (get_fpsimd(child, &fpsimd_state)) {
ksft_test_result_fail("Failed to read VL %u FPSIMD data\n",
vl);
ksft_test_result_fail("Failed to read %s VL %u FPSIMD data\n",
type->name, vl);
goto out;
}
@ -419,7 +462,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
sizeof(tmp));
if (tmp != fpsimd_state.vregs[i]) {
printf("# Mismatch in FPSIMD for VL %u Z%d\n", vl, i);
printf("# Mismatch in FPSIMD for %s VL %u Z%d\n",
type->name, vl, i);
errors++;
}
}
@ -429,8 +473,8 @@ static void ptrace_set_sve_get_fpsimd_data(pid_t child, unsigned int vl)
check_u32(vl, "FPCR", write_buf + SVE_PT_SVE_FPCR_OFFSET(vq),
&fpsimd_state.fpcr, &errors);
ksft_test_result(errors == 0, "Set and get FPSIMD data for VL %u\n",
vl);
ksft_test_result(errors == 0, "Set and get FPSIMD data for %s VL %u\n",
type->name, vl);
out:
free(write_buf);
@ -440,7 +484,7 @@ static int do_parent(pid_t child)
{
int ret = EXIT_FAILURE;
pid_t pid;
int status;
int status, i;
siginfo_t si;
unsigned int vq, vl;
bool vl_supported;
@ -499,26 +543,47 @@ static int do_parent(pid_t child)
}
}
/* FPSIMD via SVE regset */
ptrace_sve_fpsimd(child);
/* prctl() flags */
ptrace_set_get_inherit(child);
/* Step through every possible VQ */
for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
vl = sve_vl_from_vq(vq);
/* First, try to set this vector length */
ptrace_set_get_vl(child, vl, &vl_supported);
/* If the VL is supported validate data set/get */
if (vl_supported) {
ptrace_set_sve_get_sve_data(child, vl);
ptrace_set_sve_get_fpsimd_data(child, vl);
for (i = 0; i < ARRAY_SIZE(vec_types); i++) {
/* FPSIMD via SVE regset */
if (getauxval(vec_types[i].hwcap_type) & vec_types[i].hwcap) {
ptrace_sve_fpsimd(child, &vec_types[i]);
} else {
ksft_test_result_skip("set SVE get SVE for VL %d\n", vl);
ksft_test_result_skip("set SVE get FPSIMD for VL %d\n", vl);
ksft_test_result_skip("%s FPSIMD get via SVE\n",
vec_types[i].name);
ksft_test_result_skip("%s FPSIMD set via SVE\n",
vec_types[i].name);
ksft_test_result_skip("%s set read via FPSIMD\n",
vec_types[i].name);
}
/* prctl() flags */
ptrace_set_get_inherit(child, &vec_types[i]);
/* Step through every possible VQ */
for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
vl = sve_vl_from_vq(vq);
/* First, try to set this vector length */
if (getauxval(vec_types[i].hwcap_type) &
vec_types[i].hwcap) {
ptrace_set_get_vl(child, &vec_types[i], vl,
&vl_supported);
} else {
ksft_test_result_skip("%s get/set VL %d\n",
vec_types[i].name, vl);
vl_supported = false;
}
/* If the VL is supported validate data set/get */
if (vl_supported) {
ptrace_set_sve_get_sve_data(child, &vec_types[i], vl);
ptrace_set_sve_get_fpsimd_data(child, &vec_types[i], vl);
} else {
ksft_test_result_skip("%s set SVE get SVE for VL %d\n",
vec_types[i].name, vl);
ksft_test_result_skip("%s set SVE get FPSIMD for VL %d\n",
vec_types[i].name, vl);
}
}
}

Просмотреть файл

@ -310,14 +310,12 @@ int test_setup(struct tdescr *td)
int test_run(struct tdescr *td)
{
if (td->sig_trig) {
if (td->trigger)
return td->trigger(td);
else
return default_trigger(td);
} else {
if (td->trigger)
return td->trigger(td);
else if (td->sig_trig)
return default_trigger(td);
else
return td->run(td, NULL, NULL);
}
}
void test_result(struct tdescr *td)