Currently, pmd_present() only checks for a non-zero value, returning
true even after pmd_mknotpresent() (which only clears the type bits).
This patch converts pmd_present() to using pte_present(), similar to the
other pmd_*() checks. As a side effect, it will return true for
PROT_NONE mappings, though they are not yet used by the kernel with
transparent huge pages.
For consistency, also change pmd_mknotpresent() to only clear the
PMD_SECT_VALID bit, even though the PMD_TABLE_BIT is already 0 for block
mappings (no functional change). The unused PMD_SECT_PROT_NONE
definition is removed as transparent huge pages use the pte page prot
values.
Fixes: 9c7e535fcc ("arm64: mm: Route pmd thp functions through pte equivalents")
Cc: <stable@vger.kernel.org> # 3.15+
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch replaces the hard-coded value 2 with PMD_TABLE_BIT in the
pmd/pud_bad() macros. Note that using these macros on pmd_trans_huge()
entries is giving incorrect results
(pmd_none_or_trans_huge_or_clear_bad() correctly checks for
pmd_trans_huge before pmd_bad).
Additionally, white-space clean-up for pmd_mkclean().
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The update to the accessed or dirty states for block mappings must be
done atomically on hardware with support for automatic AF/DBM. The
ptep_set_access_flags() function has been fixed as part of commit
66dbd6e61a ("arm64: Implement ptep_set_access_flags() for hardware
AF/DBM"). This patch brings pmdp_set_access_flags() in line with the pte
counterpart.
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Cc: <stable@vger.kernel.org> # 4.4.x: 66dbd6e61a52: arm64: Implement ptep_set_access_flags() for hardware AF/DBM
Cc: <stable@vger.kernel.org> # 4.3+
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With hardware AF/DBM support, pmd modifications (transparent huge pages)
should be performed atomically using load/store exclusive. The initial
patches defined the get-and-clear function and __HAVE_ARCH_* macro
without the "huge" word, leaving the pmdp_huge_get_and_clear() to the
default, non-atomic implementation.
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Cc: <stable@vger.kernel.org> # 4.3+
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
arch_pick_mmap_layout is only called by fs/exec.c which is always built into
kernel, it looks the EXPORT_SYMBOL_GPL is pointless and no architectures export
it other than ARM64.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Inspired by the counterpart of powerpc [1], which shows there is no negative
effect on code generation from enabling STRICT_MM_TYPECHECKS with a modern
compiler.
And, Arnd's comment [2] about that patch says STRICT_MM_TYPECHECKS could
be default as long as the architecture can pass structures in registers as
function arguments. ARM64 can do it as long as the size of structure <= 16
bytes. All the page table value types are u64 on ARM64.
The below disassembly demonstrates it, entry is pte_t type:
entry = arch_make_huge_pte(entry, vma, page, writable);
0xffff00000826fc38 <+80>: and x0, x0, #0xfffffffffffffffd
0xffff00000826fc3c <+84>: mov w3, w21
0xffff00000826fc40 <+88>: mov x2, x20
0xffff00000826fc44 <+92>: mov x1, x19
0xffff00000826fc48 <+96>: orr x0, x0, #0x400
0xffff00000826fc4c <+100>: bl 0xffff00000809bcc0 <arch_make_huge_pte>
[1] http://www.spinics.net/lists/linux-mm/msg105951.html
[2] http://www.spinics.net/lists/linux-mm/msg105969.html
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If memory is located above 1<<VA_BITS, kvm adds an extra level to its page
tables, merging the runtime tables and boot tables that contain the idmap.
This lets us avoid the trampoline dance during initialisation.
This also means there is no trampoline page mapped, so
__cpu_reset_hyp_mode() can't call __kvm_hyp_reset() in this page. The good
news is the idmap is still mapped, so we don't need the trampoline page.
The bad news is we can't call it directly as the idmap is above
HYP_PAGE_OFFSET, so its address is masked by kvm_call_hyp.
Add a function __extended_idmap_trampoline which will branch into
__kvm_hyp_reset in the idmap, change kvm_hyp_reset_entry() to return
this address if __kvm_cpu_uses_extended_idmap(). In this case
__kvm_hyp_reset() will still switch to the boot tables (which are the
merged tables that were already in use), and branch into the idmap (where
it already was).
This fixes boot failures on these systems, where we fail to execute the
missing trampoline page when tearing down kvm in init_subsystems():
[ 2.508922] kvm [1]: 8-bit VMID
[ 2.512057] kvm [1]: Hyp mode initialized successfully
[ 2.517242] kvm [1]: interrupt-controller@e1140000 IRQ13
[ 2.522622] kvm [1]: timer IRQ3
[ 2.525783] Kernel panic - not syncing: HYP panic:
[ 2.525783] PS:200003c9 PC:0000007ffffff820 ESR:86000005
[ 2.525783] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
[ 2.525783] VCPU: (null)
[ 2.525783]
[ 2.547667] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.6.0-rc5+ #1
[ 2.555137] Hardware name: Default string Default string/Default string, BIOS ROD0084E 09/03/2015
[ 2.563994] Call trace:
[ 2.566432] [<ffffff80080888d0>] dump_backtrace+0x0/0x240
[ 2.571818] [<ffffff8008088b24>] show_stack+0x14/0x20
[ 2.576858] [<ffffff80083423ac>] dump_stack+0x94/0xb8
[ 2.581899] [<ffffff8008152130>] panic+0x10c/0x250
[ 2.586677] [<ffffff8008152024>] panic+0x0/0x250
[ 2.591281] SMP: stopping secondary CPUs
[ 3.649692] SMP: failed to stop secondary CPUs 0-2,4-7
[ 3.654818] Kernel Offset: disabled
[ 3.658293] Memory Limit: none
[ 3.661337] ---[ end Kernel panic - not syncing: HYP panic:
[ 3.661337] PS:200003c9 PC:0000007ffffff820 ESR:86000005
[ 3.661337] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
[ 3.661337] VCPU: (null)
[ 3.661337]
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, our KASLR implementation randomizes the placement of the core
kernel at 2 MB granularity. This is based on the arm64 kernel boot
protocol, which mandates that the kernel is loaded TEXT_OFFSET bytes above
a 2 MB aligned base address. This requirement is a result of the fact that
the block size used by the early mapping code may be 2 MB at the most (for
a 4 KB granule kernel)
But we can do better than that: since a KASLR kernel needs to be relocated
in any case, we can tolerate a physical misalignment as long as the virtual
misalignment relative to this 2 MB block size is equal in size, and code to
deal with this is already in place.
Since we align the kernel segments to 64 KB, let's randomize the physical
offset at 64 KB granularity as well (unless CONFIG_DEBUG_ALIGN_RODATA is
enabled). This way, the page table and TLB footprint is not affected.
The higher granularity allows for 5 bits of additional entropy to be used.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The rtc-lib dependency is not required, and seems it was just
copy-pasted from ARM's Kconfig. If platform requires rtc-lib,
they should select it individually.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Selecting both DEBUG_PAGEALLOC and HIBERNATION results in a build failure:
| kernel/built-in.o: In function `saveable_page':
| memremap.c:(.text+0x100f90): undefined reference to `kernel_page_present'
| kernel/built-in.o: In function `swsusp_save':
| memremap.c:(.text+0x1026f0): undefined reference to `kernel_page_present'
| make: *** [vmlinux] Error 1
James sayeth:
"This is caused by DEBUG_PAGEALLOC, which clears the PTE_VALID bit from
'free' pages. Hibernate uses it as a hint that it shouldn't save/access
that page. This function is used to test whether the PTE_VALID bit has
been cleared by kernel_map_pages(), hibernate is the only user.
Fixing this exposes a bigger problem with that configuration though: if
the resume kernel has cut free pages out of the linear map, we copy this
swiss-cheese view of memory, and try to use it to restore...
We can fixup the copy of the linear map, but it then explodes in my lazy
'clean the whole kernel to PoC' after resume, as now both the kernel and
linear map have holes in them."
On closer inspection, the whole Kconfig machinery around DEBUG_PAGEALLOC,
HIBERNATION, ARCH_SUPPORTS_DEBUG_PAGEALLOC and PAGE_POISONING looks like
it might need some affection. In particular, DEBUG_ALLOC has:
> depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
which looks pretty fishy.
For the moment, require ARCH_SUPPORTS_DEBUG_PAGEALLOC to depend on
!HIBERNATION on arm64 and get allmodconfig building again.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Hibernation represents a system state save/restore through
a system reboot; this implies that the logical cpus carrying
out hibernation/thawing must be the same, so that the context
saved in the snapshot image on hibernation is consistent with
the state of the system on resume. If resume from hibernation
is driven through kernel command line parameter, the cpu responsible
for thawing the system will be whatever CPU firmware boots the system
on upon cold-boot (ie logical cpu 0); this means that in order to
keep system context consistent between the hibernate snapshot image
and system state on kernel resume from hibernate, logical cpu 0 must
be online on hibernation and must be the logical cpu that creates
the snapshot image.
This patch adds a PM notifier that enforces logical cpu 0 is online
when the hibernation is started (and prevents hibernation if it is
not), which is sufficient to guarantee it will be the one creating
the snapshot image therefore providing the resume cpu a consistent
snapshot of the system to resume to.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add support for hibernate/suspend-to-disk.
Suspend borrows code from cpu_suspend() to write cpu state onto the stack,
before calling swsusp_save() to save the memory image.
Restore creates a set of temporary page tables, covering only the
linear map, copies the restore code to a 'safe' page, then uses the copy to
restore the memory image. The copied code executes in the lower half of the
address space, and once complete, restores the original kernel's page
tables. It then calls into cpu_resume(), and follows the normal
cpu_suspend() path back into the suspend code.
To restore a kernel using KASLR, the address of the page tables, and
cpu_resume() are stored in the hibernate arch-header and the el2
vectors are pivotted via the 'safe' page in low memory.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Kevin Hilman <khilman@baylibre.com> # Tested on Juno R2
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some architectures require code written to memory as if it were data to be
'cleaned' from any data caches before the processor can fetch them as new
instructions.
During resume from hibernate, the snapshot code copies some pages directly,
meaning these architectures do not get a chance to perform their cache
maintenance. Modify the read and decompress code to call
flush_icache_range() on all pages that are restored, so that the restored
in-place pages are guaranteed to be executable on these architectures.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
[will: make clean_pages_on_* static and remove initialisers]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Kexec and hibernate need to copy pages of memory, but may not have all
of the kernel mapped, and are unable to call copy_page().
Add a simplistic copy_page() macro, that can be inlined in these
situations. lib/copy_page.S provides a bigger better version, but
uses more registers.
Signed-off-by: Geoff Levand <geoff@infradead.org>
[Changed asm label to 9998, added commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
KERNEL_START and KERNEL_END are useful outside head.S, move them to a
header file.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
page.h uses '_AC' in the definition of PAGE_SIZE, but doesn't include
linux/const.h where this is defined. This produces build warnings when only
asm/page.h is included by asm code.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
By enabling the MMU early in cpu_resume(), the sleep_save_sp and stack can
be accessed by VA, which avoids the need to convert-addresses and clean to
PoC on the suspend path.
MMU setup is shared with the boot path, meaning the swapper_pg_dir is
restored directly: ttbr1_el1 is no longer saved/restored.
struct sleep_save_sp is removed, replacing it with a single array of
pointers.
cpu_do_{suspend,resume} could be further reduced to not restore: cpacr_el1,
mdscr_el1, tcr_el1, vbar_el1 and sctlr_el1, all of which are set by
__cpu_setup(). However these values all contain res0 bits that may be used
to enable future features.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Hibernate could make use of the cpu_suspend() code to save/restore cpu
state, however it needs to be able to return '0' from the 'finisher'.
Rework cpu_suspend() so that the finisher is called from C code,
independently from the save/restore of cpu state. Space to save the context
in is allocated in the caller's stack frame, and passed into
__cpu_suspend_enter().
Hibernate's use of this API will look like a copy of the cpu_suspend()
function.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shutdown a core in terms of
kvm. This prevents kexec from rebooting the system at EL2.
This patch adds a cpu tear-down function and also puts an existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need the arm64 specific cpu hotplug hook any more.
Since this patch modifies common code between arm and arm64, one stub
definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compilation errors.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[Rebase, added separate VHE init/exit path, changed resets use of
kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
guest-enter after teardown handling]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A later patch implements kvm_arch_hardware_disable(), to remove kvm
from el2, and re-instate the hyp-stub.
This can happen while guests are running, particularly when kvm_reboot()
calls kvm_arch_hardware_disable() on each cpu. This can interrupt a guest,
remove kvm, then allow the guest to be scheduled again. This causes
kvm_call_hyp() to be run against the hyp-stub.
Change the hyp-stub to return a new exception type when this happens,
and add code to kvm's handle_exit() to tell userspace we failed to
enter the guest.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The existing arm64 hcall implementations are limited in that they only
allow for two distinct hcalls; with the x0 register either zero or not
zero. Also, the API of the hyp-stub exception vector routines and the
KVM exception vector routines differ; hyp-stub uses a non-zero value in
x0 to implement __hyp_set_vectors, whereas KVM uses it to implement
kvm_call_hyp.
To allow for additional hcalls to be defined and to make the arm64 hcall
API more consistent across exception vector routines, change the hcall
implementations to reserve all x0 values below 0xfff for hcalls such
as {s,g}et_vectors().
Define two new preprocessor macros HVC_GET_VECTORS, and HVC_SET_VECTORS
to be used as hcall type specifiers and convert the existing
__hyp_get_vectors() and __hyp_set_vectors() routines to use these new
macros when executing an HVC call. Also, change the corresponding
hyp-stub and KVM el1_sync exception vector routines to use these new
macros.
Signed-off-by: Geoff Levand <geoff@infradead.org>
[Merged two hcall patches, moved immediate value from esr to x0, use lr
as a scratch register, changed limit to 0xfff]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Today the 'hvc' calling KVM or the hyp-stub is expected to preserve all
registers. KVM saves/restores the registers it needs on the EL2 stack using
do_el2_call(). The hyp-stub has no stack, later patches need to be able to
be able to clobber the link register.
Move the link register save/restore to the the call sites.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We currently have macros defining flags for the arm64 sctlr registers in
both kvm_arm.h and sysreg.h. To clean things up and simplify move the
definitions of the SCTLR_EL2 flags from kvm_arm.h to sysreg.h, rename any
SCTLR_EL1 or SCTLR_EL2 flags that are common to both registers to be
SCTLR_ELx, with 'x' indicating a common flag, and fixup all files to
include the proper header or to use the new macro names.
Signed-off-by: Geoff Levand <geoff@infradead.org>
[Restored pgtable-hwdef.h include]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To allow the assembler macros defined in arch/arm64/mm/proc-macros.S to
be used outside the mm code move the contents of proc-macros.S to
asm/assembler.h. Also, delete proc-macros.S, and fix up all references
to proc-macros.S.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
[rebased, included dcache_by_line_op]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If both ACPI and DT platform descriptions are available, and the
kernel was configured at build time to support both flavours, the
default policy is to prefer DT over ACPI, and preferring ACPI over
DT while still allowing DT as a fallback is not possible.
Since some enterprise features (such as RAS) depend on ACPI, it may
be desirable for, e.g., distro installers to prefer ACPI boot but
fall back to DT rather than failing completely if no ACPI tables are
available.
So introduce the 'acpi=on' kernel command line parameter for arm64,
which signifies that ACPI should be used if available, and DT should
only be used as a fallback.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When booting a relocatable kernel image, there is no practical reason
to refuse an image whose load address is not exactly TEXT_OFFSET bytes
above a 2 MB aligned base address, as long as the physical and virtual
misalignment with respect to the swapper block size are equal, and are
both aligned to THREAD_SIZE.
Since the virtual misalignment is under our control when we first enter
the kernel proper, we can simply choose its value to be equal to the
physical misalignment.
So treat the misalignment of the physical load address as the initial
KASLR offset, and fix up the remaining code to deal with that.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
For historical reasons, the kernel Image must be loaded into physical
memory at a 512 KB offset above a 2 MB aligned base address. The region
between the base address and the start of the kernel Image has no
significance to the kernel itself, but it is currently mapped explicitly
into the early kernel VMA range for all translation granules.
In some cases (i.e., 4 KB granule), this is unavoidable, due to the 2 MB
granularity of the early kernel mappings. However, in other cases, e.g.,
when running with larger page sizes, or in the future, with more granular
KASLR, there is no reason to map it explicitly like we do currently.
So update the logic so that the region is mapped only if that happens as
a side effect of rounding the start address of the kernel to swapper block
size, and leave it unmapped otherwise.
Since the symbol kernel_img_size now simply resolves to the memory
footprint of the kernel Image, we can drop its definition from image.h
and opencode its calculation.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When building a relocatable kernel, we currently rely on the fact that
early 64-bit literal loads need to be deferred to after the relocation
has been performed only if they involve symbol references, and not if
they involve assemble time constants. While this is not an unreasonable
assumption to make, it is better to switch to movk/movz sequences, since
these are guaranteed to be resolved at link time, simply because there are
no dynamic relocation types to describe them.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Implement a macro mov_q that can be used to move an immediate constant
into a 64-bit register, using between 2 and 4 movz/movk instructions
(depending on the operand)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Refactor the relocation processing so that the code executes from the
ID map while accessing the relocation tables via the virtual mapping.
This way, we can use literals containing virtual addresses as before,
instead of having to use convoluted absolute expressions.
For symmetry with the secondary code path, the relocation code and the
subsequent jump to the virtual entry point are implemented in a function
called __primary_switch(), and __mmap_switched() is renamed to
__primary_switched(). Also, the call sequence in stext() is aligned with
the one in secondary_startup(), by replacing the awkward 'adr_l lr' and
'b cpu_setup' sequence with a simple branch and link.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We can simply use a relocated 64-bit literal to store the address of
__secondary_switched(), and the relocation code will ensure that it
holds the correct value at secondary entry time, as long as we make sure
that the literal is not dereferenced until after we have enabled the MMU.
So jump via a small __secondary_switch() function covered by the ID map
that performs the literal load and branch-to-register.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This unexports some symbols from head.S that are only used locally.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
maxcpu=n sets the number of CPUs activated at boot time to a max of n,
but allowing the remaining CPUs to be brought up later if the user
decides to do so. However, on arm64 due to various reasons, we disallowed
hotplugging CPUs beyond n, by marking them not present. Now that
we have checks in place to make sure the hotplugged CPUs have compatible
features with system and requires no new errata, relax the restriction.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CPU Errata work arounds are detected and applied to the
kernel code at boot time and the data is then freed up.
If a new hotplugged CPU requires a work around which
was not applied at boot time, there is nothing we can
do but simply fail the booting.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When introducing the whole CPU feature detection framework,
we lost the capability to detect a mismatched GIC configuration
(using the GICv2 MMIO interface, but having the system register
interface enabled).
In order to solve this, use the new this_cpu_has_cap() helper.
Also move the check to the CPU interface path in order to catch
systems where the first CPU has been correctly configured,
but the secondaries are not.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the capabilities are only available once all the CPUs
have booted, we're unable to check for a particular feature
in any subsystem that gets initialized before then.
In order to support this, introduce a local_cpu_has_cap() function
that tests for the presence of a given capability independently
of the whole framework.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[ Added preemptible() check ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[will: remove duplicate initialisation of caps in this_cpu_has_cap]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add scope parameter to the arm64_cpu_capabilities::matches(), so that
this can be reused for checking the capability on a given CPU vs the
system wide. The system uses the default scope associated with the
capability for initialising the CPU_HWCAPs and ELF_HWCAPs.
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Improve the readability of dt_scan_depth1_nodes by removing the nested
conditionals.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When it's a Xen domain0 booting with ACPI, it will supply a /chosen and
a /hypervisor node in DT. So check if it needs to enable ACPI.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Annotate the KASAN shadow region with boundary markers, so that its
mappings stand out in the page table dumper output.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
There is no need to initialize the vmemmap region boundaries dynamically,
since they are compile time constants. So just add these constants to the
global struct initializer, and drop the dynamic assignment and related code.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The open coded conversion from struct page address to virtual address in
lowmem_page_address() involves an intermediate conversion step to pfn
number/physical address. Since the placement of the struct page array
relative to the linear mapping may be completely independent from the
placement of physical RAM (as is that case for arm64 after commit
dfd55ad85e 'arm64: vmemmap: use virtual projection of linear region'),
the conversion to physical address and back again should factor out of
the equation, but unfortunately, the shifting and pointer arithmetic
involved prevent this from happening, and the resulting calculation
essentially subtracts the address of the start of physical memory and
adds it back again, in a way that prevents the compiler from optimizing
it away.
Since the start of physical memory is not a build time constant on arm64,
the resulting conversion involves an unnecessary memory access, which
we would like to get rid of. So replace the open coded conversion with
a call to page_to_virt(), and use the open coded conversion as its
default definition, to be overriden by the architecture, if desired.
The existing arch specific definitions of page_to_virt are all equivalent
to this default definition, so by itself this patch is a no-op.
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To align with generic code and other architectures that expect the macro
page_to_virt to produce an expression whose type is 'void*', drop the
arch specific definition, which is never referenced anyway.
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
To align with other architectures, the expression produced by expanding
the macro page_to_virt() should be of type void*, since it returns a
virtual address. Fix that, and also fix up an instance where page_to_virt
was expected to return 'unsigned long', and drop another instance that was
entirely unused (page_to_bus)
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
With the IOMMU core now taking care of default domains for groups
regardless of bus type, we can gleefully rip out this stop-gap, as
slight recompense for having to expand the other one.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
PCI devices now suffer the same hiccup as platform devices, in that they
get their DMA ops configured before they have been added to their bus,
and thus before we know whether they have successfully registered with
an IOMMU or not. Until the necessary driver core changes to reorder
calls during device creation have been worked out, extend our delayed
notifier trick onto the PCI bus so as to avoid broken DMA ops once
IOMMUs get plugged into the PCI code.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Make sure we have AArch32 state available for running COMPAT
binaries and also for switching the personality to PER_LINUX32.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
[ Added cap bit, checks for HWCAP, personality ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Add cpu_hwcap bit for keeping track of the support for 32bit EL0.
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
On ARMv8 support for AArch32 state is optional. Hence it is
not safe to check the AArch32 ID registers for sanity, which
could lead to false warnings. This patch makes sure that the
AArch32 state is implemented before we keep track of the 32bit
ID registers.
As per ARM ARM (D.1.21.2 - Support for Exception Levels and
Execution States, DDI0487A.h), checking the support for AArch32
at EL0 is good enough to check the support for AArch32 (i.e,
AArch32 at EL1 => AArch32 at EL0, but not vice versa).
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Adds a helper to extract the support for AArch32 at EL0
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>