Граф коммитов

24219 Коммитов

Автор SHA1 Сообщение Дата
Michael Ellerman 5677e07723 powerpc/ptdump: Fix DEBUG_WX since generic ptdump conversion
commit 8d84fca437 upstream.

In note_prot_wx() we bail out without reporting anything if
CONFIG_PPC_DEBUG_WX is disabled.

But CONFIG_PPC_DEBUG_WX was removed in the conversion to generic ptdump,
we now need to use CONFIG_DEBUG_WX instead.

Fixes: e084728393 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20211203124112.2912562-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-05 12:42:33 +01:00
Russell Currey 9b9f256714 powerpc/module_64: Fix livepatching for RO modules
commit 8734b41b3e upstream.

Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 on should serve as a sufficient fix.

This does have a performance impact, I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c71e ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
[mpe: Check return codes from patch_instruction()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 09:32:49 +01:00
Xiaoming Ni d0ac17b9ba powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
[ Upstream commit 3dc709e518 ]

When CONFIG_FSL_PMC is set to n, no value is assigned to cpu_up_prepare
in the mpc85xx_pm_ops structure. As a result, oops is triggered in
smp_85xx_start_cpu().

  smp: Bringing up secondary CPUs ...
  kernel tried to execute user page (0) - exploit attempt? (uid: 0)
  BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  Faulting instruction address: 0x00000000
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [00000000] 0x0
  LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
  Call Trace:
  [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
  [c1051de8] [c0011460] __cpu_up+0xc0/0x228
  [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
  [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
  [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
  [c1051eb8] [c07e67bc] smp_init+0x30/0x78
  [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
  [c1051f18] [c00032d8] kernel_init+0x14/0x124
  [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c

Fixes: c45361abb9 ("powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n")
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Tested-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211126041153.16926-1-nixiaoming@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22 09:32:45 +01:00
Alexey Kardashevskiy 05d27cd9bc powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window
[ Upstream commit ad3976025b ]

There is a possibility of having just one DMA window available with
a limited capacity which the existing code does not handle that well.
If the window is big enough for the system RAM but less than
MAX_PHYSMEM_BITS (which we want when persistent memory is present),
we create 1:1 window and leave persistent memory without DMA.

This disables 1:1 mapping entirely if there is persistent memory and
either:
- the huge DMA window does not cover the entire address space;
- the default DMA window is removed.

This relies on reverted 54fc3c681d
("powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory")
to return the actual amount RAM in ddw_memory_hotplug_max() (posted
separately).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-4-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08 09:04:37 +01:00
Alexey Kardashevskiy b67ff10e43 powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"
[ Upstream commit 2d33f55044 ]

This reverts commit 54fc3c681d
which does not allow 1:1 mapping even for the system RAM which
is usually possible.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-2-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08 09:04:37 +01:00
Christophe Leroy c4e3ff8b8b powerpc/32: Fix hardlockup on vmap stack overflow
commit 5bb60ea611 upstream.

Since the commit c118c7303a ("powerpc/32: Fix vmap stack - Do not
activate MMU before reading task struct") a vmap stack overflow
results in a hard lockup. This is because emergency_ctx is still
addressed with its virtual address allthough data MMU is not active
anymore at that time.

Fix it by using a physical address instead.

Fixes: c118c7303a ("powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce30364fb7ccda489272af4a1612b6aa147e1d23.1637227521.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:04:44 +01:00
Nicholas Piggin 83247fdb94 KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
commit cf0b0e3712 upstream.

The POWER9 ERAT flush instruction is a SLBIA with IH=7, which is a
reserved value on POWER7/8. On POWER8 this invalidates the SLB entries
above index 0, similarly to SLBIA IH=0.

If the SLB entries are invalidated, and then the guest is bypassed, the
host SLB does not get re-loaded, so the bolted entries above 0 will be
lost. This can result in kernel stack access causing a SLB fault.

Kernel stack access causing a SLB fault was responsible for the infamous
mega bug (search "Fix SLB reload bug"). Although since commit
48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C") that
starts using the kernel stack in the SLB miss handler, it might only
result in an infinite loop of SLB faults. In any case it's a bug.

Fix this by only executing the instruction on >= POWER9 where IH=7 is
defined not to invalidate the SLB. POWER7/8 don't require this ERAT
flush.

Fixes: 5008711259 ("KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119031627.577853-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01 09:04:43 +01:00
Eric W. Biederman 686bf79203 signal: Replace force_fatal_sig with force_exit_sig when in doubt
commit fcb116bc43 upstream.

Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.

Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.

Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit.  This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.

This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.

In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig.  That can be done where it matters on
a case-by-case basis with careful analysis.

Reported-by: Kyle Huey <me@kylehuey.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c02 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27ad7 ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf07 ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444f8 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306d5 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d634 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df8a ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5bf1 ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Backlund <tmb@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:49:07 +01:00
Eric W. Biederman 02d28b5fdb signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
commit e21294a7aa upstream.

Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted.  So change every instance we can to make the code clearer.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Backlund <tmb@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:49:06 +01:00
Eric W. Biederman c7b7868dba signal/powerpc: On swapcontext failure force SIGSEGV
commit 83a1f27ad7 upstream.

If the register state may be partial and corrupted instead of calling
do_exit, call force_sigsegv(SIGSEGV).  Which properly kills the
process with SIGSEGV and does not let any more userspace code execute,
instead of just killing one thread of the process and potentially
confusing everything.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.")
Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt")
Link: https://lkml.kernel.org/r/20211020174406.17889-7-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Thomas Backlund <tmb@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:49:06 +01:00
Nicholas Piggin c3b0ab956d printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces
commit 5d5e4522a7 upstream.

printk from NMI context relies on irq work being raised on the local CPU
to print to console. This can be a problem if the NMI was raised by a
lockup detector to print lockup stack and regs, because the CPU may not
enable irqs (because it is locked up).

Introduce printk_trigger_flush() that can be called another CPU to try
to get those messages to the console, call that where printk_safe_flush
was previously called.

Fixes: 93d102f094 ("printk: remove safe buffers")
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211107045116.1754411-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:48:45 +01:00
Christophe Leroy de04ee7d7d powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWX
commit 1e35eba405 upstream.

As spotted and explained in commit c12ab8dbc4 ("powerpc/8xx: Fix
Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST"), the selection
of STRICT_KERNEL_RWX without selecting DEBUG_RODATA_TEST has spotted
the lack of the DIRTY bit in the pinned kernel data TLBs.

This problem should have been detected a lot earlier if things had
been working as expected. But due to an incredible level of chance or
mishap, this went undetected because of a set of bugs: In fact the
DTLBs were not pinned, because instead of setting the reserve bit
in MD_CTR, it was set in MI_CTR that is the register for ITLBs.

But then, another huge bug was there: the physical address was
reset to 0 at the boundary between RO and RW areas, leading to the
same physical space being mapped at both 0xc0000000 and 0xc8000000.
This had by miracle no consequence until now because the entry was
not really pinned so it was overwritten soon enough to go undetected.

Of course, now that we really pin the DTLBs, it must be fixed as well.

Fixes: f76c8f6d25 ("powerpc/8xx: Add function to set pinned TLBs")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Depends-on: c12ab8dbc4 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a21e9a057fe2d247a535aff0d157a54eefee017a.1636963688.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:48:44 +01:00
Cédric Le Goater 54e11a4e9d powerpc/xive: Change IRQ domain to a tree domain
commit 8e80a73fa9 upstream.

Commit 4f86a06e2d ("irqdomain: Make normal and nomap irqdomains
exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the
'nomap' domains still in use under the powerpc arch. With this new
flag, the revmap_tree of the IRQ domain is not used anymore. This
change broke the support of shared LSIs [1] in the XIVE driver because
it was relying on a lookup in the revmap_tree to query previously
mapped interrupts. Linux now creates two distinct IRQ mappings on the
same HW IRQ which can lead to unexpected behavior in the drivers.

The XIVE IRQ domain is not a direct mapping domain and its HW IRQ
interrupt number space is rather large : 1M/socket on POWER9 and
POWER10, change the XIVE driver to use a 'tree' domain type instead.

[1] For instance, a linux KVM guest with virtio-rng and virtio-balloon
    devices.

Fixes: 4f86a06e2d ("irqdomain: Make normal and nomap irqdomains exclusive")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116134022.420412-1-clg@kaod.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:48:44 +01:00
Christophe Leroy 7cc16be1ae powerpc/signal32: Fix sigset_t copy
commit 5499802b22 upstream.

The conversion from __copy_from_user() to __get_user() by
commit d3ccc97815 ("powerpc/signal: Use __get_user() to copy
sigset_t") introduced a regression in __get_user_sigset() for
powerpc/32. The bug was subsequently moved into
unsafe_get_user_sigset().

The bug is due to the copied 64 bit value being truncated to
32 bits while being assigned to dst->sig[0]

The regression was reported by users of the Xorg packages distributed in
Debian/powerpc --

    "The symptoms are that the fb screen goes blank, with the backlight
    remaining on and no errors logged in /var/log; wdm (or startx) run
    with no effect (I tried logging in in the blind, with no effect).
    And they are hard to kill, requiring 'kill -KILL ...'"

Fix the regression by copying each word of the sigset, not only the
first one.

__get_user_sigset() was tentatively optimised to copy 64 bits at once
in order to minimise KUAP unlock/lock impact, but the unsafe variant
doesn't suffer that, so it can just copy words.

Fixes: 887f3ceb51 ("powerpc/signal32: Convert do_setcontext[_tm]() to user access block")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Finn Thain <fthain@linux-m68k.org>
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99ef38d61c0eb3f79c68942deb0c35995a93a777.1636966353.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25 09:48:44 +01:00
Nicholas Piggin ad03b901d0 powerpc/pseries: Fix numa FORM2 parsing fallback code
[ Upstream commit 302039466f ]

In case the FORM2 distance table from firmware is not the expected size,
there is fallback code that just populates the lookup table as local vs
remote.

However it then continues on to use the distance table. Fix.

Fixes: 1c6b5a7e74 ("powerpc/pseries: Add support for FORM2 associativity")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109064900.2041386-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:41 +01:00
Nicholas Piggin ad9ade6c94 powerpc/pseries: rename numa_dist_table to form2_distances
[ Upstream commit 0bd81274e3 ]

The name of the local variable holding the "form2" property address
conflicts with the numa_distance_table global.

This patch does 's/numa_dist_table/form2_distances/g' over the function,
which also renames numa_dist_table_length to form2_distances_length.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109064900.2041386-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:41 +01:00
Masahiro Yamada a0995ebe4e powerpc: clean vdso32 and vdso64 directories
[ Upstream commit 964c33cd0b ]

Since commit bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of
vgettimeofday.o"), "make ARCH=powerpc clean" does not clean up the
arch/powerpc/kernel/{vdso32,vdso64} directories.

Use the subdir- trick to let "make clean" descend into them.

Fixes: bce74491c3 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211109185015.615517-1-masahiroy@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:40 +01:00
Michael Ellerman a7e7002571 KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()
[ Upstream commit dae5818646 ]

kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into
it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because
kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable.

When called from hcall_try_real_mode() we have the kernel TOC in r2,
established near the start of kvmppc_interrupt_hv(), so there is no
issue.

But they can also be called from kvmppc_pseries_do_hcall() which is
module code, so the access ends up happening with the kvm-hv module's
r2, which will not point at dawr_force_enable and could even cause a
fault.

With the current code layout and compilers we haven't observed a fault
in practice, the load hits somewhere in kvm-hv.ko and silently returns
some bogus value.

Note that we we expect p8/p9 guests to use the DAWR, but SLOF uses
h_set_dabr() to test if sc1 works correctly, see SLOF's
lib/libhvcall/brokensc1.c.

Fixes: c1fe190c06 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Link: https://lore.kernel.org/r/20210923151031.72408-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:40 +01:00
Christophe Leroy 23274bd8d7 powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
[ Upstream commit c12ab8dbc4 ]

Until now, all tests involving CONFIG_STRICT_KERNEL_RWX were done with
DEBUG_RODATA_TEST to check the result. But now that
CONFIG_STRICT_KERNEL_RWX is selected by default, it came without
CONFIG_DEBUG_RODATA_TEST and led to the following Oops

[    6.830908] Freeing unused kernel image (initmem) memory: 352K
[    6.840077] BUG: Unable to handle kernel data access on write at 0xc1285200
[    6.846836] Faulting instruction address: 0xc0004b6c
[    6.851745] Oops: Kernel access of bad area, sig: 11 [#1]
[    6.857075] BE PAGE_SIZE=16K PREEMPT CMPC885
[    6.861348] SAF3000 DIE NOTIFICATION
[    6.864830] CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0-rc5-s3k-dev-02255-g2747d7b7916f #451
[    6.873429] NIP:  c0004b6c LR: c0004b60 CTR: 00000000
[    6.878419] REGS: c902be60 TRAP: 0300   Not tainted  (5.15.0-rc5-s3k-dev-02255-g2747d7b7916f)
[    6.886852] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 53000335  XER: 8000ff40
[    6.893564] DAR: c1285200 DSISR: 82000000
[    6.893564] GPR00: 0c000000 c902bf20 c20f4000 08000000 00000001 04001f00 c1800000 00000035
[    6.893564] GPR08: ff0001ff c1280000 00000002 c0004b60 00001000 00000000 c0004b1c 00000000
[    6.893564] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.893564] GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c1060000
[    6.932034] NIP [c0004b6c] kernel_init+0x50/0x138
[    6.936682] LR [c0004b60] kernel_init+0x44/0x138
[    6.941245] Call Trace:
[    6.943653] [c902bf20] [c0004b60] kernel_init+0x44/0x138 (unreliable)
[    6.950022] [c902bf30] [c001122c] ret_from_kernel_thread+0x5c/0x64
[    6.956135] Instruction dump:
[    6.959060] 48ffc521 48045469 4800d8cd 3d20c086 89295fa0 2c090000 41820058 480796c9
[    6.966890] 4800e48d 3d20c128 39400002 3fe0c106 <91495200> 3bff8000 4806fa1d 481f7d75
[    6.974902] ---[ end trace 1e397bacba4aa610 ]---

0xc1285200 corresponds to 'system_state' global var that the kernel is trying to set to
SYSTEM_RUNNING. This var is above the RO/RW limit so it shouldn't Oops.

It oopses because the dirty bit is missing.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3d5800b0bbcd7b19761b98f50421358667b45331.1635520232.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:30 +01:00
Michael Ellerman 2f4dede20c powerpc/dcr: Use cmplwi instead of 3-argument cmpli
[ Upstream commit fef071be57 ]

In dcr-low.S we use cmpli with three arguments, instead of four
arguments as defined in the ISA:

	cmpli	cr0,r3,1024

This appears to be a PPC440-ism, looking at the "PPC440x5 CPU Core
User’s Manual" it shows cmpli having no L field, but implied to be 0 due
to the core being 32-bit. It mentions that the ISA defines four
arguments and recommends using cmplwi.

It also corresponds to the old POWER instruction set, which had no L
field there, a reserved bit instead.

dcr-low.S is only built 32-bit, because it is only built when
DCR_NATIVE=y, which is only selected by 40x and 44x. Looking at the
generated code (with gcc/gas) we see cmplwi as expected.

Although gas is happy with the 3-argument version when building for
32-bit, the LLVM assembler is not and errors out with:

  arch/powerpc/sysdev/dcr-low.S:27:10: error: invalid operand for instruction
   cmpli 0,%r3,1024; ...
           ^

Switch to the cmplwi extended opcode, which avoids any confusion when
reading the ISA, fixes the issue with the LLVM assembler, and also means
the code could be built 64-bit in future (though that's very unlikely).

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
BugLink: https://github.com/ClangBuiltLinux/linux/issues/1419
Link: https://lore.kernel.org/r/20211014024424.528848-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:30 +01:00
Anatolij Gustschin 72b4e7b7be powerpc/5200: dts: fix memory node unit name
[ Upstream commit aed2886a5e ]

Fixes build warnings:
Warning (unit_address_vs_reg): /memory: node has a reg or ranges property, but no unit name

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211013220532.24759-4-agust@denx.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:30 +01:00
Xiaoming Ni 0084aaaec2 powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
commit c45361abb9 upstream.

When CONFIG_SMP=y, timebase synchronization is required when the second
kernel is started.

arch/powerpc/kernel/smp.c:
  int __cpu_up(unsigned int cpu, struct task_struct *tidle)
  {
  	...
  	if (smp_ops->give_timebase)
  		smp_ops->give_timebase();
  	...
  }

  void start_secondary(void *unused)
  {
  	...
  	if (smp_ops->take_timebase)
  		smp_ops->take_timebase();
  	...
  }

When CONFIG_HOTPLUG_CPU=n and CONFIG_KEXEC_CORE=n,
 smp_85xx_ops.give_timebase is NULL,
 smp_85xx_ops.take_timebase is NULL,
As a result, the timebase is not synchronized.

Timebase  synchronization does not depend on CONFIG_HOTPLUG_CPU.

Fixes: 56f1ba2807 ("powerpc/mpc85xx: refactor the PM operations")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210929033646.39630-3-nixiaoming@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:20 +01:00
Nathan Lynch 310b6f976c powerpc/pseries/mobility: ignore ibm, platform-facilities updates
commit 319fa1a52e upstream.

On VMs with NX encryption, compression, and/or RNG offload, these
capabilities are described by nodes in the ibm,platform-facilities device
tree hierarchy:

  $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
  /sys/firmware/devicetree/base/ibm,platform-facilities/
  ├── ibm,compression-v1
  ├── ibm,random-v1
  └── ibm,sym-encryption-v1

  3 directories

The acceleration functions that these nodes describe are not disrupted by
live migration, not even temporarily.

But the post-migration ibm,update-nodes sequence firmware always sends
"delete" messages for this hierarchy, followed by an "add" directive to
reconstruct it via ibm,configure-connector (log with debugging statements
enabled in mobility.c):

  mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
  mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
  mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
  mobility: removing node /ibm,platform-facilities:4294967286
  ...
  mobility: added node /ibm,platform-facilities:4294967286

Note we receive a single "add" message for the entire hierarchy, and what
we receive from the ibm,configure-connector sequence is the top-level
platform-facilities node along with its three children. The debug message
simply reports the parent node and not the whole subtree.

Also, significantly, the nodes added are almost completely equivalent to
the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
the leaf nodes is the only property I've observed to differ, and Linux does
not use that. So in practice, the sum of update messages Linux receives for
this hierarchy is equivalent to minor property updates.

We succeed in removing the original hierarchy from the device tree. But the
vio bus code is ignorant of this, and does not unbind or relinquish its
references. The leaf nodes, still reachable through sysfs, of course still
refer to the now-freed ibm,platform-facilities parent node, which makes
use-after-free possible:

  refcount_t: addition on 0; use-after-free.
  WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
  refcount_warn_saturate+0x160/0x1f0 (unreliable)
  kobject_get+0xf0/0x100
  of_node_get+0x30/0x50
  of_get_parent+0x50/0xb0
  of_fwnode_get_parent+0x54/0x90
  fwnode_count_parents+0x50/0x150
  fwnode_full_name_string+0x30/0x110
  device_node_string+0x49c/0x790
  vsnprintf+0x1c0/0x4c0
  sprintf+0x44/0x60
  devspec_show+0x34/0x50
  dev_attr_show+0x40/0xa0
  sysfs_kf_seq_show+0xbc/0x200
  kernfs_seq_show+0x44/0x60
  seq_read_iter+0x2a4/0x740
  kernfs_fop_read_iter+0x254/0x2e0
  new_sync_read+0x120/0x190
  vfs_read+0x1d0/0x240

Moreover, the "new" replacement subtree is not correctly added to the
device tree, resulting in ibm,platform-facilities parent node without the
appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:

  $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
  /sys/firmware/devicetree/base/ibm,platform-facilities/

  0 directories

  $ cd /sys/devices/vio ; find . -xtype l -exec file {} +
  ./ibm,sym-encryption-v1/of_node: broken symbolic link to
    ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
  ./ibm,random-v1/of_node:         broken symbolic link to
    ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
  ./ibm,compression-v1/of_node:    broken symbolic link to
    ../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1

This is because add_dt_node() -> dlpar_attach_node() attaches only the
parent node returned from configure-connector, ignoring any children. This
should be corrected for the general case, but fixing that won't help with
the stale OF node references, which is the more urgent problem.

One way to address that would be to make the drivers respond to node
removal notifications, so that node references can be dropped
appropriately. But this would likely force the drivers to disrupt active
clients for no useful purpose: equivalent nodes are immediately re-added.
And recall that the acceleration capabilities described by the nodes remain
available throughout the whole process.

The solution I believe to be robust for this situation is to convert
remove+add of a node with an unchanged phandle to an update of the node's
properties in the Linux device tree structure. That would involve changing
and adding a fair amount of code, and may take several iterations to land.

Until that can be realized we have a confirmed use-after-free and the
possibility of memory corruption. So add a limited workaround that
discriminates on the node type, ignoring adds and removes. This should be
amenable to backporting in the meantime.

Fixes: 410bccf978 ("powerpc/pseries: Partition migration in the kernel")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020194703.2613093-1-nathanl@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Nicholas Piggin d05dc4bdc3 powerpc/64s/interrupt: Fix check_return_regs_valid() false positive
commit 4a5cb51f3d upstream.

The check_return_regs_valid() can cause a false positive if the return
regs are marked as norestart and they are an HSRR type interrupt,
because the low bit in the bottom of regs->trap causes interrupt type
matching to fail.

This can occcur for example on bare metal with a HV privileged doorbell
interrupt that causes a signal, but do_signal returns early because
get_signal() fails, and takes the "No signal to deliver" path. In this
case no signal was delivered so the return location is not changed so
return SRRs are not invalidated, yet set_trap_norestart is called, which
messes up the match. Building go-1.16.6 is known to reproduce this.

Fix it by using the TRAP() accessor which masks out the low bit.

Fixes: 6eaaf9de35 ("powerpc/64s/interrupt: Check and fix srr_valid without crashing")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211026122531.3599918-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Russell Currey 02da363241 powerpc/security: Use a mutex for interrupt exit code patching
commit 3c12b4df8d upstream.

The mitigation-patching.sh script in the powerpc selftests toggles
all mitigations on and off simultaneously, revealing that rfi_flush
and stf_barrier cannot safely operate at the same time due to races
in updating the static key.

On some systems, the static key code throws a warning and the kernel
remains functional.  On others, the kernel will hang or crash.

Fix this by slapping on a mutex.

Fixes: 13799748b9 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211027072410.40950-1-ruscur@russell.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Vasant Hegde 7c8ad3fc64 powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
commit 52862ab33c upstream.

Commit 587164cd, introduced new opal message type (OPAL_MSG_PRD2) and
added opal notifier. But I missed to unregister the notifier during
module unload path. This results in below call trace if you try to
unload and load opal_prd module.

Also add new notifier_block for OPAL_MSG_PRD2 message.

Sample calltrace (modprobe -r opal_prd; modprobe opal_prd)
  BUG: Unable to handle kernel data access on read at 0xc0080000192200e0
  Faulting instruction address: 0xc00000000018d1cc
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 66 PID: 7446 Comm: modprobe Kdump: loaded Tainted: G            E     5.14.0prd #759
  NIP:  c00000000018d1cc LR: c00000000018d2a8 CTR: c0000000000cde10
  REGS: c0000003c4c0f0a0 TRAP: 0300   Tainted: G            E      (5.14.0prd)
  MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24224824  XER: 20040000
  CFAR: c00000000018d2a4 DAR: c0080000192200e0 DSISR: 40000000 IRQMASK: 1
  ...
  NIP notifier_chain_register+0x2c/0xc0
  LR  atomic_notifier_chain_register+0x48/0x80
  Call Trace:
    0xc000000002090610 (unreliable)
    atomic_notifier_chain_register+0x58/0x80
    opal_message_notifier_register+0x7c/0x1e0
    opal_prd_probe+0x84/0x150 [opal_prd]
    platform_probe+0x78/0x130
    really_probe+0x110/0x5d0
    __driver_probe_device+0x17c/0x230
    driver_probe_device+0x60/0x130
    __driver_attach+0xfc/0x220
    bus_for_each_dev+0xa8/0x130
    driver_attach+0x34/0x50
    bus_add_driver+0x1b0/0x300
    driver_register+0x98/0x1a0
    __platform_driver_register+0x38/0x50
    opal_prd_driver_init+0x34/0x50 [opal_prd]
    do_one_initcall+0x60/0x2d0
    do_init_module+0x7c/0x320
    load_module+0x3394/0x3650
    __do_sys_finit_module+0xd4/0x160
    system_call_exception+0x140/0x290
    system_call_common+0xf4/0x258

Fixes: 587164cd59 ("powerpc/powernv: Add new opal message type")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211028165716.41300-1-hegdevasant@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Nicholas Piggin 317cc5bacf powerpc/32e: Ignore ESR in instruction storage interrupt handler
commit 81291383ff upstream.

A e5500 machine running a 32-bit kernel sometimes hangs at boot,
seemingly going into an infinite loop of instruction storage interrupts.

The ESR (Exception Syndrome Register) has a value of 0x800000 (store)
when this happens, which is likely set by a previous store. An
instruction TLB miss interrupt would then leave ESR unchanged, and if no
PTE exists it calls directly to the instruction storage interrupt
handler without changing ESR.

access_error() does not cause a segfault due to a store to a read-only
vma because is_exec is true. Most subsequent fault handling does not
check for a write fault on a read-only vma, and might do strange things
like create a writeable PTE or call page_mkwrite on a read only vma or
file. It's not clear what happens here to cause the infinite faulting in
this case, a fault handler failure or low level PTE or TLB handling.

In any case this can be fixed by having the instruction storage
interrupt zero regs->dsisr rather than storing the ESR value to it.

Fixes: a01a3f2ddb ("powerpc: remove arguments from fault handler functions")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Jacques de Laval <jacques.delaval@protonmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Jacques de Laval <jacques.delaval@protonmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211028133043.4159501-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Hari Bathini 6f657bb66f powerpc/bpf: Fix write protecting JIT code
commit 44a8214de9 upstream.

Running program with bpf-to-bpf function calls results in data access
exception (0x300) with the below call trace:

  bpf_int_jit_compile+0x238/0x750 (unreliable)
  bpf_check+0x2008/0x2710
  bpf_prog_load+0xb00/0x13a0
  __sys_bpf+0x6f4/0x27c0
  sys_bpf+0x2c/0x40
  system_call_exception+0x164/0x330
  system_call_vectored_common+0xe8/0x278

as bpf_int_jit_compile() tries writing to write protected JIT code
location during the extra pass.

Fix it by holding off write protection of JIT code until the extra
pass, where branch target addresses fixup happens.

Fixes: 62e3d4210a ("powerpc/bpf: Write protect JIT code")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211025055649.114728-1-hbathini@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Gustavo A. R. Silva 0356cc5d27 powerpc/vas: Fix potential NULL pointer dereference
commit 61cb9ac66b upstream.

(!ptr && !ptr->foo) strikes again. :)

The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL,
it leads to a NULL pointer dereference: ptr->foo.

Fix this by converting && to ||

This issue was detected with the help of Coccinelle, and audited and
fixed manually.

Fixes: 1a0d0d5ed5 ("powerpc/vas: Add platform specific user window operations")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015050345.GA1161918@embeddedor
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:19 +01:00
Christophe Leroy dfbcaef202 powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
[ Upstream commit f8c0e36b48 ]

When ARCH_SUPPORTS_DEBUG_PAGEALLOC is not selected, the user can
still select CONFIG_DEBUG_PAGEALLOC in which case __kernel_map_pages()
is provided by mm/page_poison.c

So only define __kernel_map_pages() when both
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC
are defined.

Fixes: 68b44f94d6 ("powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/971b69739ff4746252e711a9845210465c023a9e.1635425947.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:57 +01:00
Denis Kirjanov f5927a9973 powerpc/xmon: fix task state output
[ Upstream commit b1f896ce35 ]

p_state is unsigned since the commit 2f064a59a1

The patch also uses TASK_RUNNING instead of null.

Fixes: 2f064a59a1 ("sched: Change task_struct::state")
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211026133108.7113-1-kda@linux-powerpc.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:57 +01:00
Bixuan Cui d6b102db21 powerpc/44x/fsp2: add missing of_node_put
[ Upstream commit 290fe8aa69 ]

Early exits from for_each_compatible_node() should decrement the
node reference counter.  Reported by Coccinelle:

./arch/powerpc/platforms/44x/fsp2.c:206:1-25: WARNING: Function
"for_each_compatible_node" should have of_node_put() before return
around line 218.

Fixes: 7813043e1b ("powerpc/44x/fsp2: Add irq error handlers")
Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1635406102-88719-1-git-send-email-cuibixuan@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:57 +01:00
Christophe Leroy 5a85790059 powerpc/book3e: Fix set_memory_x() and set_memory_nx()
[ Upstream commit b6cb20fdc2 ]

set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
set_memory_nx() calls pte_exprotec() which clears _PAGE_EXEC.

Book3e has 2 bits, UX and SX, which defines the exec rights
resp. for user (PR=1) and for kernel (PR=0).

_PAGE_EXEC is defined as UX only.

An executable kernel page is set with either _PAGE_KERNEL_RWX
or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.

So set_memory_nx() call for an executable kernel page does
nothing because UX is already cleared.

And set_memory_x() on a non-executable kernel page makes it
executable for the user and keeps it non-executable for kernel.

Also, pte_exec() always returns 'false' on kernel pages, because
it checks _PAGE_EXEC which doesn't include SX, so for instance
the W+X check doesn't work.

To fix this:
  - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
  - sets both UX and SX in _PAGE_EXEC so that pte_exec() returns
    true whenever one of the two bits is set and pte_exprotect()
    clears both bits.
  - Define a book3e specific version of pte_mkexec() which sets
    either SX or UX based on UR.

Fixes: 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:57 +01:00
Christophe Leroy f748277139 powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
[ Upstream commit b1b93cb7e7 ]

Commit 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
changed those two functions to use pte helpers to determine which
bits to clear and which bits to set.

This change was based on the assumption that bits to be set/cleared
are always the same and can be determined by applying the pte
manipulation helpers on __pte(0).

But on platforms like book3e, the bits depend on whether the page
is a user page or not.

For the time being it more or less works because of _PAGE_EXEC being
used for user pages only and exec right being set at all time on
kernel page. But following patch will clean that and output of
pte_mkexec() will depend on the page being a user or kernel page.

Instead of trying to make an even more complicated helper where bits
would become dependent on the final pte value, come back to a more
static situation like before commit 26973fa5ac ("powerpc/mm: use
pte helpers in generic code"), by introducing an 8xx specific
version of __ptep_set_access_flags() and ptep_set_wrprotect().

Fixes: 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/922bdab3a220781bae2360ff3dd5adb7fe4d34f1.1635226743.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:56 +01:00
Christophe Leroy 46256f11c2 powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE
[ Upstream commit 68b44f94d6 ]

fsl_booke and 44x are not able to map kernel linear memory with
pages, so they can't support DEBUG_PAGEALLOC and KFENCE, and
STRICT_KERNEL_RWX is also a problem for now.

Enable those only on book3s (both 32 and 64 except KFENCE), 8xx and 40x.

Fixes: 88df6e90fa ("[POWERPC] DEBUG_PAGEALLOC for 32-bit")
Fixes: 95902e6c88 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
Fixes: 90cbac0e99 ("powerpc: Enable KFENCE for PPC32")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d1ad9fdd9b27da3fdfa16510bb542ed51fa6e134.1634292136.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:54 +01:00
Athira Rajeev 547eae8d29 powerpc/perf: Fix cycles/instructions as PM_CYC/PM_INST_CMPL in power10
[ Upstream commit 8f6aca0e0f ]

On power9 and earlier platforms, the default event used for cyles and
instructions is PM_CYC (0x0001e) and PM_INST_CMPL (0x00002)
respectively. These events use two programmable PMCs and by default will
count irrespective of the run latch state (idle state). But since they
use programmable PMCs, these events can lead to multiplexing with other
events, because there are only 4 programmable PMCs. Hence in power10,
performance monitoring unit (PMU) driver uses performance monitor
counter 5 (PMC5) and performance monitor counter6 (PMC6) for counting
instructions and cycles.

Currently on power10, the event used for cycles is PM_RUN_CYC (0x600F4)
and instructions uses PM_RUN_INST_CMPL (0x500fa). But counting of these
events in idle state is controlled by the CC56RUN bit setting in Monitor
Mode Control Register0 (MMCR0). If the CC56RUN bit is zero, PMC5/6 will
not count when CTRL[RUN] (run latch) is zero. This could lead to missing
some counts if a thread is in idle state during system wide profiling.

To fix it, set the CC56RUN bit in MMCR0 for power10, which makes PMC5
and PMC6 count instructions and cycles regardless of the run latch
state. Since this change make PMC5/6 count as PM_INST_CMPL/PM_CYC,
rename the event code 0x600f4 as PM_CYC instead of PM_RUN_CYC and event
code 0x500fa as PM_INST_CMPL instead of PM_RUN_INST_CMPL. The changes
are only for PMC5/6 event codes and will not affect the behaviour of
PM_RUN_CYC/PM_RUN_INST_CMPL if progammed in other PMC's.

Fixes: a64e697cef ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.cm>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
[mpe: Tweak change log wording for style and consistency]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007075121.28497-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:52 +01:00
Nathan Lynch e271314bf2 powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()
[ Upstream commit fda0eb2200 ]

vcpu_is_preempted() can be used outside of preempt-disabled critical
sections, yielding warnings such as:

BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
caller is rwsem_spin_on_owner+0x1cc/0x2d0
CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
Call Trace:
[c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
[c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
[c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
[c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
[c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
[c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
[c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
[c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
[c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250

The result of vcpu_is_preempted() is always used speculatively, and the
function does not access per-cpu resources in a (Linux) preempt-unsafe way.
Use raw_smp_processor_id() to avoid such warnings, adding explanatory
comments.

Fixes: ca3f969dcb ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928214147.312412-3-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:51 +01:00
Nathan Lynch ddd95eb01a powerpc: fix unbalanced node refcount in check_kvm_guest()
[ Upstream commit 56537faf88 ]

When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
returns without performing a matching put for the lookup, leaving the
node's reference count elevated.

Add the necessary call to of_node_put(), rearranging the code slightly to
avoid repetition or goto.

Fixes: 107c55005f ("powerpc/pseries: Add KVM guest doorbell restrictions")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928124550.132020-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:51 +01:00
Christophe Leroy 1164485548 powerpc/mem: Fix arch/powerpc/mm/mem.c:53:12: error: no previous prototype for 'create_section_mapping'
[ Upstream commit 7eff9bc00d ]

Commit 8e11d62e2e ("powerpc/mem: Add back missing header to fix 'no
previous prototype' error") was supposed to fix the problem, but in
the meantime commit a927bd6ba9 ("mm: fix phys_to_target_node() and*
memory_add_physaddr_to_nid() exports") moved create_section_mapping()
prototype from asm/sparsemem.h to asm/mmzone.h

Fixes: 8e11d62e2e ("powerpc/mem: Add back missing header to fix 'no previous prototype' error")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/025754fde3d027904ae9d0191f395890bec93369.1631541649.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:51 +01:00
Xiaoming Ni e1bb995036 powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
commit 3c2172c1c4 upstream.

When the field described in mpc85xx_smp_guts_ids[] is not configured in
dtb, the mpc85xx_setup_pmc() does not assign a value to the "guts"
variable. As a result, the oops is triggered when
mpc85xx_freeze_time_base() is executed.

Fixes: 56f1ba2807 ("powerpc/mpc85xx: refactor the PM operations")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210929033646.39630-2-nixiaoming@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:16:03 +01:00
Laurent Vivier 0e47821087 KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
commit 235cee1624 upstream.

Commit 112665286d ("KVM: PPC: Book3S HV: Context tracking exit guest
context before enabling irqs") moved guest_exit() into the interrupt
protected area to avoid wrong context warning (or worse). The problem is
that tick-based time accounting has not yet been updated at this point
(because it depends on the timer interrupt firing), so the guest time
gets incorrectly accounted to system time.

To fix the problem, follow the x86 fix in commit 1604571401 ("Defer
vtime accounting 'til after IRQ handling"), and allow host IRQs to run
before accounting the guest exit time.

In the case vtime accounting is enabled, this is not required because TB
is used directly for accounting.

Before this patch, with CONFIG_TICK_CPU_ACCOUNTING=y in the host and a
guest running a kernel compile, the 'guest' fields of /proc/stat are
stuck at zero. With the patch they can be observed increasing roughly as
expected.

Fixes: e233d54d4d ("KVM: booke: use __kvm_guest_exit")
Fixes: 112665286d ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs")
Cc: stable@vger.kernel.org # 5.12+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
[np: only required for tick accounting, add Book3E fix, tweak changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211027142150.3711582-1-npiggin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:15:57 +01:00
Linus Torvalds 119c85055d powerpc fixes for 5.15 #6
Three commits fixing some issues introduced with the recent IOMMU changes we merged.
 
 Thanks to: Alexey Kardashevskiy
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmF8e20THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG4GEAC6SAo82kdWOR5MEwZ+U9ht91KqzK/O
 UYm5WLapGaWzKtS5zMZUzc3dTXFsbeSdPghfGp2eLpnkQkfDnltyGA4ERVunEQUg
 soGXt0OXBfmt++D+yogMkURr9tYzZ+ssrdCEC6Vmuv35Tf/dsHA9upWCVVw4UOOv
 w/RRR1uku6kup7NCX6TJZoUQSeAXISDhwk3LXF5jh/hqSyV3BV9yulHYs3J4WtFQ
 zr2dsxXL5DvgM3cOmLBZbnmTYSmU1f0jtGhqEf/6Ar3ljLHZgIvLLWmZK1UyyMwv
 4P7yFIBVObW0sBqGz/4K9p73l8MnVHORrBUe73OavZgrxaM2u0k/7JRj72txcbMw
 KXNKvDoA9nambWV98OXt4HE2bInsporn4DSDSAwJZmuTcqltbFOCf47t0kpe66fB
 ZC2IJCOfrol5ELZvmdAsuVWEucHkdPMPqz0ZB46E/givQf45RmfYnjvVKm8HTQls
 aOivKNuP2VLXhHdeocly1adaDMaeMEouYw1p00VKhOW4GbfHCZs4IczDtqacHRMR
 NB+C0awi6DDQ7WXCEYURte9iURn8owjJhMuIsQj2/SUHVfuQEeJL1llSzo9sa/f2
 ONT0esDhKSuq74T1nmimUMOvAyBg/TKuwY87TJ9BKcJzqSj+oPuuw28d8HBi61qk
 S9rG2jHto84HVw==
 =R9Yq
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Three commits fixing some issues introduced with the recent IOMMU
  changes we merged.

  Thanks to Alexey Kardashevskiy"

* tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
  powerpc/pseries/iommu: Check if the default window in use before removing it
  powerpc/pseries/iommu: Use correct vfree for it_map
2021-10-29 17:35:56 -07:00
Alexey Kardashevskiy d853adc7ad powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present
The iommu_init_table() helper takes an address range to reserve in
the IOMMU table being initialized to exclude MMIO addresses, this is
useful if the window stretches far beyond 4GB (although wastes some TCEs).
At the moment the code searches for such MMIO32 range and fails if none
found which is considered a problem while it really is not: it is actually
better as this says there is no MMIO32 to reserve and we can use
usually wasted TCEs. Furthermore PHYP never actually allows creating
windows starting at busaddress=0 so this MMIO32 range is never useful.

This removes error exit and initializes the table with zero range if
no MMIO32 is detected.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-5-aik@ozlabs.ru
2021-10-25 11:41:15 +11:00
Alexey Kardashevskiy 92fe01b7c6 powerpc/pseries/iommu: Check if the default window in use before removing it
At the moment this check is performed after we remove the default window
which is late and disallows to revert whatever changes enable_ddw()
has made to DMA windows.

This moves the check and error exit before removing the window.

This raised the message severity from "debug" to "warning" as this
should not happen in practice and cannot be triggered by the userspace.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-4-aik@ozlabs.ru
2021-10-25 11:41:14 +11:00
Alexey Kardashevskiy 41ee7232fa powerpc/pseries/iommu: Use correct vfree for it_map
The it_map array is vzalloc'ed so use vfree() for it when creating
a huge DMA window failed for whatever reason.

While at this, write zero to it_map.

Fixes: 381ceda88c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020132315.2287178-3-aik@ozlabs.ru
2021-10-25 11:41:14 +11:00
Linus Torvalds 0a3221b658 powerpc fixes for 5.15 #5
Fix a bug exposed by a previous fix, where running guests with certain SMT topologies
 could crash the host on Power8.
 
 Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is enabled.
 
 Thanks to: Nathan Lynch, Srikar Dronamraju, Valentin Schneider.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFxTaUTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFj/D/9J/rz1zYKxvPd+o5WsJBWrgsVBxgGr
 cIhtKARlh0Yoa9fhIYF6ir4HUT2TOhokwJJco7fcND73rfrHfbWshWEzwaR1Lz/2
 WXr0Np509GQw1KCC9nv5DoE5qqLLL+rS+ul75Khkoi99wa/TDD8kQ5o3zUWtDoad
 D6paUIOf+DjnzzD7hIuTljfuOL/P3S4OyarQEopD69n8IgCTCrSsKdzOzAvlKcuL
 c3OoGAY5gasHowfxLfOigkCYloT4gJES7Kxl92QnPByOJUfqhjfKxiN4l0ydR6Bl
 4dUFUp/PkvvmTZiIA3tFzu0fHOgjUCvICyEAewLX7Bh9fXAVYMRc1a2l4LrCWD8T
 hQqQFskxa/viG7Uq2N58Rts9GtF5nG5d6+4l5Ndxus2+tUledjiuzOv3nsSq64TF
 y+Gws66a+A8a7DjcHo281BWzWn3XCIuug4A0doyvJFNdiy/+zj3ojfUhppZam8ca
 DYo7ot71l0XGvFM0Yhr/NQ9MEdpgT19VC0xU2pafnkp6yGUqjH1b9xelvg+uMZEc
 vXb7h+XV0eY4Rdr+3wTnXIwIWnEslJ23YQxFvhZeWeF321wsR5kbBGvNf/aY6xQB
 rLkNX63RdpyYjHGcn3qwcYTkk8wAjCR719tFal8hRMLw+CCjzqcinb7P5FLi4hQv
 cUV/vDffiwyReg==
 =572c
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug exposed by a previous fix, where running guests with
   certain SMT topologies could crash the host on Power8.

 - Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is
   enabled.

Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider.

* tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/smp: do not decrement idle task preempt count in CPU offline
  powerpc/idle: Don't corrupt back chain when going idle
2021-10-21 15:30:09 -10:00
Nathan Lynch 787252a10d powerpc/smp: do not decrement idle task preempt count in CPU offline
With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:

BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
Call Trace:
 dump_stack_lvl+0xac/0x108
 __schedule_bug+0xac/0xe0
 __schedule+0xcf8/0x10d0
 schedule_idle+0x3c/0x70
 do_idle+0x2d8/0x4a0
 cpu_startup_entry+0x38/0x40
 start_secondary+0x2ec/0x3a0
 start_secondary_prolog+0x10/0x14

This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8279 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."

However, since commit 2c669ef697 ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a376ca ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.

The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.

Tested with pseries and powernv in qemu, and pseries on PowerVM.

Fixes: 2c669ef697 ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
2021-10-20 21:38:01 +11:00
Michael Ellerman 496c5fe25c powerpc/idle: Don't corrupt back chain when going idle
In isa206_idle_insn_mayloss() we store various registers into the stack
red zone, which is allowed.

However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again,
to 0(r1), which corrupts the stack back chain.

We used to do the same in isa206_idle_insn_mayloss() itself, but we
fixed that in 73287caa92 ("powerpc64/idle: Fix SP offsets when saving
GPRs"), however we missed that the macro also corrupts the back chain.

Corrupting the back chain is bad for debuggability but doesn't
necessarily cause a bug.

However we recently changed the stack handling in some KVM code, and it
now relies on the stack back chain being valid when it returns. The
corruption causes that code to return with r1 pointing somewhere in
kernel data, at some point LR is restored from the stack and we branch
to NULL or somewhere else invalid.

Only affects Power8 hosts running KVM guests, with dynamic_mt_modes
enabled (which it is by default).

The fixes tag below points to the commit that changed the KVM stack
handling, exposing this bug. The actual corruption of the back chain has
always existed since 948cf67c47 ("powerpc: Add NAP mode support on
Power7 in HV mode").

Fixes: 9b4416c509 ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au
2021-10-20 21:37:58 +11:00
Linus Torvalds be9eb2f00f powerpc fixes for 5.15 #4
Fix a bug where guests on P9 with interrupts passed through could get stuck in
 synchronize_irq().
 
 Fix a bug in KVM on P8 where secondary threads entering a guest would write outside their
 allocated stack.
 
 Fix a bug in KVM on P8 where secondary threads could confuse the host offline code and
 cause the guest or host to crash.
 
 Thanks to: Cédric Le Goater
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFsFZkTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBQND/0VSdTHegF+7kBqWnyP+Rc8aXwr/kPL
 xijmmeWS+BY0nkW7BNabRwVtOHZ4lKyDxPMwFiQ+tbpsmm4LX9sA9igV2yXdK73Y
 pRFkJBxljxXuO3eYa4peQwsvHnI7cR//fUr94b0OT6r0N39T5tKq5MZvWawk05y4
 RivpMOnEtDEd12ySJk5/fshwNKV5L7InI2FRKi3EU8sPNV3XA/z5JQ1CKuvXUUiK
 1HIwwxYqECcMM0dkIo7Qa7F93Wr0V8ZjsXLFV96jpBNL2aUfXVMaIBmITwa1Pk+i
 1581QmI4AJx3b3YAMqoujrgFJQKxaw2ZjiFfDcHGcTD4qCNVH2MNeyHM86Z7lFKL
 osYI7Ya3zzlwd4qhPsLV21mBdBp1xjpiBj1FiRrq4o+Me+O6eQnmG2abL8xUFI/3
 F7ab9qV8xjqfw/XlAXKtb7cco4SSQfi6XuLYSdTduCU5A1unLgpBw9F6H14YrQ5A
 8/uriGLIzhgklwDYgBJUGOjJzMCv0ey1+MJF4R4tEeIJ7+83WuAHCf2eAnxSVXLT
 Q2JaFt1LP8fdW1IPYYICjVgahREsWvTW4wK2lk5sO+BSoop6EbOVfz5TK52ePxLC
 II4ETYG3tIipLCkf2tqF/rU4De7d2SRg70lCNdEBVU9GpmiXec4uwPyX3uAwKdlf
 9qphXh0d15LAPg==
 =fPMr
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a bug where guests on P9 with interrupts passed through could get
   stuck in synchronize_irq().

 - Fix a bug in KVM on P8 where secondary threads entering a guest would
   write outside their allocated stack.

 - Fix a bug in KVM on P8 where secondary threads could confuse the host
   offline code and cause the guest or host to crash.

Thanks to Cédric Le Goater.

* tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
  KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
  powerpc/xive: Discard disabled interrupts in get_irqchip_state()
2021-10-17 18:01:32 -10:00
Michael Ellerman cdeb5d7d89 KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
We call idle_kvm_start_guest() from power7_offline() if the thread has
been requested to enter KVM. We pass it the SRR1 value that was returned
from power7_idle_insn() which tells us what sort of wakeup we're
processing.

Depending on the SRR1 value we pass in, the KVM code might enter the
guest, or it might return to us to do some host action if the wakeup
requires it.

If idle_kvm_start_guest() is able to handle the wakeup, and enter the
guest it is supposed to indicate that by returning a zero SRR1 value to
us.

That was the behaviour prior to commit 10d91611f4 ("powerpc/64s:
Reimplement book3s idle code in C"), however in that commit the
handling of SRR1 was reworked, and the zeroing behaviour was lost.

Returning from idle_kvm_start_guest() without zeroing the SRR1 value can
confuse the host offline code, causing the guest to crash and other
weirdness.

Fixes: 10d91611f4 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au
2021-10-16 00:40:03 +11:00