WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Nicholas Piggin	387e220a2e	powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com	2021-12-09 22:41:13 +11:00
Nicholas Piggin	c28573744b	powerpc/64s: Make hash MMU support configurable This adds Kconfig selection which allows 64s hash MMU support to be disabled. It can be disabled if radix support is enabled, the minimum supported CPU type is POWER9 (or higher), and KVM is not selected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com	2021-12-09 22:40:24 +11:00
Nicholas Piggin	debeda0171	powerpc/64s: Always define arch unmapped area calls To avoid any functional changes to radix paths when building with hash MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the arch get_unmapped_area calls on 64s platforms. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com	2021-12-09 22:40:24 +11:00
Nicholas Piggin	af3a0ea41c	powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clear There are a few places that require MMU_FTR_HPTE_TABLE to be set even when running in radix mode. Fix those up. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-15-npiggin@gmail.com	2021-12-09 22:40:24 +11:00
Nicholas Piggin	8dbfc0092b	powerpc/64e: remove mmu_linear_psize mmu_linear_psize is only set at boot once on 64e, is not necessarily the correct size of the linear map pages, and is never used anywhere. Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Retain the extern, so we can use IS_ENABLED() for related code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-14-npiggin@gmail.com	2021-12-09 22:39:39 +11:00
Nicholas Piggin	20626177c9	powerpc: make memremap_compat_align 64s-only memremap_compat_align is only relevant when ZONE_DEVICE is selected. ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected by PPC_BOOK3S_64. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-13-npiggin@gmail.com	2021-12-02 22:57:24 +11:00
Nicholas Piggin	f43d2ffb47	powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.c This file contains functions and data common to radix, so rename it to remove the hash_ prefix. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-11-npiggin@gmail.com	2021-12-02 22:57:23 +11:00
Nicholas Piggin	bdad5d57df	powerpc/64s: move page size definitions from hash specific file The radix code uses some of the psize variables. Move the common ones from hash_utils.c to pgtable.c. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-10-npiggin@gmail.com	2021-12-02 22:57:23 +11:00
Nicholas Piggin	162b0889bb	powerpc/64s: move THP trace point creation out of hash specific file In preparation for making hash MMU support configurable, move THP trace point function definitions out of an otherwise hash-specific file. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-8-npiggin@gmail.com	2021-12-02 22:57:23 +11:00
Nicholas Piggin	935b534c24	powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific slb.c is hash-specific SLB management, but do_bad_slb_fault deals with segment interrupts that occur with radix MMU as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com	2021-12-02 22:57:23 +11:00
Nicholas Piggin	a4135cbebd	powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE The pseries platform does not use the native hash code but the PAPR virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE. This requires moving tlbiel code from hash_native.c to hash_utils.c. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com	2021-12-02 22:57:23 +11:00
Nicholas Piggin	7ebc49031d	powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVE PPC_NATIVE now only controls the native HPT code, so rename it to be more descriptive. Restrict it to Book3S only. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-3-npiggin@gmail.com	2021-12-02 22:57:22 +11:00
Christophe Leroy	af11dee436	powerpc/32s: Fix shift-out-of-bounds in KASAN init ================================================================================ UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c [c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138 [c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c [c214bf30] [c1c0ed84] kasan_init+0xc0/0x198 [c214bf70] [c1c08024] setup_arch+0x18/0x54c [c214bfc0] [c1c037f0] start_kernel+0x90/0x33c [c214bff0] [00003610] 0x3610 ================================================================================ This happens when the directly mapped memory is a power of 2. Fix it by checking the shift and set the result to 0 when shift is -1 Fixes: `7974c47326` ("powerpc/32s: Implement dedicated kasan_init_region()") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169 Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu	2021-11-30 22:44:39 +11:00
Nicholas Piggin	5402e239d0	powerpc/64s: Get LPID bit width from device tree Allow the LPID bit width and partition table size to be set at runtime from the device tree. Move the PID bit width detection into the same place. KVM does not support using the extra bits yet, this is mainly required to get the PTCR register values correct (so KVM will run but it will not allocate > 4096 LPIDs). OPAL firmware provides this property for POWER10 CPUs since skiboot commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for POWER10 CPUs"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com	2021-11-30 22:27:07 +11:00
Christophe Leroy	cdc81aece8	powerpc/ptdump: Fix display a BAT's size unit We have wrong units on BAT's sizes (G instead of M, M instead of ...) ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 4G Kernel x m 1: 0xc0400000-0xc05fffff 0x00400000 2G Kernel x m 2: 0xc0600000-0xc06fffff 0x00600000 1G Kernel x m 3: 0xc0700000-0xc077ffff 0x00700000 512M Kernel x m 4: 0xc0780000-0xc079ffff 0x00780000 128M Kernel x m 5: 0xc07a0000-0xc07bffff 0x007a0000 128M Kernel x m 6: - 7: - This is because pt_dump_size() expects a size in Kbytes but bat_show_603() gives the size in bytes. To avoid risk of confusion, change pt_dump_size() to take bytes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f16c30f5c9185a63335322cf1a8b22f189d335ef.1637922595.git.christophe.leroy@csgroup.eu	2021-11-29 22:49:29 +11:00
Michael Ellerman	ff47a95d1a	powerpc/mm: Move tlbcam_sz() and make it static Building with W=1 we see a warning: linux/arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for ‘tlbcam_sz’ tlbcam_sz() is not used outside this file, so we can make it static. However it's only used inside #ifdef CONFIG_PPC32, so move it within that ifdef, otherwise we would get a defined but not used error. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-4-mpe@ellerman.id.au	2021-11-29 22:49:20 +11:00
Michael Ellerman	af3fdce4ab	Revert "powerpc/code-patching: Improve verification of patchability" This reverts commit `8b8a8f0ab3`. As reported[1] by Sachin this causes problems with ftrace, and it also causes the code patching selftests to fail as reported[2] by Stephen. So revert it for now. 1: https://lore.kernel.org/linuxppc-dev/3668743C-09DF-4673-B15C-2FFE2A57F7D7@linux.vnet.ibm.com/ 2: https://lore.kernel.org/linuxppc-dev/20211126161747.1f7795b0@canb.auug.org.au/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2021-11-29 17:41:52 +11:00
Christophe Leroy	8b8a8f0ab3	powerpc/code-patching: Improve verification of patchability Today, patch_instruction() assumes that it is called exclusively on valid addresses, and only checks that it is not called on an init address after init section has been freed. Improve verification by calling kernel_text_address() instead. kernel_text_address() already includes a verification of initmem release. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bc683d499a411730504b132a924de0ccc2ef1f79.1636971137.git.christophe.leroy@csgroup.eu	2021-11-25 11:25:32 +11:00
Nicholas Piggin	46f9caf1a2	powerpc/64s: Keep AMOR SPR a constant ~0 at runtime This register controls supervisor SPR modifications, and as such is only relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never restores it coming back out to the host, so it can be kept constant and avoid the mtSPR in KVM guest entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-10-npiggin@gmail.com	2021-11-24 21:08:57 +11:00
Christophe Leroy	5b54860943	powerpc/book3e: Fix TLBCAM preset at boot Commit `52bda69ae8` ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done") was supposed to just add an additional parameter to map_mem_in_cams() and always set it to 'true' at that time. But a few call sites were messed up. Fix them. Fixes: `52bda69ae8` ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d319f2a9367d4d08fd2154e506101bd5f100feeb.1636967119.git.christophe.leroy@csgroup.eu	2021-11-16 21:20:59 +11:00
Nicholas Piggin	302039466f	powerpc/pseries: Fix numa FORM2 parsing fallback code In case the FORM2 distance table from firmware is not the expected size, there is fallback code that just populates the lookup table as local vs remote. However it then continues on to use the distance table. Fix. Fixes: `1c6b5a7e74` ("powerpc/pseries: Add support for FORM2 associativity") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109064900.2041386-2-npiggin@gmail.com	2021-11-15 15:46:46 +11:00
Nicholas Piggin	0bd81274e3	powerpc/pseries: rename numa_dist_table to form2_distances The name of the local variable holding the "form2" property address conflicts with the numa_distance_table global. This patch does 's/numa_dist_table/form2_distances/g' over the function, which also renames numa_dist_table_length to form2_distances_length. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109064900.2041386-1-npiggin@gmail.com	2021-11-15 15:46:46 +11:00
Linus Torvalds	59a2ceeef6	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "87 patches. Subsystems affected by this patch series: mm (pagecache and hugetlb), procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs, init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork, sysvfs, kcov, gdb, resource, selftests, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits) ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL ipc: check checkpoint_restore_ns_capable() to modify C/R proc files selftests/kselftest/runner/run_one(): allow running non-executable files virtio-mem: disallow mapping virtio-mem memory via /dev/mem kernel/resource: disallow access to exclusive system RAM regions kernel/resource: clean up and optimize iomem_is_exclusive() scripts/gdb: handle split debug for vmlinux kcov: replace local_irq_save() with a local_lock_t kcov: avoid enable+disable interrupts if !in_task() kcov: allocate per-CPU memory on the relevant node Documentation/kcov: define `ip' in the example Documentation/kcov: include types.h in the example sysv: use BUILD_BUG_ON instead of runtime check kernel/fork.c: unshare(): use swap() to make code cleaner seq_file: fix passing wrong private data seq_file: move seq_escape() to a header signal: remove duplicate include in signal.h crash_dump: remove duplicate include in crash_dump.h crash_dump: fix boolreturn.cocci warning hfs/hfsplus: use WARN_ON for sanity check ...	2021-11-09 10:11:53 -08:00
Kefeng Wang	843a1ffaf6	powerpc/mm: use core_kernel_text() helper Use core_kernel_text() helper to simplify code, also drop etext, _stext, _sinittext, _einittext declaration which already declared in section.h. Link: https://lkml.kernel.org/r/20210930071143.63410-10-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-09 10:02:51 -08:00
Linus Torvalds	512b7931ad	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...	2021-11-06 14:08:17 -07:00
Zhenguo Yao	b5389086ad	hugetlbfs: extend the definition of hugepages parameter to support node allocation We can specify the number of hugepages to allocate at boot. But the hugepages is balanced in all nodes at present. In some scenarios, we only need hugepages in one node. For example: DPDK needs hugepages which are in the same node as NIC. If DPDK needs four hugepages of 1G size in node1 and system has 16 numa nodes we must reserve 64 hugepages on the kernel cmdline. But only four hugepages are used. The others should be free after boot. If the system memory is low(for example: 64G), it will be an impossible task. So extend the hugepages parameter to support specifying hugepages on a specific node. For example add following parameter: hugepagesz=1G hugepages=0:1,1:3 It will allocate 1 hugepage in node0 and 3 hugepages in node1. Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zhenguo Yao <yaozhenguo1@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-11-06 13:30:41 -07:00
Christophe Leroy	f8c0e36b48	powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC When ARCH_SUPPORTS_DEBUG_PAGEALLOC is not selected, the user can still select CONFIG_DEBUG_PAGEALLOC in which case __kernel_map_pages() is provided by mm/page_poison.c So only define __kernel_map_pages() when both CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC are defined. Fixes: `68b44f94d6` ("powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/971b69739ff4746252e711a9845210465c023a9e.1635425947.git.christophe.leroy@csgroup.eu	2021-10-29 11:10:43 +11:00
Christophe Leroy	44c14509b0	powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs Building tqm8541_defconfig results in: arch/powerpc/mm/nohash/fsl_book3e.c: In function 'settlbcam': arch/powerpc/mm/nohash/fsl_book3e.c:126:40: error: '_PAGE_BAP_SX' undeclared (first use in this function) 126 \| TLBCAM[index].MAS3 \|= (flags & _PAGE_BAP_SX) ? MAS3_SX : 0; \| ^~~~~~~~~~~~ arch/powerpc/mm/nohash/fsl_book3e.c:126:40: note: each undeclared identifier is reported only once for each function it appears in make[3]: * [scripts/Makefile.build:277: arch/powerpc/mm/nohash/fsl_book3e.o] Error 1 make[2]: * [scripts/Makefile.build:540: arch/powerpc/mm/nohash] Error 2 make[1]: * [scripts/Makefile.build:540: arch/powerpc/mm] Error 2 make: * [Makefile:1868: arch/powerpc] Error 2 This is because _PAGE_BAP_SX is not defined when using 32 bits PTE. Now that _PAGE_EXEC contains both _PAGE_BAP_SX and _PAGE_BAP_UX, it can be used instead. Fixes: `01116e6e98` ("powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/91a0235e7f2a85308b84aa5b9efd8d022e2b899a.1635226743.git.christophe.leroy@csgroup.eu	2021-10-28 00:41:29 +11:00
Christophe Leroy	b6cb20fdc2	powerpc/book3e: Fix set_memory_x() and set_memory_nx() set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC. set_memory_nx() calls pte_exprotec() which clears _PAGE_EXEC. Book3e has 2 bits, UX and SX, which defines the exec rights resp. for user (PR=1) and for kernel (PR=0). _PAGE_EXEC is defined as UX only. An executable kernel page is set with either _PAGE_KERNEL_RWX or _PAGE_KERNEL_ROX, which both have SX set and UX cleared. So set_memory_nx() call for an executable kernel page does nothing because UX is already cleared. And set_memory_x() on a non-executable kernel page makes it executable for the user and keeps it non-executable for kernel. Also, pte_exec() always returns 'false' on kernel pages, because it checks _PAGE_EXEC which doesn't include SX, so for instance the W+X check doesn't work. To fix this: - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER - sets both UX and SX in _PAGE_EXEC so that pte_exec() returns true whenever one of the two bits is set and pte_exprotect() clears both bits. - Define a book3e specific version of pte_mkexec() which sets either SX or UX based on UR. Fixes: `1f9ad21c3b` ("powerpc/mm: Implement set_memory() routines") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu	2021-10-28 00:41:29 +11:00
Christophe Leroy	c7d19189d7	powerpc/32: Don't use a struct based type for pte_t Long time ago we had a config item called STRICT_MM_TYPECHECKS to build the kernel with pte_t defined as a structure in order to perform additional build checks or build it with pte_t defined as a simple type in order to get simpler generated code. Commit `670eea9241` ("powerpc/mm: Always use STRICT_MM_TYPECHECKS") made the struct based definition the only one, considering that the generated code was similar in both cases. That's right on ppc64 because the ABI is such that the content of a struct having a single simple type element is passed as register, but on ppc32 such a structure is passed via the stack like any structure. Simple test function: pte_t test(pte_t pte) { return pte; } Before this patch we get c00108ec <test>: c00108ec: 81 24 00 00 lwz r9,0(r4) c00108f0: 91 23 00 00 stw r9,0(r3) c00108f4: 4e 80 00 20 blr So, for PPC32, restore the simple type behaviour we got before commit `670eea9241`, but instead of adding a config option to activate type check, do it when __CHECKER__ is set so that type checking is performed by 'sparse' and provides feedback like: arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types) arch/powerpc/mm/pgtable.c:466:16: expected unsigned long arch/powerpc/mm/pgtable.c:466:16: got struct pte_t [usertype] x With this patch we now get c0010890 <test>: c0010890: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Define STRICT_MM_TYPECHECKS rather than repeating the condition] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:06 +11:00
Christophe Leroy	63f501e07a	powerpc/8xx: Simplify TLB handling In the old days, TLB handling for 8xx was using tlbie and tlbia instructions directly as much as possible. But commit `f048aace29` ("powerpc/mm: Add SMP support to no-hash TLB handling") broke that by introducing out-of-line unnecessary complex functions for booke/smp which don't have tlbie/tlbia instructions and require more complex handling. Restore direct use of tlbie and tlbia for 8xx which is never SMP. With this patch we now get c00ecc68 <ptep_clear_flush>: c00ecc68: 39 00 00 00 li r8,0 c00ecc6c: 81 46 00 00 lwz r10,0(r6) c00ecc70: 91 06 00 00 stw r8,0(r6) c00ecc74: 7c 00 2a 64 tlbie r5,r0 c00ecc78: 7c 00 04 ac hwsync c00ecc7c: 91 43 00 00 stw r10,0(r3) c00ecc80: 4e 80 00 20 blr Before it was c0012880 <local_flush_tlb_page>: c0012880: 2c 03 00 00 cmpwi r3,0 c0012884: 41 82 00 54 beq c00128d8 <local_flush_tlb_page+0x58> c0012888: 81 22 00 00 lwz r9,0(r2) c001288c: 81 43 00 20 lwz r10,32(r3) c0012890: 39 29 00 01 addi r9,r9,1 c0012894: 91 22 00 00 stw r9,0(r2) c0012898: 2c 0a 00 00 cmpwi r10,0 c001289c: 41 82 00 10 beq c00128ac <local_flush_tlb_page+0x2c> c00128a0: 81 2a 01 dc lwz r9,476(r10) c00128a4: 2c 09 ff ff cmpwi r9,-1 c00128a8: 41 82 00 0c beq c00128b4 <local_flush_tlb_page+0x34> c00128ac: 7c 00 22 64 tlbie r4,r0 c00128b0: 7c 00 04 ac hwsync c00128b4: 81 22 00 00 lwz r9,0(r2) c00128b8: 39 29 ff ff addi r9,r9,-1 c00128bc: 2c 09 00 00 cmpwi r9,0 c00128c0: 91 22 00 00 stw r9,0(r2) c00128c4: 4c a2 00 20 bclr+ 4,eq c00128c8: 81 22 00 70 lwz r9,112(r2) c00128cc: 71 29 00 04 andi. r9,r9,4 c00128d0: 4d 82 00 20 beqlr c00128d4: 48 65 76 74 b c0669f48 <preempt_schedule> c00128d8: 81 22 00 00 lwz r9,0(r2) c00128dc: 39 29 00 01 addi r9,r9,1 c00128e0: 91 22 00 00 stw r9,0(r2) c00128e4: 4b ff ff c8 b c00128ac <local_flush_tlb_page+0x2c> ... c00ecdc8 <ptep_clear_flush>: c00ecdc8: 94 21 ff f0 stwu r1,-16(r1) c00ecdcc: 39 20 00 00 li r9,0 c00ecdd0: 93 c1 00 08 stw r30,8(r1) c00ecdd4: 83 c6 00 00 lwz r30,0(r6) c00ecdd8: 91 26 00 00 stw r9,0(r6) c00ecddc: 93 e1 00 0c stw r31,12(r1) c00ecde0: 7c 08 02 a6 mflr r0 c00ecde4: 7c 7f 1b 78 mr r31,r3 c00ecde8: 7c 83 23 78 mr r3,r4 c00ecdec: 7c a4 2b 78 mr r4,r5 c00ecdf0: 90 01 00 14 stw r0,20(r1) c00ecdf4: 4b f2 5a 8d bl c0012880 <local_flush_tlb_page> c00ecdf8: 93 df 00 00 stw r30,0(r31) c00ecdfc: 7f e3 fb 78 mr r3,r31 c00ece00: 80 01 00 14 lwz r0,20(r1) c00ece04: 83 c1 00 08 lwz r30,8(r1) c00ece08: 83 e1 00 0c lwz r31,12(r1) c00ece0c: 7c 08 03 a6 mtlr r0 c00ece10: 38 21 00 10 addi r1,r1,16 c00ece14: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fb324f1c8f2ddb57cf6aad1cea26329558f1c1c0.1631887021.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:05 +11:00
Christophe Leroy	d5970045cf	powerpc/fsl_booke: Update of TLBCAMs after init After init, set readonly memory as ROX and set readwrite memory as RWX, if STRICT_KERNEL_RWX is enabled. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/66bef0b9c273e1121706883f3cf5ad0a053d863f.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	0b2859a743	powerpc/fsl_booke: Allocate separate TLBCAMs for readonly memory Reorganise TLBCAM allocation so that when STRICT_KERNEL_RWX is enabled, TLBCAMs are allocated such that readonly memory uses different TLBCAMs. This results in an allocation looking like: Memory CAM mapping: 4/4/4/1/1/1/1/16/16/16/64/64/64/256/256 Mb, residual: 256Mb Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ca169bc288261a0e0558712f979023c3a960ebb.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	52bda69ae8	powerpc/fsl_booke: Tell map_mem_in_cams() if init is done In order to be able to call map_mem_in_cams() once more after init for STRICT_KERNEL_RWX, add an argument. For now, map_mem_in_cams() is always called only during init. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3b69a7e0b393b16984ade882a5eae5d727717459.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	a97dd9e2f7	powerpc/fsl_booke: Enable reloading of TLBCAM without switching to AS1 Avoid switching to AS1 when reloading TLBCAM after init for STRICT_KERNEL_RWX. When we setup AS1 we expect the entire accessible memory to be mapped through one entry, this is not the case anymore at the end of init. We are not changing the size of TLBCAMs, only flags, so no need to switch to AS1. So change loadcam_multi() to not switch to AS1 when the given temporary tlb entry in 0. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a9d517fbfbc940f56103c46b323f6eb8f4485571.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	01116e6e98	powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs Don't force MAS3_SX and MAS3_UX at all time. Take into account the exec flag. While at it, fix a couple of closeby style problems (indent with space and unnecessary parenthesis), it keeps more readability. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5467044e59f27f9fcf709b9661779e3ce5f784f6.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	3a75fd709c	powerpc/fsl_booke: Rename fsl_booke.c to fsl_book3e.c We have a myriad of CONFIG symbols around different variants of BOOKEs, which would be worth tidying up one day. But at least, make file names and CONFIG option match: We have CONFIG_FSL_BOOKE and CONFIG_PPC_FSL_BOOK3E. fsl_booke.c is selected by and only by CONFIG_PPC_FSL_BOOK3E. So rename it fsl_book3e to reduce confusion. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5dc871db1f67739319bec11f049ca450da1c13a2.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:02 +11:00
Joel Stanley	4f703e7faa	powerpc/s64: Clarify that radix lacks DEBUG_PAGEALLOC The page_alloc.c code will call into __kernel_map_pages() when DEBUG_PAGEALLOC is configured and enabled. As the implementation assumes hash, this should crash spectacularly if not for a bit of luck in __kernel_map_pages(). In this function linear_map_hash_count is always zero, the for loop exits without doing any damage. There are no other platforms that determine if they support debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to do that, this change turns the map/unmap into a noop when in radix mode and prints a warning once. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Reformat if per Christophe's suggestion] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211013213438.675095-1-joel@jms.id.au	2021-10-22 15:22:02 +11:00
Christophe Leroy	602946ec2f	powerpc: Set max_mapnr correctly max_mapnr is used by virt_addr_valid() to check if a linear address is valid. It must only include lowmem PFNs, like other architectures. Problem detected on a system with 1G mem (Only 768M are mapped), with CONFIG_DEBUG_VIRTUAL and CONFIG_TEST_DEBUG_VIRTUAL, it didn't report virt_to_phys(VMALLOC_START), VMALLOC_START being 0xf1000000. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/77d99037782ac4b3c3b0124fc4ae80ce7b760b05.1634035228.git.christophe.leroy@csgroup.eu	2021-10-13 13:28:22 +11:00
Christophe Leroy	7eff9bc00d	powerpc/mem: Fix arch/powerpc/mm/mem.c:53:12: error: no previous prototype for 'create_section_mapping' Commit `8e11d62e2e` ("powerpc/mem: Add back missing header to fix 'no previous prototype' error") was supposed to fix the problem, but in the meantime commit `a927bd6ba9` ("mm: fix phys_to_target_node() and* memory_add_physaddr_to_nid() exports") moved create_section_mapping() prototype from asm/sparsemem.h to asm/mmzone.h Fixes: `8e11d62e2e` ("powerpc/mem: Add back missing header to fix 'no previous prototype' error") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/025754fde3d027904ae9d0191f395890bec93369.1631541649.git.christophe.leroy@csgroup.eu	2021-10-09 00:15:59 +11:00
Linus Torvalds	2d338201d5	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "147 patches, based on `7d2a07b769`. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...	2021-09-08 12:55:35 -07:00
David Hildenbrand	65a2aa5f48	mm/memory_hotplug: remove nid parameter from arch_remove_memory() The parameter is unused, let's remove it. Link: https://lkml.kernel.org/r/20210712124052.26491-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Baoquan He <bhe@redhat.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Sergei Trofimovich <slyfox@gentoo.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Jia He <justin.he@arm.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-09-08 11:50:23 -07:00
Linus Torvalds	7cca308cfd	powerpc updates for 5.15 - Convert pseries & powernv to use MSI IRQ domains. - Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are given a CPU number on the same node as previously, when possible. - Add support for a new more flexible device-tree format for specifying NUMA distances. - Convert powerpc to GENERIC_PTDUMP. - Retire sbc8548 and sbc8641d board support. - Various other small features and fixes. Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R. Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, Zheng Yongjun. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEyHTYTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDo3D/9aXMVP2wsEMNB0XhTiJ1UUdi311Uq9 PvkAaGZH14ZqZLVigeiD3gt6YzTH0cEuGj6qgwsJrPDjF8FESnMbBsprMLr5/qE1 itWRGMAMCFaeTcB9ogYVJkzwg6RN2ZgIqoq4NVswNSXoAQGWb+1bvXq3RnXXNuGR TQmLL02poNC6nX0YbRaQoT1Xx4nfUTiKHhU+Aok9uOCMJIyYZVATR6Qafb7/j7tO UvjwOHztbu84lcJOGmSnw4LcmwNORLuP9IwR0r+O1M3ijEZqDo9TPkvtSz8HZwjU mxdJwhrUmN0euMcghuiFxW+1XG2eM49ugsdJugiezG2RaIijbIp0nAIvdeaKAgT1 OSSwvWCQ0fkTPyLXE+O6tVqMhlUMdqQlRcyNwmN9svIip9VnwGNq3vA4ePlJm6Fi i0i/tLqVNlJwFokZ7blW5g8SRgGRuFfXd5XUYLFvy5Teez+/7b1mW95gPQZSJ8kV Tbx2e0nHAPX4hCAxJ1AB3/zTlnjY+4+WJ9bD5XdgXkeVE8PPh1BEkulhMi1R1OMj 57D1W6OgsBu/Pze78wjAvwO8+NAb1T/2mv2Bd/LY6Q+7hNDqOOhuajyBTxbH41FG sqx5bKjKOwgTybfV9A0Eo0e4FQBX07yXltBFHaPlyA4sOsIhM59+PxNrEwN1eZrQ LVVsdBXg8pHxrw== =EbN0 -----END PGP SIGNATURE----- Merge tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Convert pseries & powernv to use MSI IRQ domains. - Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are given a CPU number on the same node as previously, when possible. - Add support for a new more flexible device-tree format for specifying NUMA distances. - Convert powerpc to GENERIC_PTDUMP. - Retire sbc8548 and sbc8641d board support. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R. Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, and Zheng Yongjun. * tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits) powerpc/bug: Cast to unsigned long before passing to inline asm powerpc/ptdump: Fix generic ptdump for 64-bit KVM: PPC: Fix clearing never mapped TCEs in realmode powerpc/pseries/iommu: Rename "direct window" to "dma window" powerpc/pseries/iommu: Make use of DDW for indirect mapping powerpc/pseries/iommu: Find existing DDW with given property name powerpc/pseries/iommu: Update remove_dma_window() to accept property name powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() powerpc/pseries/iommu: Allow DDW windows starting at 0x00 powerpc/pseries/iommu: Add ddw_list_new_entry() helper powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper powerpc/kernel/iommu: Add new iommu_table_in_use() helper powerpc/pseries/iommu: Replace hard-coded page shift powerpc/numa: Update cpu_cpu_map on CPU online/offline powerpc/numa: Print debug statements only when required powerpc/numa: convert printk to pr_xxx powerpc/numa: Drop dbg in favour of pr_debug powerpc/smp: Enable CACHE domain for shared processor powerpc/smp: Update cpu_core_map on all PowerPc systems ...	2021-09-03 11:22:50 -07:00
Michael Ellerman	a3314262ee	Merge branch 'fixes' into next Merge our fixes branch into next. That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c. Between `cbc06f051c` ("powerpc/xive: Do not skip CPU-less nodes when creating the IPIs"), which moved request_irq() out of xive_init_ipis(), and `17df41fec5` ("powerpc: use IRQF_NO_DEBUG for IPIs") which added IRQF_NO_DEBUG to that request_irq() call, which has now moved.	2021-09-03 22:54:12 +10:00
Michael Ellerman	b14b8b1ed0	powerpc/ptdump: Fix generic ptdump for 64-bit Since the conversion to generic ptdump we see crashes on 64-bit: BUG: Unable to handle kernel data access on read at 0xc0eeff7f00000000 Faulting instruction address: 0xc00000000045e5fc Oops: Kernel access of bad area, sig: 11 [#1] ... NIP __walk_page_range+0x2bc/0xce0 LR __walk_page_range+0x240/0xce0 Call Trace: __walk_page_range+0x240/0xce0 (unreliable) walk_page_range_novma+0x74/0xb0 ptdump_walk_pgd+0x98/0x170 ptdump_check_wx+0x88/0xd0 mark_rodata_ro+0x48/0x80 kernel_init+0x74/0x1a0 ret_from_kernel_thread+0x5c/0x64 What's happening is that have walked off the end of the kernel page tables, and started dereferencing junk values. That happens because we initialised the ptdump_range to span all the way up to 0xffffffffffffffff: static struct ptdump_range ptdump_range[] __ro_after_init = { {TASK_SIZE_MAX, ~0UL}, But the kernel page tables don't span that far. So on 64-bit set the end of the range to be the address immediately past the end of the kernel page tables, to limit the page table walk to valid addresses. Fixes: `e084728393` ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210831135151.886620-1-mpe@ellerman.id.au	2021-09-01 16:52:53 +10:00
Srikar Dronamraju	9a245d0e1f	powerpc/numa: Update cpu_cpu_map on CPU online/offline cpu_cpu_map holds all the CPUs in the DIE. However in PowerPC, when onlining/offlining of CPUs, this mask doesn't get updated. This mask is however updated when CPUs are added/removed. So when both operations like online/offline of CPUs and adding/removing of CPUs are done simultaneously, then cpumaps end up broken. WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898 build_sched_domains+0xd48/0x1720 Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28 Workqueue: events cpuset_hotplug_workfn NIP: c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec REGS: c00000005596f220 TRAP: 0700 Not tainted (5.13.0-rc6+) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48828222 XER: 00000009 CFAR: c0000000001ea698 IRQMASK: 0 GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036 GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90 GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8 GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800 GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18 GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400 GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060 GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720 LR [c0000000001caac4] build_sched_domains+0xd44/0x1720 Call Trace: [c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable) [c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0 [c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0 [c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70 [c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10 [c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590 [c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620 [c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0 [c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70 Instruction dump: 485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120 7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000 Fix this by updating cpu_cpu_map aka cpumask_of_node() on every CPU online/offline. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-5-srikar@linux.vnet.ibm.com	2021-08-27 00:56:54 +10:00
Srikar Dronamraju	544a09ee74	powerpc/numa: Print debug statements only when required Currently, a debug message gets printed every time an attempt to add(remove) a CPU. However this is redundant if the CPU is already added (removed) from the node. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-4-srikar@linux.vnet.ibm.com	2021-08-27 00:56:54 +10:00
Srikar Dronamraju	506c2075ff	powerpc/numa: convert printk to pr_xxx Convert the remaining printk to pr_xxx One advantage would be all prints will now have prefix "numa:" from pr_fmt(). [ convert printk(KERN_ERR) to pr_warn : Suggested by Laurent Dufour ] Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [mpe: Rebase onto powerpc/next, s/WARNING/Warning/] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-3-srikar@linux.vnet.ibm.com	2021-08-27 00:56:54 +10:00
Srikar Dronamraju	544af64297	powerpc/numa: Drop dbg in favour of pr_debug powerpc supported numa=debug which is not documented. This option was used to print early debug output. However something more flexible can be achieved by using CONFIG_DYNAMIC_DEBUG. Hence drop dbg (and numa=debug) in favour of pr_debug Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [mpe: Rebase on to powerpc/next form2 affinity changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210826100521.412639-2-srikar@linux.vnet.ibm.com	2021-08-27 00:56:53 +10:00
Christophe Leroy	806c0e6e7e	powerpc: Refactor verification of MSR_RI 40x and BOOKE don't have MSR_RI therefore all tests involving MSR_RI may be problematic on those plateforms. Create helpers to check or set MSR_RI in regs, and use them in common code. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c2fb93708196734f4176dda334aaa3055f213b89.1629707037.git.christophe.leroy@csgroup.eu	2021-08-26 21:21:07 +10:00
Christophe Leroy	e084728393	powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP This patch converts powerpc to the generic PTDUMP implementation. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/03166d569526be70214fe9370a7bad219d2f41c8.1625762907.git.christophe.leroy@csgroup.eu	2021-08-25 13:35:48 +10:00
Christophe Leroy	cf98d2b6ee	powerpc/ptdump: Reduce level numbers by 1 in note_page() and add p4d level Do the same as commit `f8f0d0b6fa` ("mm: ptdump: reduce level numbers by 1 in note_page()") and add missing p4d level. This will align powerpc to the users of generic ptdump. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d76495c574132b197b445a1f133755cca4b912a4.1625762906.git.christophe.leroy@csgroup.eu	2021-08-25 13:35:48 +10:00
Christophe Leroy	64b87b0c70	powerpc/ptdump: Remove unused 'page_size' parameter note_page_update_state() doesn't use page_size. Remove it. Could also be removed to note_page() but as a following patch will remove all current users of note_page(), just leave it as is for now. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e2f80d052001155251bfe009c360d0c5d9242c6b.1625762906.git.christophe.leroy@csgroup.eu	2021-08-25 13:35:48 +10:00
Christophe Leroy	11f27a7fa4	powerpc/ptdump: Use DEFINE_SHOW_ATTRIBUTE() Use DEFINE_SHOW_ATTRIBUTE() instead of open coding open() and fops. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b864a92693ca8413ef0b19f0c12065c212899b6e.1625762905.git.christophe.leroy@csgroup.eu	2021-08-25 13:35:48 +10:00
Christophe Leroy	f5007dbf4d	powerpc/booke: Avoid link stack corruption in several places Use bcl 20,31,+4 instead of bl in order to preserve link stack. See commit `c974809a26` ("powerpc/vdso: Avoid link stack corruption in __get_datapage()") for details. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e9fbc285eceb720e6c0e032ef47fe8b05f669b48.1629791751.git.christophe.leroy@csgroup.eu	2021-08-25 13:35:47 +10:00
Linus Torvalds	1bdc3d5be7	powerpc fixes for 5.14 #6 - Fix random crashes on some 32-bit CPUs by adding isync() after locking/unlocking KUEP - Fix intermittent crashes when loading modules with strict module RWX - Fix a section mismatch introduce by a previous fix. Thanks to: Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo Opsfelder Araújo, Nathan Chancellor, Stan Johnson. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEhjVUTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM/VD/4o5D9d2Xppt+T0zdnKomq+3ffkC33/ zBK4vqOVOXbnlRpChqIqsHB3LNxMNMvTVaoLvxgy3ZQ57+rnirSDaFOaj4Nbazdx STwWmyxW9xPshqvj8tz8uHadSkvbrCClFy59FXtJf4H/iztTnQORKnXI9r3wxXS+ wBhw8Nhquuqg4O5h4q6yLLRIAaskus7uymDzYHVZkHO0RhfPLEZJwfCxydc29ukK wIRB6qojFBbWm/UscY1w6FiYrBn4Y5F3DzoTzJ7xlO6l1NYaE+58aun/oTGU7922 /8fXYs34TnkF6sA9qGhOtOc1MXfH8meFoH9s/fY3Z3O88xTe8k15wo2Ujlk/u0X1 1Gzv9FZI0RnpPtSLPiiu72/zS/vFOxAVCFMTvcodlte9RN90fW5Qwt/O1ya22vWt Ea3O9iNmYgQ+lV7ZZYDtKQ22WHIublg6cY5d3NDyj5HrzN/vGyp3QJFb2dnWoEpx k/KkK16oiIlduLGiFoYjn1ELyHUBTvp483y7zmspA4fCb0ue6W8b2zt8FszH0hI7 N4uroGXuk9OyhNsLWR8UHUR0s6Gi0XSaQ0O4XgWfoDAAvdev4oZiCqw0q5552OvX eE/Ogxc7INCiaoeLwOhYhCKjr+jBP8fhqyQzquyqMgUqEbxLtcFZCJ09bpXHjjiH OlAvZwlzOhwcKg== =K2B0 -----END PGP SIGNATURE----- Merge tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix random crashes on some 32-bit CPUs by adding isync() after locking/unlocking KUEP - Fix intermittent crashes when loading modules with strict module RWX - Fix a section mismatch introduce by a previous fix. Thanks to Christophe Leroy, Fabiano Rosas, Laurent Vivier, Murilo Opsfelder Araújo, Nathan Chancellor, and Stan Johnson. h# -----BEGIN PGP SIGNATURE----- * tag 'powerpc-5.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fix set_memory_*() against concurrent accesses powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP powerpc/xive: Do not mark xive_request_ipi() as __init	2021-08-22 09:49:31 -07:00
Michael Ellerman	9f7853d760	powerpc/mm: Fix set_memory_*() against concurrent accesses Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes on one of his systems: kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc008000004073278 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ... CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12 Workqueue: events control_work_handler [virtio_console] NIP: c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0 REGS: c00000002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+) MSR: 800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 200400cf ... NIP fill_queue+0xf0/0x210 [virtio_console] LR fill_queue+0xf0/0x210 [virtio_console] Call Trace: fill_queue+0xb4/0x210 [virtio_console] (unreliable) add_port+0x1a8/0x470 [virtio_console] control_work_handler+0xbc/0x1e8 [virtio_console] process_one_work+0x290/0x590 worker_thread+0x88/0x620 kthread+0x194/0x1a0 ret_from_kernel_thread+0x5c/0x64 Jordan, Fabiano & Murilo were able to reproduce and identify that the problem is caused by the call to module_enable_ro() in do_init_module(), which happens after the module's init function has already been called. Our current implementation of change_page_attr() is not safe against concurrent accesses, because it invalidates the PTE before flushing the TLB and then installing the new PTE. That leaves a window in time where there is no valid PTE for the page, if another CPU tries to access the page at that time we see something like the fault above. We can't simply switch to set_pte_at()/flush TLB, because our hash MMU code doesn't handle a set_pte_at() of a valid PTE. See [1]. But we do have pte_update(), which replaces the old PTE with the new, meaning there's no window where the PTE is invalid. And the hash MMU version hash__pte_update() deals with synchronising the hash page table correctly. [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r.fsf@linux.ibm.com/ Fixes: `1f9ad21c3b` ("powerpc/mm: Implement set_memory() routines") Reported-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Murilo Opsfelder Araújo <muriloo@linux.ibm.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210818120518.3603172-1-mpe@ellerman.id.au	2021-08-19 09:41:54 +10:00
Aneesh Kumar K.V	1c6b5a7e74	powerpc/pseries: Add support for FORM2 associativity PAPR interface currently supports two different ways of communicating resource grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping. Form 0 is the older format and is now considered deprecated. This patch adds another resource grouping named FORM2. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:27 +10:00
Aneesh Kumar K.V	ef31cb83d1	powerpc/pseries: Add a helper for form1 cpu distance This helper is only used with the dispatch trace log collection. A later patch will add Form2 affinity support and this change helps in keeping that simpler. Also add a comment explaining we don't expect the code to be called with FORM0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:27 +10:00
Aneesh Kumar K.V	8ddc6448ec	powerpc/pseries: Consolidate different NUMA distance update code paths The associativity details of the newly added resourced are collected from the hypervisor via "ibm,configure-connector" rtas call. Update the numa distance details of the newly added numa node after the above call. Instead of updating NUMA distance every time we lookup a node id from the associativity property, add helpers that can be used during boot which does this only once. Also remove the distance update from node id lookup helpers. Currently, we duplicate parsing code for ibm,associativity and ibm,associativity-lookup-arrays in the kernel. The associativity array provided by these device tree properties are very similar and hence can use a helper to parse the node id and numa distance details. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V	0eacd06bb8	powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY Also make related code cleanup that will allow adding FORM2_AFFINITY in later patches. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V	7e35ef662c	powerpc/pseries: rename min_common_depth to primary_domain_index No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-2-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V	dbf77fed8b	powerpc: rename powerpc_debugfs_root to arch_debugfs_dir No functional change in this patch. arch_debugfs_dir is the generic kernel name declared in linux/debugfs.h for arch-specific debugfs directory. Architectures like x86/s390 already use the name. Rename powerpc specific powerpc_debugfs_root to arch_debugfs_dir. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V	3e188b1ae8	powerpc/book3s64/radix: make tlb_single_page_flush_ceiling a debugfs entry Similar to x86/s390 add a debugfs file to tune tlb_single_page_flush_ceiling. Also add a debugfs entry for tlb_local_single_page_flush_ceiling. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-1-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:26 +10:00
Laurent Dufour	d144f4d5a8	pseries/drmem: update LMBs after LPM After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be updated by the hypervisor in the case the NUMA topology of the LPAR's memory is updated. This is handled by the kernel, but the memory's node is not updated because there is no way to move a memory block between nodes from the Linux kernel point of view. If later a memory block is added or removed, drmem_update_dt() is called and it is overwriting the DT node ibm,dynamic-reconfiguration-memory to match the added or removed LMB. But the LMB's associativity node has not been updated after the DT node update and thus the node is overwritten by the Linux's topology instead of the hypervisor one. Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is updated to force an update of the LMB's associativity. However, ignore the call to that hook when the update has been triggered by drmem_update_dt(). Because, in that case, the LMB tree has been used to set the DT property and thus it doesn't need to be updated back. Since drmem_update_dt() is called under the protection of the device_hotplug_lock and the hook is called in the same context, use a simple boolean variable to detect that call. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com	2021-08-10 23:14:55 +10:00
Laurent Dufour	9c7248bb8d	powerpc/numa: Consider the max NUMA node for migratable LPAR When a LPAR is migratable, we should consider the maximum possible NUMA node instead of the number of NUMA nodes from the actual system. The DT property 'ibm,current-associativity-domains' defines the maximum number of nodes the LPAR can see when running on that box. But if the LPAR is being migrated on another box, it may see up to the nodes defined by 'ibm,max-associativity-domains'. So if a LPAR is migratable, that value should be used. Unfortunately, there is no easy way to know if an LPAR is migratable or not. The hypervisor exports the property 'ibm,migratable-partition' in the case it set to migrate partition, but that would not mean that the current partition is migratable. Without this patch, when a LPAR is started on a 2 node box and then migrated to a 3 node box, the hypervisor may spread the LPAR's CPUs on the 3rd node. In that case if a CPU from that 3rd node is added to the LPAR, it will be wrongly assigned to the node because the kernel has been set to use up to 2 nodes (the configuration of the departure node). With this patch applies, the CPU is correctly added to the 3rd node. Fixes: `f9f130ff2e` ("powerpc/numa: Detect support for coregroup") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210511073136.17795-1-ldufour@linux.ibm.com	2021-08-10 23:14:55 +10:00
Hari Bathini	8119cefd9a	powerpc/kexec: blacklist functions called in real mode for kprobe As kprobe does not handle events happening in real mode, blacklist the functions that only get called in real mode or in kexec sequence with MMU turned off. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/162626687834.155313.4692863392927831843.stgit@hbathini-workstation.ibm.com	2021-07-26 20:38:51 +10:00
Jonathan Marek	d8a719059b	Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge" This reverts commit `c742199a01`. `c742199a01` ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge") breaks arm64 in at least two ways for configurations where PUD or PMD folding occur: 1. We no longer install huge-vmap mappings and silently fall back to page-granular entries, despite being able to install block entries at what is effectively the PGD level. 2. If the linear map is backed with block mappings, these will now silently fail to be created in alloc_init_pud(), causing a panic early during boot. The pgtable selftests caught this, although a fix has not been forthcoming and Christophe is AWOL at the moment, so just revert the change for now to get a working -rc3 on which we can queue patches for 5.15. A simple revert breaks the build for 32-bit PowerPC 8xx machines, which rely on the default function definitions when the corresponding page-table levels are folded, since commit `a6a8f7c4aa` ("powerpc/8xx: add support for huge pages on VMAP and VMALLOC"), eg: powerpc64-linux-ld: mm/vmalloc.o: in function `vunmap_pud_range': linux/mm/vmalloc.c:362: undefined reference to `pud_clear_huge' To avoid that, add stubs for pud_clear_huge() and pmd_clear_huge() in arch/powerpc/mm/nohash/8xx.c as suggested by Christophe. Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: `c742199a01` ("mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge") Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> [mpe: Fold in 8xx.c changes from Christophe and mention in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/linux-arm-kernel/CAMuHMdXShORDox-xxaeUfDW3wx2PeggFSqhVSHVZNKCGK-y_vQ@mail.gmail.com/ Link: https://lore.kernel.org/r/20210717160118.9855-1-jonathan@marek.ca Link: https://lore.kernel.org/r/87r1fs1762.fsf@mpe.ellerman.id.au Signed-off-by: Will Deacon <will@kernel.org>	2021-07-21 11:28:09 +01:00
Linus Torvalds	1459718d7d	powerpc fixes for 5.14 #2 Fix crashes on 64-bit Book3E due to use of Book3S only mtmsrd instruction. Fix "scheduling while atomic" warnings at boot due to preempt count underflow. Two commits fixing our handling of BPF atomic instructions. Fix error handling in xive when allocating an IPI. Fix lockup on kernel exec fault on 603. Thanks to: Bharata B Rao, Cédric Le Goater, Christian Zigotzky, Christophe Leroy, Guenter Roeck, Jiri Olsa, Naveen N. Rao, Nicholas Piggin, Valentin Schneider. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDoT7cTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFbSEACa703VD0R/Zx7/MASoc0nfLkoyDvOS 1dQ7isuNLLwTgFCymS36w5cyUxpQJUepUFLjZajJ5rHgGM60i78trZgrxKp5o+us r/F530QI2RZdko2rrtoSVUe6Oj4z0l2lYtBHa7UybKIRG7R/po/DmYYe1/Wmq7Bc fu/R3C7m5/HY63E2mzdCPFHfNDPZavScXyPUpfgEka7PGDJsTCZqIOzeGt9oqm6f lEysuIvBNvq5NVvUOaBiW4dlbkxckVANvj43kjeX+c0YnJ7MTW/xCYl4AuaAxL2T Kc23mWgTj2ONrThbhrB5Bq2uf9bMA+4cJKTRfWiGslxLyhXP59jBnJvn6H5oCPf/ pr/iLjA8bBS7HnaeAEjC74IIs8SDnhkWjW9ec3Da2Z4ihl8W15Bkf0+g3GOMdj3J hrphnhAayr4NKuXgkI/SfoFrAiH+V00LabPA1IFm5Zgvs5DSX/Lygtcf/kNcCZeQ jI0jgy5HjmVhTyySw+htVicdW8xJ/rvXqJDguLkxiNEZN0PF4UQSPUX+ARBnMUDT Bn/RSGWrgMR5VI1+kpYephhv/PFmF6dKUFpSFrBqt/sZ7rVj4jJCZpNdLDmkaIYi v1H4eNN3p85KeApa4PJXRuXrxi5F5XRkGTlk2Sy51ClyFYT6WlzKUcOhMlG4HuzO oro2jl2LF9steg== =q3Ag -----END PGP SIGNATURE----- Merge tag 'powerpc-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix crashes on 64-bit Book3E due to use of Book3S only mtmsrd instruction. Fix "scheduling while atomic" warnings at boot due to preempt count underflow. Two commits fixing our handling of BPF atomic instructions. Fix error handling in xive when allocating an IPI. Fix lockup on kernel exec fault on 603. Thanks to Bharata B Rao, Cédric Le Goater, Christian Zigotzky, Christophe Leroy, Guenter Roeck, Jiri Olsa, Naveen N. Rao, Nicholas Piggin, and Valentin Schneider" * tag 'powerpc-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/preempt: Don't touch the idle task's preempt_count during hotplug powerpc/64e: Fix system call illegal mtmsrd instruction powerpc/xive: Fix error handling when allocating an IPI powerpc/bpf: Reject atomic ops in ppc32 JIT powerpc/bpf: Fix detecting BPF atomic instructions powerpc/mm: Fix lockup on kernel exec fault	2021-07-09 10:26:52 -07:00
Aneesh Kumar K.V	cec6515abb	powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache flush_tlb_range is special in that we don't specify the page size used for the translation. Hence when flushing TLB we flush the translation cache for all possible page sizes. The kernel also uses the same interface when moving page tables around. Such a move requires us to flush the page walk cache. Instead of adding another interface to force page walk cache flush, update flush_tlb_range to flush page walk cache if the range flushed is more than the PMD range. A page table move will always involve an invalidate range more than PMD_SIZE. Running microbenchmark with mprotect and parallel memory access didn't show any observable performance impact. Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-08 11:48:23 -07:00
Aneesh Kumar K.V	dc4875f0e7	mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * No functional change in this patch. [aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot] Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-08 11:48:22 -07:00
Aneesh Kumar K.V	9cf6fa2458	mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * No functional change in this patch. [aneesh.kumar@linux.ibm.com: fix] Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com [sfr@canb.auug.org.au: another fix] Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-08 11:48:22 -07:00
Christophe Leroy	cd5d5e602f	powerpc/mm: Fix lockup on kernel exec fault The powerpc kernel is not prepared to handle exec faults from kernel. Especially, the function is_exec_fault() will return 'false' when an exec fault is taken by kernel, because the check is based on reading current->thread.regs->trap which contains the trap from user. For instance, when provoking a LKDTM EXEC_USERSPACE test, current->thread.regs->trap is set to SYSCALL trap (0xc00), and the fault taken by the kernel is not seen as an exec fault by set_access_flags_filter(). Commit `d7df2443cd` ("powerpc/mm: Fix spurious segfaults on radix with autonuma") made it clear and handled it properly. But later on commit `d3ca587404` ("powerpc/mm: Fix reporting of kernel execute faults") removed that handling, introducing test based on error_code. And here is the problem, because on the 603 all upper bits of SRR1 get cleared when the TLB instruction miss handler bails out to ISI. Until commit `cbd7e6ca02` ("powerpc/fault: Avoid heavy search_exception_tables() verification"), an exec fault from kernel at a userspace address was indirectly caught by the lack of entry for that address in the exception tables. But after that commit the kernel mainly relies on KUAP or on core mm handling to catch wrong user accesses. Here the access is not wrong, so mm handles it. It is a minor fault because PAGE_EXEC is not set, set_access_flags_filter() should set PAGE_EXEC and voila. But as is_exec_fault() returns false as explained in the beginning, set_access_flags_filter() bails out without setting PAGE_EXEC flag, which leads to a forever minor exec fault. As the kernel is not prepared to handle such exec faults, the thing to do is to fire in bad_kernel_fault() for any exec fault taken by the kernel, as it was prior to commit `d3ca587404`. Fixes: `d3ca587404` ("powerpc/mm: Fix reporting of kernel execute faults") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/024bb05105050f704743a0083fe3548702be5706.1625138205.git.christophe.leroy@csgroup.eu	2021-07-05 22:23:24 +10:00
Linus Torvalds	019b3fd94b	powerpc updates for 5.14 - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, Zhen Lei. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDfFS4THG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFxHEAC88NJ+Gz87LiTQFt6QjhziBaJUd+sY uqADPRROr4P50O8PjYZbMi2qbXzOlLkZO4wJWX7jpZ1F9KmbPNqY2shD8h4ahyge F/uqzBW1FXBJfnDEKdU2MzalkeTP+dwxLZyouUamjDCGNLFjOV4x/Fft5otOdXjO k9uO6yoGyOkWYzjC+Y/irNPlIDDByB/+bD92Cb52Y2mXMDDEnx4JzbtkeJW+8udT Sjn3bWzeL+dz5GehjMKwK4+SptNiyQGOgM8FwtnKUMvgzxv04DqCGjr9YC12L2Z7 VoFZc4GzVgtf8DZg4fJ3KG5aG2nH3Tui7jc9lUckdrxixDAZw5wSG7CQ39gFb/5+ 7A4fEJk4Z3h5llibwxAZrC7wV8ZDDXn8oRFzRcOJjfxYaD+ohOOyWHIebwkdiXYx nfYI7sBcScDLXeBvHtDra2GJpbFSpVL3S/QNhhi1vKVNrFSyAgbAybcVL2xPLZ6+ 8Mh7A8xt+hf2bo9AXuYJDo9mwXWfg1093d0kT+AslcRhZioBk18c2AiZLIz0FzuL Ua/e5FPb99x9LSdcZHvaAXBoHT2iTgDyCyDa3gkIesyuRX6ggHoFcVQuvdDcbJ9d H8LK+Tahy1Y+E5b6KdtU8mDEGE+QG+CWLnwQ6YSCaL/MYgaFzNa32Jdj1fmztSBC cttP43kHZ7ljTw== =zo4d -----END PGP SIGNATURE----- Merge tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ...	2021-07-02 12:54:34 -07:00
Nicholas Piggin	5567b1ee29	powerpc/64s: fix hash page fault interrupt handler The early bad fault or key fault test in do_hash_fault() ends up calling into ___do_page_fault without having gone through an interrupt handler wrapper (except the initial _RAW one). This can end up calling local irq functions while the interrupt has not been reconciled, which will likely cause crashes and it trips up on a later patch that adds more assertions. pkey_exec_prot from selftests causes this path to be executed. There is no real reason to run the in_nmi() test should be performed before the key fault check. In fact if a perf interrupt in the hash fault code did a stack walk that was made to take a key fault somehow then running ___do_page_fault could possibly cause another hash fault causing problems. Move the in_nmi() test first, and then do everything else inside the regular interrupt handler function. Fixes: `3a96570ffc` ("powerpc: convert interrupt handlers to use wrappers") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-2-npiggin@gmail.com	2021-06-30 22:21:19 +10:00
Christophe Leroy	fc4999864b	powerpc/4xx: Fix setup_kuep() on SMP On SMP, setup_kuep() is also called from start_secondary() since commit `86f46f3432` ("powerpc/32s: Initialise KUAP and KUEP in C"). start_secondary() is not an __init function. Remove the __init marker from setup_kuep() and bail out when not caller on the first CPU as the work is already done. Fixes: `10248dcba1` ("powerpc/44x: Implement Kernel Userspace Exec Protection (KUEP)") Fixes: `86f46f3432` ("powerpc/32s: Initialise KUAP and KUEP in C") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ee05934288994a65743a987acb1558f12c0c8c1.1624969450.git.christophe.leroy@csgroup.eu	2021-06-30 22:21:02 +10:00
Christophe Leroy	c89e632658	powerpc/32s: Fix setup_{kuap/kuep}() on SMP On SMP, setup_kup() is also called from start_secondary(). start_secondary() is not an __init function. Remove the __init marker from setup_kuep() and setup_kuap(). Fixes: `86f46f3432` ("powerpc/32s: Initialise KUAP and KUEP in C") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/42f4bd12b476942e4d5dc81c0e839d8871b20b1c.1624863319.git.christophe.leroy@csgroup.eu	2021-06-30 22:20:39 +10:00
Linus Torvalds	65090f30ab	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...	2021-06-29 17:29:11 -07:00
Linus Torvalds	21edf50948	Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbIg8THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYobZNEAC2wTq3Ishk026va7g5mbQVSvAQyf8G 0msmgJ48lJWVL9a6JUogNcCO7sZCTcAy4CYbuHI6kz1fGZZnNWSCrtEz0rFNAdWE WVR2k8ExR2R73vJm+K50WUMMj8YsefRnIFXWlJdTp+pksr3TZ7Lo70taGUK/6tMo aL0dqvnf7Vb3LG0iIkaHWLF4HnyK/UGqB+121rlL4UhI1/g+3EUxNWNcY5eg/dmc Ym73U1uDsjydp3/3jm8v8NYNtwCDGpujZZc/88RFLjP6PMpF1S9JUvDEt+LHJi0a cdS3RreB78HYXpLg5NtDFJwIegRMLSitvCGPBjHvWBzbifkMsA2zWIb6Cs8VkYys vuPoEGZ0ol+SWvcnSh5Xy36nyr4iGIBhQql47UAaqelSxsYPjvCCSD4yJV3k8hnC ZuDscOekXUMn75qZR0quNdi1SkgKpGZxK73QFbuW3Apl5EgArVai6kq0rbl6zlx6 ACy0SEcevhOcpU6WpqDgrmUBgFr+M8zina8edRELgiFEuWT6pYxKwrN3pT4U5djO e5V3YuNzzwzvtUoXN4AiTlT8gwRiGfgeiEvHpvZBXPNvk5ffS6XzPiV81ZMWiBkb ReoCbqME3PKoxj1VAHJvVXHbcjiPIJeCRdV+5vQSNh1SPSQOmEdWyJtNUDrSkoym QkKKY5jrOhPhlQ== =FIKh -----END PGP SIGNATURE----- Merge tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...	2021-06-29 12:25:04 -07:00
Mike Rapoport	a9ee6cf5c6	mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA configuration options are equivalent. Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. Done with $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ $(git grep -wl NEED_MULTIPLE_NODES) with manual tweaks afterwards. [rppt@linux.ibm.com: fix arm boot crash] Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-29 10:53:55 -07:00
Linus Torvalds	36824f198c	ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4 QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6 +/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w== =NcRh -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This covers all architectures (except MIPS) so I don't expect any other feature pull requests this merge window. ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits) KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled kvm: x86: disable the narrow guest module parameter on unload selftests: kvm: Allows userspace to handle emulation errors. kvm: x86: Allow userspace to handle emulation errors KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic KVM: x86: Enhance comments for MMU roles and nested transition trickiness KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU KVM: x86/mmu: Use MMU's role to determine PTTYPE KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers KVM: x86/mmu: Add a helper to calculate root from role_regs KVM: x86/mmu: Add helper to update paging metadata KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper KVM: x86/mmu: Get nested MMU's root level from the MMU's role ...	2021-06-28 15:40:51 -07:00
Christophe Leroy	6ca6512c71	powerpc/mm: Properly coalesce pages in ptdump Commit `aaa2295292` ("powerpc/mm: Add physical address to Linux page table dump") changed range coalescing to only combine ranges that are both virtually and physically contiguous, in order to avoid erroneous combination of unrelated mappings in IOREMAP space. But in the VMALLOC space, mappings almost never have contiguous physical pages, so the commit mentionned above leads to dumping one line per page for vmalloc mappings. Taking into account the vmalloc always leave a gap between two areas, we never have two mappings dumped as a single combination even if they have the exact same flags. The only space that may have encountered such an issue was the early IOREMAP which is not using vmalloc engine. But previous commits added gaps between early IO mappings, so it is not an issue anymore. That commit created some difficulties with KASAN mappings, see commit `cabe8138b2` ("powerpc: dump as a single line areas mapping a single physical page.") and with huge page, see commit `b00ff6d8c1` ("powerpc/ptdump: Properly handle non standard page size"). So, almost revert commit `aaa2295292` to properly coalesce pages mapped with the same flags as before, only keep the display of the first physical address of the range, as it can be usefull especially for IO mappings. It brings back powerpc at the same level as other architectures and simplifies the conversion to GENERIC PTDUMP. With the patch: ---[ kasan shadow mem start ]--- 0xf8000000-0xf8ffffff 0x07000000 16M huge rw present dirty accessed 0xf9000000-0xf91fffff 0x01434000 2M r present accessed 0xf9200000-0xf95affff 0x02104000 3776K rw present dirty accessed 0xfef5c000-0xfeffffff 0x01434000 656K r present accessed ---[ kasan shadow mem end ]--- Before: ---[ kasan shadow mem start ]--- 0xf8000000-0xf8ffffff 0x07000000 16M huge rw present dirty accessed 0xf9000000-0xf91fffff 0x01434000 16K r present accessed 0xf9200000-0xf9203fff 0x02104000 16K rw present dirty accessed 0xf9204000-0xf9207fff 0x0213c000 16K rw present dirty accessed 0xf9208000-0xf920bfff 0x02174000 16K rw present dirty accessed 0xf920c000-0xf920ffff 0x02188000 16K rw present dirty accessed 0xf9210000-0xf9213fff 0x021dc000 16K rw present dirty accessed 0xf9214000-0xf9217fff 0x02220000 16K rw present dirty accessed 0xf9218000-0xf921bfff 0x023c0000 16K rw present dirty accessed 0xf921c000-0xf921ffff 0x023d4000 16K rw present dirty accessed 0xf9220000-0xf9227fff 0x023ec000 32K rw present dirty accessed ... 0xf93b8000-0xf93e3fff 0x02614000 176K rw present dirty accessed 0xf93e4000-0xf94c3fff 0x027c0000 896K rw present dirty accessed 0xf94c4000-0xf94c7fff 0x0236c000 16K rw present dirty accessed 0xf94c8000-0xf94cbfff 0x041f0000 16K rw present dirty accessed 0xf94cc000-0xf94cffff 0x029c0000 16K rw present dirty accessed 0xf94d0000-0xf94d3fff 0x041ec000 16K rw present dirty accessed 0xf94d4000-0xf94d7fff 0x0407c000 16K rw present dirty accessed 0xf94d8000-0xf94f7fff 0x041c0000 128K rw present dirty accessed ... 0xf95ac000-0xf95affff 0x042b0000 16K rw present dirty accessed 0xfef5c000-0xfeffffff 0x01434000 16K r present accessed ---[ kasan shadow mem end ]--- Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c56ce1f5c3c75adc9811b1a5f9c410fa74183a8d.1618828806.git.christophe.leroy@csgroup.eu	2021-06-25 00:07:10 +10:00
Christophe Leroy	57307f1b6e	powerpc/mm: Leave a gap between early allocated IO areas Vmalloc system leaves a gap between allocated areas. It helps catching overflows. Do the same for IO areas which are allocated with early_ioremap_range() until slab_is_available(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c433e358190fb5d47650463ea1ab755fc7b73e6e.1618828806.git.christophe.leroy@csgroup.eu	2021-06-25 00:07:10 +10:00
Michael Ellerman	3018fbc636	powerpc/64s: Fix boot failure with 4K Radix When using the Radix MMU our PGD is always 64K, and must be naturally aligned. For a 4K page size kernel that means page alignment of swapper_pg_dir is not sufficient, leading to failure to boot. Use the existing MAX_PTRS_PER_PGD which has the correct value, and avoids us hard-coding 64K here. Fixes: `e72421a085` ("powerpc: Define swapper_pg_dir[] in C") Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210624123420.2784187-1-mpe@ellerman.id.au	2021-06-25 00:06:54 +10:00
Michael Ellerman	a736143afd	Merge branch 'topic/ppc-kvm' into next Pull in some more ppc KVM patches we are keeping in our topic branch. In particular this brings in the series to add H_RPT_INVALIDATE.	2021-06-23 00:19:08 +10:00
Bharata B Rao	53324b51c5	KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE Enable support for process-scoped invalidations from nested guests and partition-scoped invalidations for nested guests. Process-scoped invalidations for any level of nested guests are handled by implementing H_RPT_INVALIDATE handler in the nested guest exit path in L0. Partition-scoped invalidation requests are forwarded to the right nested guest, handled there and passed down to L0 for eventual handling. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [aneesh: Nested guest partition-scoped invalidation changes] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Squash in fixup patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-5-bharata@linux.ibm.com	2021-06-22 23:35:37 +10:00
Bharata B Rao	f0c6fbbb90	KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE H_RPT_INVALIDATE does two types of TLB invalidations: 1. Process-scoped invalidations for guests when LPCR[GTSE]=0. This is currently not used in KVM as GTSE is not usually disabled in KVM. 2. Partition-scoped invalidations that an L1 hypervisor does on behalf of an L2 guest. This is currently handled by H_TLB_INVALIDATE hcall and this new replaces the old that. This commit enables process-scoped invalidations for L1 guests. Support for process-scoped and partition-scoped invalidations from/for nested guests will be added separately. Process scoped tlbie invalidations from L1 and nested guests need RS register for TLBIE instruction to contain both PID and LPID. This patch introduces primitives that execute tlbie instruction with both PID and LPID set in prepartion for H_RPT_INVALIDATE hcall. A description of H_RPT_INVALIDATE follows: int64 /* H_Success: Return code on successful completion / / H_Busy - repeat the call with the same / / H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters / hcall(const uint64 H_RPT_INVALIDATE, / Invalidate RPT translation lookaside information / uint64 id, / PID/LPID to invalidate / uint64 target, / Invalidation target / uint64 type, / Type of lookaside information / uint64 pg_sizes, / Page sizes / uint64 start, / Start of Effective Address (EA) range (inclusive) / uint64 end) / End of EA range (exclusive) / Invalidation targets (target) ----------------------------- Core MMU 0x01 / All virtual processors in the partition / Core local MMU 0x02 / Current virtual processor / Nest MMU 0x04 / All nest/accelerator agents in use by the partition / A combination of the above can be specified, except core and core local. Type of translation to invalidate (type) --------------------------------------- NESTED 0x0001 / invalidate nested guest partition-scope / TLB 0x0002 / Invalidate TLB / PWC 0x0004 / Invalidate Page Walk Cache / PRT 0x0008 / Invalidate caching of Process Table Entries if NESTED is clear / PAT 0x0008 / Invalidate caching of Partition Table Entries if NESTED is set / A combination of the above can be specified. Page size mask (pages) ---------------------- 4K 0x01 64K 0x02 2M 0x04 1G 0x08 All sizes (-1UL) A combination of the above can be specified. All page sizes can be selected with -1. Semantics: Invalidate radix tree lookaside information matching the parameters given. Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are different from the defined values. * Return H_PARAMETER if NESTED is set and pid is not a valid nested LPID allocated to this partition * Return H_P5 if (start, end) doesn't form a valid range. Start and end should be a valid Quadrant address and end > start. * Return H_NotSupported if the partition is not in running in radix translation mode. * May invalidate more translation information than requested. * If start = 0 and end = -1, set the range to cover all valid addresses. Else start and end should be aligned to 4kB (lower 11 bits clear). * If NESTED is clear, then invalidate process scoped lookaside information. Else pid specifies a nested LPID, and the invalidation is performed on nested guest partition table and nested guest partition scope real addresses. * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and quadrant 0 spaces, Else valid addresses are quadrant 0. * Pages which are fully covered by the range are to be invalidated. Those which are partially covered are considered outside invalidation range, which allows a caller to optimally invalidate ranges that may contain mixed page sizes. * Return H_SUCCESS on success. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com	2021-06-21 22:54:27 +10:00
Bharata B Rao	d6265cb33b	powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def Add a field to mmu_psize_def to store the page size encodings of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix AP encodings. This will be used when invalidating with required page size encoding in the hcall. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-3-bharata@linux.ibm.com	2021-06-21 22:48:18 +10:00
Christophe Leroy	c988cfd38e	powerpc/32: use set_memory_attr() Use set_memory_attr() instead of the PPC32 specific change_page_attr() change_page_attr() was checking that the address was not mapped by blocks and was handling highmem, but that's unneeded because the affected pages can't be in highmem and block mapping verification is already done by the callers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [ruscur: rebase on powerpc/merge with Christophe's new patches] Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210609013431.9805-10-jniethe5@gmail.com	2021-06-21 21:13:21 +10:00
Christophe Leroy	4d1755b6a7	powerpc/mm: implement set_memory_attr() In addition to the set_memory_xx() functions which allows to change the memory attributes of not (yet) used memory regions, implement a set_memory_attr() function to: - set the final memory protection after init on currently used kernel regions. - enable/disable kernel memory regions in the scope of DEBUG_PAGEALLOC. Unlike the set_memory_xx() which can act in three step as the regions are unused, this function must modify 'on the fly' as the kernel is executing from them. At the moment only PPC32 will use it and changing page attributes on the fly is not an issue. Reported-by: kbuild test robot <lkp@intel.com> [ruscur: cast "data" to unsigned long instead of int] Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210609013431.9805-9-jniethe5@gmail.com	2021-06-21 21:13:21 +10:00
Russell Currey	1f9ad21c3b	powerpc/mm: Implement set_memory() routines The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX, and are generally useful primitives to have. This implementation is designed to be generic across powerpc's many MMUs. It's possible that this could be optimised to be faster for specific MMUs. This implementation does not handle cases where the caller is attempting to change the mapping of the page it is executing from, or if another CPU is concurrently using the page being altered. These cases likely shouldn't happen, but a more complex implementation with MMU-specific code could safely handle them. On hash, the linear mapping is not kept in the linux pagetable, so this will not change the protection if used on that range. Currently these functions are not used on the linear map so just WARN for now. apply_to_existing_page_range() does not work on huge pages so for now disallow changing the protection of huge pages. [jpn: - Allow set memory functions to be used without Strict RWX - Hash: Disallow certain regions - Have change_page_attr() take function pointers to manipulate ptes - Radix: Add ptesync after set_pte_at()] Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210609013431.9805-2-jniethe5@gmail.com	2021-06-21 21:13:20 +10:00
Michael Ellerman	3c53642324	Merge branch 'topic/ppc-kvm' into next Merge some powerpc KVM patches from our topic branch. In particular this brings in Nick's big series rewriting parts of the guest entry/exit path in C. Conflicts: arch/powerpc/kernel/security.c arch/powerpc/kvm/book3s_hv_rmhandlers.S	2021-06-17 16:51:38 +10:00
Aneesh Kumar K.V	07d8ad6fd8	powerpc/mm/book3s64: Fix possible build error Update _tlbiel_pid() such that we can avoid build errors like below when using this function in other places. arch/powerpc/mm/book3s64/radix_tlb.c: In function ‘__radix__flush_tlb_range_psize’: arch/powerpc/mm/book3s64/radix_tlb.c:114:2: warning: ‘asm’ operand 3 probably does not match constraints 114 \| asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1) \| ^~~ arch/powerpc/mm/book3s64/radix_tlb.c:114:2: error: impossible constraint in ‘asm’ make[4]: *** [scripts/Makefile.build:271: arch/powerpc/mm/book3s64/radix_tlb.o] Error 1 m With this fix, we can also drop the __always_inline in __radix_flush_tlb_range_psize which was added by commit `e12d6d7d46` ("powerpc/mm/radix: mark __radix__flush_tlb_range_psize() as __always_inline") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210610083639.387365-1-aneesh.kumar@linux.ibm.com	2021-06-17 16:25:50 +10:00
Christophe Leroy	baf24d23be	powerpc/32: Display modules range in virtual memory layout book3s/32 and 8xx don't use vmalloc for modules. Print the modules area at startup as part of the virtual memory layout: [ 0.000000] Kernel virtual memory layout: [ 0.000000] * 0xffafc000..0xffffc000 : fixmap [ 0.000000] * 0xc9000000..0xffafc000 : vmalloc & ioremap [ 0.000000] * 0xb0000000..0xc0000000 : modules [ 0.000000] Memory: 118480K/131072K available (7152K kernel code, 2320K rwdata, 1328K rodata, 368K init, 854K bss, 12592K reserved, 0K cma-reserved) Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/98394503e92d6fd6d8f657e0b263b32f21cf2790.1623438478.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:11 +10:00
Christophe Leroy	91e9ee7e94	powerpc/32s: Rename PTE_SIZE to PTE_T_SIZE PTE_SIZE means PTE page table size in most placed, whereas in hash_low.S in means size of one entry in the table. Rename it PTE_T_SIZE, and define it directly in hash_low.S instead of going through asm-offsets. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/83a008a9fd6cc3f2bbcb470f592555d260ed7a3d.1623063174.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:10 +10:00
Christophe Leroy	e72421a085	powerpc: Define swapper_pg_dir[] in C Don't duplicate swapper_pg_dir[] in each platform's head.S Define it in mm/pgtable.c Define MAX_PTRS_PER_PGD because on book3s/64 PTRS_PER_PGD is not a constant. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5e3f1b8a4695c33ccc80aa3870e016bef32b85e1.1623063174.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:10 +10:00
Christophe Leroy	45b30fafe5	powerpc: Define empty_zero_page[] in C At the time being, empty_zero_page[] is defined in each platform head.S. Define it in mm/mem.c instead, and put it in BSS section instead of the DATA section. Commit `5227cfa71f` ("arm64: mm: place empty_zero_page in bss") explains why it is interesting to have it in BSS. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5838caffa269e0957c5a50cc85477876220298b0.1623063174.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:10 +10:00
Christophe Leroy	e2c043163d	powerpc/nohash: Remove DEBUG_HARDER DEBUG_HARDER is not user selectable. Remove it together with related messages. Also remove two pr_devel() messages that should likely have been pr_hard(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0f25109b0e12fdd1e6541dedbb2212cc53526a57.1622712515.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:10 +10:00
Christophe Leroy	a36c0faf3d	powerpc/nohash: Remove DEBUG_CLAMP_LAST_CONTEXT DEBUG_CLAMP_LAST_CONTEXT was there in the old days to reduce number of contexts in order to ease debugging implementation of context switching, but that's been quite stable during years now. As it is not user selectable, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/da81837b452e8b9f1657b529b9c3050dc10b9770.1622712515.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:09 +10:00
Christophe Leroy	dac3db1edf	powerpc/nohash: Remove DEBUG_MAP_CONSISTENCY mmu_context handling has been there for years, so we would know if there was problems with maps. DEBUG_MAP_CONSISTENCY is not user selectable, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6fe2b88956db53f8d6ee221525b2c5dc6aec82c6.1622712515.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:09 +10:00
Christophe Leroy	c13066e53a	powerpc/nohash: Remove CONFIG_SMP #ifdefery in mmu_context.h Everything can be done even when CONFIG_SMP is not selected. Just use IS_ENABLED() where relevant and rely on GCC to opt out unneeded code and variables when CONFIG_SMP is not set. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cc13b87b0f750a538621876ecc24c22a07e7c8be.1622712515.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:09 +10:00
Christophe Leroy	a56ab7c729	powerpc/nohash: Convert set_context() to C ppc8xx already has set_context() in C. Other ones have it in assembly. The only thing it does is to write the context id into SPRN_PID. Do it in C. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a5d0759064f3831c6b88af49ef5d3b05ba1c4dad.1622712515.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:09 +10:00
Christophe Leroy	25910260ff	powerpc/nohash: Refactor update of BDI2000 pointers in switch_mmu_context() Instead of duplicating the update of BDI2000 pointers in set_context(), do it directly from switch_mmu_context(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4c54997edd3548fa54717915e7c6ebaf60f208c0.1622712515.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:09 +10:00
Christophe Leroy	16132529ce	powerpc/32s: Rework Kernel Userspace Access Protection On book3s/32, KUAP is provided by toggling Ks bit in segment registers. One segment register addresses 256M of virtual memory. At the time being, KUAP implements a complex logic to apply the unlock/lock on the exact number of segments covering the user range to access, with saving the boundaries of the range of segments in a member of thread struct. But most if not all user accesses are within a single segment. Rework KUAP with a different approach: - Open only one segment, the one corresponding to the starting address of the range to be accessed. - If a second segment is involved, it will generate a page fault. The segment will then be open by the page fault handler. The kuap member of thread struct will now contain: - The start address of the current on going user access, that will be used to know which segment to lock at the end of the user access. - ~0 when no user access is open - ~1 when additionnal segments are opened by a page fault. Then, at lock time - When only one segment is open, close it. - When several segments are open, close all user segments. Almost 100% of the time, only one segment will be involved. In interrupts, inline the function that unlock/lock all segments, because not inlining them implies a lot of register save/restore. With the patch, writing value 128 in userspace in perf_copy_attr() is done with 16 instructions: 3890: 93 82 04 dc stw r28,1244(r2) 3894: 7d 20 e5 26 mfsrin r9,r28 3898: 55 29 00 80 rlwinm r9,r9,0,2,0 389c: 7d 20 e1 e4 mtsrin r9,r28 38a0: 4c 00 01 2c isync 38a4: 39 20 00 80 li r9,128 38a8: 91 3c 00 00 stw r9,0(r28) 38ac: 81 42 04 dc lwz r10,1244(r2) 38b0: 39 00 ff ff li r8,-1 38b4: 91 02 04 dc stw r8,1244(r2) 38b8: 2c 0a ff fe cmpwi r10,-2 38bc: 41 82 00 88 beq 3944 <perf_copy_attr+0x36c> 38c0: 7d 20 55 26 mfsrin r9,r10 38c4: 65 29 40 00 oris r9,r9,16384 38c8: 7d 20 51 e4 mtsrin r9,r10 38cc: 4c 00 01 2c isync ... 3944: 48 00 00 01 bl 3944 <perf_copy_attr+0x36c> 3944: R_PPC_REL24 kuap_lock_all_ool Before the patch it was 118 instructions. In reality only 42 are executed in most cases, but GCC is not able to see that a properly aligned user access cannot involve more than one segment. 5060: 39 1d 00 04 addi r8,r29,4 5064: 3d 20 b0 00 lis r9,-20480 5068: 7c 08 48 40 cmplw r8,r9 506c: 40 81 00 08 ble 5074 <perf_copy_attr+0x2cc> 5070: 3d 00 b0 00 lis r8,-20480 5074: 39 28 ff ff addi r9,r8,-1 5078: 57 aa 00 06 rlwinm r10,r29,0,0,3 507c: 55 29 27 3e rlwinm r9,r9,4,28,31 5080: 39 29 00 01 addi r9,r9,1 5084: 7d 29 53 78 or r9,r9,r10 5088: 91 22 04 dc stw r9,1244(r2) 508c: 7d 20 ed 26 mfsrin r9,r29 5090: 55 29 00 80 rlwinm r9,r9,0,2,0 5094: 7c 08 50 40 cmplw r8,r10 5098: 40 81 00 c0 ble 5158 <perf_copy_attr+0x3b0> 509c: 7d 46 50 f8 not r6,r10 50a0: 7c c6 42 14 add r6,r6,r8 50a4: 54 c6 27 be rlwinm r6,r6,4,30,31 50a8: 7d 20 51 e4 mtsrin r9,r10 50ac: 3c ea 10 00 addis r7,r10,4096 50b0: 39 29 01 11 addi r9,r9,273 50b4: 7f 88 38 40 cmplw cr7,r8,r7 50b8: 55 29 02 06 rlwinm r9,r9,0,8,3 50bc: 40 9d 00 9c ble cr7,5158 <perf_copy_attr+0x3b0> 50c0: 2f 86 00 00 cmpwi cr7,r6,0 50c4: 41 9e 00 4c beq cr7,5110 <perf_copy_attr+0x368> 50c8: 2f 86 00 01 cmpwi cr7,r6,1 50cc: 41 9e 00 2c beq cr7,50f8 <perf_copy_attr+0x350> 50d0: 2f 86 00 02 cmpwi cr7,r6,2 50d4: 41 9e 00 14 beq cr7,50e8 <perf_copy_attr+0x340> 50d8: 7d 20 39 e4 mtsrin r9,r7 50dc: 39 29 01 11 addi r9,r9,273 50e0: 3c e7 10 00 addis r7,r7,4096 50e4: 55 29 02 06 rlwinm r9,r9,0,8,3 50e8: 7d 20 39 e4 mtsrin r9,r7 50ec: 39 29 01 11 addi r9,r9,273 50f0: 3c e7 10 00 addis r7,r7,4096 50f4: 55 29 02 06 rlwinm r9,r9,0,8,3 50f8: 7d 20 39 e4 mtsrin r9,r7 50fc: 3c e7 10 00 addis r7,r7,4096 5100: 39 29 01 11 addi r9,r9,273 5104: 7f 88 38 40 cmplw cr7,r8,r7 5108: 55 29 02 06 rlwinm r9,r9,0,8,3 510c: 40 9d 00 4c ble cr7,5158 <perf_copy_attr+0x3b0> 5110: 7d 20 39 e4 mtsrin r9,r7 5114: 39 29 01 11 addi r9,r9,273 5118: 3c c7 10 00 addis r6,r7,4096 511c: 55 29 02 06 rlwinm r9,r9,0,8,3 5120: 7d 20 31 e4 mtsrin r9,r6 5124: 39 29 01 11 addi r9,r9,273 5128: 3c c6 10 00 addis r6,r6,4096 512c: 55 29 02 06 rlwinm r9,r9,0,8,3 5130: 7d 20 31 e4 mtsrin r9,r6 5134: 39 29 01 11 addi r9,r9,273 5138: 3c c7 30 00 addis r6,r7,12288 513c: 55 29 02 06 rlwinm r9,r9,0,8,3 5140: 7d 20 31 e4 mtsrin r9,r6 5144: 3c e7 40 00 addis r7,r7,16384 5148: 39 29 01 11 addi r9,r9,273 514c: 7f 88 38 40 cmplw cr7,r8,r7 5150: 55 29 02 06 rlwinm r9,r9,0,8,3 5154: 41 9d ff bc bgt cr7,5110 <perf_copy_attr+0x368> 5158: 4c 00 01 2c isync 515c: 39 20 00 80 li r9,128 5160: 91 3d 00 00 stw r9,0(r29) 5164: 38 e0 00 00 li r7,0 5168: 90 e2 04 dc stw r7,1244(r2) 516c: 7d 20 ed 26 mfsrin r9,r29 5170: 65 29 40 00 oris r9,r9,16384 5174: 40 81 00 c0 ble 5234 <perf_copy_attr+0x48c> 5178: 7d 47 50 f8 not r7,r10 517c: 7c e7 42 14 add r7,r7,r8 5180: 54 e7 27 be rlwinm r7,r7,4,30,31 5184: 7d 20 51 e4 mtsrin r9,r10 5188: 3d 4a 10 00 addis r10,r10,4096 518c: 39 29 01 11 addi r9,r9,273 5190: 7c 08 50 40 cmplw r8,r10 5194: 55 29 02 06 rlwinm r9,r9,0,8,3 5198: 40 81 00 9c ble 5234 <perf_copy_attr+0x48c> 519c: 2c 07 00 00 cmpwi r7,0 51a0: 41 82 00 4c beq 51ec <perf_copy_attr+0x444> 51a4: 2c 07 00 01 cmpwi r7,1 51a8: 41 82 00 2c beq 51d4 <perf_copy_attr+0x42c> 51ac: 2c 07 00 02 cmpwi r7,2 51b0: 41 82 00 14 beq 51c4 <perf_copy_attr+0x41c> 51b4: 7d 20 51 e4 mtsrin r9,r10 51b8: 39 29 01 11 addi r9,r9,273 51bc: 3d 4a 10 00 addis r10,r10,4096 51c0: 55 29 02 06 rlwinm r9,r9,0,8,3 51c4: 7d 20 51 e4 mtsrin r9,r10 51c8: 39 29 01 11 addi r9,r9,273 51cc: 3d 4a 10 00 addis r10,r10,4096 51d0: 55 29 02 06 rlwinm r9,r9,0,8,3 51d4: 7d 20 51 e4 mtsrin r9,r10 51d8: 3d 4a 10 00 addis r10,r10,4096 51dc: 39 29 01 11 addi r9,r9,273 51e0: 7c 08 50 40 cmplw r8,r10 51e4: 55 29 02 06 rlwinm r9,r9,0,8,3 51e8: 40 81 00 4c ble 5234 <perf_copy_attr+0x48c> 51ec: 7d 20 51 e4 mtsrin r9,r10 51f0: 39 29 01 11 addi r9,r9,273 51f4: 3c ea 10 00 addis r7,r10,4096 51f8: 55 29 02 06 rlwinm r9,r9,0,8,3 51fc: 7d 20 39 e4 mtsrin r9,r7 5200: 39 29 01 11 addi r9,r9,273 5204: 3c e7 10 00 addis r7,r7,4096 5208: 55 29 02 06 rlwinm r9,r9,0,8,3 520c: 7d 20 39 e4 mtsrin r9,r7 5210: 39 29 01 11 addi r9,r9,273 5214: 3c ea 30 00 addis r7,r10,12288 5218: 55 29 02 06 rlwinm r9,r9,0,8,3 521c: 7d 20 39 e4 mtsrin r9,r7 5220: 3d 4a 40 00 addis r10,r10,16384 5224: 39 29 01 11 addi r9,r9,273 5228: 7c 08 50 40 cmplw r8,r10 522c: 55 29 02 06 rlwinm r9,r9,0,8,3 5230: 41 81 ff bc bgt 51ec <perf_copy_attr+0x444> 5234: 4c 00 01 2c isync Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Export the ool handlers to fix build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Christophe Leroy	6b4d630068	powerpc/32s: Allow disabling KUAP at boot time PPC64 uses MMU features to enable/disable KUAP at boot time. But feature fixups are applied way too early on PPC32. Now that all KUAP related actions are in C following the conversion of KUAP initial setup and context switch in C, static branches can be used to enable/disable KUAP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Export disable_kuap_key to fix build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cd79e8008455fba5395d099f9bb1305c039b931c.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Christophe Leroy	50d2f104cd	powerpc/32s: Allow disabling KUEP at boot time PPC64 uses MMU features to enable/disable KUEP at boot time. But feature fixups are applied way too early on PPC32. Now that all KUEP related actions are in C following the conversion of KUEP initial setup and context switch in C, static branches can be used to enable/disable KUEP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7745a2c3a08ec46302920a3f48d1cb9b5469dbbb.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Christophe Leroy	86f46f3432	powerpc/32s: Initialise KUAP and KUEP in C In order to selectively activate KUAP and KUEP in a following patch, perform KUAP and KUEP initialisation in C. Unlike PPC64, PPC32 doesn't have an early_setup_secondary(), so do it in start_secondary(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/87be72023448dd4e476744ed279b8c04b8d08a1c.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Christophe Leroy	863771a28e	powerpc/32s: Convert switch_mmu_context() to C switch_mmu_context() does things that can easily be done in C. For updating user segments, we have update_user_segments(). As mentionned in commit `b5efec00b6` ("powerpc/32s: Move KUEP locking/unlocking in C"), update_user_segments() has the loop unrolled which is a significant performance gain. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05c0875ad8220c03452c3a334946e207c6ca04d6.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Christophe Leroy	7235bb3593	powerpc/32s: move CTX_TO_VSID() into mmu-hash.h In order to reuse it in switch_mmu_context(), this patch moves CTX_TO_VSID() macro into asm/book3s/32/mmu-hash.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/26b36ef2939234a04b37baf6ffe50cba81f5d1b7.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Christophe Leroy	91bb30822a	powerpc/32s: Refactor update of user segment registers KUEP implements the update of user segment registers. Move it into mmu-hash.h in order to use it from other places. And inline kuep_lock() and kuep_unlock(). Inlining kuep_lock() is important for system_call_exception(), otherwise system_call_exception() has to save into stack the system call parameters that are used just after, and doing that takes more instructions than kuep_lock() itself. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/24591ca480d14a62ef910e38a5273d551262c4a2.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:07 +10:00
Christophe Leroy	91ec66719d	powerpc/32s: Move setup_{kuep/kuap}() into {kuep/kuap}.c Avoids the #ifdef in mmu.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0b7a13d414837e58264edc336b89c2fe9f35f9bc.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:07 +10:00
Christophe Leroy	f6025a140b	powerpc/8xx: Allow disabling KUAP at boot time PPC64 uses MMU features to enable/disable KUAP at boot time. But feature fixups are applied way too early on PPC32. But since commit `c16728835e` ("powerpc/32: Manage KUAP in C"), all KUAP is in C so it is now possible to use static branches. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3dca510ce555335261a47c4799167da698f569c0.1622782111.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:07 +10:00
Christophe Leroy	10248dcba1	powerpc/44x: Implement Kernel Userspace Exec Protection (KUEP) Powerpc 44x has two bits for exec protection in TLBs: one for user (UX) and one for superviser (SX). Clear SX on user pages in TLB miss handlers to provide KUEP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/169310e08152aa1d96c979770291d165ec6896ae.1622616032.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:07 +10:00
Christophe Leroy	69d4d6e5fd	powerpc: Don't use 'struct ppc_inst' to reference instruction location 'struct ppc_inst' is an internal representation of an instruction, but in-memory instructions are and will remain a table of 'u32' forever. Replace all 'struct ppc_inst ' used for locating an instruction in memory by 'u32 '. This removes a lot of undue casts to 'struct ppc_inst *'. It also helps locating ab-use of 'struct ppc_inst' dereference. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix ppc_inst_next(), use u32 instead of unsigned int] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:00 +10:00
Nicholas Piggin	2e1ae9cd56	KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU Rather than partition the guest PID space + flush a rogue guest PID to work around this problem, instead fix it by always disabling the MMU when switching in or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivally implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Marc Zyngier	13a9a5d17d	powerpc: Add missing linux/{of.h,irqdomain.h} include directives A bunch of PPC files are missing the inclusion of linux/of.h and linux/irqdomain.h, relying on transitive inclusion from another file. As we are about to break this dependency, make sure these dependencies are explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-10 13:09:16 +01:00
Christophe Leroy	8e11d62e2e	powerpc/mem: Add back missing header to fix 'no previous prototype' error Commit `b26e8f2725` ("powerpc/mem: Move cache flushing functions into mm/cacheflush.c") removed asm/sparsemem.h which is required when CONFIG_MEMORY_HOTPLUG is selected to get the declaration of create_section_mapping(). Add it back. Fixes: `b26e8f2725` ("powerpc/mem: Move cache flushing functions into mm/cacheflush.c") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3e5b63bb3daab54a1eb9c20221c2e9528c4db9b3.1622883330.git.christophe.leroy@csgroup.eu	2021-06-06 21:43:11 +10:00
Linus Torvalds	8404c9fbc8	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "The remainder of the main mm/ queue. 143 patches. Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits) kfence: use power-efficient work queue to run delayed work kfence: maximize allocation wait timeout duration kfence: await for allocation using wait_event kfence: zero guard page after out-of-bounds access mm/process_vm_access.c: remove duplicate include mm/mempool: minor coding style tweaks mm/highmem.c: fix coding style issue btrfs: use memzero_page() instead of open coded kmap pattern iov_iter: lift memzero_page() to highmem.h mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. mm/zswap.c: switch from strlcpy to strscpy arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE mm,memory_hotplug: add kernel boot option to enable memmap_on_memory acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: allocate memmap from the added memory range mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() mm,memory_hotplug: relax fully spanned sections check drivers/base/memory: introduce memory_block_{online,offline} mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove ...	2021-05-05 13:50:15 -07:00
Peter Xu	aec44e0f02	hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4. This series tries to disable huge pmd unshare of hugetlbfs backed memory for uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage, the idea of this series may be needed for multiple tasks (Axel's uffd minor fault series, and Mike's soft dirty series), so I picked it out from the larger series. This patch (of 4): It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. [peterx@redhat.com: build fix] Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:20 -07:00
Linus Torvalds	d42f323a7d	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "A few misc subsystems and some of MM. 175 patches. Subsystems affected by this patch series: ia64, kbuild, scripts, sh, ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub, kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap, mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits) mm/memory-failure: unnecessary amount of unmapping mm/mmzone.h: fix existing kernel-doc comments and link them to core-api mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1 net: page_pool: use alloc_pages_bulk in refill code path net: page_pool: refactor dma_map into own function page_pool_dma_map SUNRPC: refresh rq_pages using a bulk page allocator SUNRPC: set rq_page_end differently mm/page_alloc: inline __rmqueue_pcplist mm/page_alloc: optimize code layout for __alloc_pages_bulk mm/page_alloc: add an array-based interface to the bulk page allocator mm/page_alloc: add a bulk page allocator mm/page_alloc: rename alloced to allocated mm/page_alloc: duplicate include linux/vmalloc.h mm, page_alloc: avoid page_to_pfn() in move_freepages() mm/Kconfig: remove default DISCONTIGMEM_MANUAL mm: page_alloc: dump migrate-failed pages mm/mempolicy: fix mpol_misplaced kernel-doc mm/mempolicy: rewrite alloc_pages_vma documentation mm/mempolicy: rewrite alloc_pages documentation mm/mempolicy: rename alloc_pages_current to alloc_pages ...	2021-04-30 14:38:01 -07:00
Kefeng Wang	1f9d03c5e9	mm: move mem_init_print_info() into mm_init() mem_init_print_info() is called in mem_init() on each architecture, and pass NULL argument, so using void argument and move it into mm_init(). Link: https://lkml.kernel.org/r/20210317015210.33641-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> [powerpc] Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> [sparc64] Acked-by: Russell King <rmk+kernel@armlinux.org.uk> [arm] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Guo Ren <guoren@kernel.org> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-30 11:20:42 -07:00
Nicholas Piggin	4ad0ae8c64	mm/vmalloc: remove unmap_kernel_range This is a shim around vunmap_range, get rid of it. Move the main API comment from the _noflush variant to the normal variant, and make _noflush internal to mm/. [npiggin@gmail.com: fix nommu builds and a comment bug per sfr] Link: https://lkml.kernel.org/r/1617292598.m6g0knx24s.astroid@bobo.none [akpm@linux-foundation.org: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA] [npiggin@gmail.com: fix nommu builds] Link: https://lkml.kernel.org/r/1617292497.o1uhq5ipxp.astroid@bobo.none Link: https://lkml.kernel.org/r/20210322021806.892164-5-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Cédric Le Goater <clg@kaod.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-30 11:20:40 -07:00
Nicholas Piggin	8309c9d717	powerpc: inline huge vmap supported functions This allows unsupported levels to be constant folded away, and so p4d_free_pud_page can be removed because it's no longer linked to. Link: https://lkml.kernel.org/r/20210317062402.533919-8-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-30 11:20:40 -07:00
Nicholas Piggin	bbc180a5ad	mm: HUGE_VMAP arch support cleanup This changes the awkward approach where architectures provide init functions to determine which levels they can provide large mappings for, to one where the arch is queried for each call. This removes code and indirection, and allows constant-folding of dead code for unsupported levels. This also adds a prot argument to the arch query. This is unused currently but could help with some architectures (e.g., some powerpc processors can't map uncacheable memory with large pages). Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Russell King <linux@armlinux.org.uk> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-30 11:20:40 -07:00
Nicholas Piggin	0f197ddce4	powerpc/64s: Fix mm_cpumask memory ordering comment The memory ordering comment no longer applies, because mm_ctx_id is no longer used anywhere. At best always been difficult to follow. It's better to consider the load on which the slbmte depends on, which the MMU depends on before it can start loading TLBs, rather than a store which may or may not have a subsequent dependency chain to the slbmte. So update the comment and we use the load of the mm's user context ID. This is much more analogous the radix ordering too, which is good. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210421151733.212858-1-npiggin@gmail.com	2021-04-23 01:38:03 +10:00
Christophe Leroy	39352430aa	powerpc: Move copy_inst_from_kernel_nofault() When probe_kernel_read_inst() was created, there was no good place to put it, so a file called lib/inst.c was dedicated for it. Since then, probe_kernel_read_inst() has been renamed copy_inst_from_kernel_nofault(). And mm/maccess.h didn't exist at that time. Today, mm/maccess.h is related to copy_from_kernel_nofault(). Move copy_inst_from_kernel_nofault() into mm/maccess.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9655d8957313906b77b8db5700a0e33ce06f45e5.1618405715.git.christophe.leroy@csgroup.eu	2021-04-21 22:52:34 +10:00
Xiongwei Song	7153d4bf0b	powerpc/traps: Enhance readability for trap types Define macros to list ppc interrupt types in interttupt.h, replace the reference of the trap hex values with these macros. Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com	2021-04-17 22:20:19 +10:00
Michael Ellerman	7098f8f0cf	powerpc/mm/radix: Make radix__change_memory_range() static The lkp bot pointed out that with W=1 we get: arch/powerpc/mm/book3s64/radix_pgtable.c:183:6: error: no previous prototype for 'radix__change_memory_range' Which is really saying that it could be static, make it so. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2021-04-14 23:04:45 +10:00
Nicholas Piggin	c45ba4f44f	powerpc: clean up do_page_fault search_exception_tables + __bad_page_fault can be substituted with bad_page_fault, do_page_fault no longer needs to return a value to asm for any sub-architecture, and __bad_page_fault can be static. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316104206.407354-10-npiggin@gmail.com	2021-04-14 23:04:44 +10:00
Nicholas Piggin	d738ee8d56	powerpc/64e/interrupt: handle bad_page_fault in C With non-volatile registers saved on interrupt, bad_page_fault can now be called by do_page_fault. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316104206.407354-9-npiggin@gmail.com	2021-04-14 23:04:43 +10:00
Christophe Leroy	7e9ab144c1	powerpc/mem: Use kmap_local_page() in flushing functions Flushing functions don't rely on preemption being disabled, so use kmap_local_page() instead of kmap_atomic(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b6a880ea0ec7886b51edbb4979c188be549231c0.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:19 +10:00
Christophe Leroy	6c96020882	powerpc/mem: Inline flush_dcache_page() flush_dcache_page() is only a few lines, it is worth inlining. ia64, csky, mips, openrisc and riscv have a similar flush_dcache_page() and inline it. On pmac32_defconfig, we get a small size reduction. On ppc64_defconfig, we get a very small size increase. In both case that's in the noise (less than 0.1%). text data bss dec hex filename 18991155 5934744 `1497624` 26423523 19330e3 vmlinux64.before 18994829 `5936732` `1497624` 26429185 1934701 vmlinux64.after 9150963 2467502 184548 11803013 b41985 vmlinux32.before `9149689` 2467302 184548 11801539 b413c3 vmlinux32.after Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/21c417488b70b7629dae316539fb7bb8bdef4fdd.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:19 +10:00
Christophe Leroy	67b8e6af19	powerpc/mem: Help GCC realise __flush_dcache_icache() flushes single pages 'And' the given page address with PAGE_MASK to help GCC. With the patch: 00000024 <__flush_dcache_icache>: 24: 54 63 00 26 rlwinm r3,r3,0,0,19 28: 39 40 00 40 li r10,64 2c: 7c 69 1b 78 mr r9,r3 30: 7d 49 03 a6 mtctr r10 34: 7c 00 48 6c dcbst 0,r9 38: 39 29 00 20 addi r9,r9,32 3c: 7c 00 48 6c dcbst 0,r9 40: 39 29 00 20 addi r9,r9,32 44: 42 00 ff f0 bdnz 34 <__flush_dcache_icache+0x10> 48: 7c 00 04 ac hwsync 4c: 39 20 00 40 li r9,64 50: 7d 29 03 a6 mtctr r9 54: 7c 00 1f ac icbi 0,r3 58: 38 63 00 20 addi r3,r3,32 5c: 7c 00 1f ac icbi 0,r3 60: 38 63 00 20 addi r3,r3,32 64: 42 00 ff f0 bdnz 54 <__flush_dcache_icache+0x30> 68: 7c 00 04 ac hwsync 6c: 4c 00 01 2c isync 70: 4e 80 00 20 blr Without the patch: 00000024 <__flush_dcache_icache>: 24: 54 6a 00 34 rlwinm r10,r3,0,0,26 28: 39 23 10 1f addi r9,r3,4127 2c: 7d 2a 48 50 subf r9,r10,r9 30: 55 29 d9 7f rlwinm. r9,r9,27,5,31 34: 41 82 00 94 beq c8 <__flush_dcache_icache+0xa4> 38: 71 28 00 01 andi. r8,r9,1 3c: 38 c9 ff ff addi r6,r9,-1 40: 7d 48 53 78 mr r8,r10 44: 7d 27 4b 78 mr r7,r9 48: 40 82 00 6c bne b4 <__flush_dcache_icache+0x90> 4c: 54 e7 f8 7e rlwinm r7,r7,31,1,31 50: 7c e9 03 a6 mtctr r7 54: 7c 00 40 6c dcbst 0,r8 58: 39 08 00 20 addi r8,r8,32 5c: 7c 00 40 6c dcbst 0,r8 60: 39 08 00 20 addi r8,r8,32 64: 42 00 ff f0 bdnz 54 <__flush_dcache_icache+0x30> 68: 7c 00 04 ac hwsync 6c: 71 28 00 01 andi. r8,r9,1 70: 39 09 ff ff addi r8,r9,-1 74: 40 82 00 2c bne a0 <__flush_dcache_icache+0x7c> 78: 55 29 f8 7e rlwinm r9,r9,31,1,31 7c: 7d 29 03 a6 mtctr r9 80: 7c 00 57 ac icbi 0,r10 84: 39 4a 00 20 addi r10,r10,32 88: 7c 00 57 ac icbi 0,r10 8c: 39 4a 00 20 addi r10,r10,32 90: 42 00 ff f0 bdnz 80 <__flush_dcache_icache+0x5c> 94: 7c 00 04 ac hwsync 98: 4c 00 01 2c isync 9c: 4e 80 00 20 blr a0: 7c 00 57 ac icbi 0,r10 a4: 2c 08 00 00 cmpwi r8,0 a8: 39 4a 00 20 addi r10,r10,32 ac: 40 82 ff cc bne 78 <__flush_dcache_icache+0x54> b0: 4b ff ff e4 b 94 <__flush_dcache_icache+0x70> b4: 7c 00 50 6c dcbst 0,r10 b8: 2c 06 00 00 cmpwi r6,0 bc: 39 0a 00 20 addi r8,r10,32 c0: 40 82 ff 8c bne 4c <__flush_dcache_icache+0x28> c4: 4b ff ff a4 b 68 <__flush_dcache_icache+0x44> c8: 7c 00 04 ac hwsync cc: 7c 00 04 ac hwsync d0: 4c 00 01 2c isync d4: 4e 80 00 20 blr Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/23030822ea5cd0a122948b10226abe56602dc027.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:18 +10:00
Christophe Leroy	52d490437f	powerpc/mem: flush_dcache_icache_phys() is for HIGHMEM pages only __flush_dcache_icache() is usable for non HIGHMEM pages on every platform. It is only for HIGHMEM pages that BOOKE needs kmap() and BOOK3S needs flush_dcache_icache_phys(). So make flush_dcache_icache_phys() dependent on CONFIG_HIGHMEM and call it only when it is a HIGHMEM page. We could make flush_dcache_icache_phys() available at all time, but as it is declared NOKPROBE_SYMBOL(), GCC doesn't optimise it out when it is not used. So define a stub for !CONFIG_HIGHMEM in order to remove the #ifdef in flush_dcache_icache_page() and use IS_ENABLED() instead. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/79ed5d7914f497cd5fcd681ca2f4d50a91719455.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:18 +10:00
Christophe Leroy	cd97d9e8b5	powerpc/mem: Optimise flush_dcache_icache_hugepage() flush_dcache_icache_hugepage() is a static function, with only one caller. That caller calls it when PageCompound() is true, so bugging on !PageCompound() is useless if we can trust the compiler a little. Remove the BUG_ON(!PageCompound()). The number of elements of a page won't change over time, but GCC doesn't know about it, so it gets the value at every iteration. To avoid that, call compound_nr() outside the loop and save it in a local variable. Whether the page is a HIGHMEM page or not doesn't change over time. But GCC doesn't know it so it does the test on every iteration. Do the test outside the loop. When the page is not a HIGHMEM page, page_address() will fallback on lowmem_page_address(), so call lowmem_page_address() directly and don't suffer the call to page_address() on every iteration. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ab03712b70105fccfceef095aa03007de9295a40.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:18 +10:00
Christophe Leroy	e618c7aea1	powerpc/mem: Call flush_coherent_icache() at higher level flush_coherent_icache() doesn't need the address anymore, so it can be called immediately when entering the public functions and doesn't need to be disseminated among lower level functions. And use page_to_phys() instead of open coding the calculation of phys address to call flush_dcache_icache_phys(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5f063986e325d2efdd404b8f8c5f4bcbd4eb11a6.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:18 +10:00
Christophe Leroy	131637a17d	powerpc/mem: Remove address argument to flush_coherent_icache() flush_coherent_icache() can use any valid address as mentionned by the comment. Use PAGE_OFFSET as base address. This allows removing the user access stuff. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/742b6360ae4f344a1c6ecfadcf3b6645f443fa7a.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:18 +10:00
Christophe Leroy	bf26e0bbd2	powerpc/mem: Declare __flush_dcache_icache() static __flush_dcache_icache() is only used in mem.c. Move it before the functions that use it and declare it static. And also fix the name of the parameter in the comment. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3fa903eb5a10b2bc7d99a8c559ffdaa05452d8e0.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:18 +10:00
Christophe Leroy	b26e8f2725	powerpc/mem: Move cache flushing functions into mm/cacheflush.c Cache flushing functions are in the middle of completely unrelated stuff in mm/mem.c Create a dedicated mm/cacheflush.c for those functions. Also cleanup the list of included headers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7bf6f1600acad146e541a4e220940062f2e5b03d.1617895813.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:17 +10:00
Christophe Leroy	80edc68e04	powerpc/32s: Define a MODULE area below kernel text all the time On book3s/32, the segment below kernel text is used for module allocation when CONFIG_STRICT_KERNEL_RWX is defined. In order to benefit from the powerpc specific module_alloc() function which allocate modules with 32 Mbytes from end of kernel text, use that segment below PAGE_OFFSET at all time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a46dcdd39a9e80b012d86c294c4e5cd8d31665f3.1617283827.git.christophe.leroy@csgroup.eu	2021-04-14 23:04:13 +10:00
Vaibhav Jain	a5d6a3e73a	powerpc/mm: Add cond_resched() while removing hpte mappings While removing large number of mappings from hash page tables for large memory systems as soft-lockup is reported because of the time spent inside htap_remove_mapping() like one below: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! <snip> NIP plpar_hcall+0x38/0x58 LR pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: 0x1fffffffffff000 (unreliable) pSeries_lpar_hpte_removebolted+0x9c/0x230 hash__remove_section_mapping+0xec/0x1c0 remove_section_mapping+0x28/0x3c arch_remove_memory+0xfc/0x150 devm_memremap_pages_release+0x180/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd4/0x270 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Fix this by adding a cond_resched() to the loop in htap_remove_mapping() that issues hcall to remove hpte mapping. The call to cond_resched() is issued every HZ jiffies which should prevent the soft-lockup from being reported. Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210404163148.321346-1-vaibhav@linux.ibm.com	2021-04-14 23:04:12 +10:00
Michael Ellerman	87e65ad7bd	powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR When we enabled STRICT_KERNEL_RWX we received some reports of boot failures when using the Hash MMU and running under phyp. The crashes are intermittent, and often exhibit as a completely unresponsive system, or possibly an oops. One example, which was caught in xmon: [ 14.068327][ T1] devtmpfs: mounted [ 14.069302][ T1] Freeing unused kernel memory: 5568K [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch [ 14.142063][ T1] Run /sbin/init as init process [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400 cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0] pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80 lr: c0000000001862d4: update_rq_clock+0x44/0x110 sp: c00000000c747880 msr: 8000000040001031 current = 0xc00000000c60d380 paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01 pid = 347, comm = kworker/2:1 ... enter ? for help [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable) [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0 [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0 [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390 [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610 [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480 [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950 [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120 [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640 [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0 [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c This shows that CPU 2, which was idle, woke up and then appears to randomly take an instruction fault on a completely valid area of kernel text. The cause turns out to be the call to hash__mark_rodata_ro(), late in boot. Due to the way we layout text and rodata, that function actually changes the permissions for all of text and rodata to read-only plus execute. To do the permission change we use a hypervisor call, H_PROTECT. On phyp that appears to be implemented by briefly removing the mapping of the kernel text, before putting it back with the updated permissions. If any other CPU is executing during that window, it will see spurious faults on the kernel text and/or data, leading to crashes. To fix it we use stop machine to collect all other CPUs, and then have them drop into real mode (MMU off), while we change the mapping. That way they are unaffected by the mapping temporarily disappearing. We don't see this bug on KVM because KVM always use VPM=1, where faults are directed to the hypervisor, and the fault will be serialised vs the h_protect() by HPTE_V_HVLOCK. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au	2021-04-08 21:17:43 +10:00
Michael Ellerman	6f223ebe9c	powerpc/mm/64s/hash: Factor out change_memory_range() Pull the loop calling hpte_updateboltedpp() out of hash__change_memory_range() into a helper function. We need it to be a separate function for the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-4-mpe@ellerman.id.au	2021-04-08 21:17:43 +10:00
Michael Ellerman	2c02e656a2	powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro() In hash__mark_rodata_ro() we pass the raw PP_RXXX value to hash__change_memory_range(). That has the effect of setting the key to zero, because PP_RXXX contains no key value. Fix it by using htab_convert_pte_flags(), which knows how to convert a pgprot into a pp value, including the key. Fixes: `d94b827e89` ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210331003845.216246-3-mpe@ellerman.id.au	2021-04-08 21:17:43 +10:00
Jordan Niethe	b8b2f37cf6	powerpc/64s: Fix pte update for kernel memory on radix When adding a PTE a ptesync is needed to order the update of the PTE with subsequent accesses otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap() which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page it is possible for faults. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching which in usually less then 5min will crash giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c To fix this when updating kernel memory PTEs using ptesync. Fixes: `f1cb8f9beb` ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com	2021-04-08 21:17:42 +10:00
Bhaskar Chowdhury	4763d37827	powerpc: Spelling/typo fixes Various spelling/typo fixes. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2021-04-08 21:17:42 +10:00
Nicholas Piggin	1479e3d3b7	powerpc/64s: Fix hash fault to use TRAP accessor Hash faults use the trap vector to decide whether this is an instruction or data fault. This should use the TRAP accessor rather than open access regs->trap. This won't cause a problem at the moment because 64s only uses trap flags for system call interrupts (the norestart flag), but that could change if any other trap flags get used in future. Fixes: `a4922f5442` ("powerpc/64s: move the hash fault handling logic to C") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316105205.407767-1-npiggin@gmail.com	2021-03-29 13:22:15 +11:00
Christophe Leroy	98c26a7275	powerpc/mm: Remove unneeded #ifdef CONFIG_PPC_MEM_KEYS In fault.c, #ifdef CONFIG_PPC_MEM_KEYS is not needed because all functions are always defined, and arch_vma_access_permitted() always returns true when CONFIG_PPC_MEM_KEYS is not defined so access_pkey_error() will return false so bad_access_pkey() will never be called. Include linux/pkeys.h to get a definition of vma_pkeys() for bad_access_pkey(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8038392f38d81f2ad169347efac29146f553b238.1615819955.git.christophe.leroy@csgroup.eu	2021-03-29 13:22:15 +11:00
Michael Ellerman	c2a2a5d027	powerpc/64s: Fold update_current_thread_[i]amr() into their only callers lkp reported warnings in some configuration due to update_current_thread_amr() being unused: arch/powerpc/mm/book3s64/pkeys.c:284:20: error: unused function 'update_current_thread_amr' static inline void update_current_thread_amr(u64 value) Which is because it's only use is inside an ifdef. We could move it inside the ifdef, but it's a single line function and only has one caller, so just fold it in. Similarly update_current_thread_iamr() is small and only called once, so fold it in also. Fixes: `48a8ab4eeb` ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210314093320.132331-1-mpe@ellerman.id.au	2021-03-29 13:22:14 +11:00
Bhaskar Chowdhury	7a7d744ffe	powerpc/mm/book3s64: Fix a typo in mmu_context.c s/detalis/details/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210312112537.4585-1-unixbhaskar@gmail.com	2021-03-29 13:22:12 +11:00
Christophe Leroy	b5efec00b6	powerpc/32s: Move KUEP locking/unlocking in C This can be done in C, do it. Unrolling the loop gains approx. 15% performance. From now on, prepare_transfer_to_handler() is only for interrupts from kernel. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4eadd873927e9a73c3d1dfe2f9497353465514cf.1615552867.git.christophe.leroy@csgroup.eu	2021-03-29 13:22:10 +11:00
Christophe Leroy	af6f2ce84b	powerpc/32: Call bad_page_fault() from do_page_fault() Now that non volatile registers are saved at all time, no need to split bad_page_fault() out of do_page_fault(). Remove handle_page_fault() and use do_page_fault() directly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cfb95be8863204cc2bf45a22ea44dd1d0dc16b7f.1615552867.git.christophe.leroy@csgroup.eu	2021-03-29 13:22:08 +11:00
Christophe Leroy	7aa8dd67f1	powerpc/32: Always enable data translation in exception prolog If the code can use a stack in vm area, it can also use a stack in linear space. Simplify code by removing old non VMAP stack code on PPC32. That means the data translation is now re-enabled early in exception prolog in all cases, not only when using VMAP stacks. While we are touching EXCEPTION_PROLOG macros, remove the unused for_rtas parameter in EXCEPTION_PROLOG_1. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7cd6440c60a7e8f4f035b245c57720f51e225aae.1615552866.git.christophe.leroy@csgroup.eu	2021-03-29 13:22:05 +11:00
Christophe Leroy	90cbac0e99	powerpc: Enable KFENCE for PPC32 Add architecture specific implementation details for KFENCE and enable KFENCE for the ppc32 architecture. In particular, this implements the required interface in <asm/kfence.h>. KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the Read/Write linear map to be mapped at page granularity. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8dfe1bd2abde26337c1d8c1ad0acfcc82185e0d5.1614868445.git.christophe.leroy@csgroup.eu	2021-03-24 14:09:30 +11:00
Sebastian Andrzej Siewior	9be77e11da	powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used The mutex linear_mapping_mutex is defined at the of the file while its only two user are within the CONFIG_MEMORY_HOTPLUG block. A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because its mutex implementation is smart enough to realize that it is unused. Move the definition of linear_mapping_mutex to ifdef block where it is used. Fixes: `1f73ad3e8d` ("powerpc/mm: print warning in arch_remove_linear_mapping()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210219165648.2505482-1-bigeasy@linutronix.de	2021-03-24 14:09:29 +11:00
Linus Torvalds	b12b472496	powerpc updates for 5.12 A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. Conversion of the 32-bit syscall handling into C. A series from Nick to streamline our TLB flushing when using the Radix MMU. Switch to using queued spinlocks by default for 64-bit server CPUs. A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, Zheng Yongjun. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAzMagTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAbBD/wMS2g1Q9oAGZPsx2NGd2RoeAauGxUs Yj6cZVmR+oa6sJyFYgEG7dT7tcwJITQxLBD3HpsHSnJ/rLrMloE33+cZNA9c4STz 0mlzm3R7M5pOgcEqZglsgLP0RQeUuHSSF01g0kf1N3r+HYtmbmPjuUIl8CnAjlbT iMD2ZN2p8/r3kDDht0iBO534HUpsqhc00duSZgQhsV/PR7ZWVxoPk7PEJeo4vXlJ 77986F7J5NLUTjMiLv5lTx49FcPbRd7a1jubsBtahJrwXj2GVvuy2i86G7HY+a+B eSxN7zJQgaFeLo0YPo7fZLBI0MAsIQt3nnZhKX0TMglbv/K8Aq64xiJqsVQdJ883 CeEt0HvSJhsSC0C4O595NEINfDhDd+5IeSF9MvsujYXiUKRXtRkm1EPuAzTcZIzW NwkCLRo33NMXa+khMKaiqF/g7INayPUXoWESx75NXFsuNfcORvstkeUuEoi5GwJo TSlmosFqwRjghQ8eTLZuWBzmh3EpPGdtC4gm6D+lbzhzjah5c/1whyuLqra275kK E3Qt0/V0ixKyvlG7MI5yYh3L7+R/hrsflH7xIJJxZp2DW6mwBJzQYmkxDbSS8PzK nWien2XgpIQhSFat3QqreEFSfNkzdN2MClVi2Y1hpAgi+2Zm9rPdPNGcQI+DSOsB kpJkjOjWNJU/PQ== =dB2S -----END PGP SIGNATURE----- Merge tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. - Conversion of the 32-bit syscall handling into C. - A series from Nick to streamline our TLB flushing when using the Radix MMU. - Switch to using queued spinlocks by default for 64-bit server CPUs. - A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. - Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun. * tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits) powerpc/perf: Adds support for programming of Thresholding in P10 powerpc/pci: Remove unimplemented prototypes powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user() powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size() powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user() powerpc/64: Fix stack trace not displaying final frame powerpc/time: Remove get_tbl() powerpc/time: Avoid using get_tbl() spi: mpc52xx: Avoid using get_tbl() powerpc/syscall: Avoid storing 'current' in another pointer powerpc/32: Handle bookE debugging in C in syscall entry/exit powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/32: Remove the counter in global_dbcr0 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: Change condition to check MSR_RI powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Use is_compat_task() powerpc/syscall: Make interrupt.c buildable on PPC32 ...	2021-02-22 14:34:00 -08:00
Aneesh Kumar K.V	2ac02e5ece	powerpc/mm: Remove dcache flush from memory remove. We added dcache flush on memory add/remove in commit `fb5924fddf` ("powerpc/mm: Flush cache on memory hot(un)plug") to handle crashes on GPU hotplug. Instead of adding dcache flush in generic memory add/remove routine which is used even for regular memory, we should handle these devices specific flush in the device driver code. memtrace did handle this in the driver and that was removed by commit `7fd6641de2` ("powerpc/powernv/memtrace: Let the arch hotunplug code flush cache"). This patch reverts that commit. The dcache flush in memory add was removed by commit `ea458effa8` ("powerpc: Don't flush caches when adding memory") which I don't think is correct. The reason why we require dcache flush in memtrace is to make sure we don't have a dirty cache when we remap a pfn to cache inhibited. We should do that when the memtrace module removes the memory and make the pfn available for HTM traces to map it as cache inhibited. The other device mentioned in commit `fb5924fddf` ("powerpc/mm: Flush cache on memory hot(un)plug") is nvlink device with coherent memory. The support for that was removed in commit `7eb3cf7619` ("powerpc/powernv: remove unused NPU DMA code") and commit `25b2995a35` ("mm: remove MEMORY_DEVICE_PUBLIC support") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-3-aneesh.kumar@linux.ibm.com	2021-02-11 23:35:07 +11:00
Aneesh Kumar K.V	ec94b9b23d	powerpc/mm: Add PG_dcache_clean to indicate dcache clean state This just add a better name for PG_arch_1. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-2-aneesh.kumar@linux.ibm.com	2021-02-11 23:35:06 +11:00
Aneesh Kumar K.V	c7ba2d6363	powerpc/mm: Enable compound page check for both THP and HugeTLB THP config results in compound pages. Make sure the kernel enables the PageCompound() check with CONFIG_HUGETLB_PAGE disabled and CONFIG_TRANSPARENT_HUGEPAGE enabled. This makes sure we correctly flush the icache with THP pages. flush_dcache_icache_page only matter for platforms that don't support COHERENT_ICACHE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-1-aneesh.kumar@linux.ibm.com	2021-02-11 23:35:06 +11:00
Michael Ellerman	2bb421a3d9	powerpc/mm/64s: Fix no previous prototype warning As reported by lkp: arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous prototype for function 'exit_lazy_flush_tlb' Fix it by moving the prototype into the existing header. Fixes: `032b7f0893` ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au	2021-02-11 23:28:51 +11:00
Nicholas Piggin	e4bb64c7a4	powerpc: remove interrupt handler functions from the noinstr section The allyesconfig ppc64 kernel fails to link with relocations unable to fit after commit `3a96570ffc` ("powerpc: convert interrupt handlers to use wrappers"), which is due to the interrupt handler functions being put into the .noinstr.text section, which the linker script places on the opposite side of the main .text section from the interrupt entry asm code which calls the handlers. This results in a lot of linker stubs that overwhelm the 252-byte sized space we allow for them, or in the case of BE a .opd relocation link error for some reason. It's not required to put interrupt handlers in the .noinstr section, previously they used NOKPROBE_SYMBOL, so take them out and replace with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes a number of interrupt handlers nokprobe that were not prior to the interrupt wrappers commit, but since that commit they were made nokprobe due to being in .noinstr.text, so this fix does not change that. The fixes tag is different to the commit that first exposes the problem because it is where the wrapper macros were introduced. Fixes: `8d41fc618a` ("powerpc: interrupt handler wrapper functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Slightly fix up comment wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210211063636.236420-1-npiggin@gmail.com	2021-02-11 23:28:34 +11:00
Christophe Leroy	179ae57dba	powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr() Function names should tell what the function does, not how. mfsrin() and mtsrin() are read/writing segment registers. They are called that way because they are using mfsrin and mtsrin instructions, but it doesn't matter for the caller. In preparation of following patch, change their name to mfsr() and mtsr() in order to make it obvious they manipulate segment registers without messing up with how they do it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f92d99f4349391b77766745900231aa880a0efb5.1612612022.git.christophe.leroy@csgroup.eu	2021-02-09 01:10:15 +11:00
Nicholas Piggin	032b7f0893	powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask serialize_against_pte_lookup() performs IPIs to all CPUs in mm_cpumask. Take this opportunity to try trim the CPU out of mm_cpumask. This can reduce the cost of future serialize_against_pte_lookup() and/or the cost of future TLB flushes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-7-npiggin@gmail.com	2021-02-09 01:09:45 +11:00
Nicholas Piggin	9393544842	powerpc/64s/radix: occasionally attempt to trim mm_cpumask A single-threaded process that is flushing its own address space is so far the only case where the mm_cpumask is attempted to be trimmed. This patch expands that to flush in other situations, multi-threaded processes and external sources. For now it's a relatively simple occasional trim attempt. The main aim is to add the mechanism, tweaking and tuning can come with more data. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-6-npiggin@gmail.com	2021-02-09 01:09:45 +11:00
Nicholas Piggin	780de40601	powerpc/64s/radix: Allow mm_cpumask trimming from external sources mm_cpumask trimming is currently restricted to be issued by the current thread of a single-threaded mm. This patch relaxes that and allows the mask to be trimmed from any context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-5-npiggin@gmail.com	2021-02-09 01:09:44 +11:00
Nicholas Piggin	54bb503345	powerpc/64s/radix: Check for no TLB flush required If there are no CPUs in mm_cpumask, no TLB flush is required at all. This patch adds a check for this case. Currently it's not tested for, in fact mm_is_thread_local() returns false if the current CPU is not in mm_cpumask, so it's treated as a global flush. This can come up in some cases like exec failure before the new mm has ever been switched to. This patch reduces TLBIE instructions required to build a kernel from about 120,000 to 45,000. Another situation it could help is page reclaim, KSM, THP, etc., (i.e., asynch operations external to the process) where the process is sleeping and has all TLBs flushed out of all CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com	2021-02-09 01:09:44 +11:00
Nicholas Piggin	26418b36a1	powerpc/64s/radix: refactor TLB flush type selection The logic to decide what kind of TLB flush is required (local, global, or IPI) is spread multiple times over the several kinds of TLB flushes. Move it all into a single function which may issue IPIs if necessary, and also returns a flush type that is to be used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-3-npiggin@gmail.com	2021-02-09 01:09:44 +11:00
Nicholas Piggin	a2496049f1	powerpc/64s/radix: add warning and comments in mm_cpumask trim Add a comment explaining part of the logic for mm_cpumask trimming, and add a (hopefully graceful) check and warning in case something gets it wrong. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-2-npiggin@gmail.com	2021-02-09 01:09:44 +11:00
Nicholas Piggin	540d4d34be	powerpc/64: context tracking move to interrupt wrappers This moves exception_enter/exit calls to wrapper functions for synchronous interrupts. More interrupt handlers are covered by this than previously. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-33-npiggin@gmail.com	2021-02-09 00:10:46 +11:00
Nicholas Piggin	a008f8f9fd	powerpc/64s/hash: improve context tracking of hash faults This moves the 64s/hash context tracking from hash_page_mm() to __do_hash_fault(), so it's no longer called by OCXL / SPU accelerators, which was certainly the wrong thing to be doing, because those callers are not low level interrupt handlers, so should have entered a kernel context tracking already. Then remain in kernel context for the duration of the fault, rather than enter/exit for the hash fault then enter/exit for the page fault, which is pointless. Even still, calling exception_enter/exit in __do_hash_fault seems questionable because that's touching per-cpu variables, tracing, etc., which might have been interrupted by this hash fault or themselves cause hash faults. But maybe I miss something because hash_page_mm very deliberately calls trace_hash_fault too, for example. So for now go with it, it's no worse than before, in this regard. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-32-npiggin@gmail.com	2021-02-09 00:02:12 +11:00
Nicholas Piggin	e6f8a6c86c	powerpc: add interrupt_cond_local_irq_enable helper Simple helper for synchronous interrupt handlers (i.e., process-context) to enable interrupts if it was taken in an interrupts-enabled context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-30-npiggin@gmail.com	2021-02-09 00:02:12 +11:00
Nicholas Piggin	3a96570ffc	powerpc: convert interrupt handlers to use wrappers Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com	2021-02-09 00:02:12 +11:00
Nicholas Piggin	e44370abb2	powerpc/64s: slb comment update This makes a small improvement to the description of the SLB interrupt environment. Move the memory access restrictions into one paragraph, and the interrupt restrictions into the next rather than mix them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-18-npiggin@gmail.com	2021-02-09 00:02:10 +11:00
Nicholas Piggin	31d6490ccb	powerpc/mm: Remove stale do_page_fault comment referring to SLB faults SLB faults no longer call do_page_fault, this was removed somewhere between 2.6.0 and 2.6.12. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-17-npiggin@gmail.com	2021-02-09 00:02:10 +11:00
Nicholas Piggin	bf0e2374aa	powerpc/64s: split do_hash_fault This is required for subsequent interrupt wrapper implementation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-16-npiggin@gmail.com	2021-02-09 00:02:10 +11:00
Nicholas Piggin	f4c03b0e52	powerpc/64s: move bad_page_fault handling to C This simplifies code, and it is also useful when introducing interrupt handler wrappers when introducing wrapper functionality that doesn't cope with asm entry code calling into more than one handler function. 32-bit and 64e still have some such cases, which limits some ways they can use interrupt wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-15-npiggin@gmail.com	2021-02-09 00:02:10 +11:00
Nicholas Piggin	4cb8428465	powerpc: rearrange do_page_fault error case to be inside exception_enter This keeps the context tracking over the entire interrupt handler which helps later with moving context tracking into interrupt wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-14-npiggin@gmail.com	2021-02-09 00:02:09 +11:00
Nicholas Piggin	71f47976fa	powerpc/64s: add do_bad_page_fault_segv handler This function acts like an interrupt handler so it needs to follow the standard interrupt handler function signature which will be introduced in a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-13-npiggin@gmail.com	2021-02-09 00:02:09 +11:00
Nicholas Piggin	8458c628a5	powerpc: bad_page_fault get registers from regs Similar to the previous patch this makes interrupt handler function types more regular so they can be wrapped with the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-12-npiggin@gmail.com	2021-02-09 00:02:09 +11:00
Nicholas Piggin	a01a3f2ddb	powerpc: remove arguments from fault handler functions Make mm fault handlers all just take the pt_regs * argument and load DAR/DSISR from that. Make those that return a value return long. This is done to make the function signatures match other handlers, which will help with a future patch to add wrappers. Explicit arguments could be added for performance but that would require more wrapper macro variants. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-7-npiggin@gmail.com	2021-02-09 00:02:08 +11:00
Nicholas Piggin	a4922f5442	powerpc/64s: move the hash fault handling logic to C The fault handling still has some complex logic particularly around hash table handling, in asm. Implement most of this in C. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-6-npiggin@gmail.com	2021-02-09 00:02:08 +11:00
Aneesh Kumar K.V	8c511eff18	powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm This fix the bad fault reported by KUAP when io_wqe_worker access userspace. Bug: Read fault blocked by KUAP! WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 .......... Call Trace: [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 .......... NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 interrupt: 300 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c The kernel consider thread AMR value for kernel thread to be AMR_KUAP_BLOCKED. Hence access to userspace is denied. This of course not correct and we should allow userspace access after kthread_use_mm(). To be precise, kthread_use_mm() should inherit the AMR value of the operating address space. But, the AMR value is thread-specific and we inherit the address space and not thread access restrictions. Because of this ignore AMR value when accessing userspace via kernel thread. current_thread_amr/iamr() are updated, because we use them in the below stack. .... [ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3 .... NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90 LR [c0000000004b9278] gup_pte_range+0x188/0x420 --- interrupt: 700 [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable) [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20 [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410 [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0 [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700 [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0 [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740 [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0 [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80 [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs] [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs] [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0 [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50 [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0 [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0 [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0 [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0 Fixes: `48a8ab4eeb` ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com	2021-02-06 23:13:04 +11:00
Christophe Leroy	259149cf7c	powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selected It is now possible to only build book3s/32 kernel for CPUs without hash table. Opt out hash related code when CONFIG_PPC_BOOK3S_604 is not selected. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62df436454ef06e104cc334a0859a2878d7888d5.1608274548.git.christophe.leroy@csgroup.eu	2021-01-31 22:35:50 +11:00
Qian Cai	b5952f8125	powerpc/mm/book3s64/iommu: fix some RCU-list locks It is safe to traverse mm->context.iommu_group_mem_list with either mem_list_mutex or the RCU read lock held. Silence a few RCU-list false positive warnings and fix a few missing RCU read locks. arch/powerpc/mm/book3s64/iommu_api.c:330 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4305: #0: c000000bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce] #1: c000000001501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_get+0x50/0x190 ==== arch/powerpc/mm/book3s64/iommu_api.c:132 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4305: #0: c000000bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce] #1: c000000001501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_do_alloc+0x120/0x5f0 ==== arch/powerpc/mm/book3s64/iommu_api.c:292 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4312: #0: c000000ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm] #1: c000000045e6c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm] ==== arch/powerpc/mm/book3s64/iommu_api.c:424 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4312: #0: c000000ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm] #1: c000000045e6c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm] Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200510051559.1959-1-cai@lca.pw	2021-01-31 22:35:49 +11:00
Cédric Le Goater	94b87d72fc	powerpc/mm/hugetlb: Make pseries_alloc_bootmem_huge_page() static pseries_alloc_bootmem_huge_page() is only used locally in alloc_bootmem_huge_page() and does not need to be external. It fixes this W=1 compile error : ../arch/powerpc/mm/hugetlbpage.c:220:12: error: no previous prototype for ‘pseries_alloc_bootmem_huge_page’ [-Werror=missing-prototypes] 220 \| int __init pseries_alloc_bootmem_huge_page(struct hstate *hstate) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-16-clg@kaod.org	2021-01-30 11:39:30 +11:00
Cédric Le Goater	11f9c1d2fb	powerpc/mm: Move hpte_insert_repeating() prototype It fixes this W=1 compile error : ../arch/powerpc/mm/book3s64/hash_utils.c:1867:6: error: no previous prototype for ‘hpte_insert_repeating’ [-Werror=missing-prototypes] 1867 \| long hpte_insert_repeating(unsigned long hash, unsigned long vpn, \| ^~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-14-clg@kaod.org	2021-01-30 11:39:29 +11:00
Cédric Le Goater	d25da505c3	powerpc/mm: Include __find_linux_pte() prototype It fixes this W=1 compile error : ../arch/powerpc/mm/pgtable.c:337:8: error: no previous prototype for ‘__find_linux_pte’ [-Werror=missing-prototypes] 337 \| pte_t __find_linux_pte(pgd_t pgdir, unsigned long ea, \| ^~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-2-clg@kaod.org	2021-01-30 11:39:26 +11:00
Linus Torvalds	8a5be36b93	powerpc updates for 5.11 - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, Zhang Xiaoxu. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/bURITHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEzBEAC1Vwibcog2P9rkJPb0q3UGWSYSx25V h/LwwxtM9Tm14j/LZsSgkOgIsfMaWEBIw/8D4efQ7AX9aFo+R0c2DdQMx1MG5MXz gZk58+l3LwId6h9+OrwurpEW+ZmURLAtGMSyFdkeiZ3/XTnkbf1XnewC0QWQe56a EGLmjx1MFl45jspoy7UIUXsXoNJIfflEKhrgUzSUh8X2eLmvB9ws6A4BXxbVzyZl lZv3+uWimU2pFgdkB9jOCxoG4zFEr2o5ovLHG7zCCVo5JoXmTPQ5cMVBraH206ms +5vCmu4qI8uP5UlZW/mZfhrtDiMdHdQqqFOaQwVlOmoUbU6L6E6rxm3iVnov2Bbi iUgxoeJDxAb2cM2EWFK6oWVgr7+NkwvXM1IG8xtprhHrCdnC9r+psQr/dswb3LSg MJ7u/RCq3uixy2kWP8E0NEHw7ngQZ/ZKPqzfnmIWOC7tYUxgaL02I8Ff9/ZXAI2J CnmqFYOjrimHkcwXGOtKkXNvfU0DiL97qpK2AQNWElE8+bUUmpw+ltUrsdSycYmv Afc4WIcVrTA+a9laSLgjdZbolbNSa3p+cMIYdPrVx9g+xqygbxIKv+EDGNv1WHfD GU1gmohMY+ZkMOvFRMi8LAsEm0DH/etWE0py/8uyxDYKnGyD1Ur6452DStkmgGb2 azmcaOyLdb+HXA== =Ga3K -----END PGP SIGNATURE----- Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu. * tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits) powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug powerpc: Add config fragment for disabling -Werror powerpc/configs: Add ppc64le_allnoconfig target powerpc/powernv: Rate limit opal-elog read failure message powerpc/pseries/memhotplug: Quieten some DLPAR operations powerpc/ps3: use dma_mapping_error() powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 powerpc/perf: Fix Threshold Event Counter Multiplier width for P10 powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp KVM: PPC: fix comparison to bool warning KVM: PPC: Book3S: Assign boolean values to a bool variable powerpc: Inline setup_kup() powerpc/64s: Mark the kuap/kuep functions non __init KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering powerpc/xive: Improve error reporting of OPAL calls powerpc/xive: Simplify xive_do_source_eoi() powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG ...	2020-12-17 13:34:25 -08:00
Christophe Leroy	2198d4934e	powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() Commit `7bfe54b5f1` ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions") inadvertely removed the mask applied to start parameter in those two functions, leading to the following crash on power9. LTP: starting hugemmap05_1 (hugemmap05 -m) ------------[ cut here ]------------ kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV ... CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G O 5.10.0-rc7-next-20201211 #1 NIP: c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000 REGS: c00020000bb6f830 TRAP: 0700 Tainted: G O (5.10.0-rc7-next-20201211) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002284 XER: 20040000 GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0 GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001 GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002 GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000 GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10 GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4 GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000 NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0 pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387 (inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405 LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0 Call Trace: __tlb_remove_table+0x13c/0x1e0 (unreliable) tlb_remove_table_rcu+0x54/0xa0 __tlb_remove_table_free at mm/mmu_gather.c:101 (inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156 rcu_core+0x35c/0xbb0 rcu_do_batch at kernel/rcu/tree.c:2502 (inlined by) rcu_core at kernel/rcu/tree.c:2737 __do_softirq+0x480/0x704 run_ksoftirqd+0x74/0xd0 run_ksoftirqd at kernel/softirq.c:651 (inlined by) run_ksoftirqd at kernel/softirq.c:642 smpboot_thread_fn+0x278/0x320 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x80 Properly apply the masks before calling pmd_free_tlb() and pud_free_tlb() respectively. Fixes: `7bfe54b5f1` ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions") Reported-by: Qian Cai <qcai@redhat.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/56feccd7b6fcd98e353361a233fa7bb8e67c3164.1607780469.git.christophe.leroy@csgroup.eu	2020-12-15 22:22:07 +11:00
Linus Torvalds	edd7ab7684	The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same accross preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XyQwTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoUolD/9+R+BX96fGir+I8rG9dc3cbLw5meSi 0I/Nq3PToZMs2Iqv50DsoaPYHHz/M6fcAO9LRIgsE9jRbnY93GnsBM0wU9Y8yQaT 4wUzOG5WHaLDfqIkx/CN9coUl458oEiwOEbn79A2FmPXFzr7IpkufnV3ybGDwzwP p73bjMJMPPFrsa9ig87YiYfV/5IAZHi82PN8Cq1v4yNzgXRP3Tg6QoAuCO84ZnWF RYlrfKjcJ2xPdn+RuYyXolPtxr1hJQ0bOUpe4xu/UfeZjxZ7i1wtwLN9kWZe8CKH +x4Lz8HZZ5QMTQ9sCHOLtKzu2MceMcpISzoQH4/aFQCNMgLn1zLbS790XkYiQCuR ne9Cua+IqgYfGMG8cq8+bkU9HCNKaXqIBgPEKE/iHYVmqzCOqhW5Cogu4KFekf6V Wi7pyyUdX2en8BAWpk5NHc8de9cGcc+HXMq2NIcgXjVWvPaqRP6DeITERTZLJOmz XPxq5oPLGl7wdm7z+ICIaNApy8zuxpzb6sPLNcn7l5OeorViORlUu08AN8587wAj FiVjp6ZYomg+gyMkiNkDqFOGDH5TMENpOFoB0hNNEyJwwS0xh6CgWuwZcv+N8aPO HuS/P+tNANbD8ggT4UparXYce7YCtgOf3IG4GA3JJYvYmJ6pU+AZOWRoDScWq4o+ +jlfoJhMbtx5Gg== =n71I -----END PGP SIGNATURE----- Merge tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kmap updates from Thomas Gleixner: "The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same across preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided" * tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) ARM: highmem: Fix cache_is_vivt() reference x86/crashdump/32: Simplify copy_oldmem_page() io-mapping: Provide iomap_local variant mm/highmem: Provide kmap_local* sched: highmem: Store local kmaps in task struct x86: Support kmap_local() forced debugging mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL microblaze/mm/highmem: Add dropped #ifdef back xtensa/mm/highmem: Make generic kmap_atomic() work correctly mm/highmem: Take kmap_high_get() properly into account highmem: High implementation details and document API Documentation/io-mapping: Remove outdated blurb io-mapping: Cleanup atomic iomap mm/highmem: Remove the old kmap_atomic cruft highmem: Get rid of kmap_types.h xtensa/mm/highmem: Switch to generic kmap atomic sparc/mm/highmem: Switch to generic kmap atomic powerpc/mm/highmem: Switch to generic kmap atomic nds32/mm/highmem: Switch to generic kmap atomic ...	2020-12-14 18:35:53 -08:00
Michael Ellerman	1791ebd131	powerpc: Inline setup_kup() setup_kup() is used by both 64-bit and 32-bit code. However on 64-bit it must not be __init, because it's used for CPU hotplug, whereas on 32-bit it should be __init because it calls setup_kuap/kuep() which are __init. We worked around that problem in the past by marking it __ref, see commit `67d53f30e2` ("powerpc/mm: fix section mismatch for setup_kup()"). Marking it __ref basically just omits it from section mismatch checking, which can lead to bugs, and in fact it did, see commit `44b4c4450f` ("powerpc/64s: Mark the kuap/kuep functions non __init") We can avoid all these problems by just making it static inline. Because all it does is call other functions, making it inline actually shrinks the 32-bit vmlinux by ~76 bytes. Make it __always_inline as pointed out by Christophe. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201214123011.311024-1-mpe@ellerman.id.au	2020-12-15 13:13:49 +11:00
Aneesh Kumar K.V	44b4c4450f	powerpc/64s: Mark the kuap/kuep functions non __init The kernel calls these functions on CPU online and hence they must not be marked __init. Otherwise if the memory they occupied has been reused the system can crash in various ways. Sachin reported it caused his LPAR to spontaneously restart with no other output. With xmon enabled it may drop into xmon with a dump like: cpu 0x1: Vector: 700 (Program Check) at [c000000003c5fcb0] pc: 00000000011e0a78 lr: 00000000011c51d4 sp: c000000003c5ff50 msr: 8000000000081001 current = 0xc000000002c12b00 paca = 0xc000000003cff280 irqmask: 0x03 irq_happened: 0x01 pid = 0, comm = swapper/1 ... [c000000003c5ff50] 0000000000087c38 (unreliable) [c000000003c5ff70] 000000000003870c [c000000003c5ff90] 000000000000d108 Fixes: `3b47b7549e` ("powerpc/book3s64/kuap: Move KUAP related function outside radix") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Expand change log with details and xmon output] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201214080121.358567-1-aneesh.kumar@linux.ibm.com	2020-12-14 23:12:27 +11:00
Linus Torvalds	47003b9971	powerpc fixes for 5.10 #6 One commit to implement copy_from_kernel_nofault_allowed(), otherwise copy_from_kernel_nofault() can trigger warnings when accessing bad addresses in some configurations. Thanks to: Christophe Leroy, Qian Cai. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/StwITHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgHtgD/9XePu2lenUUZDbzHVKJ/4oozNqJaYc mM7k/53GmPyi7AAttLdQGlSB0Gv2xBSDhng7T/UOnnBKwBk7gP8J4espuGraoYkA 8q1LsAO9wwpN+oDQjFQ+s4uErildwIy73uSXByfhIESHo5VtY9ol7g+zZaTfyNhO W/wpSzcHLmTCMoWcJfk5vLCHMDmaY1Qq7U9uNt78bwUaNXz9LVZ/UFWSe4Bt6jEM 573bgsSkbLoTV5QptDUOPpIBw1T+zahwB6dMjPzbxYa6Rws1I4QeJRNYxdvunDHP +F2ZYK/zyFBQlojPnjXJmbqQHEtXA/l9DNyLwR9VqjAOmgZaQezTVMIV56b8ndpM X7+AG37Nt6hqUfPz3f7L67y64VFAmAt8dFqVqUzEXBcN1KpVkS5BvBxjTUKwItwo Fdf80iSHaHYPdYAJJzjbeGuaaKID3w9H6npTR5xCKmN9o1r+N+VoZtQumlG+t6jl EtnPu0r6y/tPcyKixk/myAAx/8mVTQicDyIj2klheDClmNMK7NA0+QpLEBus10tl +bhk7KdWx7mQwYRltI+v7T3+mJ2SddVpQ84KmV6q21d/QbH1fQY/SvVNRYKWYb31 s31KT9lYiW7xZ5qiA6R9YNGynvhrD61Bzr5o2dKJxvbpmFkvosVWN7waPqwozlRC l1Xvuc/1kBesiQ== =ERDG -----END PGP SIGNATURE----- Merge tag 'powerpc-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "One commit to implement copy_from_kernel_nofault_allowed(), otherwise copy_from_kernel_nofault() can trigger warnings when accessing bad addresses in some configurations. Thanks to Christophe Leroy and Qian Cai" * tag 'powerpc-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()	2020-12-10 16:36:30 -08:00
Nicholas Piggin	c33cd1ed60	powerpc/64s/iommu: Don't use atomic_ function on atomic64_t type Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111110723.3148665-3-npiggin@gmail.com	2020-12-09 23:48:14 +11:00
Christophe Leroy	da481c4fe0	powerpc/32s: Cleanup around PTE_FLAGS_OFFSET in hash_low.S PTE_FLAGS_OFFSET is defined in asm/page_32.h and used only in hash_low.S And PTE_FLAGS_OFFSET nullity depends on CONFIG_PTE_64BIT Instead of tests like #if (PTE_FLAGS_OFFSET != 0), use CONFIG_PTE_64BIT related code. Also move the definition of PTE_FLAGS_OFFSET into hash_low.S directly, that improves readability. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f5bc21db7a33dab55924734e6060c2e9daed562e.1606247495.git.christophe.leroy@csgroup.eu	2020-12-09 23:48:14 +11:00
Christophe Leroy	fec6166b44	powerpc/32s: In add_hash_page(), calculate VSID later VSID is only for create_hpte(). When _PAGE_HASHPTE is already set, add_hash_page() bails out without calling create_hpte() and doesn't need the value of VSID. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3907199974c89b85a3441cf3f528751173b7649c.1606247495.git.christophe.leroy@csgroup.eu	2020-12-09 23:48:14 +11:00
Christophe Leroy	c5ccb4e789	powerpc/32s: Remove unused counters incremented by create_hpte() primary_pteg_full and htab_hash_searches are not used. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6470ab99e58c84a5445af43ce4d1d772b0dc3e93.1606247495.git.christophe.leroy@csgroup.eu	2020-12-09 23:48:14 +11:00
Christophe Leroy	7bfe54b5f1	powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions All hugetlb range freeing functions have a verification like the following, which only differs by the mask used, depending on the page table level. start &= MASK; if (start < floor) return; if (ceiling) { ceiling &= MASK; if (! ceiling) return; } if (end - 1 > ceiling - 1) return; Refactor that into a helper function which takes the mask as an argument, returning true when [start;end[ is not fully contained inside [floor;ceiling[ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/16a571bb32eb6e8cd44bda484c8d81cd8a25e6d7.1604668827.git.christophe.leroy@csgroup.eu	2020-12-09 23:48:14 +11:00
Christophe Leroy	5f1888a077	powerpc/fault: Perform exception fixup in do_page_fault() Exception fixup doesn't require the heady full regs saving, do it from do_page_fault() directly. For that, split bad_page_fault() in two parts. As bad_page_fault() can also be called from other places than handle_page_fault(), it will still perform exception fixup and fallback on __bad_page_fault(). handle_page_fault() directly calls __bad_page_fault() as the exception fixup will now be done by do_page_fault() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bd07d6fef9237614cd6d318d8f19faeeadaa816b.1607491748.git.christophe.leroy@csgroup.eu	2020-12-09 23:48:14 +11:00
Christophe Leroy	cbd7e6ca02	powerpc/fault: Avoid heavy search_exception_tables() verification search_exception_tables() is an heavy operation, we have to avoid it. When KUAP is selected, we'll know the fault has been blocked by KUAP. When it is blocked by KUAP, check whether we are in an expected userspace access place. If so, emit a warning to spot something is going work. Otherwise, just remain silent, it will likely Oops soon. When KUAP is not selected, it behaves just as if the address was already in the TLBs and no fault was generated. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9870f01e293a5a76c4f4e4ddd4a6b0f63038c591.1607491748.git.christophe.leroy@csgroup.eu	2020-12-09 23:48:13 +11:00

... 2 3 4 5 6 ...

2933 Коммитов