Граф коммитов

150738 Коммитов

Автор SHA1 Сообщение Дата
Maciej W. Rozycki 148b9aba99
MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression
Fix a commit 8a8158c85e ("MIPS: memset.S: EVA & fault support for
small_memset") regression and remove assembly warnings:

arch/mips/lib/memset.S: Assembler messages:
arch/mips/lib/memset.S:243: Warning: Macro instruction expanded into multiple instructions in a branch delay slot

triggering with the CPU_DADDI_WORKAROUNDS option set and this code:

	PTR_SUBU	a2, t1, a0
	jr		ra
	 PTR_ADDIU	a2, 1

This is because with that option in place the DADDIU instruction, which
the PTR_ADDIU CPP macro expands to, becomes a GAS macro, which in turn
expands to an LI/DADDU (or actually ADDIU/DADDU) sequence:

 13c:	01a4302f 	dsubu	a2,t1,a0
 140:	03e00008 	jr	ra
 144:	24010001 	li	at,1
 148:	00c1302d 	daddu	a2,a2,at
	...

Correct this by switching off the `noreorder' assembly mode and letting
GAS schedule this jump's delay slot, as there is nothing special about
it that would require manual scheduling.  With this change in place
correct code is produced:

 13c:	01a4302f 	dsubu	a2,t1,a0
 140:	24010001 	li	at,1
 144:	03e00008 	jr	ra
 148:	00c1302d 	daddu	a2,a2,at
	...

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 8a8158c85e ("MIPS: memset.S: EVA & fault support for small_memset")
Patchwork: https://patchwork.linux-mips.org/patch/20833/
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: stable@vger.kernel.org # 4.17+
2018-10-05 09:41:39 -07:00
Greg Kroah-Hartman 08b297bb10 x86 and PPC bugfixes, mostly introduced in 4.19-rc1.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbtxXyAAoJEL/70l94x66D52kH/jEbBo/jz9Jx2bnbxYkG1YzO
 cIpIRjbRcOKVFNGxjStlJ0PedQBWAfPQl+SywRfqwiSlOOt/yo0lZ5ZewENR2TxO
 CLQC/OnV/5SU7BJvbsKgH9tc+Wp9X55wBUEalfcvG/knFlmR+eK/7TwTS+hv/U21
 uYKRnGfz5AGfdmB9FyCn0blkPNnFaQ8KB+y+INZTkB+YZzNsybow230FRPs22fjX
 HGeJ7gngah50M5gxDW+YPPNXFhs36x2hsyQXBN9TPxLPHoxTsRRoeqx2nl/UvA+e
 LXZWg8/UAzXFO/fKVHkJX4jSnCDr2W7HYGNyLPtXFPWhcOelP1h9uHrfuX+fxA4=
 =UNUo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Paolo writes:
  "KVM changes for 4.19-rc7

   x86 and PPC bugfixes, mostly introduced in 4.19-rc1."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: nVMX: fix entry with pending interrupt if APICv is enabled
  KVM: VMX: hide flexpriority from guest when disabled at the module level
  KVM: VMX: check for existence of secondary exec controls before accessing
  KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
  KVM: x86: fix L1TF's MMIO GFN calculation
  tools/kvm_stat: cut down decimal places in update interval dialog
  KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
  KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
  KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
  KVM: x86: never trap MSR_KERNEL_GS_BASE
2018-10-05 08:29:44 -07:00
Srikar Dronamraju ac1788cc7d powerpc/numa: Skip onlining a offline node in kdump path
With commit 2ea6263068 ("powerpc/topology: Get topology for shared
processors at boot"), kdump kernel on shared LPAR may crash.

The necessary conditions are
- Shared LPAR with at least 2 nodes having memory and CPUs.
- Memory requirement for kdump kernel must be met by the first N-1
  nodes where there are at least N nodes with memory and CPUs.

Example numactl of such a machine.
  $ numactl -H
  available: 5 nodes (0,2,5-7)
  node 0 cpus:
  node 0 size: 0 MB
  node 0 free: 0 MB
  node 2 cpus:
  node 2 size: 255 MB
  node 2 free: 189 MB
  node 5 cpus: 24 25 26 27 28 29 30 31
  node 5 size: 4095 MB
  node 5 free: 4024 MB
  node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
  node 6 size: 6353 MB
  node 6 free: 5998 MB
  node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39
  node 7 size: 7640 MB
  node 7 free: 7164 MB
  node distances:
  node   0   2   5   6   7
    0:  10  40  40  40  40
    2:  40  10  40  40  40
    5:  40  40  10  40  40
    6:  40  40  40  10  20
    7:  40  40  40  20  10

Steps to reproduce.
1. Load / start kdump service.
2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger)

When booting a kdump kernel with 2048M:

  kexec: Starting switchover sequence.
  I'm in purgatory
  Using 1TB segments
  hash-mmu: Initializing hash mmu with SLB
  Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018
  Found initrd at 0xc000000009e70000:0xc00000000ae554b4
  Using pSeries machine description
  -----------------------------------------------------
  ppc64_pft_size    = 0x1e
  phys_mem_size     = 0x88000000
  dcache_bsize      = 0x80
  icache_bsize      = 0x80
  cpu_features      = 0x000000ff8f5d91a7
    possible        = 0x0000fbffcf5fb1a7
    always          = 0x0000006f8b5c91a1
  cpu_user_features = 0xdc0065c2 0xef000000
  mmu_features      = 0x7c006001
  firmware_features = 0x00000007c45bfc57
  htab_hash_mask    = 0x7fffff
  physical_start    = 0x8000000
  -----------------------------------------------------
  numa:   NODE_DATA [mem 0x87d5e300-0x87d67fff]
  numa:     NODE_DATA(0) on node 6
  numa:   NODE_DATA [mem 0x87d54600-0x87d5e2ff]
  Top of RAM: 0x88000000, Total RAM: 0x88000000
  Memory hole size: 0MB
  Zone ranges:
    DMA      [mem 0x0000000000000000-0x0000000087ffffff]
    DMA32    empty
    Normal   empty
  Movable zone start for each node
  Early memory node ranges
    node   6: [mem 0x0000000000000000-0x0000000087ffffff]
  Could not find start_pfn for node 0
  Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
  On node 0 totalpages: 0
  Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff]
  On node 6 totalpages: 34816

  Unable to handle kernel paging request for data at address 0x00000060
  Faulting instruction address: 0xc000000008703a54
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1
  NIP:  c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000
  REGS: c00000000b673440 TRAP: 0380   Not tainted  (4.19.0-rc5-master+)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24022022  XER: 20000002
  CFAR: c0000000086fc238 IRQMASK: 0
  GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000
  GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220
  GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008
  GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000
  GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008
  GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78
  GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400
  NIP [c000000008703a54] bus_add_device+0x84/0x1e0
  LR [c000000008703a38] bus_add_device+0x68/0x1e0
  Call Trace:
  [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable)
  [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0
  [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240
  [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180
  [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90
  [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190
  [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580
  [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108
  [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c
  [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450
  [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160
  [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80
  Instruction dump:
  60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c
  e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378
  ---[ end trace 593577668c2daa65 ]---

However a regular kernel with 4096M (2048 gets reserved for crash
kernel) boots properly.

Unlike regular kernels, which mark all available nodes as online,
kdump kernel only marks just enough nodes as online and marks the rest
as offline at boot. However kdump kernel boots with all available
CPUs. With Commit 2ea6263068 ("powerpc/topology: Get topology for
shared processors at boot"), all CPUs are onlined on their respective
nodes at boot time. try_online_node() tries to online the offline
nodes but fails as all needed subsystems are not yet initialized.

As part of fix, detect and skip early onlining of a offline node.

Fixes: 2ea6263068 ("powerpc/topology: Get topology for shared processors at boot")
Reported-by: Pavithra Prakash <pavrampu@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-05 23:21:54 +10:00
Michael Ellerman a932ed3b71 powerpc: Don't print kernel instructions in show_user_instructions()
Recently we implemented show_user_instructions() which dumps the code
around the NIP when a user space process dies with an unhandled
signal. This was modelled on the x86 code, and we even went so far as
to implement the exact same bug, namely that if the user process
crashed with its NIP pointing into the kernel we will dump kernel text
to dmesg. eg:

  bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1
  bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6
  bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048

This was discovered on x86 by Jann Horn and fixed in commit
342db04ae7 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").

Fix it by checking the adjusted NIP value (pc) and number of
instructions against USER_DS, and bail if we fail the check, eg:

  bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1
  bad-bctr[2969]: Bad NIP, not dumping instructions.

Fixes: 88b0fe1757 ("powerpc: Add show_user_instructions()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-05 23:18:12 +10:00
Paolo Bonzini cc906f07d7 Third set of PPC KVM fixes for 4.19
One patch here, fixing a potential host crash introduced (or at least
 exacerbated) by a previous fix for corruption relating to radix guest
 page faults and THP operations.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbtq7FAAoJEJ2a6ncsY3Gf0PAIAJvq2veNJs0HHsonuI592mK0
 YMKL7IproGHBnIuVvpaI3o6iERjMtRaolh+G4dlk3xoQ/TXYzTdimT1cwF2jZblL
 wQSsit5499UrJdPsFxBc7lc8yXwRR9+N/ooRAQS/TyK6w00Y+Aep7sTQpRdVCb1G
 nskLWXBDWtdB6nGAX9dECpRQP+/voxEJkMMrsFM3rYbBkM/rOqu/IXsCoH2u2jGM
 6zFDe7Z35VP77UaehK9W4fGHVc+92oD5E2kN56DNIotYaAT7fudPmNQF+g1GsVFc
 Mv8AJdXiQIhZFbpRWTS/UTvyb/aq1Ou6Xd5msTYTMG8v+g0M9h8ZSH3FQCX8RMs=
 =Fdit
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master

Third set of PPC KVM fixes for 4.19

One patch here, fixing a potential host crash introduced (or at least
exacerbated) by a previous fix for corruption relating to radix guest
page faults and THP operations.
2018-10-05 09:39:53 +02:00
Greg Kroah-Hartman ac0657edb1 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Russell writes:
  "A couple of small ARM fixes from Stefan and Thomas:
   - Adding the io_pgetevents syscall
   - Fixing a bounds check in pci_ioremap_io()"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8799/1: mm: fix pci_ioremap_io() offset check
  ARM: 8787/1: wire up io_pgetevents syscall
2018-10-04 09:48:10 -07:00
Greg Kroah-Hartman d2467adb66 A Single RISC-V Fix for 4.19-rc7
This tag contains a single patch that managed to get lost in the
 shuffle, which explains why it's so late.  This single line has been
 floating around in various patch sets for months, and fixes our DMA32
 region.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAluzzHITHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQSrQD/4rizKPzKVMVUEMV0fWM9rSJ2mr130W
 O8JGz1l8PtlPum6Fjhy3UHbLlZ/36D23vdf2qv10Zlwj5WquUH2J3dWL4wSh5yTY
 NsM/Ehg03glLY15Nf6L3QSA2Jw+0vEqj/WAaGASdZvEfc7zwlP62jqDWGUMjUdR8
 w9aG3AED4d9gnOxaPgUwDFHP0VdsBWLr546QOhefScMk9FUlKCjENn7IPVHzBqRc
 7V0ILlENbUOrCd/zQIptjXYiEb59YNLsXUEFyqjWU2Ey+PwUiHFBduUexB1+cQC1
 z87E7ZWQ5HM2P9Xp1E4fw0kAxmni2XjeAnNJjfjK/nFv98bPbwyasP3N8eldk98R
 O6FAcLdqNP4xjeOpmF821JkM6eTNXefxh63dRCtUjrh1qTnYMnrERikz/C/LRLWu
 6vWBrUiJYpNsl6HRmCLwZXp6IG1VulD1OkSoAQK6HlJUzD0Gc7pwwXuCKSvpziDd
 JCN3ZHelSgrHaGseSTkn8icJTP2mP6MCzSDfSBcngwj4nMFBrCJir7eb3dGUW57k
 sKbp/cYmbGIfcoz+vNBlxOSbl1IJRND0cc1HNIwANJXbllfair78CLqd2DiHZwMJ
 HW5l3Pi9KkQwet7sMQoqOTcmkdyaq5/m51wFT1pRrwqQjy5oipiYPhIqw+N06zXL
 tDrkUlaYA+aE3A==
 =QoXG
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Palmer writes:
  "A Single RISC-V Fix for 4.19-rc7

   This tag contains a single patch that managed to get lost in the
   shuffle, which explains why it's so late.  This single line has been
   floating around in various patch sets for months, and fixes our DMA32
   region."

* tag 'riscv-for-linus-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISCV: Fix end PFN for low memory
2018-10-04 09:16:11 -07:00
Arnd Bergmann 3f4258bbe0 NXP/FSL SoC driver fixes for v4.19 round 2
- Fix crash of qman_portal by deferring its probe if qman is not probed
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEhb3UXAyxp6UQ0v6khtxQDvusFVQFAluypMcACgkQhtxQDvus
 FVQvKA/+Kc4IDHyidEgcbdNwjZNkp4eHXKSwtCdMstrEquQ7tOqmxCaEnC9izTBy
 wW6XF91ez4DJkQDgxrN1fJN512yTN0v6D7Jq+VclrrGfcT2f1xnNRf33r6957csc
 9hOp7S1hkeHZfCRtMZrjwIRgzWtlfH+XcRne/USE/enQ0GmTKThhv2rngH2hnAxP
 cqW+ult+jyL+m5i+hpzIZUjSMio4w8Z+K/QEwhlfy63nh2oYc8Zpvql5EYq1oFgd
 c0JWA1/DM/dbBiWRZVfjvjw1ZDnkJpYIJDmJYfNz3k4IJJlUYPWM4RrSRn1+yYl9
 wg/zZ3a0NlCskn32KlIwE0m1zLLVEjFam+3a7EiIFh98gBGQ5eeIpZ8UAqyc6zLi
 4etMYtt6+kUt7x2nsZ1GQrQXembt4TUBf70HMBLz85TaCg6z98PRsYyGRq1vKnj4
 sAk8aiCHOsnlqHpKt403U8wUABRAr8m+xGI2D8MSTdzbQnDftf8eqKrLS2qtnlSi
 4R3jkBiaszV8O6FdRC8+3VmtaOx/PCGh8Odo3GitDLQBdB1AkFdmIxJBJDcST/Cp
 CWAmxJsPIS4PUGJupeIALsJF8rFHiwn5uVxsNRKxpL1xpDD6mwxLSn+LOHQT8fD3
 PGhhngQD8ky9nKqL+gckpu/QfjiBrJNeUD6pvhRJc1cI+7ue/nM=
 =qVf2
 -----END PGP SIGNATURE-----

Merge tag 'soc-fsl-fix-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into fixes

NXP/FSL SoC driver fixes for v4.19 round 2

- Fix crash of qman_portal by deferring its probe if qman is not probed

* tag 'soc-fsl-fix-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux:
  soc: fsl: qman_portals: defer probe after qman's probe
  soc: fsl: qbman: add APIs to retrieve the probing status
  soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
  soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
  ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
  MAINTAINERS: update the Annapurna Labs maintainer email
  ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT
  ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-10-04 18:01:20 +02:00
Arnd Bergmann 30a0af8826 i.MX fixes for 4.19, round 2:
- i.MX53 QSB board stops working when cpufreq driver is enabled,
    because the default OPP table has the maximum CPU frequency at
    1.2GHz, while the board is only capable of running 1GHz.  Fix up
    the OPP table for the board to get it work with cpufreq driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJbtCuVAAoJEFBXWFqHsHzOHDEH/RXzkCIZCmvVhdXIQDa/PC7b
 GupBi+UJE2iQMS3Z3RNYnZQ2yK8zq7AZNIjZtR7/8dm6ALFjXMGvfj19Cs0wThdt
 VQ3g7FyS76lwvb2sVl6yGZTOcZfn7nIwFC9NNfY1FSt87QXKBPG0UmnMpCxR7m+1
 V1d0T3y0WejpRigtzGJZJwibkOFoKy8HbAn09BBzyE+eRnNcXSp6NunIr+82AGgo
 yA3rtcoDXD997wXMUUgnFzh0VjqH5wrrbMtplWOFpDtKEFxwiHGOjWsIm6S2xi/8
 SKkk13M+GBHtPR0ygTtRngV1TF0z0kTdArhS/GGGTzWbpZ5sri5khyCfNP+i0LA=
 =xyJS
 -----END PGP SIGNATURE-----

Merge tag 'imx-fixes-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes

i.MX fixes for 4.19, round 2:
 - i.MX53 QSB board stops working when cpufreq driver is enabled,
   because the default OPP table has the maximum CPU frequency at
   1.2GHz, while the board is only capable of running 1GHz.  Fix up
   the OPP table for the board to get it work with cpufreq driver.

* tag 'imx-fixes-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx53-qsb: disable 1.2GHz OPP

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-10-04 17:34:53 +02:00
Paolo Bonzini 7e7126846c kvm: nVMX: fix entry with pending interrupt if APICv is enabled
Commit b5861e5cf2 introduced a check on
the interrupt-window and NMI-window CPU execution controls in order to
inject an external interrupt vmexit before the first guest instruction
executes.  However, when APIC virtualization is enabled the host does not
need a vmexit in order to inject an interrupt at the next interrupt window;
instead, it just places the interrupt vector in RVI and the processor will
inject it as soon as possible.  Therefore, on machines with APICv it is
not enough to check the CPU execution controls: the same scenario can also
happen if RVI>vPPR.

Fixes: b5861e5cf2
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 17:10:40 +02:00
Paolo Bonzini 2cf7ea9f40 KVM: VMX: hide flexpriority from guest when disabled at the module level
As of commit 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls
have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
whereas previously KVM would allow a nested guest to enable
VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware.  That is,
KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
(always) allow setting it when kvm-intel.flexpriority=0, and may even
initially allow the control and then clear it when the nested guest
writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
functional issues.

Hide the control completely when the module parameter is cleared.

reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 13:40:44 +02:00
Sean Christopherson fd6b6d9b82 KVM: VMX: check for existence of secondary exec controls before accessing
Return early from vmx_set_virtual_apic_mode() if the processor doesn't
support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
which reside in SECONDARY_VM_EXEC_CONTROL.  This eliminates warnings
due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
on processors without secondary exec controls.

Remove the similar check for TPR shadowing as it is incorporated in the
flexpriority_enabled check and the APIC-related code in
vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.

Reported-by: Gerhard Wiesinger <redhat@wiesinger.com>
Fixes: 8d860bbeed ("kvm: vmx: Basic APIC virtualization controls have three settings")
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-04 13:40:21 +02:00
Andy Lutomirski 02e425668f x86/vdso: Fix vDSO syscall fallback asm constraint regression
When I added the missing memory outputs, I failed to update the
index of the first argument (ebx) on 32-bit builds, which broke the
fallbacks.  Somehow I must have screwed up my testing or gotten
lucky.

Add another test to cover gettimeofday() as well.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 715bd9d12f ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks")
Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04 08:17:50 +02:00
Paul Mackerras 6579804c43 KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
Commit 71d29f43b6 ("KVM: PPC: Book3S HV: Don't use compound_order to
determine host mapping size", 2018-09-11) added a call to 
__find_linux_pte() and a dereference of the returned PTE pointer to the
radix page fault path in the common case where the page is normal
system memory.  Previously, __find_linux_pte() was only called for
mappings to physical addresses which don't have a page struct (e.g.
memory-mapped I/O) or where the page struct is marked as reserved
memory.

This exposes us to the possibility that the returned PTE pointer
could be NULL, for example in the case of a concurrent THP collapse
operation.  Dereferencing the returned NULL pointer causes a host
crash.

To fix this, we check for NULL, and if it is NULL, we retry the
operation by returning to the guest, with the expectation that it
will generate the same page fault again (unless of course it has
been fixed up by another CPU in the meantime).

Fixes: 71d29f43b6 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-04 14:51:11 +10:00
Nathan Chancellor 88296bd42b x86/cpu/amd: Remove unnecessary parentheses
Clang warns when multiple pairs of parentheses are used for a single
conditional statement.

arch/x86/kernel/cpu/amd.c:925:14: warning: equality comparison with
extraneous parentheses [-Wparentheses-equality]
        if ((c->x86 == 6)) {
             ~~~~~~~^~~~
arch/x86/kernel/cpu/amd.c:925:14: note: remove extraneous parentheses
around the comparison to silence this warning
        if ((c->x86 == 6)) {
            ~       ^   ~
arch/x86/kernel/cpu/amd.c:925:14: note: use '=' to turn this equality
comparison into an assignment
        if ((c->x86 == 6)) {
                    ^~
                    =
1 warning generated.

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181002224511.14929-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/187
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-03 08:27:47 +02:00
Andy Lutomirski 4f16656401 x86/vdso: Only enable vDSO retpolines when enabled and supported
When I fixed the vDSO build to use inline retpolines, I messed up
the Makefile logic and made it unconditional.  It should have
depended on CONFIG_RETPOLINE and on the availability of compiler
support.  This broke the build on some older compilers.

Reported-by: nikola.ciprich@linuxbox.cz
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Rickard <matt@softrans.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jason.vas.dias@gmail.com
Cc: stable@vger.kernel.org
Fixes: 2e549b2ee0 ("x86/vdso: Fix vDSO build if a retpoline is emitted")
Link: http://lkml.kernel.org/r/08a1f29f2c238dd1f493945e702a521f8a5aa3ae.1538540801.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-03 08:26:14 +02:00
Atish Patra ef1f225874
RISCV: Fix end PFN for low memory
Use memblock_end_of_DRAM which provides correct last low memory
PFN. Without that, DMA32 region becomes empty resulting in zero
pages being allocated for DMA32.

This patch is based on earlier patch from palmer which never
merged into 4.19. I just edited the commit text to make more
sense.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-10-02 12:50:01 -07:00
Mike Travis 2647c43c7f x86/tsc: Fix UV TSC initialization
The recent rework of the TSC calibration code introduced a regression on UV
systems as it added a call to tsc_early_init() which initializes the TSC
ADJUST values before acpi_boot_table_init().  In the case of UV systems,
that is a necessary step that calls uv_system_init().  This informs
tsc_sanitize_first_cpu() that the kernel runs on a platform with async TSC
resets as documented in commit 341102c3ef ("x86/tsc: Add option that TSC
on Socket 0 being non-zero is valid")

Fix it by skipping the early tsc initialization on UV systems and let TSC
init tests take place later in tsc_init().

Fixes: cf7a63ef4e ("x86/tsc: Calibrate tsc only once")
Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com
2018-10-02 21:29:16 +02:00
Mike Travis 20a8378aa9 x86/platform/uv: Provide is_early_uv_system()
Introduce is_early_uv_system() which uses efi.uv_systab to decide early
in the boot process whether the kernel runs on a UV system.

This is needed to skip other early setup/init code that might break
the UV platform if done too early such as before necessary ACPI tables
parsing takes place.

Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russ Anderson <rja@hpe.com>
Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
Cc: Rajvi Jingar <rajvi.jingar@intel.com>
Link: https://lkml.kernel.org/r/20181002180144.801700401@stormcage.americas.sgi.com
2018-10-02 21:29:16 +02:00
Christophe Leroy b45ba4a51c powerpc/lib: fix book3s/32 boot failure due to code patching
Commit 51c3c62b58 ("powerpc: Avoid code patching freed init
sections") accesses 'init_mem_is_free' flag too early, before the
kernel is relocated. This provokes early boot failure (before the
console is active).

As it is not necessary to do this verification that early, this
patch moves the test into patch_instruction() instead of
__patch_instruction().

This modification also has the advantage of avoiding unnecessary
remappings.

Fixes: 51c3c62b58 ("powerpc: Avoid code patching freed init sections")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-02 23:34:14 +10:00
Natarajan, Janakarajan d7cbbe49a9 perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
In Family 17h, some L3 Cache Performance events require the ThreadMask
and SliceMask to be set. For other events, these fields do not affect
the count either way.

Set ThreadMask and SliceMask to 0xFF and 0xF respectively.

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H . Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/Message-ID:
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 09:38:04 +02:00
Kan Liang 9d92cfeaf5 perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
The counters on M3UPI Link 0 and Link 3 don't count properly, and writing
0 to these counters may causes system crash on some machines.

The PCI BDF addresses of the M3UPI in the current code are incorrect.

The correct addresses should be:

  D18:F1	0x204D
  D18:F2	0x204E
  D18:F5	0x204D

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: cd34cd97b7 ("perf/x86/intel/uncore: Add Skylake server uncore support")
Link: http://lkml.kernel.org/r/1537538826-55489-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 09:38:02 +02:00
Masayoshi Mizuma 6265adb972 perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0
Physical package id 0 doesn't always exist, we should use
boot_cpu_data.phys_proc_id here.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20180910144750.6782-1-msys.mizuma@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 09:37:58 +02:00
Andy Lutomirski 715bd9d12f x86/vdso: Fix asm constraints on vDSO syscall fallbacks
The syscall fallbacks in the vDSO have incorrect asm constraints.
They are not marked as writing to their outputs -- instead, they are
marked as clobbering "memory", which is useless.  In particular, gcc
is smart enough to know that the timespec parameter hasn't escaped,
so a memory clobber doesn't clobber it.  And passing a pointer as an
asm *input* does not tell gcc that the pointed-to value is changed.

Add in the fact that the asm instructions weren't volatile, and gcc
was free to omit them entirely unless their sole output (the return
value) is used.  Which it is (phew!), but that stops happening with
some upcoming patches.

As a trivial example, the following code:

void test_fallback(struct timespec *ts)
{
	vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
}

compiles to:

00000000000000c0 <test_fallback>:
  c0:   c3                      retq

To add insult to injury, the RCX and R11 clobbers on 64-bit
builds were missing.

The "memory" clobber is also unnecessary -- no ordering with respect to
other memory operations is needed, but that's going to be fixed in a
separate not-for-stable patch.

Fixes: 2aae950b21 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org
2018-10-02 08:28:15 +02:00
Greg Kroah-Hartman 385afbf8c3 Late arm64 fixes
- Fix handling of young contiguous ptes for hugetlb mappings
 
 - Fix livelock when taking access faults on contiguous hugetlb mappings
 
 - Tighten up register accesses via KVM SET_ONE_REG ioctl()s
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJbslhqAAoJELescNyEwWM0FIoH/2fQYrzEZk+zjcJxIxwZOVn8
 L1lpSb4+xa0OPLvHU/TEvPCo2B7J3R9jisqQKcqe0MeOvqRThfIsYOWfcFf5NoX8
 K4ysmaVk6treS1IJ9ZK+2g5pSuKpvFNQ0euBdoolCe4wV/ZDTH2dNlovdIvnucV2
 ybpwUptTK33tpUAlkadGsFo/O8Qdsu3MhQD4ymDZXNj8N7L9lrIwCX42wDZpvcFd
 XR2O0/tAOtbz1n7PBmtCehenS0BzU5877MAmQsb9c93qyyZ37cMhS1L1RCPqhXV9
 TfX/+nyjkRpt+gaMJTV39JjMTBcbtVVHNe32cC470H5OvgK6SNELcJsIlEeUFbo=
 =Subb
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Will writes:
  "Late arm64 fixes

   - Fix handling of young contiguous ptes for hugetlb mappings

   - Fix livelock when taking access faults on contiguous hugetlb mappings

   - Tighten up register accesses via KVM SET_ONE_REG ioctl()s"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: KVM: Sanitize PSTATE.M when being set from userspace
  arm64: KVM: Tighten guest core register access from userspace
  arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags
  arm64: hugetlb: Fix handling of young ptes
2018-10-01 17:24:20 -07:00
Greg Kroah-Hartman b62e425593 ARM: SoC fixes
A handful of fixes that have been coming in the last couple of weeks:
 
  - Freescale fixes for on-chip accellerators
  - A DT fix for stm32 to avoid fallback to non-DMA SPI mode
  - Fixes for badly specified interrupts on BCM63xx SoCs
  - Allwinner A64 HDMI was incorrectly specified as fully compatble with R40
  - Drive strength fix for SAMA5D2 NAND pins on one board
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAluxIecPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx3KswP/iJT6PRSv2OiZq5UyUPhAOx9dW+9uQP5qCYO
 43hRkEhUQEbHAibjd4jKq7r2jNfOEeoZARyhE89tQc+RxwU7oOxH5Aohbmk1o4TQ
 bQ8AQHoofdNerwr8LKWAWvXe6Ff74d6NIJEQZ1ampndt7pul6LDJbLGg503tDPKZ
 fomG/W50id7xA8xexEfZZRXZu9HSRqNk6/wZYycUhsreZZ30nSQwJTJvLiSiTTAh
 qWleTc0dD3BazQBEf8VJwLSu3UfigXF+dP7p/joElgULhk00fHYrhWdAa8d0F3ib
 tS0foD/alLVslnjIDh8baEkErfqDvtZlpRCinNob1R56yzmkSxjBqCb6kSt4jCN8
 o+rlNnmnJPRH/qj0wdjd9phw5AWyZw1V1lSRvZGPacG6i7ZYb02Sj13u05k8826m
 hIpnryhrwuO8lKrDUCV4GT/oDpKS7ujskJZFWEUgjXHZA/XDodNXN5Rkuw8LeJmh
 HJx1Ef5v/RLbdoIl3Ybs1zDdbg9rmxdaqfDs3Ukka9doZGB1wtZh+GbF1v6u6GZi
 zmrcu3jzhDVek7Lw1ZWUCUBCxmYLbcg2txd6ZtkCV09M/fuSnQuxF/mLqiq03YAL
 ASy7ejKc5tf8DPnHKlZ7KIR4eMXEhxUFOpKblAQktHvREel2zC5xjOQjQvCTm1hD
 w5rDtaPt
 =+/9J
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Olof writes:
  "ARM: SoC fixes

   A handful of fixes that have been coming in the last couple of weeks:

   - Freescale fixes for on-chip accellerators
   - A DT fix for stm32 to avoid fallback to non-DMA SPI mode
   - Fixes for badly specified interrupts on BCM63xx SoCs
   - Allwinner A64 HDMI was incorrectly specified as fully compatble with R40
   - Drive strength fix for SAMA5D2 NAND pins on one board"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: stm32: update SPI6 dmas property on stm32mp157c
  soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
  soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
  ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
  MAINTAINERS: update the Annapurna Labs maintainer email
  ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT
  ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl
2018-10-01 17:23:27 -07:00
Sean Christopherson daa07cbc9a KVM: x86: fix L1TF's MMIO GFN calculation
One defense against L1TF in KVM is to always set the upper five bits
of the *legal* physical address in the SPTEs for non-present and
reserved SPTEs, e.g. MMIO SPTEs.  In the MMIO case, the GFN of the
MMIO SPTE may overlap with the upper five bits that are being usurped
to defend against L1TF.  To preserve the GFN, the bits of the GFN that
overlap with the repurposed bits are shifted left into the reserved
bits, i.e. the GFN in the SPTE will be split into high and low parts.
When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO
access, get_mmio_spte_gfn() unshifts the affected bits and restores
the original GFN for comparison.  Unfortunately, get_mmio_spte_gfn()
neglects to mask off the reserved bits in the SPTE that were used to
store the upper chunk of the GFN.  As a result, KVM fails to detect
MMIO accesses whose GPA overlaps the repurprosed bits, which in turn
causes guest panics and hangs.

Fix the bug by generating a mask that covers the lower chunk of the
GFN, i.e. the bits that aren't shifted by the L1TF mitigation.  The
alternative approach would be to explicitly zero the five reserved
bits that are used to store the upper chunk of the GFN, but that
requires additional run-time computation and makes an already-ugly
bit of code even more inscrutable.

I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to
warn if GENMASK_ULL() generated a nonsensical value, but that seemed
silly since that would mean a system that supports VMX has less than
18 bits of physical address space...

Reported-by: Sakari Ailus <sakari.ailus@iki.fi>
Fixes: d9b47449c1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
Cc: Junaid Shahid <junaids@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Junaid Shahid <junaids@google.com>
Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:41:00 +02:00
Liran Alon 62cf9bd811 KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only
when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls.

Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which
is L1 IA32_BNDCFGS.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:59 +02:00
Liran Alon 503234b3fd KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
Commit a87036add0 ("KVM: x86: disable MPX if host did not enable
MPX XSAVE features") introduced kvm_mpx_supported() to return true
iff MPX is enabled in the host.

However, that commit seems to have missed replacing some calls to
kvm_x86_ops->mpx_supported() to kvm_mpx_supported().

Complete original commit by replacing remaining calls to
kvm_mpx_supported().

Fixes: a87036add0 ("KVM: x86: disable MPX if host did not enable
MPX XSAVE features")

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:57 +02:00
Liran Alon 5f76f6f5ff KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
Before this commit, KVM exposes MPX VMX controls to L1 guest only based
on if KVM and host processor supports MPX virtualization.
However, these controls should be exposed to guest only in case guest
vCPU supports MPX.

Without this change, a L1 guest running with kernel which don't have
commit 691bd4340b ("kvm: vmx: allow host to access guest
MSR_IA32_BNDCFGS") asserts in QEMU on the following:
	qemu-kvm: error: failed to set MSR 0xd90 to 0x0
	qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs:
	Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed'
This is because L1 KVM kvm_init_msr_list() will see that
vmx_mpx_supported() (As it only checks MPX VMX controls support) and
therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS.
However, later when L1 will attempt to set this MSR via KVM_SET_MSRS
IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu).

Therefore, fix the issue by exposing MPX VMX controls to L1 guest only
when vCPU supports MPX.

Fixes: 36be0b9deb ("KVM: x86: Add nested virtualization support for MPX")

Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-01 15:40:57 +02:00
Marc Zyngier 2a3f93459d arm64: KVM: Sanitize PSTATE.M when being set from userspace
Not all execution modes are valid for a guest, and some of them
depend on what the HW actually supports. Let's verify that what
userspace provides is compatible with both the VM settings and
the HW capabilities.

Cc: <stable@vger.kernel.org>
Fixes: 0d854a60b1 ("arm64: KVM: enable initialization of a 32bit vcpu")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-01 14:38:26 +01:00
Dave Martin d26c25a9d1 arm64: KVM: Tighten guest core register access from userspace
We currently allow userspace to access the core register file
in about any possible way, including straddling multiple
registers and doing unaligned accesses.

This is not the expected use of the ABI, and nobody is actually
using it that way. Let's tighten it by explicitly checking
the size and alignment for each field of the register file.

Cc: <stable@vger.kernel.org>
Fixes: 2f4a07c5f9 ("arm64: KVM: guest one-reg interface")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[maz: rewrote Dave's initial patch to be more easily backported]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-01 14:38:05 +01:00
Greg Kroah-Hartman e75417739b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Thomas writes:
  "A single fix for the AMD memory encryption boot code so it does not
   read random garbage instead of the cached encryption bit when a kexec
   kernel is allocated above the 32bit address limit."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Fix kexec booting failure in the SEV bit detection code
2018-09-29 14:34:06 -07:00
Greg Kroah-Hartman f005de0183 powerpc fixes for 4.19 #3
A reasonably big batch of fixes due to me being away for a few weeks.
 
 A fix for the TM emulation support on Power9, which could result in corrupting
 the guest r11 when running under KVM.
 
 Two fixes to the TM code which could lead to userspace GPR corruption if we take
 an SLB miss at exactly the wrong time.
 
 Our dynamic patching code had a bug that meant we could patch freed __init text,
 which could lead to corrupting userspace memory.
 
 csum_ipv6_magic() didn't work on little endian platforms since we optimised it
 recently.
 
 A fix for an endian bug when reading a device tree property telling us how many
 storage keys the machine has available.
 
 Fix a crash seen on some configurations of PowerVM when migrating the partition
 from one machine to another.
 
 A fix for a regression in the setup of our CPU to NUMA node mapping in KVM
 guests.
 
 A fix to our selftest Makefiles to make them work since a recent change to the
 shared Makefile logic.
 
 Thanks to:
   Alexey Kardashevskiy, Breno Leitao, Christophe Leroy, Michael Bringmann,
   Michael Neuling, Nicholas Piggin, Paul Mackerras,, Srikar Dronamraju, Thiago
   Jung Bauermann, Xin Long.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAluuCTsTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDKjD/sHWK1MFXM4baYrdju2tW9tBBiIq18n
 UQ4byhJ6wgXzNDzxSq3v4ofW/5A8HoF+2wTqIXFY5l9hMQRw+TfB3ZymloZZ/0Nf
 3sxFfL0sv7or4V8Ben/vGEdalg9PzFInwAXhBCGV0nnHiE3D7GkO+MU3zsZUjzum
 EbWN/SKMLhjGByeqGbujJYfLC2IdvAdpTzrYd0t0X/Pqi8EUuGjJa0E+cJqD6mjI
 GR4eGagImuDuTH6F9eLwcQt2mlDvmv6KJjtgoMbVeh99pqkG9tSnfOlZ+KrWWVps
 1+/CXHlaJiHsmX3wqWKjw31x1Ag4v+BEtAjXpwnr7MFroumZ18xRLaDfy11eJ+GB
 x3fPbdylittMQy1XYKRrntppDsr0dTAMCjBmE6KSMRyi+F5/agW8bG3sx7D26/AO
 JoWGw/XTxislvdr/PIPkjISXRSKHa5jwT3rH45Id8W1uEt0TQ7hJOmWIOdC1WXU8
 IbYy3imksXLCcxIJmSzc+cUkVo1wHf4S4NxcJsVs4N5pE/zih5mEKi9jaIpxocF/
 YX43n4hGa/6oJYBBMeQZpsO3aDiP/Ynjk4j/I2l92WAIXzQXjVXhreLfHzci+fGg
 Q6KdxNWPbwe3UKUTnQ8TOY4ncshU716HrFsp/B4HWvJTfvY6tc7nKDupieetzyKp
 G5NA6Bh+VOVDxQ==
 =G6wq
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Michael writes:
  "powerpc fixes for 4.19 #3

   A reasonably big batch of fixes due to me being away for a few weeks.

   A fix for the TM emulation support on Power9, which could result in
   corrupting the guest r11 when running under KVM.

   Two fixes to the TM code which could lead to userspace GPR corruption
   if we take an SLB miss at exactly the wrong time.

   Our dynamic patching code had a bug that meant we could patch freed
   __init text, which could lead to corrupting userspace memory.

   csum_ipv6_magic() didn't work on little endian platforms since we
   optimised it recently.

   A fix for an endian bug when reading a device tree property telling
   us how many storage keys the machine has available.

   Fix a crash seen on some configurations of PowerVM when migrating the
   partition from one machine to another.

   A fix for a regression in the setup of our CPU to NUMA node mapping
   in KVM guests.

   A fix to our selftest Makefiles to make them work since a recent
   change to the shared Makefile logic."

* tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix Makefiles for headers_install change
  powerpc/numa: Use associativity if VPHN hcall is successful
  powerpc/tm: Avoid possible userspace r1 corruption on reclaim
  powerpc/tm: Fix userspace r13 corruption
  powerpc/pseries: Fix unitialized timer reset on migration
  powerpc/pkeys: Fix reading of ibm, processor-storage-keys property
  powerpc: fix csum_ipv6_magic() on little endian platforms
  powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
  powerpc: Avoid code patching freed init sections
  KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
2018-09-28 17:43:32 -07:00
Paul Burton 951d223c6c
MIPS: Fix CONFIG_CMDLINE handling
Commit 8ce355cf2e ("MIPS: Setup boot_command_line before
plat_mem_setup") fixed a problem for systems which have
CONFIG_CMDLINE_BOOL=y & use a DT with a chosen node that has either no
bootargs property or an empty one. In this configuration
early_init_dt_scan_chosen() copies CONFIG_CMDLINE into
boot_command_line, but the MIPS code doesn't know this so it appends
CONFIG_CMDLINE (via builtin_cmdline) to boot_command_line again. The
result is that boot_command_line contains the arguments from
CONFIG_CMDLINE twice.

That commit took the approach of simply setting up boot_command_line
from the MIPS code before early_init_dt_scan_chosen() runs, causing it
not to copy CONFIG_CMDLINE to boot_command_line if a chosen node with no
bootargs property is found.

Unfortunately this is problematic for systems which do have a non-empty
bootargs property & CONFIG_CMDLINE_BOOL=y. There
early_init_dt_scan_chosen() will overwrite boot_command_line with the
arguments from DT, which means we lose those from CONFIG_CMDLINE
entirely. This breaks CONFIG_MIPS_CMDLINE_DTB_EXTEND. If we have
CONFIG_MIPS_CMDLINE_FROM_BOOTLOADER or
CONFIG_MIPS_CMDLINE_BUILTIN_EXTEND selected and the DT has a bootargs
property which we should ignore, it will instead be honoured breaking
those configurations too.

Fix this by reverting commit 8ce355cf2e ("MIPS: Setup
boot_command_line before plat_mem_setup") to restore the former
behaviour, and fixing the CONFIG_CMDLINE duplication issue by
initializing boot_command_line to a non-empty string that
early_init_dt_scan_chosen() will not overwrite with CONFIG_CMDLINE.

This is a little ugly, but cleanup in this area is on its way. In the
meantime this is at least easy to backport & contains the ugliness
within arch/mips/.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 8ce355cf2e ("MIPS: Setup boot_command_line before plat_mem_setup")
References: https://patchwork.linux-mips.org/patch/18804/
Patchwork: https://patchwork.linux-mips.org/patch/20813/
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Jaedon Shin <jaedon.shin@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.16+
2018-09-28 12:09:04 -07:00
Paul Burton ea7e0480a4
MIPS: VDSO: Always map near top of user memory
When using the legacy mmap layout, for example triggered using ulimit -s
unlimited, get_unmapped_area() fills memory from bottom to top starting
from a fairly low address near TASK_UNMAPPED_BASE.

This placement is suboptimal if the user application wishes to allocate
large amounts of heap memory using the brk syscall. With the VDSO being
located low in the user's virtual address space, the amount of space
available for access using brk is limited much more than it was prior to
the introduction of the VDSO.

For example:

  # ulimit -s unlimited; cat /proc/self/maps
  00400000-004ec000 r-xp 00000000 08:00 71436      /usr/bin/coreutils
  004fc000-004fd000 rwxp 000ec000 08:00 71436      /usr/bin/coreutils
  004fd000-0050f000 rwxp 00000000 00:00 0
  00cc3000-00ce4000 rwxp 00000000 00:00 0          [heap]
  2ab96000-2ab98000 r--p 00000000 00:00 0          [vvar]
  2ab98000-2ab99000 r-xp 00000000 00:00 0          [vdso]
  2ab99000-2ab9d000 rwxp 00000000 00:00 0
  ...

Resolve this by adjusting STACK_TOP to reserve space for the VDSO &
providing an address hint to get_unmapped_area() causing it to use this
space even when using the legacy mmap layout.

We reserve enough space for the VDSO, plus 1MB or 256MB for 32 bit & 64
bit systems respectively within which we randomize the VDSO base
address. Previously this randomization was taken care of by the mmap
base address randomization performed by arch_mmap_rnd(). The 1MB & 256MB
sizes are somewhat arbitrary but chosen such that we have some
randomization without taking up too much of the user's virtual address
space, which is often in short supply for 32 bit systems.

With this the VDSO is always mapped at a high address, leaving lots of
space for statically linked programs to make use of brk:

  # ulimit -s unlimited; cat /proc/self/maps
  00400000-004ec000 r-xp 00000000 08:00 71436      /usr/bin/coreutils
  004fc000-004fd000 rwxp 000ec000 08:00 71436      /usr/bin/coreutils
  004fd000-0050f000 rwxp 00000000 00:00 0
  00c28000-00c49000 rwxp 00000000 00:00 0          [heap]
  ...
  7f67c000-7f69d000 rwxp 00000000 00:00 0          [stack]
  7f7fc000-7f7fd000 rwxp 00000000 00:00 0
  7fcf1000-7fcf3000 r--p 00000000 00:00 0          [vvar]
  7fcf3000-7fcf4000 r-xp 00000000 00:00 0          [vdso]

Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Huacai Chen <chenhc@lemote.com>
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.4+
2018-09-28 12:09:00 -07:00
Kairui Song bdec8d7fa5 x86/boot: Fix kexec booting failure in the SEV bit detection code
Commit

  1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")

can occasionally cause system resets when kexec-ing a second kernel even
if SEV is not active.

That's because get_sev_encryption_bit() uses 32-bit rIP-relative
addressing to read the value of enc_bit - a variable which caches a
previously detected encryption bit position - but kexec may allocate
the early boot code to a higher location, beyond the 32-bit addressing
limit.

In this case, garbage will be read and get_sev_encryption_bit() will
return the wrong value, leading to accessing memory with the wrong
encryption setting.

Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
rIP-relative addressing in the first place.

 [ bp: massage commit message heavily. ]

Fixes: 1958b5fc40 ("x86/boot: Add early boot support when running with SEV active")
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: brijesh.singh@amd.com
Cc: kexec@lists.infradead.org
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: ghook@redhat.com
Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
2018-09-27 19:35:03 +02:00
Amelie Delaunay 082dc55e3c ARM: dts: stm32: update SPI6 dmas property on stm32mp157c
Remove unused parameter from SPI6 dmas property on stm32mp157c SoC.

Fixes: dc3f8c86c1 ("ARM: dts: stm32: add SPI support on stm32mp157c")
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
[olof: Without this patch, SPI6 will fall back to interrupt mode with
lower perfmance]
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-25 20:39:06 -07:00
Olof Johansson 1a677ff4ce This pull request contains Broadcom ARM-based SoCs Device Tree changes
intended for 4.19, please pull the following:
 
 - Florian fixes the PPI and SPI interrupts in the BCM63138 (DSL) SoC DTS
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEm+Rq3+YGJdiR9yuFh9CWnEQHBwQFAlupKGwACgkQh9CWnEQH
 BwTgvQ/+KShazu6K/28nduCtvnDCDbro8UYHxweo97UmeeEYIJ9g41ubKxEtWHUF
 s/vNojpv0xlPEDKogHsBfsE6L0S3jNjG5dx1lvJgrJDlaTabtGybM3xRoOkB+xGs
 y7QNTfBcVPOBJjOrUSmh/1CIirBm+yQjSYgg8/YLJWZCy/FRgSd7koTKHAql/Ppw
 xWjG83djIFnWxTs7Hbp0cHQP9Z4kpWo9tuWkyOXeiDR0CMTtFjdKwMPkw4GHOInQ
 68FK75mDROnno9TwtWGylw2op8JRzSqRaX6aDQnXn8xzP7D+ngRRUWlLNOQutuO1
 RX8+qx5ItUMi6vTc3zOm4NJk1WvWSyiiTO2fHLjZ29e+ODy0gi6cZCb8E4gSJK1W
 nS3KFDdqw0hPh4rCOGKe20KqaY0jd/hN1FLjXIA9YRYQJb0aZR1iW+U8PmahNyEb
 9O2zUEUJebFoZWOta4ISnJMKhXNawHGf3f6U24kqMCg/3qJt4NKFn2MjdsXraqij
 nd/biK0M6JhlHwHTV4sVEN7f86anESz1H8T9AB9MWkzlTNvW7narWQEJCiVGp/16
 FDufrM0cNX9mVcBOprzmDpMiY+YiKOP2nrKIlTPaaa/qsxhOr2qZT8KmzlmdRQWU
 Aj/sbaPRPF+882sDSEbLYpl/aE90zYgiLe2S1PfOVRZ5YrJws0E=
 =FoxN
 -----END PGP SIGNATURE-----

Merge tag 'arm-soc/for-4.19/devicetree-fixes' of https://github.com/Broadcom/stblinux into fixes

This pull request contains Broadcom ARM-based SoCs Device Tree changes
intended for 4.19, please pull the following:

- Florian fixes the PPI and SPI interrupts in the BCM63138 (DSL) SoC DTS

* tag 'arm-soc/for-4.19/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  ARM: dts: BCM63xx: Fix incorrect interrupt specifiers

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-25 13:51:16 -07:00
Olof Johansson 9f71e7d5d9 AT91 fixes for 4.19:
- fix a NAND issue on sama5d2_ptc_ek
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlupQmwACgkQAyWl4gNJ
 NJK3mA/8CmtFBatrRcnErd5/Yv5xwVEKIRSKy021Q3wUcBFflJ2AuZIqvEVp7py4
 c2fpjyi+YcsShH6ggGsmZCBCfN3TYDp9tBzSDu7YyIVgWO5zVvfgS/S8A0uYB9Af
 n2C2aRa1HxH26DLPdBZD4a/n6sd+f6P3VnKSK3W3JNwPtlI/M7xYIIihNCqMsXAX
 or7yk4+bbrCV6XyF4fSPbYZ12llulcSOziSEx25gitJ4BfzIi4mowMI/AFe8gnvh
 j1dD4biF8IH+Xfvjx0eSuIH9wtxIoReznZf0qRfvs1GqD6AQwHsQHnssfhw1WLuF
 iFcWpUGvV6ZOTWodvvtf1ePOK2M5ZntXcDkzqs0USFQHUPu2MBxtmhpMSDD9b9dZ
 XEnm2Qq1q28PPM1qoGOP3nfRQ/hUC1BUW4aNRUNUzuZHtGostxcB77URGEPj1ZtG
 G74SI9139Vrh8z731pOg5r+pDiH+WjMDjRNWwrViHUjJ1IiE9YlqvmRDrLmIac+Z
 juK2ZKyD8tWkEJHe0qvOkpqJAbteRYLB+3/Q8NaKnnhaz8rfGJKvSZKYYxeL1Ssk
 h8bP1OpvmfdocppVVrQ253d8hw53J7IKnS970b62C3lIiUcjRG5r2I8NggZoBitg
 YPMKp3c9v5MHKHgdKTjeK2FcbIq9j6HZXRZkcsYaDFU5G13n/Do=
 =pn0U
 -----END PGP SIGNATURE-----

Merge tag 'at91-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into fixes

AT91 fixes for 4.19:

 - fix a NAND issue on sama5d2_ptc_ek (drive strength setting to fix
   corruption)

* tag 'at91-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-25 12:31:19 -07:00
Srikar Dronamraju 2483ef056f powerpc/numa: Use associativity if VPHN hcall is successful
Currently associativity is used to lookup node-id even if the
preceding VPHN hcall failed. However this can cause CPU to be made
part of the wrong node, (most likely to be node 0). This is because
VPHN is not enabled on KVM guests.

With 2ea6263 ("powerpc/topology: Get topology for shared processors at
boot"), associativity is used to set to the wrong node. Hence KVM
guest topology is broken.

For example : A 4 node KVM guest before would have reported.

  [root@localhost ~]#  numactl -H
  available: 4 nodes (0-3)
  node 0 cpus: 0 1 2 3
  node 0 size: 1746 MB
  node 0 free: 1604 MB
  node 1 cpus: 4 5 6 7
  node 1 size: 2044 MB
  node 1 free: 1765 MB
  node 2 cpus: 8 9 10 11
  node 2 size: 2044 MB
  node 2 free: 1837 MB
  node 3 cpus: 12 13 14 15
  node 3 size: 2044 MB
  node 3 free: 1903 MB
  node distances:
  node   0   1   2   3
    0:  10  40  40  40
    1:  40  10  40  40
    2:  40  40  10  40
    3:  40  40  40  10

Would now report:

  [root@localhost ~]# numactl -H
  available: 4 nodes (0-3)
  node 0 cpus: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  node 0 size: 1746 MB
  node 0 free: 1244 MB
  node 1 cpus:
  node 1 size: 2044 MB
  node 1 free: 2032 MB
  node 2 cpus: 1
  node 2 size: 2044 MB
  node 2 free: 2028 MB
  node 3 cpus:
  node 3 size: 2044 MB
  node 3 free: 2032 MB
  node distances:
  node   0   1   2   3
    0:  10  40  40  40
    1:  40  10  40  40
    2:  40  40  10  40
    3:  40  40  40  10

Fix this by skipping associativity lookup if the VPHN hcall failed.

Fixes: 2ea6263068 ("powerpc/topology: Get topology for shared processors at boot")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-25 23:05:08 +10:00
Michael Neuling 96dc89d526 powerpc/tm: Avoid possible userspace r1 corruption on reclaim
Current we store the userspace r1 to PACATMSCRATCH before finally
saving it to the thread struct.

In theory an exception could be taken here (like a machine check or
SLB miss) that could write PACATMSCRATCH and hence corrupt the
userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but
others do.

We've never actually seen this happen but it's theoretically
possible. Either way, the code is fragile as it is.

This patch saves r1 to the kernel stack (which can't fault) before we
turn MSR[RI] back on. PACATMSCRATCH is still used but only with
MSR[RI] off. We then copy r1 from the kernel stack to the thread
struct once we have MSR[RI] back on.

Suggested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-25 22:51:32 +10:00
Michael Neuling cf13435b73 powerpc/tm: Fix userspace r13 corruption
When we treclaim we store the userspace checkpointed r13 to a scratch
SPR and then later save the scratch SPR to the user thread struct.

Unfortunately, this doesn't work as accessing the user thread struct
can take an SLB fault and the SLB fault handler will write the same
scratch SPRG that now contains the userspace r13.

To fix this, we store r13 to the kernel stack (which can't fault)
before we access the user thread struct.

Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
as a random userspace segfault with r13 looking like a kernel address.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-25 22:51:08 +10:00
James Cowgill 57a489786d
RISC-V: include linux/ftrace.h in asm-prototypes.h
Building a riscv kernel with CONFIG_FUNCTION_TRACER and
CONFIG_MODVERSIONS enabled results in these two warnings:

  MODPOST vmlinux.o
WARNING: EXPORT symbol "return_to_handler" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.

When exporting symbols from an assembly file, the MODVERSIONS code
requires their prototypes to be defined in asm-prototypes.h (see
scripts/Makefile.build). Since both of these symbols have prototypes
defined in linux/ftrace.h, include this header from RISC-V's
asm-prototypes.h.

Reported-by: Karsten Merker <merker@debian.org>
Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-09-24 13:12:27 -07:00
Florian Fainelli 3ab97942d0 ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
A number of our interrupts were incorrectly specified, fix both the PPI
and SPI interrupts to be correct.

Fixes: b5762cacc4 ("ARM: bcm63138: add NAND DT support")
Fixes: 46d4bca044 ("ARM: BCM63XX: add BCM63138 minimal Device Tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2018-09-24 11:04:04 -07:00
Steve Capper 031e6e6b4e arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags
For contiguous hugetlb, huge_ptep_set_access_flags performs a
get_clear_flush (which then flushes the TLBs) even when no change of ptes
is necessary.

Unfortunately, this behaviour can lead to back-to-back page faults being
generated when running with multiple threads that access the same
contiguous huge page.

Thread 1                     |  Thread 2
-----------------------------+------------------------------
hugetlb_fault                |
huge_ptep_set_access_flags   |
  -> invalidate pte range    | hugetlb_fault
continue processing          | wait for hugetlb_fault_mutex
release mutex and return     | huge_ptep_set_access_flags
                             |   -> invalidate pte range
hugetlb_fault
...

This patch changes huge_ptep_set_access_flags s.t. we first read the
contiguous range of ptes (whilst preserving dirty information); the pte
range is only then invalidated where necessary and this prevents further
spurious page faults.

Fixes: d8bdcff287 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
Reported-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-24 17:51:50 +01:00
Steve Capper 469ed9d823 arm64: hugetlb: Fix handling of young ptes
In the contiguous bit hugetlb break-before-make code we assume that all
hugetlb pages are young.

In fact, remove_migration_pte is able to place an old hugetlb pte so
this assumption is not valid.

This patch fixes the contiguous hugetlb scanning code to preserve young
ptes.

Fixes: d8bdcff287 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-24 17:51:50 +01:00
Paolo Bonzini 4679b61f26 KVM: x86: never trap MSR_KERNEL_GS_BASE
KVM has an old optimization whereby accesses to the kernel GS base MSR
are trapped when the guest is in 32-bit and not when it is in 64-bit mode.
The idea is that swapgs is not available in 32-bit mode, thus the
guest has no reason to access the MSR unless in 64-bit mode and
32-bit applications need not pay the price of switching the kernel GS
base between the host and the guest values.

However, this optimization adds complexity to the code for little
benefit (these days most guests are going to be 64-bit anyway) and in fact
broke after commit 678e315e78 ("KVM: vmx: add dedicated utility to
access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base
can be corrupted across SMIs and UEFI Secure Boot is therefore broken
(a secure boot Linux guest, for example, fails to reach the login prompt
about half the time).  This patch just removes the optimization; the
kernel GS base MSR is now never trapped by KVM, similarly to the FS and
GS base MSRs.

Fixes: 678e315e78
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-24 18:34:13 +02:00
Michael Bringmann 8604895a34 powerpc/pseries: Fix unitialized timer reset on migration
After migration of a powerpc LPAR, the kernel executes code to
update the system state to reflect new platform characteristics.

Such changes include modifications to device tree properties provided
to the system by PHYP. Property notifications received by the
post_mobility_fixup() code are passed along to the kernel in general
through a call to of_update_property() which in turn passes such
events back to all modules through entries like the '.notifier_call'
function within the NUMA module.

When the NUMA module updates its state, it resets its event timer. If
this occurs after a previous call to stop_topology_update() or on a
system without VPHN enabled, the code runs into an unitialized timer
structure and crashes. This patch adds a safety check along this path
toward the problem code.

An example crash log is as follows.

  ibmvscsi 30000081: Re-enabling adapter!
  ------------[ cut here ]------------
  kernel BUG at kernel/time/timer.c:958!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag lockd unix_diag af_packet_diag netlink_diag grace fscache sunrpc xts vmx_crypto pseries_rng sg binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
  CPU: 11 PID: 3067 Comm: drmgr Not tainted 4.17.0+ #179
  ...
  NIP mod_timer+0x4c/0x400
  LR  reset_topology_timer+0x40/0x60
  Call Trace:
    0xc0000003f9407830 (unreliable)
    reset_topology_timer+0x40/0x60
    dt_update_callback+0x100/0x120
    notifier_call_chain+0x90/0x100
    __blocking_notifier_call_chain+0x60/0x90
    of_property_notify+0x90/0xd0
    of_update_property+0x104/0x150
    update_dt_property+0xdc/0x1f0
    pseries_devicetree_update+0x2d0/0x510
    post_mobility_fixup+0x7c/0xf0
    migration_store+0xa4/0xc0
    kobj_attr_store+0x30/0x60
    sysfs_kf_write+0x64/0xa0
    kernfs_fop_write+0x16c/0x240
    __vfs_write+0x40/0x200
    vfs_write+0xc8/0x240
    ksys_write+0x5c/0x100
    system_call+0x58/0x6c

Fixes: 5d88aa85c0 ("powerpc/pseries: Update CPU maps when device tree is updated")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-24 21:05:38 +10:00
Olof Johansson e6f0e1c756 Allwinner fixes - round 2
One additional fix regarding HDMI on the R40 SoC.
 
 Based on preliminary tests and code dumps for the R40, it was thought
 that the whole HDMI block was the same on the R40 and A64.
 
 Recent tests regarding the A64 showed that this was not the case. The
 HDMI PHY on the A64 only has one clock parent. How this occurs at the
 hardware level is unclear, as Allwinner has not given any feedback on
 this matter. Nevertheless it is clear that the hardware acts differently
 between the A64 and R40 in such a way that the R40's HDMI PHY is not
 backward compatible with the A64's. As such we need to drop the fallback
 compatible string in the R40's device tree. This was added in v4.19-rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCgAsFiEE2nN1m/hhnkhOWjtHOJpUIZwPJDAFAlujGKAOHHdlbnNAY3Np
 ZS5vcmcACgkQOJpUIZwPJDAX/RAAqQWmXmxp0m8N4L8FE4cnhWgqA4OhKi5Ehtc1
 qDnmmzitH4nTHzsTFu4zeQ5W7qv/+qZAiFHLlcLouhJJye3LKP4V6zJ15e6oVPM+
 5O2UuT95yOMH7wXOP0tOb5R6mNaLyFxU2L22/zVVQ4lxQNOKNcuR3QWWigQgRD7g
 QagSjUFG9Afg3SJyO0Fyz4qVYHDNuvFbHROM6FfCrqId+fduitLgi0U6FfLQXrVg
 G3jY9w4+CGSMwvBeNL2Ai9zIiDNJA+ZFeGPx43wvXrEpAg+6+oUQuaj+IVkZNsrz
 7Eu1EiZgD7w16IVKl/Eb4tw8iplm9sbAzvSgt8uPEUDgSBTAuJnk3kdAsYk/AHvR
 3YX6ftPzF1+TevB8N7zs03Q8Q08pGroq9Q4yi5g+7muPgpXKJrL7rLxl0BrHGhWv
 E+WkPrjosKHqGiXFbSyhcReKBbtNGE2yUJiIbhl7sNKIkEkDzlpvG3teKyCMIoRV
 NVP02LYHz0zUJXE7BIATB1Fi4QW45ymvwZn/qegfJkbNeWLu1HFykhAMPhpyTJtK
 0iWgzdHEh8q3of6UfHOD5tuLqFbEg/t4c+0ZDP2PygtpPS3QywZL/z2r4Nrsd896
 cSfjrFowIP9RcXn+RF0/EOxlO+Ygu81XRziLIzKSEaIv9pgKbpTiKN7PqyZYUFWN
 PdlTaCk=
 =eo72
 -----END PGP SIGNATURE-----

Merge tag 'sunxi-fixes-for-4.19-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes

Allwinner fixes - round 2

One additional fix regarding HDMI on the R40 SoC.

Based on preliminary tests and code dumps for the R40, it was thought
that the whole HDMI block was the same on the R40 and A64.

Recent tests regarding the A64 showed that this was not the case. The
HDMI PHY on the A64 only has one clock parent. How this occurs at the
hardware level is unclear, as Allwinner has not given any feedback on
this matter. Nevertheless it is clear that the hardware acts differently
between the A64 and R40 in such a way that the R40's HDMI PHY is not
backward compatible with the A64's. As such we need to drop the fallback
compatible string in the R40's device tree. This was added in v4.19-rc1.

* tag 'sunxi-fixes-for-4.19-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-09-23 05:30:28 -07:00
Greg Kroah-Hartman 18d49ec3c6 xen: fixes for 4.19-rc5
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW6dGhgAKCRCAXGG7T9hj
 vs1UAPwPSDmelfUus+P5ndRQZdK/iQWuRgQUe9Gd3RUVTfcQ7AEAljcN4/dSj7SB
 hOgRlCp5WB1s5/vFF7z4jc2wtqvOPAk=
 =8P9c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Juergen writes:
  "xen:
   Two small fixes for xen drivers."

* tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: issue warning message when out of grant maptrack entries
  xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
2018-09-23 13:32:19 +02:00
Greg Kroah-Hartman 328c6333ba Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Thomas writes:
  "A set of fixes for x86:

   - Resolve the kvmclock regression on AMD systems with memory
     encryption enabled. The rework of the kvmclock memory allocation
     during early boot results in encrypted storage, which is not
     shareable with the hypervisor. Create a new section for this data
     which is mapped unencrypted and take care that the later
     allocations for shared kvmclock memory is unencrypted as well.

   - Fix the build regression in the paravirt code introduced by the
     recent spectre v2 updates.

   - Ensure that the initial static page tables cover the fixmap space
     correctly so early console always works. This worked so far by
     chance, but recent modifications to the fixmap layout can -
     depending on kernel configuration - move the relevant entries to a
     different place which is not covered by the initial static page
     tables.

   - Address the regressions and issues which got introduced with the
     recent extensions to the Intel Recource Director Technology code.

   - Update maintainer entries to document reality"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Expand static page table for fixmap space
  MAINTAINERS: Add X86 MM entry
  x86/intel_rdt: Add Reinette as co-maintainer for RDT
  MAINTAINERS: Add Borislav to the x86 maintainers
  x86/paravirt: Fix some warning messages
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Fix exclusive mode handling of MBA resource
  x86/intel_rdt: Fix incorrect loop end condition
  x86/intel_rdt: Do not allow pseudo-locking of MBA resource
  x86/intel_rdt: Fix unchecked MSR access
  x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
  x86/intel_rdt: Global closid helper to support future fixes
  x86/intel_rdt: Fix size reporting of MBA resource
  x86/intel_rdt: Fix data type in parsing callbacks
  x86/kvm: Use __bss_decrypted attribute in shared variables
  x86/mm: Add .bss..decrypted section to hold shared variables
2018-09-23 08:10:12 +02:00
Greg Kroah-Hartman a27fb6d983 This pull request is slightly bigger than usual at this stage, but
I swear I would have sent it the same to Linus!  The main cause for
 this is that I was on vacation until two weeks ago and it took a while
 to sort all the pending patches between 4.19 and 4.20, test them and
 so on.
 
 It's mostly small bugfixes and cleanups, mostly around x86 nested
 virtualization.  One important change, not related to nested
 virtualization, is that the ability for the guest kernel to trap CPUID
 instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now
 masked by default.  This is because the feature is detected through an
 MSR; a very bad idea that Intel seems to like more and more.  Some
 applications choke if the other fields of that MSR are not initialized
 as on real hardware, hence we have to disable the whole MSR by default,
 as was the case before Linux 4.12.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJbpPo1AAoJEL/70l94x66DdxgH/is0qe6ZBtzb6Qc0W+8mHHD7
 nxIkWAs2V5NsouJ750YwRQ+0Ym407+wlNt30acdBUEoXhrnA5/TvyGq999XvCL96
 upWEIxpIgbvTMX/e2nLhe4wQdhsboUK4r0/B9IFgVFYrdCt5uRXjB2G4ewxcqxL/
 GxxqrAKhaRsbQG9Xv0Fw5Vohh/Ls6fQDJcyuY1EBnbMpVenq2QDLI6cOAPXncyFb
 uLN6ov4GNCWIPckwxejri5XhZesUOsafrmn48sApShh4T6TrisrdtSYdzl+DGza+
 j5vhIEwdFO5kulZ3viuhqKJOnS2+F6wvfZ75IKT0tEKeU2bi+ifGDyGRefSF6Q0=
 =YXLw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Paolo writes:
  "It's mostly small bugfixes and cleanups, mostly around x86 nested
   virtualization.  One important change, not related to nested
   virtualization, is that the ability for the guest kernel to trap
   CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
   now masked by default.  This is because the feature is detected
   through an MSR; a very bad idea that Intel seems to like more and
   more.  Some applications choke if the other fields of that MSR are
   not initialized as on real hardware, hence we have to disable the
   whole MSR by default, as was the case before Linux 4.12."

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
  KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
  kvm: selftests: Add platform_info_test
  KVM: x86: Control guest reads of MSR_PLATFORM_INFO
  KVM: x86: Turbo bits in MSR_PLATFORM_INFO
  nVMX x86: Check VPID value on vmentry of L2 guests
  nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
  KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
  KVM: VMX: check nested state and CR4.VMXE against SMM
  kvm: x86: make kvm_{load|put}_guest_fpu() static
  x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
  KVM: VMX: use preemption timer to force immediate VMExit
  KVM: VMX: modify preemption timer bit only when arming timer
  KVM: VMX: immediately mark preemption timer expired only for zero value
  KVM: SVM: Switch to bitmap_zalloc()
  KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
  kvm: selftests: use -pthread instead of -lpthread
  KVM: x86: don't reset root in kvm_mmu_setup()
  kvm: mmu: Don't read PDPTEs when paging is not enabled
  x86/kvm/lapic: always disable MMIO interface in x2APIC mode
  KVM: s390: Make huge pages unavailable in ucontrol VMs
  ...
2018-09-21 16:21:42 +02:00
Feng Tang 05ab1d8a4b x86/mm: Expand static page table for fixmap space
We met a kernel panic when enabling earlycon, which is due to the fixmap
address of earlycon is not statically setup.

Currently the static fixmap setup in head_64.S only covers 2M virtual
address space, while it actually could be in 4M space with different
kernel configurations, e.g. when VSYSCALL emulation is disabled.

So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
and add a build time check to ensure that the fixmap is covered by the
initial static page tables.

Fixes: 1ad83c858c ("x86_64,vsyscall: Make vsyscall emulation configurable")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts)
Cc: H Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
2018-09-20 23:17:22 +02:00
Liran Alon 26b471c7e2 KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
The handlers of IOCTLs in kvm_arch_vcpu_ioctl() are expected to set
their return value in "r" local var and break out of switch block
when they encounter some error.
This is because vcpu_load() is called before the switch block which
have a proper cleanup of vcpu_put() afterwards.

However, KVM_{GET,SET}_NESTED_STATE IOCTLs handlers just return
immediately on error without performing above mentioned cleanup.

Thus, change these handlers to behave as expected.

Fixes: 8fcc4b5923 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")

Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 18:54:08 +02:00
Thiago Jung Bauermann c716a25b9b powerpc/pkeys: Fix reading of ibm, processor-storage-keys property
scan_pkey_feature() uses of_property_read_u32_array() to read the
ibm,processor-storage-keys property and calls be32_to_cpu() on the
value it gets. The problem is that of_property_read_u32_array() already
returns the value converted to the CPU byte order.

The value of pkeys_total ends up more or less sane because there's a min()
call in pkey_initialize() which reduces pkeys_total to 32. So in practice
the kernel ignores the fact that the hypervisor reserved one key for
itself (the device tree advertises 31 keys in my test VM).

This is wrong, but the effect in practice is that when a process tries to
allocate the 32nd key, it gets an -EINVAL error instead of -ENOSPC which
would indicate that there aren't any keys available

Fixes: cf43d3b264 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-20 22:49:46 +10:00
Gerald Schaefer 55a5542a54 s390/hibernate: fix error handling when suspend cpu != resume cpu
The resume code checks if the resume cpu is the same as the suspend cpu.
If not, and if it is also not possible to switch to the suspend cpu, an
error message should be printed and the resume process should be stopped
by loading a disabled wait psw.

The current logic is broken in multiple ways, the message is never printed,
and the disabled wait psw never loaded because the kernel panics before that:
- sam31 and SIGP_SET_ARCHITECTURE to ESA mode is wrong, this will break
  on the first 64bit instruction in sclp_early_printk().
- The init stack should be used, but the stack pointer is not set up correctly
  (missing aghi %r15,-STACK_FRAME_OVERHEAD).
- __sclp_early_printk() checks the sclp_init_state. If it is not
  sclp_init_state_uninitialized, it simply returns w/o printing anything.
  In the resumed kernel however, sclp_init_state will never be uninitialized.

This patch fixes those issues by removing the sam31/ESA logic, adding a
correct init stack pointer, and also introducing sclp_early_printk_force()
to allow using sclp_early_printk() even when sclp_init_state is not
uninitialized.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-09-20 13:20:23 +02:00
Christophe Leroy 85682a7e3b powerpc: fix csum_ipv6_magic() on little endian platforms
On little endian platforms, csum_ipv6_magic() keeps len and proto in
CPU byte order. This generates a bad results leading to ICMPv6 packets
from other hosts being dropped by powerpc64le platforms.

In order to fix this, len and proto should be converted to network
byte order ie bigendian byte order. However checksumming 0x12345678
and 0x56341278 provide the exact same result so it is enough to
rotate the sum of len and proto by 1 byte.

PPC32 only support bigendian so the fix is needed for PPC64 only

Fixes: e9c4943a10 ("powerpc: Implement csum_ipv6_magic in assembly")
Reported-by: Jianlin Shi <jishi@redhat.com>
Reported-by: Xin Long <lucien.xin@gmail.com>
Cc: <stable@vger.kernel.org> # 4.18+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-20 21:12:28 +10:00
Alexey Kardashevskiy 7233b8cab3 powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again)
mpe: This was fixed originally in commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size"), but
contrary to what the merge commit says was inadvertently lost by me in
commit ce57c6610c ("Merge branch 'topic/ppc-kvm' into next") which
brought in changes that moved the code to a new file. So reapply it to
the new file.

Original commit message follows:

We use PHB in mode1 which uses bit 59 to select a correct DMA window.
However there is mode2 which uses bits 59:55 and allows up to 32 DMA
windows per a PE.

Even though documentation does not clearly specify that, it seems that
the actual hardware does not support bits 59:55 even in mode1, in
other words we can create a window as big as 1<<58 but DMA simply
won't work.

This reduces the upper limit from 59 to 55 bits to let the userspace
know about the hardware limits.

Fixes: ce57c6610c ("Merge branch 'topic/ppc-kvm' into next")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-20 14:31:03 +10:00
Icenowy Zheng 13b91ed77a ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT
The R40 HDMI PHY seems to be different to the A64 one, the A64 one
has no input mux, but the R40 one has.

Drop the A64 fallback compatible from the HDMI PHY node in R40 DT.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
[wens@csie.org: Fix subject prefix order]
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2018-09-20 11:48:01 +08:00
Drew Schmitt 6fbbde9a19 KVM: x86: Control guest reads of MSR_PLATFORM_INFO
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
to reads of MSR_PLATFORM_INFO.

Disabling access to reads of this MSR gives userspace the control to "expose"
this platform-dependent information to guests in a clear way. As it exists
today, guests that read this MSR would get unpopulated information if userspace
hadn't already set it (and prior to this patch series, only the CPUID faulting
information could have been populated). This existing interface could be
confusing if guests don't handle the potential for incorrect/incomplete
information gracefully (e.g. zero reported for base frequency).

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:46 +02:00
Drew Schmitt d84f1cff90 KVM: x86: Turbo bits in MSR_PLATFORM_INFO
Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
the CPUID faulting bit was settable. But now any bit in
MSR_PLATFORM_INFO would be settable. This can be used, for example, to
convey frequency information about the platform on which the guest is
running.

Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:46 +02:00
Krish Sadhukhan ba8e23db59 nVMX x86: Check VPID value on vmentry of L2 guests
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
following check needs to be enforced on vmentry of L2 guests:

    If the 'enable VPID' VM-execution control is 1, the value of the
    of the VPID VM-execution control field must not be 0000H.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:45 +02:00
Krish Sadhukhan 6de84e581c nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
According to section "Checks on VMX Controls" in Intel SDM vol 3C,
the following check needs to be enforced on vmentry of L2 guests:

   - Bits 5:0 of the posted-interrupt descriptor address are all 0.
   - The posted-interrupt descriptor address does not set any bits
     beyond the processor's physical-address width.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:44 +02:00
Liran Alon e6c67d8cf1 KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
In case L1 do not intercept L2 HLT or enter L2 in HLT activity-state,
it is possible for a vCPU to be blocked while it is in guest-mode.

According to Intel SDM 26.6.5 Interrupt-Window Exiting and
Virtual-Interrupt Delivery: "These events wake the logical processor
if it just entered the HLT state because of a VM entry".
Therefore, if L1 enters L2 in HLT activity-state and L2 has a pending
deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
should be waken from the HLT state and injected with the interrupt.

In addition, if while the vCPU is blocked (while it is in guest-mode),
it receives a nested posted-interrupt, then the vCPU should also be
waken and injected with the posted interrupt.

To handle these cases, this patch enhances kvm_vcpu_has_events() to also
check if there is a pending interrupt in L2 virtual APICv provided by
L1. That is, it evaluates if there is a pending virtual interrupt for L2
by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
Evaluation of Pending Interrupts.

Note that this also handles the case of nested posted-interrupt by the
fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
kvm_vcpu_running() -> vmx_check_nested_events() ->
vmx_complete_nested_posted_interrupt().

Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:44 +02:00
Paolo Bonzini 5bea5123cb KVM: VMX: check nested state and CR4.VMXE against SMM
VMX cannot be enabled under SMM, check it when CR4 is set and when nested
virtualization state is restored.

This should fix some WARNs reported by syzkaller, mostly around
alloc_shadow_vmcs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:43 +02:00
Sebastian Andrzej Siewior 822f312d47 kvm: x86: make kvm_{load|put}_guest_fpu() static
The functions
	kvm_load_guest_fpu()
	kvm_put_guest_fpu()

are only used locally, make them static. This requires also that both
functions are moved because they are used before their implementation.
Those functions were exported (via EXPORT_SYMBOL) before commit
e5bb40251a ("KVM: Drop kvm_{load,put}_guest_fpu() exports").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:43 +02:00
Vitaly Kuznetsov a1efa9b700 x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
These structures are going to be used from KVM code so let's make
their names reflect their Hyper-V origin.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:42 +02:00
Sean Christopherson d264ee0c2e KVM: VMX: use preemption timer to force immediate VMExit
A VMX preemption timer value of '0' is guaranteed to cause a VMExit
prior to the CPU executing any instructions in the guest.  Use the
preemption timer (if it's supported) to trigger immediate VMExit
in place of the current method of sending a self-IPI.  This ensures
that pending VMExit injection to L1 occurs prior to executing any
instructions in the guest (regardless of nesting level).

When deferring VMExit injection, KVM generates an immediate VMExit
from the (possibly nested) guest by sending itself an IPI.  Because
hardware interrupts are blocked prior to VMEnter and are unblocked
(in hardware) after VMEnter, this results in taking a VMExit(INTR)
before any guest instruction is executed.  But, as this approach
relies on the IPI being received before VMEnter executes, it only
works as intended when KVM is running as L0.  Because there are no
architectural guarantees regarding when IPIs are delivered, when
running nested the INTR may "arrive" long after L2 is running e.g.
L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.

For the most part, this unintended delay is not an issue since the
events being injected to L1 also do not have architectural guarantees
regarding their timing.  The notable exception is the VMX preemption
timer[1], which is architecturally guaranteed to cause a VMExit prior
to executing any instructions in the guest if the timer value is '0'
at VMEnter.  Specifically, the delay in injecting the VMExit causes
the preemption timer KVM unit test to fail when run in a nested guest.

Note: this approach is viable even on CPUs with a broken preemption
timer, as broken in this context only means the timer counts at the
wrong rate.  There are no known errata affecting timer value of '0'.

[1] I/O SMIs also have guarantees on when they arrive, but I have
    no idea if/how those are emulated in KVM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Use a hook for SVM instead of leaving the default in x86.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:42 +02:00
Sean Christopherson f459a707ed KVM: VMX: modify preemption timer bit only when arming timer
Provide a singular location where the VMX preemption timer bit is
set/cleared so that future usages of the preemption timer can ensure
the VMCS bit is up-to-date without having to modify unrelated code
paths.  For example, the preemption timer can be used to force an
immediate VMExit.  Cache the status of the timer to avoid redundant
VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
VMEnters/VMExits.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:51:41 +02:00
Sean Christopherson 4c008127e4 KVM: VMX: immediately mark preemption timer expired only for zero value
A VMX preemption timer value of '0' at the time of VMEnter is
architecturally guaranteed to cause a VMExit prior to the CPU
executing any instructions in the guest.  This architectural
definition is in place to ensure that a previously expired timer
is correctly recognized by the CPU as it is possible for the timer
to reach zero and not trigger a VMexit due to a higher priority
VMExit being signalled instead, e.g. a pending #DB that morphs into
a VMExit.

Whether by design or coincidence, commit f4124500c2 ("KVM: nVMX:
Fully emulate preemption timer") special cased timer values of '0'
and '1' to ensure prompt delivery of the VMExit.  Unlike '0', a
timer value of '1' has no has no architectural guarantees regarding
when it is delivered.

Modify the timer emulation to trigger immediate VMExit if and only
if the timer value is '0', and document precisely why '0' is special.
Do this even if calibration of the virtual TSC failed, i.e. VMExit
will occur immediately regardless of the frequency of the timer.
Making only '0' a special case gives KVM leeway to be more aggressive
in ensuring the VMExit is injected prior to executing instructions in
the nested guest, and also eliminates any ambiguity as to why '1' is
a special case, e.g. why wasn't the threshold for a "short timeout"
set to 10, 100, 1000, etc...

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:46 +02:00
Andy Shevchenko a101c9d63e KVM: SVM: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:45 +02:00
Tianyu Lan 9a9845867c KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page()
This patch is to fix the commit.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:45 +02:00
Wei Yang 83b20b28c6 KVM: x86: don't reset root in kvm_mmu_setup()
Here is the code path which shows kvm_mmu_setup() is invoked after
kvm_mmu_create(). Since kvm_mmu_setup() is only invoked in this code path,
this means the root_hpa and prev_roots are guaranteed to be invalid. And
it is not necessary to reset it again.

    kvm_vm_ioctl_create_vcpu()
        kvm_arch_vcpu_create()
            vmx_create_vcpu()
                kvm_vcpu_init()
                    kvm_arch_vcpu_init()
                        kvm_mmu_create()
        kvm_arch_vcpu_setup()
            kvm_mmu_setup()
                kvm_init_mmu()

This patch set reset_roots to false in kmv_mmu_setup().

Fixes: 50c28f21d0
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:44 +02:00
Junaid Shahid d35b34a9a7 kvm: mmu: Don't read PDPTEs when paging is not enabled
kvm should not attempt to read guest PDPTEs when CR0.PG = 0 and
CR4.PAE = 1.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:43 +02:00
Vitaly Kuznetsov d176620277 x86/kvm/lapic: always disable MMIO interface in x2APIC mode
When VMX is used with flexpriority disabled (because of no support or
if disabled with module parameter) MMIO interface to lAPIC is still
available in x2APIC mode while it shouldn't be (kvm-unit-tests):

PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
FAIL: apic_disable: *0xfee00030: 50014

The issue appears because we basically do nothing while switching to
x2APIC mode when APIC access page is not used. apic_mmio_{read,write}
only check if lAPIC is disabled before proceeding to actual write.

When APIC access is virtualized we correctly manipulate with VMX controls
in vmx_set_virtual_apic_mode() and we don't get vmexits from memory writes
in x2APIC mode so there's no issue.

Disabling MMIO interface seems to be easy. The question is: what do we
do with these reads and writes? If we add apic_x2apic_mode() check to
apic_mmio_in_range() and return -EOPNOTSUPP these reads and writes will
go to userspace. When lAPIC is in kernel, Qemu uses this interface to
inject MSIs only (see kvm_apic_mem_write() in hw/i386/kvm/apic.c). This
somehow works with disabled lAPIC but when we're in xAPIC mode we will
get a real injected MSI from every write to lAPIC. Not good.

The simplest solution seems to be to just ignore writes to the region
and return ~0 for all reads when we're in x2APIC mode. This is what this
patch does. However, this approach is inconsistent with what currently
happens when flexpriority is enabled: we allocate APIC access page and
create KVM memory region so in x2APIC modes all reads and writes go to
this pre-allocated page which is, btw, the same for all vCPUs.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-20 00:26:43 +02:00
Boris Ostrovsky 70513d5875 xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
Otherwise we may leak kernel stack for events that sample user
registers.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
2018-09-19 11:26:53 -04:00
Dan Carpenter 571d0563c8 x86/paravirt: Fix some warning messages
The first argument to WARN_ONCE() is a condition.

Fixes: 5800dc5c19 ("x86/paravirt: Fix spectre-v2 mitigations for paravirt guests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919103553.GD9238@mwanda
2018-09-19 13:22:04 +02:00
Thomas Petazzoni 3a58ac65e2 ARM: 8799/1: mm: fix pci_ioremap_io() offset check
IO_SPACE_LIMIT is the ending address of the PCI IO space, i.e
something like 0xfffff (and not 0x100000).

Therefore, when offset = 0xf0000 is passed as argument, this function
fails even though the offset + SZ_64K fits below the
IO_SPACE_LIMIT. This makes the last chunk of 64 KB of the I/O space
not usable as it cannot be mapped.

This patch fixes that by substracing 1 to offset + SZ_64K, so that we
compare the addrss of the last byte of the I/O space against
IO_SPACE_LIMIT instead of the address of the first byte of what is
after the I/O space.

Fixes: c279443709 ("ARM: Add fixed PCI i/o mapping")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-09-19 10:44:14 +01:00
Stefan Agner 73aeb2cbcd ARM: 8787/1: wire up io_pgetevents syscall
Wire up the new io_pgetevents syscall for ARM.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-09-19 10:44:11 +01:00
Greg Kroah-Hartman 4ca719a338 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Crypto stuff from Herbert:
  "This push fixes a potential boot hang in ccp and an incorrect
   CPU capability check in aegis/morus on x86."

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
  crypto: ccp - add timeout support in the SEV command
2018-09-19 08:29:42 +02:00
Reinette Chatre ffb2315fd2 x86/intel_rdt: Fix incorrect loop end condition
In order to determine a sane default cache allocation for a new CAT/CDP
resource group, all resource groups are checked to determine which cache
portions are available to share. At this time all possible CLOSIDs
that can be supported by the resource is checked. This is problematic
if the resource supports more CLOSIDs than another CAT/CDP resource. In
this case, the number of CLOSIDs that could be allocated are fewer than
the number of CLOSIDs that can be supported by the resource.

Limit the check of closids to that what is supported by the system based
on the minimum across all resources.

Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-10-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:07 +02:00
Reinette Chatre 939b90b20b x86/intel_rdt: Fix exclusive mode handling of MBA resource
It is possible for a resource group to consist out of MBA as well as
CAT/CDP resources. The "exclusive" resource mode only applies to the
CAT/CDP resources since MBA allocations cannot be specified to overlap
or not. When a user requests a resource group to become "exclusive" then it
can only be successful if there are CAT/CDP resources in the group
and none of their CBMs associated with the group's CLOSID overlaps with
any other resource group.

Fix the "exclusive" mode setting by failing if there isn't any CAT/CDP
resource in the group and ensuring that the CBM checking is only done on
CAT/CDP resources.

Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-9-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:07 +02:00
Reinette Chatre f0df4e1acf x86/intel_rdt: Fix incorrect loop end condition
A loop is used to check if a CAT resource's CBM of one CLOSID
overlaps with the CBM of another CLOSID of the same resource. The loop
is run over all CLOSIDs supported by the resource.

The problem with running the loop over all CLOSIDs supported by the
resource is that its number of supported CLOSIDs may be more than the
number of supported CLOSIDs on the system, which is the minimum number of
CLOSIDs supported across all resources.

Fix the loop to only consider the number of system supported CLOSIDs,
not all that are supported by the resource.

Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-8-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:06 +02:00
Reinette Chatre 32d736abed x86/intel_rdt: Do not allow pseudo-locking of MBA resource
A system supporting pseudo-locking may have MBA as well as CAT
resources of which only the CAT resources could support cache
pseudo-locking. When the schemata to be pseudo-locked is provided it
should be checked that that schemata does not attempt to pseudo-lock a
MBA resource.

Fixes: e0bdfe8e3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-7-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:06 +02:00
Reinette Chatre 70479c012b x86/intel_rdt: Fix unchecked MSR access
When a new resource group is created, it is initialized with sane
defaults that currently assume the resource being initialized is a CAT
resource. This code path is also followed by a MBA resource that is not
allocated the same as a CAT resource and as a result we encounter the
following unchecked MSR access error:

unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000
000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
Call Trace:
mba_wrmsr+0x41/0x80
update_domains+0x125/0x130
rdtgroup_mkdir+0x270/0x500

Fix the above by ensuring the initial allocation is only attempted on a
CAT resource.

Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-6-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:06 +02:00
Reinette Chatre 47d53b184a x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
When multiple resources are managed by RDT, the number of CLOSIDs used
is the minimum of the CLOSIDs supported by each resource. In the function
rdt_bit_usage_show(), the annotated bitmask is created to depict how the
CAT supporting caches are being used. During this annotated bitmask
creation, each resource group is queried for its mode that is used as a
label in the annotated bitmask.

The maximum number of resource groups is currently assumed to be the
number of CLOSIDs supported by the resource for which the information is
being displayed. This is incorrect since the number of active CLOSIDs is
the minimum across all resources.

If information for a cache instance with more CLOSIDs than another is
being generated we thus encounter a warning like:

invalid mode for closid 8
WARNING: CPU: 88 PID: 1791 at [SNIP]/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
:827 rdt_bit_usage_show+0x221/0x2b0

Fix this by ensuring that only the number of supported CLOSIDs are
considered.

Fixes: e651901187 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-5-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Reinette Chatre c793da8e4c x86/intel_rdt: Global closid helper to support future fixes
The number of CLOSIDs supported by a system is the minimum number of
CLOSIDs supported by any of its resources. Care should be taken when
iterating over the CLOSIDs of a resource since it may be that the number
of CLOSIDs supported on the system is less than the number of CLOSIDs
supported by the resource.

Introduce a helper function that can be used to query the number of
CLOSIDs that is supported by all resources, irrespective of how many
CLOSIDs are supported by a particular resource.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Reinette Chatre f968dc119a x86/intel_rdt: Fix size reporting of MBA resource
Chen Yu reported a divide-by-zero error when accessing the 'size'
resctrl file when a MBA resource is enabled.

divide error: 0000 [#1] SMP PTI
CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25
RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0
Call Trace:
rdtgroup_size_show+0x11a/0x1d0
seq_read+0xd8/0x3b0

Quoting Chen Yu's report: This is because for MB resource, the
r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size()
will trigger the exception.

Fix this issue in the 'size' file by getting correct memory bandwidth value
which is in MBps when MBA software controller is enabled or in percentage
when MBA software controller is disabled.

Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations in bytes")
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20180904174614.26682-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/1537048707-76280-3-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Xiaochen Shen 753694a8df x86/intel_rdt: Fix data type in parsing callbacks
Each resource is associated with a parsing callback to parse the data
provided from user space when writing schemata file.

The 'data' parameter in the callbacks is defined as a void pointer which
is error prone due to lack of type check.

parse_bw() processes the 'data' parameter as a string while its caller
actually passes the parameter as a pointer to struct rdt_cbm_parse_data.
Thus, parse_bw() takes wrong data and causes failure of parsing MBA
throttle value.

To fix the issue, the 'data' parameter in all parsing callbacks is defined
and handled as a pointer to struct rdt_parse_data (renamed from struct
rdt_cbm_parse_data).

Fixes: 7604df6e16 ("x86/intel_rdt: Support flexible data to parsing callbacks")
Fixes: 9ab9aa15c3 ("x86/intel_rdt: Ensure requested schemata respects mode")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Alexey Brodkin 40660f1fce ARC: build: Don't set CROSS_COMPILE in arch's Makefile
There's not much sense in doing that because if user or
his build-system didn't set CROSS_COMPILE we still may
very well make incorrect guess.

But as it turned out setting CROSS_COMPILE is not as harmless
as one may think: with recent changes that implemented automatic
discovery of __host__ gcc features unconditional setup of
CROSS_COMPILE leads to failures on execution of "make xxx_defconfig"
with absent cross-compiler, for more info see [1].

Set CROSS_COMPILE as well gets in the way if we want only to build
.dtb's (again with absent cross-compiler which is not really needed
for building .dtb's), see [2].

Note, we had to change LIBGCC assignment type from ":=" to "="
so that is is resolved on its usage, otherwise if it is resolved
at declaration time with missing CROSS_COMPILE we're getting this
error message from host GCC:

| gcc: error: unrecognized command line option -mmedium-calls
| gcc: error: unrecognized command line option -mno-sdata

[1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-September/004308.html
[2] http://lists.infradead.org/pipermail/linux-snps-arc/2018-September/004320.html

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-09-18 13:24:34 -07:00
Paolo Bonzini 1795f81f61 Second set of PPC KVM fixes for 4.19
Two fixes for KVM on POWER machines.  Both of these relate to memory
 corruption and host crashes seen when transparent huge pages are
 enabled.  The first fixes a host crash that can occur when a DMA
 mapping is removed by the guest and the page mapped was part of a
 transparent huge page; the second fixes corruption that could occur
 when a hypervisor page fault for a radix guest is being serviced at
 the same time that the backing page is being collapsed or split.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbmFIyAAoJEJ2a6ncsY3Gf+L8H/jRQ0ONUpv2xrgirXdPmfuVv
 xIVejn5chiygpo3ZY2YkRGjqMoX8usA5pDQONk9duoc48FedSjjmurfAkSA8NESI
 y6DSRGB6pir/reP/7tBVk0eeeMBjbYnHPA7KfI8ijK424VmRpCT5stiUm7gQvSEm
 LSRUSLwWKfCCjU78HVtiTuK865WZifrOCy6wiNEl79F1K6T1A+LeGaKrcDLjeK/Q
 GsNSbwBK37BOvcsm0W1xrlnCmYtR/nVrhjTFMc5noBuc4znQd3wxitgiInFsOH5V
 LUWL6IStFkbGKSxVZuJilkhVF58AAisrJnwvlZsjrExWYf1J42kbyvVoURt0O8I=
 =blkZ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

Second set of PPC KVM fixes for 4.19

Two fixes for KVM on POWER machines.  Both of these relate to memory
corruption and host crashes seen when transparent huge pages are
enabled.  The first fixes a host crash that can occur when a DMA
mapping is removed by the guest and the page mapped was part of a
transparent huge page; the second fixes corruption that could occur
when a hypervisor page fault for a radix guest is being serviced at
the same time that the backing page is being collapsed or split.
2018-09-18 15:13:00 +02:00
Paolo Bonzini cb5fb87a2f KVM: s390: Fixes for 4.19
- more fallout from the hugetlbfs enablement
 - bugfix for vma handling
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJbm68DAAoJEBF7vIC1phx8i3cP/R3UwXGrTq3u5+78PfUu+Nvm
 zTSgHID25/pCsytvRcK7SSJQt+h53RjuiQlOcrtSJZvq3Btt23gl+pFltqvWGZDM
 PYgy+LaaaoIXNeOJqwkwcnNsq4usQ0N1jQ8EgNUxEc2EBkffA6mKcTS0lWDm47AE
 Yxsyz0RsvwcbYCOj10s1b4D9h97+ZkqAC2RDnafzQOb54j/aOH7blsStNDXfdkte
 xHJ7mELOwAWre2QhD7OeTX1aCjBfKUJzWK2sPxm7qt5nEtH18o9F6j2Y/d8dnWg8
 kqthyyb+nH0aHgv3G+7sKBOp1Ra7ftHAVrrbh3m9hceaW5+VuEhWSN6zVZKVdWta
 4bjiWa9I1gKS/Wz479ztdbpt+koRAYSg5CLHAD7YsBR3hHtnAZwuRO38S4Xg+Tfb
 Hp4Lhf8gnq9yQqrcyfKWQKtU9mb84oNtEw/wQPUvr2tZyeoB5NbnVy5UDa9V1DRS
 FnF1/MIZ/MszkwbNEnhk+WWFy41h+uduizfXwQ/YO/xciEEdM2TaDHuLIpBegiSO
 4kFfHRSMxqbwNhQSsJI8dIFpSHqXrtn2p6JTsWdi94B0QBYaNm4XMNpgEZIs5CsE
 aNGhQa+IZ5jk6YsGMx4mqAmV2jN9bMEoKR/SaOvrSkPvAO9QxZP4V6W72YGoi6le
 RZK8Z+HGxg6qpeaAiG0t
 =TT85
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for 4.19

- more fallout from the hugetlbfs enablement
- bugfix for vma handling
2018-09-18 15:12:51 +02:00
Michael Neuling 51c3c62b58 powerpc: Avoid code patching freed init sections
This stops us from doing code patching in init sections after they've
been freed.

In this chain:
  kvm_guest_init() ->
    kvm_use_magic_page() ->
      fault_in_pages_readable() ->
	 __get_user() ->
	   __get_user_nocheck() ->
	     barrier_nospec();

We have a code patching location at barrier_nospec() and
kvm_guest_init() is an init function. This whole chain gets inlined,
so when we free the init section (hence kvm_guest_init()), this code
goes away and hence should no longer be patched.

We seen this as userspace memory corruption when using a memory
checker while doing partition migration testing on powervm (this
starts the code patching post migration via
/sys/kernel/mobility/migration). In theory, it could also happen when
using /sys/kernel/debug/powerpc/barrier_nospec.

Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-18 22:42:54 +10:00
Greg Kroah-Hartman 5211da9ca5 Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net
Dave writes:
  "Various fixes, all over the place:

   1) OOB data generation fix in bluetooth, from Matias Karhumaa.

   2) BPF BTF boundary calculation fix, from Martin KaFai Lau.

   3) Don't bug on excessive frags, to be compatible in situations mixing
      older and newer kernels on each end.  From Juergen Gross.

   4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger.

   5) Zero keying information in TLS layer before freeing copies
      of them, from Sabrina Dubroca.

   6) Fix NULL deref in act_sample, from Davide Caratti.

   7) Orphan SKB before GRO in veth to prevent crashes with XDP,
      from Toshiaki Makita.

   8) Fix use after free in ip6_xmit, from Eric Dumazet.

   9) Fix VF mac address regression in bnxt_en, from Micahel Chan.

   10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann.

   11) Programming adjustments to r8169 which fix not being to enter deep
       sleep states on some machines, from Kai-Heng Feng and Hans de
       Goede.

   12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter
       Oskolkov."

* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits)
  net/ipv6: do not copy dst flags on rt init
  qmi_wwan: set DTR for modems in forced USB2 mode
  clk: x86: Stop marking clocks as CLK_IS_CRITICAL
  r8169: Get and enable optional ether_clk clock
  clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
  r8169: enable ASPM on RTL8106E
  r8169: Align ASPM/CLKREQ setting function with vendor driver
  Revert "kcm: remove any offset before parsing messages"
  kcm: remove any offset before parsing messages
  net: ethernet: Fix a unused function warning.
  net: dsa: mv88e6xxx: Fix ATU Miss Violation
  tls: fix currently broken MSG_PEEK behavior
  hv_netvsc: pair VF based on serial number
  PCI: hv: support reporting serial number as slot information
  bnxt_en: Fix VF mac address regression.
  ipv6: fix possible use-after-free in ip6_xmit()
  net: hp100: fix always-true check for link up state
  ARM: dts: at91: add new compatibility string for macb on sama5d3
  net: macb: disable scatter-gather for macb on sama5d3
  net: mvpp2: let phylink manage the carrier state
  ...
2018-09-18 09:31:53 +02:00
Nicolas Ferre 321cc359d8 ARM: dts: at91: add new compatibility string for macb on sama5d3
We need this new compatibility string as we experienced different behavior
for this 10/100Mbits/s macb interface on this particular SoC.
Backward compatibility is preserved as we keep the alternative strings.

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 07:54:40 -07:00
Michael Neuling f14040bca8 KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
When we come into the softpatch handler (0x1500), we use r11 to store
the HSRR0 for later use by the denorm handler.

We also use the softpatch handler for the TM workarounds for
POWER9. Unfortunately, in kvmppc_interrupt_hv we later store r11 out
to the vcpu assuming it's still what we got from userspace.

This causes r11 to be corrupted in the VCPU and hence when we restore
the guest, we get a corrupted r11. We've seen this when running TM
tests inside guests on P9.

This fixes the problem by only touching r11 in the denorm case.

Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
Cc: <stable@vger.kernel.org> # 4.17+
Test-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-17 16:23:28 +10:00
Brijesh Singh 6a1cac56f4 x86/kvm: Use __bss_decrypted attribute in shared variables
The recent removal of the memblock dependency from kvmclock caused a SEV
guest regression because the wall_clock and hv_clock_boot variables are
no longer mapped decrypted when SEV is active.

Use the __bss_decrypted attribute to put the static wall_clock and
hv_clock_boot in the .bss..decrypted section so that they are mapped
decrypted during boot.

In the preparatory stage of CPU hotplug, the per-cpu pvclock data pointer
assigns either an element of the static array or dynamically allocated
memory for the pvclock data pointer. The static array are now mapped
decrypted but the dynamically allocated memory is not mapped decrypted.
However, when SEV is active this memory range must be mapped decrypted.

Add a function which is called after the page allocator is up, and
allocate memory for the pvclock data pointers for the all possible cpus.
Map this memory range as decrypted when SEV is active.

Fixes: 368a540e02 ("x86/kvmclock: Remove memblock dependency")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com
2018-09-15 20:48:46 +02:00
Brijesh Singh b3f0907c71 x86/mm: Add .bss..decrypted section to hold shared variables
kvmclock defines few static variables which are shared with the
hypervisor during the kvmclock initialization.

When SEV is active, memory is encrypted with a guest-specific key, and
if the guest OS wants to share the memory region with the hypervisor
then it must clear the C-bit before sharing it.

Currently, we use kernel_physical_mapping_init() to split large pages
before clearing the C-bit on shared pages. But it fails when called from
the kvmclock initialization (mainly because the memblock allocator is
not ready that early during boot).

Add a __bss_decrypted section attribute which can be used when defining
such shared variable. The so-defined variables will be placed in the
.bss..decrypted section. This section will be mapped with C=0 early
during boot.

The .bss..decrypted section has a big chunk of memory that may be unused
when memory encryption is not active, free it when memory encryption is
not active.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Radim Krčmář<rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
2018-09-15 20:48:45 +02:00
Linus Torvalds 27c5a778df Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingol Molnar:
 "Misc fixes:

   - EFI crash fix

   - Xen PV fixes

   - do not allow PTI on 2-level 32-bit kernels for now

   - documentation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/APM: Fix build warning when PROC_FS is not enabled
  Revert "x86/mm/legacy: Populate the user page-table with user pgd's"
  x86/efi: Load fixmap GDT in efi_call_phys_epilog() before setting %cr3
  x86/xen: Disable CPU0 hotplug for Xen PV
  x86/EISA: Don't probe EISA bus for Xen PV guests
  x86/doc: Fix Documentation/x86/earlyprintk.txt
2018-09-15 08:02:46 -10:00