The source for the Linux kernel used in Windows Subsystem for Linux 2 (WSL2)
Перейти к файлу
Andy Lutomirski bf904d2762 x86/pti/64: Remove the SYSCALL64 entry trampoline
The SYSCALL64 trampoline has a couple of nice properties:

 - The usual sequence of SWAPGS followed by two GS-relative accesses to
   set up RSP is somewhat slow because the GS-relative accesses need
   to wait for SWAPGS to finish.  The trampoline approach allows
   RIP-relative accesses to set up RSP, which avoids the stall.

 - The trampoline avoids any percpu access before CR3 is set up,
   which means that no percpu memory needs to be mapped in the user
   page tables.  This prevents using Meltdown to read any percpu memory
   outside the cpu_entry_area and prevents using timing leaks
   to directly locate the percpu areas.

The downsides of using a trampoline may outweigh the upsides, however.
It adds an extra non-contiguous I$ cache line to system calls, and it
forces an indirect jump to transfer control back to the normal kernel
text after CR3 is set up.  The latter is because x86 lacks a 64-bit
direct jump instruction that could jump from the trampoline to the entry
text.  With retpolines enabled, the indirect jump is extremely slow.

Change the code to map the percpu TSS into the user page tables to allow
the non-trampoline SYSCALL64 path to work under PTI.  This does not add a
new direct information leak, since the TSS is readable by Meltdown from the
cpu_entry_area alias regardless.  It does allow a timing attack to locate
the percpu area, but KASLR is more or less a lost cause against local
attack on CPUs vulnerable to Meltdown regardless.  As far as I'm concerned,
on current hardware, KASLR is only useful to mitigate remote attacks that
try to attack the kernel without first gaining RCE against a vulnerable
user process.

On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
overhead from ~237ns to ~228ns.

There is a possible alternative approach: Move the trampoline within 2G of
the entry text and make a separate copy for each CPU.  This would allow a
direct jump to rejoin the normal entry path. There are pro's and con's for
this approach:

 + It avoids a pipeline stall

 - It executes from an extra page and read from another extra page during
   the syscall. The latter is because it needs to use a relative
   addressing mode to find sp1 -- it's the same *cacheline*, but accessed
   using an alias, so it's an extra TLB entry.

 - Slightly more memory. This would be one page per CPU for a simple
   implementation and 64-ish bytes per CPU or one page per node for a more
   complex implementation.

 - More code complexity.

The current approach is chosen for simplicity and because the alternative
does not provide a significant benefit, which makes it worth.

[ tglx: Added the alternative discussion to the changelog ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
2018-09-12 21:33:53 +02:00
Documentation SCSI fixes on 20180905 2018-09-05 09:17:20 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
arch x86/pti/64: Remove the SYSCALL64 entry trampoline 2018-09-12 21:33:53 +02:00
block block: bsg: move atomic_t ref_count variable to refcount API 2018-08-27 19:17:02 -06:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto DMAengine updates for v4.19-rc1 2018-08-18 15:55:59 -07:00
drivers GPIO fixes for the v4.19 cycle: 2018-09-05 09:27:45 -07:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs for-4.19-rc2-tag 2018-09-06 09:04:45 -07:00
include This fixes two bugs: 2018-09-06 09:06:49 -07:00
init Kbuild updates for v4.19 (2nd) 2018-08-25 13:40:38 -07:00
ipc ipc/shm: properly return EIDRM in shm_lock() 2018-09-04 16:45:02 -07:00
kernel This fixes two bugs: 2018-09-06 09:06:49 -07:00
lib lib/Kconfig.debug: fix three typos in help text 2018-09-04 16:45:02 -07:00
mm mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal 2018-09-04 16:45:02 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-04 12:45:11 -07:00
samples samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM 2018-08-16 21:55:32 +02:00
scripts nds32 patches for 4.19 2018-09-05 09:13:31 -07:00
security - Fix for bad debug check when converting secids to secctx 2018-09-06 09:42:14 -07:00
sound Merge branch 'akpm' (patches from Andrew) 2018-08-23 19:20:12 -07:00
tools Merge branch 'akpm' (patches from Andrew) 2018-09-04 17:01:11 -07:00
usr initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
virt ARM: Support for Group0 interrupts in guests, Cache management 2018-08-22 13:52:44 -07:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Fix up libata MAINTAINERS entry 2018-08-27 08:07:25 -07:00
Makefile Linux 4.19-rc2 2018-09-02 14:37:30 -07:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.