Intel PT decoders need access to various bits of timing related
information to be able to correctly decode timing packets from a PT
stream (MTC and CBR packets). This patch exports all the necessary
bits as sysfs attributes for the sake of consistency:
* max_nonturbo_ratio: ratio between the invariant TSC and base clock;
* tsc_art_ratio: TSC to core crystal clock ratio (also available as CPUID.15H).
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/87zisdvibe.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Not all cores prevent using Intel PT and LBRs simultaneously, although
most of them still do as of today. This patch adds an opt-in flag for
such cores to disable mutual exclusivity between PT and LBR; also flip
it on for Goldmont.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1461857746-31346-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Newer versions of Intel PT support address ranges, which can be used to
define IP address range-based filters or TraceSTOP regions. Number of
ranges in enumerated via cpuid.
This patch implements PMU callbacks and related low-level code to allow
filter validation, configuration and programming into the hardware.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1461771888-10409-7-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Nothing outside of the Intel PT driver should ever care about its MSR
bits, so there is no reason to keep them in msr-index.h. This patch
moves them to a pt-local header.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1461771888-10409-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new sanity check introduced by:
2665784850 ("perf/core: Verify we have a single perf_hw_context PMU")
... triggered on the AMD IOMMU driver.
IOMMUs are not per logical CPU, they cannot have per-task counters. Fix it.
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jroedel@suse.de
Cc: suravee.suthikulpanit@amd.com
Link: http://lkml.kernel.org/r/20160423224255.GB3430@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A while back the following commit:
d394f2d9d8 ("x86/platform/UV: Remove EFI memmap quirk for UV2+")
changed uv_system_init() to only call map_low_mmrs() on older UV1 hardware,
which requires EFI_OLD_MEMMAP to be set in order to boot.
The recent changes to the EFI memory mapping code in:
d2f7cbe7b2 ("x86/efi: Runtime services virtual mapping")
exposed some issues with the fact that we were relying on the EFI memory
mapping mechanisms to map in our MMRs for us, after commit d394f2d9d8.
Rather than revert the entire commit and go back to forcing
EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs()
back into uv_system_init(), and then fix up our EFI runtime calls to use
the appropriate page table.
For now, UV2+ will still need efi=old_map to boot, but there will be
other changes soon that should eliminate the need for this.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Russ Anderson <rja@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that syscalls are called from C code, which copies the args to
new stack slots instead of overlaying pt_regs, asmlinkage_protect()
is no longer needed.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1462416278-11974-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The entry code used to cache the thread_info pointer in the EBP register,
but all the code that used it has been moved to C. Remove the unused
code to get the pointer.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1462416278-11974-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that NT is filtered by the SYSENTER entry code, it is safe to skip saving and
restoring flags on task switch. Also remove a leftover reset of flags on 64-bit
fork.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1462416278-11974-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Removing the SCI penalize function as the penalty is now calculated on the
fly.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_irq_get_penalty is now calculating the penalty on the fly now.
No need to maintain global list of penalties or calculate them
at the init time. Removing duplicate code in acpi_irq_penalty_init.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch fixes the problem of incorrect nodes and pnodes being returned
when referring to nodes that either have no cpus (AKA "headless") or no
memory.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215406.192644884@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use no-op messages in place of cross-partition interrupts when nacking a
put message in the GRU. This allows us to remove MMR's as a destination
from the GRU driver.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215406.012228480@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch builds support for the new conversions of physical addresses
to and from sockets, pnodes and nodes in UV4. It is designed to be as
efficient as possible as lookups are done inside an interrupt context
in some cases. It will be further optimized when physical hardware is
available to measure execution time.
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215405.841051741@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
An aspect of the UV4 system architecture changes involve changing the
way sockets, nodes, and pnodes are translated between one another.
Decode the information from the BIOS provided EFI system table to build
the needed conversion tables.
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215405.673495324@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the UV4 system architecture addressing changes, BIOS now provides
this information via an EFI system table. This is the initial decoding
of that system table. It also collects the sizing information for
later allocation of dynamic conversion tables.
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215405.503022681@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
UV4 uses a GAM (globally addressed memory) architecture that supports
variable sized memory per node. This replaces the old "M" value (number
of address bits per node) with a range table for conversions between
addresses and physical node (pnode) id's. This table is obtained from UV
BIOS via the EFI UVsystab table. Support for older EFI UVsystab tables
is maintained.
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215405.329827545@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
UV4 requires early system wide addressing values. This involves the use
of the CPUID instruction to obtain these values. The current function
(detect_extended_topology()) in the kernel has been copied and streamlined,
with the limitation that only CPU's used by UV architectures are supported.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215405.155660884@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Migrate references from the blade info structs to the per node hub info
structs. This phases out the allocation of the list of per blade info
structs on node 0, in favor of a per node hub info struct allocated on
the node's local memory.
There are also some minor cosemetic changes in the comments and whitespace
to clean things up a bit.
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.987204515@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Allocate and setup per node hub info structs. CPU 0/Node 0 hub info
is statically allocated to be accessible early in system startup. The
remaining hub info structs are allocated on the node's local memory,
and shared among the CPU's on that node. This leaves the small amount
of info unique to each CPU in the per CPU info struct.
Memory is saved by combining the common per node info fields to common
node local structs. In addtion, since the info is read only only after
setup, it should stay in the L3 cache of the local processor socket.
This should therefore improve the cache hit rate when a group of cpus
on a node are all interrupted for a common task.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.813051625@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move references to blade local processor ID to the new per cpu info
structs. Create an access function that makes this move, and other
potential moves opaque to callers of this function. Define a flag
that indicates to callers in external GPL modules that this function
replaces any local definition. This allows calling source code to be
built for both pre-UV4 kernels as well as post-UV4 kernels.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.644173122@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the references to the SCIR fields to the new per cpu info structs.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.452538234@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The major portion of the hub info is common to all cpus on that hub.
This is step one of moving the per cpu hub info to a per node hub info
struct. This patch creates the small per cpu info struct that will
contain only information specific to each CPU.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.282265563@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since UV3 and UV4 MMIOH regions are setup the same, we can use a common
function to setup both.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215404.100504077@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Clean up any redundancies caused by new UV4 MMR definitions superseding
any previously definitions local to functions.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215403.934728974@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds the MMR definitions for UV4 via an automated script that uses
the output from a hardware verilog code to symbol converter. The large
number of insertions is caused by the UV4 design changing many similarly
named fields in MMR's that are named the same. This prompted the extra
production of architecture dependent field defines.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215403.580158916@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cleanup patch to rearrange code and modify some defines so the next
patch, the new UV4 MMR definitions can be merged cleanly.
* Clean up the M/N related address constants (M is # of address bits per
blade, N is the # of blade selection bits per SSI/partition).
* Fix the lookup of the alias overlay addresses and NMI definitions to
allow for flexibility in newer UV architecture types.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215403.401604203@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This new function is generated by the UV MMR generation script to
identify MMR registers and fields that are not defined for a specific
UV architecture. With this switch, the immediate panic can be replaced
with a message and a bad return value allowing either hardware or the
emulator to diagnose the problem. It allows functions common to some
UV arches to use common defines that might not be fully defined for all
arches, as long as they do not reference them on the unsupported arches.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215403.231926687@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add UV4 specific defines to determine if current system type is a
UV4 system.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215403.072323684@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add defines to control which UV architectures are supported, and modify the
'if (is_uvX_*)' functions to return constant 0 for those not supported.
This will help optimize code paths when support for specific UV arches
is removed.
Tested-by: John Estabrook <estabrook@sgi.com>
Tested-by: Gary Kroening <gfk@sgi.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160429215402.897143440@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The promise of pretty boot splashes from firmware via BGRT was at
best only that; a promise. The kernel diligently checks to make
sure the BGRT data firmware gives it is valid, and dutifully warns
the user when it isn't. However, it does so via the pr_err log
level which seems unnecessary. The user cannot do anything about
this and there really isn't an error on the part of Linux to
correct.
This lowers the log level by using pr_notice instead. Users will
no longer have their boot process uglified by the kernel reminding
us that firmware can and often is broken when the 'quiet' kernel
parameter is specified. Ironic, considering BGRT is supposed to
make boot pretty to begin with.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Môshe van der Sterre <me@moshe.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1462303781-8686-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Guest should only trust data to be valid when version haven't changed
before and after reads of steal time. Besides not changing, it has to
be an even number. Hypervisor may write an odd number to version field
to indicate that an update is in progress.
kvm_steal_clock() in guest has already done the read side, make write
side in hypervisor more robust by following the above rule.
Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Currently x86's get_sigframe() checks for "current->sas_ss_size"
to determine whether there is a need to switch to sigaltstack.
The common practice used by all other arches is to check for
sas_ss_flags(sp) == 0
This patch makes the code consistent with other architectures.
The slight complexity of the patch is added by the optimization on
!sigstack check that was requested by Andy Lutomirski: sas_ss_flags(sp)==0
already implies that we are not on a sigstack, so the code is shuffled
to avoid the duplicate checking.
This patch should have no user-visible impact.
Signed-off-by: Stas Sergeev <stsp@list.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1460665206-13646-2-git-send-email-stsp@list.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Check the MCG_STATUS_LMCES bit on Intel to verify that current MCE is
local. It is always local on AMD.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Massaged it a bit. Reflowed comments. Shut up -Wmaybe-uninitialized. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A couple of issues here:
1) MCE_LOG_LEN is only 32 - so we may have more pending records than will
fit in the buffer on high core count CPUs.
2) During a panic we may have a lot of duplicate records because multiple
logical CPUs may have seen and logged the same error because some
banks are shared.
Switch to using the genpool to look for the pending records. Squeeze out
duplicated records.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Replace all calls to MCx_IA32_{CTL,ADDR,MISC,STATUS} with the
appropriate msr_ops.
Use SMCA-specific msr_ops when on an SMCA-enabled processor.
Carved out from a patch by Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Scalable MCA processors have a whole new range of MSR addresses to
obtain bank related info such as CTL, MISC, ADDR, STATUS. Therefore, we
need a way to abstract the MSR addresses per vendor.
Carved out from a patch by Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We need to do this after __mcheck_cpu_init_vendor() as for
ScalableMCA processors, there are going to be new MSR write handlers
if the feature is detected using CPUID bit (which happens in
__mcheck_cpu_init_vendor()).
No functional change is introduced here.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For upcoming processors with Scalable MCA feature, we need to check the
"succor" CPUID bit and the TCC bit in the MCx_STATUS register in order
to grade an MCE's severity.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Simplified code flow, shortened comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459886686-13977-3-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1462019637-16474-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For Fam17h, we want to report errors that persist across reboots. Error
persistence is dependent on HW and no BIOS currently fiddles with values
here. So allow reporting of errors upon boot until something goes wrong.
Logging is disabled on older families because BIOS didn't clear the MCA
banks after a cold reset.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459886686-13977-2-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1462019637-16474-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use of a temporary R8 register here seems to be unnecessary.
"push %r8" is a two-byte insn (it needs REX prefix to specify R8),
"push $0" is two-byte too. It seems just using the latter would be
no worse.
Thus, code had an unnecessary "xorq %r8,%r8" insn.
It probably costs nothing in execution time here since we are probably
limited by store bandwidth at this point, but still.
Run-tested under QEMU: 32-bit calls still work:
/ # ./test_syscall_vdso32
[RUN] Executing 6-argument 32-bit syscall via VDSO
[OK] Arguments are preserved across syscall
[NOTE] R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK] R8..R15 did not leak kernel data
[RUN] Executing 6-argument 32-bit syscall via INT 80
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data
[RUN] Running tests under ptrace
[RUN] Executing 6-argument 32-bit syscall via VDSO
[OK] Arguments are preserved across syscall
[NOTE] R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK] R8..R15 did not leak kernel data
[RUN] Executing 6-argument 32-bit syscall via INT 80
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1462201010-16846-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If an overlapping memcpy() is ever attempted, we should at least report
it, in case it might lead to problems, so it could be changed to a
memmove() call instead.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1462229461-3370-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently to use warn(), a caller would need to include misc.h. However,
this means they would get the (unavailable during compressed boot)
gcc built-in memcpy family of functions. But since string.c is defining
these memcpy functions for use by misc.c, we end up in a weird circular
dependency.
To break this loop, move the error reporting functions outside of misc.c
with their own header so that they can be independently included by
other sources. Since the screen-writing routines use memmove(), keep the
low-level *_putstr() functions in misc.c.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1462229461-3370-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The introduction of the ISA_BUS option blocks the compilation of ISA
drivers on non-x86 platforms. The ISA_BUS configuration option should
not be necessary if the X86_32 dependency can be decoupled from the ISA
configuration option. This patch both removes the ISA_BUS configuration
option entirely and removes the X86_32 dependency from the ISA
configuration option.
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d28bc9dd25 reversed the order of two lines which initialize cr0,
allowing the current (old) cr0 value to mess up vcpu initialization.
This was observed in the checks for cr0 X86_CR0_WP bit in the context of
kvm_mmu_reset_context(). Besides, setting vcpu->arch.cr0 after vmx_set_cr0()
is completely redundant. Change the order back to ensure proper vcpu
initialization.
The combination of booting with ovmf firmware when guest vcpus > 1 and kvm's
ept=N option being set results in a VM-entry failure. This patch fixes that.
Fixes: d28bc9dd25 ("KVM: x86: INIT and reset sequences are different")
Cc: stable@vger.kernel.org
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
The current behavior of set_thread_area() when it modifies a segment that is
currently loaded is a bit confused.
If CS [1] or SS is modified, the change will take effect on return
to userspace because CS and SS are fundamentally always reloaded on
return to userspace.
Similarly, on 32-bit kernels, if DS, ES, FS, or (depending on
configuration) GS refers to a modified segment, the change will take
effect immediately on return to user mode because the entry code
reloads these registers.
If set_thread_area() modifies DS, ES [2], FS, or GS on 64-bit kernels or
GS on 32-bit lazy-GS [3] kernels, however, the segment registers
will be left alone until something (most likely a context switch)
causes them to be reloaded. This means that behavior visible to
user space is inconsistent.
If set_thread_area() is implicitly called via CLONE_SETTLS, then all
segment registers will be reloaded before the thread starts because
CLONE_SETTLS happens before the initial context switch into the
newly created thread.
Empirically, glibc requires the immediate reload on CLONE_SETTLS --
32-bit glibc on my system does *not* manually reload GS when
creating a new thread.
Before enabling FSGSBASE, we need to figure out what the behavior
will be, as FSGSBASE requires that we reconsider our behavior when,
e.g., GS and GSBASE are out of sync in user mode. Given that we
must preserve the existing behavior of CLONE_SETTLS, it makes sense
to me that we simply extend similar behavior to all invocations
of set_thread_area().
This patch explicitly updates any segment register referring to a
segment that is targetted by set_thread_area(). If set_thread_area()
deletes the segment, then the segment register will be nulled out.
[1] This can't actually happen since 0e58af4e1d ("x86/tls:
Disallow unusual TLS segments") but, if it did, this is how it
would behave.
[2] I strongly doubt that any existing non-malicious program loads a
TLS segment into DS or ES on a 64-bit kernel because the context
switch code was badly broken until recently, but that's not an
excuse to leave the current code alone.
[3] One way or another, that config option should to go away. Yuck!
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/27d119b0d396e9b82009e40dff8333a249038225.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unlike ds and es, these are base addresses, not selectors. Rename
them so their meaning is more obvious.
On x86_32, the field is still called fs. Fixing that could make sense
as a future cleanup.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/69a18a51c4cba0ce29a241e570fc618ad721d908.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As far as I know, the optimization doesn't work on any modern distro
because modern distros use high addresses for ASLR. Remove it.
The ptrace code was either wrong or very strange, but the behavior
with this patch should be essentially identical to the behavior
without this patch unless user code goes out of its way to mislead
ptrace.
On newer CPUs, once the FSGSBASE instructions are enabled, we won't
want to use the optimized variant anyway.
This isn't actually much of a performance regression, it has no effect
on normal dynamically linked programs, and it's a considerably
simplification. It also removes some nasty special cases from code
that is already way too full of special cases for comfort.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dd1599b08866961dba9d2458faa6bbd7fba471d7.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On AMD CPUs, a failed load_gs_base currently may not clear the FS
base. Fix it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1a6c4d3a8a4e7be79ba448b42685e0321d50c14c.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On AMD CPUs, a failed loadsegment currently may not clear the FS
base. Fix it.
While we're at it, prevent loadsegment(gs, xyz) from even compiling
on 64-bit kernels. It shouldn't be used.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a084c1b93b7b1408b58d3fd0b5d6e47da8e7d7cf.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
asm/alternative.h isn't directly useful from assembly, but it
shouldn't break the build.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e5b693fcef99fe6e80341c9e97a002fb23871e91.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
alternative.h pulls in ptrace.h, which means that alternatives can't
be used in anything referenced from ptrace.h, which is a mess.
Break the dependency by pulling text patching helpers into their own
header.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/99b93b13f2c9eb671f5c98bba4c2cbdc061293a2.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Relocation handling performs bounds checking on the resulting calculated
addresses. The existing code uses output_len (VO size plus relocs size) as
the max address. This is not right since the max_addr check should stop at
the end of VO and exclude bss, brk, etc, which follows. The valid range
should be VO [_text, __bss_start] in the loaded physical address space.
This patch adds an export for __bss_start in voffset.h and uses it to
set the correct limit for max_addr.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since 'run_size' is now calculated in misc.c, the old script and associated
argument passing is no longer needed. This patch removes them, and renames
'run_size' to the more descriptive 'kernel_total_size'.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, the "run_size" variable holds the total kernel size
(size of code plus brk and bss) and is calculated via the shell script
arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
of the .bss and .brk sections from the vmlinux, and adds them as follows:
run_size = $(( $offsetA + $sizeA + $sizeB ))
However, this is not correct (it is too large). To illustrate, here's
a walk-through of the script's calculation, compared to the correct way
to find it.
First, offsetA is found as the starting address of the first .bss or
.brk section seen in the ELF file. The sizeA and sizeB values are the
respective section sizes.
[bhe@x1 linux]$ objdump -h vmlinux
vmlinux: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
27 .bss 00170000 ffffffff81ec8000 0000000001ec8000 012c8000 2**12
ALLOC
28 .brk 00027000 ffffffff82038000 0000000002038000 012c8000 2**0
ALLOC
Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
0x00027000. The resulting run_size is 0x145f000:
0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000
However, if we instead examine the ELF LOAD program headers, we see a
different picture.
[bhe@x1 linux]$ readelf -l vmlinux
Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000200000 0xffffffff81000000 0x0000000001000000
0x0000000000b5e000 0x0000000000b5e000 R E 200000
LOAD 0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
0x0000000000145000 0x0000000000145000 RW 200000
LOAD 0x0000000001000000 0x0000000000000000 0x0000000001d45000
0x0000000000018158 0x0000000000018158 RW 200000
LOAD 0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
0x000000000016a000 0x0000000000301000 RWE 200000
NOTE 0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
0x00000000000001bc 0x00000000000001bc 4
Section to Segment mapping:
Segment Sections...
00 .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
__ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
__modver
01 .data .vvar
02 .data..percpu
03 .init.text .init.data .x86_cpu_dev.init .parainstructions
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
04 .notes
As mentioned, run_size needs to be the size of the running kernel
including .bss and .brk. We can see from the Section/Segment mapping
above that .bss and .brk are included in segment 03 (which corresponds
to the final LOAD program header). To find the run_size, we calculate
the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
and its MemSiz (0x0000000000301000), minus the physical load address of
the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
resulting run_size is 0x105f000:
0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000
So, from this we can see that the existing run_size calculation is
0x400000 too high. And, as it turns out, the correct run_size is
actually equal to VO_end - VO_text, which is certainly easier to calculate.
_end: 0xffffffff8205f000
_text:0xffffffff81000000
0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
As a result, run_size is a simple constant, so we don't need to pass it
around; we already have voffset.h for such things. We can share voffset.h
between misc.c and header.S instead of getting run_size in other ways.
This patch moves voffset.h creation code to boot/compressed/Makefile,
and switches misc.c to use the VO_end - VO_text calculation for run_size.
Dependence before:
boot/header.S ==> boot/voffset.h ==> vmlinux
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
Dependence after:
boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Junjie Mao <eternal.n08@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Fixes: e6023367d7 ("x86, kaslr: Prevent .bss from overlaping initrd")
Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
This doesn't work well because mkpiggy.c doesn't know the details of the
decompressor in use. As a result, it can only make an estimation, which
has risks:
- output + output_len (VO) could be much bigger than input + input_len
(ZO). In this case, the decompressed kernel plus relocs could overwrite
the decompression code while it is running.
- The head code of ZO could be bigger than z_extract_offset. In this case
an overwrite could happen when the head code is running to move ZO to
the end of buffer. Though currently the size of the head code is very
small it's still a potential risk. Since there is no rule to limit the
size of the head code of ZO, it runs the risk of suddenly becoming a
(hard to find) bug.
Instead, this moves the z_extract_offset calculation into header.S, and
makes adjustments to be sure that the above two cases can never happen,
and further corrects the comments describing the calculations.
Since we have (in the previous patch) made ZO always be located against
the end of decompression buffer, z_extract_offset is only used here to
calculate an appropriate buffer size (INIT_SIZE), and is not longer used
elsewhere. As such, it can be removed from voffset.h.
Additionally clean up #if/#else #define to improve readability.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:
77d1a49995 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e3 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
Specifically:
All names prefixed with 'VO_':
- relate to the uncompressed kernel image
- the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
All names prefixed with 'ZO_':
- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
which is composed of the following memory areas:
- head text
- compressed kernel (VO image and relocs table)
- decompressor code
- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
The 'INIT_SIZE' value is used to find the larger of the two image sizes:
#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)
#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif
The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)
Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.
To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed to the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.
z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)
When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
decompression buffer. It overlaps with the soon-to-be-uncompressed kernel
like this:
|-----compressed kernel image------|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO VO__end ZO__end
^ ^
|-------uncompressed kernel image---------|
When INIT_SIZE is equal to VO_INIT_SIZE (likely) there's still space
left from end of ZO to the end of decompressing buffer, like below.
|-compressed kernel image-|
V V
0 extract_offset +INIT_SIZE
|-----------|---------------|-------------------------|--------|
| | | |
VO__text startup_32 of ZO ZO__end VO__end
^ ^
|------------uncompressed kernel image-------------|
To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing t at
the start of extract_offset as is currently done.
This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.
Then when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that the VO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When processing the relocation table, the offset used to calculate the
relocation is an 'int'. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G.
To handle relocations above 2G (seen in situations like kexec, netboot, etc),
this offset needs to be calculated using a 'long' to avoid wrapping and
miscalculating the relocation.
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Two boot crash fixes and an IRQ handling crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Handle zero vector gracefully in clear_vector_irq()
Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Pull perf fixes from Ingo Molnar:
"x86 PMU driver fixes plus a core code race fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect lbr_sel_mask value
perf/x86/intel/pt: Don't die on VMXON
perf/core: Fix perf_event_open() vs. execve() race
perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
perf/x86/intel/rapl: Add missing Haswell model
perf/x86/intel: Add model number for Skylake Server to perf
Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy. AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.
Fix it with a big hammer: run switch_mm() with IRQs off.
To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's fairly large and it has quite a few callers. This may also
help untangle some headers down the road.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently all of the functions that live in tlb.c are inlined on
!SMP builds. One can debate whether this is a good idea (in many
respects the code in tlb.c is better than the inlined UP code).
Regardless, I want to add code that needs to be built on UP and SMP
kernels and relates to tlb flushing, so arrange for tlb.c to be
compiled unconditionally.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Define ARCH_EFI_IRQ_FLAGS_MASK for x86, which will enable the generic
runtime wrapper code to detect when firmware erroneously modifies flags
over a runtime services function call.
For x86 (both 32-bit and 64-bit), we only need check the interrupt flag.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Harald Hoyer harald@redhat.com
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raphael Hertzog <hertzog@debian.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-40-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now there's a common template for {__,}efi_call_virt(), remove the
duplicate logic from the x86 EFI code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-35-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If an EFI capsule has been sent to the firmware we must match the type
of EFI reset against that required by the capsule to ensure it is
processed correctly.
Force an EFI reboot if a capsule is pending for the next reset.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: joeyli <jlee@suse.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-29-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The efifb quirks handling based on DMI identification of the platform is
specific to x86, so move it to x86 arch code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-19-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Graphics Output Protocol code executes in the stub, so create a generic
version based on the x86 version in libstub so that we can move other archs
to it in subsequent patches. The new source file gop.c is added to the
libstub build for all architectures, but only wired up for x86.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-18-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of moving this code to drivers/firmware/efi and reusing
it on ARM and arm64, apply any changes that will be required to make this
code build for other architectures. This should make it easier to track
down problems that this move may cause to its operation on x86.
Note that the generic version uses slightly different ways of casting the
protocol methods and some other variables to the correct types, since such
method calls are not loosely typed on ARM and arm64 as they are on x86.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-17-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This symbol is always set which makes it useless. Additionally we have
a kernel command-line switch, efi=debug, which actually controls the
printing of the memory map.
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-16-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Our efi_memory_desc_t type is based on EFI_MEMORY_DESCRIPTOR version 1 in
the UEFI spec. No version updates are expected, but since we are about to
introduce support for new firmware tables that use the same descriptor
type, it makes sense to at least warn if we encounter other versions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-9-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Abolish the poorly named EFI memory map, 'memmap'. It is shadowed by a
bunch of local definitions in various files and having two ways to
access the EFI memory map ('efi.memmap' vs. 'memmap') is rather
confusing.
Furthermore, IA64 doesn't even provide this global object, which has
caused issues when trying to write generic EFI memmap code.
Replace all occurrences with efi.memmap, and convert the remaining
iterator code to use for_each_efi_mem_desc().
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Luck, Tony <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Most of the users of for_each_efi_memory_desc() are equally happy
iterating over the EFI memory map in efi.memmap instead of 'memmap',
since the former is usually a pointer to the latter.
For those users that want to specify an EFI memory map other than
efi.memmap, that can be done using for_each_efi_memory_desc_in_map().
One such example is in the libstub code where the firmware is queried
directly for the memory map, it gets iterated over, and then freed.
This change goes part of the way toward deleting the global 'memmap'
variable, which is not universally available on all architectures
(notably IA64) and is rather poorly named.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-7-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's not at all obvious that populate_pgd() and friends are only
executed when mapping EFI virtual memory regions or that no other
pageattr callers pass a ->pgd value.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The EFI_SYSTEM_TABLES status bit is set by all EFI supporting architectures
upon discovery of the EFI system table, but the bit is never tested in any
code we have in the tree. So remove it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Luck, Tony <tony.luck@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of having non-standard memcpy() behavior, explicitly call the new
function memmove(), make it available to the decompressors, and switch
the two overlap cases (screen scrolling and ELF parsing) to use memmove().
Additionally documents the purpose of compressed/string.c.
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20160426214606.GA5758@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes a bug which was introduced by:
b16a5b52eb ("perf/x86: Add option to disable reading branch flags/cycles")
In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK
doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be
set in lbr_select.
This patch corrects the LBR_SEL_MASK by including all valid bits in
LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in
LBR_SELECT. It does not operate in suppress mode, so it needs to be
specially handled in intel_pmu_setup_hw_lbr_filter.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some versions of Intel PT do not support tracing across VMXON, more
specifically, VMXON will clear TraceEn control bit and any attempt to
set it before VMXOFF will throw a #GP, which in the current state of
things will crash the kernel. Namely:
$ perf record -e intel_pt// kvm -nographic
on such a machine will kill it.
To avoid this, notify the intel_pt driver before VMXON and after
VMXOFF so that it knows when not to enable itself.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
referenced by filter_events() which expects undefined events to have a
value of 0.
Found via KASAN:
UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
index 9 is out of range for type 'u64 [9]'
UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().
The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().
We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.
Remove the BUG_ON and return if vector is zero,
[ tglx: Massaged changelog ]
Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.
And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.
The new file is:
# cat /proc/sys/kernel/perf_event_max_stack
127
Chaging it:
# echo 256 > /proc/sys/kernel/perf_event_max_stack
# cat /proc/sys/kernel/perf_event_max_stack
256
But as soon as there is some event using callchains we get:
# echo 512 > /proc/sys/kernel/perf_event_max_stack
-bash: echo: write error: Device or resource busy
#
Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.
Reported-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This reverts commit 320d25b6a0.
This change was problematic for a couple of reasons:
1. It missed a some entry points (Xen things and 64-bit native).
2. The entry it changed can be executed more than once. This isn't
really a problem, but it conflated per-cpu state setup and global
state setup.
3. It broke 64-bit non-NX. 64-bit non-NX worked the other way around from
32-bit -- __supported_pte_mask had NX set initially and was *cleared*
in x86_configure_nx. With the patch applied, it never got cleared.
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/59bd15f7f4b56b633a611b7f70876c6d2ad01a98.1461685884.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Misc fixes: two EDAC driver fixes, a Xen crash fix, a HyperV log spam
fix and a documentation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86 EDAC, sb_edac.c: Take account of channel hashing when needed
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
x86/doc: Correct limits in Documentation/x86/x86_64/mm.txt
x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances
With the array aligned as per events/intel/core.c it was fairly
obvious we missed one, add it in.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Re-order the model array to match the order in events/intel/core.c,
to easier spot gaps and such.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add Skylake client support for RAPL domains. In addition to RAPL domains
in Broadwell clients, it has support for platform domain (aka PSys). The
PSys domain controls the entire SoC instead of just a CPU package. Unlike
package domain, PSys support requires more than just processor level
implementation. The other parts in the system need additional HW level
signaling, which OEMs need to support. When not supported, the energy
counter register in PSys domain returns 0.
Also corrected error in comment for GPU counter, which previously was
DRAM counter.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com
[ Cnverted to model_match stuff. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: jacob.jun.pan@linux.intel.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
LBR filtering is also supported on the Silvermont and Airmont
microarchitectures. The layout of MSR_LBR_SELECT is the same as Nehalem.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1460706825-46163-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add perf core PMU support for Intel Goldmont CPU cores:
- The init code is based on Silvermont.
- There is a new cache event list, based on the Silvermont cache event list.
- Goldmont has 32 LBR entries. It also uses new LBRv6 format, which
report the cycle information using upper 16-bit of the LBR_TO.
- It's recommended to use CPU_CLK_UNHALTED.CORE_P + NPEBS for precise cycles.
For details, please refer to the latest SDM058:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1460706167-45320-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Everything the same as base Skylake, just a new model number.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1460751933-2264-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
1fb3a8b2cf ("xen/spinlock: Fix locking path engaging too soon under PVHVM.")
... moved the initalization of the kicker interrupt until after
native_cpu_up() is called.
However, when using qspinlocks, a CPU may try to kick another CPU that is
spinning (because it has not yet initialized its kicker interrupt), resulting
in the following crash during boot:
kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210!
invalid opcode: 0000 [#1] SMP
...
RIP: 0010:[<ffffffff814c97c9>] [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60
...
Call Trace:
[<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10
[<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0
[<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[<ffffffff81052936>] ? check_tsc_warp+0x76/0x150
[<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160
[<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0
[<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80
[<ffffffff8108198c>] _cpu_up+0x13c/0x180
[<ffffffff81081a4a>] cpu_up+0x7a/0xa0
[<ffffffff81f80dfc>] smp_init+0x7f/0x81
[<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212
[<ffffffff81817f30>] ? rest_init+0x80/0x80
[<ffffffff81817f3e>] kernel_init+0xe/0xe0
[<ffffffff8182488f>] ret_from_fork+0x3f/0x70
[<ffffffff81817f30>] ? rest_init+0x80/0x80
To fix this, only send the kick if the target CPU's interrupt has been
initialized. This check isn't racy, because the target is waiting for
the spinlock, so it won't have initialized the interrupt in the
meantime.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since we are removing paravirt_enabled() replace it with a
logical equivalent. Even though PNPBIOS is x86 specific we
add an arch-specific type call, which can be implemented by
any architecture to show how other legacy attribute devices
can later be also checked for with other ACPI legacy attribute
flags.
This implicates the first ACPI 5.2.9.3 IA-PC Boot Architecture
ACPI_FADT_LEGACY_DEVICES flag device, and shows how to add more.
The reason pnpbios gets a defined structure and as such uses
a different approach than the RTC legacy quirk is that ACPI
has a respective RTC flag, while pnpbios does not. We fold
the pnpbios quirk under ACPI_FADT_LEGACY_DEVICES ACPI flag
use case, and use a struct of possible devices to enable
future extensions of this.
As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:
TOTAL TEXT init.text x86_early_init_platform_quirks()
+32 +28 +28 +28
That's 4 byte overhead total, the rest is cleared out on init
as its all __init text.
v2: split out subarch handlng on switch to make it easier
later to add other subarchs. The 'fall-through' switch
handling can be confusing and we'll remove it later
when we add handling for X86_SUBARCH_CE4100.
v3: document vmlinux size impact as per 0-day, and also
explain why pnpbios is treated differently than the
RTC legacy feature.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: andriy.shevchenko@linux.intel.com
Cc: bigeasy@linutronix.de
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: ffainelli@freebox.fr
Cc: george.dunlap@citrix.com
Cc: glin@suse.com
Cc: jgross@suse.com
Cc: jlee@suse.com
Cc: josh@joshtriplett.org
Cc: julien.grall@linaro.org
Cc: konrad.wilk@oracle.com
Cc: kozerkov@parallels.com
Cc: lenb@kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-acpi@vger.kernel.org
Cc: lv.zheng@intel.com
Cc: matt@codeblueprint.co.uk
Cc: mbizon@freebox.fr
Cc: rjw@rjwysocki.net
Cc: robert.moore@intel.com
Cc: rusty@rustcorp.com.au
Cc: tiwai@suse.de
Cc: toshi.kani@hp.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1460592286-300-12-git-send-email-mcgrof@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have 4 types of x86 platforms that disable RTC:
* Intel MID
* Lguest - uses paravirt
* Xen dom-U - uses paravirt
* x86 on legacy systems annotated with an ACPI legacy flag
We can consolidate all of these into a platform specific legacy
quirk set early in boot through i386_start_kernel() and through
x86_64_start_reservations(). This deals with the RTC quirks which
we can rely on through the hardware subarch, the ACPI check can
be dealt with separately.
For Xen things are bit more complex given that the @X86_SUBARCH_XEN
x86_hardware_subarch is shared on for Xen which uses the PV path for
both domU and dom0. Since the semantics for differentiating between
the two are Xen specific we provide a platform helper to help override
default legacy features -- x86_platform.set_legacy_features(). Use
of this helper is highly discouraged, its only purpose should be
to account for the lack of semantics available within your given
x86_hardware_subarch.
As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:
TOTAL TEXT init.text x86_early_init_platform_quirks()
+70 +62 +62 +43
Only 8 bytes overhead total, as the main increase in size is
all removed via __init.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: andriy.shevchenko@linux.intel.com
Cc: bigeasy@linutronix.de
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: ffainelli@freebox.fr
Cc: george.dunlap@citrix.com
Cc: glin@suse.com
Cc: jlee@suse.com
Cc: josh@joshtriplett.org
Cc: julien.grall@linaro.org
Cc: konrad.wilk@oracle.com
Cc: kozerkov@parallels.com
Cc: lenb@kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-acpi@vger.kernel.org
Cc: lv.zheng@intel.com
Cc: matt@codeblueprint.co.uk
Cc: mbizon@freebox.fr
Cc: rjw@rjwysocki.net
Cc: robert.moore@intel.com
Cc: rusty@rustcorp.com.au
Cc: tiwai@suse.de
Cc: toshi.kani@hp.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'cpu_has_pse' has changed to boot_cpu_has(X86_FEATURE_PSE), fix this
up in the merge commit when merging the x86/urgent tree that includes
the following commit:
103f6112f2 ("x86/mm/xen: Suppress hugetlbfs in PV guests")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:
kernel BUG at .../fs/hugetlbfs/inode.c:428!
invalid opcode: 0000 [#1] SMP
...
RIP: e030:[<ffffffff811c333b>] [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
...
Call Trace:
[<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
[<ffffffff81167b3d>] evict+0xbd/0x1b0
[<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
[<ffffffff81165b0e>] dput+0x1fe/0x220
[<ffffffff81150535>] __fput+0x155/0x200
[<ffffffff81079fc0>] task_work_run+0x60/0xa0
[<ffffffff81063510>] do_exit+0x160/0x400
[<ffffffff810637eb>] do_group_exit+0x3b/0xa0
[<ffffffff8106e8bd>] get_signal+0x1ed/0x470
[<ffffffff8100f854>] do_signal+0x14/0x110
[<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
[<ffffffff814178a5>] retint_user+0x8/0x13
This is CVE-2016-3961 / XSA-174.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If KASLR is built in but not available at run-time (either due to the
current conflict with hibernation, command-line request, or e820 parsing
failures), announce the state explicitly. To support this, a new "warn"
function is created, based on the existing "error" function.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Two uses of memcpy() (screen scrolling and ELF parsing) were handling
overlapping memory areas. While there were no explicitly noticed bugs
here (yet), it is best to fix this so that the copying will always be
safe.
Instead of making a new memmove() function that might collide with other
memmove() definitions in the decompressors, this just makes the compressed
boot code's copy of memcpy() overlap-safe.
Suggested-by: Lasse Collin <lasse.collin@tukaani.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461185746-8017-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This rearranges the pieces needed to include the decompressor code
in misc.c. It wasn't obvious why things were there, so a comment was
added and definitions consolidated.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.
[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The comment that describes the analysis for the size of the decompressor
code only took gzip into account (there are currently 6 other decompressors
that could be used). The actual z_extract_offset calculation in code was
already handling the correct maximum size, but this documentation hadn't
been updated. This updates the documentation, fixes several typos, moves
the comment to header.S, updates references, and adds a note at the end
of the decompressor include list to remind us about updating the comment
in the future.
(Instead of moving the comment to mkpiggy.c, where the calculation
is currently happening, it is being moved to header.S because
the calculations in mkpiggy.c will be removed in favor of header.S
calculations in a following patch, and it seemed like overkill to move
the giant comment twice, especially when there's already reference to
z_extract_offset in header.S.)
Signed-off-by: Baoquan He <bhe@redhat.com>
[ Rewrote changelog, cleaned up comment style, moved comments around. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
3387a535ce ("x86/asm: Create stack frames in rwsem functions") has
added FRAME_{BEGIN,END} annotations to rwsem asm stubs. The patch
which has added call_rwsem_down_write_failed_killable() was based on an
older tree so it didn't know about annotations. Let's add them.
This addresses the following objtool warning:
arch/x86/lib/rwsem.o: warning: objtool: call_rwsem_down_write_failed_killable()+0xe: call without frame pointer save/setup
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1460541432-21631-1-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all the architectures implement the necessary glue code
we can introduce down_write_killable(). The only difference wrt. regular
down_write() is that the slow path waits in TASK_KILLABLE state and the
interruption by the fatal signal is reported as -EINTR to the caller.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- Incorrect output buffer size calculation in rsa-pkcs1pad
- Uninitialised padding bytes on exported state in ccp driver
- Potentially freed pointer used on completion callback in sha1-mb"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Prevent information leakage on export
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: rsa-pkcs1pad - fix dst len
kvm_make_request and kvm_check_request imply a producer-consumer
relationship; add implicit memory barriers to them. There was indeed
already a place that was adding an explicit smp_mb() to order between
kvm_check_request and the processing of the request. That memory
barrier can be removed (as an added benefit, kvm_check_request can use
smp_mb__after_atomic which is free on x86).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The obsolete sp should not be used on current vCPUs and should not hurt
vCPU's running, so skip it from for_each_gfn_sp() and
for_each_gfn_indirect_valid_sp()
The side effort is we will double check role.invalid in kvm_mmu_get_page()
but i think it is okay as role is well cached
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since accumulate_steal_time is now only called in record_steal_time, it
doesn't quite make sense to put the delta calculation in a separate
function. The function could be called thousands of times before guest
enables the steal time MSR (though the compiler may optimize out this
function call). And after it's enabled, the MSR enable bit is tested twice
every time. Removing the accumulate_steal_time function also avoids the
necessity of having the accum_steal field.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit 2aedcd098a ('kbuild: suppress annoying "... is up to
date." message'), $(call if_changed,...) is evaluated to "@:"
when there is nothing to do.
We no longer need to add "@:" after $(call if_changed,...) to
suppress "... is up to date." message.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
These targets are marked as PHONY. No need to add FORCE to their
dependency.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Now that local_clock() is explicitly inlined in sched.h, taking its
pointer would uninline it in the compilation unit where it's done,
making (among other things) comparing pointers to this function
produce different results in different compilation units.
Case in point, x86 perf core's user page updating function compares
event's clock against &local_clock to see if it needs to set zero
time offset related bits in the page.
This patch fixes the latter by looking at the "use_clockid" event
attribute instead, to determine whether local clock is used. Fixing
the uninlined local_clock() in perf core is left as an exercise for
the author of the prior work.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Cc: vince@deater.net
Fixes: http://lkml.kernel.org/r/1459541050-13654-1-git-send-email-daniel.lezcano@linaro.org
Link: http://lkml.kernel.org/r/1460635189-2320-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The is_ia32_task()/is_x32_task() function names are a big misnomer: they
suggests that the compat-ness of a system call is a task property, which
is not true, the compatness of a system call purely depends on how it
was invoked through the system call layer.
A task may call 32-bit and 64-bit and x32 system calls without changing
any of its kernel visible state.
This specific minomer is also actively dangerous, as it might cause kernel
developers to use the wrong kind of security checks within system calls.
So rename it to in_{ia32,x32}_syscall().
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
[ Expanded the changelog. ]
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1460987025-30360-1-git-send-email-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Very common ethernet. Already enabled in i386_defconfig
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/146105407523.18740.6392078851674393377.stgit@zurg
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The variable "random" is also the name of a libc function. It's better
coding style to avoid overloading such things, so rename it to the more
accurate "random_addr".
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The name "choose_kernel_location" isn't specific enough, and doesn't
describe the primary thing it does: choosing a random location. This
patch renames it to "choose_random_location", and clarifies the what
routines are contained in the kaslr.c source file.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function "decompress_kernel" now performs many more duties, so this
patch renames it to "extract_kernel" and updates callers and comments.
Additionally the file header comment for misc.c is improved to actually
describe what is contained.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The non-compressed boot code uses the (much more obvious) name
"boot_params" for the global pointer to the x86 boot parameters. The
compressed kernel loader code, though, was using the legacy name
"real_mode". There is no need to have a different name, and changing it
improves readability.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the boot_params can be found using the real_mode global variable,
there is no need to pass around a pointer to it. This slightly simplifies
the choose_kernel_location function and its callers.
[kees: rewrote changelog, tracked file rename]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460997735-24785-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In order to avoid confusion over what this file provides, rename it to
kaslr.c since it is used exclusively for the kernel ASLR, not userspace
ASLR.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1460997735-24785-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In arch/x86/kernel/setup.c, the #ifdef kept for CONFIG_ACPI actually is
related to the accessibility of initrd_start/initrd_end, so the stub should
be provided from this source file and should only be dependent on
CONFIG_BLK_DEV_INITRD.
Note that when ACPI=n and BLK_DEV_INITRD=y, early_initrd_acpi_init() is
still a stub because of the stub prepared for early_acpi_table_init().
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch moves acpi_os_table_override() and
acpi_os_physical_table_override() to tables.c.
Along with the mechanisms, acpi_initrd_initialize_tables() is also moved to
tables.c to form a static function. The following functions are renamed
according to this change:
1. acpi_initrd_override() -> renamed to early_acpi_table_init(), which
invokes acpi_table_initrd_init()
2. acpi_os_physical_table_override() -> which invokes
acpi_table_initrd_override()
3. acpi_initialize_initrd_tables() -> renamed to acpi_table_initrd_scan()
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch fix spelling typos found in printk
within various part of the kernel sources.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Four instances of incorrect usage of non-static "inline" crept up
in arch/x86, all trivial; cleaning them up:
EVT_TO_HPET_DEV() - made static, it is only used in kernel/hpet.c
Debug version of check_iommu_entries() is an __init function.
Non-debug dummy empty version of it is declared "inline" instead -
which doesn't help to eliminate it (the caller is in a different unit,
inlining doesn't happen).
Switch to non-inlined __init function, which does eliminate it
(by discarding it as part of __init section).
crypto/sha-mb/sha1_mb.c: looks like they just forgot to add "static"
to their two internal inlines, which emitted two unused functions into
vmlinux.
text data bss dec hex filename
95903394 20860288 35991552 152755234 91adc22 vmlinux_before
95903266 20860288 35991552 152755106 91adba2 vmlinux
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1460739626-12179-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Generation2 instances don't support reporting the NMI status on port 0x61,
read from there returns 'ff' and we end up reporting nonsensical PCI
error (as there is no PCI bus in these instances) on all NMIs:
NMI: PCI system error (SERR) for reason ff on CPU 0.
Dazed and confused, but trying to continue
Fix the issue by overriding x86_platform.get_nmi_reason. Use 'booted on
EFI' flag to detect Gen2 instances.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/1460728232-31433-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In sha_complete_job, incorrect mcryptd_hash_request_ctx pointer is used
when check and complete other jobs. If the memory of first completed req
is freed, while still completing other jobs in the func, kernel will
crash since NULL pointer is assigned to RIP.
Cc: <stable@vger.kernel.org>
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a binutils fix, an lguest fix, an mcelog fix and a missing
documentation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid using object after free in genpool
lguest, x86/entry/32: Fix handling of guest syscalls using interrupt gates
x86/build: Build compressed x86 kernels as PIE
x86/mm/pkeys: Add missing Documentation
This reverts commit c4004b02f8.
Sadly, my hope that nobody would actually use the special kernel entries
in /proc/iomem were dashed by kexec. Which reads /proc/iomem explicitly
to find the kernel base address. Nasty.
Anyway, that means we can't do the sane and simple thing and just remove
the entries, and we'll instead have to mask them out based on permissions.
Reported-by: Zhengyu Zhang <zhezhang@redhat.com>
Reported-by: Dave Young <dyoung@redhat.com>
Reported-by: Freeman Zhang <freeman.zhang1992@gmail.com>
Reported-by: Emrah Demir <ed@abdsec.com>
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Wrong indentation in the PMU code from the merge window
- A long-time bug occuring with running ntpd on the host, candidate for stable
- Properly handle (and warn about) the unsupported configuration of running on
systems with less than 40 bits of PA space
- More fixes to the PM and hotplug notifier stuff from the merge window
x86:
- leak of guest xcr0 (typically shows up as SIGILL)
- new maintainer (who is sending the pull request too)
- fix for merge window regression
- fix for guest CPUID
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXCrBeAAoJEL/70l94x66DHlYIAJseQjhyXMm3J8vkZ1qoPjiM
IWUV3BoSDpMjIXZdiouDShCBwBE1zDoFCXzkGO8FGNpYTy/4WQ2DF+fQr8OFveOv
bpLJI6iYpCESh5ihhp8UDTD9oY4ZAxyPfNUx06Rirze2ijDr6rWkM16bKgdIlMpa
rzGUQWodZO0odCWVMAXSe08uistvqZ71iacpAIJQJrx3MJcq1u2+Y2daZO3df1R6
GnWxY5SK2ZvrNEwjQPXAYgCPGaJKEVUbr9BvWFoQ2O+qGA4TPv4jrOsAQ7bRh4Rr
SGOH465Qdj2EA7eJrP2/NAlD5N7H7RWkiLiyIGF4OTS6wJHU8jdnOLbYPx0VmQg=
=1OKc
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM fixes:
- Wrong indentation in the PMU code from the merge window
- A long-time bug occuring with running ntpd on the host, candidate
for stable
- Properly handle (and warn about) the unsupported configuration of
running on systems with less than 40 bits of PA space
- More fixes to the PM and hotplug notifier stuff from the merge
window
x86:
- leak of guest xcr0 (typically shows up as SIGILL)
- new maintainer (who is sending the pull request too)
- fix for merge window regression
- fix for guest CPUID"
Paolo Bonzini points out:
"For the record, this tag is signed by me because I prepared the pull
request. Further pull requests for 4.6 will be signed and sent out by
Radim directly"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: mask CPUID(0xD,0x1).EAX against host value
kvm: x86: do not leak guest xcr0 into host interrupt handlers
KVM: MMU: fix permission_fault()
KVM: new maintainer on the block
arm64: KVM: unregister notifiers in hyp mode teardown path
arm64: KVM: Warn when PARange is less than 40 bits
KVM: arm/arm64: Handle forward time correction gracefully
arm64: KVM: Add braces to multi-line if statement in virtual PMU code
04633df0c4 ("x86/cpu: Call verify_cpu() after having entered long mode too")
added the call to verify_cpu() for sanitizing CPU configuration.
The latter uses the stack minimally and it can happen that we land in
startup_64() directly from a 64-bit bootloader. Then we want to use our
own, known good stack.
Do that.
APs don't need this as the trampoline sets up a stack for them.
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459434062-31055-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
get_bios_ebda_length() uses min_t() without including linux/kernel.h.
This may result in build errors with some configurations. Since the
function is not used anywhere in the kernel, let's just drop it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Waychison <mikew@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459558314-5625-1-git-send-email-linux@roeck-us.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav asked for a comment explaining why all exception handlers are
allowed early.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/5f1dcd6919f4a5923959a8065cb2c04d9dac1412.1459784772.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will cause unchecked native_rdmsr_safe() failures to return
deterministic results.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/515fb611449a755312a476cfe11675906e7ddf6c.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Enabling CONFIG_PARAVIRT had an unintended side effect: rdmsr() turned
into rdmsr_safe() and wrmsr() turned into wrmsr_safe(), even on bare
metal. Undo that by using the new unsafe paravirt MSR callbacks.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/414fabd6d3527703077c6c2a797223d0a9c3b081.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This adds paravirt callbacks for unsafe MSR access. On native, they
call native_{read,write}_msr(). On Xen, they use xen_{read,write}_msr_safe().
Nothing uses them yet for ease of bisection. The next patch will
use them in rdmsrl(), wrmsrl(), etc.
I intentionally didn't make them warn on #GP on Xen. I think that
should be done separately by the Xen maintainers.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/880eebc5dcd2ad9f310d41345f82061ea500e9fa.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ONCE() and, for RDMSR, a return value of zero.
To be clear, this type of failure should *not* happen. This patch
exists to minimize the chance of nasty undebuggable failures
happening when a CONFIG_PARAVIRT=y bug in the non-"safe" MSR helpers
gets fixed.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/26567b216aae70e795938f4b567eace5a0eb90ba.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
These callbacks match the _safe variants, so name them accordingly.
This will make room for unsafe PV callbacks.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/9ee3fb6a196a514c93325bdfa15594beecf04876.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that early_fixup_exception() has pt_regs, we can just call
fixup_exception() from it. This will make fancy exception handlers
work early.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/20fc047d926150cb08cb9b9f2923519b07ec1a15.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This removes a bunch of assembly and adds some C code instead. It
changes the actual printouts on both 32-bit and 64-bit kernels, but
they still seem okay.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4085070316fc3ab29538d3fcfe282648d1d4ee2e.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
C is nicer than asm.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/dd068269f8d59fe44e9e43a50d0efd67da65c2b5.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
early_fixup_exception() is limited by the fact that it doesn't have a
real struct pt_regs. Change both the 32-bit and 64-bit asm and the
C code to pass and accept a real pt_regs.
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/e3fb680fcfd5e23e38237e8328b64a25cc121d37.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
... and integrate their functionality into their single user
fpu__exception_code().
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459837795-2588-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename it to fpu__init_check_bugs() and do the CPU feature check at
entry, thus getting rid of the old fpu__init_check_bugs() wrapper.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459837795-2588-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both if-branches are under if (boot_cpu_has(X86_FEATURE_APIC)), unify
them.
Also, simplify the test for bits:
- 17 ("ApicExtBrdCst: APIC extended broadcast enable") and
- 18 ("ApicExtId: APIC extended ID enable.")
in "D18F0x68 Link Transaction Control."
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459837795-2588-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
fpregs_{g,s}et() are not sizzling-hot paths to justify the need for
static_cpu_has(). Use the normal boot_cpu_has() helper.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459837795-2588-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use static_cpu_has() in the timing-sensitive paths in fpstate_init() and
fpu__copy().
While at it, simplify the use in init_cyrix() and get rid of the ternary
operator.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following BUG_ON() crash was reported on QEMU/i386:
kernel BUG at arch/x86/mm/physaddr.c:79!
Call Trace:
phys_mem_access_prot_allowed
mmap_mem
? mmap_region
mmap_region
do_mmap
vm_mmap_pgoff
SyS_mmap_pgoff
do_int80_syscall_32
entry_INT80_32
after commit:
edfe63ec97 ("x86/mtrr: Fix Xorg crashes in Qemu sessions")
PAT is now set to disabled state when MTRRs are disabled.
Thus, reactivating the __pa(high_memory) check in
phys_mem_access_prot_allowed().
When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(),
which in turn calls slow_virt_to_phys() for 'high_memory'.
Because 'high_memory' is set to (the max direct mapped virt
addr + 1), it is not a valid virtual address. Hence,
slow_virt_to_phys() returns 0 and hit the BUG_ON. Using
__pa_nodebug() instead of __pa() will fix this BUG_ON.
However, this code block, originally written for Pentiums and
earlier, is no longer adequate since a 32-bit Xen guest has
MTRRs disabled and supports ZONE_HIGHMEM. In this setup,
this code sets UC attribute for accessing RAM in high memory
range.
Delete this code block as it has been unused for a long time.
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1460403360-25441-1-git-send-email-toshi.kani@hpe.com
Link: https://lkml.org/lkml/2016/4/1/608
Signed-off-by: Ingo Molnar <mingo@kernel.org>
mce_start() has an explicit smp_wmb() to serialize writes to global_nwo
and mce_callin. However, atomic_inc_return() implies barriers on both
sides of the call, as such simply rely on this full SMP barrier.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1458602396-840-1-git-send-email-dave@stgolabs.net
Link: http://lkml.kernel.org/r/1459929916-12852-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
... to be the same like the file name of injection module itself to
avoid confusion when grepping.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1459929916-12852-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When we loop over all queued machine check error records to pass them
to the registered notifiers we use llist_for_each_entry(). But the loop
calls gen_pool_free() for the entry in the body of the loop - and then
the iterator looks at node->next after the free.
Use llist_for_each_entry_safe() instead.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/0205920@agluck-desk.sc.intel.com
Link: http://lkml.kernel.org/r/1459929916-12852-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At the moment, initialization path is using test_cpu_cap(&boot_cpu_data),
to detect PT, which is just open coding boot_cpu_has(). Use the latter
instead.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1459953307-14372-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
which uses the same fast path as __down_write() except it falls back to
call_rwsem_down_write_failed_killable() slow path and return -EINTR if
killed. To prevent from code duplication extract the skeleton of
__down_write() into a helper macro which just takes the semaphore
and the slow path function to be called.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-11-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.
This shouldn't introduce any functional change.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reported-by: David Smith <dsmith@redhat.com>
Tested-by: David Smith <dsmith@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160407204359.GA3720@lst.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Allowing user code to map the HPET is problematic. HPET
implementations are notoriously buggy, and there are probably many
machines on which even MMIO reads from bogus HPET addresses are
problematic.
We have a report that the Dell Precision M2800 with:
ACPI: HPET 0x00000000C8FE6238 000038 (v01 DELL CBX3 01072009 AMI. 00000005)
is either so slow when accessing the HPET or actually hangs in some
regard, causing soft lockups to be reported if users do unexpected
things to the HPET.
The vclock HPET code has also always been a questionable speedup.
Accessing an HPET is exceedingly slow (on the order of several
microseconds), so the added overhead in requiring a syscall to read
the HPET is a small fraction of the total code of accessing it.
To avoid future problems, let's just delete the code entirely.
In the long run, this could actually be a speedup. Waiman Long as a
patch to optimize the case where multiple CPUs contend for the HPET,
but that won't help unless all the accesses are mediated by the
kernel.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Waiman Long <waiman.long@hpe.com>
Link: http://lkml.kernel.org/r/d2f90bba98db9905041cff294646d290d378f67a.1460074438.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
... so that it doesn't appear in objdump output.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b9c532a0e5f8d56dede2bd59767d40024d5a75e2.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input
dependency on GS. Therefore, we need to MFENCE before it.
But that MFENCE is expensive and unnecessary on the remaining x86 CPUs
out there so patch it out on the CPUs which don't require it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aec6b2df1bfc56101d4e9e2e5d5d570bf41663c6.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It was in detect_nopl(), which was either a mistake by me or some kind
of mis-merge.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ff236456f072 ("x86/cpu: Move X86_BUG_ESPFIX initialization to generic_identify")
Link: http://lkml.kernel.org/r/0949337f13660461edca08ab67d1a841441289c9.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The old code was incomprehensible and was buggy on AMD CPUs.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5f6bde874c6fe6831c6711b5b1522a238ba035b4.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
AMD and Intel do different things when writing zero to a segment
selector. Since neither vendor documents the behavior well and it's
easy to test the behavior, try nulling fs to see what happens.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/61588ba0e0df35beafd363dc8b68a4c5878ef095.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>