Merge tag 'pr-20150114-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm
Pull x86/entry enhancements from Andy Lutomirski:
" This is my accumulated x86 entry work, part 1, for 3.20. The meat
of this is an IST rework. When an IST exception interrupts user
space, we will handle it on the per-thread kernel stack instead of
on the IST stack. This sounds messy, but it actually simplifies the
IST entry/exit code, because it eliminates some ugly games we used
to play in order to handle rescheduling, signal delivery, etc on the
way out of an IST exception.
The IST rework introduces proper context tracking to IST exception
handlers. I haven't seen any bug reports, but the old code could
have incorrectly treated an IST exception handler as an RCU extended
quiescent state.
The memory failure change (included in this pull request with
Borislav and Tony's permission) eliminates a bunch of code that
is no longer needed now that user memory failure handlers are
called in process context.
Finally, this includes a few of Denys' uncontroversial and Obviously
Correct (tm) cleanups.
The IST and memory failure changes have been in -next for a while.
LKML references:
IST rework:
http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net
Memory failure change:
http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com
Denys' cleanups:
http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com
"
This tree semantically depends on and is based on the following RCU commit:
734d168013
("rcu: Make rcu_nmi_enter() handle nesting")
... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 772a9aca12
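The core of the IST rework is visible throughout the diff below: every IST exception handler now brackets its work with ist_enter()/ist_exit(), which perform the context-tracking (or RCU-notification) steps the old code could skip. A minimal sketch of that handler shape, modeled on the p5/winchip machine-check handlers converted in this series (example_ist_handler is a made-up name):

    /* Sketch only: the handler shape this series converts IST vectors to. */
    static void example_ist_handler(struct pt_regs *regs, long error_code)
    {
        enum ctx_state prev_state;

        prev_state = ist_enter(regs);  /* exception_enter() or rcu_nmi_enter() */

        /* ... handle the exception; still atomic, no scheduling here ... */

        ist_exit(regs, prev_state);    /* undoes whatever ist_enter() set up */
    }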
@@ -78,9 +78,6 @@ The expensive (paranoid) way is to read back the MSR_GS_BASE value
 xorl %ebx,%ebx
 1: ret
 
-and the whole paranoid non-paranoid macro complexity is about whether
-to suffer that RDMSR cost.
-
 If we are at an interrupt or user-trap/gate-alike boundary then we can
 use the faster check: the stack will be a reliable indicator of
 whether SWAPGS was already done: if we see that we are a secondary
@@ -93,6 +90,15 @@ which might have triggered right after a normal entry wrote CS to the
 stack but before we executed SWAPGS, then the only safe way to check
 for GS is the slower method: the RDMSR.
 
-So we try only to mark those entry methods 'paranoid' that absolutely
-need the more expensive check for the GS base - and we generate all
-'normal' entry points with the regular (faster) entry macros.
+Therefore, super-atomic entries (except NMI, which is handled separately)
+must use idtentry with paranoid=1 to handle gsbase correctly. This
+triggers three main behavior changes:
+
+ - Interrupt entry will use the slower gsbase check.
+ - Interrupt entry from user mode will switch off the IST stack.
+ - Interrupt exit to kernel mode will not attempt to reschedule.
+
+We try to only use IST entries and the paranoid entry code for vectors
+that absolutely need the more expensive check for the GS base - and we
+generate all 'normal' entry points with the regular (faster) paranoid=0
+variant.
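Stated as C-like pseudocode, the rule the updated text describes looks roughly like this; need_swapgs() is a made-up helper for illustration only, and the real logic is the paranoid/non-paranoid entry macros in the assembly further down in this diff:

    /* Illustrative sketch only - not the actual entry code. */
    static bool need_swapgs(struct pt_regs *regs, bool paranoid)
    {
        if (!paranoid)
            return user_mode(regs);  /* fast check: CS pushed by hardware is reliable */

        /*
         * A paranoid entry may have interrupted the window right around
         * SWAPGS itself, so only reading MSR_GS_BASE back is trustworthy:
         * a non-negative (user) gsbase means SWAPGS has not happened yet.
         */
        return (s64)native_read_msr(MSR_GS_BASE) >= 0;
    }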
@@ -40,9 +40,11 @@ An IST is selected by a non-zero value in the IST field of an
 interrupt-gate descriptor. When an interrupt occurs and the hardware
 loads such a descriptor, the hardware automatically sets the new stack
 pointer based on the IST value, then invokes the interrupt handler. If
-software wants to allow nested IST interrupts then the handler must
-adjust the IST values on entry to and exit from the interrupt handler.
-(This is occasionally done, e.g. for debug exceptions.)
+the interrupt came from user mode, then the interrupt handler prologue
+will switch back to the per-thread stack. If software wants to allow
+nested IST interrupts then the handler must adjust the IST values on
+entry to and exit from the interrupt handler. (This is occasionally
+done, e.g. for debug exceptions.)
 
 Events with different IST codes (i.e. with different stacks) can be
 nested. For example, a debug interrupt can safely be interrupted by an
@@ -179,8 +179,8 @@ sysenter_dispatch:
 sysexit_from_sys_call:
     andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
     /* clear IF, that popfq doesn't enable interrupts early */
-    andl $~0x200,EFLAGS-R11(%rsp)
-    movl RIP-R11(%rsp),%edx /* User %eip */
+    andl $~0x200,EFLAGS-ARGOFFSET(%rsp)
+    movl RIP-ARGOFFSET(%rsp),%edx /* User %eip */
     CFI_REGISTER rip,rdx
     RESTORE_ARGS 0,24,0,0,0,0
     xorq %r8,%r8
@@ -83,7 +83,6 @@ For 32-bit we have the following conventions - kernel is built with
 #define SS 160
 
 #define ARGOFFSET R11
-#define SWFRAME ORIG_RAX
 
 .macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1, rax_enosys=0
     subq $9*8+\addskip, %rsp
@@ -190,7 +190,6 @@ enum mcp_flags {
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
 int mce_notify_irq(void);
-void mce_notify_process(void);
 
 DECLARE_PER_CPU(struct mce, injectm);
 
@@ -75,7 +75,6 @@ struct thread_info {
 #define TIF_SYSCALL_EMU 6 /* syscall emulation active */
 #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */
 #define TIF_SECCOMP 8 /* secure computing */
-#define TIF_MCE_NOTIFY 10 /* notify userspace of an MCE */
 #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
 #define TIF_UPROBE 12 /* breakpointed or singlestepping */
 #define TIF_NOTSC 16 /* TSC is not accessible in userland */
@@ -100,7 +99,6 @@ struct thread_info {
 #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP (1 << TIF_SECCOMP)
-#define _TIF_MCE_NOTIFY (1 << TIF_MCE_NOTIFY)
 #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE (1 << TIF_UPROBE)
 #define _TIF_NOTSC (1 << TIF_NOTSC)
@@ -140,7 +138,7 @@ struct thread_info {
 
 /* Only used for 64 bit */
 #define _TIF_DO_NOTIFY_MASK \
-    (_TIF_SIGPENDING | _TIF_MCE_NOTIFY | _TIF_NOTIFY_RESUME | \
+    (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | \
     _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE)
 
 /* flags to check in __switch_to() */
@@ -170,6 +168,17 @@ static inline struct thread_info *current_thread_info(void)
     return ti;
 }
 
+static inline unsigned long current_stack_pointer(void)
+{
+    unsigned long sp;
+#ifdef CONFIG_X86_64
+    asm("mov %%rsp,%0" : "=g" (sp));
+#else
+    asm("mov %%esp,%0" : "=g" (sp));
+#endif
+    return sp;
+}
+
 #else /* !__ASSEMBLY__ */
 
 /* how to get the thread information struct from ASM */
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include <linux/context_tracking_state.h>
 #include <linux/kprobes.h>
 
 #include <asm/debugreg.h>
@@ -110,6 +111,11 @@ asmlinkage void smp_thermal_interrupt(void);
 asmlinkage void mce_threshold_interrupt(void);
 #endif
 
+extern enum ctx_state ist_enter(struct pt_regs *regs);
+extern void ist_exit(struct pt_regs *regs, enum ctx_state prev_state);
+extern void ist_begin_non_atomic(struct pt_regs *regs);
+extern void ist_end_non_atomic(void);
+
 /* Interrupts/Exceptions */
 enum {
     X86_TRAP_DE = 0, /* 0, Divide-by-zero */
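The intended calling convention for these four helpers can be read off the handlers converted later in this diff (do_machine_check() in particular): stay atomic between ist_enter() and ist_exit(), and open a schedulable window only through ist_begin_non_atomic()/ist_end_non_atomic() when the exception interrupted user mode. A condensed sketch, with a hypothetical vector name:

    dotraplinkage void do_example_vector(struct pt_regs *regs, long error_code)
    {
        enum ctx_state prev_state = ist_enter(regs);

        /* ... atomic part of the handler ... */

        if (user_mode_vm(regs)) {
            ist_begin_non_atomic(regs);  /* only legal when we interrupted user mode */
            local_irq_enable();
            /* ... work that may sleep, e.g. memory_failure() ... */
            local_irq_disable();
            ist_end_non_atomic();
        }

        ist_exit(regs, prev_state);
    }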
@@ -43,6 +43,7 @@
 #include <linux/export.h>
 
 #include <asm/processor.h>
+#include <asm/traps.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
 
@@ -1002,51 +1003,6 @@ static void mce_clear_state(unsigned long *toclear)
     }
 }
 
-/*
- * Need to save faulting physical address associated with a process
- * in the machine check handler some place where we can grab it back
- * later in mce_notify_process()
- */
-#define MCE_INFO_MAX 16
-
-struct mce_info {
-    atomic_t inuse;
-    struct task_struct *t;
-    __u64 paddr;
-    int restartable;
-} mce_info[MCE_INFO_MAX];
-
-static void mce_save_info(__u64 addr, int c)
-{
-    struct mce_info *mi;
-
-    for (mi = mce_info; mi < &mce_info[MCE_INFO_MAX]; mi++) {
-        if (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) {
-            mi->t = current;
-            mi->paddr = addr;
-            mi->restartable = c;
-            return;
-        }
-    }
-
-    mce_panic("Too many concurrent recoverable errors", NULL, NULL);
-}
-
-static struct mce_info *mce_find_info(void)
-{
-    struct mce_info *mi;
-
-    for (mi = mce_info; mi < &mce_info[MCE_INFO_MAX]; mi++)
-        if (atomic_read(&mi->inuse) && mi->t == current)
-            return mi;
-    return NULL;
-}
-
-static void mce_clear_info(struct mce_info *mi)
-{
-    atomic_set(&mi->inuse, 0);
-}
-
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
@@ -1063,6 +1019,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 {
     struct mca_config *cfg = &mca_cfg;
     struct mce m, *final;
+    enum ctx_state prev_state;
     int i;
     int worst = 0;
     int severity;
@@ -1084,6 +1041,10 @@ void do_machine_check(struct pt_regs *regs, long error_code)
     DECLARE_BITMAP(toclear, MAX_NR_BANKS);
     DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
     char *msg = "Unknown";
+    u64 recover_paddr = ~0ull;
+    int flags = MF_ACTION_REQUIRED;
 
+    prev_state = ist_enter(regs);
+
     this_cpu_inc(mce_exception_count);
 
@@ -1203,9 +1164,9 @@ void do_machine_check(struct pt_regs *regs, long error_code)
     if (no_way_out)
         mce_panic("Fatal machine check on current CPU", &m, msg);
     if (worst == MCE_AR_SEVERITY) {
-        /* schedule action before return to userland */
-        mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV);
-        set_thread_flag(TIF_MCE_NOTIFY);
+        recover_paddr = m.addr;
+        if (!(m.mcgstatus & MCG_STATUS_RIPV))
+            flags |= MF_MUST_KILL;
     } else if (kill_it) {
         force_sig(SIGBUS, current);
     }
@@ -1216,6 +1177,27 @@ void do_machine_check(struct pt_regs *regs, long error_code)
     mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 out:
     sync_core();
+
+    if (recover_paddr == ~0ull)
+        goto done;
+
+    pr_err("Uncorrected hardware memory error in user-access at %llx",
+        recover_paddr);
+    /*
+     * We must call memory_failure() here even if the current process is
+     * doomed. We still need to mark the page as poisoned and alert any
+     * other users of the page.
+     */
+    ist_begin_non_atomic(regs);
+    local_irq_enable();
+    if (memory_failure(recover_paddr >> PAGE_SHIFT, MCE_VECTOR, flags) < 0) {
+        pr_err("Memory error not recovered");
+        force_sig(SIGBUS, current);
+    }
+    local_irq_disable();
+    ist_end_non_atomic();
+done:
+    ist_exit(regs, prev_state);
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
 
@@ -1232,42 +1214,6 @@ int memory_failure(unsigned long pfn, int vector, int flags)
 }
 #endif
 
-/*
- * Called in process context that interrupted by MCE and marked with
- * TIF_MCE_NOTIFY, just before returning to erroneous userland.
- * This code is allowed to sleep.
- * Attempt possible recovery such as calling the high level VM handler to
- * process any corrupted pages, and kill/signal current process if required.
- * Action required errors are handled here.
- */
-void mce_notify_process(void)
-{
-    unsigned long pfn;
-    struct mce_info *mi = mce_find_info();
-    int flags = MF_ACTION_REQUIRED;
-
-    if (!mi)
-        mce_panic("Lost physical address for unconsumed uncorrectable error", NULL, NULL);
-    pfn = mi->paddr >> PAGE_SHIFT;
-
-    clear_thread_flag(TIF_MCE_NOTIFY);
-
-    pr_err("Uncorrected hardware memory error in user-access at %llx",
-        mi->paddr);
-    /*
-     * We must call memory_failure() here even if the current process is
-     * doomed. We still need to mark the page as poisoned and alert any
-     * other users of the page.
-     */
-    if (!mi->restartable)
-        flags |= MF_MUST_KILL;
-    if (memory_failure(pfn, MCE_VECTOR, flags) < 0) {
-        pr_err("Memory error not recovered");
-        force_sig(SIGBUS, current);
-    }
-    mce_clear_info(mi);
-}
-
 /*
  * Action optional processing happens here (picking up
  * from the list of faulting pages that do_machine_check()
@@ -8,6 +8,7 @@
 #include <linux/smp.h>
 
 #include <asm/processor.h>
+#include <asm/traps.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
 
@@ -17,8 +18,11 @@ int mce_p5_enabled __read_mostly;
 /* Machine check handler for Pentium class Intel CPUs: */
 static void pentium_machine_check(struct pt_regs *regs, long error_code)
 {
+    enum ctx_state prev_state;
     u32 loaddr, hi, lotype;
 
+    prev_state = ist_enter(regs);
+
     rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
     rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
 
@@ -33,6 +37,8 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
     }
 
     add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+
+    ist_exit(regs, prev_state);
 }
 
 /* Set up machine check reporting for processors with Intel style MCE: */
@@ -7,14 +7,19 @@
 #include <linux/types.h>
 
 #include <asm/processor.h>
+#include <asm/traps.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
 
 /* Machine check handler for WinChip C6: */
 static void winchip_machine_check(struct pt_regs *regs, long error_code)
 {
+    enum ctx_state prev_state = ist_enter(regs);
+
     printk(KERN_EMERG "CPU0: Machine Check Exception.\n");
     add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
+    ist_exit(regs, prev_state);
 }
 
 /* Set up machine check reporting on the Winchip C6 series */
@@ -156,27 +156,6 @@ ENDPROC(native_usergs_sysret64)
     movq \tmp,R11+\offset(%rsp)
     .endm
 
-    .macro FAKE_STACK_FRAME child_rip
-    /* push in order ss, rsp, eflags, cs, rip */
-    xorl %eax, %eax
-    pushq_cfi $__KERNEL_DS /* ss */
-    /*CFI_REL_OFFSET ss,0*/
-    pushq_cfi %rax /* rsp */
-    CFI_REL_OFFSET rsp,0
-    pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED) /* eflags - interrupts on */
-    /*CFI_REL_OFFSET rflags,0*/
-    pushq_cfi $__KERNEL_CS /* cs */
-    /*CFI_REL_OFFSET cs,0*/
-    pushq_cfi \child_rip /* rip */
-    CFI_REL_OFFSET rip,0
-    pushq_cfi %rax /* orig rax */
-    .endm
-
-    .macro UNFAKE_STACK_FRAME
-    addq $8*6, %rsp
-    CFI_ADJUST_CFA_OFFSET -(6*8)
-    .endm
-
 /*
  * initial frame state for interrupts (and exceptions without error code)
  */
@@ -239,51 +218,6 @@ ENDPROC(native_usergs_sysret64)
     CFI_REL_OFFSET r15, R15+\offset
     .endm
 
-/* save partial stack frame */
-    .macro SAVE_ARGS_IRQ
-    cld
-    /* start from rbp in pt_regs and jump over */
-    movq_cfi rdi, (RDI-RBP)
-    movq_cfi rsi, (RSI-RBP)
-    movq_cfi rdx, (RDX-RBP)
-    movq_cfi rcx, (RCX-RBP)
-    movq_cfi rax, (RAX-RBP)
-    movq_cfi r8, (R8-RBP)
-    movq_cfi r9, (R9-RBP)
-    movq_cfi r10, (R10-RBP)
-    movq_cfi r11, (R11-RBP)
-
-    /* Save rbp so that we can unwind from get_irq_regs() */
-    movq_cfi rbp, 0
-
-    /* Save previous stack value */
-    movq %rsp, %rsi
-
-    leaq -RBP(%rsp),%rdi /* arg1 for handler */
-    testl $3, CS-RBP(%rsi)
-    je 1f
-    SWAPGS
-    /*
-     * irq_count is used to check if a CPU is already on an interrupt stack
-     * or not. While this is essentially redundant with preempt_count it is
-     * a little cheaper to use a separate counter in the PDA (short of
-     * moving irq_enter into assembly, which would be too much work)
-     */
-1:  incl PER_CPU_VAR(irq_count)
-    cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp
-    CFI_DEF_CFA_REGISTER rsi
-
-    /* Store previous stack value */
-    pushq %rsi
-    CFI_ESCAPE 0x0f /* DW_CFA_def_cfa_expression */, 6, \
-        0x77 /* DW_OP_breg7 */, 0, \
-        0x06 /* DW_OP_deref */, \
-        0x08 /* DW_OP_const1u */, SS+8-RBP, \
-        0x22 /* DW_OP_plus */
-    /* We entered an interrupt context - irqs are off: */
-    TRACE_IRQS_OFF
-    .endm
-
 ENTRY(save_paranoid)
     XCPT_FRAME 1 RDI+8
     cld
@@ -627,19 +561,6 @@ END(\label)
     FORK_LIKE vfork
     FIXED_FRAME stub_iopl, sys_iopl
 
-ENTRY(ptregscall_common)
-    DEFAULT_FRAME 1 8 /* offset 8: return address */
-    RESTORE_TOP_OF_STACK %r11, 8
-    movq_cfi_restore R15+8, r15
-    movq_cfi_restore R14+8, r14
-    movq_cfi_restore R13+8, r13
-    movq_cfi_restore R12+8, r12
-    movq_cfi_restore RBP+8, rbp
-    movq_cfi_restore RBX+8, rbx
-    ret $REST_SKIP /* pop extended registers */
-    CFI_ENDPROC
-END(ptregscall_common)
-
 ENTRY(stub_execve)
     CFI_STARTPROC
     addq $8, %rsp
@@ -780,7 +701,48 @@ END(interrupt)
     /* reserve pt_regs for scratch regs and rbp */
     subq $ORIG_RAX-RBP, %rsp
     CFI_ADJUST_CFA_OFFSET ORIG_RAX-RBP
-    SAVE_ARGS_IRQ
+    cld
+    /* start from rbp in pt_regs and jump over */
+    movq_cfi rdi, (RDI-RBP)
+    movq_cfi rsi, (RSI-RBP)
+    movq_cfi rdx, (RDX-RBP)
+    movq_cfi rcx, (RCX-RBP)
+    movq_cfi rax, (RAX-RBP)
+    movq_cfi r8, (R8-RBP)
+    movq_cfi r9, (R9-RBP)
+    movq_cfi r10, (R10-RBP)
+    movq_cfi r11, (R11-RBP)
+
+    /* Save rbp so that we can unwind from get_irq_regs() */
+    movq_cfi rbp, 0
+
+    /* Save previous stack value */
+    movq %rsp, %rsi
+
+    leaq -RBP(%rsp),%rdi /* arg1 for handler */
+    testl $3, CS-RBP(%rsi)
+    je 1f
+    SWAPGS
+    /*
+     * irq_count is used to check if a CPU is already on an interrupt stack
+     * or not. While this is essentially redundant with preempt_count it is
+     * a little cheaper to use a separate counter in the PDA (short of
+     * moving irq_enter into assembly, which would be too much work)
+     */
+1:  incl PER_CPU_VAR(irq_count)
+    cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp
+    CFI_DEF_CFA_REGISTER rsi
+
+    /* Store previous stack value */
+    pushq %rsi
+    CFI_ESCAPE 0x0f /* DW_CFA_def_cfa_expression */, 6, \
+        0x77 /* DW_OP_breg7 */, 0, \
+        0x06 /* DW_OP_deref */, \
+        0x08 /* DW_OP_const1u */, SS+8-RBP, \
+        0x22 /* DW_OP_plus */
+    /* We entered an interrupt context - irqs are off: */
+    TRACE_IRQS_OFF
+
     call \func
     .endm
 
@@ -1049,6 +1011,11 @@ ENTRY(\sym)
     CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
 
     .if \paranoid
+    .if \paranoid == 1
+    CFI_REMEMBER_STATE
+    testl $3, CS(%rsp) /* If coming from userspace, switch */
+    jnz 1f /* stacks. */
+    .endif
     call save_paranoid
     .else
     call error_entry
@@ -1089,6 +1056,36 @@ ENTRY(\sym)
     jmp error_exit /* %ebx: no swapgs flag */
     .endif
 
+    .if \paranoid == 1
+    CFI_RESTORE_STATE
+    /*
+     * Paranoid entry from userspace. Switch stacks and treat it
+     * as a normal entry. This means that paranoid handlers
+     * run in real process context if user_mode(regs).
+     */
+1:
+    call error_entry
+
+    DEFAULT_FRAME 0
+
+    movq %rsp,%rdi /* pt_regs pointer */
+    call sync_regs
+    movq %rax,%rsp /* switch stack */
+
+    movq %rsp,%rdi /* pt_regs pointer */
+
+    .if \has_error_code
+    movq ORIG_RAX(%rsp),%rsi /* get error code */
+    movq $-1,ORIG_RAX(%rsp) /* no syscall to restart */
+    .else
+    xorl %esi,%esi /* no error code */
+    .endif
+
+    call \do_sym
+
+    jmp error_exit /* %ebx: no swapgs flag */
+    .endif
+
     CFI_ENDPROC
 END(\sym)
 .endm
@@ -1109,7 +1106,7 @@ idtentry overflow do_overflow has_error_code=0
 idtentry bounds do_bounds has_error_code=0
 idtentry invalid_op do_invalid_op has_error_code=0
 idtentry device_not_available do_device_not_available has_error_code=0
-idtentry double_fault do_double_fault has_error_code=1 paranoid=1
+idtentry double_fault do_double_fault has_error_code=1 paranoid=2
 idtentry coprocessor_segment_overrun do_coprocessor_segment_overrun has_error_code=0
 idtentry invalid_TSS do_invalid_TSS has_error_code=1
 idtentry segment_not_present do_segment_not_present has_error_code=1
@@ -1290,16 +1287,14 @@ idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(
 #endif
 
 /*
- * "Paranoid" exit path from exception stack.
- * Paranoid because this is used by NMIs and cannot take
- * any kernel state for granted.
- * We don't do kernel preemption checks here, because only
- * NMI should be common and it does not enable IRQs and
- * cannot get reschedule ticks.
+ * "Paranoid" exit path from exception stack. This is invoked
+ * only on return from non-NMI IST interrupts that came
+ * from kernel space.
  *
- * "trace" is 0 for the NMI handler only, because irq-tracing
- * is fundamentally NMI-unsafe. (we cannot change the soft and
- * hard flags at once, atomically)
+ * We may be returning to very strange contexts (e.g. very early
+ * in syscall entry), so checking for preemption here would
+ * be complicated. Fortunately, we there's no good reason
+ * to try to handle preemption here.
  */
 
 /* ebx: no swapgs flag */
@@ -1309,43 +1304,14 @@ ENTRY(paranoid_exit)
     TRACE_IRQS_OFF_DEBUG
    testl %ebx,%ebx /* swapgs needed? */
     jnz paranoid_restore
-    testl $3,CS(%rsp)
-    jnz paranoid_userspace
-paranoid_swapgs:
     TRACE_IRQS_IRETQ 0
     SWAPGS_UNSAFE_STACK
     RESTORE_ALL 8
-    jmp irq_return
+    INTERRUPT_RETURN
 paranoid_restore:
     TRACE_IRQS_IRETQ_DEBUG 0
     RESTORE_ALL 8
-    jmp irq_return
-paranoid_userspace:
-    GET_THREAD_INFO(%rcx)
-    movl TI_flags(%rcx),%ebx
-    andl $_TIF_WORK_MASK,%ebx
-    jz paranoid_swapgs
-    movq %rsp,%rdi /* &pt_regs */
-    call sync_regs
-    movq %rax,%rsp /* switch stack for scheduling */
-    testl $_TIF_NEED_RESCHED,%ebx
-    jnz paranoid_schedule
-    movl %ebx,%edx /* arg3: thread flags */
-    TRACE_IRQS_ON
-    ENABLE_INTERRUPTS(CLBR_NONE)
-    xorl %esi,%esi /* arg2: oldset */
-    movq %rsp,%rdi /* arg1: &pt_regs */
-    call do_notify_resume
-    DISABLE_INTERRUPTS(CLBR_NONE)
-    TRACE_IRQS_OFF
-    jmp paranoid_userspace
-paranoid_schedule:
-    TRACE_IRQS_ON
-    ENABLE_INTERRUPTS(CLBR_ANY)
-    SCHEDULE_USER
-    DISABLE_INTERRUPTS(CLBR_ANY)
-    TRACE_IRQS_OFF
-    jmp paranoid_userspace
+    INTERRUPT_RETURN
     CFI_ENDPROC
 END(paranoid_exit)
 
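Restated in C-like pseudocode, the new paranoid=1 entry path behaves roughly as follows (illustrative only; the real control flow is the idtentry macro and sync_regs() shown in this diff):

    /* Illustrative pseudocode for an idtentry with paranoid=1. */
    if (user_mode(regs)) {               /* testl $3, CS(%rsp) */
        regs = sync_regs(regs);          /* copy the frame to the per-thread stack */
        /*
         * From here on this is an ordinary exception: normal context
         * tracking, and rescheduling/signal delivery on the way out
         * through error_exit.
         */
        do_sym(regs, error_code);
    } else {
        /*
         * The kernel was interrupted: stay on the IST stack and leave
         * through paranoid_exit, which never tries to reschedule.
         */
        do_sym(regs, error_code);
    }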
@@ -69,16 +69,9 @@ static void call_on_stack(void *func, void *stack)
         : "memory", "cc", "edx", "ecx", "eax");
 }
 
-/* how to get the current stack pointer from C */
-#define current_stack_pointer ({ \
-    unsigned long sp; \
-    asm("mov %%esp,%0" : "=g" (sp)); \
-    sp; \
-})
-
 static inline void *current_stack(void)
 {
-    return (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
+    return (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
 }
 
 static inline int
@@ -103,7 +96,7 @@ execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq)
 
     /* Save the next esp at the bottom of the stack */
     prev_esp = (u32 *)irqstk;
-    *prev_esp = current_stack_pointer;
+    *prev_esp = current_stack_pointer();
 
     if (unlikely(overflow))
         call_on_stack(print_stack_overflow, isp);
@@ -156,7 +149,7 @@ void do_softirq_own_stack(void)
 
     /* Push the previous esp onto the stack */
     prev_esp = (u32 *)irqstk;
-    *prev_esp = current_stack_pointer;
+    *prev_esp = current_stack_pointer();
 
     call_on_stack(__do_softirq, isp);
 }
@@ -740,12 +740,6 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 {
     user_exit();
 
-#ifdef CONFIG_X86_MCE
-    /* notify userspace of pending MCEs */
-    if (thread_info_flags & _TIF_MCE_NOTIFY)
-        mce_notify_process();
-#endif /* CONFIG_X86_64 && CONFIG_X86_MCE */
-
     if (thread_info_flags & _TIF_UPROBE)
         uprobe_notify_resume(regs);
 
@@ -108,6 +108,77 @@ static inline void preempt_conditional_cli(struct pt_regs *regs)
     preempt_count_dec();
 }
 
+enum ctx_state ist_enter(struct pt_regs *regs)
+{
+    /*
+     * We are atomic because we're on the IST stack (or we're on x86_32,
+     * in which case we still shouldn't schedule.
+     */
+    preempt_count_add(HARDIRQ_OFFSET);
+
+    if (user_mode_vm(regs)) {
+        /* Other than that, we're just an exception. */
+        return exception_enter();
+    } else {
+        /*
+         * We might have interrupted pretty much anything. In
+         * fact, if we're a machine check, we can even interrupt
+         * NMI processing. We don't want in_nmi() to return true,
+         * but we need to notify RCU.
+         */
+        rcu_nmi_enter();
+        return IN_KERNEL; /* the value is irrelevant. */
+    }
+}
+
+void ist_exit(struct pt_regs *regs, enum ctx_state prev_state)
+{
+    preempt_count_sub(HARDIRQ_OFFSET);
+
+    if (user_mode_vm(regs))
+        return exception_exit(prev_state);
+    else
+        rcu_nmi_exit();
+}
+
+/**
+ * ist_begin_non_atomic() - begin a non-atomic section in an IST exception
+ * @regs: regs passed to the IST exception handler
+ *
+ * IST exception handlers normally cannot schedule. As a special
+ * exception, if the exception interrupted userspace code (i.e.
+ * user_mode_vm(regs) would return true) and the exception was not
+ * a double fault, it can be safe to schedule. ist_begin_non_atomic()
+ * begins a non-atomic section within an ist_enter()/ist_exit() region.
+ * Callers are responsible for enabling interrupts themselves inside
+ * the non-atomic section, and callers must call is_end_non_atomic()
+ * before ist_exit().
+ */
+void ist_begin_non_atomic(struct pt_regs *regs)
+{
+    BUG_ON(!user_mode_vm(regs));
+
+    /*
+     * Sanity check: we need to be on the normal thread stack. This
+     * will catch asm bugs and any attempt to use ist_preempt_enable
+     * from double_fault.
+     */
+    BUG_ON(((current_stack_pointer() ^ this_cpu_read_stable(kernel_stack))
+        & ~(THREAD_SIZE - 1)) != 0);
+
+    preempt_count_sub(HARDIRQ_OFFSET);
+}
+
+/**
+ * ist_end_non_atomic() - begin a non-atomic section in an IST exception
+ *
+ * Ends a non-atomic section started with ist_begin_non_atomic().
+ */
+void ist_end_non_atomic(void)
+{
+    preempt_count_add(HARDIRQ_OFFSET);
+}
+
 static nokprobe_inline int
 do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
         struct pt_regs *regs, long error_code)
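The second BUG_ON() in ist_begin_non_atomic() above checks that we are on the per-thread stack by XOR-ing the current stack pointer with the per-CPU kernel_stack value and masking off the low THREAD_SIZE bits; the result is zero exactly when both addresses fall in the same THREAD_SIZE-aligned block. A stand-alone toy illustration of that arithmetic (the addresses and the 16 KiB THREAD_SIZE are invented, and this assumes a 64-bit build):

    #include <stdio.h>

    int main(void)
    {
        unsigned long thread_size  = 16384;  /* assumed THREAD_SIZE */
        unsigned long kernel_stack = 0xffff880012340000UL + thread_size - 64;
        unsigned long sp_same      = 0xffff880012342f00UL;  /* same 16K block */
        unsigned long sp_other     = 0xffff880077770e00UL;  /* different block */

        /* 0 -> same thread stack, ist_begin_non_atomic() would be happy */
        printf("%lu\n", (sp_same ^ kernel_stack) & ~(thread_size - 1));
        /* non-zero -> not on the thread stack, the BUG_ON() would fire */
        printf("%#lx\n", (sp_other ^ kernel_stack) & ~(thread_size - 1));
        return 0;
    }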
@@ -251,6 +322,8 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
      * end up promoting it to a doublefault. In that case, modify
      * the stack to make it look like we just entered the #GP
      * handler from user space, similar to bad_iret.
+     *
+     * No need for ist_enter here because we don't use RCU.
      */
     if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
         regs->cs == __KERNEL_CS &&
@@ -263,12 +336,12 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
         normal_regs->orig_ax = 0; /* Missing (lost) #GP error code */
         regs->ip = (unsigned long)general_protection;
         regs->sp = (unsigned long)&normal_regs->orig_ax;
 
         return;
     }
 #endif
 
-    exception_enter();
-    /* Return not checked because double check cannot be ignored */
+    ist_enter(regs); /* Discard prev_state because we won't return. */
     notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
 
     tsk->thread.error_code = error_code;
@@ -434,7 +507,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
     if (poke_int3_handler(regs))
         return;
 
-    prev_state = exception_enter();
+    prev_state = ist_enter(regs);
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
     if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
             SIGTRAP) == NOTIFY_STOP)
@@ -460,33 +533,20 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
     preempt_conditional_cli(regs);
     debug_stack_usage_dec();
 exit:
-    exception_exit(prev_state);
+    ist_exit(regs, prev_state);
 }
 NOKPROBE_SYMBOL(do_int3);
 
 #ifdef CONFIG_X86_64
 /*
- * Help handler running on IST stack to switch back to user stack
- * for scheduling or signal handling. The actual stack switch is done in
- * entry.S
+ * Help handler running on IST stack to switch off the IST stack if the
+ * interrupted code was in user mode. The actual stack switch is done in
+ * entry_64.S
  */
 asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
 {
-    struct pt_regs *regs = eregs;
-    /* Did already sync */
-    if (eregs == (struct pt_regs *)eregs->sp)
-        ;
-    /* Exception from user space */
-    else if (user_mode(eregs))
-        regs = task_pt_regs(current);
-    /*
-     * Exception from kernel and interrupts are enabled. Move to
-     * kernel process stack.
-     */
-    else if (eregs->flags & X86_EFLAGS_IF)
-        regs = (struct pt_regs *)(eregs->sp -= sizeof(struct pt_regs));
-    if (eregs != regs)
-        *regs = *eregs;
+    struct pt_regs *regs = task_pt_regs(current);
+    *regs = *eregs;
     return regs;
 }
 NOKPROBE_SYMBOL(sync_regs);
@@ -554,7 +614,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
     unsigned long dr6;
     int si_code;
 
-    prev_state = exception_enter();
+    prev_state = ist_enter(regs);
 
     get_debugreg(dr6, 6);
 
@@ -629,7 +689,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
     debug_stack_usage_dec();
 
 exit:
-    exception_exit(prev_state);
+    ist_exit(regs, prev_state);
 }
 NOKPROBE_SYMBOL(do_debug);
 
@@ -759,39 +759,71 @@ void rcu_irq_enter(void)
 /**
  * rcu_nmi_enter - inform RCU of entry to NMI context
  *
- * If the CPU was idle with dynamic ticks active, and there is no
- * irq handler running, this updates rdtp->dynticks_nmi to let the
- * RCU grace-period handling know that the CPU is active.
+ * If the CPU was idle from RCU's viewpoint, update rdtp->dynticks and
+ * rdtp->dynticks_nmi_nesting to let the RCU grace-period handling know
+ * that the CPU is active. This implementation permits nested NMIs, as
+ * long as the nesting level does not overflow an int. (You will probably
+ * run out of stack space first.)
  */
 void rcu_nmi_enter(void)
 {
     struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+    int incby = 2;
 
-    if (rdtp->dynticks_nmi_nesting == 0 &&
-        (atomic_read(&rdtp->dynticks) & 0x1))
-        return;
-    rdtp->dynticks_nmi_nesting++;
-    smp_mb__before_atomic(); /* Force delay from prior write. */
-    atomic_inc(&rdtp->dynticks);
-    /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
-    smp_mb__after_atomic(); /* See above. */
-    WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
+    /* Complain about underflow. */
+    WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 0);
+
+    /*
+     * If idle from RCU viewpoint, atomically increment ->dynticks
+     * to mark non-idle and increment ->dynticks_nmi_nesting by one.
+     * Otherwise, increment ->dynticks_nmi_nesting by two. This means
+     * if ->dynticks_nmi_nesting is equal to one, we are guaranteed
+     * to be in the outermost NMI handler that interrupted an RCU-idle
+     * period (observation due to Andy Lutomirski).
+     */
+    if (!(atomic_read(&rdtp->dynticks) & 0x1)) {
+        smp_mb__before_atomic(); /* Force delay from prior write. */
+        atomic_inc(&rdtp->dynticks);
+        /* atomic_inc() before later RCU read-side crit sects */
+        smp_mb__after_atomic(); /* See above. */
+        WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
+        incby = 1;
+    }
+    rdtp->dynticks_nmi_nesting += incby;
+    barrier();
 }
 
 /**
  * rcu_nmi_exit - inform RCU of exit from NMI context
  *
- * If the CPU was idle with dynamic ticks active, and there is no
- * irq handler running, this updates rdtp->dynticks_nmi to let the
- * RCU grace-period handling know that the CPU is no longer active.
+ * If we are returning from the outermost NMI handler that interrupted an
+ * RCU-idle period, update rdtp->dynticks and rdtp->dynticks_nmi_nesting
+ * to let the RCU grace-period handling know that the CPU is back to
+ * being RCU-idle.
 */
 void rcu_nmi_exit(void)
 {
     struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
-    if (rdtp->dynticks_nmi_nesting == 0 ||
-        --rdtp->dynticks_nmi_nesting != 0)
+    /*
+     * Check for ->dynticks_nmi_nesting underflow and bad ->dynticks.
+     * (We are exiting an NMI handler, so RCU better be paying attention
+     * to us!)
+     */
+    WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0);
+    WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
+
+    /*
+     * If the nesting level is not 1, the CPU wasn't RCU-idle, so
+     * leave it in non-RCU-idle state.
+     */
+    if (rdtp->dynticks_nmi_nesting != 1) {
+        rdtp->dynticks_nmi_nesting -= 2;
         return;
+    }
+
+    /* This NMI interrupted an RCU-idle CPU, restore RCU-idleness. */
+    rdtp->dynticks_nmi_nesting = 0;
     /* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
     smp_mb__before_atomic(); /* See above. */
     atomic_inc(&rdtp->dynticks);
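The point of the incby 1-vs-2 scheme above is that a ->dynticks_nmi_nesting value of exactly 1 uniquely marks the outermost NMI-style handler that interrupted an RCU-idle period. A toy user-space rendition of the counting rule (not kernel code) walking through one nested sequence:

    #include <stdio.h>

    static int dynticks_odd; /* stands in for atomic_read(&rdtp->dynticks) & 0x1 */
    static long nesting;     /* stands in for rdtp->dynticks_nmi_nesting */

    static void toy_nmi_enter(void)
    {
        if (!dynticks_odd) {     /* CPU was RCU-idle: mark non-idle, count by one */
            dynticks_odd = 1;
            nesting += 1;
        } else {                 /* nested handler: leave dynticks alone, count by two */
            nesting += 2;
        }
    }

    static void toy_nmi_exit(void)
    {
        if (nesting == 1) {      /* outermost handler over an RCU-idle period */
            nesting = 0;
            dynticks_odd = 0;
        } else {
            nesting -= 2;
        }
    }

    int main(void)
    {
        toy_nmi_enter(); printf("outer enter:  nesting=%ld\n", nesting); /* 1 */
        toy_nmi_enter(); printf("nested enter: nesting=%ld\n", nesting); /* 3 */
        toy_nmi_exit();  printf("nested exit:  nesting=%ld\n", nesting); /* 1 */
        toy_nmi_exit();  printf("outer exit:   nesting=%ld, idle=%d\n",
                                nesting, !dynticks_odd);                 /* 0, 1 */
        return 0;
    }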