The decision whether or not to exit from L2 to L1 on an lmsw instruction is
based on bogus values: instead of using the information encoded within the
exit qualification, it uses the data also used for the mov-to-cr
instruction, which boils down to using whatever is in %eax at that point.
Use the correct values instead.
Without this fix, an L1 may not get notified when a 32-bit Linux L2
switches its secondary CPUs to protected mode; the L1 is only notified on
the next modification of CR0. This short time window poses a problem, when
there is some other reason to exit to L1 in between. Then, L2 will be
resumed in real mode and chaos ensues.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Preemption can occur during cancel preemption timer, and there will be
inconsistent status in lapic, vmx and vmcs field.
CPU0 CPU1
preemption timer vmexit
handle_preemption_timer(vCPU0)
kvm_lapic_expired_hv_timer
vmx_cancel_hv_timer
vmx->hv_deadline_tsc = -1
vmcs_clear_bits
/* hv_timer_in_use still true */
sched_out
sched_in
kvm_arch_vcpu_load
vmx_set_hv_timer
write vmx->hv_deadline_tsc
vmcs_set_bits
/* back in kvm_lapic_expired_hv_timer */
hv_timer_in_use = false
...
vmx_vcpu_run
vmx_arm_hv_run
write preemption timer deadline
spurious preemption timer vmexit
handle_preemption_timer(vCPU0)
kvm_lapic_expired_hv_timer
WARN_ON(!apic->lapic_timer.hv_timer_in_use);
This can be reproduced sporadically during boot of L2 on a
preemptible L1, causing a splat on L1.
WARNING: CPU: 3 PID: 1952 at arch/x86/kvm/lapic.c:1529 kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
CPU: 3 PID: 1952 Comm: qemu-system-x86 Not tainted 4.12.0-rc1+ #24 RIP: 0010:kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
Call Trace:
handle_preemption_timer+0xe/0x20 [kvm_intel]
vmx_handle_exit+0xc9/0x15f0 [kvm_intel]
? lock_acquire+0xdb/0x250
? lock_acquire+0xdb/0x250
? kvm_arch_vcpu_ioctl_run+0xdf3/0x1ce0 [kvm]
kvm_arch_vcpu_ioctl_run+0xe55/0x1ce0 [kvm]
kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
? __fget+0xf3/0x210
do_vfs_ioctl+0xa4/0x700
? __fget+0x114/0x210
SyS_ioctl+0x79/0x90
do_syscall_64+0x8f/0x750
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL64_slow_path+0x25/0x25
This patch fixes it by disabling preemption while cancelling
preemption timer. This way cancel_hv_timer is atomic with
respect to kvm_arch_vcpu_load.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This ensures that adjustments to x86_platform done by the hypervisor
setup is already respected by this simple calibration.
The current user of this, introduced by 1b5aeebf3a ("x86/earlyprintk:
Add support for earlyprintk via USB3 debug port"), comes much later
into play.
Fixes: dd759d93f4 ("x86/timers: Add simple udelay calibration")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: http://lkml.kernel.org/r/5e89fe60-aab3-2c1c-aba8-32f8ad376189@siemens.com
In the current form of the code, if a->replacementlen is 0, the reference
to *insnbuf for comparison touches potentially garbage memory. While it
doesn't affect the execution flow due to the subsequent a->replacementlen
comparison, it is (rightly) detected as use of uninitialized memory by a
runtime instrumentation currently under my development, and could be
detected as such by other tools in the future, too (e.g. KMSAN).
Fix the "false-positive" by reordering the conditions to first check the
replacement instruction length before referencing specific opcode bytes.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20170524135500.27223-1-mjurczyk@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In the file arch/x86/mm/pat.c, there's a '__pat_enabled' variable. The
variable is set to 1 by default and the function pat_init() sets
__pat_enabled to 0 if the CPU doesn't support PAT.
However, on AMD K6-3 CPUs, the processor initialization code never calls
pat_init() and so __pat_enabled stays 1 and the function pat_enabled()
returns true, even though the K6-3 CPU doesn't support PAT.
The result of this bug is that a kernel warning is produced when attempting to
start the Xserver and the Xserver doesn't start (fork() returns ENOMEM).
Another symptom of this bug is that the framebuffer driver doesn't set the
K6-3 MTRR registers:
x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
------------[ cut here ]------------
WARNING: CPU: 0 PID: 3891 at arch/x86/mm/pat.c:1020 untrack_pfn+0x5c/0x9f
...
x86/PAT: Xorg:3891 map pfn expected mapping type uncached-minus for [mem 0xe4000000-0xe5ffffff], got write-combining
To fix the bug change pat_enabled() so that it returns true only if PAT
initialization was actually done.
Also, I changed boot_cpu_has(X86_FEATURE_PAT) to
this_cpu_has(X86_FEATURE_PAT) in pat_ap_init(), so that we check the PAT
feature on the processor that is being initialized.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1704181501450.26399@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At least Make 3.82 dislikes the tab in front of the $(warning) function:
arch/x86/Makefile:162: *** recipe commences before first target. Stop.
Let's be gentle.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1944fcd8-e3df-d1f7-c0e4-60aeb1917a24@siemens.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Jones and Steven Rostedt reported unwinder warnings like the
following:
WARNING: kernel stack frame pointer at ffff8800bda0ff30 in sshd:1090 has bad value 000055b32abf1fa8
In both cases, the unwinder was attempting to unwind from an ftrace
handler into entry code. The callchain was something like:
syscall entry code
C function
ftrace handler
save_stack_trace()
The problem is that the unwinder's end-of-stack logic gets confused by
the way ftrace lays out the stack frame (with fentry enabled).
I was able to recreate this warning with:
echo call_usermodehelper_exec_async:stacktrace > /sys/kernel/debug/tracing/set_ftrace_filter
(exit login session)
I considered fixing this by changing the ftrace code to rewrite the
stack to make the unwinder happy. But that seemed too intrusive after I
implemented it. Instead, just add another check to the unwinder's
end-of-stack logic to detect this special case.
Side note: We could probably get rid of these end-of-stack checks by
encoding the frame pointer for syscall entry just like we do for
interrupt entry. That would be simpler, but it would also be a lot more
intrusive since it would slightly affect the performance of every
syscall.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: live-patching@vger.kernel.org
Fixes: c32c47c68a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/671ba22fbc0156b8f7e0cfa5ab2a795e08bc37e1.1495553739.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Petr Mladek reported the following warning when loading the livepatch
sample module:
WARNING: CPU: 1 PID: 3699 at arch/x86/kernel/stacktrace.c:132 save_stack_trace_tsk_reliable+0x133/0x1a0
...
Call Trace:
__schedule+0x273/0x820
schedule+0x36/0x80
kthreadd+0x305/0x310
? kthread_create_on_cpu+0x80/0x80
? icmp_echo.part.32+0x50/0x50
ret_from_fork+0x2c/0x40
That warning means the end of the stack is no longer recognized as such
for newly forked tasks. The problem was introduced with the following
commit:
ff3f7e2475 ("x86/entry: Fix the end of the stack for newly forked tasks")
... which was completely misguided. It only partially fixed the
reported issue, and it introduced another bug in the process. None of
the other entry code saves the frame pointer before calling into C code,
so it doesn't make sense for ret_from_fork to do so either.
Contrary to what I originally thought, the original issue wasn't related
to newly forked tasks. It was actually related to ftrace. When entry
code calls into a function which then calls into an ftrace handler, the
stack frame looks different than normal.
The original issue will be fixed in the unwinder, in a subsequent patch.
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: live-patching@vger.kernel.org
Fixes: ff3f7e2475 ("x86/entry: Fix the end of the stack for newly forked tasks")
Link: http://lkml.kernel.org/r/f350760f7e82f0750c8d1dd093456eb212751caa.1495553739.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit
b2f680380d ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").
Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.
The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.
There were two independent bugs in __get_user_asm_u64():
- it still did the STAC/CLAC user space access marking, even though
that is now done by the wrapper macros, see commit 11f1a4b975
("x86: reorganize SMAP handling in user space accesses").
This didn't result in a semantic error, it just means that the
inlined optimized version was hugely less efficient than the
allegedly slower standard version, since the CLAC/STAC overhead is
quite high on modern Intel CPU's.
- the double register %eax/%edx was marked as an output, but the %eax
part of it was touched early in the asm, and could thus clobber other
inputs to the asm that gcc didn't expect it to touch.
In particular, that meant that the generated code could look like
this:
mov (%eax),%eax
mov 0x4(%eax),%edx
where the load of %edx obviously was _supposed_ to be from the 32-bit
word that followed the source of %eax, but because %eax was
overwritten by the first instruction, the source of %edx was
basically random garbage.
The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org # v4.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al noticed that unsafe_put_user() had type problems, and fixed them in
commit a7cc722fff ("fix unsafe_put_user()"), which made me look more
at those functions.
It turns out that unsafe_get_user() had a type issue too: it limited the
largest size of the type it could handle to "unsigned long". Which is
fine with the current users, but doesn't match our existing normal
get_user() semantics, which can also handle "u64" even when that does
not fit in a long.
While at it, also clean up the type cast in unsafe_put_user(). We
actually want to just make it an assignment to the expected type of the
pointer, because we actually do want warnings from types that don't
convert silently. And it makes the code more readable by not having
that one very long and complex line.
[ This patch might become stable material if we ever end up back-porting
any new users of the unsafe uaccess code, but as things stand now this
doesn't matter for any current existing uses. ]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Export the function which checks whether an MCE is a memory error to
other users so that we can reuse the logic. Drop the boot_cpu_data use,
while at it, as mce.cpuvendor already has the CPU vendor in there.
Integrate a piece from a patch from Vishal Verma
<vishal.l.verma@intel.com> to export it for modules (nfit).
The main reason we're exporting it is that the nfit handler
nfit_handle_mce() needs to detect a memory error properly before doing
its recovery actions.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull misc uaccess fixes from Al Viro:
"Fix for unsafe_put_user() (no callers currently in mainline, but
anyone starting to use it will step into that) + alpha osf_wait4()
infoleak fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
osf_wait4(): fix infoleak
fix unsafe_put_user()
__put_user_size() relies upon its first argument having the same type as what
the second one points to; the only other user makes sure of that and
unsafe_put_user() should do the same.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.
Add the missing $(CROSS_COMPILE) prefix.
[ tglx: Rewrote changelog ]
Fixes: 98f7852537 ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ARM:
- A fix for a build failure introduced in -rc1 when tracepoints are
enabled on 32-bit ARM.
- Disabling use of stack pointer protection in the hyp code which can
cause panics.
- A handful of VGIC fixes.
- A fix to the init of the redistributors on GICv3 systems that
prevented boot with kvmtool on GICv3 systems introduced in -rc1.
- A number of race conditions fixed in our MMU handling code.
- A fix for the guest being able to program the debug extensions for
the host on the 32-bit side.
PPC:
- Fixes for build failures with PR KVM configurations.
- A fix for a host crash that can occur on POWER9 with radix guests.
x86:
- Fixes for nested PML and nested EPT.
- A fix for crashes caused by reserved bits in SSE MXCSR that could
have been set by userspace.
- An optimization of halt polling that fixes high CPU overhead.
- Fixes for four reports from Dan Carpenter's static checker.
- A protection around code that shouldn't have been preemptible.
- A fix for port IO emulation.
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJZHzY3AAoJEED/6hsPKofocI8H/AiOHXi6AC/3s9Ok3IbN/Wp6
+xSm1yqgxitGhpmKIJQyKMUTV0t8SblRV2nxvW7/MEyfl7vztiyWENaVFc6pO6N7
GbnLvdImZ9aypoBaxVOY8WG/CHw2XZ7oUYyBIGrWECH3k+fptBNdISFK3D76+4G2
+tAuWSpKSQFwjGxtreUSlnvQBp6Tjh/PqTyxslPs4zYCL6UPKSSVAoxy4yOKj3AX
G03tx/1U1n/hSJHub9RFqho4dhVGT/p3V6oppZmS1g/ZqGPQwK1wxlYquHOtORFR
Iq8LdkNQwTdkLlTTOG+tamYSfzn0+KhczfWjIh6ZEb79ARrUSnBU4Awpvom1C2A=
=B6Rl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- a fix for a build failure introduced in -rc1 when tracepoints are
enabled on 32-bit ARM.
- disable use of stack pointer protection in the hyp code which can
cause panics.
- a handful of VGIC fixes.
- a fix to the init of the redistributors on GICv3 systems that
prevented boot with kvmtool on GICv3 systems introduced in -rc1.
- a number of race conditions fixed in our MMU handling code.
- a fix for the guest being able to program the debug extensions for
the host on the 32-bit side.
PPC:
- fixes for build failures with PR KVM configurations.
- a fix for a host crash that can occur on POWER9 with radix guests.
x86:
- fixes for nested PML and nested EPT.
- a fix for crashes caused by reserved bits in SSE MXCSR that could
have been set by userspace.
- an optimization of halt polling that fixes high CPU overhead.
- fixes for four reports from Dan Carpenter's static checker.
- a protection around code that shouldn't have been preemptible.
- a fix for port IO emulation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
KVM: x86: prevent uninitialized variable warning in check_svme()
KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
KVM: x86: zero base3 of unusable segments
KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
KVM: x86: Fix potential preemption when get the current kvmclock timestamp
KVM: Silence underflow warning in avic_get_physical_id_entry()
KVM: arm/arm64: Hold slots_lock when unregistering kvm io bus devices
KVM: arm/arm64: Fix bug when registering redist iodevs
KVM: x86: lower default for halt_poll_ns
kvm: arm/arm64: Fix use after free of stage2 page table
kvm: arm/arm64: Force reading uncached stage2 PGD
KVM: nVMX: fix EPT permissions as reported in exit qualification
KVM: VMX: Don't enable EPT A/D feature if EPT feature is disabled
KVM: x86: Fix load damaged SSEx MXCSR register
kvm: nVMX: off by one in vmx_write_pml_buffer()
KVM: arm: rename pm_fake handler to trap_raz_wi
KVM: arm: plug potential guest hardware debug leakage
kvm: arm/arm64: Fix race in resetting stage2 PGD
KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers
KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJZHx/IAAoJELDendYovxMvzegIAIOyDATZsyLnbDnTunOmYqLJ
n06v50N3KwQ+pegJyz4lHdTryI10/TEUzvuT4v/V9B0sHimNRJcE7ClvRVPEaFrs
4y459kKGXRpXXAvS2r0WIY3NhwP/Num9+duVY5lInJ6caq+/JDm3S1tL2HeQ9gl1
SDuI6IMV3q12Agk6jgbvwd1XBh3wbj8Z6SOx3DAchqY/kbdy6tS4y5CR93mKpjs3
LsVyPvY2IOLWCSrPsdloM4l7lMoVmd/1tt6NfzymepIxQbIS3KWo5AwBsoM0cVfs
KGb4T3+H8uwmpyWjgibsayr31cC7LIulEqLtqZNyycpIZGR5TlZ01KEPSMKn78s=
=Boz3
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Some fixes for the new Xen 9pfs frontend and some minor cleanups"
* tag 'for-linus-4.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: make xen_flush_tlb_all() static
xen: cleanup pvh leftovers from pv-only sources
xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static
xen/9pfs: fix return value check in xen_9pfs_front_probe()
get_msr() of MSR_EFER is currently always going to succeed, but static
checker doesn't see that far.
Don't complicate stuff and just use 0 for the fallback -- it means that
the feature is not present.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Static analysis noticed that pmu->nr_arch_gp_counters can be 32
(INTEL_PMC_MAX_GENERIC) and therefore cannot be used to shift 'int'.
I didn't add BUILD_BUG_ON for it as we have a better checker.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 25462f7f52 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Static checker noticed that base3 could be used uninitialized if the
segment was not present (useable). Random stack values probably would
not pass VMCS entry checks.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 1aa366163b ("KVM: x86 emulator: consolidate segment accessors")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.
- "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
a read should be ignored) in guest can get a random number.
- "rep insb" instruction to access PIT register port 0x43 can control memcpy()
in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
which will disclose the unimportant kernel memory in host but no crash.
The similar test program below can reproduce the read out-of-bounds vulnerability:
void hexdump(void *mem, unsigned int len)
{
unsigned int i, j;
for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
{
/* print offset */
if(i % HEXDUMP_COLS == 0)
{
printf("0x%06x: ", i);
}
/* print hex data */
if(i < len)
{
printf("%02x ", 0xFF & ((char*)mem)[i]);
}
else /* end of block, just aligning for ASCII dump */
{
printf(" ");
}
/* print ASCII dump */
if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
{
for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
{
if(j >= len) /* end of block, not really printing */
{
putchar(' ');
}
else if(isprint(((char*)mem)[j])) /* printable char */
{
putchar(0xFF & ((char*)mem)[j]);
}
else /* other char */
{
putchar('.');
}
}
putchar('\n');
}
}
}
int main(void)
{
int i;
if (iopl(3))
{
err(1, "set iopl unsuccessfully\n");
return -1;
}
static char buf[0x40];
/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */
memset(buf, 0xab, sizeof(buf));
asm volatile("push %rdi;");
asm volatile("mov %0, %%rdi;"::"q"(buf));
asm volatile ("mov $0x40, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");
asm volatile ("mov $0x41, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");
asm volatile ("mov $0x42, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");
asm volatile ("mov $0x43, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");
asm volatile ("mov $0x44, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");
asm volatile ("mov $0x45, %rdx;");
asm volatile ("in %dx,%al;");
asm volatile ("stosb;");
asm volatile ("pop %rdi;");
hexdump(buf, 0x40);
printf("\n");
/* ins port 0x40 */
memset(buf, 0xab, sizeof(buf));
asm volatile("push %rdi;");
asm volatile("mov %0, %%rdi;"::"q"(buf));
asm volatile ("mov $0x20, %rcx;");
asm volatile ("mov $0x40, %rdx;");
asm volatile ("rep insb;");
asm volatile ("pop %rdi;");
hexdump(buf, 0x40);
printf("\n");
/* ins port 0x43 */
memset(buf, 0xab, sizeof(buf));
asm volatile("push %rdi;");
asm volatile("mov %0, %%rdi;"::"q"(buf));
asm volatile ("mov $0x20, %rcx;");
asm volatile ("mov $0x43, %rdx;");
asm volatile ("rep insb;");
asm volatile ("pop %rdi;");
hexdump(buf, 0x40);
printf("\n");
return 0;
}
The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
w/o clear after using which results in some random datas are left over in
the buffer. Guest reads port 0x43 will be ignored since it is write only,
however, the function kernel_pio() can't distigush this ignore from successfully
reads data from device's ioport. There is no new data fill the buffer from
port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
the buffer to the guest unconditionally. This patch fixes it by clearing the
buffer before in instruction emulation to avoid to grant guest the stale data
in the buffer.
In addition, string I/O is not supported for in kernel device. So there is no
iteration to read ioport %RCX times for string I/O. The function kernel_pio()
just reads one round, and then copy the io size * %RCX to the guest unconditionally,
actually it copies the one round ioport data w/ other random datas which are left
over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
introducing the string I/O support for in kernel device in order to grant the right
ioport datas to the guest.
Before the patch:
0x000000: fe 38 93 93 ff ff ab ab .8......
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
After the patch:
0x000000: 1e 02 f8 00 ff ff ab ab ........
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: d2 e2 d2 df d2 db d2 d7 ........
0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
0x000000: 00 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 00 00 00 00 ........
0x000018: 00 00 00 00 00 00 00 00 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........
Reported-by: Moguofang <moguofang@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Moguofang <moguofang@huawei.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/2809
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 2 PID: 2809 Comm: qemu-system-x86 Not tainted 4.11.0+ #13
Call Trace:
dump_stack+0x99/0xce
check_preemption_disabled+0xf5/0x100
__this_cpu_preempt_check+0x13/0x20
get_kvmclock_ns+0x6f/0x110 [kvm]
get_time_ref_counter+0x5d/0x80 [kvm]
kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
? kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
? kvm_arch_vcpu_ioctl_run+0xac9/0x1ce0 [kvm]
kvm_arch_vcpu_ioctl_run+0x5bf/0x1ce0 [kvm]
kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
? __fget+0xf3/0x210
do_vfs_ioctl+0xa4/0x700
? __fget+0x114/0x210
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
RIP: 0033:0x7f9d164ed357
? __this_cpu_preempt_check+0x13/0x20
This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/
CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.
Safe access to per-CPU data requires a couple of constraints, though: the
thread working with the data cannot be preempted and it cannot be migrated
while it manipulates per-CPU variables. If the thread is preempted, the
thread that replaces it could try to work with the same variables; migration
to another CPU could also cause confusion. However there is no preemption
disable when reads host per-CPU tsc rate to calculate the current kvmclock
timestamp.
This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both
__this_cpu_read() and rdtsc() are not preempted.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
xen_flush_tlb_all() is used in arch/x86/xen/mmu.c only. Make it static.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
There are some leftovers testing for pvh guest mode in pv-only source
files. Remove them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Smatch complains that we check cap the upper bound of "index" but don't
check for negatives. It's a false positive because "index" is never
negative. But it's also simple enough to make it unsigned which makes
the code easier to audit.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
increase heavily even in cases where the performance improvement was
small. In particular, bandwidth divided by CPU usage was as much as
60% lower.
To some extent this is the expected effect of the patch, and the
additional CPU utilization is only visible when running the
benchmarks. However, halving the threshold also halves the extra
CPU utilization (from +30-130% to +20-70%) and has no negative
effect on performance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
This fixes the new ept_access_test_read_only and ept_access_test_read_write
testcases from vmx.flat.
The problem is that gpte_access moves bits around to switch from EPT
bit order (XWR) to ACC_*_MASK bit order (RWX). This results in an
incorrect exit qualification. To fix this, make pt_access and
pte_access operate on raw PTE values (only with NX flipped to mean
"can execute") and call gpte_access at the end of the walk. This
lets us use pte_access to compute the exit qualification with XWR
bit order.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
We can observe eptad kvm_intel module parameter is still Y
even if ept is disabled which is weird. This patch will
not enable EPT A/D feature if EPT feature is disabled.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reported by syzkaller:
BUG: unable to handle kernel paging request at ffffffffc07f6a2e
IP: report_bug+0x94/0x120
PGD 348e12067
P4D 348e12067
PUD 348e14067
PMD 3cbd84067
PTE 80000003f7e87161
Oops: 0003 [#1] SMP
CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G OE 4.11.0+ #8
task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
RIP: 0010:report_bug+0x94/0x120
RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
do_trap+0x156/0x170
do_error_trap+0xa3/0x170
? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
? mark_held_locks+0x79/0xa0
? retint_kernel+0x10/0x10
? trace_hardirqs_off_thunk+0x1a/0x1c
do_invalid_op+0x20/0x30
invalid_op+0x1e/0x30
RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
kvm_vcpu_ioctl+0x384/0x780 [kvm]
? kvm_vcpu_ioctl+0x384/0x780 [kvm]
? sched_clock+0x13/0x20
? __do_page_fault+0x2a0/0x550
do_vfs_ioctl+0xa4/0x700
? up_read+0x1f/0x40
? __do_page_fault+0x2a0/0x550
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
a 1 to any of these bits will cause a general-protection exception(#GP) to be
generated". The syzkaller forks' testcase overrides xsave area w/ random values
and steps on the reserved bits of MXCSR register. The damaged MXCSR register
values of guest will be restored to SSEx MXCSR register before vmentry. This
patch fixes it by catching userspace override MXCSR register reserved bits w/
random values and bails out immediately.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
There are PML_ENTITY_NUM elements in the pml_address[] array so the >
should be >= or we write beyond the end of the array when we do:
pml_address[vmcs12->guest_pml_index--] = gpa;
Fixes: c5f983f6e8 ("nVMX: Implement emulated Page Modification Logging")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull UML fixes from Richard Weinberger:
"No new stuff, just fixes"
* 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Add missing NR_CPUS include
um: Fix to call read_initrd after init_bootmem
um: Include kbuild.h instead of duplicating its macros
um: Fix PTRACE_POKEUSER on x86_64
um: Set number of CPUs
um: Fix _print_addr()
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, docs: update memory.stat description with workingset* entries
mm: vmscan: scan until it finds eligible pages
mm, thp: copying user pages must schedule on collapse
dax: fix PMD data corruption when fault races with write
dax: fix data corruption when fault races with write
ext4: return to starting transaction in ext4_dax_huge_fault()
mm: fix data corruption due to stale mmap reads
dax: prevent invalidation of mapped DAX entries
Tigran has moved
mm, vmalloc: fix vmalloc users tracking properly
mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
gcov: support GCC 7.1
mm, vmstat: Remove spurious WARN() during zoneinfo print
time: delete current_fs_time()
hwpoison, memcg: forcibly uncharge LRU pages
Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:
- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.
- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.
- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.
- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.
These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
Pull perf updates/fixes from Ingo Molnar:
"Mostly tooling updates, but also two kernel fixes: a call chain
handling robustness fix and an x86 PMU driver event definition fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/callchain: Force USER_DS when invoking perf_callchain_user()
tools build: Fixup sched_getcpu feature test
perf tests kmod-path: Don't fail if compressed modules aren't supported
perf annotate: Fix AArch64 comment char
perf tools: Fix spelling mistakes
perf/x86: Fix Broadwell-EP DRAM RAPL events
perf config: Refactor a duplicated code for obtaining config file name
perf symbols: Allow user probes on versioned symbols
perf symbols: Accept symbols starting at address 0
tools lib string: Adopt prefixcmp() from perf and subcmd
perf units: Move parse_tag_value() to units.[ch]
perf ui gtk: Move gtk .so name to the only place where it is used
perf tools: Move HAS_BOOL define to where perl headers are used
perf memswap: Split the byteswap memory range wrappers from util.[ch]
perf tools: Move event prototypes from util.h to event.h
perf buildid: Move prototypes from util.h to build-id.h
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- two boot crash fixes
- unwinder fixes
- kexec related kernel direct mappings enhancements/fixes
- more Clang support quirks
- minor cleanups
- Documentation fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix a typo in Documentation
x86/build: Don't add -maccumulate-outgoing-args w/o compiler support
x86/boot/32: Fix UP boot on Quark and possibly other platforms
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
x86/kexec/64: Use gbpages for identity mappings if available
x86/mm: Add support for gbpages to kernel_ident_mapping_init()
x86/boot: Declare error() as noreturn
x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()
x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
x86/microcode/AMD: Remove redundant NULL check on mc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJZFWTzAAoJELDendYovxMv24cIAJ3U2OZ64d7WTKD37AT2O6nF
6R3j+zJ6apoKX4zHvhWUOHZ6jpTASTnaisiIskVc52JcgAK0f8ZYTg5nhyWPceAD
Icf+JuXrI6uplD97qsjt7X9FbxUsRZninfsznoBkK6P8Cw8ZWlWIWIl6e3CrVwBD
geyKcbsKkVG8+bMjWvmQd94CFq5r8Ivup0sCECumx0lqw3RhxdhQvUix9eBULEoG
h/XAuPbMupdjU6phgqG4rvUjWd/R+9mIIDG1oY9Kpx4Kpn/7bHtoYZ//Qzs8bmuP
5ORujOedshdyAZqLGxQuQzo+/4E9gX3qVbaS6fPf1Ab+ra0k/iWtetUITZ0v2AQ=
=gWpG
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains two fixes for booting under Xen introduced during this
merge window and two fixes for older problems, where one is just much
more probable due to another merge window change"
* tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: adjust early dom0 p2m handling to xen hypervisor behavior
x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen
xen/x86: Do not call xen_init_time_ops() until shared_info is initialized
x86/xen: fix xsave capability setting
When booted as pv-guest the p2m list presented by the Xen is already
mapped to virtual addresses. In dom0 case the hypervisor might make use
of 2M- or 1G-pages for this mapping. Unfortunately while being properly
aligned in virtual and machine address space, those pages might not be
aligned properly in guest physical address space.
So when trying to obtain the guest physical address of such a page
pud_pfn() and pmd_pfn() must be avoided as those will mask away guest
physical address bits not being zero in this special case.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
on AMD cpus.
This bug/feature bit is kind of special as it will be used very early
when switching threads. Setting the bit and clearing it a little bit
later leaves a critical window where things can go wrong. This time
window has enlarged a little bit by using setup_clear_cpu_cap() instead
of the hypervisor's set_cpu_features callback. It seems this larger
window now makes it rather easy to hit the problem.
The proper solution is to never set the bit in case of Xen.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories,
but the de-coupling of exported headers has not been completed.
Headers listed in header-y are exported whether they exist in
uapi directories or not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported.
The asm-generic wrappers are still exceptions, but this is a big
step forward.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZE7MBAAoJED2LAQed4NsGroAP/iARejrIFmxuH96D5h2aiP1j
c8KHQ+5fuq4w2KBmfbfkNvWbazlVheT6RrYWBUh/GABGsSqQC07d8New6B8TaUkE
K0E48RsuYxouP18Ys6BOO4/zyRhEFD7Ta72PGQ/gDQY+6hAu4jYQnMdG0wipTblS
QWgnUxTqfCbTjnRpRKXpcwRff+OeTWtOv3s0V8UashJUxnFVQ7Br2uRsm/KKkU/k
jQC65KyHL4HlsFeeAiMmQ9IQPVwLsd6+d5crs0nydHaJ2XrFlNNQ7EEMyG8FxPdx
9b/VpS+XY6DO+jeqkcpFrdL9IgcmCn72Qc5/4vrHuQO2dpWW5mVaVPq9RAGP0Yq/
FB0vZRTp/tOIkD+0esirZW2gJtU3DWMY1A9rc5jjLRabdnRXVTdLfhEnksYJEfES
yPbDEuKyzo6a+zBSqNtMquJPmYVYEDS2mcmgxY5sB58qtXkUN2Yr+uUALxC8XhXW
SHHwIAV3a+UX5ZU9Ys8dp2hI4EXYXtdvsz2zvl4qPIn/Q9d1YoEJRe7/Y0p8gBXM
5pVJ1yohKoYrNZVGBe0LO/gHGVAVgMj0cKn0Xg51bbvjxY2U5djUbMY0uw1gFrrM
O9ld3C6O8zH5BsExCfwp9iPz2SW5W9N80kgnKfjCHBRUKuMTkm02DJf8Hx+pyfVQ
DCy9lYTi76IgZ1uflKq9
=Rqdo
-----END PGP SIGNATURE-----
Merge tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild UAPI updates from Masahiro Yamada:
"Improvement of headers_install by Nicolas Dichtel.
It has been long since the introduction of uapi directories, but the
de-coupling of exported headers has not been completed. Headers listed
in header-y are exported whether they exist in uapi directories or
not. His work fixes this inconsistency.
All (and only) headers under uapi directories are now exported. The
asm-generic wrappers are still exceptions, but this is a big step
forward"
* tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
arch/include: remove empty Kbuild files
uapi: export all arch specifics directories
uapi: export all headers under uapi directories
smc_diag.h: fix include from userland
btrfs_tree.h: fix include from userland
uapi: includes linux/types.h before exporting files
Makefile.headersinst: remove destination-y option
Makefile.headersinst: cleanup input files
x86: stop exporting msr-index.h to userland
nios2: put setup.h in uapi
h8300: put bitsperlong.h in uapi
Subsystem:
- Add OF device ID table for i2c drivers
New driver:
- Motorola CPCAP PMIC RTC
Drivers:
- cmos: fix IRQ selection
- ds1307: Add ST m41t0 support
- ds1374: fix watchdog configuration
- sh: Add rza series support
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlkTf2sACgkQAyWl4gNJ
NJLPvw//dGzo6oD3C96QIurfrgFx9512ZurEiJpGPIO15obTVLF0SNuswaMj7knm
ezqQ23qX9VBEmu3si7LvkQVbE60giB3XnlJ/wpFi/LhtlM7SQ4o2Z8Go3rkL8tCw
iPcj5l3ShbHgSF+TBK+jK5C/8ahR7RE32l2rtSi9xwzxOmKRySmSWg2iGmGJMNUU
7UHR4DRHVPS/h1ffM/rOWV+d3GVK9laNmeoIORhsWCa+iYwGRZr3XL3GXQzhehBO
H5uFYewMVBHREADiqMNQ/ogHZI+ghXt1OSK7vhUFkYxosqU56P0YtU6SPH6UuFsH
ryoiUmCgQQjjhptlvVv71D7Wj1txSCT6rByQU1YyVZ0yw9XpVuGTYBjFBY+D7nxb
e3sR+Poe3diVLWDwFTXStrY0TtVlCTTCjs5T2kwUdYOJ188expQGHgj6wVl7PPTs
gpeSIunekbop13KCPWV01TzmRLB8ne9ZiomsuiNnuAKhXP7KRf6AfuQd6kpyvpmH
vhGcEIe7O0i4TwUIuB/dmdhLHmlOqCpLJpGQihNc+f0jJAxHv+akXEQ06H84FkJD
kPkBYSVDp/2pEBdf7ig2mlpPEqANgoQY8GCu9SbEg976g0v8k6m+i9IlbR0m7hwE
0XF+8W45iNsaEIzoXcyHuB/lrUy1/0eNoG4KX8vyWIjITo5HQWg=
=40TA
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC subsystem update:
- Add OF device ID table for i2c drivers
New RTC driver:
- Motorola CPCAP PMIC RTC
RTC driver updates:
- cmos: fix IRQ selection
- ds1307: Add ST m41t0 support
- ds1374: fix watchdog configuration
- sh: Add rza series support"
* tag 'rtc-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits)
rtc: gemini: add return value validation
rtc: snvs: fix an incorrect check of return value
rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL
rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks
rtc: sh: mark PM functions as unused
rtc: hid-sensor-time: remove some dead code
rtc: m41t80: Add proper compatible for rv4162
rtc: ds1307: Add m41t0 to OF device ID table
rtc: ds1307: support m41t0 variant
rtc: cpcap: fix improper use of IRQ_NONE for request_threaded_irq
rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs
x86: i8259: export legacy_pic symbol
dt-bindings: rtc: document the rtc-sh bindings
rtc: sh: add support for rza series
rtc: cpcap: kfreeing devm allocated memory
rtc: wm8350: Remove unused to_wm8350_from_rtc_dev
rtc: cpcap: new rtc driver
dt-bindings: Add vendor prefix for Motorola
rtc: omap: mark PM methods as __maybe_unused
rtc: omap: remove incorrect __exit markups
...
-----BEGIN PGP SIGNATURE-----
iQIVAwUAWPiW6vSw1s6N8H32AQLOrw/+NTqGf7bjq+64YKS6NfR0XDgE+wNJltGO
ck7zJW3NHIg76RNu8s0I9xg5aVmwizz3Z5DGROZquaolnezux4tQihZ3AFyxIzLc
+Y3WHYagcML7yFfjl/WznCLRD5EW3yPln4lCvQO0nW/xICRYeRI057JaIbi2Dtek
BhcXt3c4AjXDLdYJkgtHV3p2R2mt8hcdFdWqqx6s7JaIThZNRGNzxAgtbcB9k5IW
HVG9ZEIL73VBYWHrYivzjHYF5rBnNCPt87eOwDQeTOSkhv8te+u9k+bH8vxZw1T0
XUtDrLBndKiuVo2GUfLkkF8LItx3Q9eLCJYy0joaIliyPqTEsPx9KjQ+Af0cxS9s
ZPCZ5SYf96stKmDeL5xaMfrAmeyVHJ4lc4JTOqdzbIT8blsOSfYO/03p0ALShSDv
/RQLaKGlf8Bjoy8PwKFcXb4sIDufcd/U1Av/EMFXxOfgN/u2JUkGKq6EaIM5B68L
fHPje+aR9VNELPmPjwNOWtmN4I79EH3EItQf7zv0KG+UeKhcHLx/EAcSJ3ZRKEkH
Lathg7pPOEJGArPiVO79TZzBG01ADn1aiwv65XObMzNZ+54xI/mN/Y1DNF/kL5jU
XzvNzEjFt8mwMIZGVNdAt4+pDyMfIZGZSyUkSRKFnaQZMIvQrfQIU9RLBYLX5eOx
+/p0VkIwDpg=
=lbS7
-----END PGP SIGNATURE-----
Merge tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull hw lockdown support from David Howells:
"Annotation of module parameters that configure hardware resources
including ioports, iomem addresses, irq lines and dma channels.
This allows a future patch to prohibit the use of such module
parameters to prevent that hardware from being abused to gain access
to the running kernel image as part of locking the kernel down under
UEFI secure boot conditions.
Annotations are made by changing:
module_param(n, t, p)
module_param_named(n, v, t, p)
module_param_array(n, t, m, p)
to:
module_param_hw(n, t, hwtype, p)
module_param_hw_named(n, v, t, hwtype, p)
module_param_hw_array(n, t, hwtype, m, p)
where the module parameter refers to a hardware setting
hwtype specifies the type of the resource being configured. This can
be one of:
ioport Module parameter configures an I/O port
iomem Module parameter configures an I/O mem address
ioport_or_iomem Module parameter could be either (runtime set)
irq Module parameter configures an I/O port
dma Module parameter configures a DMA channel
dma_addr Module parameter configures a DMA buffer address
other Module parameter configures some other value
Note that the hwtype is compile checked, but not currently stored (the
lockdown code probably won't require it). It is, however, there for
future use.
A bonus is that the hwtype can also be used for grepping.
The intention is for the kernel to ignore or reject attempts to set
annotated module parameters if lockdown is enabled. This applies to
options passed on the boot command line, passed to insmod/modprobe or
direct twiddling in /sys/module/ parameter files.
The module initialisation then needs to handle the parameter not being
set, by (1) giving an error, (2) probing for a value or (3) using a
reasonable default.
What I can't do is just reject a module out of hand because it may
take a hardware setting in the module parameters. Some important
modules, some ipmi stuff for instance, both probe for hardware and
allow hardware to be manually specified; if the driver is aborts with
any error, you don't get any ipmi hardware.
Further, trying to do this entirely in the module initialisation code
doesn't protect against sysfs twiddling.
[!] Note that in and of itself, this series of patches should have no
effect on the the size of the kernel or code execution - that is
left to a patch in the next series to effect. It does mark
annotated kernel parameters with a KERNEL_PARAM_FL_HWPARAM flag in
an already existing field"
* tag 'hwparam-20170420' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (38 commits)
Annotate hardware config module parameters in sound/pci/
Annotate hardware config module parameters in sound/oss/
Annotate hardware config module parameters in sound/isa/
Annotate hardware config module parameters in sound/drivers/
Annotate hardware config module parameters in fs/pstore/
Annotate hardware config module parameters in drivers/watchdog/
Annotate hardware config module parameters in drivers/video/
Annotate hardware config module parameters in drivers/tty/
Annotate hardware config module parameters in drivers/staging/vme/
Annotate hardware config module parameters in drivers/staging/speakup/
Annotate hardware config module parameters in drivers/staging/media/
Annotate hardware config module parameters in drivers/scsi/
Annotate hardware config module parameters in drivers/pcmcia/
Annotate hardware config module parameters in drivers/pci/hotplug/
Annotate hardware config module parameters in drivers/parport/
Annotate hardware config module parameters in drivers/net/wireless/
Annotate hardware config module parameters in drivers/net/wan/
Annotate hardware config module parameters in drivers/net/irda/
Annotate hardware config module parameters in drivers/net/hamradio/
Annotate hardware config module parameters in drivers/net/ethernet/
...
* ARM: bugfixes; moved shared 32-bit/64-bit files to virt/kvm/arm;
support for saving/restoring virtual ITS state to userspace
* PPC: XIVE (eXternal Interrupt Virtualization Engine) support
* x86: nVMX improvements, including emulated page modification logging
(PML) which brings nice performance improvements on some workloads
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZEeusAAoJEL/70l94x66Dq+cH/RkL9znP717k7Z0jS8/FJN9q
wKU8j0jRxuqjnvEu89redfFKxElWM9T1fwReBObjWct9+hyJ9Pbpf95Lr9ca39PR
zhiBMKl79he0gHV/z48ItuzH1mOrU/KzFfxHYLlBd4oGw0ZdUttWAsUtaWQ8UNFo
xtyu2R+CWYLeAUBrpYmkvrOjhnB+S9+f4y2OY9pXsMg4HN9/Tdn0B656yTOWdu9C
onO3QQXNY/dzGFLH3tA/kAbz25x4Y+pP2UHlMm5vkW8XWPn+lluUwtBonTKdzy64
RDWWUWcs0k37ps4H9b56oXmz8ZFZ0FQF3MimDQueGHSYOXCxU5EqmC9c7KZmZrg=
=KcCv
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"ARM:
- bugfixes
- moved shared 32-bit/64-bit files to virt/kvm/arm
- support for saving/restoring virtual ITS state to userspace
PPC:
- XIVE (eXternal Interrupt Virtualization Engine) support
x86:
- nVMX improvements, including emulated page modification logging
(PML) which brings nice performance improvements on some workloads"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits)
KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore
KVM: arm/arm64: Don't call map_resources when restoring ITS tables
KVM: arm/arm64: Register ITS iodev when setting base address
KVM: arm/arm64: Get rid of its->initialized field
KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs
KVM: arm/arm64: Slightly rework kvm_vgic_addr
KVM: arm/arm64: Make vgic_v3_check_base more broadly usable
KVM: arm/arm64: Refactor vgic_register_redist_iodevs
KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus
nVMX: Advertise PML to L1 hypervisor
nVMX: Implement emulated Page Modification Logging
kvm: x86: Add a hook for arch specific dirty logging emulation
kvm: nVMX: Validate CR3 target count on nested VM-entry
KVM: set no_llseek in stat_fops_per_vm
KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable
KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI
KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES
KVM: arm64: vgic-its: Fix pending table sync
KVM: arm64: vgic-its: ITT save and restore
KVM: arm64: vgic-its: Device table save/restore
...
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.
In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.
After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h
Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).
Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.
For the record, note that exported files for asm directories are a mix of
files listed by:
- include/uapi/asm-generic/Kbuild.asm;
- arch/<arch>/include/uapi/asm/Kbuild;
- arch/<arch>/include/asm/Kbuild.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Even if this file was not in an uapi directory, it was exported because
it was listed in the Kbuild file.
Fixes: b72e7464e4 ("x86/uapi: Do not export <asm/msr-index.h> as part of the user API headers")
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This includes:
* Some code optimizations for the Intel VT-d driver
* Code to switch off a previously enabled Intel IOMMU
* Support for 'struct iommu_device' for OMAP, Rockchip and
Mediatek IOMMUs
* Some header optimizations for IOMMU core code headers and a
few fixes that became necessary in other parts of the kernel
because of that
* ACPI/IORT updates and fixes
* Some Exynos IOMMU optimizations
* Code updates for the IOMMU dma-api code to bring it closer to
use per-cpu iova caches
* New command-line option to set default domain type allocated
by the iommu core code
* Another command line option to allow the Intel IOMMU switched
off in a tboot environment
* ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using
an IDENTITY domain in conjunction with DMA ops, Support for
SMR masking, Support for 16-bit ASIDs (was previously broken)
* Various other small fixes and improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZEY4XAAoJECvwRC2XARrjth0QAKV56zjnFclv39aDo6eCq9CT
51+XT4bPY5VKQ2+Jx76TBNObHmGK+8KEMHfT9khpWJtFCDyy25SGckLry1nYqmZs
tSTsbj4sOeCyKzOLITlRN9/OzKXkjKAxYuq+sQZZFDFYf3kCM/eag0dGAU6aVLNp
tkIal3CSpGjCQ9M5JohrtQ1mwiGqCIkMIgvnBjRw+bfpLnQNG+VL6VU2G3RAkV2b
5Vbdoy+P7ZQnJSZr/bibYL2BaQs2diR4gOppT5YbsfniMq4QYSjheu1xBboGX8b7
sx8yuPi4370irSan0BDvlvdQdjBKIRiDjfGEKDhRwPhtvN6JREGakhEOC8MySQ37
mP96B72Lmd+a7DEl5udOL7tQILA0DcUCX0aOyF714khnZuFU5tVlCotb/36xeJ+T
FPc3RbEVQ90m8dYU6MNJ+ahtb/ZapxGTRfisIigB6wlnZa0Evabp9EJSce6oJMkm
whbBhDubeEU18n9XAaofMbu+P2LAzq8cxiRMlsDvT4mIy7jO86jjCmhpu1Tfn2GY
4wrEQZdWOMvhUsIhObXA0aC3BzC506uvnKPW3qy041RaxBuelWiBi29qzYbhxzkr
DLDpWbUZNYPyFJjttpavyQb2/XRduBTJdVP1pQpkJNDsW5jLiBkpSqm9xNADapRY
vLSYRX0JCIquaD+PAuxn
=3aE8
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- code optimizations for the Intel VT-d driver
- ability to switch off a previously enabled Intel IOMMU
- support for 'struct iommu_device' for OMAP, Rockchip and Mediatek
IOMMUs
- header optimizations for IOMMU core code headers and a few fixes that
became necessary in other parts of the kernel because of that
- ACPI/IORT updates and fixes
- Exynos IOMMU optimizations
- updates for the IOMMU dma-api code to bring it closer to use per-cpu
iova caches
- new command-line option to set default domain type allocated by the
iommu core code
- another command line option to allow the Intel IOMMU switched off in
a tboot environment
- ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an
IDENTITY domain in conjunction with DMA ops, Support for SMR masking,
Support for 16-bit ASIDs (was previously broken)
- various other small fixes and improvements
* tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits)
soc/qbman: Move dma-mapping.h include to qman_priv.h
soc/qbman: Fix implicit header dependency now causing build fails
iommu: Remove trace-events include from iommu.h
iommu: Remove pci.h include from trace/events/iommu.h
arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops()
ACPI/IORT: Fix CONFIG_IOMMU_API dependency
iommu/vt-d: Don't print the failure message when booting non-kdump kernel
iommu: Move report_iommu_fault() to iommu.c
iommu: Include device.h in iommu.h
x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
iommu/arm-smmu: Correct sid to mask
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
omap3isp: Remove iommu_group related code
iommu/omap: Add iommu-group support
iommu/omap: Make use of 'struct iommu_device'
iommu/omap: Store iommu_dev pointer in arch_data
iommu/omap: Move data structures to omap-iommu.h
iommu/omap: Drop legacy-style device support
...
Commit 11e63f6d92 added cache flushing for unaligned writes from an
iovec, covering the first and last cache line of a >= 8 byte write and
the first cache line of a < 8 byte write. But an unaligned write of
2-7 bytes can still cover two cache lines, so make sure we flush both
in that case.
Cc: <stable@vger.kernel.org>
Fixes: 11e63f6d92 ("x86, pmem: fix broken __copy_user_nocache ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With EPT A/D enabled, processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the pml buffer provided by L1 if the access is
write and the dirty bit is being set.
This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When KVM updates accessed/dirty bits, this hook can be used
to invoke an arch specific function that implements/emulates
dirty logging such as PML.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the SDM, the CR3-target count must not be greater than
4. Future processors may support a different number of CR3-target
values. Software should read the VMX capability MSR IA32_VMX_MISC to
determine the number of values supported.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clang does not support this machine dependent option.
Older versions of GCC (pre 3.0) may not support this option, added in
2000, but it's unlikely they can still compile a working kernel.
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170509032946.20444-1-nick.desaulniers@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This partially reverts commit:
23b2a4ddeb ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up")
That commit had one definite bug and one potential bug. The
definite bug is that setup_per_cpu_areas() uses a differnet generic
implementation on UP kernels, so initial_page_table never got
resynced. This was fine for access to percpu data (it's in the
identity map on UP), but it breaks other users of
initial_page_table. The potential bug is that helpers like
efi_init() would be called before the tables were synced.
Avoid both problems by just syncing the page tables in setup_arch()
*and* setup_per_cpu_areas().
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
'__vmalloc_start_set' currently only gets set in initmem_init() when
!CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc address
with virt_addr_valid() with CONFIG_NEED_MULTIPLE_NODES=y, causing
a kernel crash:
[mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78!
Set '__vmalloc_start_set' appropriately for that case as well.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: dc16ecf7fd ("x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid")
Link: http://lkml.kernel.org/r/1494278596-30373-1-git-send-email-labbott@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
ags00JtMLitfNPBH3eSl
=eE4D
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add framework for supporting PCIe devices in Endpoint mode (Kishon
Vijay Abraham I)
- use non-postable PCI config space mappings when possible (Lorenzo
Pieralisi)
- clean up and unify mmap of PCI BARs (David Woodhouse)
- export and unify Function Level Reset support (Christoph Hellwig)
- avoid FLR for Intel 82579 NICs (Sasha Neftin)
- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)
- short-circuit config access failures for disconnected devices (Keith
Busch)
- remove D3 sleep delay when possible (Adrian Hunter)
- freeze PME scan before suspending devices (Lukas Wunner)
- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)
- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)
- add arch-specific alignment control to improve device passthrough by
avoiding multiple BARs in a page (Yongji Xie)
- add sysfs sriov_drivers_autoprobe to control VF driver binding
(Bodong Wang)
- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)
- fix crashes when unbinding host controllers that don't support
removal (Brian Norris)
- add driver for MicroSemi Switchtec management interface (Logan
Gunthorpe)
- add driver for Faraday Technology FTPCI100 host bridge (Linus
Walleij)
- add i.MX7D support (Andrey Smirnov)
- use generic MSI support for Aardvark (Thomas Petazzoni)
- make Rockchip driver modular (Brian Norris)
- advertise 128-byte Read Completion Boundary support for Rockchip
(Shawn Lin)
- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)
- convert atomic_t to refcount_t in HV driver (Elena Reshetova)
- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)
- fix PCI bus removal in HV driver (Long Li)
- add support for ThunderX2 DMA alias topology (Jayachandran C)
- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)
- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
(Manish Jaggi)
* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
PCI: Don't allow unbinding host controllers that aren't prepared
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: imx6: Fix spelling mistake: "contol" -> "control"
...
Merge more updates from Andrew Morton:
- the rest of MM
- various misc things
- procfs updates
- lib/ updates
- checkpatch updates
- kdump/kexec updates
- add kvmalloc helpers, use them
- time helper updates for Y2038 issues. We're almost ready to remove
current_fs_time() but that awaits a btrfs merge.
- add tracepoints to DAX
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
selftests/vm: add a test for virtual address range mapping
dax: add tracepoint to dax_insert_mapping()
dax: add tracepoint to dax_writeback_one()
dax: add tracepoints to dax_writeback_mapping_range()
dax: add tracepoints to dax_load_hole()
dax: add tracepoints to dax_pfn_mkwrite()
dax: add tracepoints to dax_iomap_pte_fault()
mtd: nand: nandsim: convert to memalloc_noreclaim_*()
treewide: convert PF_MEMALLOC manipulations to new helpers
mm: introduce memalloc_noreclaim_{save,restore}
mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
mm/huge_memory.c: use zap_deposited_table() more
time: delete CURRENT_TIME_SEC and CURRENT_TIME
gfs2: replace CURRENT_TIME with current_time
apparmorfs: replace CURRENT_TIME with current_time()
lustre: replace CURRENT_TIME macro
fs: ubifs: replace CURRENT_TIME_SEC with current_time
fs: ufs: use ktime_get_real_ts64() for birthtime
...
Now that all call sites, completely decouple cacheflush.h and
set_memory.h
[sfr@canb.auug.org.au: kprobes/x86: merge fix for set_memory.h decoupling]
Link: http://lkml.kernel.org/r/20170418180903.10300fd3@canb.auug.org.au
Link: http://lkml.kernel.org/r/1488920133-27229-17-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.
Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "set_memory_* functions header refactor", v3.
The set_memory_* APIs came out of a desire to have a better way to
change memory attributes. Many of these attributes were linked to cache
functionality so the prototypes were put in cacheflush.h. These days,
the APIs have grown and have a much wider use than just cache APIs. To
support this growth, split off set_memory_* and friends into a separate
header file to avoid growing cacheflush.h for APIs that have nothing to
do with caches.
Link: http://lkml.kernel.org/r/1488920133-27229-2-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__vmalloc* allows users to provide gfp flags for the underlying
allocation. This API is quite popular
$ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
77
The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space. About half of users don't
use this flag, though. This signals that we make the API unnecessarily
too complex.
This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
are simplified and drop the flag.
Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kvmalloc", v5.
There are many open coded kmalloc with vmalloc fallback instances in the
tree. Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.
As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.
Most callers, I could find, have been converted to use the helper
instead. This is patch 6. There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.
[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
This patch (of 9):
Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.
This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL). Those need to be
fixed separately.
While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers. Existing ones would have to die
slowly.
[sfr@canb.auug.org.au: f2fs fixup]
Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
support; virtual interrupt controller performance improvements; support
for userspace virtual interrupt controller (slower, but necessary for
KVM on the weird Broadcom SoCs used by the Raspberry Pi 3)
* MIPS: basic support for hardware virtualization (ImgTec
P5600/P6600/I6400 and Cavium Octeon III)
* PPC: in-kernel acceleration for VFIO
* s390: support for guests without storage keys; adapter interruption
suppression
* x86: usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits; emulation of CPL3 CPUID faulting
* generic: first part of VCPU thread request API; kvm_stat improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZEHUkAAoJEL/70l94x66DBeYH/09wrpJ2FjU4Rqv7FxmqgWfH
9WGi4wvn/Z+XzQSyfMJiu2SfZVzU69/Y67OMHudy7vBT6knB+ziM7Ntoiu/hUfbG
0g5KsDX79FW15HuvuuGh9kSjUsj7qsQdyPZwP4FW/6ZoDArV9mibSvdjSmiUSMV/
2wxaoLzjoShdOuCe9EABaPhKK0XCrOYkygT6Paz1pItDxaSn8iW3ulaCuWMprUfG
Niq+dFemK464E4yn6HVD88xg5j2eUM6bfuXB3qR3eTR76mHLgtwejBzZdDjLG9fk
32PNYKhJNomBxHVqtksJ9/7cSR6iNPs7neQ1XHemKWTuYqwYQMlPj1NDy0aslQU=
=IsiZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- HYP mode stub supports kexec/kdump on 32-bit
- improved PMU support
- virtual interrupt controller performance improvements
- support for userspace virtual interrupt controller (slower, but
necessary for KVM on the weird Broadcom SoCs used by the Raspberry
Pi 3)
MIPS:
- basic support for hardware virtualization (ImgTec P5600/P6600/I6400
and Cavium Octeon III)
PPC:
- in-kernel acceleration for VFIO
s390:
- support for guests without storage keys
- adapter interruption suppression
x86:
- usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits
- emulation of CPL3 CPUID faulting
generic:
- first part of VCPU thread request API
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
kvm: nVMX: Don't validate disabled secondary controls
KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
Revert "KVM: Support vCPU-based gfn->hva cache"
tools/kvm: fix top level makefile
KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
KVM: Documentation: remove VM mmap documentation
kvm: nVMX: Remove superfluous VMX instruction fault checks
KVM: x86: fix emulation of RSM and IRET instructions
KVM: mark requests that need synchronization
KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
KVM: add explicit barrier to kvm_vcpu_kick
KVM: perform a wake_up in kvm_make_all_cpus_request
KVM: mark requests that do not need a wakeup
KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
KVM: x86: always use kvm_make_request instead of set_bit
KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
s390: kvm: Cpu model support for msa6, msa7 and msa8
KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
kvm: better MWAIT emulation for guests
KVM: x86: virtualize cpuid faulting
...
Kexec sets up all identity mappings before booting into the new
kernel, and this will cause extra memory consumption for paging
structures which is quite considerable on modern machines with
huge memory sizes.
E.g. on a 32TB machine that is kdumping, it could waste around
128MB (around 4MB/TB) from the reserved memory after kexec sets
all the identity mappings using the current 2MB page.
Add to that the memory needed for the loaded kdump kernel, initramfs,
etc., and it causes a kexec syscall -NOMEM failure.
As a result, we had to enlarge reserved memory via "crashkernel=X"
to work around this problem.
This causes some trouble for distributions that use policies
to evaluate the proper "crashkernel=X" value for users.
So enable gbpages for kexec mappings.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kernel identity mappings on x86-64 kernels are created in two
ways: by the early x86 boot code, or by kernel_ident_mapping_init().
Native kernels (which is the dominant usecase) use the former,
but the kexec and the hibernation code uses kernel_ident_mapping_init().
There's a subtle difference between these two ways of how identity
mappings are created, the current kernel_ident_mapping_init() code
creates identity mappings always using 2MB page(PMD level) - while
the native kernel boot path also utilizes gbpages where available.
This difference is suboptimal both for performance and for memory
usage: kernel_ident_mapping_init() needs to allocate pages for the
page tables when creating the new identity mappings.
This patch adds 1GB page(PUD level) support to kernel_ident_mapping_init()
to address these concerns.
The primary advantage would be better TLB coverage/performance,
because we'd utilize 1GB TLBs instead of 2MB ones.
It is also useful for machines with large number of memory to
save paging structure allocations(around 4MB/TB using 2MB page)
when setting identity mappings for all the memory, after using
1GB page it will consume only 8KB/TB.
( Note that this change alone does not activate gbpages in kexec,
we are doing that in a separate patch. )
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The compressed boot function error() is used to halt execution, but it
wasn't marked with "noreturn". This fixes that in preparation for
supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation
on panic, and calls error(). GCC would warn about a noreturn function
calling a non-noreturn function:
arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’:
arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return
}
^
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20170506045116.GA2879@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Region media error reporting: A libnvdimm region device is the parent
to one or more namespaces. To date, media errors have been reported via
the "badblocks" attribute attached to pmem block devices for namespaces
in "raw" or "memory" mode. Given that namespaces can be in "device-dax"
or "btt-sector" mode this new interface reports media errors
generically, i.e. independent of namespace modes or state. This
subsequently allows userspace tooling to craft "ACPI 6.1 Section
9.20.7.6 Function Index 4 - Clear Uncorrectable Error" requests and
submit them via the ioctl path for NVDIMM root bus devices.
* Introduce 'struct dax_device' and 'struct dax_operations': Prompted by
a request from Linus and feedback from Christoph this allows for dax
capable drivers to publish their own custom dax operations. This fixes
the broken assumption that all dax operations are related to a
persistent memory device, and makes it easier for other architectures
and platforms to add customized persistent memory support.
* 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
available for storage appliance applications to manually trigger memory
controllers to drain write-pending buffers that would otherwise be
flushed automatically by the platform ADR (asynchronous-DRAM-refresh)
mechanism at a power loss event. Support for "locked" DIMMs is included
to prevent namespaces from surfacing when the namespace label data area
is locked. Finally, fixes for various reported deadlocks and crashes,
also tagged for -stable.
* ACPI / nfit driver updates: General updates of the nfit driver to add
DSM command overrides, ACPI 6.1 health state flags support, DSM payload
debug available by default, and various fixes.
Acknowledgements that came after the branch was pushed:
commmit 565851c972 "device-dax: fix sysfs attribute deadlock"
Tested-by: Yi Zhang <yizhan@redhat.com>
commit 23f4984483 "libnvdimm: rework region badblocks clearing"
Tested-by: Toshi Kani <toshi.kani@hpe.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZDONJAAoJEB7SkWpmfYgC3SsP/2KrLvTUcz646ViuPOgZ2cC4
W6wAx6cvDSt+H52kLnFEsYoFt7WAj20ggPirb/Bc5jkGlvwE0lT9Xtmso9GpVkYT
J9ZJ9pP/4YaAD3II1gmTwaUjYi0FxoOdx3Eb92yuWkO/8ylz4b2Nu3cBpYwyziGQ
nIfEVwDXRLE86u6x0bWuf6TlVuvsbdiAI55CDqDMVQC6xIOLbSez7b8QIHlpiKEb
Mw+xqdQva0esoreZEOXEhWNO+qtfILx8/ceBEGTNMp4e/JjZ2FbrSNplM+9bH5k7
ywqP8lW+mBEw0fmBBkYoVG/xyesiiBb55JLnbi8Ew+7IUxw8a3iV7wftRi62lHcK
zAjsHe4L+MansgtZsCL8wluvIPaktAdtB4xr7l9VNLKRYRUG73jEWU0gcUNryHIL
BkQJ52pUS1PkClyAsWbBBHl1I/CvzVPd21VW0YELmLR4OywKy1c+eKw2bcYgjrb4
59HZSv6S6EoKaQC+2qvVNpePil7cdfg5V2ubH/ki9HoYVyoxDptEWHnvf0NNatIH
Y7mNcOPvhOksJmnKSyHbDjtRur7WoHIlC9D7UjEFkSBWsKPjxJHoidN4SnCMRtjQ
WKQU0seoaKj04b68Bs/Qm9NozVgnsPFIUDZeLMikLFX2Jt7YSPu+Jmi2s4re6WLh
TmJQ3Ly9t3o3/weHSzmn
=Ox0s
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The bulk of this has been in multiple -next releases. There were a few
late breaking fixes and small features that got added in the last
couple days, but the whole set has received a build success
notification from the kbuild robot.
Change summary:
- Region media error reporting: A libnvdimm region device is the
parent to one or more namespaces. To date, media errors have been
reported via the "badblocks" attribute attached to pmem block
devices for namespaces in "raw" or "memory" mode. Given that
namespaces can be in "device-dax" or "btt-sector" mode this new
interface reports media errors generically, i.e. independent of
namespace modes or state.
This subsequently allows userspace tooling to craft "ACPI 6.1
Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
requests and submit them via the ioctl path for NVDIMM root bus
devices.
- Introduce 'struct dax_device' and 'struct dax_operations': Prompted
by a request from Linus and feedback from Christoph this allows for
dax capable drivers to publish their own custom dax operations.
This fixes the broken assumption that all dax operations are
related to a persistent memory device, and makes it easier for
other architectures and platforms to add customized persistent
memory support.
- 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
available for storage appliance applications to manually trigger
memory controllers to drain write-pending buffers that would
otherwise be flushed automatically by the platform ADR
(asynchronous-DRAM-refresh) mechanism at a power loss event.
Support for "locked" DIMMs is included to prevent namespaces from
surfacing when the namespace label data area is locked. Finally,
fixes for various reported deadlocks and crashes, also tagged for
-stable.
- ACPI / nfit driver updates: General updates of the nfit driver to
add DSM command overrides, ACPI 6.1 health state flags support, DSM
payload debug available by default, and various fixes.
Acknowledgements that came after the branch was pushed:
- commmit 565851c972 "device-dax: fix sysfs attribute deadlock":
Tested-by: Yi Zhang <yizhan@redhat.com>
- commit 23f4984483 "libnvdimm: rework region badblocks clearing"
Tested-by: Toshi Kani <toshi.kani@hpe.com>"
* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
libnvdimm, pfn: fix 'npfns' vs section alignment
libnvdimm: handle locked label storage areas
libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
brd: fix uninitialized use of brd->dax_dev
block, dax: use correct format string in bdev_dax_supported
device-dax: fix sysfs attribute deadlock
libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
libnvdimm: rework region badblocks clearing
acpi, nfit: kill ACPI_NFIT_DEBUG
libnvdimm: fix clear length of nvdimm_forget_poison()
libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
libnvdimm, region: sysfs trigger for nvdimm_flush()
libnvdimm: fix phys_addr for nvdimm_clear_poison
x86, dax, pmem: remove indirection around memcpy_from_pmem()
block: remove block_device_operations ->direct_access()
block, dax: convert bdev_dax_supported() to dax_direct_access()
filesystem-dax: convert to dax_direct_access()
Revert "block: use DAX for partition table reads"
ext2, ext4, xfs: retrieve dax_device for iomap operations
...
Routines that are set by xen_init_time_ops() use shared_info's
pvclock_vcpu_time_info area. This area is not properly available until
shared_info is mapped in xen_setup_shared_info().
This became especially problematic due to commit dd759d93f4 ("x86/timers:
Add simple udelay calibration") where we end up reading tsc_to_system_mul
from xen_dummy_shared_info (i.e. getting zero value) and then trying
to divide by it in pvclock_tsc_khz().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Commit 690b7f10b4f9f ("x86/xen: use capabilities instead of fake cpuid
values for xsave") introduced a regression as it tried to make use of
the fixup feature before it being available.
Fall back to the old variant testing via cpuid().
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
According to the SDM, if the "activate secondary controls" primary
processor-based VM-execution control is 0, no checks are performed on
the secondary processor-based VM-execution controls.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The constraint "rm" allows the compiler to put mix_const into memory.
When the input operand is a memory location then MUL needs an operand
size suffix, since Clang can't infer the multiplication width from the
operand.
Add and use the _ASM_MUL macro which determines the operand size and
resolves to the NUL instruction with the corresponding suffix.
This fixes the following error when building with clang:
CC arch/x86/lib/kaslr.o
/tmp/kaslr-dfe1ad.s: Assembler messages:
/tmp/kaslr-dfe1ad.s:182: Error: no instruction mnemonic suffix given and no register operands; can't size instruction
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Davidson <md@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170501224741.133938-1-mka@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jeff Moyer reported that on his system with two memory regions 0~64G and
1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling KASLR
will make the system hang intermittently during boot. While adding 'nokaslr'
won't.
The back trace is:
Oops: 0000 [#1] SMP
RIP: memcpy_erms()
[ .... ]
Call Trace:
pmem_rw_page()
bdev_read_page()
do_mpage_readpage()
mpage_readpages()
blkdev_readpages()
__do_page_cache_readahead()
force_page_cache_readahead()
page_cache_sync_readahead()
generic_file_read_iter()
blkdev_read_iter()
__vfs_read()
vfs_read()
SyS_read()
entry_SYSCALL_64_fastpath()
This crash happens because the for loop count calculation in sync_global_pgds()
is not correct. When a mapping area crosses PGD entries, we should
calculate the starting address of region which next PGD covers and assign
it to next for loop count, but not add PGDIR_SIZE directly. The old
code works right only if the mapping area is an exact multiple of PGDIR_SIZE,
otherwize the end region could be skipped so that it can't be synchronized
to all other processes from kernel PGD init_mm.pgd.
In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it
makes this area be mapped inside one PGD entry. With KASLR enabled,
this area could cross two PGD entries, then the next PGD entry won't
be synced to all other processes. That is why we saw empty PGD.
Fix it.
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andrey Konovalov reported the following warning while fuzzing the kernel
with syzkaller:
WARNING: kernel stack regs at ffff8800686869f8 in a.out:4933 has bad 'bp' value c3fc855a10167ec0
The unwinder dump revealed that RBP had a bad value when an interrupt
occurred in csum_partial_copy_generic().
That function saves RBP on the stack and then overwrites it, using it as
a scratch register. That's problematic because it breaks stack traces
if an interrupt occurs in the middle of the function.
Replace the usage of RBP with another callee-saved register (R15) so
stack traces are no longer affected.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: linux-sctp@vger.kernel.org
Cc: netdev <netdev@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/4b03a961efda5ec9bfe46b7b9c9ad72d1efad343.1493909486.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Here is the big set of new char/misc driver drivers and features for
4.12-rc1.
There's lots of new drivers added this time around, new firmware drivers
from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and
a bunch of other driver updates. Nothing major, except if you happen to
have the hardware for these drivers, and then you will be happy :)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWQvAgg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yknsACgzkAeyz16Z97J3UTaeejbR7nKUCAAoKY4WEHY
8O9f9pr9gj8GMBwxeZQa
=OIfB
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of new char/misc driver drivers and features for
4.12-rc1.
There's lots of new drivers added this time around, new firmware
drivers from Google, more auxdisplay drivers, extcon drivers, fpga
drivers, and a bunch of other driver updates. Nothing major, except if
you happen to have the hardware for these drivers, and then you will
be happy :)
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
firmware: google memconsole: Fix return value check in platform_memconsole_init()
firmware: Google VPD: Fix return value check in vpd_platform_init()
goldfish_pipe: fix build warning about using too much stack.
goldfish_pipe: An implementation of more parallel pipe
fpga fr br: update supported version numbers
fpga: region: release FPGA region reference in error path
fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe()
mei: drop the TODO from samples
firmware: Google VPD sysfs driver
firmware: Google VPD: import lib_vpd source files
misc: lkdtm: Add volatile to intentional NULL pointer reference
eeprom: idt_89hpesx: Add OF device ID table
misc: ds1682: Add OF device ID table
misc: tsl2550: Add OF device ID table
w1: Remove unneeded use of assert() and remove w1_log.h
w1: Use kernel common min() implementation
uio_mf624: Align memory regions to page size and set correct offsets
uio_mf624: Refactor memory info initialization
uio: Allow handling of non page-aligned memory regions
hangcheck-timer: Fix typo in comment
...
This pull requests represents a significantly larger and more complex set of
changes than those of prior merge windows. In particular, we had several changes
with dependencies on other subsystems which we felt were best managed through
merges of immutable branches, including one each from input, i2c, and leds. Two
patches for the watchdog subsystem are included after discussion with Wim and
Guenter following a collision in linux-next (this should be resolved and you
should only see these two appear in this pull request). These are called out in
the "External" section below.
Summary of changes:
- significant further cleanup of fujitsu-laptop and hp-wmi
- new model support for ideapad, asus, silead, and xiaomi
- new hotkeys for thinkpad and models using intel-vbtn
- dell keyboard backlight improvements
- build and dependency improvements
- intel * ipc fixes, cleanups, and api updates
- single isolated fixes noted below
External:
- watchdog: iTCO_wdt: Add PMC specific noreboot update api
- watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions
- Merge branch 'ib/4.10-sparse-keymap-managed'
- Merge branch 'i2c/for-INT33FE'
- Merge branch 'linux-leds/dell-laptop-changes-for-4.12'
platform/x86:
- Add Intel Cherry Trail ACPI INT33FE device driver
- remove sparse_keymap_free() calls
- Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD
asus-wmi:
- try to set als by default
- fix cpufv sysfs file permission
acer-wmi:
- setup accelerometer when ACPI device was found
ideapad-laptop:
- Add IdeaPad V310-15ISK to no_hw_rfkill
- Add IdeaPad 310-15IKB to no_hw_rfkill
intel_pmc_ipc:
- use gcr mem base for S0ix counter read
- Fix iTCO_wdt GCS memory mapping failure
- Add pmc gcr read/write/update api's
- fix gcr offset
dell-laptop:
- Add keyboard backlight timeout AC settings
- Handle return error form dell_get_intensity.
- Protect kbd_state against races
- Refactor kbd_led_triggers_store()
hp-wireless:
- reuse module_acpi_driver
- add Xiaomi's hardware id to the supported list
intel-vbtn:
- add volume up and down
INT33FE:
- add i2c dependency
hp-wmi:
- Cleanup exit paths
- Do not shadow errors in sysfs show functions
- Use DEVICE_ATTR_(RO|RW) helper macros
- Refactor dock and tablet state fetchers
- Cleanup wireless get_(hw|sw)state functions
- Refactor redundant HPWMI_READ functions
- Standardize enum usage for constants
- Cleanup local variable declarations
- Do not shadow error values
- Fix detection for dock and tablet mode
- Fix error value for hp_wmi_tablet_state
fujitsu-laptop:
- simplify error handling in acpi_fujitsu_laptop_add()
- do not log LED registration failures
- switch to managed LED class devices
- reorganize LED-related code
- refactor LED registration
- select LEDS_CLASS
- remove redundant fields from struct fujitsu_bl
- account for backlight power when determining brightness
- do not log set_lcd_level() failures in bl_update_status()
- ignore errors when setting backlight power
- make disable_brightness_adjust a boolean
- clean up use_alt_lcd_levels handling
- sync brightness in set_lcd_level()
- simplify set_lcd_level()
- merge set_lcd_level_alt() into set_lcd_level()
- switch to a managed backlight device
- only handle backlight when appropriate
- update debug message logged by call_fext_func()
- rename call_fext_func() arguments
- simplify call_fext_func()
- clean up local variables in call_fext_func()
- remove keycode fields from struct fujitsu_bl
- model-dependent sparse keymap overrides
- use a sparse keymap for hotkey event generation
- switch to a managed hotkey input device
- refactor hotkey input device setup
- use a sparse keymap for brightness key events
- switch to a managed backlight input device
- refactor backlight input device setup
- remove pf_device field from struct fujitsu_bl
- only register platform device if FUJ02E3 is present
- add and remove platform device in separate functions
- simplify platform device attribute definitions
- remove backlight-related attributes from the platform device
- cleanup error labels in fujitsu_init()
- only register backlight device if FUJ02B1 is present
- sync backlight power status in acpi_fujitsu_laptop_add()
- register backlight device in a separate function
- simplify brightness key event generation logic
- decrease indentation in acpi_fujitsu_bl_notify()
intel-hid:
- Add missing ->thaw callback
- do not set parents of input devices explicitly
- remove redundant set_bit() call
- use devm_input_allocate_device() for HID events input device
- make intel_hid_set_enable() take a boolean argument
- simplify enabling/disabling HID events
silead_dmi:
- Add touchscreen info for Surftab Wintron 7.0
- Abort early if DMI does not match
- Do not treat all devices as i2c_clients
- Add entry for Insyde 7W tablets
- Constify properties arrays
intel_scu_ipc:
- Introduce intel_scu_ipc_raw_command()
- Introduce SCU_DEVICE() macro
- Remove redundant subarch check
- Rearrange init sequence
- Platform data is mandatory
asus-nb-wmi:
- Add wapf4 quirk for the X302UA
dell-*:
- Call new led hw_changed API on kbd brightness change
- Add a generic dell-laptop notifier chain
eeepc-laptop:
- Skip unknown key messages 0x50 0x51
thinkpad_acpi:
- add mapping for new hotkeys
- guard generic hotkey case
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZCkCPAAoJEKbMaAwKp36452cH/Ahu1T6htVYo6HQ6nMp8FS9+
lOvUsjTSWenDNjXArOZFOXWA2fZM72aqabBYdMCb473lT1u9tV4sFLkmdMnMcUAk
4akOU5taXawvHUSIdpU6gAuAD8HIbo1Vl678KgLdo+PIM5RUwPj0mWYQ8nlSFgmV
QNlTlMVU9MrixHoCLixlBk9oZ2EKewS30+nMLwkY+x0sDS996C6X/OH/zo4/TC29
TUE2s9wvZ8OdCMRui9smWXqsVmI1dpWc1tF0Azi1HHNzCQeZBSoO8EzTh/WiYNzZ
5Wvcb1ch0JcVXy50eOAEHj1+Cgn25gp6aBV6F9aMK9k22BdYHJJy/B1VjaB7K6E=
=WFbm
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.12-1' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform-drivers update from Darren Hart:
"This represents a significantly larger and more complex set of changes
than those of prior merge windows.
In particular, we had several changes with dependencies on other
subsystems which we felt were best managed through merges of immutable
branches, including one each from input, i2c, and leds. Two patches
for the watchdog subsystem are included after discussion with Wim and
Guenter following a collision in linux-next (this should be resolved
and you should only see these two appear in this pull request). These
are called out in the "External" section below.
Summary of changes:
- significant further cleanup of fujitsu-laptop and hp-wmi
- new model support for ideapad, asus, silead, and xiaomi
- new hotkeys for thinkpad and models using intel-vbtn
- dell keyboard backlight improvements
- build and dependency improvements
- intel * ipc fixes, cleanups, and api updates
- single isolated fixes noted below
External:
- watchdog: iTCO_wdt: Add PMC specific noreboot update api
- watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions
- Merge branch 'ib/4.10-sparse-keymap-managed'
- Merge branch 'i2c/for-INT33FE'
- Merge branch 'linux-leds/dell-laptop-changes-for-4.12'
platform/x86:
- Add Intel Cherry Trail ACPI INT33FE device driver
- remove sparse_keymap_free() calls
- Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD
asus-wmi:
- try to set als by default
- fix cpufv sysfs file permission
acer-wmi:
- setup accelerometer when ACPI device was found
ideapad-laptop:
- Add IdeaPad V310-15ISK to no_hw_rfkill
- Add IdeaPad 310-15IKB to no_hw_rfkill
intel_pmc_ipc:
- use gcr mem base for S0ix counter read
- Fix iTCO_wdt GCS memory mapping failure
- Add pmc gcr read/write/update api's
- fix gcr offset
dell-laptop:
- Add keyboard backlight timeout AC settings
- Handle return error form dell_get_intensity.
- Protect kbd_state against races
- Refactor kbd_led_triggers_store()
hp-wireless:
- reuse module_acpi_driver
- add Xiaomi's hardware id to the supported list
intel-vbtn:
- add volume up and down
INT33FE:
- add i2c dependency
hp-wmi:
- Cleanup exit paths
- Do not shadow errors in sysfs show functions
- Use DEVICE_ATTR_(RO|RW) helper macros
- Refactor dock and tablet state fetchers
- Cleanup wireless get_(hw|sw)state functions
- Refactor redundant HPWMI_READ functions
- Standardize enum usage for constants
- Cleanup local variable declarations
- Do not shadow error values
- Fix detection for dock and tablet mode
- Fix error value for hp_wmi_tablet_state
fujitsu-laptop:
- simplify error handling in acpi_fujitsu_laptop_add()
- do not log LED registration failures
- switch to managed LED class devices
- reorganize LED-related code
- refactor LED registration
- select LEDS_CLASS
- remove redundant fields from struct fujitsu_bl
- account for backlight power when determining brightness
- do not log set_lcd_level() failures in bl_update_status()
- ignore errors when setting backlight power
- make disable_brightness_adjust a boolean
- clean up use_alt_lcd_levels handling
- sync brightness in set_lcd_level()
- simplify set_lcd_level()
- merge set_lcd_level_alt() into set_lcd_level()
- switch to a managed backlight device
- only handle backlight when appropriate
- update debug message logged by call_fext_func()
- rename call_fext_func() arguments
- simplify call_fext_func()
- clean up local variables in call_fext_func()
- remove keycode fields from struct fujitsu_bl
- model-dependent sparse keymap overrides
- use a sparse keymap for hotkey event generation
- switch to a managed hotkey input device
- refactor hotkey input device setup
- use a sparse keymap for brightness key events
- switch to a managed backlight input device
- refactor backlight input device setup
- remove pf_device field from struct fujitsu_bl
- only register platform device if FUJ02E3 is present
- add and remove platform device in separate functions
- simplify platform device attribute definitions
- remove backlight-related attributes from the platform device
- cleanup error labels in fujitsu_init()
- only register backlight device if FUJ02B1 is present
- sync backlight power status in acpi_fujitsu_laptop_add()
- register backlight device in a separate function
- simplify brightness key event generation logic
- decrease indentation in acpi_fujitsu_bl_notify()
intel-hid:
- Add missing ->thaw callback
- do not set parents of input devices explicitly
- remove redundant set_bit() call
- use devm_input_allocate_device() for HID events input device
- make intel_hid_set_enable() take a boolean argument
- simplify enabling/disabling HID events
silead_dmi:
- Add touchscreen info for Surftab Wintron 7.0
- Abort early if DMI does not match
- Do not treat all devices as i2c_clients
- Add entry for Insyde 7W tablets
- Constify properties arrays
intel_scu_ipc:
- Introduce intel_scu_ipc_raw_command()
- Introduce SCU_DEVICE() macro
- Remove redundant subarch check
- Rearrange init sequence
- Platform data is mandatory
asus-nb-wmi:
- Add wapf4 quirk for the X302UA
dell-*:
- Call new led hw_changed API on kbd brightness change
- Add a generic dell-laptop notifier chain
eeepc-laptop:
- Skip unknown key messages 0x50 0x51
thinkpad_acpi:
- add mapping for new hotkeys
- guard generic hotkey case"
* tag 'platform-drivers-x86-v4.12-1' of git://git.infradead.org/linux-platform-drivers-x86: (108 commits)
platform/x86: Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD
platform/x86: asus-wmi: try to set als by default
platform/x86: asus-wmi: fix cpufv sysfs file permission
platform/x86: acer-wmi: setup accelerometer when ACPI device was found
platform/x86: ideapad-laptop: Add IdeaPad V310-15ISK to no_hw_rfkill
platform/x86: intel_pmc_ipc: use gcr mem base for S0ix counter read
platform/x86: intel_pmc_ipc: Fix iTCO_wdt GCS memory mapping failure
watchdog: iTCO_wdt: Add PMC specific noreboot update api
watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions
platform/x86: intel_pmc_ipc: Add pmc gcr read/write/update api's
platform/x86: intel_pmc_ipc: fix gcr offset
platform/x86: dell-laptop: Add keyboard backlight timeout AC settings
platform/x86: dell-laptop: Handle return error form dell_get_intensity.
platform/x86: hp-wireless: reuse module_acpi_driver
platform/x86: intel-vbtn: add volume up and down
platform/x86: INT33FE: add i2c dependency
platform/x86: hp-wmi: Cleanup exit paths
platform/x86: hp-wmi: Do not shadow errors in sysfs show functions
platform/x86: hp-wmi: Use DEVICE_ATTR_(RO|RW) helper macros
platform/x86: hp-wmi: Refactor dock and tablet state fetchers
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJZChTBAAoJELDendYovxMvkXEIAJDpK5UKMsL1Ihgc0DL0OujQ
UGxLfWJueSA1X7i8BgL/8vfgKxSEB9SUiM+ooHOKXS6oDhyk2RP4MuCe5+lhUbbv
ZMK5KxHMlVUOD9EjYif8DhhiwRowBbWYEwr8XgY12s0Ya0a9TQLVC+noGsuzqNiH
1UyzeeWlBae4nulUMMim6urPNq5AEPVeQKNX3S8rlnDp74IKVZuoISMM62b2KRSr
+R8FVBshXR/HO53YNY0+AfmmUa8T1+dyjL50Eo/QnsG0i+3igOqNrzSKSc6T+nBt
Zl3KDUE5W3/OlxuR+CIdZZ1KKtjzoAiR3cvVlHs2z7MIio87bJcYJforAqe6Evo=
=k6in
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Xen fixes and featrues for 4.12. The main changes are:
- enable building the kernel with Xen support but without enabling
paravirtualized mode (Vitaly Kuznetsov)
- add a new 9pfs xen frontend driver (Stefano Stabellini)
- simplify Xen's cpuid handling by making use of cpu capabilities
(Juergen Gross)
- add/modify some headers for new Xen paravirtualized devices
(Oleksandr Andrushchenko)
- EFI reset_system support under Xen (Julien Grall)
- and the usual cleanups and corrections"
* tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (57 commits)
xen: Move xen_have_vector_callback definition to enlighten.c
xen: Implement EFI reset_system callback
arm/xen: Consolidate calls to shutdown hypercall in a single helper
xen: Export xen_reboot
xen/x86: Call xen_smp_intr_init_pv() on BSP
xen: Revert commits da72ff5bfc and 72a9b18629
xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams()
xen/scsifront: use offset_in_page() macro
xen/arm,arm64: rename __generic_dma_ops to xen_get_dma_ops
xen/arm,arm64: fix xen_dma_ops after 815dd18 "Consolidate get_dma_ops..."
xen/9pfs: select CONFIG_XEN_XENBUS_FRONTEND
x86/cpu: remove hypervisor specific set_cpu_features
vmware: set cpu capabilities during platform initialization
x86/xen: use capabilities instead of fake cpuid values for xsave
x86/xen: use capabilities instead of fake cpuid values for x2apic
x86/xen: use capabilities instead of fake cpuid values for mwait
x86/xen: use capabilities instead of fake cpuid values for acpi
x86/xen: use capabilities instead of fake cpuid values for acc
x86/xen: use capabilities instead of fake cpuid values for mtrr
x86/xen: use capabilities instead of fake cpuid values for aperf
...
o Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter
o The rewrite was needed to add plugins to be unique to tracing instances.
i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter
The old way was written very hacky. This removes a lot of those hacks.
o New "function-fork" tracing option. When set, pids in the set_ftrace_pid
will have their children added when the processes with their pids
listed in the set_ftrace_pid file forks.
o Exposure of "maxactive" for kretprobe in kprobe_events
o Allow for builtin init functions to be traced by the function tracer
(via the kernel command line). Module init function tracing will come
in the next release.
o Added more selftests, and have selftests also test in an instance.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJZCRchFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
zuIH/RsLUb8Hj6GmhAvn/tblUDzWyqlXX2h79VVlo/XrWayHYNHnKOmua1WwMZC6
xESXb/AffAc89VWTkKsrwaK7yfRPG6+w8zTZOcFuXSBpqSGG/oey9Fxj5Wqqpche
oJ2UY7ngxANAipkP5GxdYTafFSoWhGZGfUUtW+5tAHoFHzqO2lOjO8olbXP69sON
kVX/b461S20cVvRe5H/F0klXLSc37Tlp5YznXy4H4V4HcJSN1Fb6/uozOXALZ4se
SBpVMWmVVoGJorzj+ic7gVOeohvC8RnR400HbeMVwaI0Lj50noidDj/5Hv8F7T+D
h1B8vATNZLFAFUOSHINCBIu6Vj0=
=t8mg
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New features for this release:
- Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter
- The rewrite was needed to add plugins to be unique to tracing
instances. i.e. mkdir instance/foo; cd instances/foo; echo
do_IRQ:stacktrace > set_ftrace_filter The old way was written very
hacky. This removes a lot of those hacks.
- New "function-fork" tracing option. When set, pids in the
set_ftrace_pid will have their children added when the processes
with their pids listed in the set_ftrace_pid file forks.
- Exposure of "maxactive" for kretprobe in kprobe_events
- Allow for builtin init functions to be traced by the function
tracer (via the kernel command line). Module init function tracing
will come in the next release.
- Added more selftests, and have selftests also test in an instance"
* tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits)
ring-buffer: Return reader page back into existing ring buffer
selftests: ftrace: Allow some event trigger tests to run in an instance
selftests: ftrace: Have some basic tests run in a tracing instance too
selftests: ftrace: Have event tests also run in an tracing instance
selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances
selftests: ftrace: Allow some tests to be run in a tracing instance
tracing/ftrace: Allow for instances to trigger their own stacktrace probes
tracing/ftrace: Allow for the traceonoff probe be unique to instances
tracing/ftrace: Enable snapshot function trigger to work with instances
tracing/ftrace: Allow instances to have their own function probes
tracing/ftrace: Add a better way to pass data via the probe functions
ftrace: Dynamically create the probe ftrace_ops for the trace_array
tracing: Pass the trace_array into ftrace_probe_ops functions
tracing: Have the trace_array hold the list of registered func probes
ftrace: If the hash for a probe fails to update then free what was initialized
ftrace: Have the function probes call their own function
ftrace: Have each function probe use its own ftrace_ops
ftrace: Have unregister_ftrace_function_probe_func() return a value
ftrace: Add helper function ftrace_hash_move_and_update_ops()
ftrace: Remove data field from ftrace_func_probe structure
...
This is broken since ever but sadly nobody noticed.
Recent versions of GDB set DR_CONTROL unconditionally and
UML dies due to a heap corruption. It turns out that
the PTRACE_POKEUSER was copy&pasted from i386 and assumes
that addresses are 4 bytes long.
Fix that by using 8 as address size in the calculation.
Cc: <stable@vger.kernel.org>
Reported-by: jie cao <cj3054@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZCTzvAAoJEAx081l5xIa+9kcQAJsQiija4/7QGx6IzakOMqjx
WulJ3zYG/cU/HLwCBcuWRDF6wAj+7iWNeLCPmolHwEazcI8tQVdgMlWtbdMbDh8U
ckzD3FBXsEVfIfab+u6tyoUkm3l/VDhMXbjkUK7NTo/+dkRqe5LuFfZPCGN09jft
Y+5salkRXzDhXPSFsqmjfzhx1v7PTgf0a5HUenKWEWOv+sJQaW4/iPvcDSIcg5qR
l9WjAqro1NpFYhUodnh6DkLeledL1U5whdtp/yvrUAck8y+WP/jwGYmQ7pZ0UkQm
f0M3kV6K67ox9eqN++jsGX5o8sB1qF01Uh95kBAnyzYzsw4ZlMCx6pV7PDX+J88M
UBNMEqX10hrLkNJA9lGjPWx+/6fudcwg9anKvTRO3Uyx7MbYoJAgjzAM+yBqqtV0
8Otxa4Bw0V2pmUD+0lqJDERRvE77VCXkLb8SaI5lQo0MHpQqT2cZA+GD+B+rZHO6
Ie5LDFY87vM2GG1IECufG+xOa3v6sn2FfQ1ouu1KNGKOAMBKcQCQyQx3kGVuNW2i
HDACVXALJgXdRlVLm4jydOCZdRoguX7AWmRjtdwxgaO+lBcGfLhkXdjLQ7Ho+29p
32ArJfkZPfA53vMB6lHxAfbtrs1q2RzyVnPHj/KqeJnGZbABKTsF2HQ5BQc4Xq/J
mqXoz6Oubdvk4Pwyx7Ne
=UxFF
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux
Pull drm u pdates from Dave Airlie:
"This is the main drm pull request for v4.12. Apart from two fixes
pulls, everything should have been in drm-next for at least 2 weeks.
The biggest thing in here is AMD released the public headers for their
upcoming VEGA GPUs. These as always are quite a sizeable chunk of
header files. They've also added initial non-display support for those
GPUs, though they aren't available in production yet.
Otherwise it's pretty much normal.
New bridge drivers:
- megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++
- generic LVDS bridge support.
Core:
- Displayport link train failure reporting to userspace
- debugfs interface cleaned up
- subsystem TODO in kerneldoc now
- Extended fbdev support (flipping and vblank wait)
- drm_platform removed
- EDP CRC support in helper
- HF-VSDB SCDC support in EDID parser
- Lots of code cleanups and header extraction
- Thunderbolt external GPU awareness
- Atomic helper improvements
- Documentation improvements
panel:
- Sitronix and Samsung new panel support
amdgpu:
- Preliminary vega10 support
- Multi-level page table support
- GPU sensor support for userspace
- PRT support for sparse buffers
- SR-IOV improvements
- Non-contig VRAM CPU mapping
i915:
- Atomic modesetting enabled by default on Gen5+
- LSPCON improvements
- Atomic state handling for cdclk
- GPU reset improvements
- In-kernel unit tests
- Geminilake improvements and color manager support
- Designware i2c fixes
- vblank evasion improvements
- Hotplug safe connector iterators
- GVT scheduler QoS support
- GVT Kabylake support
nouveau:
- Acceleration support for Pascal (GP10x).
- Rearchitecture of code handling proprietary signed firmware
- Fix GTX 970 with odd MMU configuration
- GP10B support
- GP107 acceleration support
vmwgfx:
- Atomic modesetting support for vmwgfx
omapdrm:
- Support for render nodes
- Refactor omapdss code
- Fix some probe ordering issues
- Fix too dark RGB565 rendering
sunxi:
- prelim rework for multiple pipes.
mali-dp:
- Color management support
- Plane scaling
- Power management improvements
imx-drm:
- Prefetch Resolve Engine/Gasket on i.MX6QP
- Deferred plane disabling
- Separate alpha support
mediatek:
- Mediatek SoC MT2701 support
rcar-du:
- Gen3 HDMI support
msm:
- 4k support for newer chips
- OPP bindings for gpu
- prep work for per-process pagetables
vc4:
- HDMI audio support
- fixes
qxl:
- minor fixes.
dw-hdmi:
- PHY improvements
- CSC fixes
- Amlogic GX SoC support"
* tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits)
drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection
drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr()
drm/nouveau/kms: Increase max retries in scanout position queries.
drm/nouveau/bios/bitP: check that table is long enough for optional pointers
drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine
drm: mali-dp: use div_u64 for expensive 64-bit divisions
drm/i915: Confirm the request is still active before adding it to the await
drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio
drm/i915/selftests: Allocate inode/file dynamically
drm/i915: Fix system hang with EI UP masked on Haswell
drm/i915: checking for NULL instead of IS_ERR() in mock selftests
drm/i915: Perform link quality check unconditionally during long pulse
drm/i915: Fix use after free in lpe_audio_platdev_destroy()
drm/i915: Use the right mapping_gfp_mask for final shmem allocation
drm/i915: Make legacy cursor updates more unsynced
drm/i915: Apply a cond_resched() to the saturated signaler
drm/i915: Park the signaler before sleeping
drm: mali-dp: Check the mclk rate and allow up/down scaling
drm: mali-dp: Enable image enhancement when scaling
drm: mali-dp: Add plane upscaling support
...
Fixes:
- Support setting probes in versioned user space symbols, such as
pthread_create@@GLIBC_2.1, picking the default one, more work
needed to make it possible to set it on the other versions, as
the 'perf probe' syntax already uses @ for other purposes.
(Paul Clarke)
- Do not special case address zero as an error for routines that
return addresses (symbol lookup), instead use the return as the
success/error indication and pass a pointer to return the address,
fixing 'perf test vmlinux' (the one that compares address between
vmlinux and kallsyms) on s/390, where the '_text' address is equal
to zero (Arnaldo Carvalho de Melo)
Infrastructure:
- More header sanitization, moving stuff out of util.h into
more appropriate headers and objects and sometimes creating
new ones (Arnaldo Carvalho de Melo)
- Refactor a duplicated code for obtaining config file name (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZCd/DAAoJENZQFvNTUqpAHLkP/i834z2r9/CQBIMiPOq3ciCd
W3K4JsHp3IGFg2rFwit6QnRtTycaZyhupBBNnBj+4OLLT5zujemP3VHbLRf3rvrY
Xhx7dlpSYkvpDXOB4lUElrIknIp4jLO329aGW9plRk7vGYa2q97NfDcQYqwRKnd+
1Y4Z2Bg2ImTWhsrmD+YuI8MwzFcQnG5oAavbbXFP5Bnmorh56auJ4Y6doEThmVbC
T0CnYyG29i9KlN1pIm4CDpjVH/aGNZpGhKBJlYGhCWDgxQwstMY2bKwa+6VyITpv
FgtU/YKW9ebqT0v2nENjU2XAoFktd3Chn3b8nhuNqN3081mGvIdr4ugMuh7bP0k2
XGiO7ILQAfpO9b0uxGlUX9evvduvM7GMIwdRuJ/jurxxIn4cHy1i6rcU/l096Y0b
9s81bd11NyK4eE7c4Z1IX9JNV0Jw3Knb9B2XEHXfbOx4s7QPsNUQvE0zXUefwmS+
h0YZ1GcAwxIc92JC7gy2iuik1tJ18Nd8Y9/Qnfziem8AIVX205d4miEz9Zx1NUJI
pRB4CB9HnrdFZW1rgZ5ob53ToVTdFLAziKq2tEJPdCq2+e2VZfrb3KqeVeGvgRUN
xDRvTwc2rgeGynn80t/ShsSpbXPwnmbBapbp5MQdF5T5ObSQOnYVmIGQ3SN3ST5y
azaqjBjikhiPzxQJxIHM
=gqm+
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.12-20170503' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
Fixes:
- Support setting probes in versioned user space symbols, such as
pthread_create@@GLIBC_2.1, picking the default one, more work
needed to make it possible to set it on the other versions, as
the 'perf probe' syntax already uses @ for other purposes.
(Paul Clarke)
- Do not special case address zero as an error for routines that
return addresses (symbol lookup), instead use the return as the
success/error indication and pass a pointer to return the address,
fixing 'perf test vmlinux' (the one that compares address between
vmlinux and kallsyms) on s/390, where the '_text' address is equal
to zero (Arnaldo Carvalho de Melo)
Infrastructure changes:
- More header sanitization, moving stuff out of util.h into
more appropriate headers and objects and sometimes creating
new ones (Arnaldo Carvalho de Melo)
- Refactor a duplicated code for obtaining config file name (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit bbd6411513.
I've been sitting on this revert for too long and it unfortunately
missed 4.11. It's also the reason why I haven't merged ring-based
dirty tracking for 4.12.
Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and
kvm_vcpu_write_guest_offset_cached means that the MSR value can
now be used to access SMRAM, simply by making it point to an SMRAM
physical address. This is problematic because it lets the guest
OS overwrite memory that it shouldn't be able to touch.
Cc: stable@vger.kernel.org
Fixes: bbd6411513
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It appears as though the Broadwell-EP DRAM units share the special
units quirk with Haswell-EP/KNL.
Without this patch, you get really high results (a single DRAM using 20W
of power).
The powercap driver in drivers/powercap/intel_rapl.c already has this
change.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 84d582d236 ("xen: Revert commits da72ff5bfc and
72a9b186292d") defined xen_have_vector_callback in enlighten_hvm.c.
Since guest-type-neutral code refers to this variable this causes
build failures when CONFIG_XEN_PVHVM is not defined.
Moving xen_have_vector_callback definition to enlighten.c resolves
this issue.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull livepatch updates from Jiri Kosina:
- a per-task consistency model is being added for architectures that
support reliable stack dumping (extending this, currently rather
trivial set, is currently in the works).
This extends the nature of the types of patches that can be applied
by live patching infrastructure. The code stems from the design
proposal made [1] back in November 2014. It's a hybrid of SUSE's
kGraft and RH's kpatch, combining advantages of both: it uses
kGraft's per-task consistency and syscall barrier switching combined
with kpatch's stack trace switching. There are also a number of
fallback options which make it quite flexible.
Most of the heavy lifting done by Josh Poimboeuf with help from
Miroslav Benes and Petr Mladek
[1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
- module load time patch optimization from Zhou Chengming
- a few assorted small fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add missing printk newlines
livepatch: Cancel transition a safe way for immediate patches
livepatch: Reduce the time of finding module symbols
livepatch: make klp_mutex proper part of API
livepatch: allow removal of a disabled patch
livepatch: add /proc/<pid>/patch_state
livepatch: change to a per-task consistency model
livepatch: store function sizes
livepatch: use kstrtobool() in enabled_store()
livepatch: move patching functions into patch.c
livepatch: remove unnecessary object loaded check
livepatch: separate enabled and patched states
livepatch/s390: add TIF_PATCH_PENDING thread flag
livepatch/s390: reorganize TIF thread flag bits
livepatch/powerpc: add TIF_PATCH_PENDING thread flag
livepatch/x86: add TIF_PATCH_PENDING thread flag
livepatch: create temporary klp_update_patch_state() stub
x86/entry: define _TIF_ALLWORK_MASK flags explicitly
stacktrace/x86: add function for detecting reliable stack traces
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.12:
API:
- Add batch registration for acomp/scomp
- Change acomp testing to non-unique compressed result
- Extend algorithm name limit to 128 bytes
- Require setkey before accept(2) in algif_aead
Algorithms:
- Add support for deflate rfc1950 (zlib)
Drivers:
- Add accelerated crct10dif for powerpc
- Add crc32 in stm32
- Add sha384/sha512 in ccp
- Add 3des/gcm(aes) for v5 devices in ccp
- Add Queue Interface (QI) backend support in caam
- Add new Exynos RNG driver
- Add ThunderX ZIP driver
- Add driver for hardware random generator on MT7623 SoC"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (101 commits)
crypto: stm32 - Fix OF module alias information
crypto: algif_aead - Require setkey before accept(2)
crypto: scomp - add support for deflate rfc1950 (zlib)
crypto: scomp - allow registration of multiple scomps
crypto: ccp - Change ISR handler method for a v5 CCP
crypto: ccp - Change ISR handler method for a v3 CCP
crypto: crypto4xx - rename ce_ring_contol to ce_ring_control
crypto: testmgr - Allow ecb(cipher_null) in FIPS mode
Revert "crypto: arm64/sha - Add constant operand modifier to ASM_EXPORT"
crypto: ccp - Disable interrupts early on unload
crypto: ccp - Use only the relevant interrupt bits
hwrng: mtk - Add driver for hardware random generator on MT7623 SoC
dt-bindings: hwrng: Add Mediatek hardware random generator bindings
crypto: crct10dif-vpmsum - Fix missing preempt_disable()
crypto: testmgr - replace compression known answer test
crypto: acomp - allow registration of multiple acomps
hwrng: n2 - Use devm_kcalloc() in n2rng_probe()
crypto: chcr - Fix error handling related to 'chcr_alloc_shash'
padata: get_next is never NULL
crypto: exynos - Add new Exynos RNG driver
...
Pull fs/compat.c cleanups from Al Viro:
"More moving of compat syscalls from fs/compat.c to fs/*.c where the
native counterparts live.
And death to compat_sys_getdents64() - the only architecture that used
to need it was ia64, and _that_ has lost biarch support quite a few
years ago"
* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/compat.c: trim unused includes
move compat_rw_copy_check_uvector() over to fs/read_write.c
fhandle: move compat syscalls from compat.c
open: move compat syscalls from compat.c
stat: move compat syscalls from compat.c
fcntl: move compat syscalls from compat.c
readdir: move compat syscalls from compat.c
statfs: move compat syscalls from compat.c
utimes: move compat syscalls from compat.c
move compat select-related syscalls to fs/select.c
Remove compat_sys_getdents64()
- drop now unneeded is_vmalloc_or_module() check; Laura Abbott
- use enum instead of literals for stack frame API; Sahara
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJZB39WAAoJEIly9N/cbcAmVJ8QALC0teqysiyml1CuxruXoXNj
wPfwOJMypdXTYtL70ZKqi6Mboqrg01HTBeSNZjoNDvpHtsePlPVLjgDZ9ehcgokb
nTQ23zJguV0nOLn32yKSJ1KupuxGMzW9RtrjOWH6w8nixff42vCHANY8+j5/Nx4R
L4uLEPhA2ay35ddMeJMaNE8MAw7YS/C4enWu15CDbAjv++bVPoKwvqUchBoIPRx5
ZNjEUlAdnsv8IfccUea0Xz8CrBshe0kN4SGQvPqvaff2Orsk2FDHoK5wk6MaNN8L
Dx2yZI5vxPbe6JYVEhvUxxGevuhmouTXf3UxBShOaggc4++/nuJ75S/nDIBosGrs
EzWkRGn2JLr0+mKTCrjhbxBocstOsEIW6XSfEE2Sx4bBdj4LkcGoR/cCmTC8vjoL
82VaUnCVWyhwRgkowi4yJzE6iG5yQ8r6NpAPZsfYkgeOLFQ9uAy6pSceFRa1w38q
vrysB+e0Dof6HRCd3UvbvGo94+ev4yc8niS70nFsVGhntRQYgPxKPRrzW+HdyWVp
zA49P0FJgZu8a5jAbHwgv/J7ff2pfeM+ZhEX5XqR2EaMjAqLFI5QPJTFheSfjz6q
2Nbpbnq8PuIR4f1dgp3xbC1a2Lj8mzq+ek+SLMGAskMK+su8Niw38JQT/WGncqWy
H134mG6dbjGH2HhGOQjD
=zkvy
-----END PGP SIGNATURE-----
Merge tag 'usercopy-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy updates from Kees Cook:
"A couple hardened usercopy changes:
- drop now unneeded is_vmalloc_or_module() check (Laura Abbott)
- use enum instead of literals for stack frame API (Sahara)"
* tag 'usercopy-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mm/usercopy: Drop extra is_vmalloc_or_module() check
usercopy: Move enum for arch_within_stack_frames()
We needed the lock to avoid racing with creation of the irqchip on x86. As
kvm_set_irq_routing() calls srcu_synchronize_expedited(), this lock
might be held for a longer time.
Let's introduce an arch specific callback to check if we can actually
add irq routes. For x86, all we have to do is check if we have an
irqchip in the kernel. We don't need kvm->lock at that point as the
irqchip is marked as inititalized only when actually fully created.
Reported-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Fixes: 1df6ddede1 ("KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP")
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The helper xen_reboot will be called by the EFI code in a later patch.
Note that the ARM version does not yet exist and will be added in a
later patch too.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Recent code rework that split handling ov PV, HVM and PVH guests into
separate files missed calling xen_smp_intr_init_pv() on CPU0.
Add this call.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741)
established that commit 72a9b18629 ("xen: Remove event channel
notification through Xen PCI platform device") (and thus commit
da72ff5bfc ("partially revert "xen: Remove event channel
notification through Xen PCI platform device"")) are unnecessary and,
in fact, prevent HVM guests from booting on Xen releases prior to 4.0
Therefore we revert both of those commits.
The summary of that discussion is below:
Here is the brief summary of the current situation:
Before the offending commit (72a9b18629):
1) INTx does not work because of the reset_watches path.
2) The reset_watches path is only taken if you have Xen > 4.0
3) The Linux Kernel by default will use vector inject if the hypervisor
support. So even INTx does not work no body running the kernel with
Xen > 4.0 would notice. Unless he explicitly disabled this feature
either in the kernel or in Xen (and this can only be disabled by
modifying the code, not user-supported way to do it).
After the offending commit (+ partial revert):
1) INTx is no longer support for HVM (only for PV guests).
2) Any HVM guest The kernel will not boot on Xen < 4.0 which does
not have vector injection support. Since the only other mode
supported is INTx which.
So based on this summary, I think before commit (72a9b18629) we were
in much better position from a user point of view.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
e820 map is updated with information from the zeropage (i.e. pvh_bootparams)
by default_machine_specific_memory_setup(). With the way things are done
now, we end up with a duplicated e820 map.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
There is no user of x86_hyper->set_cpu_features() any more. Remove it.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
There is no need to set the same capabilities for each cpu
individually. This can be done for all cpus in platform initialization.
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the xsave feature availability is
indicated by special casing the related cpuid leaf.
Instead of delivering fake cpuid values set or clear the cpu
capability bits for xsave instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
When running as pv domain xen_cpuid() is being used instead of
native_cpuid(). In xen_cpuid() the x2apic feature is indicated as not
being present by special casing the related cpuid leaf.
Instead of delivering fake cpuid values clear the cpu capability bit
for x2apic instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>