Some issues have been reported with the for loop in stop_this_cpu() that
issues the 'wbinvd; hlt' sequence. Reverting this sequence to halt()
has been shown to resolve the issue.
However, the wbinvd is needed when running with SME. The reason for the
wbinvd is to prevent cache flush races between encrypted and non-encrypted
entries that have the same physical address. This can occur when
kexec'ing from memory encryption active to inactive or vice-versa. The
important thing is to not have outside of kernel text memory references
(such as stack usage), so the usage of the native_*() functions is needed
since these expand as inline asm sequences. So instead of reverting the
change, rework the sequence.
Move the wbinvd instruction outside of the for loop as native_wbinvd()
and make its execution conditional on X86_FEATURE_SME. In the for loop,
change the asm 'wbinvd; hlt' sequence back to a halt sequence but use
the native_halt() call.
Fixes: bba4ed011a ("x86/mm, kexec: Allow kexec to be used with SME")
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yu Chen <yu.c.chen@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kexec@lists.infradead.org
Cc: ebiederm@redhat.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180117234141.21184.44067.stgit@tlendack-t1.amdoffice.net
Keith reported an issue with vector space exhaustion on a server machine
which is caused by the i40e driver allocating 168 MSI interrupts when the
driver is initialized, even when most of these interrupts are not used at
all.
The x86 vector allocation code tries to avoid the immediate allocation with
the reservation mode, but the card uses MSI and does not support MSI entry
masking, which prevents reservation mode and requires immediate vector
allocation.
The matrix allocator is a bit naive and prefers the first CPU in the
cpumask which describes the possible target CPUs for an allocation. That
results in allocating all 168 vectors on CPU0 which later causes vector
space exhaustion when the NVMe driver tries to allocate managed interrupts
on each CPU for the per CPU queues.
Avoid this by finding the CPU which has the lowest vector allocation count
to spread out the non managed interrupt accross the possible target CPUs.
Fixes: 2f75d9e1c9 ("genirq: Implement bitmap matrix allocator")
Reported-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Keith Busch <keith.busch@intel.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801171557330.1777@nanos
Current code configures the hardware with a new SA before the state has been
fully initialized. During this time interval, an incoming ESP packet can cause
a crash due to a NULL dereference. More specifically, xfrm_input() considers
the packet as valid, and yet, anti-replay mechanism is not initialized.
Move hardware configuration to the end of xfrm_state_construct(), and mark
the state as valid once the SA is fully initialized.
Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Aviad Yehezkel <aviadye@mellnaox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If an invalid CANFD frame is received, from a driver or from a tun
interface, a Kernel warning is generated.
This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.
Reported-by: syzbot+e3b775f40babeff6e68b@syzkaller.appspotmail.com
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If an invalid CAN frame is received, from a driver or from a tun
interface, a Kernel warning is generated.
This patch replaces the WARN_ONCE by a simple pr_warn_once, so that a
kernel, bootet with panic_on_warn, does not panic. A printk seems to be
more appropriate here.
Reported-by: syzbot+4386709c0c1284dca827@syzkaller.appspotmail.com
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Final 4.15 drm-misc pull:
Just 3 sun4i patches to fix clock computation/checks.
* tag 'drm-misc-fixes-2018-01-17' of git://anongit.freedesktop.org/drm/drm-misc:
drm/sun4i: hdmi: Add missing rate halving check in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate
Last minute fixes for vmwgfx.
One fix for a drm helper warning introduced in 4.15
One important fix for a longer standing memory corruption issue on older
hardware versions.
* 'vmwgfx-fixes-4.15' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: fix memory corruption with legacy/sou connectors
drm/vmwgfx: Fix a boot time warning
syzkaller generated a BPF proglet and triggered a warning with
the following:
0: (b7) r0 = 0
1: (d5) if r0 s<= 0x0 goto pc+0
R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
2: (1f) r0 -= r1
R0=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
verifier internal error: known but bad sbounds
What happens is that in the first insn, r0's min/max value
are both 0 due to the immediate assignment, later in the jsle
test the bounds are updated for the min value in the false
path, meaning, they yield smin_val = 1, smax_val = 0, and when
ctx pointer is subtracted from r0, verifier bails out with the
internal error and throwing a WARN since smin_val != smax_val
for the known constant.
For min_val > max_val scenario it means that reg_set_min_max()
and reg_set_min_max_inv() (which both refine existing bounds)
demonstrated that such branch cannot be taken at runtime.
In above scenario for the case where it will be taken, the
existing [0, 0] bounds are kept intact. Meaning, the rejection
is not due to a verifier internal error, and therefore the
WARN() is not necessary either.
We could just reject such cases in adjust_{ptr,scalar}_min_max_vals()
when either known scalars have smin_val != smax_val or
umin_val != umax_val or any scalar reg with bounds
smin_val > smax_val or umin_val > umax_val. However, there
may be a small risk of breakage of buggy programs, so handle
this more gracefully and in adjust_{ptr,scalar}_min_max_vals()
just taint the dst reg as unknown scalar when we see ops with
such kind of src reg.
Reported-by: syzbot+6d362cadd45dc0a12ba4@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Running the following sequence is currently broken:
# tc qdisc add dev foo clsact
# tc filter replace dev foo ingress prio 1 handle 1 bpf da obj bar.o
# tc filter replace dev foo ingress prio 1 handle 1 bpf da obj bar.o
RTNETLINK answers: Invalid argument
The normal expectation on kernel side is that the second command
succeeds replacing the existing program. However, what happens is
in cls_bpf_change(), we bail out with err in the second run in
cls_bpf_offload(). The EINVAL comes directly in cls_bpf_offload()
when comparing prog vs oldprog's gen_flags. In case of above
replace the new prog's gen_flags are 0, but the old ones are 8,
which means TCA_CLS_FLAGS_NOT_IN_HW is set (e.g. drivers not having
cls_bpf offload).
Fix 102740bd94 ("cls_bpf: fix offload assumptions after callback
conversion") in the following way: gen_flags from user space passed
down via netlink cannot include status flags like TCA_CLS_FLAGS_IN_HW
or TCA_CLS_FLAGS_NOT_IN_HW as opposed to oldprog that we previously
loaded. Therefore, it doesn't make any sense to include them in the
gen_flags comparison with the new prog before we even attempt to
offload. Thus, lets fix this before 4.15 goes out.
Fixes: 102740bd94 ("cls_bpf: fix offload assumptions after callback conversion")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the receive queue for 4096 bytes fragments, the page address
set in the SW data0 field of the descriptor is not the one we got
when doing the reassembly in receive. The page structure was retrieved
from the wrong descriptor into SW data0 which is then causing a
page fault when UDP checksum is accessing data above 1500.
Signed-off-by: Rex Chang <rchang@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code copies directly from userspace to ctx->crypto_send, but
doesn't always reinitialize it to 0 on failure. This causes any
subsequent attempt to use this setsockopt to fail because of the
TLS_CRYPTO_INFO_READY check, eventhough crypto_info is not actually
ready.
This should result in a correctly set up socket after the 3rd call, but
currently it does not:
size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128);
struct tls12_crypto_info_aes_gcm_128 crypto_good = {
.info.version = TLS_1_2_VERSION,
.info.cipher_type = TLS_CIPHER_AES_GCM_128,
};
struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good;
crypto_bad_type.info.cipher_type = 42;
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s);
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1);
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s);
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
do_tls_setsockopt_tx returns 0 without doing anything when crypto_info
is already set. Silent failure is confusing for users.
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
During setsockopt(SOL_TCP, TLS_TX), if initialization of the software
context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't
reassign ctx->priv_ctx to NULL, so we can't even do another attempt to
set it up on the same socket, as it will fail with -EEXIST.
Fixes: 3c4d755915 ('tls: kernel TLS support')
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlpeDfMTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAe/sog7Dgc9ucfCACy1QfHf8WVpcdGzD4cpgIUXc4tp22E
n+sCE9/f2bnaz3V3/d3gNatPMiCcBs7oCAsj5yJGkDfG+TQo7oPsa0iacdkEN20J
p/OlxNeYUMAfUCzw73zC3WSsc8hiKGBxX6SzQNhLBpLOO8RFylHV98EnP1ugBEAV
qxr3bN0EVW/syvFGsXv9szdtslM2LocdayjWeAw2gizo8L5tNoLOAWDpwAkOI0bw
5J1BMOdehi3c6APDNttbjXRUbwlGNZMDBXj1fSs/7K9ngFOTfB1w1qpItI7yv5dM
wwnjH5rooKlaXkMkZkPL8wrD778NW99WUuXMspJYgNuhAECVRixqzNMb
=m1Js
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-4.15-20180116' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2018-01-16
this is a pull reqeust of a single patch for net/master:
This patch by Stephane Grosjean fixes a potential bug in the packet
fragmentation in the peak USB driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some iommu implementations can merge physically and/or virtually
contiguous segments inside sg_map_dma. The NVMe SGL support does not take
this into account and will warn because of falling off a loop. Pass the
number of mapped segments to nvme_pci_setup_sgls so that the SGL setup
can take the number of mapped segments into account.
Reported-by: Fangjian (Turing) <f.fangjian@huawei.com>
Fixes: a7a7cbe3 ("nvme-pci: add SGL support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@rimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The driver needs to verify there is a payload with a command before
seeing if it should use SGLs to map it.
Fixes: 955b1b5a00 ("nvme-pci: move use_sgl initialization to nvme_init_iod()")
Reported-by: Paul Menzel <pmenzel+linux-nvme@molgen.mpg.de>
Reviewed-by: Paul Menzel <pmenzel+linux-nvme@molgen.mpg.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Calling accept on a TCP socket with a TLS ulp attached results
in two sockets that share the same ulp context.
The ulp context is freed while a socket is destroyed, so
after one of the sockets is released, the second second will
trigger a use after free when it tries to access the ulp context
attached to it.
We restrict the TLS ulp to sockets in ESTABLISHED state
to prevent the scenario above.
Fixes: 3c4d755915 ("tls: kernel TLS support")
Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
r8153 on Dell TB15/16 dock corrupts rx packets.
This change is suggested by Realtek. They guess that the XHCI controller
doesn't have enough buffer, and their guesswork is correct, once the RX
aggregation gets disabled, the issue is gone.
ASMedia is currently working on a real sulotion for this issue.
Dell and ODM confirm the bcdDevice and iSerialNumber is unique for TB16.
Note that TB15 has different bcdDevice and iSerialNumber, which are not
unique values. If you still have TB15, please contact Dell to replace it
with TB16.
BugLink: https://bugs.launchpad.net/bugs/1729674
Cc: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- A rather involved set of memory hardware encryption fixes to
support the early loading of microcode files via the initrd. These
are larger than what we normally take at such a late -rc stage, but
there are two mitigating factors: 1) much of the changes are
limited to the SME code itself 2) being able to early load
microcode has increased importance in the post-Meltdown/Spectre
era.
- An IRQ vector allocator fix
- An Intel RDT driver use-after-free fix
- An APIC driver bug fix/revert to make certain older systems boot
again
- A pkeys ABI fix
- TSC calibration fixes
- A kdump fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/vector: Fix off by one in error path
x86/intel_rdt/cqm: Prevent use after free
x86/mm: Encrypt the initrd earlier for BSP microcode update
x86/mm: Prepare sme_encrypt_kernel() for PAGE aligned encryption
x86/mm: Centralize PMD flags in sme_encrypt_kernel()
x86/mm: Use a struct to reduce parameters for SME PGD mapping
x86/mm: Clean up register saving in the __enc_copy() assembly code
x86/idt: Mark IDT tables __initconst
Revert "x86/apic: Remove init_bsp_APIC()"
x86/mm/pkeys: Fix fill_sig_info_pkey
x86/tsc: Print tsc_khz, when it differs from cpu_khz
x86/tsc: Fix erroneous TSC rate on Skylake Xeon
x86/tsc: Future-proof native_calibrate_tsc()
kdump: Write the correct address of mem_section into vmcoreinfo
Pull scheduler fix from Ingo Molnar:
"A delayacct statistics correctness fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
delayacct: Account blkio completion on the correct task
Pull x86 perf fix from Ingo Molnar:
"An Intel RAPL events fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Fix Haswell and Broadwell server RAPL event
tfile->tun could be detached before we close the tun fd,
via tun_detach_all(), so it should not be used to check for
tfile->tx_array.
As Jason suggested, we probably have to clean it up
unconditionally both in __tun_deatch() and tun_detach_all(),
but this requires to check if it is initialized or not.
Currently skb_array_cleanup() doesn't have such a check,
so I check it in the caller and introduce a helper function,
it is a bit ugly but we can always improve it in net-next.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 1576d98605 ("tun: switch to use skb array for tx")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 pti bits and fixes from Thomas Gleixner:
"This last update contains:
- An objtool fix to prevent a segfault with the gold linker by
changing the invocation order. That's not just for gold, it's a
general robustness improvement.
- An improved error message for objtool which spares tearing hairs.
- Make KASAN fail loudly if there is not enough memory instead of
oopsing at some random place later
- RSB fill on context switch to prevent RSB underflow and speculation
through other units.
- Make the retpoline/RSB functionality work reliably for both Intel
and AMD
- Add retpoline to the module version magic so mismatch can be
detected
- A small (non-fix) update for cpufeatures which prevents cpu feature
clashing for the upcoming extra mitigation bits to ease
backporting"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
module: Add retpoline tag to VERMAGIC
x86/cpufeature: Move processor tracing out of scattered features
objtool: Improve error message for bad file argument
objtool: Fix seg fault with gold linker
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
x86/retpoline: Fill RSB on context switch for affected CPUs
x86/kasan: Panic if there is not enough memory to boot
Pull timer fix from Thomas Gleixner:
"A one-liner fix which prevents deferrable timers becoming stale when
the system does not switch into NOHZ mode"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Unconditionally check deferrable base
As per 90caccdd8c ("bpf: fix bpf_tail_call() x64 JIT"), the index used
for array lookup is defined to be 32-bit wide. Update a misleading
comment that suggests it is 64-bit wide.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When the source and destination register are identical, our JIT does not
generate correct code, which leads to kernel oopses.
Fix this by (a) generating more efficient code, and (b) making use of
the temporary earlier if we will overwrite the address register.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
When an eBPF program tail-calls another eBPF program, it enters it after
the prologue to avoid having complex stack manipulations. This can lead
to kernel oopses, and similar.
Resolve this by always using a fixed stack layout, a CPU register frame
pointer, and using this when reloading registers before returning.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The stack layout documentation incorrectly suggests that the BPF JIT
scratch space starts immediately below BPF_FP. This is not correct,
so let's fix the documentation to reflect reality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Move the stack documentation towards the top of the file, where it's
relevant for things like the register layout.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
As per 2dede2d8e9 ("ARM EABI: stack pointer must be 64-bit aligned
after a CPU exception") the stack should be aligned to a 64-bit boundary
on EABI systems. Ensure that the eBPF JIT appropraitely aligns the
stack.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Avoid the 'bx' instruction on CPUs that have no support for Thumb and
thus do not implement this instruction by moving the generation of this
opcode to a separate function that selects between:
bx reg
and
mov pc, reg
according to the capabilities of the CPU.
Fixes: 39c13c204b ("arm: eBPF JIT compiler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
It looks like in all cases 'struct vmw_connector_state' is used. But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed. Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.
Fixes: d7721ca711 "drm/vmwgfx: Connector atomic state"
Cc: <stable@vger.kernel.org>
Signed-off-by: Rob Clark <rclark@redhat.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
On a I2C_SMBUS_I2C_BLOCK_DATA read request, if data->block[0] is
greater than I2C_SMBUS_BLOCK_MAX + 1, the underlying I2C driver writes
data out of the msgbuf1 array boundary.
It is possible from a user application to run into that issue by
calling the I2C_SMBUS ioctl with data.block[0] greater than
I2C_SMBUS_BLOCK_MAX + 1.
This patch makes the code compliant with
Documentation/i2c/dev-interface by raising an error when the requested
size is larger than 32 bytes.
Call Trace:
[<ffffffff8139f695>] dump_stack+0x67/0x92
[<ffffffff811802a4>] panic+0xc5/0x1eb
[<ffffffff810ecb5f>] ? vprintk_default+0x1f/0x30
[<ffffffff817456d3>] ? i2cdev_ioctl_smbus+0x303/0x320
[<ffffffff8109a68b>] __stack_chk_fail+0x1b/0x20
[<ffffffff817456d3>] i2cdev_ioctl_smbus+0x303/0x320
[<ffffffff81745aed>] i2cdev_ioctl+0x4d/0x1e0
[<ffffffff811f761a>] do_vfs_ioctl+0x2ba/0x490
[<ffffffff81336e43>] ? security_file_ioctl+0x43/0x60
[<ffffffff811f7869>] SyS_ioctl+0x79/0x90
[<ffffffff81a22e97>] entry_SYSCALL_64_fastpath+0x12/0x6a
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Reference count of device node was increased in of_i2c_register_device,
but without decreasing it in i2c_unregister_device. Then the added
device node will never be released. Fix this by adding the of_node_put.
Signed-off-by: Lixin Wang <alan.1.wang@nokia-sbell.com>
Tested-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Fix to return error code -ENOMEM from the mempool_create_kmalloc_pool()
error handling case instead of 0, as done elsewhere in this function.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Loading key via kernel keyring service erases the internal
key copy immediately after we pass it in crypto layer. This is
wrong because IV is initialized later and we use wrong key
for the initialization (instead of real key there's just zeroed
block).
The bug may cause data corruption if key is loaded via kernel keyring
service first and later same crypt device is reactivated using exactly
same key in hexbyte representation, or vice versa. The bug (and fix)
affects only ciphers using following IVs: essiv, lmk and tcw.
Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Cc: stable@vger.kernel.org # 4.10+
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Some asynchronous cipher implementations may use DMA. The stack may
be mapped in the vmalloc area that doesn't support DMA. Therefore,
the cipher request and initialization vector shouldn't be on the
stack.
Fix this by allocating the request and iv with kmalloc.
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If dm-crypt uses authenticated mode with separate MAC, there are two
concatenated part of the key structure - key(s) for encryption and
authentication key.
Add a missing check for authenticated key length. If this key length is
smaller than actually provided key, dm-crypt now properly fails instead
of crashing.
Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Salah Coronya <salahx@yahoo.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When inserting a new key/value pair into a btree we walk down the spine of
btree nodes performing the following 2 operations:
i) space for a new entry
ii) adjusting the first key entry if the new key is lower than any in the node.
If the _root_ node is full, the function btree_split_beneath() allocates 2 new
nodes, and redistibutes the root nodes entries between them. The root node is
left with 2 entries corresponding to the 2 new nodes.
btree_split_beneath() then adjusts the spine to point to one of the two new
children. This means the first key is never adjusted if the new key was lower,
ie. operation (ii) gets missed out. This can result in the new key being
'lost' for a period; until another low valued key is inserted that will uncover
it.
This is a serious bug, and quite hard to make trigger in normal use. A
reproducing test case ("thin create devices-in-reverse-order") is
available as part of the thin-provision-tools project:
https://github.com/jthornber/thin-provisioning-tools/blob/master/functional-tests/device-mapper/dm-tests.scm#L593
Fix the issue by changing btree_split_beneath() so it no longer adjusts
the spine. Instead it unlocks both the new nodes, and lets the main
loop in btree_insert_raw() relock the appropriate one and make any
neccessary adjustments.
Cc: stable@vger.kernel.org
Reported-by: Monty Pavel <monty_pavel@sina.com>
Signed-off-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For btree removal, there is a corner case that a single thread
could takes 6 locks which is more than THIN_MAX_CONCURRENT_LOCKS(5)
and leads to deadlock.
A btree removal might eventually call
rebalance_children()->rebalance3() to rebalance entries of three
neighbor child nodes when shadow_spine has already acquired two
write locks. In rebalance3(), it tries to shadow and acquire the
write locks of all three child nodes. However, shadowing a child
node requires acquiring a read lock of the original child node and
a write lock of the new block. Although the read lock will be
released after block shadowing, shadowing the third child node
in rebalance3() could still take the sixth lock.
(2 write locks for shadow_spine +
2 write locks for the first two child nodes's shadow +
1 write lock for the last child node's shadow +
1 read lock for the last child node)
Cc: stable@vger.kernel.org
Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Acked-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
to fix it.
Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set)
Reported-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
the contigious hint for hugetlbfs; supporting an alternative GICv4 init
sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJaXi9yAAoJEEtpOizt6ddymb4H/R6Q7uPSNY31d/wcMHg8qYS7
foDW76r7mKliRVmCJoq9oqLqC7BLpQszfZ8dFjPSfdLA4xVMsuZ3GG3S7jlghiuN
9+rZK+ZZX8g5uQNsqVITC3WrXmozBj+VEs/uH2Z1pu0g+siPTp7J2iv5+A5tvM3A
NCySqgEjefQyy7Zs2r7TuvM+E3p9MY7jZih9E2o8mn2TQipVKrcnHRN3IjNNtI4u
C17x70OQ1ZY7bwnmPnuPPqnX3H1fQ6+UgwtfDCu3KP7DAFVjqAz03X6wbf1nCLAB
zzKok/SnIFWpr56JUSOzMpHWG8sOFscdVXxW97a2Ova0ur0rHW2iPiucTb8jOjQ=
=gJL6
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-fixes-for-v4.15-3-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Fixes for v4.15, Round 3 (v2)
Three more fixes for v4.15 fixing incorrect huge page mappings on systems using
the contigious hint for hugetlbfs; supporting an alternative GICv4 init
sequence; and correctly implementing the ARM SMCC for HVC and SMC handling.
Commit 6e032b350c ("powerpc/powernv: Check device-tree for RFI flush
settings") uses u64 in asm/hvcall.h without including linux/types.h
This breaks hvcall.h users that do not include the header themselves.
Fixes: 6e032b350c ("powerpc/powernv: Check device-tree for RFI flush settings")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Expose the state of the RFI flush (enabled/disabled) via debugfs, and
allow it to be enabled/disabled at runtime.
eg: $ cat /sys/kernel/debug/powerpc/rfi_flush
1
$ echo 0 > /sys/kernel/debug/powerpc/rfi_flush
$ cat /sys/kernel/debug/powerpc/rfi_flush
0
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
The recent commit 87590ce6e3 ("sysfs/cpu: Add vulnerability folder")
added a generic folder and set of files for reporting information on
CPU vulnerabilities. One of those was for meltdown:
/sys/devices/system/cpu/vulnerabilities/meltdown
This commit wires up that file for 64-bit Book3S powerpc.
For now we default to "Vulnerable" unless the RFI flush is enabled.
That may not actually be true on all hardware, further patches will
refine the reporting based on the CPU/platform etc. But for now we
default to being pessimists.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Keith reported the following warning:
WARNING: CPU: 28 PID: 1420 at kernel/irq/matrix.c:222 irq_matrix_remove_managed+0x10f/0x120
x86_vector_free_irqs+0xa1/0x180
x86_vector_alloc_irqs+0x1e4/0x3a0
msi_domain_alloc+0x62/0x130
The reason for this is that if the vector allocation fails the error
handling code tries to free the failed vector as well, which causes the
above imbalance warning to trigger.
Adjust the error path to handle this correctly.
Fixes: b5dc8e6c21 ("x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors")
Reported-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161217300.1823@nanos
intel_rdt_iffline_cpu() -> domain_remove_cpu() frees memory first and then
proceeds accessing it.
BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195
find_first_bit+0x1f/0x80
has_busy_rmid+0x47/0x70
intel_rdt_offline_cpu+0x4b4/0x510
Freed by task 195:
kfree+0x94/0x1a0
intel_rdt_offline_cpu+0x17d/0x510
Do the teardown first and then free memory.
Fixes: 24247aeeab ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Roderick W. Smith" <rod.smith@canonical.com>
Cc: 1733662@bugs.launchpad.net
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801161957510.2366@nanos