Move the inline function nsec_to_cycles from x86.c to x86.h, as
the next patch uses it from lapic.c.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is a generic function __pvclock_read_cycles to be used to get both
flags and cycles. For function pvclock_read_flags, it's useless to get
cycles value. To make this function be more effective, get this variable
flags directly in function.
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Function __pvclock_read_cycles is short enough, so there is no need to
have another function pvclock_get_nsec_offset to calculate tsc delta.
It's better to combine it into function __pvclock_read_cycles.
Remove useless variables in function __pvclock_read_cycles.
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Protocol for the "version" fields is: hypervisor raises it (making it
uneven) before it starts updating the fields and raises it again (making
it even) when it is done. Thus the guest can make sure the time values
it got are consistent by checking the version before and after reading
them.
Add CPU barries after getting version value just like what function
vread_pvclock does, because all of callees in this function is inline.
Fixes: 502dfeff23
Cc: stable@vger.kernel.org
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Cc: stable@vger.kernel.org # v3.10+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 2a0cb4e2d4 ("iommu/amd: Add new map for storing IVHD dev entry
type HID") added a call to DUMP_printk in init_iommu_from_acpi() which
used the value of devid before this variable was initialized.
Fixes: 2a0cb4e2d4 ('iommu/amd: Add new map for storing IVHD dev entry type HID')
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The valid range of 'did' in get_iommu_domain(*iommu, did)
is 0..cap_ndoms(iommu->cap), so don't exceed that
range in free_all_cpu_cached_iovas().
The user-visible impact of the out-of-bounds access is the machine
hanging on suspend-to-ram. It is, in fact, a kernel panic, but due
to already suspended devices, that's often not visible to the user.
Fixes: 22e2f9fa63 ("iommu/vt-d: Use per-cpu IOVA caching")
Signed-off-by: Jan Niehusmann <jan@gondor.com>
Tested-By: Marius Vlad <marius.c.vlad@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
kvm provides kvm_vcpu_uninit(), which amongst other things, releases the
last reference to the struct pid of the task that was last running the vcpu.
On arm64 built with CONFIG_DEBUG_KMEMLEAK, starting a guest with kvmtool,
then killing it with SIGKILL results (after some considerable time) in:
> cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff80007d5ea080 (size 128):
> comm "lkvm", pid 2025, jiffies 4294942645 (age 1107.776s)
> hex dump (first 32 bytes):
> 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffff8000001b30ec>] create_object+0xfc/0x278
> [<ffff80000071da34>] kmemleak_alloc+0x34/0x70
> [<ffff80000019fa2c>] kmem_cache_alloc+0x16c/0x1d8
> [<ffff8000000d0474>] alloc_pid+0x34/0x4d0
> [<ffff8000000b5674>] copy_process.isra.6+0x79c/0x1338
> [<ffff8000000b633c>] _do_fork+0x74/0x320
> [<ffff8000000b66b0>] SyS_clone+0x18/0x20
> [<ffff800000085cb0>] el0_svc_naked+0x24/0x28
> [<ffffffffffffffff>] 0xffffffffffffffff
On x86 kvm_vcpu_uninit() is called on the path from kvm_arch_destroy_vm(),
on arm no equivalent call is made. Add the call to kvm_arch_vcpu_free().
Signed-off-by: James Morse <james.morse@arm.com>
Fixes: 749cf76c5a ("KVM: ARM: Initial skeleton to compile KVM support")
Cc: <stable@vger.kernel.org> # 3.10+
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When CONFIG_ARM_PMU is disabled, we get the following build error:
arch/arm64/kvm/sys_regs.c: In function 'pmu_counter_idx_valid':
arch/arm64/kvm/sys_regs.c:564:27: error: 'ARMV8_PMU_CYCLE_IDX' undeclared (first use in this function)
if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX)
^
arch/arm64/kvm/sys_regs.c:564:27: note: each undeclared identifier is reported only once for each function it appears in
arch/arm64/kvm/sys_regs.c: In function 'access_pmu_evcntr':
arch/arm64/kvm/sys_regs.c:592:10: error: 'ARMV8_PMU_CYCLE_IDX' undeclared (first use in this function)
idx = ARMV8_PMU_CYCLE_IDX;
^
arch/arm64/kvm/sys_regs.c: In function 'access_pmu_evtyper':
arch/arm64/kvm/sys_regs.c:638:14: error: 'ARMV8_PMU_CYCLE_IDX' undeclared (first use in this function)
if (idx == ARMV8_PMU_CYCLE_IDX)
^
arch/arm64/kvm/hyp/switch.c:86:15: error: 'ARMV8_PMU_USERENR_MASK' undeclared (first use in this function)
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
This patch fixes the build with CONFIG_ARM_PMU disabled.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.
Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.
Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.
This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()
Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
Oops: Bad kernel stack pointer, sig: 6 [#1]
CPU: 0 PID: 2006 Comm: tm-execed Not tainted
NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
REGS: c00000003ffefd40 TRAP: 0700 Not tainted
MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000
CFAR: c0000000000098b4 SOFTE: 0
PACATMSCRATCH: b00000010000d033
GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
NIP [c000000000009980] fast_exception_return+0xb0/0xb8
LR [0000000000000000] (null)
Call Trace:
Instruction dump:
f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b
Kernel BUG at c000000000043e80 [verbose debug info unavailable]
Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
Oops: Unrecoverable exception, sig: 6 [#2]
CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D
task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D
MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000
CFAR: c000000000015a20 SOFTE: 0
PACATMSCRATCH: b00000010000d033
GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
LR [c000000000015a24] __switch_to+0x1f4/0x420
Call Trace:
Instruction dump:
7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
This fixes CVE-2016-5828.
Fixes: bc2a9408fa ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
By default, mdiobus_alloc() sets the PHYs to polling mode, but a
pointer size memcpy means that a couple IRQs end up being overwritten
with a value of 0. This means that PHY_POLL is disabled and results
in unpredictable behavior depending on the PHY's location on the
MDIO bus. Remove that memcpy and the now unused phy_irq member to
force the SMSC911x PHYs into polling mode 100% of the time.
Fixes: e7f4dc3536 ("mdio: Move allocation of interrupts into core")
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two straightforward fixes. One is a concurrency issue only affecting
SAS connected SATA drives, but which could hang the storage subsystem
if it triggers (because the outstanding command count on error never
goes back to zero) and the other is a NO_TAG fallout from the switch
to hostwide tags which causes the system to crash on module insertion
(we've checked carefully and only the 53c700 family of drivers is
vulnerable to this issue).
Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXbtgBAAoJEAVr7HOZEZN4/EQP/0diQXZawG4Ncy4dYmrzYDeR
4R5OqkV5hJapzJD4De/DqTRxKVIsxVLkgM4wVT7KHv0QndX81ipdKsiWyUyXJ3nb
5HBvJ/GhAUnVyVM1nMGqK81Sk68zJiLoJ32lnE099GP6eJKtRtWgfKvEHPimhDWc
S42s6t6dhyirmXmVQ0Jyx1/c1y8RuL5dTCMXjkP1LdTc5UnZ7b3HeKIxE0YJ3xRs
ZsHdEPDk7UR1UUXWIdqjx/P0elgDMExkOaAQUCeZV2aWiV8OgW+OSSzjSxDvlJLk
IBeggRjdiqEBCyuU5QqfQjISaFR0POf5q41Bomg+Zq9XtkdEovRhSwdkP2/GY/TD
wEozTZKkRh1RcYtCNOASOgvmpHXcGBWhMk6iqmsbl/LIL5UXdvB9EAzhHlAZ/kai
Szs3w7IBye6CfISPoFPstNav8DFkse2lf6fVdEEdmRLF8wXGhAXugAdEw3fbuiJX
9SMwTQqSEMZCQJm4rW2vMSncmatvKfce/SXLvrEEpWFyOM984Q0ZGUTMCCKDstNm
e8LmnRlMWLa/mYoo9DFIfkgRNd1gH6j+gIPsj+Rtkhk4Q3YC9Rr6gERvuRjGM9XU
EI611dRUA2YQtclG2DKHkkY1wdN15RyrGR50I3VLaRv0RzwY5HCV1JGEemAokgpU
BfXrmW3kPfh7fUAHhLqF
=5jjT
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.
One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
Currently the ad7266 driver treats any failure to get vref as though the
regulator were not present but this means that if probe deferral is
triggered the driver will act as though the regulator were not present.
Instead only use the internal reference if we explicitly got -ENODEV which
is what is returned for absent regulators.
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
The ad7266 driver attempts to support deciding between the use of internal
and external power supplies by checking to see if an error is returned when
requesting the regulator. This doesn't work with the current code since the
driver uses a normal regulator_get() which is for non-optional supplies
and so assumes that if a regulator is not provided by the platform then
this is a bug in the platform integration and so substitutes a dummy
regulator. Use regulator_get_optional() instead which indicates to the
framework that the regulator may be absent and provides a dummy regulator
instead.
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
All regulator_get() variants return either a pointer to a regulator or an
ERR_PTR() so testing for NULL makes no sense and may lead to bugs if we
use NULL as a valid regulator. Fix this by using IS_ERR() as expected.
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
These two spi_w8r8() calls return a value with is used by the code
following the error check. The dubious use was caused by a cleanup
patch.
Fixes: d34dbee8ac ("staging:iio:accel:kxsd9 cleanup and conversion to iio_chan_spec.")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
sca3000_read_ctrl_reg() returns a negative number on failure, check for
this instead of zero.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
We are getting somewhat random soft lockups with this signature:
[ 86.992215] [<fffffc00080935e0>] el1_irq+0xa0/0x10c
[ 86.997082] [<fffffc000841822c>] cursor_timer_handler+0x30/0x54
[ 87.002991] [<fffffc000810ec44>] call_timer_fn+0x54/0x1a8
[ 87.008378] [<fffffc000810ef88>] run_timer_softirq+0x1c4/0x2bc
[ 87.014200] [<fffffc000809077c>] __do_softirq+0x114/0x344
[ 87.019590] [<fffffc00080af45c>] irq_exit+0x74/0x98
[ 87.024458] [<fffffc00080fac20>] __handle_domain_irq+0x98/0xfc
[ 87.030278] [<fffffc000809056c>] gic_handle_irq+0x94/0x190
This is caused by the vt visual_init() function calling into
fbcon_init() with a vc_cur_blink_ms value of zero. This is a
transient condition, as it is later set to a non-zero value. But, if
the timer happens to expire while the blink rate is zero, it goes into
an endless loop, and we get soft lockup.
The fix is to initialize vc_cur_blink_ms before calling the con_init()
function.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: stable@vger.kernel.org
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Scot Doyle <lkml14@scotdoyle.com>
Tested-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy
driver spotted by syzkaller.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJXbmz+AAoJEGwxgFQ9KSmk4qcP/0xQ2V1WS7U+ovDANFz1YBz4
xYuN/txeyl4wfuqaU94ZMICcSgBhfGhgAmSKD2AM3sdxjd1B3pxCdMad6ku+I86u
0RVATABkzaC+54CTdsIDpt5HEK+bXfVbusvnC8PgyWd976t2i/lQbYfJ/+0dDPUe
S8CtVYR6i8jtyUmDeBWMwisXaU+6yqG3w0qGBXQ5/tS702Wx3Tt4TAqjl5SQRfC1
8gjl0JR0ydi0PDO9nVTcC4qOJRU4r8+4Y0aYcfJ12cbVuwgJI9ht+yRH3a92Oc7O
oUypF5R2M2xWljJT+vijQuhv4koE6bEL8WRx5LIP0kouTVZBFRZuoEWjwVBwRHIx
C6IvlECVjo4Ezm8uPK/lnbBXA5m1m4EULUV7eRDYjwRsKqqTKeGad2u8wOgyElLI
hcEaUijewtO9Xizwn/xy1GIT82HAfXPzsFXNhrX+fdC9u+TUPKrbZdHTL5+Uk5Tv
FvED26jPO2Tw4Uijlii77o7pGYwQc7Dl5VXo1Eu0TrCGQi147oDuFGtwoT9vqES6
L+rGNQjDQZNmenHAwe1IvkSx5XpUWokWqOXJbpEk1psSSQ3lPwVfCi+JiDp1HDVJ
gWZmmmh1obPGrfChna5GrLTrNPSRTCeUg0RhZeI2fuF9Lutm9KkGWwAKtFePp2pI
3VsapVEmjH2c8UMFQGfU
=0dYh
-----END PGP SIGNATURE-----
Merge tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"
* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM
Pull x86 kprobe fix from Thomas Gleixner:
"A single fix clearing the TF bit when a fault is single stepped"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Clear TF bit in fault on single-stepping
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- force watchdog reset while processing sysrq-w
- fix a deadlock when enabling trace events in the scheduler
- fixes to the throttled next buddy logic
- fixes for the average accounting (missing serialization and
underflow handling)
- allow kernel threads for fallback to online but not active cpus"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Allow kthreads to fall back to online && !active cpus
sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
sched/fair: Initialize throttle_count for new task-groups lazily
sched/fair: Fix cfs_rq avg tracking underflow
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
sched/debug: Fix deadlock when enabling sched events
sched/fair: Fix post_init_entity_util_avg() serialization
Commit fe742fd4f9 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i<4; i++)); do
find . >/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull locking fix from Thomas Gleixner:
"A single fix to address a race in the static key logic"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/static_key: Fix concurrent static_key_slow_inc()
Pull irq fix from Thomas Gleixner:
"A single fix for the fallout from the conversion of MIPS GIC to irq
domains"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Fix IRQs in gic_dev_domain
- mm/radix: Update to tlb functions ric argument from Aneesh Kumar K.V
- mm/radix: Flush page walk cache when freeing page table from Aneesh Kumar K.V
- mm/hash: Use the correct PPP mask when updating HPTE from Aneesh Kumar K.V
- mm/hash: Don't add memory coherence if cache inhibited is set from Aneesh Kumar K.V
- mm/radix: Update Radix tree size as per ISA 3.0 from Aneesh Kumar K.V
- eeh: Fix invalid cached PE primary bus from Gavin Shan
- Fix faults caused by radix patching of SLB miss handler from Michael Ellerman
- bpf/jit: Disable classic BPF JIT on ppc64le from Naveen N. Rao
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXbmITAAoJEFHr6jzI4aWANXwQAIUzjKcLpyQWEKwOKfMqBT5T
EfsWDqJA/J3mYKNcZyiB7qv1NVPPkU9DSBK0OaAKwYdg5YWKDBl6R3mW+j4di0bP
SkFACCyE2WbLTCiz5fzd8l974RUh5jKQpIrObp4/8xp40d0vsyAzz4J7d4HVRsrr
BnoTS/KmytsaDQls5kYArxhW6U+Shag586Au1hNt3SS/be8lCNEXLfa3ltCr7WLJ
k+xM0KM5kpO9/OK40A64TH7xUZKQIgPMUR5Ct43IJhMeHNnQctLmGQQjRWTrajv1
K/TfrYwCl66xzKaH5G3MKJgqJAJm1LTwGs+2aOn91x5hPrbmW+bLqr1Mm0ukjROz
oaANO5fgEQjl0JRGCNAhLHvaoqJX6v5/7GbmFRoaigX4UKJ63nK1ABiwAgKDGnyj
OchwwJywU5UIX/+9Qpig3CxQNhEV33Nnp8t+dsg8CPd9o/G0mIe0QP1eGdhD09mM
X9eMfN08hLj5ERKvlpW0rrq1b/wizOGmUXbmt02HZi7iLNsyQMwShiOvwOaAvH6/
SzEFBJdp11jNoe4GJDt5rH4HlnTnTAYwcLFMTDCCPdJXy7voI/J+MaAmG89S30dQ
ph0+4v/8K2N0VDZ7kkgi0GL1gp9ULkgtimrN5Z0R8U7qEapEW6ybvv+0Ewln742f
SCRNVMZgzcwe3CcCKzdn
=fxml
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"mm/radix (Aneesh Kumar K.V):
- Update to tlb functions ric argument
- Flush page walk cache when freeing page table
- Update Radix tree size as per ISA 3.0
mm/hash (Aneesh Kumar K.V):
- Use the correct PPP mask when updating HPTE
- Don't add memory coherence if cache inhibited is set
eeh (Gavin Shan):
- Fix invalid cached PE primary bus
bpf/jit (Naveen N. Rao):
- Disable classic BPF JIT on ppc64le
.. and fix faults caused by radix patching of SLB miss handler
(Michael Ellerman)"
* tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
powerpc: Fix faults caused by radix patching of SLB miss handler
powerpc/eeh: Fix invalid cached PE primary bus
powerpc/mm/radix: Update Radix tree size as per ISA 3.0
powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
powerpc/mm/hash: Use the correct PPP mask when updating HPTE
powerpc/mm/radix: Flush page walk cache when freeing page table
powerpc/mm/radix: Update to tlb functions ric argument
Commit b235beea9e ("Clarify naming of thread info/stack allocators")
breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
kernel/fork.c:355:8: error: assignment from incompatible pointer type
stack = alloc_thread_stack_node(tsk, node);
^
Fix it by renaming free_stack() to free_thread_stack(), and updating the
return type of alloc_thread_stack_node().
Fixes: b235beea9e ("Clarify naming of thread info/stack allocators")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc fixes from Andrew Morton:
"Two weeks worth of fixes here"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
autofs: don't get stuck in a loop if vfs_write() returns an error
mm/page_owner: avoid null pointer dereference
tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
fs/nilfs2: fix potential underflow in call to crc32_le
oom, suspend: fix oom_reaper vs. oom_killer_disable race
ocfs2: disable BUG assertions in reading blocks
mm, compaction: abort free scanner if split fails
mm: prevent KASAN false positives in kmemleak
mm/hugetlb: clear compound_mapcount when freeing gigantic pages
mm/swap.c: flush lru pvecs on compound page arrival
memcg: css_alloc should return an ERR_PTR value on error
memcg: mem_cgroup_migrate() may be called with irq disabled
hugetlb: fix nr_pmds accounting with shared page tables
Revert "mm: disable fault around on emulated access bit architecture"
Revert "mm: make faultaround produce old ptes"
mailmap: add Boris Brezillon's email
mailmap: add Antoine Tenart's email
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
mm: mempool: kasan: don't poot mempool objects in quarantine
...
- A couple minor fixes to the rdma core
- Multiple minor fixes to hfi1
- Multiple minor fixes to mlx4/mlx4
- A few minor fixes to i40iw
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXbA1uAAoJELgmozMOVy/dcuEP/j5NyPmyU8XXHDloGU9b8ybu
HdnBGYZhvr2OnhuBOGW/z3dEpbVwsSfP7JTY5Z2M2GTmA0h5R8VMp9G3agCNwKvo
y0bt+GrtOzDVNkBXw+Ttqe9fBYZvMsvLe7zBV0DyBTlLeeE27f+d7ahuvFGayer8
C+i3LLgTlfzP6iImDsUdyEvp1nUc0F5Xb8vcVE4znNjwaUF1nrLRJXhSHSliLeZh
DCOr/ElbTPKm1QL0TX/O2HSxE+5zZq9VfepbKxTTBfHZp5U31ZNZrw+VrXin374h
02QdADLzOXPnUznP5UVJZPNUVdsQ8MAlW/Y9Va5w9G4nOtE6jQq8cWBeuY5g7JAJ
aTmHD60p09xGtrd+9/K7mEoSpHDUWgGl/bqTlGMRZFvBXUSxfdbEjt7TRBH2eASl
e8tOQbfMnlqO5kKDCa5BXKCJaaAeKafzUpa5kbVQZW113dBbaTC6m5qnjEEdwriX
7l4FC4vmAxrzURWd1r7oAwn4gJVDxGmXTUpgI6PzWFx93hRAukIHKsVPZ2U8IKZx
tF8SkcifXXT7F4TqSgn59SGc7BiKfvc7/vO1XBI/8nHO2NsKIVrx1S1X/yOy3Hsd
EJg42g41yr+wjQNAhG8RShgfzWh00BHtVFz+4KUESQRnZZLSMnC/EnEfGV9IzHTf
Jaz+r5+Fh2rYY/hoIn6h
=F0nv
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"This is the second batch of queued up rdma patches for this rc cycle.
There isn't anything really major in here. It's passed 0day,
linux-next, and local testing across a wide variety of hardware.
There are still a few known issues to be tracked down, but this should
amount to the vast majority of the rdma RC fixes.
Round two of 4.7 rc fixes:
- A couple minor fixes to the rdma core
- Multiple minor fixes to hfi1
- Multiple minor fixes to mlx4/mlx4
- A few minor fixes to i40iw"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (31 commits)
IB/srpt: Reduce QP buffer size
i40iw: Enable level-1 PBL for fast memory registration
i40iw: Return correct max_fast_reg_page_list_len
i40iw: Correct status check on i40iw_get_pble
i40iw: Correct CQ arming
IB/rdmavt: Correct qp_priv_alloc() return value test
IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
IB/hfi1: Fix deadlock with txreq allocation slow path
IB/mlx4: Prevent cross page boundary allocation
IB/mlx4: Fix memory leak if QP creation failed
IB/mlx4: Verify port number in flow steering create flow
IB/mlx4: Fix error flow when sending mads under SRIOV
IB/mlx4: Fix the SQ size of an RC QP
IB/mlx5: Fix wrong naming of port_rcv_data counter
IB/mlx5: Fix post send fence logic
IB/uverbs: Initialize ib_qp_init_attr with zeros
IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID
IB/core: Fix RoCE v1 multicast join logic issue
IB/core: Fix no default GIDs when netdevice reregisters
IB/hfi1: Send a pkey change event on driver pkey update
...
Pull HID fix from Jiri Kosina:
"hiddev ioctl() validation fix from Scott Bauer"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
Stable-candidate fix for a deadlock in ACPICA introduced during the
4.5 development cycle by a commit attempting to improve the handling
of AML code that doesn't belong to any namespace objects in a given
definition block (Lv Zheng).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXbbFxAAoJEILEb/54YlRx10MP/RsrluZE7OKFAUz2x1k4m+sZ
kVGsSQe0l1UhfXDKjbi9d+mQFpBMPmrype60/VQRkkLLuhSpFg4sBCqL8Xk0FIyw
iYXWKu7JkXAotdSLWLQh5HinkjJTb/BTVqW350ANy71eXhXI//ILQGm4mk4sb33M
8496IJJSb05FSDHgX97BiPLe+kjrGazJJ7/i9rZFJ1fKG9gbkyipWubx2pYg8/Br
lpRis+sEwYBcYBJiaIS935Rn/wsug3SCJBxTVRIyHBsQZVhQK2Wso9vDIp51Rvpk
cuxzltqEI38+gkZoEZJryIfc61tzQ+tdE1LppcWPCZNuzCZrdpCDAaMH69PQyJ4L
wlFKaBnQaw8zwaq5tvGnww9g70pv/ZxbUN/EpNVy5jxCJ+Um9deBzfKKl+AppHMO
ROe6oTyH6S6s+UyTL16WRVFVl+J2MUaMVWhv1D2iSo2ogvNhVZsRTouDNBN+96az
fDxzz0IuXqqbAB43pBxuGZdJ56rc0DkUpcQkQ96g2ApHNwhsoklKH8y2AEs7Ynwx
21qw5RNmh1z1PMZYId3lJR602L58q7DX4zc1P/opLVTfZ+GK0C/lC4eOrrqg8eQ0
Zn9kZUQ4omYTgi9W1a90TB0DwdzWHXn1yp1jJ3dn2frhx0se9gplWMGvMnChZxyN
X5yOp+huWGlOlzjb80Yd
=Uo+k
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Stable-candidate fix for a deadlock in ACPICA introduced during the
4.5 development cycle by a commit attempting to improve the handling
of AML code that doesn't belong to any namespace objects in a given
definition block (Lv Zheng)"
* tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading
- Fix a latent initialization issue in the pcc-cpufreq driver
(incorrect initial value of a structure field) that has been
uncovered by a recent ACPICA commit (Mike Galbraith).
- Add a missing notification in an update_devfreq() error code
path forgotten by a recent devfreq commit (Chanwoo Choi).
- Fix devfreq device frequency initialization (Lukasz Luba).
- Fix an incorrect IS_ERR() check in the devfreq framework
discovered by the Smatch checker (Dan Carpenter).
- Drop two excessive put_device() calls from the devfreq framework
(MyungJoo Ham, Cai Zhiyong).
- Fix a possible memory leak in the devfreq framework and drop an
unnecessary kfree() invocation from it (MyungJoo Ham).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXbbELAAoJEILEb/54YlRxyMIP/3Su+TreD8KWEHzAIjddk8ba
2iaQGF8gwcglNq4GMKmmkVaCzmvE6/9sLHLMJ4OW7hXvEdkU2cbhL7l7Yvna17rU
XY9UgNaewCXLflDZ1S6XirjL438CIpZiAgumRJnSVOG3miuA4bL4wivuCAuJL9sE
s9Hib0UZ6otLQidQojRMaXAUSNnmfrzmis35PGeG42MEFdr6NtCHnR7AA1XuxKy4
82JawNAZb9l4YJd1g9r0VJpbKwY7Hzbr3ZWEZ+fDp9RfjAqal83QGXxExyojRCsb
B1xg3YQNmmRmxRWNsIfKg6nlLfGIOy4eNNflUvOY0/PE5hFmypcayEC3s7BEzbM/
WTRsUM8KBIy6h/qwgYhiAoBNJ667HH/AE+d6JnUmPaEBQm592ZkFHdYMCeyLLj0T
TfXVj5El4+XQMQAStFRSEI7cRs6BWx9Z9E8LN3A40KXsFf1QEGrAFuDy+RnCKsEv
yuGkjx5BFO9lGu5UyzznCd79t/0MbBhJZjy587EMOyK1HlXvhlsmN8sK/RTnosmY
FphG0Z9ffgunnHDlRzXRYwKamiSWJ8FsaI/Swpss9MJSJbsMES/sli0k/fp+rygp
gG6gHdpjslCEk/Xur9RV2Q8e9n1dG2vLnmJstCEcFgF1nSOsfcO85nFCVsW3IYh1
BM3BJN9fIDVdg1xpSk8q
=8eXG
-----END PGP SIGNATURE-----
Merge tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix for a latent cpufreq driver bug uncovered by a recent ACPICA
change and several fixes for the devfreq framework, including one fix
for an issue introduced recently.
Specifics:
- Fix a latent initialization issue in the pcc-cpufreq driver
(incorrect initial value of a structure field) that has been
uncovered by a recent ACPICA commit (Mike Galbraith).
- Add a missing notification in an update_devfreq() error code path
forgotten by a recent devfreq commit (Chanwoo Choi).
- Fix devfreq device frequency initialization (Lukasz Luba).
- Fix an incorrect IS_ERR() check in the devfreq framework discovered
by the Smatch checker (Dan Carpenter).
- Drop two excessive put_device() calls from the devfreq framework
(MyungJoo Ham, Cai Zhiyong).
- Fix a possible memory leak in the devfreq framework and drop an
unnecessary kfree() invocation from it (MyungJoo Ham)"
* tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
cpufreq: pcc-cpufreq: Fix doorbell.access_width
PM / devfreq: fix initialization of current frequency in last status
PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
PM / devfreq: remove double put_device
PM / devfreq: fix double call put_device
PM / devfreq: fix duplicated kfree on devfreq pointer
PM / devfreq: devm_kzalloc to have dev pointer more precisely
- Fix x86 PV dom0 crash during early boot on some hardware.
- Fix two pciback bugs affects certain devices.
- Fix potential overflow when clearing page tables in x86 PV.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXbTwWAAoJEFxbo/MsZsTRu7IH/1sAn6KFHfP2Px/Sydh/pxZH
0oOW+2aZLVqu8BRiHj6YeQVRuhzdIgSoU9wMmCFX7rz1m6gq4c60cJF/lKYmlbxp
0lyxbf+4451rh/qNVV3pm5J+w6R818Y2hoIOu2BK3ppJ4W8nXbW5kHHvtYQCXu0A
mApSgMHBbWv6kkAxEuUMa5wOipENiAIYg+pFqwo+y9V8sS8zAqqHivct3T6ucNyV
u/WB076QAnL8abcwKELXsyV5hmcfJv/CoMS9Qv6GwIv1z9d0UVS2+qoo1Qox2sAP
79AoJn2E6p7rkb/HdhdSYjja22oct1ahrfSgCSBEwLNZCMc5srKdwK6Zspe5y+0=
=qqrC
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- fix x86 PV dom0 crash during early boot on some hardware
- fix two pciback bugs affects certain devices
- fix potential overflow when clearing page tables in x86 PV
* tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: return proper values during BAR sizing
x86/xen: avoid m2p lookup when setting early page table entries
xen/pciback: Fix conf_space read/write overlap check.
x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
xen/balloon: Fix declared-but-not-defined warning
- Fix icache/dcache sync for anonymous pages under migration
- Correct the ASID limit check
- Fix parallel builds of Image and Image.gz
- Refuse to hibernate when we have CPUs that we can't offline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXasPyAAoJELescNyEwWM0nYAIAJhcPoeaSEgnVGfnh4gAup/F
Wu8JLRaibaGGnLxF7Lt00N4+oe/oIi1SIJrPAe7YwzxpLcChP/SvaOBNnIa/PUm1
QC7EuYtDXJnzj483k3Iu5+XXKX5iSdzM1F3YLmFnV1IeScCDCAmSqDCwJ5mXpAOj
xFvNvI8P7WAOCKD32kiahm/38lwDgMkIY/DQq6+7li6ZMrDk5W3b6NP+8Og2D3qE
mRb/uNLZ3hBe5bDYGyqiBAwHEAmB9u7kFydh2g4gq1IKy17QjEXv4U7hhsW0RyEP
VU0ha+fQKIcFMjx2FvMPUuzejoY0SiynF0Z4K48xkhaQQQUxkWIudmLUrFvv5YM=
=ABKe
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are a few more arm64 fixes, but things do finally appear to be
slowing down. The main fix is avoiding hibernation in a previously
unanticipated situation where we have CPUs parked in the kernel, but
it's all good stuff.
- Fix icache/dcache sync for anonymous pages under migration
- Correct the ASID limit check
- Fix parallel builds of Image and Image.gz
- Refuse to hibernate when we have CPUs that we can't offline"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hibernate: Don't hibernate on systems with stuck CPUs
arm64: smp: Add function to determine if cpus are stuck in the kernel
arm64: mm: remove page_mapping check in __sync_icache_dcache
arm64: fix boot image dependencies to not generate invalid images
arm64: update ASID limit
When I replaced kasprintf("%pf") with a direct call to
sprint_symbol_no_offset I must have broken the initcall blacklisting
feature on the arches where dereference_function_descriptor() is
non-trivial.
Fixes: c8cdd2be21 (init/main.c: simplify initcall_blacklisted())
Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__vfs_write() returns a negative value in a error case.
Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have dereferenced page_ext before checking it. Lets check it first
and then used it.
Fixes: f86e427197 ("mm: check the return value of lookup_page_ext for all call sites")
Link: http://lkml.kernel.org/r/1465249059-7883-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
trivial fix to spelling mistake
Link: http://lkml.kernel.org/r/1466672144-831-1-git-send-email-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The value `bytes' comes from the filesystem which is about to be
mounted. We cannot trust that the value is always in the range we
expect it to be.
Check its value before using it to calculate the length for the crc32_le
call. It value must be larger (or equal) sumoff + 4.
This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1. This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.
BUG: unable to handle kernel paging request at ffff88021e600000
IP: crc32_le+0x36/0x100
...
Call Trace:
nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
nilfs_load_super_block+0x142/0x300 [nilfs2]
init_nilfs+0x60/0x390 [nilfs2]
nilfs_mount+0x302/0x520 [nilfs2]
mount_fs+0x38/0x160
vfs_kern_mount+0x67/0x110
do_mount+0x269/0xe00
SyS_mount+0x9f/0x100
entry_SYSCALL_64_fastpath+0x16/0x71
Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo has reported the following potential oom_killer_disable vs.
oom_reaper race:
(1) freeze_processes() starts freezing user space threads.
(2) Somebody (maybe a kenrel thread) calls out_of_memory().
(3) The OOM killer calls mark_oom_victim() on a user space thread
P1 which is already in __refrigerator().
(4) oom_killer_disable() sets oom_killer_disabled = true.
(5) P1 leaves __refrigerator() and enters do_exit().
(6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
exit_oom_victim(P1).
(7) oom_killer_disable() returns while P1 not yet finished
(8) P1 perform IO/interfere with the freezer.
This situation is unfortunate. We cannot move oom_killer_disable after
all the freezable kernel threads are frozen because the oom victim might
depend on some of those kthreads to make a forward progress to exit so
we could deadlock. It is also far from trivial to teach the oom_reaper
to not call exit_oom_victim() because then we would lose a guarantee of
the OOM killer and oom_killer_disable forward progress because
exit_mm->mmput might block and never call exit_oom_victim.
It seems the easiest way forward is to workaround this race by calling
try_to_freeze_tasks again after oom_killer_disable. This will make sure
that all the tasks are frozen or it bails out.
Fixes: 449d777d7a ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to some high-load testing, these two BUG assertions were
encountered, this led system panic. Actually, there were some
discussions about removing these two BUG() assertions, it would not
bring any side effect.
Then, I did the the following changes,
1) use the existing macro CATCH_BH_JBD_RACES to wrap BUG() in the
ocfs2_read_blocks_sync function like before.
2) disable the macro CATCH_BH_JBD_RACES in Makefile by default.
Link: http://lkml.kernel.org/r/1466574294-26863-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly. If the
watermark is insufficient for a free page of order <= cc->order, then
terminate the scanner since all future splits will also likely fail.
This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.
compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.
The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark. All thp page faults can take >400ms in such a state without
this fix.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While working on s390 support for gigantic hugepages I ran into the
following "Bad page state" warning when freeing gigantic pages:
BUG: Bad page state in process bash pfn:580001
page:000003d116000040 count:0 mapcount:0 mapping:ffffffff00000000 index:0x0
flags: 0x7fffc0000000000()
page dumped because: non-NULL mapping
This is because page->compound_mapcount, which is part of a union with
page->mapping, is initialized with -1 in prep_compound_gigantic_page(),
and not cleared again during destroy_compound_gigantic_page(). Fix this
by clearing the compound_mapcount in destroy_compound_gigantic_page()
before clearing compound_head.
Interestingly enough, the warning will not show up on x86_64, although
this should not be architecture specific. Apparently there is an
endianness issue, combined with the fact that the union contains both a
64 bit ->mapping pointer and a 32 bit atomic_t ->compound_mapcount as
members. The resulting bogus page->mapping on x86_64 therefore contains
00000000ffffffff instead of ffffffff00000000 on s390, which will falsely
trigger the PageAnon() check in free_pages_prepare() because
page->mapping & PAGE_MAPPING_ANON is true on little-endian architectures
like x86_64 in this case (the page is not compound anymore,
->compound_head was already cleared before). As a result, page->mapping
will be cleared before doing the checks in free_pages_check().
Not sure if the bogus "PageAnon() returning true" on x86_64 for the
first tail page of a gigantic page (at this stage) has other theoretical
implications, but they would also be fixed with this patch.
Link: http://lkml.kernel.org/r/1466612719-5642-1-git-send-email-gerald.schaefer@de.ibm.com
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>