When checking specific attribute from a bit mask, need to use bitwise
AND and not logical AND, fixed that.
Fixes: 145d9c5410 ('IB/core: Display extended counter set if
available')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The while loop after err_slaves should use post-decrement; otherwise
we'll fail to do the kfrees for i==0, and will run into out-of-bounds
accesses if the setup above failed already at i==0.
[I'm not sure why one even bothers populating the ->vlan_filter array:
mlx4.h isn't #included by anything outside
drivers/net/ethernet/mellanox/mlx4/, and "git grep -C2 -w vlan_filter
drivers/net/ethernet/mellanox/mlx4/" seems to suggest that the
vlan_filter elements aren't used at all.]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This reverts commit 829b6962f7.
Revert this change as it causes a sysfs path to change and therefore
introduces and ABI regression. More precisely Android's vold is not being
able to access /sys/module/mmcblk/parameters/perdev_minors any more, since
the path becomes changed to: "/sys/module/mmc_block/..."
Fixes: 829b6962f7 ("mmc: block: don't use parameter prefix if built as
module")
Reported-by: John Stultz <john.stultz@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In __cpufreq_cooling_register() we allocate the arrays for time_in_idle
and time_in_idle_timestamp to be as big as the number of cpus in this
cpufreq device. However, in get_load() we access this array using the
cpu number as index, which can result in an out of bound access.
Index time_in_idle{,_timestamp} using the index in the cpufreq_device's
allowed_cpus mask, as we do for the load_cpu array in
cpufreq_get_requested_power()
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.
There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.
We can get to that situation like that:
* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX
Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.
The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:
Overflow: e
Overflow: 7fffffff
Overflow: 7fffffffffffffff
PAX: size overflow detected in function btrfs_real_readdir
fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
context: dir_context;
CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
Call Trace:
[<ffffffff81742f0f>] dump_stack+0x4c/0x7f
[<ffffffff811cb706>] report_size_overflow+0x36/0x40
[<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
[<ffffffff811dafc8>] iterate_dir+0xa8/0x150
[<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
[<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
Overflow: 1a
[<ffffffff811db070>] ? iterate_dir+0x150/0x150
[<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83
The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.
The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.
References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
CC: stable@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Since the dawn of time the ICST code has only supported divide
by one or hang in an eternal loop. Luckily we were always dividing
by one because the reference frequency for the systems using
the ICSTs is 24MHz and the [min,max] values for the PLL input
if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop
will always terminate immediately without assigning any divisor
for the reference frequency.
But for the code to make sense, let's insert the missing i++
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Intel BXT/APL use a card detect GPIO however the host controller
will not enable bus power unless it's card detect also reflects
the presence of a card. Unfortunately those 2 things race which
can result in commands not starting, after which the controller
does nothing and there is a 10 second wait for the driver's
10-second timer to timeout.
That is fixed by having the driver look also at the present state
register to determine if the card is present. Consequently, provide
a 'get_cd' mmc host operation for BXT/APL that does that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Intel BXT/APL use a card detect GPIO however the host controller
will not enable bus power unless it's card detect also reflects
the presence of a card. Unfortunately those 2 things race which
can result in commands not starting, after which the controller
does nothing and there is a 10 second wait for the driver's
10-second timer to timeout.
That is fixed by having the driver look also at the present state
register to determine if the card is present. Consequently, provide
a 'get_cd' mmc host operation for BXT/APL that does that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Drivers may need to provide their own get_cd() mmc host op, but
currently the internals of the current op (sdhci_get_cd()) are
provided by sdhci_do_get_cd() which is also called from
sdhci_request().
To allow override of the get_cd functionality, change sdhci_request()
to call ->get_cd() instead of sdhci_do_get_cd().
Note, in the future the call to ->get_cd() will likely be removed
from sdhci_request() since most drivers don't need actually it.
However this change is being done now to facilitate a subsequent
bug fix.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In the past, fixes for specific hardware devices were implemented
in sdhci using quirks. That approach is no longer accepted because
the growing number of quirks was starting to make the code difficult
to understand and maintain.
One alternative to quirks, is to allow drivers to override the default
mmc host operations. This patch makes it easy to do that, and it is
needed for a subsequent bug fix, for which separate patches are
provided.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Depending on the configuration either the 32 or 64 bit version of
elf_check_arch() is defined. parse_crash_elf{32|64}_headers() does
some basic verification of the ELF header via
vmcore_elf{32|64}_check_arch() which happen to map to elf_check_arch().
Since the implementation 32 and 64 bit version of elf_check_arch()
differ, we use the wrong type:
In file included from include/linux/elf.h:4:0,
from fs/proc/vmcore.c:13:
fs/proc/vmcore.c: In function 'parse_crash_elf64_headers':
>> arch/mips/include/asm/elf.h:228:23: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
struct elfhdr *__h = (hdr); \
^
include/linux/crash_dump.h:41:37: note: in expansion of macro 'elf_check_arch'
#define vmcore_elf64_check_arch(x) (elf_check_arch(x) || vmcore_elf_check_arch_cross(x))
^
fs/proc/vmcore.c:1015:4: note: in expansion of macro 'vmcore_elf64_check_arch'
!vmcore_elf64_check_arch(&ehdr) ||
^
Therefore, we rather define vmcore_elf{32|64}_check_arch() as a
basic machine check and use it also in binfm_elf?32.c as well.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Suggested-by: Maciej W. Rozycki <macro@imgtec.com>
Reviewed-by: Maciej W. Rozycki <macro@imgtec.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12529/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The ARM GICv3 specification mentions the need for dsb after a read
from the ICC_IAR1_EL1 register:
4.1.1 Physical CPU Interface:
The effects of reading ICC_IAR0_EL1 and ICC_IAR1_EL1
on the state of a returned INTID are not guaranteed
to be visible until after the execution of a DSB.
Not having this could result in missed interrupts, so let's add the
required barrier.
[Marc: fixed commit message]
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
EOImode1 is only used for the root controller and hence only the root
controller uses the eoimode1 functions for handling interrupts. However,
if the root controller supports EOImode1, then the EOImodeNS bit will be
set for all GICs, enabling EOImode1. This is not what we want and this
causes interrupts on non-root GICs to only be dropped in priority but
never deactivated. Therefore, only set the EOImodeNS bit for the root
controller.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Setting the affinity of an IRQ, it only applicable for the root
interrupt controller and so only populate this operator for the root
controller.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Calling component_add() may result in the completion of a set of
devices, which will try to bring up a master. In bringing the master
up, we populate its match array with the current set of children.
If binding any of the devices fails, component_add() itself will fail,
free the struct component entry, and return to the caller. The
now-freed entry is never removed from the master's match array, and
will later be used in a futile attempt to bind to freed memory.
Bring component_add's behaviour on failure to bring up a master into
line with component_del by removing the (to-be-freed) component from
the master's match array.
The specific case which broke was:
- rockchip_drm_drv adds a component master
- dwhdmi_rockchip adds a child component in probe (master incomplete)
- rockchip_drm_vop adds two children in probe, which completes the
set
- inside component_add, we try to bring up the master, having
populated the master's match array, and fail with EPROBE_DEFER from
dwhdmi_rockchip; we delete the putative component
- rockchip_drm_vop's probe fails and returns EPROBE_DEFER
- we later re-probe rockchip_drm_vop and add the component; the
master is complete, so we attempt to bring it up again
- walking the match array, we find the previous child, whose master
pointer doesn't match (as it has been freed in the meantime)
- rockchip_drm_vop probe fails, and will never be attempted again
Fixes: ffc30b74fd
Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch fixes an incorrect return of zero from the new
unmap_zeroes_data_store() configfs store attribute handler
introduced in v4.5-rc1, to use the correct 'count' bytes
return value.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
- One fix to ipoib
- One fix to core sysfs code
- Four patches that resolve an oops found in testing of ocrdma and a couple
other ocrdma issues
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWu7lOAAoJELgmozMOVy/dDi8QAISZk+WzoTBsfFymoFwo7Yh7
2W9PAdh/Wp9jH2nTFHwIgS70aLt3oMhXYMZi7xZBtICZI3QRNR0nkNl/FAwafFPo
FND2j7ftTcDofqD+OTnAE3ZMdgjJhq+z+kjrfrF9pVxdQVOJmpOm6WKZBO7Q9yZJ
D6eXmOV6+8TYSlFxbbvB158ddMcBdZXOCoJoK1x/iLtyMvoCGPv7s2eVIeozmD4u
+C16WKaDIO4xu1a/k7rULBEz1QrnlOg0r7XeYgA3ubzJgTPmBc32ZL/qv7oD1O0O
4LJLS3/HVZdkH3l94b1mGRquLaOM6TVCYs4BR+EibkB0ndYeESU/mqLaKK91zrba
TrKGLGlXo1VszzCIra2pnalYHHfe+t3dizylXB2/vvmnWAkZcBh9enxPIMXTArJq
t72ELkj3b6Ln8uuteimYeT5pX1BuxzMPggoSZ1OW19YfUt4UcmJ/RzgaqxFI6fml
RRH/7XuavKSD9I1eItg3OFyd77YtvdrWx5VxUFOVPnCaHguDDZGI9l7wP9Vtygj4
Hh5jHB6roS32z4QZ6Z9glQt0i75iXLcznYjfhA8qMRBbph0+qPW9s6gvCkiy5PL1
eWuxenS2jnrDCkAvA5BIi50QIgzezNMHduUM3FB5W5dyrFrWzJtAII+CDcn8o2DO
xZKMOpuMPuDXffP66pee
=Wpl3
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"A few more minor fixes for rc3:
- One fix to ipoib
- One fix to core sysfs code
- Four patches that resolve an oops found in testing of ocrdma and a
couple other ocrdma issues"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA/ocrdma: Fixing ocrdma debugfs directory remove
RDMA/ocrdma: Fix pkey_index returned by driver in rq work completion
RDMA/ocrdma: populate max_sge_rd in device attributes
RDMA/ocrdma: Initialize stats resources in the driver before ib device registration.
IB/sysfs: remove unused va_list args
IB/IPoIB: Do not set skb truesize since using one linearskb
radeon and amdgpu fixes for 4.5. Highlights:
- powerplay fixes for amdgpu
- race fixes in the sub-allocator in radeon and amdgpu
- hibernate fix for amdgpu
- fix a possible circular locking in userptr handling in amdgpu
* 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux: (21 commits)
drm/amdgpu: fix issue with overlapping userptrs
drm/radeon: hold reference to fences in radeon_sa_bo_new
drm/amdgpu: remove unnecessary forward declaration
drm/amdgpu: hold reference to fences in amdgpu_sa_bo_new (v2)
drm/amdgpu: fix s4 resume
drm/amdgpu/cz: plumb pg flags through to powerplay
drm/amdgpu/tonga: plumb pg flags through to powerplay
drma/dmgpu: move cg and pg flags into shared headers
drm/amdgpu: remove unused cg defines
drm/amdgpu: add a cgs interface to fetch cg and pg flags
drm/amd/powerplay/tonga: disable vce pg
drm/amd/powerplay/tonga: disable uvd pg
drm/amd/powerplay/cz: disable vce pg
drm/amd/powerplay/cz: disable uvd pg
drm/amdgpu: be consistent with uvd cg flags
drm/amdgpu: clean up vce pg flags for cz/st
drm/amdgpu: handle vce pg flags properly
drm/amdgpu: handle uvd pg flags properly
drm/amdgpu/dpm/ci: switch over to the common pcie caps interface
drm/amdgpu/cik: don't mess with aspm if gpu is root bus
...
When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.
Analysis on what the check in adjust_branches() is currently doing:
/* adjust offset of jmps if necessary */
if (i < pos && i + insn->off + 1 > pos)
insn->off += delta;
else if (i > pos && i + insn->off + 1 < pos)
insn->off -= delta;
First condition (forward jumps):
Before: After:
insns[0] insns[0]
insns[1] <--- i/insn insns[1] <--- i/insn
insns[2] <--- pos insns[P] <--- pos
insns[3] insns[P] `------| delta
insns[4] <--- target_X insns[P] `-----|
insns[5] insns[3]
insns[4] <--- target_X
insns[5]
First case is if we cross pos-boundary and the jump instruction was
before pos. This is handeled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn->off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment are needed. That part is correct.
Second condition (backward jumps):
Before: After:
insns[0] insns[0]
insns[1] <--- target_X insns[1] <--- target_X
insns[2] <--- pos <-- target_Y insns[P] <--- pos <-- target_Y
insns[3] insns[P] `------| delta
insns[4] <--- i/insn insns[P] `-----|
insns[5] insns[3]
insns[4] <--- i/insn
insns[5]
Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i > pos is okay only by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.
Fixes: 9bac3d6d54 ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libata fixes from Tejun Heo:
- PORTS_IMPL workaround for very early ahci controllers is misbehaving
on new systems. Disabled on recent ahci versions.
- Old-style PIO state machine had a horrible locking problem. Don't
know how we've been getting away this far. Fixed.
- Other device specific updates.
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: Intel DNV device IDs SATA
libata: fix sff host state machine locking while polling
libata-sff: use WARN instead of BUG on illegal host state machine state
libata: disable forced PORTS_IMPL for >= AHCI 1.3
libata: blacklist a Viking flash model for MWDMA corruption
drivers: ata: wake port before DMA stop for ALPM
Pull cgroup fixes from Tejun Heo:
- The destruction path of cgroup objects are asynchronous and
multi-staged and some of them ended up destroying parents before
children leading to failures in cpu and memory controllers. Ensure
that parents are always destroyed after children.
- cpuset mm node migration was performed synchronously while holding
threadgroup and cgroup mutexes and the recent threadgroup locking
update resulted in a possible deadlock. The migration is best effort
and shouldn't have been performed under those locks to begin with.
Made asynchronous.
- Minor documentation fix.
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Documentation: cgroup: Fix 'cgroup-legacy' -> 'cgroup-v1'
cgroup: make sure a parent css isn't freed before its children
cgroup: make sure a parent css isn't offlined before its children
cpuset: make mm migration asynchronous
When the FLL is in pseudo-fractional mode there is an additional
limit on fref based on the fratio, to prevent aliasing around the
Nyquist frequency. If fref exceeds this limit the refclk divider
must be increased and the calculation tried again until a suitable
combination of fref and fratio is found or we have to fall back to
integer mode.
This patch also adds some debug log prints around this code.
Signed-off-by: Richard Fitzgerald <rf@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Otherwise we could try to evict overlapping userptr BOs in get_user_pages(),
leading to a possible circular locking dependency.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
An arbitrary amount of time can pass between spin_unlock and
radeon_fence_wait_any, so we need to ensure that nobody frees the
fences from under us.
Based on the analogous fix for amdgpu.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
An arbitrary amount of time can pass between spin_unlock and
fence_wait_any_timeout, so we need to ensure that nobody frees the
fences from under us.
A stress test (rapidly starting and killing hundreds of glxgears
instances) ran into a deadlock in fence_wait_any_timeout after
about an hour, and this race condition appears to be a plausible
cause.
v2: agd: rebase on upstream
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
No need to re-init asic if it's already been initialized.
Skip IB tests since kernel processes are frozen in thaw.
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Pull workqueue fixes from Tejun Heo:
"Workqueue fixes for v4.5-rc3.
- Remove a spurious triggering of flush dependency warning.
- Officially break local execution guarantee of unbound work items
and add a debug feature to flush out usages which depend on it.
- Work around CPU -> NODE mapping becoming invalid on CPU offline.
The branch is young but pushing out early as stable kernels are being
affected"
* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs
Revert "workqueue: make sure delayed work run in local cpu"
workqueue: skip flush dependency checks for legacy workqueues
Forwarding the return value of i2c_master_send, leads to errors
later on, since i2c_master_send returns the number of bytes
transmittet. Check for ret < 0 instead and return 0 otherwise.
Signed-off-by: Pascal Huerst <pascal.huerst@gmail.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node. However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.
This has always been broken but hasn't triggered often enough before
874bbfe600 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcifully assigns the local CPU for
delayed work items without explicit target CPU to fix a different
issue. This widens the window where CPU can go offline while a
delayed work item is pending causing delayed work items dispatched
with target CPU set to an already offlined CPU. The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.
While 874bbfe600 has been reverted for a different reason making the
bug less visible again, it can still happen. Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround. The long term solution is keeping CPU
-> NODE mapping stable across CPU off/online cycles which is being
worked on.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
David Wragg says:
====================
Set a large MTU on ovs-created tunnel devices
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
This patch series sets the MTU on openvswitch-created netdevs to be
the relevant maximum (i.e. the maximum IP packet size minus any
relevant overhead), effectively restoring the behaviour prior to 4.3.
Where relevant, the limits on MTU values that can be directly set on
the netdevs are also relaxed.
Changes in v2:
* Extend to all openvswitch tunnel types, i.e. gre and geneve as well
* Use IP_MAX_MTU
Changes in v3:
* Fix block comment style
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the MTU of geneve devices to be set to large values, in order to
exploit underlying networks with larger frame sizes.
GENEVE does not have a fixed encapsulation overhead (an openvswitch
rule can add variable length options), so there is no relevant maximum
MTU to enforce. A maximum of IP_MAX_MTU is used instead.
Encapsulated packets that are too big for the underlying network will
get dropped on the floor.
Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the MTU of vxlan devices without an underlying device to be set
to larger values (up to a maximum based on IP packet limits and vxlan
overhead).
Previously, their MTUs could not be set to higher than the
conventional ethernet value of 1500. This is a very arbitrary value
in the context of vxlan, and prevented vxlan devices from being able
to take advantage of jumbo frames etc.
The default MTU remains 1500, for compatibility.
Signed-off-by: David Wragg <david@weave.works>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver only needs to allocate for [ngpio / 32] controllers,
as each controller handles 32 gpios. But the current driver
allocates for ngpio of which the extra allocated are unused.
Fix it be registering only the required number of controllers.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>