[ Upstream commit 1a0008f9df ]
While reading sysctl_tcp_fwmark_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 84f39b08d7 ("net: support marking accepting TCP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 85d0b4dbd7 ]
While reading sysctl_fwmark_reflect, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0db2327658 ]
While reading sysctl_ip_autobind_reuse, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
Fixes: 4b01a96742 ("tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 289d3b21fb ]
While reading sysctl_ip_nonlocal_bind, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7bf9e18d9a ]
While reading sysctl_ip_fwd_update_priority, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 432e05d328 ("net: ipv4: Control SKB reprioritization after forwarding")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60c158dc7b ]
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: f87c10a8aa ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0968d2a441 ]
While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7c1ddcee53 ]
The initially merged version of the igc driver code (via commit
146740f9ab, "igc: Add support for PF") contained the following
IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors:
u32 igc_rd32(struct igc_hw *hw, u32 reg)
{
u8 __iomem *hw_addr = READ_ONCE(hw->hw_addr);
u32 value = 0;
if (IGC_REMOVED(hw_addr))
return ~value;
value = readl(&hw_addr[reg]);
/* reads should not return all F's */
if (!(~value) && (!reg || !(~readl(hw_addr))))
hw->hw_addr = NULL;
return value;
}
And:
#define wr32(reg, val) \
do { \
u8 __iomem *hw_addr = READ_ONCE((hw)->hw_addr); \
if (!IGC_REMOVED(hw_addr)) \
writel((val), &hw_addr[(reg)]); \
} while (0)
E.g. igb has similar checks in its MMIO accessors, and has a similar
macro E1000_REMOVED, which is implemented as follows:
#define E1000_REMOVED(h) unlikely(!(h))
These checks serve to detect and take note of an 0xffffffff MMIO read
return from the device, which can be caused by a PCIe link flap or some
other kind of PCI bus error, and to avoid performing MMIO reads and
writes from that point onwards.
However, the IGC_REMOVED macro was not originally implemented:
#ifndef IGC_REMOVED
#define IGC_REMOVED(a) (0)
#endif /* IGC_REMOVED */
This led to the IGC_REMOVED logic to be removed entirely in a
subsequent commit (commit 3c215fb18e, "igc: remove IGC_REMOVED
function"), with the rationale that such checks matter only for
virtualization and that igc does not support virtualization -- but a
PCIe device can become detached even without virtualization being in
use, and without proper checks, a PCIe bus error affecting an igc
adapter will lead to various NULL pointer dereferences, as the first
access after the error will set hw->hw_addr to NULL, and subsequent
accesses will blindly dereference this now-NULL pointer.
This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and
implements IGC_REMOVED the way it is done for igb, by checking for the
unlikely() case of hw_addr being NULL. This change prevents the oopses
seen when a PCIe link flap occurs on an igc adapter.
Fixes: 146740f9ab ("igc: Add support for PF")
Signed-off-by: Lennert Buytenhek <buytenh@arista.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6cfa45361d ]
This reverts commit 1866aa0d0d.
Commit 1866aa0d0d ("e1000e: Fix possible HW unit hang after an s0ix
exit") was a workaround for CSME problem to handle messages comes via H2ME
mailbox. This problem has been fixed by patch "e1000e: Enable the GPT
clock before sending message to the CSME".
Fixes: 3e55d23171 ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b49feacbef ]
On corporate (CSME) ADL systems, the Ethernet Controller may stop working
("HW unit hang") after exiting from the s0ix state. The reason is that
CSME misses the message sent by the host. Enabling the dynamic GPT clock
solves this problem. This clock is cleared upon HW initialization.
Fixes: 3e55d23171 ("e1000e: Add handshake with the CSME to support S0ix")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6961b5e028 ]
The issue exists when multipath is enabled and the namespace is
shared, but all the other controller checks at nvme_is_unique_nsid()
are false. The reason for this issue is that nvme_is_unique_nsid()
returns false when is called from nvme_mpath_alloc_disk() due to an
uninitialized value of head->shared. The patch fixes it by setting
head->shared before nvme_mpath_alloc_disk() is called.
Fixes: 5974ea7ce0 ("nvme: allow duplicate NSIDs for private namespaces")
Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e2d77d2e11 ]
Lift the check for duplicate identifiers into nvme_init_ns_head, which
avoids pointless error unwinding in case they don't match, and also
matches where we check identifier validity for the multipath case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ae57c9950 ]
Commit '3b5f3c0d0548 ("scsi: ufs: core: Tidy up WB configuration code")'
changed the log level of the write boost enable/disable notification from
debug to info. This results in a lot of noise in the kernel log during
normal operation.
Drop it back to debug level to avoid this.
Link: https://lore.kernel.org/r/20220709000027.3929970-1-bjorn.andersson@linaro.org
Fixes: 3b5f3c0d05 ("scsi: ufs: core: Tidy up WB configuration code")
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8312cd3a7b ]
The megaraid SCSI driver sets set->nr_maps as 3 if poll_queues is > 0, and
blk-mq actually initializes each map's nr_queues as nr_hw_queues.
Consequently the driver has to clear READ queue map's nr_queues, otherwise
the queue map becomes broken if poll_queues is set as non-zero.
Link: https://lore.kernel.org/r/20220706125942.528533-1-ming.lei@redhat.com
Fixes: 9e4bec5b2a ("scsi: megaraid_sas: mq_poll support")
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: sumit.saxena@broadcom.com
Cc: chandrakanth.patil@broadcom.com
Cc: linux-block@vger.kernel.org
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Tested-by: Guangwu Zhang <guazhang@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit acea108fa0 ]
[why]
First MST sideband message returns AUX_RET_ERROR_HPD_DISCON
on certain intel platform. Aux transaction considered failure
if HPD unexpected pulled low. The actual aux transaction success
in such case, hence do not return error.
[how]
Not returning error when AUX_RET_ERROR_HPD_DISCON detected
on the first sideband message.
v2: squash in additional DMI entries
v3: squash in static fix
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3ce51649cd ]
Stutter mode is a power saving feature on GPUs, however at
least one early raven system exhibits stability issues with
it. Add a quirk to disable it for that system.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214417
Fixes: 005440066f ("drm/amdgpu: enable gfxoff again on raven series (v2)")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e794421bc ]
[Why]
Currently, we will try to get dm.dc_lock in handle_hpd_rx_irq() when
link lost happened, which is risky and could cause deadlock.
e.g. If we are under procedure to enable MST streams and then monitor
happens to toggle short hpd to notify link lost, then
handle_hpd_rx_irq() will get blocked due to stream enabling flow has
dc_lock. However, under MST, enabling streams involves communication
with remote sinks which need to use handle_hpd_rx_irq() to handle
sideband messages. Thus, we have deadlock here.
[How]
Target is to have handle_hpd_rx_irq() finished as soon as possilble.
Hence we can react to interrupt quickly. Besides, we should avoid to
grabe dm.dc_lock within handle_hpd_rx_irq() to avoid deadlock situation.
Firstly, revert patches which introduced to use dm.dc_lock in
handle_hpd_rx_irq():
* commit ("drm/amd/display: NULL pointer error during ")
* commit ("drm/amd/display: Only one display lights up while using MST")
* commit ("drm/amd/display: take dc_lock in short pulse handler only")
Instead, create work to handle irq events which needs dm.dc_lock.
Besides:
* Create struct hpd_rx_irq_offload_work_queue for each link to handle
its short hpd events
* Avoid to handle link lost/ automated test if the link is disconnected
* Defer dc_lock needed works in dc_link_handle_hpd_rx_irq(). This
function should just handle simple stuff for us (e.g. DPCD R/W).
However, deferred works should still be handled by the order that
dc_link_handle_hpd_rx_irq() used to be.
* Change function name dm_handle_hpd_rx_irq() to
dm_handle_mst_sideband_msg() to be more specific
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 410ad92d7f ]
[Why & How]
Due to some code flow constraints, we need to defer dc_lock needed works
from dc_link_handle_hpd_rx_irq(). Thus, do following changes:
* Change allow_hpd_rx_irq() from static to public
* Change handle_automated_test() from static to public
* Extract link lost handling flow out from dc_link_handle_hpd_rx_irq()
and put those into a new function dc_link_dp_handle_link_loss()
* Add one option parameter to decide whether defer works within
dc_link_handle_hpd_rx_irq()
Acked-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e27c41d5b0 ]
[WHY]
To add support for HPD interrupt handling from DMUB.
HPD interrupt could be triggered from outbox1 from DMUB
[HOW]
1) Use queue_work to handle hpd task from outbox1
2) Add handle_hpd_irq_helper to share interrupt handling code
between legacy and DMUB HPD from outbox1
3) Added DMUB HPD handling in dmub_srv_stat_get_notification().
HPD handling callback function and wake up the DMUB thread.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Jude Shih <shenshih@amd.com>
Tested-by: Daniel Wheeler <Daniel.Wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4785a66702 ]
While reading sysctl_tcp_ecn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78e36f3b0d ]
sysctl has helpers which let us specify boundary values for a min or max
int value. Since these are used for a boundary check only they don't
change, so move these variables to sysctl_vals to avoid adding duplicate
variables. This will help with our cleanup of kernel/sysctl.c.
[akpm@linux-foundation.org: update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%"]
[mcgrof@kernel.org: major rebase]
Link: https://lkml.kernel.org/r/20211123202347.818157-3-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39c65a94cd ]
For embedded systems with low total memory, having to run applications
with relatively large memory requirements, 10% max limitation for
watermark_scale_factor poses an issue of triggering direct reclaim every
time such application is started. This results in slow application
startup times and bad end-user experience.
By increasing watermark_scale_factor max limit we allow vendors more
flexibility to choose the right level of kswapd aggressiveness for their
device and workload requirements.
Link: https://lkml.kernel.org/r/20211124193604.2758863-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Fengfei Xi <xi.fengfei@h3c.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 45a15d89fb ]
No functional change.
Just to split the if statement into different conditions to use
kfree_skb_reason() to trace the reason later.
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 37ba017dcc ]
TCP ipv4 uses per-cpu/per-netns ctl sockets in order to send
RST and some ACK packets (on behalf of TIMEWAIT sockets).
This adds memory and cpu costs, which do not seem needed.
Now typical servers have 256 or more cores, this adds considerable
tax to netns users.
tcp sockets are used from BH context, are not receiving packets,
and do not store any persistent state but the 'struct net' pointer
in order to be able to use IPv4 output functions.
Note that I attempted a related change in the past, that had
to be hot-fixed in commit bdbbb8527b ("ipv4: tcp: get rid of ugly unicast_sock")
This patch could very well surface old bugs, on layers not
taking care of sk->sk_kern_sock properly.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 68e3c69803 ]
Yang Jihing reported a race between perf_event_set_output() and
perf_mmap_close():
CPU1 CPU2
perf_mmap_close(e2)
if (atomic_dec_and_test(&e2->rb->mmap_count)) // 1 - > 0
detach_rest = true
ioctl(e1, IOC_SET_OUTPUT, e2)
perf_event_set_output(e1, e2)
...
list_for_each_entry_rcu(e, &e2->rb->event_list, rb_entry)
ring_buffer_attach(e, NULL);
// e1 isn't yet added and
// therefore not detached
ring_buffer_attach(e1, e2->rb)
list_add_rcu(&e1->rb_entry,
&e2->rb->event_list)
After this; e1 is attached to an unmapped rb and a subsequent
perf_mmap() will loop forever more:
again:
mutex_lock(&e->mmap_mutex);
if (event->rb) {
...
if (!atomic_inc_not_zero(&e->rb->mmap_count)) {
...
mutex_unlock(&e->mmap_mutex);
goto again;
}
}
The loop in perf_mmap_close() holds e2->mmap_mutex, while the attach
in perf_event_set_output() holds e1->mmap_mutex. As such there is no
serialization to avoid this race.
Change perf_event_set_output() to take both e1->mmap_mutex and
e2->mmap_mutex to alleviate that problem. Additionally, have the loop
in perf_mmap() detach the rb directly, this avoids having to wait for
the concurrent perf_mmap_close() to get around to doing it to make
progress.
Fixes: 9bb5d40cd9 ("perf: Fix mmap() accounting hole")
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lkml.kernel.org/r/YsQ3jm2GR38SW7uD@worktop.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c3b821e8e4 ]
Because of the possible failure of the allocation, data->domains might
be NULL pointer and will cause the dereference of the NULL pointer
later.
Therefore, it might be better to check it and directly return -ENOMEM
without releasing data manually if fails, because the comment of the
devm_kmalloc() says "Memory allocated with this function is
automatically freed on driver detach.".
Fixes: a86854d0c5 ("treewide: devm_kzalloc() -> devm_kcalloc()")
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Link: https://lore.kernel.org/r/20220710154922.2610876-1-williamsukatube@163.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6b3dd85b0b ]
pinctrl-rt2880.c and pinmux.h make up the Ralink pinctrl driver. Rename
pinctrl-rt2880.c to pinctrl-ralink.c. Rename pinmux.h to pinctrl-ralink.h.
Fix references to it. Rename functions that include "rt2880" to "ralink".
Remove PINCTRL_RT2880 symbol and make the existing PINCTRL_RALINK symbol
compile pinctrl-ralink.c. Change the bool to "Ralink pinctrl driver".
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20220414173916.5552-3-arinc.unal@arinc9.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 150438c86f ]
The functions that include "MT7628(an)" are for MT7628 and MT7688 SoCs.
Rename them to MT76X8 to refer to both of the SoCs.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20220414173916.5552-2-arinc.unal@arinc9.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc0315564d ]
Taking the qos_mutex to process RoCEv2 QP's on netdev events causes a
kernel splat.
Fix this by removing the handling for RoCEv2 in
irdma_cm_teardown_connections that uses the mutex. This handling is only
needed for iWARP to avoid having connections established while the link is
down or having connections remain functional after the IP address is
removed.
BUG: sleeping function called from invalid context at kernel/locking/mutex.
Call Trace:
kernel: dump_stack+0x66/0x90
kernel: ___might_sleep.cold.92+0x8d/0x9a
kernel: mutex_lock+0x1c/0x40
kernel: irdma_cm_teardown_connections+0x28e/0x4d0 [irdma]
kernel: ? check_preempt_curr+0x7a/0x90
kernel: ? select_idle_sibling+0x22/0x3c0
kernel: ? select_task_rq_fair+0x94c/0xc90
kernel: ? irdma_exec_cqp_cmd+0xc27/0x17c0 [irdma]
kernel: ? __wake_up_common+0x7a/0x190
kernel: irdma_if_notify+0x3cc/0x450 [irdma]
kernel: ? sched_clock_cpu+0xc/0xb0
kernel: irdma_inet6addr_event+0xc6/0x150 [irdma]
Fixes: 146b9756f1 ("RDMA/irdma: Add connection manager")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e8afb8792 ]
x722 does not support 1GB page size but the irdma driver incorrectly
advertises 1GB page size support for x722 device to ib_core to compute the
best page size to use on this MR. This could lead to incorrect start
offsets computed by hardware on the MR.
Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80192eff64 ]
of_find_matching_node_and_match() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 0e545f57b7 ("power: reset: driver for the Versatile syscon reboot")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f85daf0e72 ]
xfrm_policy_lookup() will call xfrm_pol_hold_rcu() to get a refcount of
pols[0]. This refcount can be dropped in xfrm_expand_policies() when
xfrm_expand_policies() return error. pols[0]'s refcount is balanced in
here. But xfrm_bundle_lookup() will also call xfrm_pols_put() with
num_pols == 1 to drop this refcount when xfrm_expand_policies() return
error.
This patch also fix an illegal address access. pols[0] will save a error
point when xfrm_policy_lookup fails. This lead to xfrm_pols_put to resolve
an illegal address in xfrm_bundle_lookup's error path.
Fix these by setting num_pols = 0 in xfrm_expand_policies()'s error path.
Fixes: 80c802f307 ("xfrm: cache bundles instead of policies for outgoing flows")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8281b7ec5c upstream.
While reading sysctl_ip_default_ttl, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cdf0b86b25 upstream.
This fixes that the platform is waked by an unexpected packet. The
size and range of FIFO is different when the device enters S3 state,
so it is necessary to correct some settings when suspending.
Regardless of jumbo frame, set RMS to 1522 and MTPS to MTPS_DEFAULT.
Besides, enable MCU_BORW_EN to update the method of calculating the
pointer of data. Then, the hardware could get the correct data.
Fixes: 195aae321c ("r8152: support new chips")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20220718082120.10957-391-nic_swsd@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ed6356b07 ]
The "bufsize" comes from the root user. If "bufsize" is negative then,
because of type promotion, neither of the validation checks at the start
of the function are able to catch it:
if (bufsize < sizeof(struct xfs_attrlist) ||
bufsize > XFS_XATTR_LIST_MAX)
return -EINVAL;
This means "bufsize" will trigger (WARN_ON_ONCE(size > INT_MAX)) in
kvmalloc_node(). Fix this by changing the type from int to size_t.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 892a666faf ]
The for_each_perag*() set of macros are hacky in that some (i.e.
those based on sb_agcount) rely on the assumption that perag
iteration terminates naturally with a NULL perag at the specified
end_agno. Others allow for the final AG to have a valid perag and
require the calling function to clean up any potential leftover
xfs_perag reference on termination of the loop.
Aside from providing a subtly inconsistent interface, the former
variant is racy with growfs because growfs can create discoverable
post-eofs perags before the final superblock update that completes
the grow operation and increases sb_agcount. This leads to the
following assert failure (reproduced by xfs/104) in the perag free
path during unmount:
XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195
This occurs because one of the many for_each_perag() loops in the
code that is expected to terminate with a NULL pag (and thus has no
post-loop xfs_perag_put() check) raced with a growfs and found a
non-NULL post-EOFS perag, but terminated naturally based on the
end_agno check without releasing the post-EOFS perag.
Rework the iteration logic to lift the agno check from the main for
loop conditional to the iteration helper function. The for loop now
purely terminates on a NULL pag and xfs_perag_next() avoids taking a
reference to any perag beyond end_agno in the first place.
Fixes: f250eedcf7 ("xfs: make for_each_perag... a first class citizen")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8ed004eb9d ]
The for_each_perag_from() iteration macro relies on sb_agcount to
process every perag currently within EOFS from a given starting
point. It's perfectly valid to have perag structures beyond
sb_agcount, however, such as if a growfs is in progress. If a perag
loop happens to race with growfs in this manner, it will actually
attempt to process the post-EOFS perag where ->pag_agno ==
sb_agcount. This is reproduced by xfs/104 and manifests as the
following assert failure in superblock write verifier context:
XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
Update the corresponding macro to only process perags that are
within the current sb_agcount.
Fixes: 58d43a7e32 ("xfs: pass perags around in fsmap data dev functions")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f1788b5e5e ]
Rename the next_agno variable to be consistent across the several
iteration macros and shorten line length.
[backport: dependency for 8ed004eb9d]
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf2307b195 ]
Fold the loop iteration logic into a helper in preparation for
further fixups. No functional change in this patch.
[backport: dependency for f1788b5e5e]
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 78e8ec83a4 ]
The btree geometry computation function has an off-by-one error in that
it does not allow maximally tall btrees (nlevels == XFS_BTREE_MAXLEVELS).
This can result in repairs failing unnecessarily on very fragmented
filesystems. Subsequent patches to remove MAXLEVELS usage in favor of
the per-btree type computations will make this a much more likely
occurrence.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e90db5e21 upstream.
In case of error, some resources must be freed, as already done above and
below the devm_kmemdup() and __mt7921e_mcu_drv_pmctrl() calls added in the
commit in Fixes:.
Fixes: 602cc0c961 ("mt76: mt7921e: fix possible probe failure after reboot")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 602cc0c961 upstream.
It doesn't guarantee the mt7921e gets started with ASPM L0 after each
machine reboot on every platform.
If mt7921e gets started with not ASPM L0, it would be possible that the
driver encounters time to time failure in mt7921_pci_probe, like a
weird chip identifier is read
[ 215.514503] mt7921e 0000:05:00.0: ASIC revision: feed0000
[ 216.604741] mt7921e: probe of 0000:05:00.0 failed with error -110
or failing to init hardware because the driver is not allowed to access the
register until the device is in ASPM L0 state. So, we call
__mt7921e_mcu_drv_pmctrl in early mt7921_pci_probe to force the device
to bring back to the L0 state for we can safely access registers in any
case.
In the patch, we move all functions from dma.c to pci.c and register mt76
bus operation earilier, that is the __mt7921e_mcu_drv_pmctrl depends on.
Fixes: bf3747ae2e ("mt76: mt7921: enable aspm by default")
Reported-by: Kai-Chuan Hsieh <kaichuan.hsieh@canonical.com>
Co-developed-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1e2eef111 upstream.
Use physical address to unify the register access and reorder the
entries in fixed_map table to accelerate the address lookup for
MT7921e. Cosmetics the patch with adding an extra space to make all
entries in the array style consistent.
Tested-by: Deren Wu <deren.wu@mediatek.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 649178c049 that is the
commit 602cc0c961 upstream.
Because there was mistake in
'649178c0493e ("mt76: mt7921e: fix possible probe failure after reboot")'
that caused WiFi reset cannot work well as the reported issue
"PROBLEM: [Stable v5.15.42+] [mt7921] Wake after suspend locks up system
when mt7921-driver is used on a Lenovo ThinkPad E15 G3" described in
http://lists.infradead.org/pipermail/linux-mediatek/2022-June/042668.html
So, we need to revert it before fixing and landing it again on the stable
tree from upstream.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 663457f421 that is the
commit 44c4237cf3 upstream.
Because there was mistake in
'649178c0493e ("mt76: mt7921e: fix possible probe failure after reboot")'
that caused WiFi reset cannot work well as the reported issue
"PROBLEM: [Stable v5.15.42+] [mt7921] Wake after suspend locks up system
when mt7921-driver is used on a Lenovo ThinkPad E15 G3" described in
http://lists.infradead.org/pipermail/linux-mediatek/2022-June/042668.html
So we need to revert the patch first to avoid the conflict of reverting
'649178c0493e ("mt76: mt7921e: fix possible probe failure after reboot")'
and will be applied back later after fixing.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts the stable commit
e65d78b12f ("batman-adv: Use netif_rx().")
The commit message says:
| Since commit
| baebdf48c3 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
|
| the function netif_rx() can be used in preemptible/thread context as
| well as in interrupt context.
This commit (baebdf48c3) has not been backported to the 5.15 stable
series and therefore, the commit which builds upon it, must not be
backported either.
Revert the backport and use netif_rx_any_context() again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f532c1e25 upstream.
Functions tty_termios_encode_baud_rate() and uart_update_timeout() should
be called with the baudrate value which was set to hardware. Linux then
report exact values via ioctl(TCGETS2) to userspace.
Change mvebu_uart_baud_rate_set() function to return baudrate value which
was set to hardware and propagate this value to above mentioned functions.
With this change userspace would see precise value in termios c_ospeed
field.
Fixes: 68a0db1d7d ("serial: mvebu-uart: add function to change baudrate")
Cc: stable <stable@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20220628100922.10717-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2bad844a6 upstream.
According to Dexuan, the hypervisor folks beleive that multi-msi
allocations are not correct. compose_msi_msg() will allocate multi-msi
one by one. However, multi-msi is a block of related MSIs, with alignment
requirements. In order for the hypervisor to allocate properly aligned
and consecutive entries in the IOMMU Interrupt Remapping Table, there
should be a single mapping request that requests all of the multi-msi
vectors in one shot.
Dexuan suggests detecting the multi-msi case and composing a single
request related to the first MSI. Then for the other MSIs in the same
block, use the cached information. This appears to be viable, so do it.
5.15 backport - add hv_msi_get_int_vector helper function. Fixed merge
conflict due to delivery_mode name change (APIC_DELIVERY_MODE_FIXED
is the value given to DELIVERY_MODE on x86). Removed unused variable
in hv_compose_msi_msg. Fixed reference to msi_desc->pci to point to
the same is_msix variable.
Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1652282599-21643-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4b77778ec upstream.
Currently if compose_msi_msg() is called multiple times, it will free any
previous IRTE allocation, and generate a new allocation. While nothing
prevents this from occurring, it is extraneous when Linux could just reuse
the existing allocation and avoid a bunch of overhead.
However, when future IRTE allocations operate on blocks of MSIs instead of
a single line, freeing the allocation will impact all of the lines. This
could cause an issue where an allocation of N MSIs occurs, then some of
the lines are retargeted, and finally the allocation is freed/reallocated.
The freeing of the allocation removes all of the configuration for the
entire block, which requires all the lines to be retargeted, which might
not happen since some lines might already be unmasked/active.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1652282582-21595-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 455880dfe2 upstream.
In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first
MSI of the N allocated. This is because only the first msi_desc is cached
and it is shared by all the MSIs of the multi-MSI block. This means that
hv_arch_irq_unmask() gets the correct address, but the wrong data (always
0).
This can break MSIs.
Lets assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0.
hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure
the MSI address and data (0) to vector 34 of CPU0. This is correct. Then
hv_arch_irq_unmask is called on MSI1. It uses another hypercall to
configure the MSI address and data (0) to vector 33 of CPU0. This is
wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux
will observe extra instances of MSI1 and no instances of MSI0 despite the
endpoint device behaving correctly.
For the multi-MSI case, we need unique address and data info for each MSI,
but the cached msi_desc does not provide that. However, that information
can be gotten from the int_desc cached in the chip_data by
compose_msi_msg(). Fix the multi-MSI case to use that cached information
instead. Since hv_set_msi_entry_from_desc() is no longer applicable,
remove it.
5.15 backport - no changes to code, but merge conflict due to refactor.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1651068453-29588-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>