Having two ring buffers per-peer means that every peer results in two
massive ring allocations. On an 8-core x86_64 machine, this commit
reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which
is an 90% reduction. Ninety percent! With some single-machine
deployments approaching 500,000 peers, we're talking about a reduction
from 7 gigs of memory down to 700 megs of memory.
In order to get rid of these per-peer allocations, this commit switches
to using a list-based queueing approach. Currently GSO fragments are
chained together using the skb->next pointer (the skb_list_* singly
linked list approach), so we form the per-peer queue around the unused
skb->prev pointer (which sort of makes sense because the links are
pointing backwards). Use of skb_queue_* is not possible here, because
that is based on doubly linked lists and spinlocks. Multiple cores can
write into the queue at any given time, because its writes occur in the
start_xmit path or in the udp_recv path. But reads happen in a single
workqueue item per-peer, amounting to a multi-producer, single-consumer
paradigm.
The MPSC queue is implemented locklessly and never blocks. However, it
is not linearizable (though it is serializable), with a very tight and
unlikely race on writes, which, when hit (some tiny fraction of the
0.15% of partial adds on a fully loaded 16-core x86_64 system), causes
the queue reader to terminate early. However, because every packet sent
queues up the same workqueue item after it is fully added, the worker
resumes again, and stopping early isn't actually a problem, since at
that point the packet wouldn't have yet been added to the encryption
queue. These properties allow us to avoid disabling interrupts or
spinning. The design is based on Dmitry Vyukov's algorithm [1].
Performance-wise, ordinarily list-based queues aren't preferable to
ringbuffers, because of cache misses when following pointers around.
However, we *already* have to follow the adjacent pointers when working
through fragments, so there shouldn't actually be any change there. A
potential downside is that dequeueing is a bit more complicated, but the
ptr_ring structure used prior had a spinlock when dequeueing, so all and
all the difference appears to be a wash.
Actually, from profiling, the biggest performance hit, by far, of this
commit winds up being atomic_add_unless(count, 1, max) and atomic_
dec(count), which account for the majority of CPU time, according to
perf. In that sense, the previous ring buffer was superior in that it
could check if it was full by head==tail, which the list-based approach
cannot do.
But all and all, this enables us to get massive memory savings, allowing
WireGuard to scale for real world deployments, without taking much of a
performance hit.
[1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If skb->protocol doesn't match the actual skb->data header, it's
probably not a good idea to pass it off to icmp{,v6}_ndo_send, which is
expecting to reply to a valid IP packet. So this commit has that early
mismatch case jump to a later error label.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The is_dead boolean is checked for every single packet, while the
internal_id member is used basically only for pr_debug messages. So it
makes sense to hoist up is_dead into some space formerly unused by a
struct hole, while demoting internal_api to below the lowest struct
cache line.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to test ndo_start_xmit being called in parallel, explicitly add
separate tests, which should all run on different cores. This should
help tease out bugs associated with queueing up packets from different
cores in parallel. Currently, it hasn't found those types of bugs, but
given future planned work, this is a useful regression to avoid.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The endpoint->src_if4 has nothing to do with fixed-endian numbers; remove
the bogus annotation.
This was introduced in
https://git.zx2c4.com/wireguard-monolithic-historical/commit?id=14e7d0a499a676ec55176c0de2f9fcbd34074a82
in the historical WireGuard repo because the old code used to
zero-initialize multiple members as follows:
endpoint->src4.s_addr = endpoint->src_if4 = fl.saddr = 0;
Because fl.saddr is fixed-endian and an assignment returns a value with the
type of its left operand, this meant that sparse detected an assignment
between values of different endianness.
Since then, this assignment was already split up into separate statements;
just the cast survived.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The definition of IS_ERR() already applies the unlikely() notation
when checking the error status of the passed pointer. For this
reason there is no need to have the same notation outside of
IS_ERR() itself.
Clean up code by removing redundant notation.
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some messages are before calling register_netdev(), so replace
netif_err() with dev_err().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Return error code if autosuspend_en, eee_get, or eee_set don't exist.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
U1/U2 shoued be enabled for USB 3.0 or later. The USB 2.0 doesn't
support it.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add invalid and reply flags validate in the fl_validate_ct_state.
This makes the checking complete if compared to ovs'
validate_ct_state().
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/1614064315-364-1-git-send-email-wenxu@ucloud.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Fainelli says:
====================
net: dsa: Learning fixes for b53/bcm_sf2
This patch series contains a couple of fixes for the b53/bcm_sf2 drivers
with respect to configuring learning.
The first patch is wiring-up the necessary dsa_switch_ops operations in
order to support the offloading of bridge flags.
The second patch corrects the switch driver's default learning behavior
which was unfortunately wrong from day one.
This is submitted against "net" because this is technically a bug fix
since ports should not have had learning enabled by default but given
this is dependent upon Vladimir's recent br_flags series, there is no
Fixes tag provided.
I will be providing targeted stable backports that look a bit
different.
Changes in v2:
- added first patch
- updated second patch to include BR_LEARNING check in br_flags_pre as
a support bridge flag to offload
====================
Link: https://lore.kernel.org/r/20210222223010.2907234-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for being able to set the learning attribute on port, and
make sure that the standalone ports start up with learning disabled.
We can remove the code in bcm_sf2 that configured the ports learning
attribute because we want the standalone ports to have learning disabled
by default and port 7 cannot be bridged, so its learning attribute will
not change past its initial configuration.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Because bcm_sf2 implements its own dsa_switch_ops we need to export the
b53_br_flags_pre(), b53_br_flags() and b53_set_mrouter so we can wire-up
them up like they used to be with the former b53_br_egress_floods().
Fixes: a8b659e7ff ("net: dsa: act as passthrough for bridge port flags")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sky2.c driver uses netdev_warn() before the net device is initialized.
Fix it by using dev_warn() instead.
Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>
Link: https://lore.kernel.org/r/m3a6s1r1ul.fsf@t19.piap.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In ndo_stop functions, netdev_completed_queue() is called during forced
tx reclaim, after netdev_reset_queue(). This may trigger kernel panic if
there is any tx skb left.
This patch moves netdev_reset_queue() to after tx reclaim, so BQL can
complete successfully then reset.
Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Fixes: 4c59b0f554 ("bcm63xx_enet: add BQL support")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210222013530.1356-1-liew.s.piaw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The icmp{,v6}_send functions make all sorts of use of skb->cb, casting
it with IPCB or IP6CB, assuming the skb to have come directly from the
inet layer. But when the packet comes from the ndo layer, especially
when forwarded, there's no telling what might be in skb->cb at that
point. As a result, the icmp sending code risks reading bogus memory
contents, which can result in nasty stack overflows such as this one
reported by a user:
panic+0x108/0x2ea
__stack_chk_fail+0x14/0x20
__icmp_send+0x5bd/0x5c0
icmp_ndo_send+0x148/0x160
In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read
from it. The optlen parameter there is of particular note, as it can
induce writes beyond bounds. There are quite a few ways that can happen
in __ip_options_echo. For example:
// sptr/skb are attacker-controlled skb bytes
sptr = skb_network_header(skb);
// dptr/dopt points to stack memory allocated by __icmp_send
dptr = dopt->__data;
// sopt is the corrupt skb->cb in question
if (sopt->rr) {
optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data
soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data
// this now writes potentially attacker-controlled data, over
// flowing the stack:
memcpy(dptr, sptr+sopt->rr, optlen);
}
In the icmpv6_send case, the story is similar, but not as dire, as only
IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is
worse than the iif case, but it is passed to ipv6_find_tlv, which does
a bit of bounds checking on the value.
This is easy to simulate by doing a `memset(skb->cb, 0x41,
sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by
good fortune and the rarity of icmp sending from that context that we've
avoided reports like this until now. For example, in KASAN:
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0
Write of size 38 at addr ffff888006f1f80e by task ping/89
CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5
Call Trace:
dump_stack+0x9a/0xcc
print_address_description.constprop.0+0x1a/0x160
__kasan_report.cold+0x20/0x38
kasan_report+0x32/0x40
check_memory_region+0x145/0x1a0
memcpy+0x39/0x60
__ip_options_echo+0xa0e/0x12b0
__icmp_send+0x744/0x1700
Actually, out of the 4 drivers that do this, only gtp zeroed the cb for
the v4 case, while the rest did not. So this commit actually removes the
gtp-specific zeroing, while putting the code where it belongs in the
shared infrastructure of icmp{,v6}_ndo_send.
This commit fixes the issue by passing an empty IPCB or IP6CB along to
the functions that actually do the work. For the icmp_send, this was
already trivial, thanks to __icmp_send providing the plumbing function.
For icmpv6_send, this required a tiny bit of refactoring to make it
behave like the v4 case, after which it was straight forward.
Fixes: a2b78e9b2c ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs")
Reported-by: SinYu <liuxyon@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-02-19
This series contains updates to i40e driver only.
Slawomir resolves an issue with the IPv6 extension headers being
processed incorrectly.
Keita Suzuki fixes a memory leak on probe failure.
Mateusz initializes AQ command structures to zero to comply with
spec, fixes FW flow control settings being overwritten and resolves an
issue with adding VLAN filters after enabling FW LLDP. He also adds
an additional check when adding TC filter as the current check doesn't
properly distinguish between IPv4 and IPv6.
Sylwester removes setting disabled bit when syncing filters as this
prevents VFs from completing setup.
Norbert cleans up sparse warnings.
v2:
- Fix fixes tag on patch 7
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix endianness conversions
i40e: Fix add TC filter for IPv6
i40e: Fix VFs not created
i40e: Fix addition of RX filters after enabling FW LLDP agent
i40e: Fix overwriting flow control settings during driver loading
i40e: Add zero-initialization of AQ command structures
i40e: Fix memory leak in i40e_probe
i40e: Fix flow for IPv6 next header (extension header)
====================
Link: https://lore.kernel.org/r/20210219213606.2567536-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlx4_do_mirror_rule() forgets to call mlx4_free_cmd_mailbox() to
free the memory region allocated by mlx4_alloc_cmd_mailbox() before
an exit.
Add the missed call to fix it.
Fixes: 78efed2751 ("net/mlx4_core: Support mirroring VF DMFS rules on both ports")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20210221143559.390277-1-hslester96@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When link speed is not 100 Mbps, port transmit rate and speed divider
are set to 8 and 1000000 respectively. These values are incorrect for
CBS idleslope and sendslope HW values calculation if the link speed is
not 1 Gbps.
This patch adds switch statement to set the values of port transmit rate
and speed divider for 10 Gbps, 5 Gbps, 2.5 Gbps, 1 Gbps, and 100 Mbps.
Note that CBS is not supported at 10 Mbps.
Fixes: bc41a6689b ("net: stmmac: tc: Remove the speed dependency")
Fixes: 1f705bc61a ("net: stmmac: Add support for CBS QDISC")
Signed-off-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Link: https://lore.kernel.org/r/1613655653-11755-1-git-send-email-yoong.siang.song@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni says:
====================
mptcp: a bunch of fixes
This series bundle a few MPTCP fixes for the current net tree.
They have been detected via syzkaller and packetdrill
Patch 1 fixes a slow close for orphaned sockets
Patch 2 fixes another hangup at close time, when no
data was actually transmitted before close
Patch 3 fixes a memory leak with unusual sockopts
Patch 4 fixes stray wake-ups on listener sockets
====================
Link: https://lore.kernel.org/r/cover.1613755058.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MPJ subflows are not exposed as fds to user spaces. As such,
incoming MPJ subflows are removed from the accept queue by
tcp_check_req()/tcp_get_cookie_sock().
Later tcp_child_process() invokes subflow_data_ready() on the
parent socket regardless of the subflow kind, leading to poll
wakeups even if the later accept will block.
Address the issue by double-checking the queue state before
waking the user-space.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/164
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mptcp re-used inet(6)_release, so the subflow sockets are ignored.
Need to invoke ip(v6)_mc_drop_socket function to ensure mcast join
resources get free'd.
Fixes: 717e79c867 ("mptcp: Add setsockopt()/getsockopt() socket operations")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/110
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the msk is closed before sending or receiving any data,
no DATA_FIN is generated, instead an MPC ack packet is
crafted out.
In the above scenario, the MPTCP protocol creates and sends a
pure ack and such packets matches also the criteria for an
MPC ack and the protocol tries first to insert MPC options,
leading to the described error.
This change addresses the issue by avoiding the insertion of an
MPC option for DATA_FIN packets or if the sub-flow is not
established.
To avoid doing multiple times the same test, fetch the data_fin
flag in a bool variable and pass it to both the interested
helpers.
Fixes: 6d0060f600 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently we move orphaned msk sockets directly from FIN_WAIT2
state to CLOSE, with the rationale that incoming additional
data could be just dropped by the TCP stack/TW sockets.
Anyhow we miss sending MPTCP-level ack on incoming DATA_FIN,
and that may hang the peers.
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The core DSA framework uses hsr_is_master() which would not resolve to a
valid symbol if HSR is built-into the kernel and DSA is a module.
Fixes: 18596f504a ("net: dsa: add support for offloading HSR")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20210220051222.15672-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The comments to phy_select_page() say that "phy_restore_page() must
always be called after this, irrespective of success or failure of this
call." If we don't call phy_restore_page() then we are still holding
the phy_lock_mdio_bus() so it eventually leads to a dead lock.
Fixes: 32ab60e539 ("net: phy: icplus: add MDI/MDIX support for IP101A/G")
Fixes: f9bc51e6cc ("net: phy: icplus: fix paged register access")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Michael Walle <michael@walle.cc>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/YC+OpFGsDPXPnXM5@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PPv2 loopback port doesn't support RSS, so we should
skip RSS configurations for this port.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Reviewed-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/1613652123-19021-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current use of container_of is flawed and unnecessary. Obtain
the dpaa_napi_portal reference from the private percpu data instead.
Fixes: a1e031ffb4 ("dpaa_eth: add XDP_REDIRECT support")
Reported-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20210218182106.22613-1-camelia.groza@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 bytes of the MTU are reserved for Atheros DSA tag, but DSA core
has already handled that since commit dc0fe7d47f.
Remove the unnecessary reservation.
Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20210218034514.3421-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This code does not allocate enough memory for the NUL terminator so it
ends up putting it one character beyond the end of the buffer.
Fixes: 8756828a81 ("octeontx2-af: Add NPA aura and pool contexts to debugfs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add CPU-PMU support for Intel Sapphire Rapids CPUs
- Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter
sampling event feedback. Not used yet, but is intended for Golden Cove
CPU-PMU, which can provide both the instruction latency and the cache
latency information for memory profiling events.
- Remove experimental, default-disabled perfmon-v4 counter_freezing support
that could only be enabled via a boot option. The hardware is hopelessly
broken, we'd like to make sure nobody starts relying on this, as it would
only end in tears.
- Fix energy/power events on Intel SPR platforms
- Simplify the uprobes resume_execution() logic
- Misc smaller fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtf7kRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iJ2xAAvygKF8hm/UAGyT2R3iEruO49wRrmUfgt
13iBBA1DotKw2b8F5UN5MqjfwS8UgGKuAd8agvQ6XXANpnJ5mpy0nrzgjEXUx4j+
sQUqL7vxSdZ5J3kKblSZ4QoMzLVYSUkEDmw818vsa4eFWN8z58FJsv+ySegIFbXx
+I3hF1O9a8MERZBUz4T5xHlgcbSDGEX6EvYRcO+zZ0rXfARfo9StfHYv1V53j6iY
EOotFEKEn/5naczAd/sQo1SE1IgHtX2cbjOaKF7LulgEwZQWHpdKq0gww6nFK5yz
XMSE9oXAFXRkRCJbrSqC0Dvrrf8hdlxWbKYbj9L7XILoxw199AdOBDbliJm6P/UH
6+JSEu/N4R0TFYc7TX6yef7ncw12e+64USjKOlWWwww97rVWWH1/tFTdlXhS6s+d
jVI3yEECKyZlddrDdsetRdUj+QKyZQfDqbMXPXiDTv9P6AFqBvNLZYT0UPU3akk5
jXueHJQYSSgqnN+eRaIwvm4ZYWa031YHJXxiq2E89RnzL4JJArBYaddpukgxTYka
c6Tn8L7f4zP5Bghu7hHv5Vy69i1N/3YvzUoYc6ljjmapgAJzxzq/yoEKrBlKnjtA
MrstHhnwnPJl+PKjlbLpjl74rtcCiKJxjVhm+a5UbEcYoVuzJ86lmQK2WrLaoCTU
B/zFplUF8C4=
=BCcg
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
- Add CPU-PMU support for Intel Sapphire Rapids CPUs
- Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer
two-parameter sampling event feedback. Not used yet, but is intended
for Golden Cove CPU-PMU, which can provide both the instruction
latency and the cache latency information for memory profiling
events.
- Remove experimental, default-disabled perfmon-v4 counter_freezing
support that could only be enabled via a boot option. The hardware is
hopelessly broken, we'd like to make sure nobody starts relying on
this, as it would only end in tears.
- Fix energy/power events on Intel SPR platforms
- Simplify the uprobes resume_execution() logic
- Misc smaller fixes.
* tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Fix psys-energy event on Intel SPR platform
perf/x86/rapl: Only check lower 32bits for RAPL energy counters
perf/x86/rapl: Add msr mask support
perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[]
perf/x86/intel: Support CPUID 10.ECX to disable fixed counters
perf/x86/intel: Add perf core PMU support for Sapphire Rapids
perf/x86/intel: Filter unsupported Topdown metrics event
perf/x86/intel: Factor out intel_update_topdown_event()
perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT
perf/intel: Remove Perfmon-v4 counter_freezing support
x86/perf: Use static_call for x86_pmu.guest_get_msrs
perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info
perf/x86/intel/uncore: Store the logical die id instead of the physical die id.
x86/kprobes: Do not decode opcode in resume_execution()
[ NOTE: unfortunately this tree had to be freshly rebased today,
it's a same-content tree of 82891be90f3c (-next published)
merged with v5.11.
The main reason for the rebase was an authorship misattribution
problem with a new commit, which we noticed in the last minute,
and which we didn't want to be merged upstream. The offending
commit was deep in the tree, and dependent commits had to be
rebased as well. ]
- Core scheduler updates:
- Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
preempt=none/voluntary/full boot options (default: full),
to allow distros to build a PREEMPT kernel but fall back to
close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling
behavior via a boot time selection.
There's also the /debug/sched_debug switch to do this runtime.
This feature is implemented via runtime patching (a new variant of static calls).
The scope of the runtime patching can be best reviewed by looking
at the sched_dynamic_update() function in kernel/sched/core.c.
( Note that the dynamic none/voluntary mode isn't 100% identical,
for example preempt-RCU is available in all cases, plus the
preempt count is maintained in all models, which has runtime
overhead even with the code patching. )
The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority
of distributions, are supposed to be unaffected.
- Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
was found via rcutorture triggering a hang. The bug is that
rcu_idle_enter() may wake up a NOCB kthread, but this happens after
the last generic need_resched() check. Some cpuidle drivers fix it
by chance but many others don't.
In true 2020 fashion the original bug fix has grown into a 5-patch
scheduler/RCU fix series plus another 16 RCU patches to address
the underlying issue of missed preemption events. These are the
initial fixes that should fix current incarnations of the bug.
- Clean up rbtree usage in the scheduler, by providing & using the following
consistent set of rbtree APIs:
partial-order; less() based:
- rb_add(): add a new entry to the rbtree
- rb_add_cached(): like rb_add(), but for a rb_root_cached
total-order; cmp() based:
- rb_find(): find an entry in an rbtree
- rb_find_add(): find an entry, and add if not found
- rb_find_first(): find the first (leftmost) matching entry
- rb_next_match(): continue from rb_find_first()
- rb_for_each(): iterate a sub-tree using the previous two
- Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass.
This is a 4-commit series where each commit improves one aspect of the idle
sibling scan logic.
- Improve the cpufreq cooling driver by getting the effective CPU utilization
metrics from the scheduler
- Improve the fair scheduler's active load-balancing logic by reducing the number
of active LB attempts & lengthen the load-balancing interval. This improves
stress-ng mmapfork performance.
- Fix CFS's estimated utilization (util_est) calculation bug that can result in
too high utilization values
- Misc updates & fixes:
- Fix the HRTICK reprogramming & optimization feature
- Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
- Reduce dl_add_task_root_domain() overhead
- Fix uprobes refcount bug
- Process pending softirqs in flush_smp_call_function_from_idle()
- Clean up task priority related defines, remove *USER_*PRIO and
USER_PRIO()
- Simplify the sched_init_numa() deduplication sort
- Documentation updates
- Fix EAS bug in update_misfit_status(), which degraded the quality
of energy-balancing
- Smaller cleanups
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtHBsRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1itgg/+NGed12pgPjYBzesdou60Lvx7LZLGjfOt
M1F1EnmQGn/hEH2fCY6ZoqIZQTVltm7GIcBNabzYTzlaHZsdtyuDUJBZyj19vTlk
zekcj7WVt+qvfjChaNwEJhQ9nnOM/eohMgEOHMAAJd9zlnQvve7NOLQ56UDM+kn/
9taFJ5ZPvb4avP6C5p3KivvKex6Bjof/Tl0m3utpNyPpI/qK3FyGxwdgCxU0yepT
ABWQX5ZQCufFvo1bgnBPfqyzab4MqhoM3bNKBsLQfuAlssG1xRv4KQOev4dRwrt9
pXJikV5C9yez5d2lGe5p0ltH5IZS/l9x2yI/ZQj3OUDTFyV1ic6WfFAqJgDzVF8E
i/vvA4NPQiI241Bkps+ErcCw4aVOgiY6TWli74cHjLUIX0+As6aHrFWXGSxUmiHB
WR+B8KmdfzRTTlhOxMA+cvlpZcKCfxWkJJmXzr/lDZzIuKPqM3QCE2wD9sixkfVo
JNICT0IvZghWOdbMEfZba8Psh/e2LVI9RzdpEiuYJz1ZrVlt1hO0M6jBxY0hMz9n
k54z81xODw0a8P2FHMtpmB1vhAeqCmvwA6DO8z0Oxs0DFi+KM2bLf2efHsCKafI+
Bm5v9YFaOk/55R76hJVh+aYLlyFgFkKd+P/niJTPDnxOk3SqJuXvTrql1HeGHkNr
kYgQa23dsZk=
=pyaG
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler updates:
- Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
preempt=none/voluntary/full boot options (default: full), to allow
distros to build a PREEMPT kernel but fall back to close to
PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
a boot time selection.
There's also the /debug/sched_debug switch to do this runtime.
This feature is implemented via runtime patching (a new variant of
static calls).
The scope of the runtime patching can be best reviewed by looking
at the sched_dynamic_update() function in kernel/sched/core.c.
( Note that the dynamic none/voluntary mode isn't 100% identical,
for example preempt-RCU is available in all cases, plus the
preempt count is maintained in all models, which has runtime
overhead even with the code patching. )
The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
majority of distributions, are supposed to be unaffected.
- Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
was found via rcutorture triggering a hang. The bug is that
rcu_idle_enter() may wake up a NOCB kthread, but this happens after
the last generic need_resched() check. Some cpuidle drivers fix it
by chance but many others don't.
In true 2020 fashion the original bug fix has grown into a 5-patch
scheduler/RCU fix series plus another 16 RCU patches to address the
underlying issue of missed preemption events. These are the initial
fixes that should fix current incarnations of the bug.
- Clean up rbtree usage in the scheduler, by providing & using the
following consistent set of rbtree APIs:
partial-order; less() based:
- rb_add(): add a new entry to the rbtree
- rb_add_cached(): like rb_add(), but for a rb_root_cached
total-order; cmp() based:
- rb_find(): find an entry in an rbtree
- rb_find_add(): find an entry, and add if not found
- rb_find_first(): find the first (leftmost) matching entry
- rb_next_match(): continue from rb_find_first()
- rb_for_each(): iterate a sub-tree using the previous two
- Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
single pass. This is a 4-commit series where each commit improves
one aspect of the idle sibling scan logic.
- Improve the cpufreq cooling driver by getting the effective CPU
utilization metrics from the scheduler
- Improve the fair scheduler's active load-balancing logic by
reducing the number of active LB attempts & lengthen the
load-balancing interval. This improves stress-ng mmapfork
performance.
- Fix CFS's estimated utilization (util_est) calculation bug that can
result in too high utilization values
Misc updates & fixes:
- Fix the HRTICK reprogramming & optimization feature
- Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
- Reduce dl_add_task_root_domain() overhead
- Fix uprobes refcount bug
- Process pending softirqs in flush_smp_call_function_from_idle()
- Clean up task priority related defines, remove *USER_*PRIO and
USER_PRIO()
- Simplify the sched_init_numa() deduplication sort
- Documentation updates
- Fix EAS bug in update_misfit_status(), which degraded the quality
of energy-balancing
- Smaller cleanups"
* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
sched,x86: Allow !PREEMPT_DYNAMIC
entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
entry: Explicitly flush pending rcuog wakeup before last rescheduling point
rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
sched/features: Distinguish between NORMAL and DEADLINE hrtick
sched/features: Fix hrtick reprogramming
sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
uprobes: (Re)add missing get_uprobe() in __find_uprobe()
smp: Process pending softirqs in flush_smp_call_function_from_idle()
sched: Harden PREEMPT_DYNAMIC
static_call: Allow module use without exposing static_call_key
sched: Add /debug/sched_preempt
preempt/dynamic: Support dynamic preempt with preempt= boot option
preempt/dynamic: Provide irqentry_exit_cond_resched() static call
preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
preempt/dynamic: Provide cond_resched() and might_resched() static calls
preempt: Introduce CONFIG_PREEMPT_DYNAMIC
static_call: Provide DEFINE_STATIC_CALL_RET0()
...
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
- nolibc fixes for the torture tests
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAs9lgRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1j/axAAsqIvarDD6OLmgcOPCyWSvfG6LsIFgqI9
CY0JdQBtvFBTvE8Q2No5ktbLVmuYsBh0dGeFkv4HQZJyRlr7mjstVMNN4SeBDVIS
/+zZO1wlwzaXfKQopLctTK1O/UzFqIN2sqyzA3nLzGGj8DqgxXJyreJ10feK5XM+
6ttZPd1qm4hqtpA22ZEODbct5OFqZuvnK8VNqBb2YHabA1rasUXbIEJPBpsuv/W2
l9W5AGP4erdOFm3nHJxiCpvLJtgHy4njvw0HJp5f99Abj6OVeAzw5kFjvRB3n1Qd
ayKyTw8T/1mfmkjvYkGsMAqhEmqwXcryFX0dR/14/XPdXyjPhZlbkz+MfRKrn4NT
LBJPX+MlX9lVFWBNR9HMe2o/083+gorlwZt9wtyt0OBBGGgudYo4uKNdbyy6tB3Y
Gb98P2vtVSO24EsQce6M+ppHN4TgVBd6id82MQxNuFw+PQJdBiCY0JJfNQApbAry
cIKOchSSR2SkJHlAevNVaKAeiTnkAXd1jDBKtCnvCqOUyvtnhE3rQCqwS5xT2Cno
oQydpudwBKT7uO/GUyS0ESErjHuy9zhExNSYD0ydxlBCrGbzrrgPg57ntXHA1die
mtFyvc2tfT/AshWRNYiuCG+eaUG3qK7n7jN7Vc6/K5DR4GMb5tOhL9wPx2ljCRGu
Z8WDg0pJGz4=
=31yj
-----END PGP SIGNATURE-----
Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"These are the latest RCU updates for v5.12:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide
allocator return addresses to more easily locate bugs. This has a
couple of RCU-related commits, but is mostly MM. Was pulled in with
akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks, which enables
better debugging information and smarter reactions to large numbers
of callbacks.
- The first round of changes to allow CPUs to be runtime switched
from and to callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a
"torture everything" script that runs rcutorture, locktorture,
scftorture, rcuscale, and refscale. Plus does an allmodconfig
build.
- nolibc fixes for the torture tests"
* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
mm: Make mem_obj_dump() vmalloc() dumps include start and length
mm: Make mem_dump_obj() handle vmalloc() memory
mm: Make mem_dump_obj() handle NULL and zero-sized pointers
mm: Add mem_dump_obj() to print source of memory block
tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
tools/nolibc: Fix position of -lgcc in the documented example
tools/nolibc: Emit detailed error for missing alternate syscall number definitions
tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
tools/nolibc: Get timeval, timespec and timezone from linux/time.h
tools/nolibc: Implement poll() based on ppoll()
tools/nolibc: Implement fork() based on clone()
tools/nolibc: Make getpgrp() fall back to getpgid(0)
tools/nolibc: Make dup2() rely on dup3() when available
tools/nolibc: Add the definition for dup()
rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
torture: Maintain torture-specific set of CPUs-online books
torture: Clean up after torture-test CPU hotplugging
rcutorture: Make object_debug also double call_rcu() heap object
...
- Instead of new drivers remove tango, sirf, u300 and atlas drivers
- Add suspend/resume support for microchip pit64b
- The usual fixes, improvements and cleanups here and there
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmAqjecTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodVTD/9UuKlOifRBNd4ECR+yF65MzLfjqNHU
j76E3Dzuf4QCbXTjmQAsadaqQ+w9l8Ie7OVT51XlmTczEBkJiV3FOGLVJeRun6R4
7OLF3VOYyBUtMoRdHzyXaYOTbsOK9gZitucDeLCQhvKhDCkVnKKFXNJR+TTkSYM3
xDBLwuI7uuHWHyh0+W+3SI1pTiEA4yMe5ZbqqoJbjGQapr3Eao+nyjd1aa3ERb2f
PtS7UVQ69QowRqq6DQyZk0yKit8J3HZnHfCPH/T6eXsxGnui36GiUnTGCMhLMZpD
Xvl/5cjqQuKjgt2093t8nGiumOGBOfrb8uvc/qMW777DzFe/VJtXrC/7pySVLAhK
oc9Swj0iX/WPARzlpyOk3lfpDMzv6qyjMNJIXcnav2lrknITp+TMORKWOADB03UV
sswlN7YFTrNe7d7uxEdybKkNX6bUwgOzo2m69A1IdSXwKPzYkZQmku6Y7GnzYErZ
aiJiZl858VB9g24ROKLt/uQTarzYCS0sjcdnDgO1KSR7zKHZ4iUpd3zucd3mlmUo
fGTMIbCqL/gzl4Zcl6njvzMVfJeMzOeDiQ41wCyYnOsXlIKmWNi1rONdGZcyDYvN
bOiGVUMKicEZwBlSZQQ1GdR9eGf8/Ix5cTlynZepN3dkHUDc7SU+OvQZdg4Yd/RY
BsquFDHnt4gl+A==
=xZQt
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Time and timer updates:
- Instead of new drivers remove tango, sirf, u300 and atlas drivers
- Add suspend/resume support for microchip pit64b
- The usual fixes, improvements and cleanups here and there"
* tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Delete no-op time_ns_init()
alarmtimer: Update kerneldoc
clocksource/drivers/timer-microchip-pit64b: Add clocksource suspend/resume
clocksource/drivers/prima: Remove sirf prima driver
clocksource/drivers/atlas: Remove sirf atlas driver
clocksource/drivers/tango: Remove tango driver
clocksource/drivers/u300: Remove the u300 driver
dt-bindings: timer: nuvoton: Clarify that interrupt of timer 0 should be specified
clocksource/drivers/davinci: Move pr_fmt() before the includes
clocksource/drivers/efm32: Drop unused timer code
- The usual new irq chip driver (Realtek RTL83xx)
- Removal of sirfsoc and tango irq chip drivers
- Conversion of the sun6i chip support to hierarchical irq domains
- The usual fixes, improvements and cleanups all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmAqjRITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYob0ND/9ghLC0XF7gSrIFBxNXczQcckd0Ev8y
gt7EJMxy9+UG/krEmLk4p5FkkKaUQy4v8PMqhDpI0pR7Fdm2oaOCgPRHsNdVYr5J
g7FawE99qPiRXaR29rpfX6qIBRP3rJ4/MS90YlcXGvKbgPtqCXStEWrIyu1/1Q8A
3BdfKL3K15/TeWeu9075yHX8V4Ovss6Birc0/XNtQWiHNk84GAZVIQPjfdHEu83j
Kv4QxlrfSDjCvB016YKL04kjrwzUXxksvGnyjFXNXFZwg0bVtLTujSdkXajUBQjv
1ZpcEM+gBPdeOU99ELYeaQ9B0Di2mNjya5TBttmKdKW/nQ01BkWUouoqyBRqjpvd
+2uXaad/D7iKUYXqXtSim8OpwBLb5THQdG4qn1vgYZOUxWu9DKdtZEO3hw8edgjN
zJEQFI8yBeDtcAjKTqm0C2fomlWPhDM0A97EHmV8LiQpz0PyY9q8Oogg51t1RggH
7w9+YjjrGkaPifZXX+YogJT/CbFanK6lOUIFg/qro1QcicCsEbWrTaHUOwdpWpHr
B4FKZIfrlnrY7yMC7aClPQSJr3iUXBUjfWhRKvYiwOLTabfUWz+HRFA++R8vytbL
RvEZnhuvi3eyRn91yiapQM/f69yC1yslTZHy4/Ww9kvtOoawQpmegnZ3dLMhwBN1
aFgQi4U9Z7G2eA==
=rM8s
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the irq subsystem:
- The usual new irq chip driver (Realtek RTL83xx)
- Removal of sirfsoc and tango irq chip drivers
- Conversion of the sun6i chip support to hierarchical irq domains
- The usual fixes, improvements and cleanups all over the place"
* tag 'irq-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/imx: IMX_INTMUX should not default to y, unconditionally
irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
irqchip/csky-mpintc: Prevent selection on unsupported platforms
irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller
dt-bindings: interrupt-controller: Add Realtek RTL838x/RTL839x support
irqchip/ls-extirq: add IRQCHIP_SKIP_SET_WAKE to the irqchip flags
genirq: Use new tasklet API for resend_tasklet
dt-bindings: qcom,pdc: Add compatible for SM8350
dt-bindings: qcom,pdc: Add compatible for SM8250
irqchip/sun6i-r: Add wakeup support
irqchip/sun6i-r: Use a stacked irqchip driver
dt-bindings: irq: sun6i-r: Add a compatible for the H3
dt-bindings: irq: sun6i-r: Split the binding from sun7i-nmi
irqchip/gic-v3: Fix typos in PMR/RPR SCR_EL3.FIQ handling explanation
irqchip: Remove sirfsoc driver
irqchip: Remove sigma tango driver
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtYbYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppeWD/4xKhzBCGZWOkdycaaPhsUTOjNNIPmCBhlz
QQj4KFSEuJNKACUg53Ak0oECJTaH5976kjKkKs7Z+hzmkEwboLBI4erkcT9MGC3M
mPx349qBq9X3sYaFrUJF3h0sjRr+wa60nWQ01oVH8HkfI4bCNCHoqo5jDvMPWsYT
ksFbUm8YWEZmi0K2yXFWXuJIN2bVBd72a8CrvtF3ksdEMYxbWWTOAcrhYJ4H5/U7
BQjWIxiIVsAoJohcXWq/Swh8cgvgb5uJVpNUU8VEFob/jI3Gc3YojIToISB6soUL
DNhDJLeyZjuXfE1Ej+ySas9bpdG4LgxzsDBl9lFl9EQkSo1c3h/lEx85aeixAZla
QfjTOVUabzdPzvZ9H1yDQISxjVLy2PotnhVMy/rSSrnDKlowtNB9iEzd6cpzFzxU
fxomz1d6+w8rZY9jaRIAcMNa6bEOuYmcP9V8rIzGeg3Mm3jqL7H/JgJu5s2YbjpN
InmTNu4cwLeTO65DzqVxF8UGbZ2tHbMm5pNeVBYxuY1adRgJFlIOP5kYlNlyiY+D
Bt41CRuK3hqpYfXh7nSK8U4BKEhMikTCS0W4aKL5EzLZ20rxjgTlaHZiOBqd9vep
1tqNjPIvL2jWfF+5shwAZbupj3WKbuVqi4S2jXljv+Wkmk4ZVLSX3fQZv2I7JTHM
I2qa59PB4A==
=8MX/
-----END PGP SIGNATURE-----
Merge tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"Highlights from this cycles are things like request recycling and
task_work optimizations, which net us anywhere from 10-20% of speedups
on workloads that mostly are inline.
This work was originally done to put io_uring under memcg, which adds
considerable overhead. But it's a really nice win as well. Also worth
highlighting is the LOOKUP_CACHED work in the VFS, and using it in
io_uring. Greatly speeds up the fast path for file opens.
Summary:
- Put io_uring under memcg protection. We accounted just the rings
themselves under rlimit memlock before, now we account everything.
- Request cache recycling, persistent across invocations (Pavel, me)
- First part of a cleanup/improvement to buffer registration (Bijan)
- SQPOLL fixes (Hao)
- File registration NULL pointer fixup (Dan)
- LOOKUP_CACHED support for io_uring
- Disable /proc/thread-self/ for io_uring, like we do for /proc/self
- Add Pavel to the io_uring MAINTAINERS entry
- Tons of code cleanups and optimizations (Pavel)
- Support for skip entries in file registration (Noah)"
* tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-block: (103 commits)
io_uring: tctx->task_lock should be IRQ safe
proc: don't allow async path resolution of /proc/thread-self components
io_uring: kill cached requests from exiting task closing the ring
io_uring: add helper to free all request caches
io_uring: allow task match to be passed to io_req_cache_free()
io-wq: clear out worker ->fs and ->files
io_uring: optimise io_init_req() flags setting
io_uring: clean io_req_find_next() fast check
io_uring: don't check PF_EXITING from syscall
io_uring: don't split out consume out of SQE get
io_uring: save ctx put/get for task_work submit
io_uring: don't duplicate io_req_task_queue()
io_uring: optimise SQPOLL mm/files grabbing
io_uring: optimise out unlikely link queue
io_uring: take compl state from submit state
io_uring: inline io_complete_rw_common()
io_uring: move res check out of io_rw_reissue()
io_uring: simplify iopoll reissuing
io_uring: clean up io_req_free_batch_finish()
io_uring: move submit side state closer in the ring
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtnZ8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoOhD/9TZJN6mgvlmO2zuNlZwko0jD+HUYNRHdfa
UiZhKs55ShlT/Wd8MMLcmMU2/+iztq4c/ZLK9eS7NHgKTu3GbgsICEZK+XLSTVJh
gCwwEnY2dnBAIwBWxLeDG04DQvcwnOhVN1OSmhFbXG3dpElXSyfEjx00niGtl0tE
5YtvmStpqkC0tHlxq8CMyyfaL1ODGmBK2uhDeQCO12SXIgKondJUaI3/H1l1dC5t
+yg4PsSLqezo0oWmqdTEE7lcEJs4GK1ZOhIBLtWe6tl/zaVD6DuzJL83pChm8vF+
qV4LCpJL0wUL7IG601AemFcUmEg34oC0FD6GYYhXxVOrlk43V6AfkycZ2rljNRop
/+Ff+CmXfWPAwSfJi/vDlveCvgyAJNMpv4/GwynUM1563v3TYy0YjT6Jlz6M3TUn
pS0MW7iUHj3t36U3JvcYnITqTPSfTMtYMOsWDx+V4+E9iGsQF1d7KZlCPMvjWAI2
c3QuWjitXd10BZ2qUvSzAg6piv1taBKJxg1PsGlu707mHZp5J6VPAkYQn4rFgjua
uCBbmRQCDF/wJ02IBmBUiMPP64UGbkhr+O3MILPqki967BdDDrLzjTs4e5zbMeu/
qB9XRr1Yd95GCyS8I42OCC906NXrJ2R2E8dtV1XoASWGusL/wFZrLd1th8Uq8ibb
Os+G4t1Qug==
=vx6U
-----END PGP SIGNATURE-----
Merge tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- Remove the skd driver. It's been EOL for a long time (Damien)
- NVMe pull requests
- fix multipath handling of ->queue_rq errors (Chao Leng)
- nvmet cleanups (Chaitanya Kulkarni)
- add a quirk for buggy Amazon controller (Filippo Sironi)
- avoid devm allocations in nvme-hwmon that don't interact well
with fabrics (Hannes Reinecke)
- sysfs cleanups (Jiapeng Chong)
- fix nr_zones for multipath (Keith Busch)
- nvme-tcp crash fix for no-data commands (Sagi Grimberg)
- nvmet-tcp fixes (Sagi Grimberg)
- add a missing __rcu annotation (Christoph)
- failed reconnect fixes (Chao Leng)
- various tracing improvements (Michal Krakowiak, Johannes
Thumshirn)
- switch the nvmet-fc assoc_list to use RCU protection (Leonid
Ravich)
- resync the status codes with the latest spec (Max Gurtovoy)
- minor nvme-tcp improvements (Sagi Grimberg)
- various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya
Kulkarni, Israel Rukshin)
- Floppy O_NDELAY fix (Denis)
- MD pull request
- raid5 chunk_sectors fix (Guoqing)
- Use lore links (Kees)
- Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao)
- loop lock scaling (Pavel)
- mtip32xx PCI fixes (Bjorn)
- bcache fixes (Kai, Dongdong)
- Misc fixes (Tian, Yang, Guoqing, Joe, Andy)
* tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits)
lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid()
lightnvm: fix unnecessary NULL check warnings
nvme-tcp: fix crash triggered with a dataless request submission
block: Replace lkml.org links with lore
nbd: Convert to DEFINE_SHOW_ATTRIBUTE
nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
nvme-hwmon: rework to avoid devm allocation
nvmet: remove else at the end of the function
nvmet: add nvmet_req_subsys() helper
nvmet: use min of device_path and disk len
nvmet: use invalid cmd opcode helper
nvmet: use invalid cmd opcode helper
nvmet: add helper to report invalid opcode
nvmet: remove extra variable in id-ns handler
nvmet: make nvmet_find_namespace() req based
nvmet: return uniform error for invalid ns
nvmet: set status to 0 in case for invalid nsid
nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
nvme-multipath: set nr_zones for zoned namespaces
nvmet-tcp: fix potential race of tcp socket closing accept_work
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtmIwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgplzLEAC5O+3rBM8QuiJdo39Yppmuw4hDJ6hOKynP
EJQLKQQi0VfXgU+MprGvcbpFYmNbgICvUICQkEzJuk++kPCu/BJtJz0yErQeLgS+
RdXiPV6enbF7iRML5TVRTr1q/z7sJMXcIIJ8Pz/rU/JNfGYExVd0WfnEY9mp1jOt
Bl9V+qyTazdP+Ma4+uEPatSayqcdi1rxB5I+7v/sLiOvKZZWkaRZjUZ/mxAjUfvK
dBOOPjMygEo3tCLkIyyA6lpLvr1r+SUZhLuebRLEKa3To3TW6RtoG0qwpKmI2iKw
ylLeVLB60nM9RUxjflVOfBsHxz1bDg5Ve86y5nCjQd4Jo8x1c4DnecyGE5/Tu8Rg
rgbsfD6nFWzhDCvcZT0XrfQ4ZAjIL2IfT+ypQiQ6UlRd3hvIKRmzWMkjuH2svr0u
ey9Kq+lYerI4cM0F3W73gzUKdIQOuCzBCYxQuSQQomscBa7FCInyU192dAI9Aj6l
Yd06mgKu6qCx6zLv6JfpBqaBHZMwyGE4dmZgPQFuuwO+b4N+Ck3Jm5fzEzw/xIxQ
wdo/DlsAl60BXentB6FByGBJaCjVdSymRqN/xNCAbFKCjmr6TLBuXPfg1gYYO7xC
VOcVjWe8iN3wWHZab3t2mxMKH9B9B/KKzIhu6TNHSmgtQ5paZPRCBx995pDyRw26
WC22RGC2MA==
=os1E
-----END PGP SIGNATURE-----
Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"Another nice round of removing more code than what is added, mostly
due to Christoph's relentless pursuit of tech debt removal/cleanups.
This pull request contains:
- Two series of BFQ improvements (Paolo, Jan, Jia)
- Block iov_iter improvements (Pavel)
- bsg error path fix (Pan)
- blk-mq scheduler improvements (Jan)
- -EBUSY discard fix (Jan)
- bvec allocation improvements (Ming, Christoph)
- bio allocation and init improvements (Christoph)
- Store bdev pointer in bio instead of gendisk + partno (Christoph)
- Block trace point cleanups (Christoph)
- hard read-only vs read-only split (Christoph)
- Block based swap cleanups (Christoph)
- Zoned write granularity support (Damien)
- Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"
* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
mm: simplify swapdev_block
sd_zbc: clear zone resources for non-zoned case
block: introduce blk_queue_clear_zone_settings()
zonefs: use zone write granularity as block size
block: introduce zone_write_granularity limit
block: use blk_queue_set_zoned in add_partition()
nullb: use blk_queue_set_zoned() to setup zoned devices
nvme: cleanup zone information initialization
block: document zone_append_max_bytes attribute
block: use bi_max_vecs to find the bvec pool
md/raid10: remove dead code in reshape_request
block: mark the bio as cloned in bio_iov_bvec_set
block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
block: remove a layer of indentation in bio_iov_iter_get_pages
block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
block: remove the 1 and 4 vec bvec_slabs entries
block: streamline bvec_alloc
block: factor out a bvec_alloc_gfp helper
block: move struct biovec_slab to bio.c
block: reuse BIO_INLINE_VECS for integrity bvecs
...
The "oprofile" user-space tools don't use the kernel OPROFILE support any more,
and haven't in a long time. User-space has been converted to the perf
interfaces.
The dcookies stuff is only used by the oprofile code. Now that oprofile's
support is getting removed from the kernel, there is no need for dcookies as
well.
Remove kernel's old oprofile and dcookies support.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJgJMEVAAoJENK5HDyugRIcL8YP/jkmXH5CZT80ntcqrJGWKcG7
lWbach7uNeQteht7B1ZPKvojxizTkmfrN2sClX0B2hbGkc5TiWUQ2ZSnvnfWDZ8+
z2qQcEB11G/ReL2vvRk1fJlWdAOyUfrPee/44AkemnLRv+Niw/8PqnGd87yDQGsK
qy5E1XXfbjUq6Y/uMiLOX3+21I6w6o2Q6I3NNXC93s0wS3awqnft8n0XBC7iAPBj
eowRJxpdRU2Vcuj8UOzzOI7gQlwdjwYImyLPbRy/V8NawC8a+FHrPrf5/GCYlVzl
7TGFBsDQSmzvrBChUfoGz1Rq/VZ1a357p5rhRqemfUrdkjW+vyzelnD8I1W/hb2o
SmBXoPoyl3+UkFHNyJI0mI7obaV+2PzyXMV0JIQUj+IiX/mfeFv0nF4XfZD2IkRt
6xhaYj775Zrx32iBdGZIvvLg5Gh9ZkZmR5vJ7Fi/EIZFe6Z+bZnPKUROnAgS/o0z
+UkSygOhgo/1XbqrzZVk1iweWeu+EUMbY4YQv2qVnFhpvsq4ieThcUGQpWcxGjjH
WP8O0n1yq1slsnpUtxhiTsm46ENajx9zZp6Iv6Ws+NM0RUqjND8BdF1co9WGD3LS
cnZMFBs4Bg/V1HICL/D4s6L7t1ofrEXIgJH1y3iF0HeECq03mU4CgA/qly9Aebqg
UxPF3oNlVOPlds9FzsU2
=I2Ac
-----END PGP SIGNATURE-----
Merge tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux
Pull oprofile and dcookies removal from Viresh Kumar:
"Remove oprofile and dcookies support
The 'oprofile' user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.
The dcookies stuff is only used by the oprofile code. Now that
oprofile's support is getting removed from the kernel, there is no
need for dcookies as well.
Remove kernel's old oprofile and dcookies support"
* tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
fs: Remove dcookies support
drivers: Remove CONFIG_OPROFILE support
arch: xtensa: Remove CONFIG_OPROFILE support
arch: x86: Remove CONFIG_OPROFILE support
arch: sparc: Remove CONFIG_OPROFILE support
arch: sh: Remove CONFIG_OPROFILE support
arch: s390: Remove CONFIG_OPROFILE support
arch: powerpc: Remove oprofile
arch: powerpc: Stop building and using oprofile
arch: parisc: Remove CONFIG_OPROFILE support
arch: mips: Remove CONFIG_OPROFILE support
arch: microblaze: Remove CONFIG_OPROFILE support
arch: ia64: Remove rest of perfmon support
arch: ia64: Remove CONFIG_OPROFILE support
arch: hexagon: Don't select HAVE_OPROFILE
arch: arc: Remove CONFIG_OPROFILE support
arch: arm: Remove CONFIG_OPROFILE support
arch: alpha: Remove CONFIG_OPROFILE support
- Fix an ABBA deadlock when renaming files on overlayfs.
- Make sure that we can't overflow the inode extent counters when adding
to or removing extents from a file.
- Make directory sgid inheritance work the same way as all the other
filesystems.
- Don't drain the buffer cache on freeze and ro remount, which should
reduce the amount of time if read-only workloads are continuing
during the freeze.
- Fix a bug where symlink size isn't reported to the vfs in ecryptfs.
- Disentangle log cleaning from log covering. This refactoring sets us
up for future changes to the log, though for now it simply means that
we can use covering for freezes, and cleaning becomes something we
only do at unmount.
- Speed up file fsyncs by reducing iolock cycling.
- Fix delalloc blocks leaking when changing the project id fails because
of input validation errors in FSSETXATTR.
- Fix oversized quota reservation when converting unwritten extents
during a DAX write.
- Create a transaction allocation helper function to standardize the
idiom of allocating a transaction, reserving blocks, locking inodes,
and reserving quota. Replace all the open-coded logic for file
creation, file ownership changes, and file modifications to use them.
- Actually shut down the fs if the incore quota reservations get
corrupted.
- Fix background block garbage collection scans to not block and to
actually clean out CoW staging extents properly.
- Run block gc scans when we run low on project quota.
- Use the standardized transaction allocation helpers to make it so that
ENOSPC and EDQUOT errors during reservation will back out, invoke the
block gc scanner, and try again. This is preparation for introducing
background inode garbage collection in the next cycle.
- Combine speculative post-EOF block garbage collection with speculative
copy on write block garbage collection.
- Enable multithreaded quotacheck.
- Allow sysadmins to tweak the CPU affinities and maximum concurrency
levels of quotacheck and background blockgc worker pools.
- Expose the inode btree counter feature in the fs geometry ioctl.
- Cleanups of the growfs code in preparation for starting work on
filesystem shrinking.
- Fix all the bloody gcc warnings that the maintainer knows about. :P
- Fix a RST syntax error.
- Don't trigger bmbt corruption assertions after the fs shuts down.
- Restore behavior of forcing SIGBUS on a shut down filesystem when
someone triggers a mmap write fault (or really, any buffered write).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmAlX/UACgkQ+H93GTRK
tOta+RAAiGqLKxeY07HH7F98pRJ86j6lU0zmc5i5UCOGMvZd8hLKDdThzggsjqO6
rrUSc7Ppg7MQt1JdXLSdZw2N6Ksb9yy6chufj+j3Dq1JQfSL4YvBO/LlXmZmFE6d
80Qbqq6HFSRWb6JzCMr3knhC+FJovAGhFgZYZGBZ817A/FXacTg9/A5Ow8SX81WX
42s517QOmegAn7YhC3xcPZp5iavjbMd7Y9v7izpuo4FBB9AY7NYyb5wVhvffILfS
/SMLQPw3T/tccRJuVJ8TfLA9R+B9+LaGmQ5tn/AtdwN+Lv7ykinzGKYLagkdlTmE
onGkEIwrebEgq9phT47eX7ixiEt7oWQiQGZukXLVn7mL/0WPVI2pbYi/M1BNpi8i
UftOEVroav+m4h0DF3duOE7rLGuBIEdjPuuAs85QhZ6UTusBjwxp1gOJbjuN0Up9
9hBGTtYQIRhWxHkxWKAeuYzIbtMxC2S2XGxnW4cNOxbE7GxwfxBw0KP/38ZP4iYQ
LKt6JVX+iFDQ+lH8JA6DD7+j+m7W37Alu89OPmpW2nYpFyisFDY+1dEIFvPw9roZ
BtbKlZzS2O2zD67/tTVh+ZcPoEcPfp156GDCrgfgdIdiBvQtGbyOLB/WQC6wSU1L
2PLt1inFBx5wNrIEMFMHT1hsduRihNMM+eLn6LV5XIK2RmSCT+I=
=CaLz
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"There's a lot going on this time, which seems about right for this
drama-filled year.
Community developers added some code to speed up freezing when
read-only workloads are still running, refactored the logging code,
added checks to prevent file extent counter overflow, reduced iolock
cycling to speed up fsync and gc scans, and started the slow march
towards supporting filesystem shrinking.
There's a huge refactoring of the internal speculative preallocation
garbage collection code which fixes a bunch of bugs, makes the gc
scheduling per-AG and hence multithreaded, and standardizes the retry
logic when we try to reserve space or quota, can't, and want to
trigger a gc scan. We also enable multithreaded quotacheck to reduce
mount times further. This is also preparation for background file gc,
which may or may not land for 5.13.
We also fixed some deadlocks in the rename code, fixed a quota
accounting leak when FSSETXATTR fails, restored the behavior that
write faults to an mmap'd region actually cause a SIGBUS, fixed a bug
where sgid directory inheritance wasn't quite working properly, and
fixed a bug where symlinks weren't working properly in ecryptfs. We
also now advertise the inode btree counters feature that was
introduced two cycles ago.
Summary:
- Fix an ABBA deadlock when renaming files on overlayfs.
- Make sure that we can't overflow the inode extent counters when
adding to or removing extents from a file.
- Make directory sgid inheritance work the same way as all the other
filesystems.
- Don't drain the buffer cache on freeze and ro remount, which should
reduce the amount of time if read-only workloads are continuing
during the freeze.
- Fix a bug where symlink size isn't reported to the vfs in ecryptfs.
- Disentangle log cleaning from log covering. This refactoring sets
us up for future changes to the log, though for now it simply means
that we can use covering for freezes, and cleaning becomes
something we only do at unmount.
- Speed up file fsyncs by reducing iolock cycling.
- Fix delalloc blocks leaking when changing the project id fails
because of input validation errors in FSSETXATTR.
- Fix oversized quota reservation when converting unwritten extents
during a DAX write.
- Create a transaction allocation helper function to standardize the
idiom of allocating a transaction, reserving blocks, locking
inodes, and reserving quota. Replace all the open-coded logic for
file creation, file ownership changes, and file modifications to
use them.
- Actually shut down the fs if the incore quota reservations get
corrupted.
- Fix background block garbage collection scans to not block and to
actually clean out CoW staging extents properly.
- Run block gc scans when we run low on project quota.
- Use the standardized transaction allocation helpers to make it so
that ENOSPC and EDQUOT errors during reservation will back out,
invoke the block gc scanner, and try again. This is preparation for
introducing background inode garbage collection in the next cycle.
- Combine speculative post-EOF block garbage collection with
speculative copy on write block garbage collection.
- Enable multithreaded quotacheck.
- Allow sysadmins to tweak the CPU affinities and maximum concurrency
levels of quotacheck and background blockgc worker pools.
- Expose the inode btree counter feature in the fs geometry ioctl.
- Cleanups of the growfs code in preparation for starting work on
filesystem shrinking.
- Fix all the bloody gcc warnings that the maintainer knows about. :P
- Fix a RST syntax error.
- Don't trigger bmbt corruption assertions after the fs shuts down.
- Restore behavior of forcing SIGBUS on a shut down filesystem when
someone triggers a mmap write fault (or really, any buffered
write)"
* tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (85 commits)
xfs: consider shutdown in bmapbt cursor delete assert
xfs: fix boolreturn.cocci warnings
xfs: restore shutdown check in mapped write fault path
xfs: fix rst syntax error in admin guide
xfs: fix incorrect root dquot corruption error when switching group/project quota types
xfs: get rid of xfs_growfs_{data,log}_t
xfs: rename `new' to `delta' in xfs_growfs_data_private()
libxfs: expose inobtcount in xfs geometry
xfs: don't bounce the iolock between free_{eof,cow}blocks
xfs: expose the blockgc workqueue knobs publicly
xfs: parallelize block preallocation garbage collection
xfs: rename block gc start and stop functions
xfs: only walk the incore inode tree once per blockgc scan
xfs: consolidate the eofblocks and cowblocks workers
xfs: consolidate incore inode radix tree posteof/cowblocks tags
xfs: remove trivial eof/cowblocks functions
xfs: hide xfs_icache_free_cowblocks
xfs: hide xfs_icache_free_eofblocks
xfs: relocate the eofb/cowb workqueue functions
xfs: set WQ_SYSFS on all workqueues in debug mode
...
- Adjust the final parameter of iomap_dio_rw.
- Add a new flag to request that iomap directio writes return EAGAIN if
the write is not a pure overwrite within EOF; this will be used to
reduce lock contention with unaligned direct writes on XFS.
- Amend XFS' directio code to eliminate exclusive locking for unaligned
direct writes if the circumstances permit
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmAZgQAACgkQ+H93GTRK
tOtNqw/+KPff1NjQVK2k361R0+LjlEHfe2nxh7+kS10IiR5nbBz4Fu+GwEosZKq+
H9ficBbZ0wIveV+5CEt2xZLEJFC4LZUpNPVVrUf8XPLKiVexP/U3wtKzmv9Z7D5J
5walMWQycVeR+ycomynV36giqekvARL7KCQG5By2ITfSNxfnb/wvKhn1d61ZDOF6
f4xzq7F6+cEOrSZt2LcFzGSfsTl6oakYMAomPU57sqGmw7MHRqoPTErbdh2HnVJy
yQ47eiZgSKWKA+Qm+VvHHePYCYnu0nvA2rbNerjTN70hnO8rK9S0Vle6Sp5CUqAX
sXOy8zxOLYKqyM4S/QkIN2TGIyWg+CHiakVLZGF3Q4AUDDYfpD0cHvAe9N3v9euL
qt8ypT8dz2C3qiTg5E31xy033wlAP0wg3FZiLAqEjL5o3fzD+qbplTiSmYbMV2Fb
xuu7a2T6u1MHaIn1IhaL0cB49Fzn+5EMyp6BlAucAOakyuqJCyJiXokdk0Looy5e
jUshvcwWcmHMpI/YYYY6t56KV6tl2exGq5sySY5U6dr8/r5lwc0SI+TrYFG0jTR8
59DGd5CkKgdBFcuys+eaZDXgr7A4ymkVE+pE0QNDz9UwNP20tLb3dQNlhgxchUgu
NgPaFgQkoNM3HmQNyU2wX/t1aFlC/doqSkb/96UWQSxq6IrajMU=
=AR07
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.12-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"The big change in this cycle is some new code to make it possible for
XFS to try unaligned directio overwrites without taking locks. If the
block is fully written and within EOF (i.e. doesn't require any
further fs intervention) then we can let the unlocked write proceed.
If not, we fall back to synchronizing direct writes.
Summary:
- Adjust the final parameter of iomap_dio_rw.
- Add a new flag to request that iomap directio writes return EAGAIN
if the write is not a pure overwrite within EOF; this will be used
to reduce lock contention with unaligned direct writes on XFS.
- Amend XFS' directio code to eliminate exclusive locking for
unaligned direct writes if the circumstances permit"
* tag 'iomap-5.12-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reduce exclusive locking on unaligned dio
xfs: split the unaligned DIO write code out
xfs: improve the reflink_bounce_dio_write tracepoint
xfs: simplify the read/write tracepoints
xfs: remove the buffered I/O fallback assert
xfs: cleanup the read/write helper naming
xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-aware
xfs: factor out a xfs_ilock_iocb helper
iomap: add a IOMAP_DIO_OVERWRITE_ONLY flag
iomap: pass a flags argument to iomap_dio_rw
iomap: rename the flags variable in __iomap_dio_rw
Add an ioctl which allows reading fs-verity metadata from a file.
This is useful when a file with fs-verity enabled needs to be served
somewhere, and the other end wants to do its own fs-verity compatible
verification of the file. See the commit messages for details.
This new ioctl has been tested using new xfstests I've written for it.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYCv/2hQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK6/7AQDRmmnV+G34yGPCWfu8tyjdYvWPyak2
IA/I+eM6S/F+4QEAkbX6rOwYVhLHN9KSOYyNhJiBchm6xq83J+R8BYh/Kw0=
=FPNK
-----END PGP SIGNATURE-----
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
"Add an ioctl which allows reading fs-verity metadata from a file.
This is useful when a file with fs-verity enabled needs to be served
somewhere, and the other end wants to do its own fs-verity compatible
verification of the file. See the commit messages for details.
This new ioctl has been tested using new xfstests I've written for it"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: support reading signature with ioctl
fs-verity: support reading descriptor with ioctl
fs-verity: support reading Merkle tree with ioctl
fs-verity: add FS_IOC_READ_VERITY_METADATA ioctl
fs-verity: don't pass whole descriptor to fsverity_verify_signature()
fs-verity: factor out fsverity_get_descriptor()