The nfsd file cache table can be pretty large and its allocation
may require as many as 80 contigious pages.
Employ the same fix that was employed for similar issue that was
reported for the reply cache hash table allocation several years ago
by commit 8f97514b42 ("nfsd: more robust allocation failure handling
in nfsd_reply_cache_init").
Fixes: 65294c1f2c ("nfsd: add a new struct file caching facility to nfsd")
Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Initial support for the RPC_AUTH_TLS authentication flavor enables
NFSD to eventually accept an RPC_AUTH_TLS probe from clients. This
patch simply prevents NFSD from rejecting these probes completely.
In the meantime, graft this support in now so that RPC_AUTH_TLS
support keeps up with generic code and API changes in the RPC
server.
Down the road, server-side transport implementations will populate
xpo_start_tls when they can support RPC-with-TLS. For example, TCP
will eventually populate it, but RDMA won't.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Hoist svo_function back into svc_serv and remove struct
svc_serv_ops, since the struct is now devoid of fields.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
struct svc_serv_ops is about to be removed.
Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.
A random sample of kthread_create() callers shows sunrpc is the only
one that manages module reference count in this way.
Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: svc_shutdown_net() now does nothing but call
svc_close_net(). Replace all external call sites.
svc_close_net() is renamed to be the inverse of svc_xprt_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up. Neil observed that "any code that calls svc_shutdown_net()
knows what the shutdown function should be, and so can call it
directly."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Neil says:
"These functions were separated in commit 0971374e28 ("SUNRPC:
Reduce contention in svc_xprt_enqueue()") so that the XPT_BUSY check
happened before taking any spinlocks.
We have since moved or removed the spinlocks so the extra test is
fairly pointless."
I've made this a separate patch in case the XPT_BUSY change has
unexpected consequences and needs to be reverted.
Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We have never been able to track down and address the underlying
cause of the performance issues with workqueue-based service
support. svo_enqueue_xprt is called multiple times per RPC, so
it adds instruction path length, but always ends up at the same
function: svc_xprt_do_enqueue(). We do not anticipate needing
this flexibility for dynamic nfsd thread management support.
As a micro-optimization, remove .svo_enqueue_xprt because
Spectre/Meltdown makes virtual function calls more costly.
This change essentially reverts commit b9e13cdfac ("nfsd/sunrpc:
turn enqueueing a svc_xprt into a svc_serv operation").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
To make server-side trace events more useful in container-ized
environments, capture not just the remote's IP address, but the
local IP address and network namespace as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up.
The PROC_ARGS macros were added when I thought that NFSD tracepoints
would be reporting endpoint information. However, tracepoints in the
RPC server now report transport endpoint information, so in general
there's no need for the upper layers to do that any more, and these
macros can be retired.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
A helper macro was added to make reading socket addresses easier in trace
events. It pairs %pISpc with __get_sockaddr() that reads the socket
address from the ring buffer into a human readable format.
The boot up check that makes sure that trace events do not reference
pointers to memory that can later be freed when the trace event is read,
incorrectly flagged this as a delayed reference. Update the check to
handle "__get_sockaddr" and not report an error on it.
Link: https://lore.kernel.org/all/20220125160505.068dbb52@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Enable a struct sockaddr to be stored in a trace record as a
dynamically-sized field. The common cases are AF_INET and AF_INET6
which are different sizes, and are vastly smaller than a struct
sockaddr_storage.
These are safer because, when used properly, the size of the
sockaddr destination field in each trace record is now guaranteed
to be the same as the source address that is being copied into it.
Link: https://lore.kernel.org/all/164182978641.8391.8277203495236105391.stgit@bazille.1015granger.net/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Move a rarely called function call site out of the hot path.
This is an exceptionally small improvement because the compiler
inlines most of the functions that nfsd_cache_lookup() calls.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Force the compiler to skip unneeded initialization for cases that
don't need those values. For example, NFSv4 COMPOUND operations are
RC_NOCACHE.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: The details of finding the right hash bucket are exactly
the same in both nfsd_cache_lookup() and nfsd_cache_update().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
For filesystems that supports "btime" timestamp (i.e. most modern
filesystems do) we share it via kernel nfsd. Btime support for NFS
client has already been added by Trond recently.
Suggested-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Ondrej Valousek <ondrej.valousek.xm@renesas.com>
[ cel: addressed some whitespace/checkpatch nits ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
resulted in a recursive locking problem in the VMD driver. The cure is to
cache the relevant information upfront instead of retrieving it at runtime.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmIbQhATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoX+ND/0ZY7N+NbHnOWz8aPRIelSgchSq4xEU
jjNe+FIe32Zq8zmVDeS59aAXF/gbT4LwR8eL0clzpM+Sd0Rg7xyvYE5v9ltwgv17
3IJNnmJgLmeJazI5qMRSeDZcV5Ys0AIJYueVkDOkiMiJd0alLuGkRocOsejVdFhh
27mLu33tfnXf0qFCZHFUiQtrus5zgJWh+kKz2vOuzLUxF9QPUe+CCTyA9HVNRneh
94PFK7hjjbtyI65KLqSjEQRnGP3ddRwwII4EwE1aa+x/Fx6cDA6/L0PinpIDCSkh
vXfODriwqW2Y9M4g3WrKLU69OB+LxVzV5pKcbC8Rrs9xOfNVGOBJNbzyqnR3nye6
jPOb1I5DF427LJpac8BQKcdu9kxwqTF8D77BWZpkjYdKbIFh5Otd0/DgKaLOH4EG
u4eMSNsgYkFLTc1Aa59CrYdAM03yflYI0BJ0Sdrw+fZbhRoFFmuEMm9R7f6J6E4+
2tbq8uZpZcqBP7YLbAuMmC1Km7fhMlGZNj/8XXHj2168wKmTmQm48J2bARkZmIPt
Jk2el2wKM14gGttES2nqEf/UDrl8XCllTD+cRzBqEAjOv3himpsErZmuKxni6BAd
pQozQpyJlK5swF7U3mZkalJE/btyVL6dzAzlDp0psZbDGFmFK5O+/F3kxQOpoGzo
hsbHVeTZFmmWdA==
=ukul
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2022-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for a regression caused by the recent PCI/MSI rework
which resulted in a recursive locking problem in the VMD driver.
The cure is to cache the relevant information upfront instead of
retrieving it at runtime"
* tag 'irq-urgent-2022-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI: vmd: Prevent recursive locking on interrupt allocation
- fix a swiotlb info leak (Halil Pasic)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmIbvm8LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMcBg/8DD+QsKEEv+D/+bCSWZVbW1ekFDiDyEdO9xuSymxT
pf56Dy693G0BoPZ1JFiN17LIOp3vsHQfwE4ZDo/OHmyDTcziepD40SZiHD3eqEDB
cl7RXhAcoxnUMkgK+hIb6Q7t/4dXaC3+rVXUuL//xUjXZu7GML46tRzuDoVr9MZp
MBSGdZlXIzfvpzNf+yVzpk7sZP/w4nrzyuzeE5T6fdipjN2o78PhWmZzqsHGujeI
ptMuZQwwDuXXetQNiA/t0+DkeWCz/+ERI7k/6lAma14aKHjM1Jn8HcNqW0UGUuGV
148V4wKFqhaOXnoWuodrgjo9AJkOVqh+YH4ui60MMOUvZ4MPh4+5muobybzndtNy
3bAM3JbcNUH1/vyyazvwr22My/TapbydwPuXda63Ls/2WbnV66xbgFEm6S9GXC4X
6+ynK5CzWaTF5INgpd6WjrEMPXGCi0hXM9yQmJvlI1I0muFv9mBXhW/wP7OE3Na8
eQXuP/v+QqYaDVTsGTtySNPkwBstFa9GQUTghrnxKzk3giLVZfGPX5Bs4/+gl4gl
rrGBNhUAnMOdvBSkLUf4hDgyGvsCiYtB1YX9QYxq2aRCbIT03164VnXB0lQRKnmz
Yu547NO96tBnoIT9/gQYMTHvwPU1WjWi3jVbT2zpMKgG4DBUEEIH33P9S1vbgecY
vLY=
=bR6d
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- fix a swiotlb info leak (Halil Pasic)
* tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix info leak with DMA_FROM_DEVICE
- Fix some drive strength and pull-up code in the K210 driver.
- Add the Alder Lake-M ACPI ID so it starts to work properly.
- Use a static name for the StarFive GPIO irq_chip, forestalling
an upcoming fixes series from Marc Zyngier.
- Fix an ages old bug in the Tegra 186 driver where we were
indexing at random into struct and being lucky getting the
right member.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmIaztAACgkQQRCzN7AZ
XXOp+xAAvF525eqPH/wAvoj0p7KbpQQJ9gYpqCL4FNp+t/Wj3q+7ihgy/Y4iQ6wC
vU72M8OQ+i7t/5R8l1cJUj/f/OA2+icNeD5L1+DD+4RB52wQvdbjz7XDqVqEHSFG
YnV9YJFGQ3Tr62HU2MrWUCxsY13J6YHlRWHTcMoM0/fcRSHaDYU5mlgGPQV6fb94
WViv+PZncebY9PeyNm/wIpqL/VHqLI5fcaHz+0u6ppNTF7rGRBv7La/Du0mTbmlw
rofbc2ynv+gIERyZBZ26UepBid2ZY4qaBzNy5S4srNeY8odlE+C9qCi/UcC3j3aM
1UgsiuZKvn7arR7uR6cKPQSeIEHS25zxbL+FXPa5wtg9KrNhZUG7LG1IB3M7jcK4
CiNj7zm9Zuy/qGGbMNWmmqpFk8ueL2fq7oE6K8oQa9HxzMFd48sB0Ckhyt5PCOEV
zcLEo/WeIp3BqOJ4vZQquWO0lcEZr+2SeiGUaUJYwfZI7K2Myrc66hxKVUzBB+EK
QWOQj+2W15qBkZd49ygQMJK5D8CQPBkT66AjqtZHr/7jk5H4S0oyyhJHyEWMPcSW
oEk7UxKGfULG+zPfg6tCKQSN/QEyF9V2DZ5Ve13klWYZwR6uTZIGhQ2ZVWhJH7DN
KXmTGOLGKUp1xR17t8hrAg60WPRoltofY21U/XiABDMGHgLmyYk=
=es6M
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v5-17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Fix some drive strength and pull-up code in the K210 driver.
- Add the Alder Lake-M ACPI ID so it starts to work properly.
- Use a static name for the StarFive GPIO irq_chip, forestalling an
upcoming fixes series from Marc Zyngier.
- Fix an ages old bug in the Tegra 186 driver where we were indexing at
random into struct and being lucky getting the right member.
* tag 'pinctrl-v5-17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
gpio: tegra186: Fix chip_data type confusion
pinctrl: starfive: Use a static name for the GPIO irq_chip
pinctrl: tigerlake: Revert "Add Alder Lake-M ACPI ID"
pinctrl: k210: Fix bias-pull-up
pinctrl: fix loop in k210_pinconf_get_drive()
- rtla (Real-Time Linux Analysis tool): fix typo in man page
- rtla: Update API -e to -E before it is released
- rlla: Error message fix and memory leak fix
- Partially uninline trace event soft disable to shrink text
- Fix function graph start up test
- Have triggers affect the trace instance they are in and not top level
- Have osnoise sleep in the units it says it uses
- Remove unused ftrace stub function
- Remove event probe redundant info from event in the buffer
- Fix group ownership setting in tracefs
- Ensure trace buffer is minimum size to prevent crashes
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYho7XBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qiOhAQDbCbEjIYwkGCpckuGgSQiMU4bAWUzk
jCz9PoaTxoIWJwEAsLWrAPb0pDzNwdEKjiC3fJoUJhz3NwlEjJ7hQ3BxzAI=
=iXOQ
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- rtla (Real-Time Linux Analysis tool):
- fix typo in man page
- Update API -e to -E before it is released
- Error message fix and memory leak fix
- Partially uninline trace event soft disable to shrink text
- Fix function graph start up test
- Have triggers affect the trace instance they are in and not top level
- Have osnoise sleep in the units it says it uses
- Remove unused ftrace stub function
- Remove event probe redundant info from event in the buffer
- Fix group ownership setting in tracefs
- Ensure trace buffer is minimum size to prevent crashes
* tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
rtla/osnoise: Fix error message when failing to enable trace instance
rtla/osnoise: Free params at the exit
rtla/hist: Make -E the short version of --entries
tracing: Fix selftest config check for function graph start up test
tracefs: Set the group ownership in apply_options() not parse_options()
tracing/osnoise: Make osnoise_main to sleep for microseconds
ftrace: Remove unused ftrace_startup_enable() stub
tracing: Ensure trace buffer is at least 4096 bytes large
tracing: Uninline trace_trigger_soft_disabled() partly
eprobes: Remove redundant event type information
tracing: Have traceon and traceoff trigger honor the instance
tracing: Dump stacktrace trigger to the corresponding instance
rtla: Fix systme -> system typo on man page
memblock.{reserved,memory}.regions may be allocated using kmalloc() in
memblock_double_array(). Use kfree() to release these kmalloced regions.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmIZ1qgTHHJwcHRAbGlu
dXguaWJtLmNvbQAKCRA5A4Ymyw79kXsaB/0TnrLt98t/jPvVGinsnf7r3hXnNq7F
8FXWqdUIBWRfiHVd74pX6VE4Be56BbMUUyRQWDfbjrluVFnBibA3qJhNmpIuwdSb
9GESikUdEnuq0t059yPLupKvYY0ysq4OjNLWage+8tnA/TzlN/+t27c75iZWwGn2
JbutM/j5YKnvAcqUVv/plLVIVrGz1RCaG0diYoY1vxrbpRCicmAI8LHTkK1Xtow2
7YVkRuQWY+yJLOJ/SCst5pxy6cm3R96KvnaC9fg1Pp+8wVFrZp/hDsH8nObFccXq
6zQTbXqS88VKOoNEcuqk2ITbFyghepPIBrliEmcI2h96OSdp6BtrNau7
=UpBU
-----END PGP SIGNATURE-----
Merge tag 'fixes-2022-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"Use kfree() to release kmalloced memblock regions
memblock.{reserved,memory}.regions may be allocated using kmalloc()
in memblock_double_array(). Use kfree() to release these kmalloced
regions"
* tag 'fixes-2022-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock: use kfree() to release kmalloced memblock regions
Merge misc fixes from Andrew Morton:
"12 patches.
Subsystems affected by this patch series: MAINTAINERS, mailmap, memfd,
and mm (hugetlb, kasan, hugetlbfs, pagemap, selftests, memcg, and
slab)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
selftests/memfd: clean up mapping in mfd_fail_write
mailmap: update Roman Gushchin's email
MAINTAINERS, SLAB: add Roman as reviewer, git tree
MAINTAINERS: add Shakeel as a memcg co-maintainer
MAINTAINERS: remove Vladimir from memcg maintainers
MAINTAINERS: add Roman as a memcg co-maintainer
selftest/vm: fix map_fixed_noreplace test failure
mm: fix use-after-free bug when mm->mmap is reused after being freed
hugetlbfs: fix a truncation issue in hugepages parameter
kasan: test: prevent cache merging in kmem_cache_double_destroy
mm/hugetlb: fix kernel crash with hugetlb mremap
MAINTAINERS: add sysctl-next git tree
* A fix for the K210 sdcard defconfig, to avoid using a fixed delay for
the root FS.
* A fix to make sure there's a proper call frame for
trace_hardirqs_{on,off}().
---
There are a handful of additional fixes in flight, but not for this
week.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAmIZHmQTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQT4ND/42sEQLhcQcDpdvFDX/0zBr1Y8RNy25
7I9JBYmuTK5AfwmE52I/OcdCLE9bNELH1g+LMK/3amEqhkUtDelBb5UdC4TYfvRm
SRlj75XKPxESMEW9EjU5BeAz+uDI4oMkOmDPyp+Xv/OayGrFQIPUTo75/SiOdlH7
a2khiH4/OxqkVlOff3Ko96M4RNSUeUIEVSfrH4pgJC8n+031u02TvR1IIx5TT7ti
W5YIMw6VZ32Gl5ByZaBMbs9pz+iOKDrn3UfnPrVpbs6P3389EmR4btJpqfzN9JeC
UQzcx4rqoDzTtJvqkOxiR+Ig4nNJGyeYVvxaGH67MkD/nz6rS26adIs+xPGKjDCC
TtFyLt4h2+JX+1kNiutTLrQLAaQO4N+LSkysIsoSr9wNGCdnSrAQRxoOLwIuTMBS
61kRsBvuiuRJZQlbgkP2tiTug/8dYs4vQzPNeC5VO/c3MZB5/j2ykYdKBSsElrpi
+br602CMdeqvT+M+pT9lWdxa8X9lbYVm1z3hx2FyRdbnYw3nbcQq/mGp8Ju9O5zt
JXajwPFtUPpWXzm2CcRjeh+2GKoLetgVpwHAIOmnDd6meTXp/BEu12+o7c8vf3H7
k12BPHVPH1gPklG2Oh8Z/UvevICd4AHlSJT7J19xh7fYVMaaALVQ71Nd2jFbv9N/
eu8KKxR2pKXiPw==
=1a16
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the K210 sdcard defconfig, to avoid using a
fixed delay for the root FS
- A fix to make sure there's a proper call frame for
trace_hardirqs_{on,off}().
* tag 'riscv-for-linus-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix oops caused by irqsoff latency tracer
riscv: fix nommu_k210_sdcard_defconfig
- Only call sync_filesystem when we're remounting the filesystem
readonly readonly, and actually check its return value.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmIEnhUACgkQ+H93GTRK
tOsn7g//e3R/lqpkx6jJtg6SqiC1KiI9euwD0wBdIvrCWSJZ6IdjOSvfRRG13vN0
S1spU0joiLLlVhzLIQdysgZkRub57P0mRmq3zVpHYxMOWKBvPH1OmZtdu83HOiAv
/zjy3tNFc/1ZaqHudAZv3+4780qMZtQTmL7DbgLnvFspCBf4PdBlT0d7Wbf982w8
dMWF7Pla8DhLVFbMsGdyXlnGROz+pw3jofVwY9P6f+PaY37mo+lZu65GrTlNecnc
QfTyX45VAWFO/XZtXm7pXCr8211eK2SnrOFZXZH9u3qxSD5vo1NWf9KPKVkYxc8/
7icz+Yp5t61HQg3o0z7cNAQZp7CQl0BWz6gp2YXMWHS3ZJMnd6H4zTDBdV2MSA5/
alT4kcwncRVcmHtFET7JAsnQkWNeREBqhqCRoAf8hW8uxpjkXw6sPop7+hbZtoJw
VAp1TxbEMbPGTZb76Kw4nZt1eZ3SyJOl6ByzsJMxekEFiMYVh4yxO+a3Q6KNOkMM
O62JpzdE1EeFgV7qmoZ8QzCZuD7z7KC99iv5QtyacFITCqv5y0h/RLGCsOwJ0EMc
fJGKN7uQOZrBIJYInx53S7fCYGGMm0+HUUXMUatBe4RK3dADyqapLzQb0tCGamAf
NQra6NotwfNq8SN+Sn17PJ1KifSRKfw6l7Q+6pt9LA2eVbr2jV4=
=6ODm
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Nothing exciting, just more fixes for not returning sync_filesystem
error values (and eliding it when it's not necessary).
Summary:
- Only call sync_filesystem when we're remounting the filesystem
readonly readonly, and actually check its return value"
* tag 'xfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only bother with sync_filesystem during readonly remount
Running the memfd script ./run_hugetlbfs_test.sh will often end in error
as follows:
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
memfd-hugetlb: SEAL-SHRINK
fallocate(ALLOC) failed: No space left on device
./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs
opening: ./mnt/memfd
fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE
test the mfd_fail_write routine maps the file, but does not unmap. As a
result, two hugetlb pages remain reserved for the mapping. When the
fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb
pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm moving to a @linux.dev account. Map my old addresses.
Link: https://lkml.kernel.org/r/20220221200006.416377-1-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The slab code has an overlap with kmem accounting, where Roman has done
a lot of work recently and it would be useful to make sure he's CC'd on
patches that potentially affect it. Thus add him as a reviewer for the
SLAB subsystem.
Also while at it, add the link to slab git tree.
Link: https://lkml.kernel.org/r/20220222103104.13241-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have been contributing and reviewing to the memcg codebase for last
couple of years. So, making it official.
Link: https://lkml.kernel.org/r/20220224060148.4092228-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add myself as a memcg co-maintainer. My primary focus over last few
years was the kernel memory accounting stack, but I do work on some
other parts of the memory controller as well.
Link: https://lkml.kernel.org/r/20220221233951.659048-1-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
oom reaping (__oom_reap_task_mm) relies on a 2 way synchronization with
exit_mmap. First it relies on the mmap_lock to exclude from unlock
path[1], page tables tear down (free_pgtables) and vma destruction.
This alone is not sufficient because mm->mmap is never reset.
For historical reasons[2] the lock is taken there is also MMF_OOM_SKIP
set for oom victims before.
The oom reaper only ever looks at oom victims so the whole scheme works
properly but process_mrelease can opearate on any task (with fatal
signals pending) which doesn't really imply oom victims. That means
that the MMF_OOM_SKIP part of the synchronization doesn't work and it
can see a task after the whole address space has been demolished and
traverse an already released mm->mmap list. This leads to use after
free as properly caught up by KASAN report.
Fix the issue by reseting mm->mmap so that MMF_OOM_SKIP synchronization
is not needed anymore. The MMF_OOM_SKIP is not removed from exit_mmap
yet but it acts mostly as an optimization now.
[1] 27ae357fa8 ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[2] 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[mhocko@suse.com: changelog rewrite]
Link: https://lore.kernel.org/all/00000000000072ef2c05d7f81950@google.com/
Link: https://lkml.kernel.org/r/20220215201922.1908156-1-surenb@google.com
Fixes: 64591e8605 ("mm: protect free_pgtables with mmap_lock write lock in exit_mmap")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+2ccf63a4bd07cf39cab0@syzkaller.appspotmail.com
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we specify a large number for node in hugepages parameter, it may
be parsed to another number due to truncation in this statement:
node = tmp;
For example, add following parameter in command line:
hugepagesz=1G hugepages=4294967297:5
and kernel will allocate 5 hugepages for node 1 instead of ignoring it.
I move the validation check earlier to fix this issue, and slightly
simplifies the condition here.
Link: https://lkml.kernel.org/r/20220209134018.8242-1-liuyuntao10@huawei.com
Fixes: b5389086ad ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Liu Yuntao <liuyuntao10@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With HW_TAGS KASAN and kasan.stacktrace=off, the cache created in the
kmem_cache_double_destroy() test might get merged with an existing one.
Thus, the first kmem_cache_destroy() call won't actually destroy it but
will only decrease the refcount. This causes the test to fail.
Provide an empty constructor for the created cache to prevent the cache
from getting merged.
Link: https://lkml.kernel.org/r/b597bd434c49591d8af00ee3993a42c609dc9a59.1644346040.git.andreyknvl@google.com
Fixes: f98f966cd7 ("kasan: test: add test case for double-kmem_cache_destroy()")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes the below crash:
kernel BUG at include/linux/mm.h:2373!
cpu 0x5d: Vector: 700 (Program Check) at [c00000003c6e76e0]
pc: c000000000581a54: pmd_to_page+0x54/0x80
lr: c00000000058d184: move_hugetlb_page_tables+0x4e4/0x5b0
sp: c00000003c6e7980
msr: 9000000000029033
current = 0xc00000003bd8d980
paca = 0xc000200fff610100 irqmask: 0x03 irq_happened: 0x01
pid = 9349, comm = hugepage-mremap
kernel BUG at include/linux/mm.h:2373!
move_hugetlb_page_tables+0x4e4/0x5b0 (link register)
move_hugetlb_page_tables+0x22c/0x5b0 (unreliable)
move_page_tables+0xdbc/0x1010
move_vma+0x254/0x5f0
sys_mremap+0x7c0/0x900
system_call_exception+0x160/0x2c0
the kernel can't use huge_pte_offset before it set the pte entry because
a page table lookup check for huge PTE bit in the page table to
differentiate between a huge pte entry and a pointer to pte page. A
huge_pte_alloc won't mark the page table entry huge and hence kernel
should not use huge_pte_offset after a huge_pte_alloc.
Link: https://lkml.kernel.org/r/20220211063221.99293-1-aneesh.kumar@linux.ibm.com
Fixes: 550a7d60bd ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a git tree for sysctls as there's been quite a bit of work lately to
remove all the syctls out of kernel/sysctl.c and move to their respective
places, so coordination has been needed to avoid conflicts. This tree
will also help soak these changes on linux-next prior to getting to Linus.
Link: https://lkml.kernel.org/r/20220218182736.3694508-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a trace instance creation fails, tools are printing:
Could not enable -> osnoiser <- tracer for tracing
Print the actual (and correct) name of the tracer it fails to enable.
Link: https://lkml.kernel.org/r/53ef0582605af91eca14b19dba9fc9febb95d4f9.1645206561.git.bristot@kernel.org
Fixes: b1696371d8 ("rtla: Helper functions for rtla")
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The variable that stores the parsed command line arguments are not
being free()d at the rtla osnoise top exit path.
Free params variable before exiting.
Link: https://lkml.kernel.org/r/0be31d8259c7c53b98a39769d60cfeecd8421785.1645206561.git.bristot@kernel.org
Fixes: 1eceb2fc2c ("rtla/osnoise: Add osnoise top mode")
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently, --entries uses -e as the short version in the hist mode of
timerlat and osnoise tools. But as -e is already used to enable events
on trace sessions by other tools, thus let's keep it available for the
same usage for all rtla tools.
Make -E the short version of --entries for hist mode on all tools.
Note: rtla was merged in this merge window, so rtla was not released yet.
Link: https://lkml.kernel.org/r/5dbf0cbe7364d3a05e708926b41a097c59a02b1e.1645206561.git.bristot@kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Al Viro brought it to my attention that the dentries may not be filled
when the parse_options() is called, causing the call to set_gid() to
possibly crash. It should only be called if parse_options() succeeds
totally anyway.
He suggested the logical place to do the update is in apply_options().
Link: https://lore.kernel.org/all/20220225165219.737025658@goodmis.org/
Link: https://lkml.kernel.org/r/20220225153426.1c4cab6b@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 48b27b6b51 ("tracefs: Set all files to the same group ownership as the mount option")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- fix a race in configfs_{,un}register_subsystem (ChenXiaoSong)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmIZI84LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYP5tQ//e+8cjGVcu5cc6mciWFqkkDrdEiIsiI2OQriXBEbT
PE7A+s5Hj5h9gtTQso3Q3L7UbPBvXH/xT2UytLzxNKOwr++jbh8Gd4Dc/bwJbOxR
AH9Ybc6sOZjOh73jtrZAE7enVBFT55U4909YNaCbif3lKWEYnBo+wktltD7Mu23K
s0IF8ch2shPBW6CB0XdXpEIhXRGqAEjr8mYydh4vAZCRPz5TSEpZrxylWRAml4nM
mQIa0zVwEqPUlYW/BxbaKUn0MSqbWZBiXnM7pJChOReJ3HGioDW8oq+8nFNY9whF
wRUKawH0E9KW0liQwhqGf6RdSAABwo/1c0PwbrjHONTCLgfLM/PtqEK1ER3VLn2+
7Dn2WN0RUbHK0KrLeOt1Bn66fkrzd0JhbOjC7BAId2Pjgpvgro0Wh7mJPxANJ1Zf
8QCUTrG2POrYDANWrUmVXSPSLFK1T33GjXRj/nD0VdREa6yCFI6wzIBDC12LQB3b
kZfAtlYwQcfMvvIZOdJeWMvjOL8W/jJy6QKbP3y9EbupJ1zTy656iSIK2sEVT1zO
2VAYh4cXegODntsuTtkMSRGCAXELM8luRrjWvvIEfNBqFU6yKdh9kt+p9Xe+U9JS
KKb82zk/d+Mc4uis2yiUvGZy2ziU+AE8Z314KmMJgYQXV7P/GwNUHmBvpBIXyZPL
6OU=
=1JKy
-----END PGP SIGNATURE-----
Merge tag 'configfs-5.17-2022-02-25' of git://git.infradead.org/users/hch/configfs
Pull configfs fix from Christoph Hellwig:
- fix a race in configfs_{,un}register_subsystem (ChenXiaoSong)
* tag 'configfs-5.17-2022-02-25' of git://git.infradead.org/users/hch/configfs:
configfs: fix a race in configfs_{,un}register_subsystem()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmIY790ACgkQxWXV+ddt
WDvKxA//ctgUNhKEPOfJlmmaKAVRgrE6FfDgfk6c2v/PrpPFH0U9+frishcsImxu
XAObMCyPY7PfLDnk6I0Lmxm+8T56+NNGjbxq7/R1Uv0DJm75f51OJbr/H7NSjVfu
g6IyPmIft7jmt7Vp9lPyYcPNDTFyG+XARdWYS3AFtAfr2MfXgjx9AALxFjaytbLi
AevXP0qEkbLHv5npEG56pouhn44J/8GZKeUGM1crNNUDQoYpgreifZ2SHpLIfxP5
lvzrA1noaZSFS3Cth7NBPhHTFS2tiMb96bHFdF56A2EIq5vAXQF7w6IAUlvBEVoR
5XgWsxGfsv5FbdFmyrRIvOh6gGHwHw8BH5/ZRTRRVuRZAPKPY0oiJ9OJk5kIBCgo
LiYksqRTOs0Zp/e5wn/8d/UGp2A6mujxwqw7gLcOZBzfhKw7QIha6BM64BfJxBni
3dakBDCWZ/X+Kje+WaM4Sev7JUIyDVoKWClHrvzoLeEzdIgruNguMnQ+3yOZBFiG
4YRTPUeafAj0OspJ0LLG01X4NJVmnQVAFoKuFOsGbUsxeCaQ9vF3/IGTlhgkwehf
KjvE9nzl9DewpvRRd7AAirj5FncuwRw6KNci1gBBixxPaveBClCIuuyfx6lXPusK
sIF3eb7xcqKYLh0iYPd2XMZInXbWXIGuJoVG/Gu1IYm1OXAFQ5A=
=q/NS
-----END PGP SIGNATURE-----
Merge tag 'for-5.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"This is a hopefully last batch of fixes for defrag that got broken in
5.16, all stable material.
The remaining reported problem is excessive IO with autodefrag due to
various conditions in the defrag code not met or missing"
* tag 'for-5.17-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reduce extent threshold for autodefrag
btrfs: autodefrag: only scan one inode once
btrfs: defrag: don't use merged extent map for their generation check
btrfs: defrag: bring back the old file extent search behavior
btrfs: defrag: remove an ambiguous condition for rejection
btrfs: defrag: don't defrag extents which are already at max capacity
btrfs: defrag: don't try to merge regular extents with preallocated extents
btrfs: defrag: allow defrag_one_cluster() to skip large extent which is not a target
btrfs: prevent copying too big compressed lzo segment
- Older "does not even boot" regression in qib from July
- Bug fixes for error unwind in rtrs
- Avoid a deadlock syzkaller found in srp
- Fix another UAF syzkaller found in cma
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmIZRjgACgkQOG33FX4g
mxpvOg//X67AJDxGjJRAK7LHww08rNU1KdkOyvc7B3SrXPr1f0edE7t1eq3nPnz4
Dh6kvLwgtopYXAmnfNksDLq46R3M0vaTj5vSyYzvdE2k7JXTrAbtVm4Qm0dwMS7s
Ngf3YrWBHe+vVcEmV6UNGyIyBZCgcK1UJiU2xHUyGZl/oa6rC8AEjfkUbU/kWDD5
JWHJqAzerSI7b84TrAEx86JBUpqvUp2mvIoTM4O7EnMFqPjzTReq2v61e0ltjQKk
cQxASlJ7oQioNkIpCYWQw6UaDsKOna3cf0qVfFz/DztyMOOnnvVsuFTdSv7aX76l
ME9NWv+gN0OF/7VZ2ZkkDNivDzdP9zXYpnDqF3kpDHLIufMHNmtVZAsI2mwhsLfn
yvN9GuoeV1+cx2Vywklar9/o1yZ7UKzSlOsQn/QaonQhqYLXHIWEgnpJ76qq80oF
K9hzb7VvH5+FZ6CcitLtJNyLr9Y/ELiBWe2Xdju44r0bwLUijIQLIIeL0hmx5t4M
n98fUC6yKfOPqhzFqu6rRRzxoPB9Ui5BjdcKrc/0UkZygXA16OFdm6owGL7GJeTC
mva7LgrTAf64pkDWB8ZIsfIrm4HIt4fSoz6ciU3ztNwmcDpVG8vg8Hpgl1STxLRs
WkcTXRqmuq+dxbGplcb2kueEhdCnc7YPa+4jiCGQkbDGVPQc3Ss=
=not3
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
- Older "does not even boot" regression in qib from July
- Bug fixes for error unwind in rtrs
- Avoid a deadlock syzkaller found in srp
- Fix another UAF syzkaller found in cma
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cma: Do not change route.addr.src_addr outside state checks
RDMA/ib_srp: Fix a deadlock
RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close
RDMA/rtrs-clt: Fix possible double free in error case
IB/qib: Fix duplicate sysfs directory name