WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Pavel Begunkov	3ff1a0d395	io_uring: enable managed frags with register buffers io_uring's registered buffers infra has a good performant way of pinning pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely register buffer backed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	492dddb4f6	io_uring: add zc notification flush requests Overlay notification control onto IORING_OP_RSRC_UPDATE (former IORING_OP_FILES_UPDATE). It allows to flush a range of zc notifications from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->arg is not zero, it also copies sqe->arg as a new tag for all flushed notifications. Note, it doesn't flush a notification of a slot if there was no requests attached to it (since last flush or registration). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/df13e2363400682a73dd9e71c3b990b8d1ff0333.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	4379d5f15b	io_uring: rename IORING_OP_FILES_UPDATE IORING_OP_FILES_UPDATE will be a more generic opcode serving different resource types, rename it into IORING_OP_RSRC_UPDATE and add subtype handling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0a907133907d9af3415a8a7aa1802c6aa97c03c6.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	63809137eb	io_uring: flush notifiers after sendzc Allow to flush notifiers as a part of sendzc request by setting IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will flush the used [active] notifier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	10c7d33ecd	io_uring: sendzc with fixed buffers Allow zerocopy sends to use fixed buffers. There is an optimisation for this case, the network layer don't need to reference the pages, see SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed buffers until the notifier is released. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com [axboe: fold in 32-bit pointer cast warning fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	092aeedb75	io_uring: allow to pass addr into sendzc Allow to specify an address to zerocopy sends making it more like sendto(2). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	e29e3bd4b9	io_uring: account locked pages for non-fixed zc Fixed buffers are RLIMIT_MEMLOCK accounted, however it doesn't cover iovec based zerocopy sends. Do the accounting on the io_uring side. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	06a5464be8	io_uring: wire send zc request type Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from IORING_OP_SEND is that the user should specify a notification slot index in sqe::notification_idx and the buffers are safe to reuse only when the used notification is flushed and completes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a80387c6a68ce9cf99b3b6ef6f71068468761fb7.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	bc24d6bd32	io_uring: add notification slot registration Let the userspace to register and unregister notification slots. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a0aa8161fe3ebb2a4cc6e5dbd0cffb96e6881cf5.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:07 -06:00
Pavel Begunkov	68ef5578ef	io_uring: add rsrc referencing for notifiers In preparation to zerocopy sends with fixed buffers make notifiers to reference the rsrc node to protect the used fixed buffers. We can't just grab it for a send request as notifiers can likely outlive requests that used it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cd7a01d26837945b6982fa9cf15a63230f2ed4f.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	e58d498e81	io_uring: complete notifiers in tw We need a task context to post CQEs but using wq is too expensive. Try to complete notifiers using task_work and fall back to wq if fails. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/089799ab665b10b78fdc614ae6d59fa7ef0d5f91.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	eb4a299b2f	io_uring: cache struct io_notif kmalloc'ing struct io_notif is too expensive when done frequently, cache them as many other resources in io_uring. Keep two list, the first one is from where we're getting notifiers, it's protected by ->uring_lock. The second is protected by ->completion_lock, to which we queue released notifiers. Then we splice one list into another when needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dec18f7fcbab9f4bd40b96e5ae158b119945230.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	eb42cebb2c	io_uring: add zc notification infrastructure Add internal part of send zerocopy notifications. There are two main structures, the first one is struct io_notif, which carries inside struct ubuf_info and maps 1:1 to it. io_uring will be binding a number of zerocopy send requests to it and ask to complete (aka flush) it. When flushed and all attached requests and skbs complete, it'll generate one and only one CQE. There are intended to be passed into the network layer as struct msghdr::msg_ubuf. The second concept is notification slots. The userspace will be able to register an array of slots and subsequently addressing them by the index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (called active notifier) but many notifiers during the lifetime. When active, a notifier not going to post any completion but the userspace can attach requests to it by specifying the corresponding slot while issueing send zc requests. Eventually, the userspace will want to "flush" the notifier losing any way to attach new requests to it, however it can use the next atomatically added notifier of this slot or of any other slot. When the network layer is done with all enqueued skbs attached to a notifier and doesn't need the specified in them user data, the flushed notifier will post a CQE. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	e70cb60893	io_uring: export io_put_task() Make io_put_task() available to non-core parts of io_uring, we'll need it for notification infrastructure. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3686807d4c03b72e389947b0e8692d4d44334ef0.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Pavel Begunkov	e02b665127	io_uring: initialise msghdr::msg_ubuf Initialise newly added ->msg_ubuf in io_recv() and io_send(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8f9f263875a4a36e7f26cc5d55ebe315308f57d.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:41:06 -06:00
Stefan Roesch	1c849b481b	io_uring: Add tracepoint for short writes This adds the io_uring_short_write tracepoint to io_uring. A short write is issued if not all pages that are required for a write are in the page cache and the async buffered writes have to return EAGAIN. Signed-off-by: Stefan Roesch <shr@fb.com> Link: https://lore.kernel.org/r/20220616212221.2024518-13-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:32 -06:00
Jens Axboe	e053aaf4da	io_uring: fix issue with io_write() not always undoing sb_start_write() This is actually an older issue, but we never used to hit the -EAGAIN path before having done sb_start_write(). Make sure that we always call kiocb_end_write() if we need to retry the write, so that we keep the calls to sb_start_write() etc balanced. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:32 -06:00
Stefan Roesch	4e17aaab54	io_uring: Add support for async buffered writes This enables the async buffered writes for the filesystems that support async buffered writes in io-uring. Buffered writes are enabled for blocks that are already in the page cache or can be acquired with noio. Signed-off-by: Stefan Roesch <shr@fb.com> Link: https://lore.kernel.org/r/20220616212221.2024518-12-shr@fb.com [axboe: adapt to 5.20 branch] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:32 -06:00
Jens Axboe	f6b543fd03	io_uring: ensure REQ_F_ISREG is set async offload If we're offloading requests directly to io-wq because IOSQE_ASYNC was set in the sqe, we can miss hashing writes appropriately because we haven't set REQ_F_ISREG yet. This can cause a performance regression with buffered writes, as io-wq then no longer correctly serializes writes to that file. Ensure that we set the flags in io_prep_async_work(), which will cause the io-wq work item to be hashed appropriately. Fixes: `584b0180f0` ("io_uring: move read/write file prep state into actual opcode handler") Link: https://lore.kernel.org/io-uring/20220608080054.GB22428@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:18 -06:00
Jens Axboe	4f6a94d337	net: fix compat pointer in get_compat_msghdr() A previous change enabled external users to copy the data before calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or __io_compat_recvmsg_copy_hdr() to take that into account. They are both stil passing in the __user pointer rather than the copied version. Ensure we pass in the kernel struct, not the pointer to the user data. Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/ Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:18 -06:00
Michal Koutný	4890422992	io_uring: Don't require reinitable percpu_ref The commit `8bb649ee1d` ("io_uring: remove ring quiesce for io_uring_register") removed the worklow relying on reinit/resurrection of the percpu_ref, hence, initialization with that requested is a relic. This is based on code review, this causes no real bug (and theoretically can't). Technically it's a revert of commit `214828962d` ("io_uring: initialize percpu refcounters using PERCU_REF_ALLOW_REINIT") but since the flag omission is now justified, I'm not making this a revert. Fixes: `8bb649ee1d` ("io_uring: remove ring quiesce for io_uring_register") Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:18 -06:00
Dylan Yudaken	9b0fc3c054	io_uring: fix types in io_recvmsg_multishot_overflow io_recvmsg_multishot_overflow had incorrect types on non x64 system. But also it had an unnecessary INT_MAX check, which could just be done by changing the type of the accumulator to int (also simplifying the casts). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: a8b38c4ce724 ("io_uring: support multishot in recvmsg") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220715130252.610639-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:18 -06:00
Uros Bizjak	4ccc6db090	io_uring: Use atomic_long_try_cmpxchg in __io_account_mem Use atomic_long_try_cmpxchg instead of atomic_long_cmpxchg (ptr, old, new) == old in __io_account_mem. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, atomic_long_try_cmpxchg implicitly assigns old ptr value to "old" when cmpxchg fails, enabling further code simplifications. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:18 -06:00
Dylan Yudaken	9bb66906f2	io_uring: support multishot in recvmsg Similar to multishot recv, this will require provided buffers to be used. However recvmsg is much more complex than recv as it has multiple outputs. Specifically flags, name, and control messages. Support this by introducing a new struct io_uring_recvmsg_out with 4 fields. namelen, controllen and flags match the similar out fields in msghdr from standard recvmsg(2), payloadlen is the length of the payload following the header. This struct is placed at the start of the returned buffer. Based on what the user specifies in struct msghdr, the next bytes of the buffer will be name (the next msg_namelen bytes), and then control (the next msg_controllen bytes). The payload will come at the end. The return value in the CQE is the total used size of the provided buffer. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-4-dylany@fb.com [axboe: style fixups, see link] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:18 -06:00
Dylan Yudaken	72c531f8ef	net: copy from user before calling __get_compat_msghdr this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	7fa875b8e5	net: copy from user before calling __copy_msghdr this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	6d2f75a0cf	io_uring: support 0 length iov in buffer select in compat Match up work done in "io_uring: allow iov_len = 0 for recvmsg and buffer select", but for compat code path. Fixes: a68caad69ce5 ("io_uring: allow iov_len = 0 for recvmsg and buffer select") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220708181838.1495428-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	e2df2ccb75	io_uring: fix multishot ending when not polled If multishot is not actually polling then return IOU_OK rather than the result. If the result was > 0 this will confuse things further up the callstack which expect a return <= 0. Fixes: 1300ebb20286 ("io_uring: multishot recv") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220708181838.1495428-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Jens Axboe	43e0bbbd0b	io_uring: add netmsg cache For recvmsg/sendmsg, if they don't complete inline, we currently need to allocate a struct io_async_msghdr for each request. This is a somewhat large struct. Hook up sendmsg/recvmsg to use the io_alloc_cache. This reduces the alloc + free overhead considerably, yielding 4-5% of extra performance running netbench. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Jens Axboe	9731bc9855	io_uring: impose max limit on apoll cache Caches like this tend to grow to the peak size, and then never get any smaller. Impose a max limit on the size, to prevent it from growing too big. A somewhat randomly chosen 512 is the max size we'll allow the cache to get. If a batch of frees come in and would bring it over that, we simply start kfree'ing the surplus. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Jens Axboe	9b797a37c4	io_uring: add abstraction around apoll cache In preparation for adding limits, and one more user, abstract out the core bits of the allocation+free cache. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Jens Axboe	9da7471ed1	io_uring: move apoll cache to poll.c This is where it's used, move the flush handler in there. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Pavel Begunkov	e8375e43ca	io_uring: consolidate hash_locked io-wq handling Don't duplicate code disabling REQ_F_HASH_LOCKED for IO_URING_F_UNLOCKED (i.e. io-wq), move the handling into __io_arm_poll_handler(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0ff0ffdfaa65b3d536131535c3dad3c63d9b7bb0.1657203020.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Pavel Begunkov	b21a51e26e	io_uring: clear REQ_F_HASH_LOCKED on hash removal Instead of clearing REQ_F_HASH_LOCKED while arming a poll, unset the bit when we're removing the entry from the table in io_poll_tw_hash_eject(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/02e48bb88d6f1480c94ac2924c43ad1fbd48e92a.1657203020.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Pavel Begunkov	ceff501790	io_uring: don't race double poll setting REQ_F_ASYNC_DATA Just as with io_poll_double_prepare() setting REQ_F_DOUBLE_POLL, we can race with the first poll entry when setting REQ_F_ASYNC_DATA. Move it under io_poll_double_prepare(). Fixes: a18427bb2d9b ("io_uring: optimise submission side poll_refs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/df6920f509c11115aa2bce8b34dc5fdb0eb98920.1657203020.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Pavel Begunkov	7a121ced6e	io_uring: don't miss setting REQ_F_DOUBLE_POLL When adding a second poll entry we should set REQ_F_DOUBLE_POLL unconditionally. We might race with the first entry removal but that doesn't change the rule. Fixes: a18427bb2d9b ("io_uring: optimise submission side poll_refs") Reported-and-tested-by: syzbot+49950ba66096b1f0209b@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b680d83ded07424db83e8745585e7a6d72826ef.1657203020.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	cf0dd9527e	io_uring: disable multishot recvmsg recvmsg has semantics that do not make it trivial to extend to multishot. Specifically it has user pointers and returns data in the original parameter. In order to make this API useful these will need to be somehow included with the provided buffers. For now remove multishot for recvmsg as it is not useful. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220704140106.200167-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	e0486f3f7c	io_uring: only trace one of complete or overflow In overflow we see a duplcate line in the trace, and in some cases 3 lines (if initial io_post_aux_cqe fails). Instead just trace once for each CQE Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-13-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	b3fdea6ecb	io_uring: multishot recv Support multishot receive for io_uring. Typical server applications will run a loop where for each recv CQE it requeues another recv/recvmsg. This can be simplified by using the existing multishot functionality combined with io_uring's provided buffers. The API is to add the IORING_RECV_MULTISHOT flag to the SQE. CQEs will then be posted (with IORING_CQE_F_MORE flag set) when data is available and is read. Once an error occurs or the socket ends, the multishot will be removed and a completion without IORING_CQE_F_MORE will be posted. The benefit to this is that the recv is much more performant. * Subsequent receives are queued up straight away without requiring the application to finish a processing loop. * If there are more data in the socket (sat the provided buffer size is smaller than the socket buffer) then the data is immediately returned, improving batching. * Poll is only armed once and reused, saving CPU cycles Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-11-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	cbd2574854	io_uring: fix multishot accept ordering Similar to multishot poll, drop multishot accept when CQE overflow occurs. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-10-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	a2da676376	io_uring: fix multishot poll on overflow On overflow, multishot poll can still complete with the IORING_CQE_F_MORE flag set. If in the meantime the user clears a CQE and a the poll was cancelled then the poll will post a CQE without the IORING_CQE_F_MORE (and likely result -ECANCELED). However when processing the application will encounter the non-overflow CQE which indicates that there will be no more events posted. Typical userspace applications would free memory associated with the poll in this case. It will then subsequently receive the earlier CQE which has overflowed, which breaks the contract given by the IORING_CQE_F_MORE flag. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-9-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	52120f0fad	io_uring: add allow_overflow to io_post_aux_cqe Some use cases of io_post_aux_cqe would not want to overflow as is, but might want to change the flags/result. For example multishot receive requires in order CQE, and so if there is an overflow it would need to stop receiving until the overflow is taken care of. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	114eccdf0e	io_uring: add IOU_STOP_MULTISHOT return code For multishot we want a way to signal the caller that multishot has ended but also this might not be an error return. For example sockets return 0 when closed, which should end a multishot recv, but still have a CQE with result 0 Introduce IOU_STOP_MULTISHOT which does this and indicates that the return code is stored inside req->cqe Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-7-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:17 -06:00
Dylan Yudaken	2ba69707d9	io_uring: clean up io_poll_check_events return values The values returned are a bit confusing, where 0 and 1 have implied meaning, so add some definitions for them. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-6-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Dylan Yudaken	d4e097dae2	io_uring: recycle buffers on error Rather than passing an error back to the user with a buffer attached, recycle the buffer immediately. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Dylan Yudaken	5702196e7d	io_uring: allow iov_len = 0 for recvmsg and buffer select When using BUFFER_SELECT there is no technical requirement that the user actually provides iov, and this removes one copy_from_user call. So allow iov_len to be 0. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-4-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Dylan Yudaken	32f3c434b1	io_uring: restore bgid in io_put_kbuf Attempt to restore bgid. This is needed when recycling unused buffers as the next time around it will want the correct bgid. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Dylan Yudaken	b8c015598c	io_uring: allow 0 length for buffer select If user gives 0 for length, we can set it from the available buffer size. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630091231.1456789-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	6e73dffbb9	io_uring: let to set a range for file slot allocation From recently io_uring provides an option to allocate a file index for operation registering fixed files. However, it's utterly unusable with mixed approaches when for a part of files the userspace knows better where to place it, as it may race and users don't have any sane way to pick a slot and hoping it will not be taken. Let the userspace to register a range of fixed file slots in which the auto-allocation happens. The use case is splittting the fixed table in two parts, where on of them is used for auto-allocation and another for slot-specified operations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66ab0394e436f38437cf7c44676e1920d09687ad.1656154403.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Jens Axboe	e6130eba8a	io_uring: add support for passing fixed file descriptors With IORING_OP_MSG_RING, one ring can send a message to another ring. Extend that support to also allow sending a fixed file descriptor to that ring, enabling one ring to pass a registered descriptor to another one. Arguments are extended to pass in: sqe->addr3 fixed file slot in source ring sqe->file_index fixed file slot in destination ring IORING_OP_MSG_RING is extended to take a command argument in sqe->addr. If set to zero (or IORING_MSG_DATA), it sends just a message like before. If set to IORING_MSG_SEND_FD, a fixed file descriptor is sent according to the above arguments. Two common use cases for this are: 1) Server needs to be shutdown or restarted, pass file descriptors to another onei 2) Backend is split, and one accepts connections, while others then get the fd passed and handle the actual connection. Both of those are classic SCM_RIGHTS use cases, and it's not possible to support them with direct descriptors today. By default, this will post a CQE to the target ring, similarly to how IORING_MSG_DATA does it. If IORING_MSG_RING_CQE_SKIP is set, no message is posted to the target ring. The issuer is expected to notify the receiver side separately. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Jens Axboe	f110ed8498	io_uring: split out fixed file installation and removal Put it with the filetable code, which is where it belongs. While doing so, have the helpers take a ctx rather than an io_kiocb. It doesn't make sense to use a request, as it's not an operation on the request itself. It applies to the ring itself. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	fbb8bb0291	io_uring: remove ctx->refs pinning on enter io_uring_enter() takes ctx->refs, which was previously preventing racing with register quiesce. However, as register now doesn't touch the refs, we can freely kill extra ctx pinning and rely on the fact that we're holding a file reference preventing the ring from being destroyed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a11c57ad33a1be53541fce90669c1b79cf4d8940.1656153286.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	3273c4407a	io_uring: don't check file ops of registered rings Registered rings are per definitions io_uring files, so we don't need to additionally verify them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/425cd64fd885b8e329a46c205ee811987691baaf.1656153286.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	ad8b261d83	io_uring: remove extra TIF_NOTIFY_SIGNAL check io_run_task_work() accounts for TIF_NOTIFY_SIGNAL, so no need to have an second check in io_run_task_work_sig(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/52ce41a592ad904511697f432141e5690fd4b968.1656153285.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	3218e5d32d	io_uring: fuse fallback_node and normal tw node Now as both normal and fallback paths use llist, just keep one node head in struct io_task_work and kill off ->fallback_node. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d04ebde409f7b162fe247b361b4486b193293e46.1656153285.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	37c7bd31b3	io_uring: improve io_fail_links() io_fail_links() is called with ->completion_lock held and for that reason we'd want to keep it as small as we can. Instead of doing __io_req_complete_post() for each linked request under the lock, fail them in a task_work handler under ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a2f68708b970a21f4e84ddfa7b3abd67a8fffb27.1656153285.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Jens Axboe	fe991a7688	io_uring: move POLLFREE handling to separate function We really don't care about this at all in terms of performance. Outside of having it already be marked unlikely(), shove it into a separate __cold function. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Hao Xu	795bbbc8a9	io_uring: kbuf: inline io_kbuf_recycle_ring() Make io_kbuf_recycle_ring() inline since it is the fast path of provided buffer. Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220623130126.179232-1-hao.xu@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	49f1c68e04	io_uring: optimise submission side poll_refs The final poll_refs put in __io_arm_poll_handler() takes quite some cycles. When we're arming from the original task context task_work won't be run, so in this case we can assume that we won't race with task_works and so not take the initial ownership ref. One caveat is that after arming a poll we may race with it, so we have to add a bunch of io_poll_get_ownership() hidden inside of io_poll_can_finish_inline() whenever we want to complete arming inline. For the same reason we can't just set REQ_F_DOUBLE_POLL in __io_queue_proc() and so need to sync with the first poll entry by taking its wq head lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8825315d7f5e182ac1578a031e546f79b1c97d01.1655990418.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	de08356f48	io_uring: refactor poll arm error handling __io_arm_poll_handler() errors parsing is a horror, in case it failed it returns 0 and the caller is expected to look at ipt.error, which already led us to a number of problems before. When it returns a valid mask, leave it as it's not, i.e. return 1 and store the mask in ipt.result_mask. In case of a failure that can be handled inline return an error code (negative value), and return 0 if __io_arm_poll_handler() took ownership of the request and will complete it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/018cacdaef5fe95d7dc56b32e85d752cab7607f6.1655990418.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	063a007996	io_uring: change arm poll return values The rules for __io_arm_poll_handler()'s result parsing are complicated, as the first step don't pass return a mask but pass back a positive return code and fill ipt->result_mask. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/529e29e9f97f2e6e383ccd44234d8b576a83a921.1655990418.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	5204aa8c43	io_uring: add a helper for apoll alloc Extract a helper function for apoll allocation, makes the code easier to read. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2f93282b47dd678e805dd0d7097f66968ced495c.1655990418.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	13a99017ff	io_uring: remove events caching atavisms Remove events argument from *io_poll_execute(), it's not needed and not used. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/12efd4e15c6a90cf9e5b59807cfcb57852b51dc7.1655990418.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:16 -06:00
Pavel Begunkov	0638cd7be2	io_uring: clean poll ->private flagging We store a req pointer in wqe->private but also take one bit to mark double poll entries. Replace macro helpers with inline functions for better type checking and also name the double flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9a61240555c64ac0b7a9b0eb59a9efeb638a35a4.1655990418.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Jens Axboe	78a861b949	io_uring: add sync cancelation API through io_uring_register() The io_uring cancelation API is async, like any other API that we expose there. For the case of finding a request to cancel, or not finding one, it is fully sync in that when submission returns, the CQE for both the cancelation request and the targeted request have been posted to the CQ ring. However, if the targeted work is being executed by io-wq, the API can only start the act of canceling it. This makes it difficult to use in some circumstances, as the caller then has to wait for the CQEs to come in and match on the same cancelation data there. Provide a IORING_REGISTER_SYNC_CANCEL command for io_uring_register() that does sync cancelations, always. For the io-wq case, it'll wait for the cancelation to come in before returning. The only expected returns from this API is: 0 Request found and canceled fine. > 0 Requests found and canceled. Only happens if asked to cancel multiple requests, and if the work wasn't in progress. -ENOENT Request not found. -ETIME A timeout on the operation was requested, but the timeout expired before we could cancel. and we won't get -EALREADY via this API. If the timeout value passed in is -1 (tv_sec and tv_nsec), then that means that no timeout is requested. Otherwise, the timespec passed in is the amount of time the sync cancel will wait for a successful cancelation. Link: https://github.com/axboe/liburing/discussions/608 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Jens Axboe	7d8ca72501	io_uring: add IORING_ASYNC_CANCEL_FD_FIXED cancel flag In preparation for not having a request to pass in that carries this state, add a separate cancelation flag that allows the caller to ask for a fixed file for cancelation. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Jens Axboe	88f52eaad2	io_uring: have cancelation API accept io_uring_task directly We just use the io_kiocb passed in to find the io_uring_task, and we already pass in the ctx via cd->ctx anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Hao Xu	024b8fde33	io_uring: kbuf: kill __io_kbuf_recycle() __io_kbuf_recycle() is only called in io_kbuf_recycle(). Kill it and tweak the code so that the legacy pbuf and ring pbuf code become clear Signed-off-by: Hao Xu <howeyxu@tencent.com> Link: https://lore.kernel.org/r/20220622055551.642370-1-hao.xu@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Dylan Yudaken	c6dd763c24	io_uring: trace task_work_run trace task_work_run to help provide stats on how often task work is run and what batch sizes are coming through. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-9-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Dylan Yudaken	3a0c037b0e	io_uring: batch task_work Batching task work up is an important performance optimisation, as task_work_add is expensive. In order to keep the semantics replace the task_list with a fake node while processing the old list, and then do a cmpxchg at the end to see if there is more work. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-6-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Dylan Yudaken	923d159247	io_uring: introduce llist helpers Introduce helpers to atomically switch llist. Will later move this into common code Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Dylan Yudaken	f88262e60b	io_uring: lockless task list With networking use cases we see contention on the spinlock used to protect the task_list when multiple threads try and add completions at once. Instead we can use a lockless list, and assume that the first caller to add to the list is responsible for kicking off task work. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-4-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Dylan Yudaken	c34398a8c0	io_uring: remove __io_req_task_work_add this is no longer needed as there is only one caller Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Dylan Yudaken	ed5ccb3bee	io_uring: remove priority tw list optimisation This optimisation has some built in assumptions that make it easy to introduce bugs. It also does not have clear wins that make it worth keeping. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	024f15e033	io_uring: dedup io_run_task_work We have an identical copy of io_run_task_work() for io-wq called io_flush_signals(), deduplicate them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a157a4df5fa217b8bd03c73494f2fd0e24e44fbc.1655802465.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	a6b21fbb4c	io_uring: move list helpers to a separate file It's annoying to have io-wq.h as a dependency every time we want some of struct io_wq_work_list helpers, move them into a separate file. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c1d891ce12b30767d1d2a3b7db2ca3abc1ecc4a2.1655802465.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	625d38b3fd	io_uring: improve io_run_task_work() Since SQPOLL now uses TWA_SIGNAL_NO_IPI, there won't be task work items without TIF_NOTIFY_SIGNAL. Simplify io_run_task_work() by removing task->task_works check. Even though looks it doesn't cause extra cache bouncing, it's still nice to not touch it an extra time when it might be not cached. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/75d4f34b0c671075892821a409e28da6cb1d64fe.1655802465.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	4a0fef6278	io_uring: optimize io_uring_task layout task_work bits of io_uring_task are split into two cache lines causing extra cache bouncing, place them into a separate cache line. Also move the most used submission path fields closer together, so there are hot. Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	bce5d70cd6	io_uring: add a warn_once for poll_find io_poll_remove() expects poll_find() to search only for poll requests and passes a flag for this. Just be a little bit extra cautious considering lots of recent poll/cancellation changes and add a WARN_ON_ONCE checking that we don't get an apoll'ed request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ec9a66f1e22f99dcd02288d4e42f3cc6bb357804.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	9da070b142	io_uring: consistent naming for inline completion Improve naming of the inline/deferred completion helper so it's consistent with it's *_post counterpart. Add some comments and extra lockdeps to ensure the locking is done right. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/797c619943dac06529e9d3fcb16e4c3cde6ad1a3.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	c059f78584	io_uring: move io_import_fixed() Move io_import_fixed() into rsrc.c where it belongs. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4d5becb21f332b4fef6a7cedd6a50e65e2371630.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	f337a84d39	io_uring: opcode independent fixed buf import Fixed buffers are generic infrastructure, make io_import_fixed() opcode agnostic. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b1e765c8a1c2c913a05a28d2399fc53e1d3cf37a.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	46929b0868	io_uring: add io_commit_cqring_flush() Since __io_commit_cqring_flush users moved to different files, introduce io_commit_cqring_flush() helper and encapsulate all flags testing details inside. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0da03887435dd9869ffe46dcd3962bf104afcca3.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:15 -06:00
Pavel Begunkov	253993210b	io_uring: introduce locking helpers for CQE posting spin_lock(&ctx->completion_lock); /* post CQEs */ io_commit_cqring(ctx); spin_unlock(&ctx->completion_lock); io_cqring_ev_posted(ctx); We have many places repeating this sequence, and the three function unlock section is not perfect from the maintainance perspective and also makes it harder to add new locking/sync trick. Introduce two helpers. io_cq_lock(), which is simple and only grabs ->completion_lock, and io_cq_unlock_post() encapsulating the three call section. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fe0c682bf7f7b55d9be55b0d034be9c1949277dc.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	305bef9887	io_uring: hide eventfd assumptions in eventfd paths Some io_uring-eventfd users assume that there won't be spurious wakeups. That assumption has to be honoured by all io_cqring_ev_posted() callers, which is inconvenient and from time to time leads to problems but should be maintained to not break the userspace. Instead of making the callers track whether a CQE was posted or not, hide it inside io_eventfd_signal(). It saves ->cached_cq_tail it saw last time and triggers the eventfd only when ->cached_cq_tail changed since then. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0ffc66bae37a2513080b601e4370e147faaa72c5.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	b321823a03	io_uring: fix io_poll_remove_all clang warnings clang complains on bitwise operations with bools, add a bit more verbosity to better show that we want to call io_poll_remove_all_table() twice but with different arguments. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f11d21dcdf9233e0eeb15fa13b858a05a78eb310.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	ba3cdb6fbb	io_uring: improve task exit timeout cancellations Don't spin trying to cancel timeouts that are reachable but not cancellable, e.g. already executing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ab8a7440a60bbdf69ae514f672ad050e43dd1b03.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	affa87db90	io_uring: fix multi ctx cancellation io_uring_try_cancel_requests() loops until there is nothing left to do with the ring, however there might be several rings and they might have dependencies between them, e.g. via poll requests. Instead of cancelling rings one by one, try to cancel them all and only then loop over if we still potenially some work to do. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8d491fe02d8ac4c77ff38061cf86b9a827e8845c.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	d9dee4302a	io_uring: remove ->flush_cqes optimisation It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often ->flush_cqes flag prevents from completion being flushed. Sometimes it's high level of concurrency that enables it at least for one CQE, but sometimes it doesn't save much because nobody waiting on the CQ. Remove ->flush_cqes flag and the optimisation, it should benefit the normal use case. Note, that there is no spurious eventfd problem with that as checks for spuriousness were incorporated into io_eventfd_signal(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com [axboe: remove now dead state->flush_cqes variable] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	a830ffd287	io_uring: move io_eventfd_signal() Move io_eventfd_signal() in the sources without any changes and kill its forward declaration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9ebebb3f6f56f5a5448a621e0b6a537720c43334.1655637157.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	9046c6415b	io_uring: reshuffle io_uring/io_uring.h It's a good idea to first do forward declarations and then inline helpers, otherwise there will be keep stumbling on dependencies between them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1d7fa6672ed43f20ccc0c54ae201369ebc3ebfab.1655637157.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	d142c3ec8d	io_uring: remove extra io_commit_cqring() We don't post events in __io_commit_cqring_flush() anymore but send all requests to tw, so no need to do io_commit_cqring() there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f2481e32375e749be89c42e4804268b608722cef.1655637157.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Jens Axboe	ad163a7e25	io_uring: move a few private types to local headers Commit 3a3d47fa9cfd ("io_uring: make io_uring_types.h public") moved a bunch of io_uring types to a kernel wide header, so we could make tracing a bit saner rather than pass in a ton of arguments. However, there are a few types in there that are not really needed to be system wide. Move the cancel data and mapped buffers back to the appropriate io_uring local headers. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	48863ffd3e	io_uring: clean up tracing events We have lots of trace events accepting an io_uring request and wanting to print some of its fields like user_data, opcode, flags and so on. However, as trace points were unaware of io_uring structures, we had to pass all the fields as arguments. Teach trace/events/io_uring.h about struct io_kiocb and stop the misery of passing a horde of arguments to trace helpers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40ff72f92798114e56d400f2b003beb6cde6ef53.1655384063.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	ab1c84d855	io_uring: make io_uring_types.h public Move io_uring types to linux/include, need them public so tracing can see the definitions and we can clean trace/events/io_uring.h Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a15f12e8cb7289b2de0deaddcc7518d98a132d17.1655384063.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	27a9d66fec	io_uring: kill extra io_uring_types.h includes io_uring/io_uring.h already includes io_uring_types.h, no need to include it every time. Kill it in a bunch of places, it prepares us for following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	b3659a65be	io_uring: change ->cqe_cached invariant for CQE32 With IORING_SETUP_CQE32 ->cqe_cached doesn't store a real address but rather an implicit offset into cqes. Store the real cqe pointer and increment it accordingly if CQE32. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ee1838cba16bed96381a006950b36ba640d998c.1655455613.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	e8c328c391	io_uring: deduplicate io_get_cqe() calls Deduplicate calls to io_get_cqe() from __io_fill_cqe_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4fa077986cc3abab7c59ff4e7c390c783885465f.1655455613.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	ae5735c69b	io_uring: deduplicate __io_fill_cqe_req tracing Deduplicate two trace_io_uring_complete() calls in __io_fill_cqe_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/277ed85dba5189ab7d932164b314013a0f0b0fdc.1655455613.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00
Pavel Begunkov	68494a65d0	io_uring: introduce io_req_cqe_overflow() __io_fill_cqe_req() is hot and inlined, we want it to be as small as possible. Add io_req_cqe_overflow() accepting only a request and doing all overflow accounting, and replace with it two calls to 6 argument io_cqring_event_overflow(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/048b9fbcce56814d77a1a540409c98c3d383edcb.1655455613.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-24 18:39:14 -06:00

1 2 3 4 5

236 Коммитов