WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Christoph Hellwig	340e845738	block: delay freeing the gendisk blkdev_get_no_open acquires a reference to the block_device through the block device inode and then tries to acquire a device model reference to the gendisk. But at this point the disk migh already be freed (although the race is free). Fix this by only freeing the gendisk from the whole device bdevs ->free_inode callback as well. Fixes: `22ae8ce8b8` ("block: simplify bdev/disk lookup in blkdev_get") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210722075402.983367-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-27 19:35:47 -06:00
Tejun Heo	5ab189cf3a	blk-iocost: fix operation ordering in iocg_wake_fn() iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it wants the wq_entry to be always removed whether it ended up waking the task or not. finish_wait() tests whether wq_entry needs removal without grabbing the wait_queue lock and expects the waker to use list_del_init_careful() after all waking operations are complete, which iocg_wake_fn() didn't do. The operation order was wrong and the regular list_del_init() was used. The result is that if a waiter wakes up racing the waker, it can free pop the wq_entry off stack before the waker is still looking at it, which can lead to a backtrace like the following. [7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP ... [7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0 ... [7312084.858314] Call Trace: [7312084.863548] _raw_spin_lock_irqsave+0x22/0x30 [7312084.872605] try_to_wake_up+0x4c/0x4f0 [7312084.880444] iocg_wake_fn+0x71/0x80 [7312084.887763] __wake_up_common+0x71/0x140 [7312084.895951] iocg_kick_waitq+0xe8/0x2b0 [7312084.903964] ioc_rqos_throttle+0x275/0x650 [7312084.922423] __rq_qos_throttle+0x20/0x30 [7312084.930608] blk_mq_make_request+0x120/0x650 [7312084.939490] generic_make_request+0xca/0x310 [7312084.957600] submit_bio+0x173/0x200 [7312084.981806] swap_readpage+0x15c/0x240 [7312084.989646] read_swap_cache_async+0x58/0x60 [7312084.998527] swap_cluster_readahead+0x201/0x320 [7312085.023432] swapin_readahead+0x2df/0x450 [7312085.040672] do_swap_page+0x52f/0x820 [7312085.058259] handle_mm_fault+0xa16/0x1420 [7312085.066620] do_page_fault+0x2c6/0x5c0 [7312085.074459] page_fault+0x2f/0x40 Fix it by switching to list_del_init_careful() and putting it at the end. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rik van Riel <riel@surriel.com> Fixes: `7caa47151a` ("blkcg: implement blk-iocost") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-27 19:25:37 -06:00
John Garry	b93af3055d	blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling If the blk_mq_sched_alloc_tags() -> blk_mq_alloc_rqs() call fails, then we call blk_mq_sched_free_tags() -> blk_mq_free_rqs(). It is incorrect to do so, as any rqs would have already been freed in the blk_mq_alloc_rqs() call. Fix by calling blk_mq_free_rq_map() only directly. Fixes: `6917ff0b5b` ("blk-mq-sched: refactor scheduler initialization") Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1627378373-148090-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-27 16:44:38 -06:00
Tetsuo Handa	3ce6e1f662	loop: reintroduce global lock for safe loop_validate_file() traversal Commit `6cc8e74308` ("loop: scale loop device by introducing per device lock") re-opened a race window for NULL pointer dereference at loop_validate_file() where commit `310ca162d7` ("block/loop: Use global lock for ioctl() operation.") has closed. Although we need to guarantee that other loop devices will not change during traversal, we can't take remote "struct loop_device"->lo_mutex inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore, introduce a global lock dedicated for loop_validate_file() which is conditionally taken before local "struct loop_device"->lo_mutex is taken. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: `6cc8e74308` ("loop: scale loop device by introducing per device lock") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-23 10:18:25 -06:00
Jens Axboe	7054133da3	nvme fixes for Linux 5.14: - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmD5vv0LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYPesQ/8DcdEwxWf7BhPhnPtAn9MIcX4ZdpMd99+88nlXQ1p 3Ysoqc/u79Wuh3Z42mKSg/csU888tzI3brjijIt6+NGPnPjiNVtFdYlIATke1mJp crrSGTXANFsVj3qIJWn5otUp2tXA1TMcPPcjvogPb+qYAOLkGqtUqLUpsZqW5NkZ 4fEj8+7Cfe1wPwCRQSTIDWDOR5R+FTR3/zmfxdmzudvxslRHD//8rlB/XPwamzEt 1dADaAeZobExJYhGB3HhQH9JbXmnDqQZHZWycYx8VqkigKcLtkcZ6ymTsqYXA65t 3g13yERWosly+qBJ1u1z9I60FSKZn3+KlilQLRW2ykWixgmgx/nPUARIY2g72zgA apsKzMIlUeQu0qmAhcO+dsUsPmRiqrEFoJh3lRXfOlQf6xAgPpy24089LRvkC8oG dx+xsIh5TVXIY8Ipa37zYGoutiQ0ebZVTC/i9ZamU27cvV6HeeI0vDublJe6QCB6 W7j9PsYiUxyCBR2Fb43ntdDlrqJRRSLpC1r5sNv31NefUIfKCKmC4d2A0Cmapg5K Tm4AeKiUhyI7kVTf3BlBVKuBFbEN7hB+O9pk6PF/fpEKmv9MFRXnxdj2BBMx45Pg q97NfSS3AUUO9K/HMGw5VKqn6HuSzXyxNkIpu8Xk3/d4Gt20GfSQL70EYt7+K5mg j0I= =8oM5 -----END PGP SIGNATURE----- Merge tag 'nvme-5.14-2021-07-22' of git://git.infradead.org/nvme into block-5.14 Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.14: - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng)" * tag 'nvme-5.14-2021-07-22' of git://git.infradead.org/nvme: nvme: set the PRACT bit when using Write Zeroes with T10 PI nvme: fix nvme_setup_command metadata trace event nvme: fix refcounting imbalance when all paths are down nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING	2021-07-22 14:23:55 -06:00
Christoph Hellwig	aaeb7bb061	nvme: set the PRACT bit when using Write Zeroes with T10 PI When using Write Zeroes on a namespace that has protection information enabled they behavior without the PRACT bit counter-intuitive and will generally lead to validation failures when reading the written blocks. Fix this by always setting the PRACT bit that generates matching PI data on the fly. Fixes: `6e02318eae` ("nvme: add support for the Write Zeroes command") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-07-21 17:24:10 +02:00
Keith Busch	234211b8dd	nvme: fix nvme_setup_command metadata trace event The metadata address is set after the trace event, so the trace is not capturing anything useful. Rather than logging the memory address, it's useful to know if the command carries a metadata payload, so change the trace event to log that true/false state instead. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-07-21 09:55:44 +02:00
Hannes Reinecke	5396fdac56	nvme: fix refcounting imbalance when all paths are down When the last path to a ns_head drops the current code removes the ns_head from the subsystem list, but will only delete the disk itself if the last reference to the ns_head drops. This is causing an refcounting imbalance eg when applications have a reference to the disk, as then they'll never get notified that the disk is in fact dead. This patch moves the call 'del_gendisk' into nvme_mpath_check_last_path(), ensuring that the disk can be properly removed and applications get the appropriate notifications. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-07-21 09:55:40 +02:00
Zhihao Cheng	7764656b10	nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING Followling process: nvme_probe nvme_reset_ctrl nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) queue_work(nvme_reset_wq, &ctrl->reset_work) --------------> nvme_remove nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING) worker_thread process_one_work nvme_reset_work WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING) , which will trigger WARN_ON in nvme_reset_work(): [ 127.534298] WARNING: CPU: 0 PID: 139 at drivers/nvme/host/pci.c:2594 [ 127.536161] CPU: 0 PID: 139 Comm: kworker/u8:7 Not tainted 5.13.0 [ 127.552518] Call Trace: [ 127.552840] ? kvm_sched_clock_read+0x25/0x40 [ 127.553936] ? native_send_call_func_single_ipi+0x1c/0x30 [ 127.555117] ? send_call_function_single_ipi+0x9b/0x130 [ 127.556263] ? __smp_call_single_queue+0x48/0x60 [ 127.557278] ? ttwu_queue_wakelist+0xfa/0x1c0 [ 127.558231] ? try_to_wake_up+0x265/0x9d0 [ 127.559120] ? ext4_end_io_rsv_work+0x160/0x290 [ 127.560118] process_one_work+0x28c/0x640 [ 127.561002] worker_thread+0x39a/0x700 [ 127.561833] ? rescuer_thread+0x580/0x580 [ 127.562714] kthread+0x18c/0x1e0 [ 127.563444] ? set_kthread_struct+0x70/0x70 [ 127.564347] ret_from_fork+0x1f/0x30 The preceding problem can be easily reproduced by executing following script (based on blktests suite): test() { pdev="$(_get_pci_dev_from_blkdev)" sysfs="/sys/bus/pci/devices/${pdev}" for ((i = 0; i < 10; i++)); do echo 1 > "$sysfs/remove" echo 1 > /sys/bus/pci/rescan done } Since the device ctrl could be updated as an non-RESETTING state by repeating probe/remove in userspace (which is a normal situation), we can replace stack dumping WARN_ON with a warnning message. Fixes: `82b057caef` ("nvme-pci: fix multiple ctrl removal schedulin") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>	2021-07-21 09:55:40 +02:00
Oleksandr Natalenko	ec645dc966	block: increase BLKCG_MAX_POLS After mq-deadline learned to deal with cgroups, the BLKCG_MAX_POLS value became too small for all the elevators to be registered properly. The following issue is seen: ``` calling bfq_init+0x0/0x8b @ 1 blkcg_policy_register: BLKCG_MAX_POLS too small initcall bfq_init+0x0/0x8b returned -28 after 507 usecs ``` which renders BFQ non-functional. Increase BLKCG_MAX_POLS to allow enough space for everyone. Fixes: `08a9ad8bf6` ("block/mq-deadline: Add cgroup support") Link: https://lore.kernel.org/lkml/8988303.mDXGIdCtx8@natalenko.name/ Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name> Link: https://lore.kernel.org/r/20210717123328.945810-1-oleksandr@natalenko.name Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-17 13:07:24 -06:00
Christoph Hellwig	05d69d950d	xen-blkfront: sanitize the removal state machine xen-blkfront has a weird protocol where close message from the remote side can be delayed, and where hot removals are treated somewhat differently from regular removals, all leading to potential NULL pointer removals, and a del_gendisk from the block device release method, which will deadlock. Fix this by just performing normal hot removals even when the device is opened like all other Linux block drivers. Fixes: `c76f48eb5c` ("block: take bd_mutex around delete_partitions in del_gendisk") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20210715141711.1257293-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-15 09:32:34 -06:00
Jens Axboe	a347c153b1	nvme fixes for Linux 5.14 - fix various races in nvme-pci when shutting down just after probing (Casey Chen) - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmDwQcQLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYO1Rw/+MCBsOWNPnlbX3IN4wnk10ySXVb1YKkNibn7riBN5 1hhi8hXHBA5h+H/HRy00FAYvi3OjRyUBVoRN7aEQEc9t0q3WWznCTv81kdfG2vX3 ZkAmKGW/J0+Qc4+g7ul2hUcjoYSpMKNtRV/fpmn6TudDW7myXTwMReu/FFzS/XBj tP/F+k9DO9/3GCSxZJEZoPyWmo1hQsNTH03D/d/CyMTEffObMkRfcFEJgkOpdLFZ dFzfE33DbM36t8bU6wAwpXcg/y7X9mdzz2LZONYirIrE9OV+gi31uy4m9WHoXPgY n7WNQymJCFXOEIT0u9yyju+OeJ0DILKKztgG2K369y6svtwPLD2z9Jsg55fPSXPr YCA1oFyfqx/+9CezuiSD3WylJBSwSWAn/en7mlqmrIeCtX49VAx2OaEgsiy04Qll 2i8MXZG2Lc3EzTgr+FemA6OC0ESZj8t5v0NoAlP+5hsE+Y1fpu9g0CLXhNOF6M60 fZ/NvqwV3NQtjdTsMUC+rAEDBasTyTqhhfHdJ9NShm1rssUhqQxzDEF4TghaPG7f NkpmNLUQu3EHr99SJzk269pCZ6dgk3AftJc06kzvwLbBBDfVXTCnhCrQnf9wEgPM 7RPiLjwMLeCk9CLbg0cDtTnEt92SidfIe2mXL/i/0DP8Au5A7/DQhDJ60bWUt5V9 J/Y= =PZvx -----END PGP SIGNATURE----- Merge tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme into block-5.14 Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.14 - fix various races in nvme-pci when shutting down just after probing (Casey Chen) - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha)" * tag 'nvme-5.14-2021-07-15' of git://git.infradead.org/nvme: nvme-pci: do not call nvme_dev_remove_admin from nvme_remove nvme-pci: fix multiple races in nvme_setup_io_queues nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE	2021-07-15 09:31:36 -06:00
Wang Qing	16ad3db3b2	nbd: fix order of cleaning up the queue and freeing the tagset We must release the queue before freeing the tagset. Fixes: `4af5f2e030` ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-and-tested-by: syzbot+9ca43ff47167c0ee3466@syzkaller.appspotmail.com Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706040016.1360412-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-15 09:30:15 -06:00
Guoqing Jiang	58b63e0f55	pd: fix order of cleaning up the queue and freeing the tagset We must release the queue before freeing the tagset. Fixes: `262d431f90` ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706010734.1356066-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-15 09:29:22 -06:00
Casey Chen	251ef6f71b	nvme-pci: do not call nvme_dev_remove_admin from nvme_remove nvme_dev_remove_admin could free dev->admin_q and the admin_tagset while they are being accessed by nvme_dev_disable(), which can be called by nvme_reset_work via nvme_remove_dead_ctrl. Commit `cb4bfda62a` ("nvme-pci: fix hot removal during error handling") intended to avoid requests being stuck on a removed controller by killing the admin queue. But the later fix `c8e9e9b764` ("nvme-pci: unquiesce admin queue on shutdown"), together with nvme_dev_disable(dev, true) right before nvme_dev_remove_admin() could help dispatch requests and fail them early, so we don't need nvme_dev_remove_admin() any more. Fixes: `cb4bfda62a` ("nvme-pci: fix hot removal during error handling") Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-07-13 12:03:20 +02:00
Casey Chen	e4b9852a0f	nvme-pci: fix multiple races in nvme_setup_io_queues Below two paths could overlap each other if we power off a drive quickly after powering it on. There are multiple races in nvme_setup_io_queues() because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit. nvme_reset_work() nvme_remove() nvme_setup_io_queues() nvme_dev_disable() ... ... A1 clear NVMEQ_ENABLED bit for admin queue lock retry: B1 nvme_suspend_io_queues() A2 pci_free_irq() admin queue B2 nvme_suspend_queue() admin queue A3 pci_free_irq_vectors() nvme_pci_disable() A4 nvme_setup_irqs(); B3 pci_free_irq_vectors() ... unlock A5 queue_request_irq() for admin queue set NVMEQ_ENABLED bit ... nvme_create_io_queues() A6 result = queue_request_irq(); set NVMEQ_ENABLED bit ... fail to allocate enough IO queues: A7 nvme_suspend_io_queues() goto retry If B3 runs in between A1 and A2, it will crash if irqaction haven't been freed by A2. B2 is supposed to free admin queue IRQ but it simply can't fulfill the job as A1 has cleared NVMEQ_ENABLED bit. Fix: combine A1 A2 so IRQ get freed as soon as the NVMEQ_ENABLED bit gets cleared. After solved #1, A2 could race with B3 if A2 is freeing IRQ while B3 is checking irqaction. A3 also could race with B2 if B2 is freeing IRQ while A3 is checking irqaction. Fix: A2 and A3 take lock for mutual exclusion. A3 could race with B3 since they could run free_msi_irqs() in parallel. Fix: A3 takes lock for mutual exclusion. A4 could fail to allocate all needed IRQ vectors if A3 and A4 are interrupted by B3. Fix: A4 takes lock for mutual exclusion. If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL. They are just allocated by A5/A6. Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit. A7 could get chance to pci_free_irq() for certain IO queue while B3 is checking irqaction. Fix: A7 takes lock. nvme_dev->online_queues need to be protected by shutdown_lock. Since it is not atomic, both paths could modify it using its own copy. Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com> Signed-off-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-07-13 11:40:42 +02:00
Prabhakar Kushwaha	8b43ced64d	nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE dev_get_by_name() finds network device by name but it also increases the reference count. If a nvme-tcp queue is present and the network device driver is removed before nvme_tcp, we will face the following continuous log: "kernel:unregister_netdevice: waiting for <eth> to become free. Usage count = 2" And rmmod further halts. Similar case arises during reboot/shutdown with nvme-tcp queue present and both never completes. To fix this, use __dev_get_by_name() which finds network device by name without increasing any reference counter. Fixes: `3ede8f72a9` ("nvme-tcp: allow selecting the network interface for connections") Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: remove the ->ndev member entirely] Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-07-13 11:34:24 +02:00
Yu Kuai	a731763fc4	blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs We run a test that create millions of cgroups and blkgs, and then trigger blkg_destroy_all(). blkg_destroy_all() will hold spin lock for a long time in such situation. Thus release the lock when a batch of blkgs are destroyed. blkcg_activate_policy() and blkcg_deactivate_policy() might have the same problem, however, as they are basically only called from module init/exit paths, let's leave them alone for now. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210707015649.1929797-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-07 09:36:36 -06:00
Chunguang Xu	d80c228d44	block: fix the problem of io_ticks becoming smaller On the IO submission path, blk_account_io_start() may interrupt the system interruption. When the interruption returns, the value of part->stamp may have been updated by other cores, so the time value collected before the interruption may be less than part-> stamp. So when this happens, we should do nothing to make io_ticks more accurate? For kernels less than 5.0, this may cause io_ticks to become smaller, which in turn may cause abnormal ioutil values. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-07 06:43:20 -06:00
Jens Axboe	c6af8db92b	Merge branch 'nvme-5.14' of git://git.infradead.org/nvme into block-5.14 Pull single NVMe fix from Christoph. * 'nvme-5.14' of git://git.infradead.org/nvme: nvme-tcp: can't set sk_user_data without write_lock	2021-07-07 06:37:36 -06:00
Maurizio Lombardi	0755d3be2d	nvme-tcp: can't set sk_user_data without write_lock The sk_user_data pointer is supposed to be modified only while holding the write_lock "sk_callback_lock", otherwise we could race with other threads and crash the kernel. we can't take the write_lock in nvmet_tcp_state_change() because it would cause a deadlock, but the release_work queue will set the pointer to NULL later so we can simply remove the assignment. Fixes: `b5332a9f3f` ("nvmet-tcp: fix incorrect locking in state_change sk callback") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-07-05 10:14:44 +02:00
Tetsuo Handa	585af8ede7	loop: remove unused variable in loop_set_status() Commit `0384264ea8` ("block: pass a gendisk to bdev_disk_changed") changed to pass lo->lo_disk instead of lo->lo_device. Fixes: `0384264ea8` ("block: pass a gendisk to bdev_disk_changed") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20210702152714.7978-1-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-02 09:28:42 -06:00
Christoph Hellwig	63c38d858e	block: remove the bdgrab in blk_drop_partitions There is no need to hold a bdev reference when removing the partition. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210701081638.246552-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-01 10:21:24 -06:00
Christoph Hellwig	498dcc13fd	block: grab a device refcount in disk_uevent Sending uevents requires the struct device to be alive. To ensure that grab the device refcount instead of just an inode reference. Fixes: `bc359d03c7` ("block: add a disk_uevent helper") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210701081638.246552-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-01 10:21:24 -06:00
Kees Cook	2b7a8dc06d	s390/dasd: Avoid field over-reading memcpy() In preparation for FORTIFY_SOURCE performing compile-time and run-time field array bounds checking for memcpy(), memmove(), and memset(), avoid intentionally reading across neighboring array fields. Add a wrapping structure to serve as the memcpy() source, so the compiler can do appropriate bounds checking, avoiding this future warning: In function '__fortify_memcpy', inlined from 'create_uid' at drivers/s390/block/dasd_eckd.c:749:2: ./include/linux/fortify-string.h:246:4: error: call to '__read_overflow2_field' declared with attribute error: detected read beyond size of field (2nd parameter) Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20210701142221.3408680-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-01 09:27:04 -06:00
Christoph Hellwig	299f2b5fc0	dasd: unexport dasd_set_target_state dasd_set_target_state is only used inside of dasd_mod.ko, so don't export it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20210701142221.3408680-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-07-01 09:27:04 -06:00
Yufen Yu	b5cfbd35ec	block: check disk exist before trying to add partition If disk have been deleted, we should return fail for ioctl BLKPG_DEL_PARTITION. Otherwise, the directory /sys/class/block may remain invalid symlinks file. The race as following: blkdev_open del_gendisk disk->flags &= ~GENHD_FL_UP; blk_drop_partitions blkpg_ioctl bdev_add_partition add_partition device_add device_add_class_symlinks ioctl may add_partition after del_gendisk() have tried to delete partitions. Then, symlinks file will be created. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Link: https://lore.kernel.org/r/20210610023241.3646241-1-yuyufen@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 19:38:48 -06:00
Christoph Hellwig	efee99e68e	ubd: remove dead code in ubd_setup_common Remove some leftovers of the fake major number parsing that cause complains from some compilers. Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210628093937.1325608-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:35:45 -06:00
Keith Busch	ae5e6886b4	nvme: use return value from blk_execute_rq() We don't have an nvme status to report if the driver's .queue_rq() returns an error without dispatching the requested nvme command. Check the return value from blk_execute_rq() for all passthrough commands so the caller may know their command was not successful. If the command is from the target passthrough interface and fails to dispatch, synthesize the response back to the host as a internal target error. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-5-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:35:45 -06:00
Keith Busch	fb9b16e15c	block: return errors from blk_execute_rq() The synchronous blk_execute_rq() had not provided a way for its callers to know if its request was successful or not. Return the blk_status_t result of the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:35:45 -06:00
Keith Busch	be42a33b92	nvme: use blk_execute_rq() for passthrough commands The generic blk_execute_rq() knows how to handle polled completions. Use that instead of implementing an nvme specific handler. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:35:38 -06:00
Keith Busch	c01b5a814e	block: support polling through blk_execute_rq Poll for completions if the request's hctx is a polling type. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-2-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:21 -06:00
Christoph Hellwig	da6269da4c	block: remove REQ_OP_SCSI_{IN,OUT} With the legacy IDE driver gone drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:19 -06:00
Christoph Hellwig	5ec780a6ed	block: mark blk_mq_init_queue_data static All driver uses are gone now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	8e60947d2f	loop: rewrite loop_exit using idr_for_each_entry Use idr_for_each_entry to simplify removing all devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	b984808146	loop: split loop_lookup loop_lookup has two callers - one wants to do the a find by index and the other wants any unbound loop device. Open code the respective functionality in each caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	e5d66a1032	loop: don't allow deleting an unspecified loop device Passing a negative index to loop_lookup while return any unbound device. Doing that for a delete does not make much sense, so add check to explicitly reject that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	18d1f200b3	loop: move loop_ctl_mutex locking into loop_add Move acquiring and releasing loop_ctl_mutex from the callers into loop_add. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	f9d107644a	loop: split loop_control_ioctl Split loop_control_ioctl into a helper for each command. This keeps the code nicely separated for the upcoming locking changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	4157fe0b3d	loop: don't call loop_lookup before adding a loop device loop_add returns the right error if the slot wasn't available. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	d6da83d072	loop: remove the l argument to loop_add None of the callers cares about the allocated struct loop_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	bd5c39edad	loop: reduce loop_ctl_mutex coverage in loop_exit loop_ctl_mutex is only needed to iterate the IDR for removing the loop devices, so reduce the coverage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	8b52d8be86	loop: reorder loop_exit Unregister the misc and blockdevice first to prevent further access, and only then iterate to remove the devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:13 -06:00
Christoph Hellwig	1033d103a9	mmc: initialized disk->minors Fix a let hunk from the blk_mq_alloc_disk conversion. Fixes: 281ea6a5bfdc ("mmc: switch to blk_mq_alloc_disk") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210621080144.3655131-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00
Christoph Hellwig	607d968a57	mmc: switch to blk_mq_alloc_disk Use the blk_mq_alloc_disk to allocate the request_queue and gendisk together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210616053934.880951-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00
Christoph Hellwig	249cda3325	mmc: remove an extra blk_{get,put}_queue pair The gendisk already acquires a reference to the queue when add_disk is called, which dropped on put_disk. So remove the superflous extra refcounting. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210616053934.880951-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00
Prasanna Kumar Kalever	6497ef8df5	nbd: provide a way for userspace processes to identify device backends Problem: On reconfigure of device, there is no way to defend if the backend storage is matching with the initial backend storage. Say, if an initial connect request for backend "pool1/image1" got mapped to /dev/nbd0 and the userspace process is terminated. A next reconfigure request within NBD_ATTR_DEAD_CONN_TIMEOUT is allowed to use /dev/nbd0 for a different backend "pool1/image2" For example, an operation like below could be dangerous: $ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image /dev/nbd0 $ sudo blkid /dev/nbd0 /dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4" $ sudo pkill -9 rbd-nbd $ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image /dev/nbd0 $ sudo blkid /dev/nbd0 /dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs" Solution: Provide a way for userspace processes to keep some metadata to identify between the device and the backend, so that when a reconfigure request is made, we can compare and avoid such dangerous operations. With this solution, as part of the initial connect request, backend path can be stored in the sysfs per device config, so that on a reconfigure request it's easy to check if the backend path matches with the initial connect backend path. Please note, ioctl interface to nbd will not have these changes, as there won't be any reconfigure. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210429102828.31248-1-prasanna.kalever@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00
Christoph Hellwig	35efb594c3	ubd: use blk_mq_alloc_disk and blk_cleanup_disk Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060759.3965724-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00
Christoph Hellwig	7eb90f7e90	ubd: remove the code to register as the legacy IDE driver With the legacy IDE driver long deprecated, and modern userspace being much more flexible about dev_t assignments there is no reason to fake a registration as the legacy IDE driver in ubd. This registeration is a little problematic as it registers the same request_queue for multiple gendisks, so just remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210614060759.3965724-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00
Christoph Hellwig	2f43dbf3a7	null_blk: remove an unused variable assignment in null_add_dev Fix up the recent blk_alloc_disk conversion. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060231.3965278-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-30 15:34:04 -06:00

1 2 3 4 5 ...

1018568 Коммитов Все ветки Поиск

1018568 Коммитов

Все ветки