WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Tom Rix	892c7a77f6	dm dust: remove h from printk format specifier See Documentation/core-api/printk-formats.rst. commit `cbacb5ab0a` ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]") Standard integer promotion is already done and %hx and %hhx is useless so do not encourage the use of %hh[xudi] or %h[xudi]. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-02-03 10:10:04 -05:00
Christoph Hellwig	a42e0d70c5	md: use rdev_read_only in restart_array Make the read-only check in restart_array identical to the other two read-only checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-02-01 09:35:20 -07:00
Christoph Hellwig	d7a4783883	md: check for NULL ->meta_bdev before calling bdev_read_only ->meta_bdev is optional and not set for most arrays. Add a rdev_read_only helper that calls bdev_read_only for both devices in a safe way. Fixes: `6f0d9689b6` ("block: remove the NULL bdev check in bdev_read_only") Reported-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-02-01 09:35:20 -07:00
Linus Torvalds	2ba1c4d1a4	block-5.11-2021-01-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAUXQsQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgppO4EAClcqoneAuhT4UvRVNxblXPhPaoC69aNgXd s+34uQSCqeWrWIAokfKp8bh3kyRqe00591auA7DwtwNqGpWuIECX8o9QvROEkuxv 0o4JFGMTHOJKP1W79Oy3RpF5oee6rMMOQN7EFL272p2xd8NRCP33c4fKvJRz+DDE 0kCcZhVjca0nZ+9OJC+WAlV+dit3azCAKSp7cItJsdOgZL74ZcGECm0pA8RpStyi tQrUr2yiHLkm1lcOYfid0fG2/5a4vAGZQav+EshOWYw9UGeMquq/aqPuZZtEUjKe oEECACfJ9cWErsi1CirIk5j5RKHOHmFSG3kRAmyvFB4f3YDGYxerI7eodWjNA0d5 38wW96sWuV4l0ShPmD3jGWIDTTcDZh4nEImCObf5YJFbr2fQXofWVWseIyo0zG8Y zDa1N/M7XgkrScX8OF33NC1uv/oExhHA7jXuQN6mRBESYjcCrH2Lf6mXAA2C8u4T z1RaG7ckRXGSbV3ol1ROrHj0RTXQ3zeIHj3yMRU8TKH0z6s+ob46D2PZCLi6cLvI IuELhzKsS1EzMSVsYk9/AegynWFjVCRJoVUVxTsrxfGEF7attwmur3lOAjbZwSWb jXlRbrkgBL1Pwbjg8AODEoq0jJgVM/S/3fG2rpcYLwwYC+FQ73/K+URmEuMsqkFC GrYllTSMFg== =hb7W -----END PGP SIGNATURE----- Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "All over the place fixes for this release: - blk-cgroup iteration teardown resched fix (Baolin) - NVMe pull request from Christoph: - add another Write Zeroes quirk (Chaitanya Kulkarni) - handle a no path available corner case (Daniel Wagner) - use the proper RCU aware list_add helper (Chao Leng) - bcache regression fix (Coly) - bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12, but for now, we'll make it IRQ safe (Damien) - null_blk zoned init fix (Damien) - add_partition() error handling fix (Dinghao) - s390 dasd kobject fix (Jan) - nbd fix for freezing queue while adding connections (Josef) - tag queueing regression fix (Ming) - revert of a patch that inadvertently meant that we regressed write performance on raid (Maxim)" * tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block: null_blk: cleanup zoned mode initialization nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head nvme-multipath: Early exit if no path is available nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES block: fix bd_size_lock use blk-cgroup: Use cond_resched() when destroy blkgs Revert "block: simplify set_init_blocksize" to regain lost performance nbd: freeze the queue while we're adding connections s390/dasd: Fix inconsistent kobject removal block: Fix an error handling in add_partition blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue	2021-01-29 13:50:06 -08:00
Coly Li	0df28cad06	bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES For super block version < BCACHE_SB_VERSION_CDEV_WITH_FEATURES, it doesn't make sense to check the feature sets. This patch checks super block version in bch_has_feature_* routines, if the version doesn't have feature sets yet, returns 0 (false) to the caller. Fixes: `5342fd4255` ("bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET") Fixes: `ffa4703275` ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Cc: stable@vger.kernel.org # 5.9+ Reported-and-tested-by: Bockholdt Arne <a.bockholdt@precitec-optronik.de> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-28 07:35:07 -07:00
Christoph Hellwig	e82ed3a4fb	md/raid6: refactor raid5_read_one_chunk Refactor raid5_read_one_chunk so that all simple checks are done before allocating the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Christoph Hellwig	6a59656968	md: remove md_bio_alloc_sync md_bio_alloc_sync is never called with a NULL mddev, and ->sync_set is initialized in md_run, so it always must be initialized as well. Just open code the remaining call to bio_alloc_bioset. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Christoph Hellwig	32637385b8	md: simplify sync_page_io Use an on-stack bio and biovec for the single page synchronous I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Christoph Hellwig	a78f18da66	md: remove bio_alloc_mddev bio_alloc_mddev is never called with a NULL mddev, and ->bio_set is initialized in md_run, so it always must be initialized as well. Just open code the remaining call to bio_alloc_bioset. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Christoph Hellwig	a587daa064	dm-clone: use blkdev_issue_flush in commit_metadata Use blkdev_issue_flush instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Christoph Hellwig	c6bf3f0e25	block: use an on-stack bio in blkdev_issue_flush There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Christoph Hellwig	f65b95fe0c	bcache: use bio_set_dev to assign ->bi_bdev Always use the bio_set_dev helper to assign ->bi_bdev to make sure other state related to the device is uptodate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-26 08:50:01 -07:00
Ming Lei	faa8e2c4fb	bcache: don't pass BIOSET_NEED_BVECS for the 'bio_set' embedded in 'cache_set' This bioset is just for allocating bio only from bio_next_split, and it needn't bvecs, so remove the flag. Cc: linux-bcache@vger.kernel.org Cc: Coly Li <colyli@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-24 21:24:09 -07:00
Christoph Hellwig	99dfc43ecb	block: use ->bi_bdev for bio based I/O accounting Rework the I/O accounting for bio based drivers to use ->bi_bdev. This means all drivers can now simply use bio_start_io_acct to start accounting, and it will take partitions into account automatically. To end I/O account either bio_end_io_acct can be used if the driver never remaps I/O to a different device, or bio_end_io_acct_remapped if the driver did remap the I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-24 18:17:20 -07:00
Christoph Hellwig	309dca309f	block: store a block_device pointer in struct bio Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-24 18:17:20 -07:00
Christoph Hellwig	1e0dcca9e1	dm: use bdev_read_only to check if a device is read-only dm-thin and dm-cache also work on partitions, so use the proper interface to check if the device is read-only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-24 18:15:57 -07:00
Linus Torvalds	a692a610d7	block-5.11-2021-01-24 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmANwMUQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpr1PEACPVkVvLlxH2YfnBNykDn+tV+goOs37oJkB 0XR+flcZAHHycLvWnix4AVca4ebTU4diwrFMM4gB01mzzoPYhicHefNuhPV4abYO qq6SCcpZkBfbjYqfB+Fmo0823UXaGUqX/oaxbYwePjkMbFjW6kQEThEHGH07CmWA s5VfOSn695hXLUBpKwsj5m88NohP4tSZMm+VE2RvycdVt2uzJuga1aDDAFPfZFRA YHQyhIEUClYl3eC3Yo5E32nBezrRCJtumRZKmQHMCBXGQ2Z9OfgalD8wz4zBb/D0 ypzr68M27coQIQ9qNBruHuOfnjvwy4jwB0Eci7bGHfKUiVwUDiLD8TVqnQzcwxR6 VNm4RbCEazsfZ33ztk7iiKHijesJ5wHlaDNlL1xBxpNajqOVv3T65kmaETarhUAQ h/EgHUFzYrzy9Y9ZpuClAE3LSk9gV3EzFuWmgSvwSY99TPNCpkG6raXTdmRzj6fV ZFz8L7AlXnbBjREfK8x4lB3W1T16zAQpiPrGYIfQsSIAeBbbyxGIo/H5wdCZwQBj 0k5+UoxWT8oYK2C6xdK7elDN6jT4BaC4p5IZBhlVXCIh+VUD5+Ol6/2pFRr6qRWt giXITDzuVMECTtj8gNhf/P209kG9eSglYfXFRr334GFirSMfN8a22BB3tabp+dDF GzCHw+zZuA== =Bblh -----END PGP SIGNATURE----- Merge tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - fix a status code in nvmet (Chaitanya Kulkarni) - avoid double completions in nvme-rdma/nvme-tcp (Chao Leng) - fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen) - fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar) - fix a double DMA unmap in nvme-pci - lightnvm error path leak fix (Pan) - MD pull request from Song: - Flush request fix (Xiao) * tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block: lightnvm: fix memory leak when submit fails nvme-pci: fix error unwind in nvme_map_data nvme-pci: refactor nvme_unmap_data md: Set prev_flush_start and flush_bio in an atomic way nvmet: set right status on error in id-ns handler nvme-pci: allow use of cmb on v1.4 controllers nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout nvme: check the PRINFO bit before deciding the host buffer length	2021-01-24 12:24:35 -08:00
Hannes Reinecke	809b1e4945	dm: avoid filesystem lookup in dm_get_dev_t() This reverts commit `644bda6f34` ("dm table: fall back to getting device using name_to_dev_t()") dm_get_dev_t() is just used to convert an arbitrary 'path' string into a dev_t. It doesn't presume that the device is present; that check will be done later, as the only caller is dm_get_device(), which does a dm_get_table_device() later on, which will properly open the device. So if the path string already _is_ in major:minor representation we can convert it directly, avoiding a recursion into the filesystem to lookup the block device. This avoids a hang in multipath_message() when the filesystem is inaccessible. Fixes: `644bda6f34` ("dm table: fall back to getting device using name_to_dev_t()") Cc: stable@vger.kernel.org Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-21 15:06:45 -05:00
Ignat Korchagin	004b8ae9e2	dm crypt: fix copy and paste bug in crypt_alloc_req_aead In commit `d68b29584c` ("dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq") code was incorrectly copy and pasted from crypt_alloc_req_skcipher()'s crypto request allocation code to crypt_alloc_req_aead(). It is OK from runtime perspective as both simple encryption request pointer and AEAD request pointer are part of a union, but may confuse code reviewers. Fixes: `d68b29584c` ("dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq") Cc: stable@vger.kernel.org # v5.9+ Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-21 15:06:44 -05:00
Mikulas Patocka	5c02406428	dm integrity: conditionally disable "recalculate" feature Otherwise a malicious user could (ab)use the "recalculate" feature that makes dm-integrity calculate the checksums in the background while the device is already usable. When the system restarts before all checksums have been calculated, the calculation continues where it was interrupted even if the recalculate feature is not requested the next time the dm device is set up. Disable recalculating if we use internal_hash or journal_hash with a key (e.g. HMAC) and we don't have the "legacy_recalculate" flag. This may break activation of a volume, created by an older kernel, that is not yet fully recalculated -- if this happens, the user should add the "legacy_recalculate" flag to constructor parameters. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Daniel Glockner <dg@emlix.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-21 15:05:21 -05:00
Mikulas Patocka	2d06dfecb1	dm integrity: fix a crash if "recalculate" used without "internal_hash" Recalculate can only be specified with internal_hash. Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-21 14:41:43 -05:00
Xiao Ni	dc5d17a3c3	md: Set prev_flush_start and flush_bio in an atomic way One customer reports a crash problem which causes by flush request. It triggers a warning before crash. /* new request after previous flush is completed */ if (ktime_after(req_start, mddev->prev_flush_start)) { WARN_ON(mddev->flush_bio); mddev->flush_bio = bio; bio = NULL; } The WARN_ON is triggered. We use spin lock to protect prev_flush_start and flush_bio in md_flush_request. But there is no lock protection in md_submit_flush_data. It can set flush_bio to NULL first because of compiler reordering write instructions. For example, flush bio1 sets flush bio to NULL first in md_submit_flush_data. An interrupt or vmware causing an extended stall happen between updating flush_bio and prev_flush_start. Because flush_bio is NULL, flush bio2 can get the lock and submit to underlayer disks. Then flush bio1 updates prev_flush_start after the interrupt or extended stall. Then flush bio3 enters in md_flush_request. The start time req_start is behind prev_flush_start. The flush_bio is not NULL(flush bio2 hasn't finished). So it can trigger the WARN_ON now. Then it calls INIT_WORK again. INIT_WORK() will re-initialize the list pointers in the work_struct, which then can result in a corrupted work list and the work_struct queued a second time. With the work list corrupted, it can lead in invalid work items being used and cause a crash in process_one_work. We need to make sure only one flush bio can be handled at one same time. So add spin lock in md_submit_flush_data to protect prev_flush_start and flush_bio in an atomic way. Reviewed-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2021-01-20 08:18:10 -08:00
Linus Torvalds	1d94330a43	- Fix DM-raid's raid1 discard limits so discards work. - Select missing Kconfig dependencies for DM integrity and zoned targets. - 4 fixes for DM crypt target's support to optionally bypass kcryptd workqueues. - Fix DM snapshot merge supports missing data flushes before committing metadata. - Fix DM integrity data device flushing when external metadata is used. - Fix DM integrity's maximum number of supported constructor arguments that user can request when creating an integrity device. - Eliminate DM core ioctl logging noise when an ioctl is issued without required CAP_SYS_RAWIO permission. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmACJwsTHHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWr93B/sFpKbCjRxZhGdOz8u2UIV9jGoSzAaZ 3qdSTruCp1HMTsz0J/+bFBTMQKKap+8u8kzhCOLAqtZgEz13P0D3NyDUGiCxj07O vPZJQ3COrClknq5byKws/aOKcnbr25odk6Zq+gFmk47pdNFBMkzNMf0IbdVJsj19 d0tnTRkIiycC8hnXOr/tHufnmERrw9nZpq3cLEHT6vaXc4FxcYXbo4UEMIwKjgcB AnsRrqpqKyayG+G/SEs01JM9eYznWRPqf3L03wTWfieAsFjDoIp/ek8iXWjEwLtZ BnhxFwodWvycFb9ITaxlFgfyiBSgU1f3tsIYjU4MHCSFI6u5b83y57Dl =FnV7 -----END PGP SIGNATURE----- Merge tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM-raid's raid1 discard limits so discards work. - Select missing Kconfig dependencies for DM integrity and zoned targets. - Four fixes for DM crypt target's support to optionally bypass kcryptd workqueues. - Fix DM snapshot merge supports missing data flushes before committing metadata. - Fix DM integrity data device flushing when external metadata is used. - Fix DM integrity's maximum number of supported constructor arguments that user can request when creating an integrity device. - Eliminate DM core ioctl logging noise when an ioctl is issued without required CAP_SYS_RAWIO permission. * tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm crypt: defer decryption to a tasklet if interrupts disabled dm integrity: fix the maximum number of arguments dm crypt: do not call bio_endio() from the dm-crypt tasklet dm integrity: fix flush with external metadata device dm: eliminate potential source of excessive kernel log noise dm snapshot: flush merged data before committing metadata dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq dm crypt: do not wait for backlogged crypto request completion in softirq dm zoned: select CONFIG_CRC32 dm integrity: select CRYPTO_SKCIPHER dm raid: fix discard limits for raid1	2021-01-15 18:01:17 -08:00
Ignat Korchagin	c87a95dc28	dm crypt: defer decryption to a tasklet if interrupts disabled On some specific hardware on early boot we occasionally get: [ 1193.920255][ T0] BUG: sleeping function called from invalid context at mm/mempool.c:381 [ 1193.936616][ T0] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/69 [ 1193.953233][ T0] no locks held by swapper/69/0. [ 1193.965871][ T0] irq event stamp: 575062 [ 1193.977724][ T0] hardirqs last enabled at (575061): [<ffffffffab73f662>] tick_nohz_idle_exit+0xe2/0x3e0 [ 1194.002762][ T0] hardirqs last disabled at (575062): [<ffffffffab74e8af>] flush_smp_call_function_from_idle+0x4f/0x80 [ 1194.029035][ T0] softirqs last enabled at (575050): [<ffffffffad600fd2>] asm_call_irq_on_stack+0x12/0x20 [ 1194.054227][ T0] softirqs last disabled at (575043): [<ffffffffad600fd2>] asm_call_irq_on_stack+0x12/0x20 [ 1194.079389][ T0] CPU: 69 PID: 0 Comm: swapper/69 Not tainted 5.10.6-cloudflare-kasan-2021.1.4-dev #1 [ 1194.104103][ T0] Hardware name: NULL R162-Z12-CD/MZ12-HD4-CD, BIOS R10 06/04/2020 [ 1194.119591][ T0] Call Trace: [ 1194.130233][ T0] dump_stack+0x9a/0xcc [ 1194.141617][ T0] ___might_sleep.cold+0x180/0x1b0 [ 1194.153825][ T0] mempool_alloc+0x16b/0x300 [ 1194.165313][ T0] ? remove_element+0x160/0x160 [ 1194.176961][ T0] ? blk_mq_end_request+0x4b/0x490 [ 1194.188778][ T0] crypt_convert+0x27f6/0x45f0 [dm_crypt] [ 1194.201024][ T0] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1194.212906][ T0] ? module_assert_mutex_or_preempt+0x3e/0x70 [ 1194.225318][ T0] ? __module_address.part.0+0x1b/0x3a0 [ 1194.237212][ T0] ? is_kernel_percpu_address+0x5b/0x190 [ 1194.249238][ T0] ? crypt_iv_tcw_ctr+0x4a0/0x4a0 [dm_crypt] [ 1194.261593][ T0] ? is_module_address+0x25/0x40 [ 1194.272905][ T0] ? static_obj+0x8a/0xc0 [ 1194.283582][ T0] ? lockdep_init_map_waits+0x26a/0x700 [ 1194.295570][ T0] ? __raw_spin_lock_init+0x39/0x110 [ 1194.307330][ T0] kcryptd_crypt_read_convert+0x31c/0x560 [dm_crypt] [ 1194.320496][ T0] ? kcryptd_queue_crypt+0x1be/0x380 [dm_crypt] [ 1194.333203][ T0] blk_update_request+0x6d7/0x1500 [ 1194.344841][ T0] ? blk_mq_trigger_softirq+0x190/0x190 [ 1194.356831][ T0] blk_mq_end_request+0x4b/0x490 [ 1194.367994][ T0] ? blk_mq_trigger_softirq+0x190/0x190 [ 1194.379693][ T0] flush_smp_call_function_queue+0x24b/0x560 [ 1194.391847][ T0] flush_smp_call_function_from_idle+0x59/0x80 [ 1194.403969][ T0] do_idle+0x287/0x450 [ 1194.413891][ T0] ? arch_cpu_idle_exit+0x40/0x40 [ 1194.424716][ T0] ? lockdep_hardirqs_on_prepare+0x286/0x3f0 [ 1194.436399][ T0] ? _raw_spin_unlock_irqrestore+0x39/0x40 [ 1194.447759][ T0] cpu_startup_entry+0x19/0x20 [ 1194.458038][ T0] secondary_startup_64_no_verify+0xb0/0xbb IO completion can be queued to a different CPU by the block subsystem as a "call single function/data". The CPU may run these routines from the idle task, but it does so with interrupts disabled. It is not a good idea to do decryption with irqs disabled even in an idle task context, so just defer it to a tasklet (as is done with requests from hard irqs). Fixes: `39d42fa96b` ("dm crypt: add flags to optionally bypass kcryptd workqueues") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-14 09:54:37 -05:00
Mikulas Patocka	17ffc193cd	dm integrity: fix the maximum number of arguments Advance the maximum number of arguments from 9 to 15 to account for all potential feature flags that may be supplied. Linux 4.19 added "meta_device" (`356d9d52e1`) and "recalculate" (`a3fcf72531`) flags. Commit `468dfca38b` added "sectors_per_bit" and "bitmap_flush_interval". Commit `84597a44a9` added "allow_discards". And the commit `d537858ac8` added "fix_padding". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-12 15:55:06 -05:00
Ignat Korchagin	8e14f61015	dm crypt: do not call bio_endio() from the dm-crypt tasklet Sometimes, when dm-crypt executes decryption in a tasklet, we may get "BUG: KASAN: use-after-free in tasklet_action_common.constprop..." with a kasan-enabled kernel. When the decryption fully completes in the tasklet, dm-crypt will call bio_endio(), which in turn will call clone_endio() from dm.c core code. That function frees the resources associated with the bio, including per bio private structures. For dm-crypt it will free the current struct dm_crypt_io, which contains our tasklet object, causing use-after-free, when the tasklet is being dequeued by the kernel. To avoid this, do not call bio_endio() from the current tasklet context, but delay its execution to the dm-crypt IO workqueue. Fixes: `39d42fa96b` ("dm crypt: add flags to optionally bypass kcryptd workqueues") Cc: <stable@vger.kernel.org> # v5.9+ Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-12 13:51:17 -05:00
Linus Torvalds	ed41fd071c	block-5.11-2021-01-10 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/7KA0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpn6WEACeUa97qyzm7G8/E5ejBL6lXSTRXNc8qa+h YCdrDltkqs6OHAuEyUCwGw3zPmb7fp4M5RLZ/Dp9EtMwld45HfoN6mpRe0+i4U96 iAkHMNUo6ytp3wXX1XKgZ0FhcSOSwkQK8CMzmLPn+pxkDYzQPFg38AUISPpoDA/L YNh4tEiHHd5oprHIzludE00m2i1oYNrBcmUe27sKxR0mak0kEJtxr4cXLrqBtN3k 9C31A0gstCINSHmQPAcRvFerDxDM0WPYQ7K6UEXfkCfbyf6i+1eG/qLUwUCdm9MD Rjot6dXzQ2LzqJbaAZndjJRDRZx2xpC2TNlNaBjYzSOC6AXSY0MKiZBCnH/i/OoZ f0Bq/k7LVeMbyu02cgIis4DPLabfG+XQUOniu4HQTrzK8+neApAlCwINc73cvQOb hBS+LfUVqP6K6g3oVGSvqG01wj2HK69SWMNKTr9GZ3GIqrcWYtA/JnqFfTE7/KwC H7rkPL8i3+NBXmjjz6hm8hx3MrnekKJpsdCBicm9OOYqJRbkGVjoUYeDFz5MElfp k71u2WDQ81aiqfWajsJkZaUFxZgUrRzuWeyBZiQQP9kJEMzUUiDSg4K+0WJhk5bO Y0EX0sdCz8k9IBKfi2+FcF5dYj3RDolALmBDrrcfchTW0h7vxMpn4rr/ueN7gViz rW/Gj9pRsA== =CClj -----END PGP SIGNATURE----- Merge tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Missing CRC32 selections (Arnd) - Fix for a merge window regression with bdev inode init (Christoph) - bcache fixes - rnbd fixes - NVMe pull request from Christoph: - fix a race in the nvme-tcp send code (Sagi Grimberg) - fix a list corruption in an nvme-rdma error path (Israel Rukshin) - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar) - add the susystem NQN quirk for a Samsung driver (Gopal Tiwari) - fix two compiler warnings in nvme-fcloop (James Smart) - don't call sleeping functions from irq context in nvme-fc (James Smart) - remove an unused argument (Max Gurtovoy) - remove unused exports (Minwoo Im) - Use-after-free fix for partition iteration (Ming) - Missing blk-mq debugfs flag annotation (John) - Bdev freeze regression fix (Satya) - blk-iocost NULL pointer deref fix (Tejun) * tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block: (26 commits) bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket bcache: check unsupported feature sets for bcache register bcache: fix typo from SUUP to SUPP in features.h bcache: set pdev_set_uuid before scond loop iteration blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED block/rnbd-clt: avoid module unload race with close confirmation block/rnbd: Adding name to the Contributors List block/rnbd-clt: Fix sg table use after free block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close block/rnbd: Select SG_POOL for RNBD_CLIENT block: pre-initialize struct block_device in bdev_alloc_inode fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb nvme: remove the unused status argument from nvme_trace_bio_complete nvmet-rdma: Fix list_del corruption on queue establishment failure nvme: unexport functions with no external caller nvme: avoid possible double fetch in handling CQE nvme-tcp: Fix possible race of io_work and direct send nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings ...	2021-01-10 12:53:08 -08:00
Coly Li	5342fd4255	bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature set, it means the cache device is created with obsoleted layout with obso_bucket_site_hi. Now bcache does not support this feature bit, a new BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added for a better layout to support large bucket size. For the legacy compatibility purpose, if a cache device created with obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache devices attached to this cache set should be set to read-only. Then the dirty data can be written back to backing device before re-create the cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit by the latest bcache-tools. This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit when running a cache set and attach a bcache device to the cache set. If this bit is set, - When run a cache set, print an error kernel message to indicate all following attached bcache device will be read-only. - When attach a bcache device, print an error kernel message to indicate the attached bcache device will be read-only, and ask users to update to latest bcache-tools. Such change is only for cache device whose bucket size >= 32MB, this is for the zoned SSD and almost nobody uses such large bucket size at this moment. If you don't explicit set a large bucket size for a zoned SSD, such change is totally transparent to your bcache device. Fixes: `ffa4703275` ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-09 09:21:03 -07:00
Coly Li	b16671e8f4	bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET was introduced into the incompat feature set. It used bucket_size_hi (which was added at the tail of struct cache_sb_disk) to extend current 16bit bucket size to 32bit with existing bucket_size in struct cache_sb_disk. This is not a good idea, there are two obvious problems, - Bucket size is always value power of 2, if store log2(bucket size) in existing bucket_size of struct cache_sb_disk, it is unnecessary to add bucket_size_hi. - Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in struct cache_sb_disk, bucket_size_hi was added after d[] which makes csum_set calculate an unexpected super block checksum. To fix the above problems, this patch introduces a new incompat feature bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it means bucket_size in struct cache_sb_disk stores the order of power-of-2 bucket size value. When user specifies a bucket size larger than 32768 sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to incompat feature set, and bucket_size stores log2(bucket size) more than store the real bucket size value. The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore, it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only recognized by kernel driver for legacy compatible purpose. The previous bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk and not used in bcache-tools anymore. For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature, bcache-tools and kernel driver still recognize the feature string and display it as "obso_large_bucket". With this change, the unnecessary extra space extend of bcache on-disk super block can be avoided, and csum_set() may generate expected check sum as well. Fixes: `ffa4703275` ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-09 09:21:03 -07:00
Coly Li	1dfc0686c2	bcache: check unsupported feature sets for bcache register This patch adds the check for features which is incompatible for current supported feature sets. Now if the bcache device created by bcache-tools has features that current kernel doesn't support, read_super() will fail with error messoage. E.g. if an unsupported incompatible feature detected, bcache register will fail with dmesg "bcache: register_bcache() error : Unsupported incompatible feature found". Fixes: `d721a43ff6` ("bcache: increase super block version for cache device and backing device") Fixes: `ffa4703275` ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-09 09:21:03 -07:00
Coly Li	f7b4943dea	bcache: fix typo from SUUP to SUPP in features.h This patch fixes the following typos, from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP Fixes: `d721a43ff6` ("bcache: increase super block version for cache device and backing device") Fixes: `ffa4703275` ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-09 09:21:03 -07:00
Yi Li	e80927079f	bcache: set pdev_set_uuid before scond loop iteration There is no need to reassign pdev_set_uuid in the second loop iteration, so move it to the place before second loop. Signed-off-by: Yi Li <yili@winhong.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-09 09:21:03 -07:00
Mikulas Patocka	9b5948267a	dm integrity: fix flush with external metadata device With external metadata device, flush requests are not passed down to the data device. Fix this by submitting the flush request in dm_integrity_flush_buffers. In order to not degrade performance, we overlap the data device flush with the metadata device flush. Reported-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-08 15:57:29 -05:00
Mike Snitzer	0378c625af	dm: eliminate potential source of excessive kernel log noise There wasn't ever a real need to log an error in the kernel log for ioctls issued with insufficient permissions. Simply return an error and if an admin/user is sufficiently motivated they can enable DM's dynamic debugging to see an explanation for why the ioctls were disallowed. Reported-by: Nir Soffer <nsoffer@redhat.com> Fixes: `e980f62353` ("dm: don't allow ioctls to targets that don't map to whole devices") Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-08 15:56:35 -05:00
Akilesh Kailash	fcc4233837	dm snapshot: flush merged data before committing metadata If the origin device has a volatile write-back cache and the following events occur: 1: After finishing merge operation of one set of exceptions, merge_callback() is invoked. 2: Update the metadata in COW device tracking the merge completion. This update to COW device is flushed cleanly. 3: System crashes and the origin device's cache where the recent merge was completed has not been flushed. During the next cycle when we read the metadata from the COW device, we will skip reading those metadata whose merge was completed in step (1). This will lead to data loss/corruption. To address this, flush the origin device post merge IO before updating the metadata. Cc: stable@vger.kernel.org Signed-off-by: Akilesh Kailash <akailash@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-06 18:28:34 -05:00
Ignat Korchagin	d68b29584c	dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq Commit `39d42fa96b` ("dm crypt: add flags to optionally bypass kcryptd workqueues") made it possible for some code paths in dm-crypt to be executed in softirq context, when the underlying driver processes IO requests in interrupt/softirq context. In this case sometimes when allocating a new crypto request we may get a stacktrace like below: [ 210.103008][ C0] BUG: sleeping function called from invalid context at mm/mempool.c:381 [ 210.104746][ C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2602, name: fio [ 210.106599][ C0] CPU: 0 PID: 2602 Comm: fio Tainted: G W 5.10.0+ #50 [ 210.108331][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 [ 210.110212][ C0] Call Trace: [ 210.110921][ C0] <IRQ> [ 210.111527][ C0] dump_stack+0x7d/0xa3 [ 210.112411][ C0] ___might_sleep.cold+0x122/0x151 [ 210.113527][ C0] mempool_alloc+0x16b/0x2f0 [ 210.114524][ C0] ? __queue_work+0x515/0xde0 [ 210.115553][ C0] ? mempool_resize+0x700/0x700 [ 210.116586][ C0] ? crypt_endio+0x91/0x180 [ 210.117479][ C0] ? blk_update_request+0x757/0x1150 [ 210.118513][ C0] ? blk_mq_end_request+0x4b/0x480 [ 210.119572][ C0] ? blk_done_softirq+0x21d/0x340 [ 210.120628][ C0] ? __do_softirq+0x190/0x611 [ 210.121626][ C0] crypt_convert+0x29f9/0x4c00 [ 210.122668][ C0] ? _raw_spin_lock_irqsave+0x87/0xe0 [ 210.123824][ C0] ? kasan_set_track+0x1c/0x30 [ 210.124858][ C0] ? crypt_iv_tcw_ctr+0x4a0/0x4a0 [ 210.125930][ C0] ? kmem_cache_free+0x104/0x470 [ 210.126973][ C0] ? crypt_endio+0x91/0x180 [ 210.127947][ C0] kcryptd_crypt_read_convert+0x30e/0x420 [ 210.129165][ C0] blk_update_request+0x757/0x1150 [ 210.130231][ C0] blk_mq_end_request+0x4b/0x480 [ 210.131294][ C0] blk_done_softirq+0x21d/0x340 [ 210.132332][ C0] ? _raw_spin_lock+0x81/0xd0 [ 210.133289][ C0] ? blk_mq_stop_hw_queue+0x30/0x30 [ 210.134399][ C0] ? _raw_read_lock_irq+0x40/0x40 [ 210.135458][ C0] __do_softirq+0x190/0x611 [ 210.136409][ C0] ? handle_edge_irq+0x221/0xb60 [ 210.137447][ C0] asm_call_irq_on_stack+0x12/0x20 [ 210.138507][ C0] </IRQ> [ 210.139118][ C0] do_softirq_own_stack+0x37/0x40 [ 210.140191][ C0] irq_exit_rcu+0x110/0x1b0 [ 210.141151][ C0] common_interrupt+0x74/0x120 [ 210.142171][ C0] asm_common_interrupt+0x1e/0x40 Fix this by allocating crypto requests with GFP_ATOMIC mask in interrupt context. Fixes: `39d42fa96b` ("dm crypt: add flags to optionally bypass kcryptd workqueues") Cc: stable@vger.kernel.org # v5.9+ Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-04 15:02:33 -05:00
Ignat Korchagin	8abec36d12	dm crypt: do not wait for backlogged crypto request completion in softirq Commit `39d42fa96b` ("dm crypt: add flags to optionally bypass kcryptd workqueues") made it possible for some code paths in dm-crypt to be executed in softirq context, when the underlying driver processes IO requests in interrupt/softirq context. When Crypto API backlogs a crypto request, dm-crypt uses wait_for_completion to avoid sending further requests to an already overloaded crypto driver. However, if the code is executing in softirq context, we might get the following stacktrace: [ 210.235213][ C0] BUG: scheduling while atomic: fio/2602/0x00000102 [ 210.236701][ C0] Modules linked in: [ 210.237566][ C0] CPU: 0 PID: 2602 Comm: fio Tainted: G W 5.10.0+ #50 [ 210.239292][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 [ 210.241233][ C0] Call Trace: [ 210.241946][ C0] <IRQ> [ 210.242561][ C0] dump_stack+0x7d/0xa3 [ 210.243466][ C0] __schedule_bug.cold+0xb3/0xc2 [ 210.244539][ C0] __schedule+0x156f/0x20d0 [ 210.245518][ C0] ? io_schedule_timeout+0x140/0x140 [ 210.246660][ C0] schedule+0xd0/0x270 [ 210.247541][ C0] schedule_timeout+0x1fb/0x280 [ 210.248586][ C0] ? usleep_range+0x150/0x150 [ 210.249624][ C0] ? unpoison_range+0x3a/0x60 [ 210.250632][ C0] ? ____kasan_kmalloc.constprop.0+0x82/0xa0 [ 210.251949][ C0] ? unpoison_range+0x3a/0x60 [ 210.252958][ C0] ? __prepare_to_swait+0xa7/0x190 [ 210.254067][ C0] do_wait_for_common+0x2ab/0x370 [ 210.255158][ C0] ? usleep_range+0x150/0x150 [ 210.256192][ C0] ? bit_wait_io_timeout+0x160/0x160 [ 210.257358][ C0] ? blk_update_request+0x757/0x1150 [ 210.258582][ C0] ? _raw_spin_lock_irq+0x82/0xd0 [ 210.259674][ C0] ? _raw_read_unlock_irqrestore+0x30/0x30 [ 210.260917][ C0] wait_for_completion+0x4c/0x90 [ 210.261971][ C0] crypt_convert+0x19a6/0x4c00 [ 210.263033][ C0] ? _raw_spin_lock_irqsave+0x87/0xe0 [ 210.264193][ C0] ? kasan_set_track+0x1c/0x30 [ 210.265191][ C0] ? crypt_iv_tcw_ctr+0x4a0/0x4a0 [ 210.266283][ C0] ? kmem_cache_free+0x104/0x470 [ 210.267363][ C0] ? crypt_endio+0x91/0x180 [ 210.268327][ C0] kcryptd_crypt_read_convert+0x30e/0x420 [ 210.269565][ C0] blk_update_request+0x757/0x1150 [ 210.270563][ C0] blk_mq_end_request+0x4b/0x480 [ 210.271680][ C0] blk_done_softirq+0x21d/0x340 [ 210.272775][ C0] ? _raw_spin_lock+0x81/0xd0 [ 210.273847][ C0] ? blk_mq_stop_hw_queue+0x30/0x30 [ 210.275031][ C0] ? _raw_read_lock_irq+0x40/0x40 [ 210.276182][ C0] __do_softirq+0x190/0x611 [ 210.277203][ C0] ? handle_edge_irq+0x221/0xb60 [ 210.278340][ C0] asm_call_irq_on_stack+0x12/0x20 [ 210.279514][ C0] </IRQ> [ 210.280164][ C0] do_softirq_own_stack+0x37/0x40 [ 210.281281][ C0] irq_exit_rcu+0x110/0x1b0 [ 210.282286][ C0] common_interrupt+0x74/0x120 [ 210.283376][ C0] asm_common_interrupt+0x1e/0x40 [ 210.284496][ C0] RIP: 0010:_aesni_enc1+0x65/0xb0 Fix this by making crypt_convert function reentrant from the point of a single bio and make dm-crypt defer further bio processing to a workqueue, if Crypto API backlogs a request in interrupt context. Fixes: `39d42fa96b` ("dm crypt: add flags to optionally bypass kcryptd workqueues") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-04 15:02:32 -05:00
Arnd Bergmann	b690bd546b	dm zoned: select CONFIG_CRC32 Without crc32 support, this driver fails to link: arm-linux-gnueabi-ld: drivers/md/dm-zoned-metadata.o: in function `dmz_write_sb': dm-zoned-metadata.c:(.text+0xe98): undefined reference to `crc32_le' arm-linux-gnueabi-ld: drivers/md/dm-zoned-metadata.o: in function `dmz_check_sb': dm-zoned-metadata.c:(.text+0x7978): undefined reference to `crc32_le' Fixes: `3b1a94c88b` ("dm zoned: drive-managed zoned block device target") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-04 15:02:32 -05:00
Anthony Iliopoulos	f7b347acb5	dm integrity: select CRYPTO_SKCIPHER The integrity target relies on skcipher for encryption/decryption, but certain kernel configurations may not enable CRYPTO_SKCIPHER, leading to compilation errors due to unresolved symbols. Explicitly select CRYPTO_SKCIPHER for DM_INTEGRITY, since it is unconditionally dependent on it. Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-04 15:02:31 -05:00
Mike Snitzer	cc07d72bf3	dm raid: fix discard limits for raid1 Block core warned that discard_granularity was 0 for dm-raid with personality of raid1. Reason is that raid_io_hints() was incorrectly special-casing raid1 rather than raid0. Fix raid_io_hints() by removing discard limits settings for raid1. Check for raid0 instead. Fixes: `61697a6abd` ("dm: eliminate 'split_discard_bios' flag from DM target interface") Cc: stable@vger.kernel.org Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Stephan Bärwolf <stephan@matrixstorm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2021-01-04 15:02:30 -05:00
Linus Torvalds	dea8dcf2a9	Revert WQ_SYSFS change that broke reencryption (and all other functionality that requires reloading a dm-crypt DM table). -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl/qS/4THHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWiD0CACEywiVceLgV4dTnlH5h5YJjxybjo9g LyVe+4X9bN2LauJNpNmlWgRQZTovC6wIk9kyY7p69ZZqXe9Y4sXxoynRoUu/2DiG 5+MnOleTIOafUHJTJOGhDs+vDPgNnYh3xmoVNZqAQpcexPJg/E0wuhcgmO9lXFew ldcNOXV51AdANfjLKFlyQckqBG028ktdV2xt7u/B1FAKcmUbb0rfG7LDHlGf1ggj KAVlZzMry7wMhEGPS3IQtmA0mO5DSMn1Kp4iM0Wd6cVrJ+jZafv0uFHFUHiH7DYy yxj7AurM/0wxS06cHvCN82OoOl9AzDmNAKy94I/PiWAqyP3I6iiep5rE =K4dv -----END PGP SIGNATURE----- Merge tag 'for-5.11/dm-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mike Snitzer: "Revert WQ_SYSFS change that broke reencryption (and all other functionality that requires reloading a dm-crypt DM table)" * tag 'for-5.11/dm-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: Revert "dm crypt: export sysfs of kcryptd workqueue"	2020-12-28 13:32:16 -08:00
Mike Snitzer	48b0777cd9	Revert "dm crypt: export sysfs of kcryptd workqueue" This reverts commit `a2b8b2d975`. WQ_SYSFS breaks the ability to reload a DM table due to sysfs kobject collision (due to active and inactive table). Given lack of demonstrated need for exposing this workqueue via sysfs: revert exposing it. Reported-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-28 16:13:52 -05:00
Linus Torvalds	771e7e4161	block-5.11-2020-12-23 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/kHXgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpkhFEADYuiBZbYEonaV4/nqOhMZ6lXj99rEqVZui AOMm7W8nopb97pWy0sJxZHPPMnjglubkTZbX/2TH08ndppGBQAa9HsgDIITO5Ap3 vjnew6oPsrMtxMUMqR8w8nMb4z5tfqpvUYRPd+qpvYSYLEeSh0UNAEdQ/MOGbA1t nl8UPdNW/s1MbeyLJhU+NgGM0aZahED8KuJeVLOY2im6dySO+CoeoB0/mWrdc0PZ SDcGBEjhmFspDVAkW4Wo8bMw7Cr72es0esvJSJyx0mlo0jSCR7ZYParDtPwuAl9H thKo+brLibz9M+wRtW+7w37oUADTTRL+KV2xJ/J+DeAVjI7/hxgZKewh6hEORmrF DwjbS8cnEVp70rfF7+Z9FV/+2GJXm1uI/swBi69Y42CQ33FNLZmikBW6Q3pimrDj ZeQSCLRpGlo01xtJpsm8L71KGSfYd+njWaIFnESPhxJ+sTxd8kvFRsdjHlc4KNBG UnMzvn6pE3WNhzJJoIqdM2uXa4XDyKkMfO7VbpUgbyij103jpbs7ruWXq3DDU5/t Gdwa821zlq5azDYAw7PNb7FhKrsHhiKhOuxyKqJUbOUkOHBNwxJmxXcaMnR3fOSi B+AlqYhC6A9DhkL0HG0QLcdFwYawpOsDxbFUbemt/UXn74lAjRnp8+rNMSn5tjbn OCnk4Tpaww== =Q42d -----END PGP SIGNATURE----- Merge tag 'block-5.11-2020-12-23' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A few stragglers in here, but mostly just straight fixes. In particular: - Set of rnbd fixes for issues around changes for the merge window (Gioh, Jack, Md Haris Iqbal) - iocost tracepoint addition (Baolin) - Copyright/maintainers update (Christoph) - Remove old blk-mq fast path CPU warning (Daniel) - loop max_part fix (Josh) - Remote IPI threaded IRQ fix (Sebastian) - dasd stable fixes (Stefan) - bcache merge window fixup and style fixup (Yi, Zheng)" * tag 'block-5.11-2020-12-23' of git://git.kernel.dk/linux-block: md/bcache: convert comma to semicolon bcache:remove a superfluous check in register_bcache block: update some copyrights block: remove a pointless self-reference in block_dev.c MAINTAINERS: add fs/block_dev.c to the block section blk-mq: Don't complete on a remote CPU in force threaded mode s390/dasd: fix list corruption of lcu list s390/dasd: fix list corruption of pavgroup group list s390/dasd: prevent inconsistent LCU device data s390/dasd: fix hanging device offline processing blk-iocost: Add iocg idle state tracepoint nbd: Respect max_part for all partition scans block/rnbd-clt: Does not request pdu to rtrs-clt block/rnbd-clt: Dynamically allocate sglist for rnbd_iu block/rnbd: Set write-back cache and fua same to the target device block/rnbd: Fix typos block/rnbd-srv: Protect dev session sysfs removal block/rnbd-clt: Fix possible memleak block/rnbd-clt: Get rid of warning regarding size argument in strlcpy blk-mq: Remove 'running from the wrong CPU' warning	2020-12-24 12:28:35 -08:00
Zheng Yongjun	46926127d7	md/bcache: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Coly Li <colyli@sue.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-23 09:25:15 -07:00
Yi Li	117ae250cf	bcache:remove a superfluous check in register_bcache There have no reassign the bdev after check It is IS_ERR. the double check !IS_ERR(bdev) is superfluous. After commit `4e7b5671c6` ("block: remove i_bdev"), "Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code." so after lookup_bdev call, there no need to do bdput. remove a superfluous check the bdev & don't call bdput after lookup_bdev. Fixes: 4e7b5671c6a8("block: remove i_bdev") Signed-off-by: Yi Li <yili@winhong.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-23 09:25:15 -07:00
Linus Torvalds	d8355e740f	- Add DM verity support for signature verification with 2nd keyring. - Fix DM verity to skip verity work if IO completes with error while system is shutting down. - Add new DM multipath "IO affinity" path selector that maps IO destined to a given path to a specific CPU based on user provided mapping. - Rename DM multipath path selector source files to have "dm-ps" prefix. - Add REQ_NOWAIT support to some other simple DM targets that don't block in more elaborate ways waiting for IO. - Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS). - Fix error return code in DM's target_message() if empty message is received. - A handful of other small cleanups. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl/iDJYTHHNuaXR6ZXJA cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWsI2CACUk9PCCtOOHH1s//VeLjF86VUIsuxf hNMfTgWT+arlHUIzl2Bp4c4Dq/T+hXTYlf+f5zGRAbBAd82eCqji5LTVvy/qaYt3 4gWSZnv/3VYHCx4MvV9UjtXQcSnS/ttDwyV0ZD6/NtYllQQobaCbyrE4p2tA1GIk ql8ESRgXKK5Sio197Tm45UEkrhG0oUrEZ3riBaXeM9yOldLp1mLLVPGJeLcJDvpA N7TDcM0owq/CMbmu5BkNv0xw7q/Vc9VQLGva8a15StxGjk1HY5/6KQWssCEkTkO7 HnIprATtWz2r0qgTcI+fOXndUmlqgQr1jhvYfJeuv45KbjoI5qubZvYr =vHNj -----END PGP SIGNATURE----- Merge tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM verity support for signature verification with 2nd keyring - Fix DM verity to skip verity work if IO completes with error while system is shutting down - Add new DM multipath "IO affinity" path selector that maps IO destined to a given path to a specific CPU based on user provided mapping - Rename DM multipath path selector source files to have "dm-ps" prefix - Add REQ_NOWAIT support to some other simple DM targets that don't block in more elaborate ways waiting for IO - Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS) - Fix error return code in DM's target_message() if empty message is received - A handful of other small cleanups * tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: simplify the return expression of load_mapping() dm ebs: avoid double unlikely() notation when using IS_ERR() dm verity: skip verity work if I/O error when system is shutting down dm crypt: export sysfs of kcryptd workqueue dm ioctl: fix error return code in target_message dm crypt: Constify static crypt_iv_operations dm: add support for REQ_NOWAIT to various targets dm: rename multipath path selector source files to have "dm-ps" prefix dm mpath: add IO affinity path selector dm verity: Add support for signature verification with 2nd keyring dm: remove unnecessary current->bio_list check when submitting split bio	2020-12-22 13:27:21 -08:00
Zheng Yongjun	b77709237e	dm cache: simplify the return expression of load_mapping() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-22 09:54:48 -05:00
Antonio Quartulli	52252adede	dm ebs: avoid double unlikely() notation when using IS_ERR() The definition of IS_ERR() already applies the unlikely() notation when checking the error status of the passed pointer. For this reason there is no need to have the same notation outside of IS_ERR() itself. Clean up code by removing redundant notation. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-21 13:59:37 -05:00
Hyeongseok Kim	252bd12563	dm verity: skip verity work if I/O error when system is shutting down If emergency system shutdown is called, like by thermal shutdown, a dm device could be alive when the block device couldn't process I/O requests anymore. In this state, the handling of I/O errors by new dm I/O requests or by those already in-flight can lead to a verity corruption state, which is a misjudgment. So, skip verity work in response to I/O error when system is shutting down. Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-21 13:57:25 -05:00
Linus Torvalds	69f637c335	for-5.11/drivers-2020-12-14 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/XgdYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjTBD/4me2TNvGOogbcL0b1leAotndJ7spI/IcFM NUMNy3pOGuRBcRjwle85xq44puAjlNkZE2LLatem5sT7ZvS+8lPNnOIoTYgfaCjt PhKx2sKlLumVm3BwymYAPcPtke4fikGG15Mwu5nX1oOehmyGrjObGAr3Lo6gexCT tQoCOczVqaTsV+iTXrLlmgEgs07J9Tm93uh2cNR8Jgroxb8ivuWeUq4YgbV4kWk+ Y8XvOyVE/yba0vQf5/hHtWuVoC6RdELnqZ6NCkcP/EicdBecwk1GMJAej1S3zPS1 0BT7GSFTpm3YUHcygD6LRmRg4I/BmWDTDtMi84+jLat6VvSG1HwIm//qHiCJh3ku SlvFZENIWAv5LP92x2vlR5Lt7uE3GK2V/5Pxt2fekyzCth6mzu+hLH4CBPQ3xgyd E1JqIQ/ilbXstp+EYoivV5x8yltZQnKEZRopws0EOqj1LsmDPj9XT1wzE9RnB0o+ PWu/DNhQFhhcmP7Z8uLgPiKIVpyGs+vjxiJLlTtGDFTCy6M5JbcgzGkEkSmnybxH 7lSanjpLt1dWj85FBMc6fNtJkv2rBPfb4+j0d1kZ45Dzcr4umirGIh7wtCHcgc83 brmXSt29hlKHseSHMMuNWK8haXcgAE7gq9tD8GZ/kzM7+vkmLLxHJa22Qhq5rp4w URPeaBaQJw== =ayp2 -----END PGP SIGNATURE----- Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Nothing major in here: - NVMe pull request from Christoph: - nvmet passthrough improvements (Chaitanya Kulkarni) - fcloop error injection support (James Smart) - read-only support for zoned namespaces without Zone Append (Javier González) - improve some error message (Minwoo Im) - reject I/O to offline fabrics namespaces (Victor Gladkov) - PCI queue allocation cleanups (Niklas Schnelle) - remove an unused allocation in nvmet (Amit Engel) - a Kconfig spelling fix (Colin Ian King) - nvme_req_qid simplication (Baolin Wang) - MD pull request from Song: - Fix race condition in md_ioctl() (Dae R. Jeong) - Initialize read_slot properly for raid10 (Kevin Vigor) - Code cleanup (Pankaj Gupta) - md-cluster resync/reshape fix (Zhao Heming) - Move null_blk into its own directory (Damien Le Moal) - null_blk zone and discard improvements (Damien Le Moal) - bcache race fix (Dongsheng Yang) - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang, Lutz Pogrell, Md Haris Iqbal) - lightnvm NULL pointer deref fix (tangzhenhao) - sr in_interrupt() removal (Sebastian Andrzej Siewior) - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included as it made it easier for them to funnel the feature through the block driver tree. - Follow up fixes (Colin Ian King)" * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits) block: drop dead assignments in loop_init() sr: Remove in_interrupt() usage in sr_init_command(). sr: Switch the sector size back to 2048 if sr_read_sector() changed it. cdrom: Reset sector_size back it is not 2048. drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c null_blk: Move driver into its own directory null_blk: Allow controlling max_hw_sectors limit null_blk: discard zones on reset null_blk: cleanup discard handling null_blk: Improve implicit zone close null_blk: improve zone locking block: Align max_hw_sectors to logical blocksize null_blk: Fail zone append to conventional zones null_blk: Fix zone size initialization bcache: fix race between setting bdev state to none and new write request direct to backing block/rnbd: fix a null pointer dereference on dev->blk_symlink_name block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name block/rnbd: call kobject_put in the failure path Documentation/ABI/rnbd-srv: add document for force_close block/rnbd-srv: close a mapped device from server side. ...	2020-12-16 13:09:32 -08:00
Linus Torvalds	ac7ac4618c	for-5.11/block-2020-12-14 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/Xec8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpoLbEACzXypgZWwMdfgRckA/Vt333rXHtbhUV+hK 2XP+P81iRvr9Esi31UPbRp82vrgcDO0cpI1QmQojS5U5TIQP88BfXptfRZZu48eb wT5RDDNQ34HItqAh/yEuYsv9yUKcxeIrB99tBVvM+4UmQg9zTdIW3mg6PvCBdbhV N38jI0tCF/PJatjfRuphT/nXonQLPWBlVDmZk06KZQFOwQe9ep1vUi1+nbiRPuo3 geFBpTh1Kp6Vl1B3n4RpECs6Y7I0RRuJdaH2sDizICla1/BW91F9fQwHimNnUxUq e1Q1kMuh6ftcQGkYlHSYcPhuv6CvorldTZCO5arPxWpcwvxriTSMRPWAgUr5pEiF fhiGhqeDu9e6vl9vS31wUD1B30hy+jFz9wyjRrDwJ3cPHH1JVBjTzvdX+cIh/1ku IbIwUMteUtvUrzqAv/DzbGhedp7xWtOFaVo8j0QFYh9zkjd6b8yDOF/yztwX2gjY Xt1cd+KpDSiN449ZRaoMI0sCJAxqzhMa6nsWlb0L7KuNyWKAbvKQBm9Rb47FLV9A Vx70KC+zkFoyw23capvIahmQazerriUJ5PGe0lVm6ROgmIFdCpXTPDjnrvq/6RZ/ GEpD7gTW9atGJ7EuEE8686sAfKD5kneChWLX5EHXf0d0AG5Mr2lKsluiGp5LpPJg Q1Xqs6xwww== =zo4w -----END PGP SIGNATURE----- Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "Another series of killing more code than what is being added, again thanks to Christoph's relentless cleanups and tech debt tackling. This contains: - blk-iocost improvements (Baolin Wang) - part0 iostat fix (Jeffle Xu) - Disable iopoll for split bios (Jeffle Xu) - block tracepoint cleanups (Christoph Hellwig) - Merging of struct block_device and hd_struct (Christoph Hellwig) - Rework/cleanup of how block device sizes are updated (Christoph Hellwig) - Simplification of gendisk lookup and removal of block device aliasing (Christoph Hellwig) - Block device ioctl cleanups (Christoph Hellwig) - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig) - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig) - sbitmap improvements (Pavel Begunkov) - Hybrid polling fix (Pavel Begunkov) - bvec iteration improvements (Pavel Begunkov) - Zone revalidation fixes (Damien Le Moal) - blk-throttle limit fix (Yu Kuai) - Various little fixes" * tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits) blk-mq: fix msec comment from micro to milli seconds blk-mq: update arg in comment of blk_mq_map_queue blk-mq: add helper allocating tagset->tags Revert "block: Fix a lockdep complaint triggered by request queue flushing" nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class blk-mq: add new API of blk_mq_hctx_set_fq_lock_class block: disable iopoll for split bio block: Improve blk_revalidate_disk_zones() checks sbitmap: simplify wrap check sbitmap: replace CAS with atomic and sbitmap: remove swap_lock sbitmap: optimise sbitmap_deferred_clear() blk-mq: skip hybrid polling if iopoll doesn't spin blk-iocost: Factor out the base vrate change into a separate function blk-iocost: Factor out the active iocgs' state check into a separate function blk-iocost: Move the usage ratio calculation to the correct place blk-iocost: Remove unnecessary advance declaration blk-iocost: Fix some typos in comments blktrace: fix up a kerneldoc comment block: remove the request_queue to argument request based tracepoints ...	2020-12-16 12:57:51 -08:00
Mike Snitzer	0941e3b065	Revert "dm raid: fix discard limits for raid1 and raid10" This reverts commit `e0910c8e4f`. Reverting `6ffeb1c3f8` ("md: change mddev 'chunk_sectors' from int to unsigned") exposes dm-raid.c compiler warnings detailed that commit's header. Clearly this more conservative fix, of simply reverting `e0910c8e4f`, would've been more prudent given how late we were in the v5.10 release. Lessons have been learned. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-14 12:12:08 -05:00
Mike Snitzer	77a68698ff	Revert "md: change mddev 'chunk_sectors' from int to unsigned" This reverts commit `6ffeb1c3f8`. This change caused unexpected v5.10 raid6 mount failures, see: https://lkml.org/lkml/2020/12/14/7 Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-14 12:08:48 -05:00
Linus Torvalds	d2360a398f	block-5.10-2020-12-12 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/VMG8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsBdEACjbsECHIyQZOtTHW+vY2wkOj2ph+OkXtUU nmRlXn5itcrTEeHyenwDEc/W03WbU5m4C6BUz9ecRtXLda6K9W2d7YspSWND1JP/ Gz3YtingA4xwAl7wOtHT6gi6JHb/seDObiKWAik6f7A3xvBW3o9lV1QRy3kaNdG1 XgcYvBZBHDaP7Gdvvbg6FucSe3dNjSi+skVdvx8j+i27HF3LAEp3eoWG4vZBUePH RIpdDyCXRao+Z22V+EeY0X5cH568fxS408Z6xXewV/g2TDd4XUaXz3WM0O9FN0jU oTiQeXL+1y4EfU8QoboTWvHmpoKRqWMSZVD0GXbmx4ZwpiFHbyBub2SydU2Xm/Vl oNwHc1wktVU2HFrWU9dwrlWz4Egt4KZO6D9SzJ3VDt9HLKIciyZD16rbsVwSr/Bc BVQaFJjRYoR60xxNohIEC8C3rF3qRbkAy5HYSHFUyuOE6gm05RQIHLprY6gIjSei 7gJnP1FL3iu5bQywp7ptqaRIlkeau6kqvboc8oYFDQvVAp35JqnxjdOsAByWUZR9 lrN87bIh0n1Vqprs5arXWhw5Ixy7ZQsf0DsrFImZgU+VNTFmuB+1cKaa94EsQwo5 gQH2j39QQSlGmkXphQVWDd0eGCAt3OrmjYe721ZSSLvJjM64+7pZFZpYPc//VpR5 8Kcq/yDeOA== =PmVb -----END PGP SIGNATURE----- Merge tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "This should be it for 5.10. Mike and Song looked into the warning case, and thankfully it appears the fix was pretty trivial - we can just change the md device chunk type to unsigned int to get rid of it. They cannot currently be < 0, and nobody is checking for that either. We're reverting the discard changes as the corruption reports came in very late, and there's just no time to attempt to deal with it at this point. Reverting the changes in question is the right call for 5.10" * tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block: md: change mddev 'chunk_sectors' from int to unsigned Revert "md: add md_submit_discard_bio() for submitting discard bio" Revert "md/raid10: extend r10bio devs to raid disks" Revert "md/raid10: pull codes that wait for blocked dev into one function" Revert "md/raid10: improve raid10 discard request" Revert "md/raid10: improve discard request for far layout" Revert "dm raid: remove unnecessary discard limits for raid10"	2020-12-13 10:36:23 -08:00
Mike Snitzer	6ffeb1c3f8	md: change mddev 'chunk_sectors' from int to unsigned Commit `e2782f560c` ("Revert "dm raid: remove unnecessary discard limits for raid10"") exposed compiler warnings introduced by commit `e0910c8e4f` ("dm raid: fix discard limits for raid1 and raid10"): In file included from ./include/linux/kernel.h:14, from ./include/asm-generic/bug.h:20, from ./arch/x86/include/asm/bug.h:93, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/gfp.h:5, from ./include/linux/slab.h:15, from drivers/md/dm-raid.c:8: drivers/md/dm-raid.c: In function ‘raid_io_hints’: ./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) )1 == (typeof(y) )1))) ^~ ./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’ (__typecheck(x, y) && __no_side_effects(x, y)) ^~~~~~~~~~~ ./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’ __builtin_choose_expr(__safe_cmp(x, y), \ ^~~~~~~~~~ ./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’ #define min(x, y) __careful_cmp(x, y, <) ^~~~~~~~~~~~~ ./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’ __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); }) ^~~ drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’ limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors, ^~~~~~~~~~~~ Fix this by changing the chunk_sectors member of 'struct mddev' from int to 'unsigned int' to match the type used for the 'chunk_sectors' member of 'struct queue_limits'. Various MD code still uses 'int' but none of it appears to ever make use of signed int; and storing positive signed int in unsigned is perfectly safe. Reported-by: Song Liu <songliubraving@fb.com> Fixes: `e2782f560c` ("Revert "dm raid: remove unnecessary discard limits for raid10"") Fixes: `e0910c8e4f` ("dm raid: fix discard limits for raid1 and raid10") Cc: stable@vger,kernel.org # `e0910c8e4f` was marked for stable@ Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-12 10:07:50 -07:00
Song Liu	57a0f3a81e	Revert "md: add md_submit_discard_bio() for submitting discard bio" This reverts commit `2628089b74`. Matthew Ruffell reported data corruption in raid10 due to the changes in discard handling [1]. Revert these changes before we find a proper fix. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Cc: Matthew Ruffell <matthew.ruffell@canonical.com> Cc: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-12-09 20:46:01 -08:00
Song Liu	17c28c2a06	Revert "md/raid10: extend r10bio devs to raid disks" This reverts commit `8650a88901`. Matthew Ruffell reported data corruption in raid10 due to the changes in discard handling [1]. Revert these changes before we find a proper fix. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Cc: Matthew Ruffell <matthew.ruffell@canonical.com> Cc: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-12-09 20:46:01 -08:00
Song Liu	4e2c6567ef	Revert "md/raid10: pull codes that wait for blocked dev into one function" This reverts commit `f046f5d0d7`. Matthew Ruffell reported data corruption in raid10 due to the changes in discard handling [1]. Revert these changes before we find a proper fix. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Cc: Matthew Ruffell <matthew.ruffell@canonical.com> Cc: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-12-09 20:46:01 -08:00
Song Liu	d7cb6be0d0	Revert "md/raid10: improve raid10 discard request" This reverts commit `bcc90d2804`. Matthew Ruffell reported data corruption in raid10 due to the changes in discard handling [1]. Revert these changes before we find a proper fix. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Cc: Matthew Ruffell <matthew.ruffell@canonical.com> Cc: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-12-09 20:46:00 -08:00
Song Liu	82fe9af77c	Revert "md/raid10: improve discard request for far layout" This reverts commit `d3ee2d8415`. Matthew Ruffell reported data corruption in raid10 due to the changes in discard handling [1]. Revert these changes before we find a proper fix. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Cc: Matthew Ruffell <matthew.ruffell@canonical.com> Cc: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-12-09 20:46:00 -08:00
Song Liu	e2782f560c	Revert "dm raid: remove unnecessary discard limits for raid10" This reverts commit `f0e90b6c66`. Matthew Ruffell reported data corruption in raid10 due to the changes in discard handling [1]. Revert these changes before we find a proper fix. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Cc: Matthew Ruffell <matthew.ruffell@canonical.com> Cc: Xiao Ni <xni@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-12-09 20:43:48 -08:00
Dongsheng Yang	df4ad53242	bcache: fix race between setting bdev state to none and new write request direct to backing There is a race condition in detaching as below: A. detaching B. Write request (1) writing back (2) write back done, set bdev state to clean. (3) cached_dev_put() and schedule_work(&dc->detach); (4) write data [0 - 4K] directly into backing and ack to user. (5) power-failure... When we restart this bcache device, this bdev is clean but not detached, and read [0 - 4K], we will get unexpected old data from cache device. To fix this problem, set the bdev state to none when we writeback done in detaching, and then if power-failure happened as above, the data in cache will not be used in next bcache device starting, it's detached, we will read the correct data from backing derectly. Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-07 13:25:16 -07:00
Jeffle Xu	a2b8b2d975	dm crypt: export sysfs of kcryptd workqueue It should be helpful to export sysfs of "kcryptd" workqueue in some cases, such as setting specific CPU affinity of the workqueue. Besides, also tweak the name format a little. The slash inside a directory name will be translate into exclamation mark, such as /sys/devices/virtual/workqueue/'kcryptd!253:0'. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:36 -05:00
Qinglang Miao	4d7659bfbe	dm ioctl: fix error return code in target_message Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `2ca4c92f58` ("dm ioctl: prevent empty message") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:36 -05:00
Rikard Falkeborn	e8dc79d1bd	dm crypt: Constify static crypt_iv_operations The only usage of these structs is to assign their address to the iv_gen_ops field in the crypt config struct, which is a pointer to const. Make them const like the rest of the static crypt_iv_operations structs. This allows the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:35 -05:00
Jeffle Xu	410fe22007	dm: add support for REQ_NOWAIT to various targets commit `021a24460d` ("block: add QUEUE_FLAG_NOWAIT") added a new queue flag QUEUE_FLAG_NOWAIT to advertise if the bdev supports handling of REQ_NOWAIT or not. DM core supports stacking QUEUE_FLAG_NOWAIT since commit `6abc49468e` ("dm: add support for REQ_NOWAIT and enable it for linear target"), in which only dm-linear enabled it. Update others DM targets, which just do simple remapping, to enable support for REQ_NOWAIT. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:35 -05:00
Mike Snitzer	298fb37298	dm: rename multipath path selector source files to have "dm-ps" prefix Additional prefix helps clarify that these source files implement path selectors. Required updating Makefile to still build modules _without_ the "dm-ps" prefix to preserve dm-multipath's ability to autoload path selector modules. While at it, cleaned up some DM whitespace in Makefile. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:35 -05:00
Mike Christie	e4d2e82b23	dm mpath: add IO affinity path selector This patch adds a path selector that selects paths based on a CPU to path mapping the user passes in and what CPU we are executing on. The primary user for this PS is where the app is optimized to use specific CPUs so other PSs undo the apps handy work, and the storage and it's transport are not a bottlneck. For these io-affinity PS setups a path's transport/interconnect perf is not going to flucuate a lot and there is no major differences between paths, so QL/HST smarts do not help and RR always messes up what the app is trying to do. On a system with 16 cores, where you have a job per CPU: fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=4k \ --ioengine=libaio --iodepth=128 --numjobs=16 and a dm-multipath device setup where each CPU is mapped to one path: // When in mq mode I had to set dm_mq_nr_hw_queues=$NUM_PATHS. // Bio mode also showed similar results. 0 16777216 multipath 0 0 1 1 io-affinity 0 16 1 8:16 1 8:32 2 8:64 4 8:48 8 8:80 10 8:96 20 8:112 40 8:128 80 8:144 100 8:160 200 8:176 400 8:192 800 8:208 1000 8:224 2000 8:240 4000 65:0 8000 we can see a IOPs increase of 25%. The percent increase depends on the device and interconnect. For a slower/medium speed path/device that can do around 180K IOPs a path if you ran that fio command to it directly we saw a 25% increase like above. Slower path'd devices that could do around 90K per path showed maybe around a 2 - 5% increase. If you use something like null_blk or scsi_debug which can multi-million IOPs and hack it up so each device they export shows up as a path then you see 50%+ increases. Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:35 -05:00
Mickaël Salaün	4da8f8c8a1	dm verity: Add support for signature verification with 2nd keyring Add a new configuration DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING to enable dm-verity signatures to be verified against the secondary trusted keyring. Instead of relying on the builtin trusted keyring (with hard-coded certificates), the second trusted keyring can include certificate authorities from the builtin trusted keyring and child certificates loaded at run time. Using the secondary trusted keyring enables to use dm-verity disks (e.g. loop devices) signed by keys which did not exist at kernel build time, leveraging the certificate chain of trust model. In practice, this makes it possible to update certificates without kernel update and reboot, aligning with module and kernel (kexec) signature verification which already use the secondary trusted keyring. Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:35 -05:00
Jeffle Xu	985eabdcfe	dm: remove unnecessary current->bio_list check when submitting split bio The depth-first splitting is introduced in commit `18a25da843` ("dm: ensure bio submission follows a depth-first tree walk"), which is used to fix the potential deadlock in case of the misordering handling of bios caused by bio_list. There're two paths submitting split bios, dm_wq_work() from worker thread and submit_bio() from application. Back upon that time, dm_wq_work() thread calls __split_and_process_bio() directly and thus will not trigger this issue since bio_list doesn't exist here. So this issue will only be triggered from application calling submit_bio(), and the fix has to check if current->bio_list is non-NULL to distinguish this case. However since commit `0c2915b8c6` ("dm: fix missing imposition of queue_limits from dm_wq_work() thread"), dm_wq_work() thread calls submit_bio_noacct() and thus also uses bio_list. Since then all entries into __split_and_process_bio() are under protection of bio_list, and thus the checking of current->bio_list when determinning if the depth-first principle should be used, seems kind of nonsense. After all the checking always succeeds now. Fixes: `0c2915b8c6` ("dm: fix missing imposition of queue_limits from dm_wq_work() thread") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 18:04:35 -05:00
Mike Snitzer	bde3808bc8	dm: remove invalid sparse __acquires and __releases annotations Fixes sparse warnings: drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit Fixes: `971888c469` ("dm: hold DM table for duration of ioctl rather than use blkdev_get") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 15:25:18 -05:00
Mike Snitzer	f05c4403db	dm: fix double RCU unlock in dm_dax_zero_page_range() error path Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error path to fix sparse warning: drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock Fixes: `cdf6cdcd3b` ("dm,dax: Add dax zero_page_range operation") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-04 15:19:27 -05:00
Mike Snitzer	3ee16db390	dm: fix IO splitting Commit `882ec4e609` ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple regressions: 1) Using lcm_not_zero() when stacking chunk_sectors was a bug because chunk_sectors must reflect the most limited of all devices in the IO stack. 2) DM targets that set max_io_len but that do _not_ provide an .iterate_devices method no longer had there IO split properly. And commit `5091cdec56` ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per target) IO splitting. The implication being the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide performance limitations of a slower device (e.g. one that requires 4K IO splitting). Coming full circle: Fix all these issues by discontinuing stacking chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add optional chunk_sectors override argument to blk_max_size_offset() and update DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call. Passing in an optional chunk_sectors override to blk_max_size_offset() allows for code reuse of block's centralized calculation for max IO size based on provided offset and split boundary. Fixes: `882ec4e609` ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") Fixes: `5091cdec56` ("dm: change max_io_len() to use blk_max_size_offset()") Cc: stable@vger.kernel.org Reported-by: John Dorminy <jdorminy@redhat.com> Reported-by: Bruce Johnston <bjohnsto@redhat.com> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: John Dorminy <jdorminy@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk>	2020-12-04 14:53:15 -05:00
Christoph Hellwig	a54895fa05	block: remove the request_queue to argument request based tracepoints The request_queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-04 09:42:00 -07:00
Christoph Hellwig	1c02fca620	block: remove the request_queue argument to the block_bio_remap tracepoint The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-04 09:42:00 -07:00
Christoph Hellwig	eb6f7f7cd3	block: remove the request_queue argument to the block_split tracepoint The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-04 09:42:00 -07:00
Christoph Hellwig	977115c0f6	block: stop using bdget_disk for partition 0 We can just dereference the point in struct gendisk instead. Also remove the now unused export. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	8446fe9255	block: switch partition lookup to use struct block_device Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	cb8432d650	block: allocate struct hd_struct as part of struct bdev_inode Allocate hd_struct together with struct block_device to pre-load the lifetime rule changes in preparation of merging the two structures. Note that part0 was previously embedded into struct gendisk, but is a separate allocation now, and already points to the block_device instead of the hd_struct. The lifetime of struct gendisk is still controlled by the struct device embedded in the part0 hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	a782483cc1	block: remove the nr_sects field in struct hd_struct Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync. Additionally the field in hd_struct did not use proper serialization, possibly allowing for torn writes. By only using the block_device field this problem also gets fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	4e7b5671c6	block: remove i_bdev Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:39 -07:00
Christoph Hellwig	8d65269fe8	block: add a bdev_kobj helper Add a little helper to find the kobject for a struct block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:39 -07:00
Christoph Hellwig	b0519b5423	dm: remove the block_device reference in struct mapped_device Get rid of the long-lasting struct block_device reference in struct mapped_device. The only remaining user is the freeze code, where we can trivially look up the block device at freeze time and release the reference at thaw time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:39 -07:00
Christoph Hellwig	47d951023a	dm: simplify flush_bio initialization in __send_empty_flush We don't really need the struct block_device to initialize a bio. So switch from using bio_set_dev to manually setting up bi_disk (bi_partno will always be zero and has been cleared by bio_init already). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:39 -07:00
Christoph Hellwig	040f04bd2e	fs: simplify freeze_bdev/thaw_bdev Store the frozen superblock in struct block_device to avoid the awkward interface that can return a sb only used a cookie, an ERR_PTR or NULL. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:38 -07:00
Mike Snitzer	857c4c0a8b	dm writecache: remove BUG() and fail gracefully instead Building on arch/s390/ results in this build error: cc1: some warnings being treated as errors ../drivers/md/dm-writecache.c: In function 'persistent_memory_claim': ../drivers/md/dm-writecache.c:323:1: error: no return statement in function returning non-void [-Werror=return-type] Fix this by replacing the BUG() with an -EOPNOTSUPP return. Fixes: `48debafe4f` ("dm: add writecache target") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-01 15:43:39 -05:00
Thomas Gleixner	e7b624183d	dm table: Remove BUG_ON(in_interrupt()) The BUG_ON(in_interrupt()) in dm_table_event() is a historic leftover from a rework of the dm table code which changed the calling context. Issuing a BUG for a wrong calling context is frowned upon and in_interrupt() is deprecated and only covering parts of the wrong contexts. The sanity check for the context is covered by CONFIG_DEBUG_ATOMIC_SLEEP and other debug facilities already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-01 15:43:38 -05:00
Sergei Shtepa	8947833571	dm: fix bug with RCU locking in dm_blk_report_zones The dm_get_live_table() function makes RCU read lock so dm_put_live_table() must be called even if dm_table map is not found. Fixes: `e76239a374` ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-01 15:43:37 -05:00
Nick Desaulniers	35d2835d2a	Revert "dm cache: fix arm link errors with inline" This reverts commit `43aeaa2957`. Since commit `0bddd227f3` ("Documentation: update for gcc 4.9 requirement") the minimum supported version of GCC is gcc-4.9. It's now safe to remove this code. Link: https://github.com/ClangBuiltLinux/linux/issues/427 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-12-01 15:43:36 -05:00
Zhao Heming	bca5b06580	md/cluster: fix deadlock when node is doing resync job md-cluster uses MD_CLUSTER_SEND_LOCK to make node can exclusively send msg. During sending msg, node can concurrently receive msg from another node. When node does resync job, grab token_lockres:EX may trigger a deadlock: ``` nodeA nodeB -------------------- -------------------- a. send METADATA_UPDATED held token_lockres:EX b. md_do_sync resync_info_update send RESYNCING + set MD_CLUSTER_SEND_LOCK + wait for holding token_lockres:EX c. mdadm /dev/md0 --remove /dev/sdg + held reconfig_mutex + send REMOVE + wait_event(MD_CLUSTER_SEND_LOCK) d. recv_daemon //METADATA_UPDATED from A process_metadata_update + (mddev_trylock(mddev) \|\| MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD) //this time, both return false forever ``` Explaination: a. A send METADATA_UPDATED This will block another node to send msg b. B does sync jobs, which will send RESYNCING at intervals. This will be block for holding token_lockres:EX lock. c. B do "mdadm --remove", which will send REMOVE. This will be blocked by step <b>: MD_CLUSTER_SEND_LOCK is 1. d. B recv METADATA_UPDATED msg, which send from A in step <a>. This will be blocked by step <c>: holding mddev lock, it makes wait_event can't hold mddev lock. (btw, MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD keep ZERO in this scenario.) There is a similar deadlock in commit `0ba959774e` ("md-cluster: use sync way to handle METADATA_UPDATED msg") In that commit, step c is "update sb". This patch step c is "mdadm --remove". For fixing this issue, we can refer the solution of function: metadata_update_start. Which does the same grab lock_token action. lock_comm can use the same steps to avoid deadlock. By moving MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD from lock_token to lock_comm. It enlarge a little bit window of MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, but it is safe & can break deadlock. Repro steps (I only triggered 3 times with hundreds tests): two nodes share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB. ``` ssh root@node2 "mdadm -S --scan" mdadm -S --scan for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \ count=20; done mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh \ --bitmap-chunk=1M ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh" sleep 5 mkfs.xfs /dev/md0 mdadm --manage --add /dev/md0 /dev/sdi mdadm --wait /dev/md0 mdadm --grow --raid-devices=3 /dev/md0 mdadm /dev/md0 --fail /dev/sdg mdadm /dev/md0 --remove /dev/sdg mdadm --grow --raid-devices=2 /dev/md0 ``` test script will hung when executing "mdadm --remove". ``` # dump stacks by "echo t > /proc/sysrq-trigger" md0_cluster_rec D 0 5329 2 0x80004000 Call Trace: __schedule+0x1f6/0x560 ? _cond_resched+0x2d/0x40 ? schedule+0x4a/0xb0 ? process_metadata_update.isra.0+0xdb/0x140 [md_cluster] ? wait_woken+0x80/0x80 ? process_recvd_msg+0x113/0x1d0 [md_cluster] ? recv_daemon+0x9e/0x120 [md_cluster] ? md_thread+0x94/0x160 [md_mod] ? wait_woken+0x80/0x80 ? md_congested+0x30/0x30 [md_mod] ? kthread+0x115/0x140 ? __kthread_bind_mask+0x60/0x60 ? ret_from_fork+0x1f/0x40 mdadm D 0 5423 1 0x00004004 Call Trace: __schedule+0x1f6/0x560 ? __schedule+0x1fe/0x560 ? schedule+0x4a/0xb0 ? lock_comm.isra.0+0x7b/0xb0 [md_cluster] ? wait_woken+0x80/0x80 ? remove_disk+0x4f/0x90 [md_cluster] ? hot_remove_disk+0xb1/0x1b0 [md_mod] ? md_ioctl+0x50c/0xba0 [md_mod] ? wait_woken+0x80/0x80 ? blkdev_ioctl+0xa2/0x2a0 ? block_ioctl+0x39/0x40 ? ksys_ioctl+0x82/0xc0 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x5f/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 md0_resync D 0 5425 2 0x80004000 Call Trace: __schedule+0x1f6/0x560 ? schedule+0x4a/0xb0 ? dlm_lock_sync+0xa1/0xd0 [md_cluster] ? wait_woken+0x80/0x80 ? lock_token+0x2d/0x90 [md_cluster] ? resync_info_update+0x95/0x100 [md_cluster] ? raid1_sync_request+0x7d3/0xa40 [raid1] ? md_do_sync.cold+0x737/0xc8f [md_mod] ? md_thread+0x94/0x160 [md_mod] ? md_congested+0x30/0x30 [md_mod] ? kthread+0x115/0x140 ? __kthread_bind_mask+0x60/0x60 ? ret_from_fork+0x1f/0x40 ``` At last, thanks for Xiao's solution. Cc: stable@vger.kernel.org Signed-off-by: Zhao Heming <heming.zhao@suse.com> Suggested-by: Xiao Ni <xni@redhat.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:35 -08:00
Zhao Heming	a8da01f79c	md/cluster: block reshape with remote resync job Reshape request should be blocked with ongoing resync job. In cluster env, a node can start resync job even if the resync cmd isn't executed on it, e.g., user executes "mdadm --grow" on node A, sometimes node B will start resync job. However, current update_raid_disks() only check local recovery status, which is incomplete. As a result, we see user will execute "mdadm --grow" successfully on local, while the remote node deny to do reshape job when it doing resync job. The inconsistent handling cause array enter unexpected status. If user doesn't observe this issue and continue executing mdadm cmd, the array doesn't work at last. Fix this issue by blocking reshape request. When node executes "--grow" and detects ongoing resync, it should stop and report error to user. The following script reproduces the issue with ~100% probability. (two nodes share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB) ``` # on node1, node2 is the remote node. ssh root@node2 "mdadm -S --scan" mdadm -S --scan for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \ count=20; done mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh" sleep 5 mdadm --manage --add /dev/md0 /dev/sdi mdadm --wait /dev/md0 mdadm --grow --raid-devices=3 /dev/md0 mdadm /dev/md0 --fail /dev/sdg mdadm /dev/md0 --remove /dev/sdg mdadm --grow --raid-devices=2 /dev/md0 ``` Cc: stable@vger.kernel.org Signed-off-by: Zhao Heming <heming.zhao@suse.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:35 -08:00
Pankaj Gupta	a23f2aae84	md: use current request time as base for ktime comparisons Request coalescing logic uses 'prev_flush_start' as base to compare the current request start time. 'prev_flush_start' is updated in other context. This patch changes this by using ktime comparison base to 'req_start' for better readability of code. Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:35 -08:00
Pankaj Gupta	204d1a6434	md: add comments in md_flush_request() Request coalescing logic is dependent on flush time update in other context. This patch adds comments to understand the code flow better. Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:34 -08:00
Pankaj Gupta	81ba3c2462	md: improve variable names in md_flush_request() This patch improves readability by using better variable names in flush request coalescing logic. Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:34 -08:00
Kevin Vigor	93decc5636	md/raid10: initialize r10_bio->read_slot before use. In __make_request() a new r10bio is allocated and passed to raid10_read_request(). The read_slot member of the bio is not initialized, and the raid10_read_request() uses it to index an array. This leads to occasional panics. Fix by initializing the field to invalid value and checking for valid value in raid10_read_request(). Cc: stable@vger.kernel.org Signed-off-by: Kevin Vigor <kvigor@gmail.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:28 -08:00
Dae R. Jeong	c731b84b51	md: fix a warning caused by a race between concurrent md_ioctl()s Syzkaller reports a warning as belows. WARNING: CPU: 0 PID: 9647 at drivers/md/md.c:7169 ... Call Trace: ... RIP: 0010:md_ioctl+0x4017/0x5980 drivers/md/md.c:7169 RSP: 0018:ffff888096027950 EFLAGS: 00010293 RAX: ffff88809322c380 RBX: 0000000000000932 RCX: ffffffff84e266f2 RDX: 0000000000000000 RSI: ffffffff84e299f7 RDI: 0000000000000007 RBP: ffff888096027bc0 R08: ffff88809322c380 R09: ffffed101341a482 R10: ffff888096027940 R11: ffff88809a0d240f R12: 0000000000000932 R13: ffff8880a2c14100 R14: ffff88809a0d2268 R15: ffff88809a0d2408 __blkdev_driver_ioctl block/ioctl.c:304 [inline] blkdev_ioctl+0xece/0x1c10 block/ioctl.c:606 block_ioctl+0xee/0x130 fs/block_dev.c:1930 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is caused by a race between two concurrenct md_ioctl()s closing the array. CPU1 (md_ioctl()) CPU2 (md_ioctl()) ------ ------ set_bit(MD_CLOSING, &mddev->flags); did_set_md_closing = true; WARN_ON_ONCE(test_bit(MD_CLOSING, &mddev->flags)); if(did_set_md_closing) clear_bit(MD_CLOSING, &mddev->flags); Fix the warning by returning immediately if the MD_CLOSING bit is set in &mddev->flags which indicates that the array is being closed. Fixes: `065e519e71` ("md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop") Reported-by: syzbot+1e46a0864c1a6e9bd3d8@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Dae R. Jeong <dae.r.jeong@kaist.ac.kr> Signed-off-by: Song Liu <songliubraving@fb.com>	2020-11-30 10:12:18 -08:00
Mikulas Patocka	67aa3ec3db	dm writecache: fix the maximum number of arguments Advance the maximum number of arguments to 16. This fixes issue where certain operations, combined with table configured args, exceed 10 arguments. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: `48debafe4f` ("dm: add writecache target") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-11-17 10:45:06 -05:00
Mikulas Patocka	e5d41cbca1	dm writecache: advance the number of arguments when reporting max_age When reporting the "max_age" value the number of arguments must advance by two. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: `3923d4854e` ("dm writecache: implement gradual cleanup") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-11-17 10:45:06 -05:00
Mikulas Patocka	a7a10bce8a	dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY Don't use crypto drivers that have the flag CRYPTO_ALG_ALLOCATES_MEMORY set. These drivers allocate memory and thus they are not suitable for block I/O processing. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-11-17 10:45:06 -05:00
Christoph Hellwig	94d91e7f8c	md: remove a spurious call to revalidate_disk_size in update_size None of the ->resize methods updates the disk size, so calling revalidate_disk_size here won't do anything. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-11-16 08:34:15 -07:00

1 2 3 4 5 ...

6633 Коммитов