WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Li Zetao	c8659bbb15	ublk: Fix signedness bug returning warning There are two warnings reported by smatch: drivers/block/ublk_drv.c:445 ublk_setup_iod_zoned() warn: signedness bug returning '(-95)' drivers/block/ublk_drv.c:963 ublk_setup_iod() warn: signedness bug returning '(-5)' The type of "blk_status_t" is either be a u32 or u8, but this two functions return a negative value when not supported or failed. Use the error code of the blk module to fix these warnings. Fixes: `29802d7ca3` ("ublk: enable zoned storage support") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202308100201.TCRhgdvN-lkp@intel.com/ Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230810084836.3535322-1-lizetao1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-10 07:19:23 -06:00
Jinyoung Choi	0ece1d649b	bio-integrity: create multi-page bvecs in bio_integrity_add_page() In general, the bvec data structure consists of one for physically continuous pages. But, in the bvec configuration for bip, physically continuous integrity pages are composed of each bvec. Allow bio_integrity_add_page() to create multi-page bvecs, just like the bio payloads. This simplifies adding larger payloads, and fixes support for non-tiny workloads with nvme, which stopped using scatterlist for metadata a while ago. Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Fixes: `783b94bd92` ("nvme-pci: do not build a scatterlist to map metadata") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Tested-by: "Martin K. Petersen" <martin.petersen@oracle.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230803025202epcms2p82f57cbfe32195da38c776377b55aed59@epcms2p8 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-09 16:05:35 -06:00
Jinyoung Choi	d1f04c2e23	bio-integrity: cleanup adding integrity pages to bip's bvec. bio_integrity_add_page() returns the add length if successful, else 0, just as bio_add_page. Simply check return value checking in bio_integrity_prep to not deal with a > 0 but < len case that can't happen. Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Tested-by: "Martin K. Petersen" <martin.petersen@oracle.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230803025058epcms2p5a4d0db5da2ad967668932d463661c633@epcms2p5 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-09 16:05:35 -06:00
Jinyoung Choi	80814b8e35	bio-integrity: update the payload size in bio_integrity_add_page() Previously, the bip's bi_size has been set before an integrity pages were added. If a problem occurs in the process of adding pages for bip, the bi_size mismatch problem must be dealt with. When the page is successfully added to bvec, the bi_size is updated. The parts affected by the change were also contained in this commit. Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Tested-by: "Martin K. Petersen" <martin.petersen@oracle.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230803024956epcms2p38186a17392706650c582d38ef3dbcd32@epcms2p3 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-09 16:05:35 -06:00
Jinyoung Choi	7c8998f75d	block: make bvec_try_merge_hw_page() non-static This will be used for multi-page configuration for integrity payload. Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Tested-by: "Martin K. Petersen" <martin.petersen@oracle.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230803024827epcms2p838d9e9131492c86a159fff25d195658f@epcms2p8 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-09 16:05:35 -06:00
Zhiguo Niu	d47f9717e5	block/mq-deadline: use correct way to throttling write requests The original formula was inaccurate: dd->async_depth = max(1UL, 3 * q->nr_requests / 4); For write requests, when we assign a tags from sched_tags, data->shallow_depth will be passed to sbitmap_find_bit, see the following code: nr = sbitmap_find_bit_in_word(&sb->map[index], min_t (unsigned int, __map_depth(sb, index), depth), alloc_hint, wrap); The smaller of data->shallow_depth and __map_depth(sb, index) will be used as the maximum range when allocating bits. For a mmc device (one hw queue, deadline I/O scheduler): q->nr_requests = sched_tags = 128, so according to the previous calculation method, dd->async_depth = data->shallow_depth = 96, and the platform is 64bits with 8 cpus, sched_tags.bitmap_tags.sb.shift=5, sb.maps[]=32/32/32/32, 32 is smaller than 96, whether it is a read or a write I/O, tags can be allocated to the maximum range each time, which has not throttling effect. In addition, refer to the methods of bfg/kyber I/O scheduler, limit ratiois are calculated base on sched_tags.bitmap_tags.sb.shift. This patch can throttle write requests really. Fixes: `07757588e5` ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/1691061162-22898-1-git-send-email-zhiguo.niu@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:46:41 -06:00
Andreas Hindborg	29802d7ca3	ublk: enable zoned storage support Add zoned storage support to ublk: report_zones and operations: - REQ_OP_ZONE_OPEN - REQ_OP_ZONE_CLOSE - REQ_OP_ZONE_FINISH - REQ_OP_ZONE_RESET - REQ_OP_ZONE_APPEND The zone append feature uses the `addr` field of `struct ublksrv_io_cmd` to communicate ALBA back to the kernel. Therefore ublk must be used with the user copy feature (UBLK_F_USER_COPY) for zoned storage support to be available. Without this feature, ublk will not allow zoned storage support. Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230804114610.179530-4-nmi@metaspace.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:45:53 -06:00
Andreas Hindborg	1a6e88b959	ublk: move check for empty address field on command submission In preparation for zoned storage support, move the check for empty `addr` field into the command handler case statement. Note that the check makes no sense for `UBLK_IO_NEED_GET_DATA` because the `addr` field must always be set for this command. Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230804114610.179530-3-nmi@metaspace.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:45:53 -06:00
Andreas Hindborg	9d4ed6d462	ublk: add helper to check if device supports user copy This will be used by ublk zoned storage support. Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230804114610.179530-2-nmi@metaspace.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:45:53 -06:00
Chengming Zhou	68392b0020	iocost_monitor: improve it by adding iocg wait_ms The iocg can have three throttled metrics: wait, debt, delay. This patch add missing wait_ms to IocgStat to show the latest wait_ms of iocg. As we are here, group iocg usage percents "inflt%" and "usage%" together, and group iocg throttled metrics "wait", "debt" and "delay" together. Effect after changes: nvme0n1 RUN per=50.0ms cur_per=177105.713:v1053528.587 busy= +0 vrate=135.00%:270.00% params=ssd_dfl(CQ) active weight hweight% inflt% usage% wait debt delay InterfererGroup0 * 100/ 100 54.28/ 9.09 0.34 24.07 0.00 0.00 0.00 interfered * 84/ 1000 45.72/ 90.91 0.48 41.09 0.00 0.00 0.00 Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230804065039.8885-3-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:43:03 -06:00
Chengming Zhou	8e93c1acd1	iocost_monitor: print vrate inuse along with base_vrate The real vrate iocost inuse is not base_vrate, but the atomic vtime_rate. We need iocost_monitor tool to display this real vrate that iocost use, to check if the boosted compensated vrate is normal. Effect after change: nvme0n1 RUN per=50.0ms cur_per=172116.580:v1040587.433 busy= +0 \ vrate=135.00%:270.00% params=ssd_dfl(CQ) ^ \| this is real vrate inuse Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20230804065039.8885-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:43:03 -06:00
Chengming Zhou	2eae9c4912	iocost_monitor: fix kernel queue kobj changes When I use iocost_monitor on nvme0n1, this error shows up: "Could not find ioc for nvme0n1" There is no kobj in struct queue in recent kernel, it seems that the commit `2bd85221a6` ("block: untangle request_queue refcounting from sysfs") move the queue kobj to struct gendisk. Fix it by using mq_kobj which is at the same level with queue kobj. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20230804065039.8885-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-08 15:43:03 -06:00
Li Zetao	a24c8b5111	fs/Kconfig: Fix compile error for romfs There are some compile errors reported by kernel test robot: arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_read': storage.c:(.text+0x64): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x9c): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strnlen': storage.c:(.text+0x128): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x16c): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strcmp': storage.c:(.text+0x22c): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: storage.c:(.text+0x27c): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x2a8): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: storage.c:(.text+0x2bc): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x2d4): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x2f4): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x304): undefined reference to `__brelse' The reason for the problem is that the commit "925c86a19bac" ("fs: add CONFIG_BUFFER_HEAD") has added a new config "CONFIG_BUFFER_HEAD" that controls building the buffer_head code, and romfs needs to use the buffer_head API, but no corresponding config has beed added. Select the config "CONFIG_BUFFER_HEAD" in romfs Kconfig to resolve the problem. Fixes: `925c86a19b` ("fs: add CONFIG_BUFFER_HEAD") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308031810.pQzGmR1v-lkp@intel.com/ Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> [axboe: fold in Christoph's incremental] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-04 11:04:05 -06:00
Christoph Hellwig	925c86a19b	fs: add CONFIG_BUFFER_HEAD Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-02 09:13:09 -06:00
Christoph Hellwig	487c607df7	block: use iomap for writes to block devices Use iomap in buffer_head compat mode to write to block devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-02 09:13:09 -06:00
Christoph Hellwig	a05f7bd957	block: stop setting ->direct_IO Direct I/O on block devices now nevers goes through aops->direct_IO. Stop setting it and set the FMODE_CAN_ODIRECT in ->open instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-02 09:13:09 -06:00
Christoph Hellwig	727cfe9767	block: open code __generic_file_write_iter for blkdev writes Open code __generic_file_write_iter to remove the indirect call into ->direct_IO and to prepare using the iomap based write code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-02 09:13:09 -06:00
Christoph Hellwig	2ba39cc46b	fs: rename and move block_page_mkwrite_return block_page_mkwrite_return is neither block nor mkwrite specific, and should not be under CONFIG_BLOCK. Move it to mm.h and rename it to vmf_fs_error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-02 09:13:09 -06:00
Christoph Hellwig	4a8b719f95	fs: remove emergency_thaw_bdev Fold emergency_thaw_bdev into it's only caller, to prepare for buffer.c to be built only when buffer_head support is enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-08-02 09:13:09 -06:00
Jens Axboe	d276bb2910	Merge tag 'md-next-20230729' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.6/block Pull MD updates from Song: "1. Deprecate bitmap file support, by Christoph Hellwig; 2. Fix deadlock with md sync thread, by Yu Kuai; 3. Refactor md io accounting, by Yu Kuai; 4. Various non-urgent fixes by Li Nan, Yu Kuai, and Jack Wang." * tag 'md-next-20230729' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (36 commits) md/md-bitmap: hold 'reconfig_mutex' in backlog_store() md/md-bitmap: remove unnecessary local variable in backlog_store() md/raid10: use dereference_rdev_and_rrdev() to get devices md/raid10: factor out dereference_rdev_and_rrdev() md/raid10: check replacement and rdev to prevent submit the same io twice md/raid1: Avoid lock contention from wake_up() md: restore 'noio_flag' for the last mddev_resume() md: don't quiesce in mddev_suspend() md: remove redundant check in fix_read_error() md/raid10: optimize fix_read_error md/raid1: prioritize adding disk to 'removed' mirror md/md-faulty: enable io accounting md/md-linear: enable io accounting md/md-multipath: enable io accounting md/raid10: switch to use md_account_bio() for io accounting md/raid1: switch to use md_account_bio() for io accounting raid5: fix missing io accounting in raid5_align_endio() md: also clone new io if io accounting is disabled md: move initialization and destruction of 'io_acct_set' to md.c md: deprecate bitmap file support ...	2023-07-29 07:17:50 -06:00
Yu Kuai	44abfa6a95	md/md-bitmap: hold 'reconfig_mutex' in backlog_store() Several reasons why 'reconfig_mutex' should be held: 1) rdev_for_each() is not safe to be called without the lock, because rdev can be removed concurrently. 2) mddev_destroy_serial_pool() and mddev_create_serial_pool() should not be called concurrently. 3) mddev_suspend() from mddev_destroy/create_serial_pool() should be protected by the lock. Fixes: `10c92fca63` ("md-bitmap: create and destroy wb_info_pool with the change of backlog") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230706083727.608914-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Yu Kuai	b4d129640f	md/md-bitmap: remove unnecessary local variable in backlog_store() Local variable is definied first in the beginning of backlog_store(), there is no need to define it again. Fixes: `8c13ab115b` ("md/bitmap: don't set max_write_behind if there is no write mostly device") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230706083727.608914-2-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Li Nan	673643490b	md/raid10: use dereference_rdev_and_rrdev() to get devices Commit `2ae6aaf769` ("md/raid10: fix io loss while replacement replace rdev") reads replacement first to prevent io loss. However, there are same issue in wait_blocked_dev() and raid10_handle_discard(), too. Fix it by using dereference_rdev_and_rrdev() to get devices. Fixes: `d30588b273` ("md/raid10: improve raid10 discard request") Fixes: `f2e7e269a7` ("md/raid10: pull the code that wait for blocked dev into one function") Signed-off-by: Li Nan <linan122@huawei.com> Link: https://lore.kernel.org/r/20230701080529.2684932-4-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Li Nan	b99f8fd2d9	md/raid10: factor out dereference_rdev_and_rrdev() Factor out a helper to get 'rdev' and 'replacement' from config->mirrors. Just to make code cleaner and prepare to fix the bug of io loss while 'replacement' replace 'rdev'. There is no functional change. Signed-off-by: Li Nan <linan122@huawei.com> Link: https://lore.kernel.org/r/20230701080529.2684932-3-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Li Nan	7e85c41b9e	md/raid10: check replacement and rdev to prevent submit the same io twice After commit `4ca40c2ce0` ("md/raid10: Allow replacement device to be replace old drive."), 'rdev' and 'replacement' could appear to be identical. There are already checks for that in wait_blocked_dev() and raid10_write_request(). Add check for raid10_handle_discard() now. Signed-off-by: Li Nan <linan122@huawei.com> Link: https://lore.kernel.org/r/20230701080529.2684932-2-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Jack Wang	21bd9a68fe	md/raid1: Avoid lock contention from wake_up() wake_up is called unconditionally in a few paths such as make_request(), which cause lock contention under high concurrency workload like below raid1_end_write_request wake_up __wake_up_common_lock spin_lock_irqsave Improve performance by only call wake_up() if waitqueue is not empty Fio test script: [global] name=random reads and writes ioengine=libaio direct=1 readwrite=randrw rwmixread=70 iodepth=64 buffered=0 filename=/dev/md0 size=1G runtime=30 time_based randrepeat=0 norandommap refill_buffers ramp_time=10 bs=4k numjobs=400 group_reporting=1 [job1] Test result with 2 ramdisk in raid1 on a Intel Broadwell 56 cores server. Before this patch With this patch READ BW=4621MB/s BW=7337MB/s WRITE BW=1980MB/s BW=3144MB/s The patch is inspired by Yu Kuai's change for raid10: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230705113227.148494-1-jinpu.wang@ionos.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Yu Kuai	e24ed04389	md: restore 'noio_flag' for the last mddev_resume() memalloc_noio_save() is called for the first mddev_suspend(), and repeated mddev_suspend() only increase 'suspended'. However, memalloc_noio_restore() is also called for the first mddev_resume(), which means that memory reclaim will be enabled before the last mddev_resume() is called, while the array is still suspended. Fix this problem by restore 'noio_flag' for the last mddev_resume(). Fixes: `78f57ef9d5` ("md: use memalloc scope APIs in mddev_suspend()/mddev_resume()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230628012931.88911-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Yu Kuai	b39f35ebe8	md: don't quiesce in mddev_suspend() Some levels doesn't implement "pers->quiesce", for example raid0_quiesce() is empty, and now that all levels will drop 'active_io' until io is done, wait for 'active_io' to be 0 is enough to make sure all normal io is done, and percpu_ref_kill() for 'active_io' will make sure no new normal io can be dispatched. There is no need to call "pers->quiesce" anymore from mddev_suspend(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230628012931.88911-2-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Li Nan	02c67a3b72	md: remove redundant check in fix_read_error() In fix_read_error(), 'success' will be checked immediately after assigning it, if it is set to 1 then the loop will break. Checking it again in condition of loop is redundant. Clean it up. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230623173236.2513554-3-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Li Nan	605eeda6e7	md/raid10: optimize fix_read_error We dereference r10_bio->read_slot too many times in fix_read_error(). Optimize it by using a variable to store read_slot. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230623173236.2513554-2-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Li Nan	ffb1e7a03f	md/raid1: prioritize adding disk to 'removed' mirror New disk should be added to "removed" position first instead of to be a replacement. Commit `6090368abc` ("md/raid10: prioritize adding disk to 'removed' mirror") has fixed this issue for raid10. Fix it for raid1 now. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230627014332.3810102-1-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>	2023-07-27 00:13:30 -07:00
Yu Kuai	dd9a686014	md/md-faulty: enable io accounting use md_account_bio() to enable io accounting, also make sure mddev_suspend() will wait for all io to be done. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-9-yukuai1@huaweicloud.com	2023-07-27 00:13:30 -07:00
Yu Kuai	09f43cb530	md/md-linear: enable io accounting use md_account_bio() to enable io accounting, also make sure mddev_suspend() will wait for all io to be done. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-8-yukuai1@huaweicloud.com	2023-07-27 00:13:30 -07:00
Yu Kuai	bdf2b52136	md/md-multipath: enable io accounting use md_account_bio() to enable io accounting, also make sure mddev_suspend() will wait for all io to be done. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-7-yukuai1@huaweicloud.com	2023-07-27 00:13:29 -07:00
Yu Kuai	8204552383	md/raid10: switch to use md_account_bio() for io accounting Make sure that 'active_io' will represent inflight io instead of io that is dispatching, and io accounting from all levels will be consistent. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-6-yukuai1@huaweicloud.com	2023-07-27 00:13:29 -07:00
Yu Kuai	bb2a9acefa	md/raid1: switch to use md_account_bio() for io accounting Two problems can be fixed this way: 1) 'active_io' will represent inflight io instead of io that is dispatching. 2) If io accounting is enabled or disabled while io is still inflight, bio_start_io_acct() and bio_end_io_acct() is not balanced and io inflight counter will be leaked. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-5-yukuai1@huaweicloud.com	2023-07-27 00:13:29 -07:00
Yu Kuai	05048cbcca	raid5: fix missing io accounting in raid5_align_endio() Io will only be accounted as done from raid5_align_endio() if the io succeeded, and io inflight counter will be leaked if such io failed. Fix this problem by switching to use md_account_bio() for io accounting. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-4-yukuai1@huaweicloud.com	2023-07-27 00:13:29 -07:00
Yu Kuai	c687297b88	md: also clone new io if io accounting is disabled Currently, 'active_io' is grabbed before make_reqeust() is called, and it's dropped immediately make_reqeust() returns. Hence 'active_io' actually means io is dispatching, not io is inflight. For raid0 and raid456 that io accounting is enabled, 'active_io' will also be grabbed when bio is cloned for io accounting, and this 'active_io' is dropped until io is done. Always clone new bio so that 'active_io' will mean that io is inflight, raid1 and raid10 will switch to use this method in later patches. Now that bio will be cloned even if io accounting is disabled, also rename related structure from '_acct_' to '_clone_'. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-3-yukuai1@huaweicloud.com	2023-07-27 00:13:29 -07:00
Yu Kuai	c567c86b90	md: move initialization and destruction of 'io_acct_set' to md.c 'io_acct_set' is only used for raid0 and raid456, prepare to use it for raid1 and raid10, so that io accounting from different levels can be consistent. By the way, follow up patches will also use this io clone mechanism to make sure 'active_io' represents in flight io, not io that is dispatching, so that mddev_suspend will wait for io to be done as designed. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621165110.1498313-2-yukuai1@huaweicloud.com	2023-07-27 00:13:29 -07:00
Christoph Hellwig	0ae1c9d384	md: deprecate bitmap file support The support for bitmaps on files is a very bad idea abusing various kernel APIs, and fundamentally requires the file to not be on the actual array without a way to check that this is actually the case. Add a deprecation warning to see if we might be able to eventually drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-12-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	a34d4ef82c	md: make bitmap file support optional The support for write intent bitmaps in files on an external files in md is a hot mess that abuses ->bmap to map file offsets into physical device objects, and also abuses buffer_heads in a creative way. Make this code optional so that MD can be built into future kernels without buffer_head support, and so that we can eventually deprecate it. Note this does not affect the internal bitmap support, which has none of the problems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-11-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	d7038f9518	md-bitmap: don't use ->index for pages backing the bitmap file The md driver allocates pages for storing the bitmap file data, which are not page cache pages, and then stores the page granularity file offset in page->index, which is a field that isn't really valid except for page cache pages. Use a separate index for the superblock, and use the scheme used at read size to recalculate the index for the bitmap pages instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-10-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	f5f2d5ac9f	md-bitmap: account for mddev->bitmap_info.offset in read_sb_page Diretly apply mddev->bitmap_info.offset to the sector number to read instead of doing that in both callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-9-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	0c3ea5cc8f	md-bitmap: cleanup read_sb_page Convert read_sb_page to the normal kernel coding style, calculate the target sector only once, and add a local iosize variable to make the call to sync_page_io more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-8-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	844dc6691a	md-bitmap: refactor md_bitmap_init_from_disk Split the confusing loop in md_bitmap_init_from_disk that iterates over all chunks but also needs to read and map the pages into three separate loops: one that iterates over the pages to read them, a second optional one to iterate over the pages to mark them invalid if the bitmaps are out of date, and a final one that actually iterates over the chunks. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306160552.smw0qbmb-lkp@intel.com/ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-7-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	d681054c2f	md-bitmap: rename read_page to read_file_page Make the difference to read_sb_page clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-6-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	5339178e53	md-bitmap: split file writes into a separate helper Split the file write code out of write_page into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-5-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	92348518f2	md-bitmap: use %pD to print the file name in md_bitmap_file_kick Don't bother allocating an extra buffer in the I/O failure handler and instead use the printk built-in format to print the last 4 path name components. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-4-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	546ac0b2e2	md-bitmap: initialize variables at declaration time in md_bitmap_file_unmap Just a small tidyup to prepare for bigger changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-3-hch@lst.de	2023-07-27 00:13:29 -07:00
Christoph Hellwig	59cefee75b	md-bitmap: set BITMAP_WRITE_ERROR in write_sb_page Set BITMAP_WRITE_ERROR directly in write_sb_page instead of propagating the error to the caller and setting it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230615064840.629492-2-hch@lst.de	2023-07-27 00:13:29 -07:00

1 2 3 4 5 ...

1200155 Коммитов Все ветки Поиск

1200155 Коммитов

Все ветки