WSL2-Linux-Kernel/drivers/scsi/ufs/ufshpb.h

321 строка
8.6 KiB
C
Исходник Обычный вид История

scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Universal Flash Storage Host Performance Booster
*
* Copyright (C) 2017-2021 Samsung Electronics Co., Ltd.
*
* Authors:
* Yongmyung Lee <ymhungry.lee@samsung.com>
* Jinyoung Choi <j-young.choi@samsung.com>
*/
#ifndef _UFSHPB_H_
#define _UFSHPB_H_
/* hpb response UPIU macro */
#define HPB_RSP_NONE 0x0
#define HPB_RSP_REQ_REGION_UPDATE 0x1
#define HPB_RSP_DEV_RESET 0x2
#define MAX_ACTIVE_NUM 2
#define MAX_INACTIVE_NUM 2
#define DEV_DATA_SEG_LEN 0x14
#define DEV_SENSE_SEG_LEN 0x12
#define DEV_DES_TYPE 0x80
#define DEV_ADDITIONAL_LEN 0x10
/* hpb map & entries macro */
#define HPB_RGN_SIZE_UNIT 512
#define HPB_ENTRY_BLOCK_SIZE 4096
#define HPB_ENTRY_SIZE 0x8
#define PINNED_NOT_SET U32_MAX
/* hpb support chunk size */
#define HPB_LEGACY_CHUNK_HIGH 1
#define HPB_MULTI_CHUNK_HIGH 255
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
/* hpb vender defined opcode */
#define UFSHPB_READ 0xF8
#define UFSHPB_READ_BUFFER 0xF9
#define UFSHPB_READ_BUFFER_ID 0x01
#define UFSHPB_WRITE_BUFFER 0xFA
#define UFSHPB_WRITE_BUFFER_INACT_SINGLE_ID 0x01
#define UFSHPB_WRITE_BUFFER_PREFETCH_ID 0x02
#define UFSHPB_WRITE_BUFFER_INACT_ALL_ID 0x03
#define HPB_WRITE_BUFFER_CMD_LENGTH 10
#define MAX_HPB_READ_ID 0x7F
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
#define HPB_READ_BUFFER_CMD_LENGTH 10
#define LU_ENABLED_HPB_FUNC 0x02
#define HPB_RESET_REQ_RETRIES 10
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
#define HPB_MAP_REQ_RETRIES 5
#define HPB_REQUEUE_TIME_MS 0
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
#define HPB_SUPPORT_VERSION 0x200
#define HPB_SUPPORT_LEGACY_VERSION 0x100
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
enum UFSHPB_MODE {
HPB_HOST_CONTROL,
HPB_DEVICE_CONTROL,
};
enum UFSHPB_STATE {
HPB_INIT = 0,
HPB_PRESENT = 1,
HPB_SUSPEND,
HPB_FAILED,
HPB_RESET,
};
enum HPB_RGN_STATE {
HPB_RGN_INACTIVE,
HPB_RGN_ACTIVE,
/* pinned regions are always active */
HPB_RGN_PINNED,
};
enum HPB_SRGN_STATE {
HPB_SRGN_UNUSED,
HPB_SRGN_INVALID,
HPB_SRGN_VALID,
HPB_SRGN_ISSUED,
};
/**
* struct ufshpb_lu_info - UFSHPB logical unit related info
* @num_blocks: the number of logical block
* @pinned_start: the start region number of pinned region
* @num_pinned: the number of pinned regions
* @max_active_rgns: maximum number of active regions
*/
struct ufshpb_lu_info {
int num_blocks;
int pinned_start;
int num_pinned;
int max_active_rgns;
};
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
struct ufshpb_map_ctx {
struct page **m_page;
unsigned long *ppn_dirty;
};
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
struct ufshpb_subregion {
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
struct ufshpb_map_ctx *mctx;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
enum HPB_SRGN_STATE srgn_state;
int rgn_idx;
int srgn_idx;
bool is_last;
/* subregion reads - for host mode */
unsigned int reads;
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
/* below information is used by rsp_list */
struct list_head list_act_srgn;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
};
struct ufshpb_region {
struct ufshpb_lu *hpb;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
struct ufshpb_subregion *srgn_tbl;
enum HPB_RGN_STATE rgn_state;
int rgn_idx;
int srgn_cnt;
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
/* below information is used by rsp_list */
struct list_head list_inact_rgn;
/* below information is used by lru */
struct list_head list_lru_rgn;
unsigned long rgn_flags;
#define RGN_FLAG_DIRTY 0
#define RGN_FLAG_UPDATE 1
/* region reads - for host mode */
spinlock_t rgn_lock;
unsigned int reads;
/* region "cold" timer - for host mode */
ktime_t read_timeout;
unsigned int read_timeout_expiries;
struct list_head list_expired_rgn;
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
};
#define for_each_sub_region(rgn, i, srgn) \
for ((i) = 0; \
((i) < (rgn)->srgn_cnt) && ((srgn) = &(rgn)->srgn_tbl[i]); \
(i)++)
/**
* struct ufshpb_req - HPB related request structure (write/read buffer)
* @req: block layer request structure
* @bio: bio for this request
* @hpb: ufshpb_lu structure that related to
* @list_req: ufshpb_req mempool list
* @sense: store its sense data
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
* @mctx: L2P map information
* @rgn_idx: target region index
* @srgn_idx: target sub-region index
* @lun: target logical unit number
* @m_page: L2P map information data for pre-request
* @len: length of host-side cached L2P map in m_page
* @lpn: start LPN of L2P map in m_page
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
*/
struct ufshpb_req {
struct request *req;
struct bio *bio;
struct ufshpb_lu *hpb;
struct list_head list_req;
union {
struct {
struct ufshpb_map_ctx *mctx;
unsigned int rgn_idx;
unsigned int srgn_idx;
unsigned int lun;
} rb;
struct {
struct page *m_page;
unsigned int len;
unsigned long lpn;
} wb;
};
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
};
struct victim_select_info {
struct list_head lh_lru_rgn; /* LRU list of regions */
int max_lru_active_cnt; /* supported hpb #region - pinned #region */
atomic_t active_cnt;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
};
/**
* ufshpb_params - ufs hpb parameters
* @requeue_timeout_ms - requeue threshold of wb command (0x2)
* @activation_thld - min reads [IOs] to activate/update a region
* @normalization_factor - shift right the region's reads
* @eviction_thld_enter - min reads [IOs] for the entering region in eviction
* @eviction_thld_exit - max reads [IOs] for the exiting region in eviction
* @read_timeout_ms - timeout [ms] from the last read IO to the region
* @read_timeout_expiries - amount of allowable timeout expireis
* @timeout_polling_interval_ms - frequency in which timeouts are checked
* @inflight_map_req - number of inflight map requests
*/
struct ufshpb_params {
unsigned int requeue_timeout_ms;
unsigned int activation_thld;
unsigned int normalization_factor;
unsigned int eviction_thld_enter;
unsigned int eviction_thld_exit;
unsigned int read_timeout_ms;
unsigned int read_timeout_expiries;
unsigned int timeout_polling_interval_ms;
unsigned int inflight_map_req;
};
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
struct ufshpb_stats {
u64 hit_cnt;
u64 miss_cnt;
u64 rb_noti_cnt;
u64 rb_active_cnt;
u64 rb_inactive_cnt;
u64 map_req_cnt;
u64 pre_req_cnt;
u64 umap_req_cnt;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
};
struct ufshpb_lu {
int lun;
struct scsi_device *sdev_ufs_lu;
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
spinlock_t rgn_state_lock; /* for protect rgn/srgn state */
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
struct ufshpb_region *rgn_tbl;
atomic_t hpb_state;
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
spinlock_t rsp_list_lock;
struct list_head lh_act_srgn; /* hold rsp_list_lock */
struct list_head lh_inact_rgn; /* hold rsp_list_lock */
/* pre request information */
struct ufshpb_req *pre_req;
int num_inflight_pre_req;
int throttle_pre_req;
int num_inflight_map_req; /* hold param_lock */
spinlock_t param_lock;
struct list_head lh_pre_req_free;
int pre_req_max_tr_len;
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
/* cached L2P map management worker */
struct work_struct map_work;
/* for selecting victim */
struct victim_select_info lru_info;
struct work_struct ufshpb_normalization_work;
struct delayed_work ufshpb_read_to_work;
unsigned long work_data_bits;
#define TIMEOUT_WORK_RUNNING 0
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
/* pinned region information */
u32 lu_pinned_start;
u32 lu_pinned_end;
/* HPB related configuration */
u32 rgns_per_lu;
u32 srgns_per_lu;
u32 last_srgn_entries;
int srgns_per_rgn;
u32 srgn_mem_size;
u32 entries_per_rgn_mask;
u32 entries_per_rgn_shift;
u32 entries_per_srgn;
u32 entries_per_srgn_mask;
u32 entries_per_srgn_shift;
u32 pages_per_srgn;
bool is_hcm;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
struct ufshpb_stats stats;
struct ufshpb_params params;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
struct kmem_cache *map_req_cache;
struct kmem_cache *m_page_cache;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
struct list_head list_hpb_lu;
};
struct ufs_hba;
struct ufshcd_lrb;
#ifndef CONFIG_SCSI_UFS_HPB
static int ufshpb_prep(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) { return 0; }
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
static void ufshpb_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) {}
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
static void ufshpb_resume(struct ufs_hba *hba) {}
static void ufshpb_suspend(struct ufs_hba *hba) {}
static void ufshpb_reset(struct ufs_hba *hba) {}
static void ufshpb_reset_host(struct ufs_hba *hba) {}
static void ufshpb_init(struct ufs_hba *hba) {}
static void ufshpb_init_hpb_lu(struct ufs_hba *hba, struct scsi_device *sdev) {}
static void ufshpb_destroy_lu(struct ufs_hba *hba, struct scsi_device *sdev) {}
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
static void ufshpb_remove(struct ufs_hba *hba) {}
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
static bool ufshpb_is_allowed(struct ufs_hba *hba) { return false; }
static void ufshpb_get_geo_info(struct ufs_hba *hba, u8 *geo_buf) {}
static void ufshpb_get_dev_info(struct ufs_hba *hba, u8 *desc_buf) {}
static bool ufshpb_is_legacy(struct ufs_hba *hba) { return false; }
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
#else
int ufshpb_prep(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
void ufshpb_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
void ufshpb_resume(struct ufs_hba *hba);
void ufshpb_suspend(struct ufs_hba *hba);
void ufshpb_reset(struct ufs_hba *hba);
void ufshpb_reset_host(struct ufs_hba *hba);
void ufshpb_init(struct ufs_hba *hba);
void ufshpb_init_hpb_lu(struct ufs_hba *hba, struct scsi_device *sdev);
void ufshpb_destroy_lu(struct ufs_hba *hba, struct scsi_device *sdev);
scsi: ufs: ufshpb: L2P map management for HPB read Implement L2P map management in HPB. The HPB divides logical addresses into several regions. A region consists of several sub-regions. The sub-region is a basic unit where L2P mapping is managed. The driver loads L2P mapping data of each sub-region. The loaded sub-region is called active-state. The HPB driver unloads L2P mapping data as region unit. The unloaded region is called inactive-state. Sub-region/region candidates to be loaded and unloaded are delivered from the UFS device. The UFS device delivers the recommended active sub-region and inactivate region to the driver using sense data. The HPB module performs L2P mapping management on the host through the delivered information. A pinned region is a preset region on the UFS device that is always in activate-state. The data structures for map data requests and L2P mappings use the mempool API, minimizing allocation overhead while avoiding static allocation. The mininum size of the memory pool used in the HPB is implemented as a module parameter so that it can be configurable by the user. To guarantee a minimum memory pool size of 4MB: ufshpb_host_map_kbytes=4096. The map_work manages active/inactive via 2 "to-do" lists: - hpb->lh_inact_rgn: regions to be inactivated - hpb->lh_act_srgn: subregions to be activated These lists are maintained on I/O completion. [mkp: switch to REQ_OP_DRV_*] Link: https://lore.kernel.org/r/20210712085859epcms2p36e420f19564f6cd0c4a45d54949619eb@epcms2p3 Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:59 +03:00
void ufshpb_remove(struct ufs_hba *hba);
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
bool ufshpb_is_allowed(struct ufs_hba *hba);
void ufshpb_get_geo_info(struct ufs_hba *hba, u8 *geo_buf);
void ufshpb_get_dev_info(struct ufs_hba *hba, u8 *desc_buf);
bool ufshpb_is_legacy(struct ufs_hba *hba);
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
extern struct attribute_group ufs_sysfs_hpb_stat_group;
extern struct attribute_group ufs_sysfs_hpb_param_group;
scsi: ufs: ufshpb: Introduce Host Performance Buffer feature Implement Host Performance Buffer (HPB) initialization and add function calls to UFS core driver. NAND flash-based storage devices, including UFS, have mechanisms to translate logical addresses of I/O requests to the corresponding physical addresses of the flash storage. In UFS, logical-to-physical-address (L2P) map data, which is required to identify the physical address for the requested I/Os, can only be partially stored in SRAM from NAND flash. Due to this partial loading, accessing the flash address area, where the L2P information for that address is not loaded in the SRAM, can result in serious performance degradation. The basic concept of HPB is to cache L2P mapping entries in host system memory so that both physical block address (PBA) and logical block address (LBA) can be delivered in HPB read command. The HPB read command allows to read data faster than a regular read command in UFS since it provides the physical address (HPB Entry) of the desired logical block in addition to its logical address. The UFS device can access the physical block in NAND directly without searching and uploading L2P mapping table. This improves read performance because the NAND read operation for uploading L2P mapping table is removed. In HPB initialization, the host checks if the UFS device supports HPB feature and retrieves related device capabilities. Then, HPB parameters are configured in the device. Total start-up time of popular applications was measured and the difference observed between HPB being enabled and disabled. Popular applications are 12 game apps and 24 non-game apps. Each test cycle consists of running 36 applications in sequence. We repeated the cycle for observing performance improvement by L2P mapping cache hit in HPB. The following is the test environment: - kernel version: 4.4.0 - RAM: 8GB - UFS 2.1 (64GB) Results: +-------+----------+----------+-------+ | cycle | baseline | with HPB | diff | +-------+----------+----------+-------+ | 1 | 272.4 | 264.9 | -7.5 | | 2 | 250.4 | 248.2 | -2.2 | | 3 | 226.2 | 215.6 | -10.6 | | 4 | 230.6 | 214.8 | -15.8 | | 5 | 232.0 | 218.1 | -13.9 | | 6 | 231.9 | 212.6 | -19.3 | +-------+----------+----------+-------+ We also measured HPB performance using iozone: $ iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F \ mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 \ mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 \ mnt/tmp_14 mnt/tmp_15 mnt/tmp_16 Results: +----------+--------+---------+ | IO range | HPB on | HPB off | +----------+--------+---------+ | 1 GB | 294.8 | 300.87 | | 4 GB | 293.51 | 179.35 | | 8 GB | 294.85 | 162.52 | | 16 GB | 293.45 | 156.26 | | 32 GB | 277.4 | 153.25 | +----------+--------+---------+ Link: https://lore.kernel.org/r/20210712085830epcms2p8c1288b7f7a81b044158a18232617b572@epcms2p8 Reported-by: kernel test robot <lkp@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Tested-by: Can Guo <cang@codeaurora.org> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Can Guo <cang@codeaurora.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Acked-by: Avri Altman <Avri.Altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-12 11:58:30 +03:00
#endif
#endif /* End of Header */