btrfs: add async discard implementation overview
Give a brief overview for how async discard is implemented.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Parent: 9ddf648f9c
Commit: dbc2a8c927
@@ -12,6 +12,45 @@
#include "discard.h"
#include "free-space-cache.h"

/*
 * This contains the logic to handle async discard.
 *
 * Async discard manages trimming of free space outside of transaction commit.
 * Discarding is done by managing the block_groups on an LRU list based on free
 * space recency. Two passes are used to first prioritize discarding extents
 * and then give trimming in the bitmaps the best opportunity to coalesce.
 * The block_groups are maintained on multiple lists to allow for multiple
 * passes with different discard filter requirements. A delayed work item is
 * used to manage discarding with timeout determined by a max of the delay
 * incurred by the iops rate limit, the byte rate limit, and the max delay of
 * BTRFS_DISCARD_MAX_DELAY.
 *
 * Note, this only keeps track of block_groups that are explicitly for data.
 * Mixed block_groups are not supported.
 *
 * The first list is special to manage discarding of fully free block groups.
 * This is necessary because we issue a final trim for a fully free block group
 * after forgetting it. When a block group becomes unused, instead of directly
 * being added to the unused_bgs list, we add it to this first list. Then
 * from there, if it becomes fully discarded, we place it onto the unused_bgs
 * list.
 *
 * The in-memory free space cache serves as the backing state for discard.
 * Consequently, there is no persistence. We opt to load all the block groups
 * in as not discarded, so the mount case degenerates to the crashing case.
 *
 * As the free space cache uses bitmaps, there exists a tradeoff between
 * ease/efficiency for find_free_extent() and the accuracy of discard state.
 * Here we opt to let untrimmed regions merge with everything while only letting
 * trimmed regions merge with other trimmed regions. This can cause
 * overtrimming, but the coalescing benefit seems to be worth it. Additionally,
 * bitmap state is tracked as a whole. If we're able to fully trim a bitmap,
 * the trimmed flag is set on the bitmap. Otherwise, if an allocation comes in,
 * this resets the state and we will retry trimming the whole bitmap. This is a
 * tradeoff between discard state accuracy and the cost of accounting.
 */

/* This is an initial delay to give some chance for block reuse */
#define BTRFS_DISCARD_DELAY		(120ULL * NSEC_PER_SEC)
#define BTRFS_DISCARD_UNUSED_DELAY	(10ULL * NSEC_PER_SEC)
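
The merge rule described in the comment (untrimmed regions merge with everything, trimmed regions merge only with other trimmed regions) can be illustrated with a small standalone sketch. This is not the kernel code: `struct region` and `region_try_merge()` are hypothetical names, and the sketch assumes the rule is applied from the point of view of the region being added, which is one possible reading of the comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a free space entry. */
struct region {
	uint64_t start;
	uint64_t len;
	bool trimmed;
};

/*
 * Try to merge the adjacent region 'right' into 'incoming'.
 *
 * Assumed rule: an untrimmed incoming region may absorb any neighbor,
 * while a trimmed incoming region may only absorb a trimmed neighbor.
 * Returns true if the merge happened.
 */
static bool region_try_merge(struct region *incoming,
			     const struct region *right)
{
	/* Only directly adjacent regions can be merged. */
	if (incoming->start + incoming->len != right->start)
		return false;

	/* A trimmed region must not absorb an untrimmed neighbor. */
	if (incoming->trimmed && !right->trimmed)
		return false;

	incoming->len += right->len;
	/* The result is trimmed only if both inputs were trimmed. */
	incoming->trimmed = incoming->trimmed && right->trimmed;
	return true;
}
```

Under this reading, an untrimmed region that absorbs a trimmed neighbor makes the whole range untrimmed again, so the already trimmed part is discarded a second time later. That is the overtrimming cost the comment accepts in exchange for larger, coalesced regions.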
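
The whole-bitmap state tracking from the end of the comment can be sketched in the same spirit. The names below (`struct bitmap_discard_state` and its helpers) are hypothetical and exist only for illustration; the sketch assumes a single flag per bitmap, as the comment describes, rather than any per-bit accounting.

```c
#include <stdbool.h>

/* Hypothetical per-bitmap discard state: one flag for the whole bitmap. */
struct bitmap_discard_state {
	bool trimmed;
};

/* Record the outcome of a trim pass over the bitmap. */
static void bitmap_trim_pass_done(struct bitmap_discard_state *state,
				  bool trimmed_everything)
{
	/* The flag is only set when the entire bitmap was trimmed. */
	if (trimmed_everything)
		state->trimmed = true;
}

/* An allocation from the bitmap invalidates the whole-bitmap state. */
static void bitmap_on_alloc(struct bitmap_discard_state *state)
{
	state->trimmed = false;
}

/* An untrimmed bitmap is retried in full on the next pass. */
static bool bitmap_needs_trim(const struct bitmap_discard_state *state)
{
	return !state->trimmed;
}
```

Keeping one flag per bitmap keeps the accounting cheap, at the price of retrimming the entire bitmap after any allocation touches it.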