blobxfer Vectored I/O

blobxfer supports Vectored I/O (scatter/gather), which can help alleviate problems associated with single blob or single fileshare throughput limits. Additionally, blobxfer can replicate a single source to multiple destinations, for increased resiliency or for higher throughput when the data is consumed later.

Distribution Modes

blobxfer supports two distribution modes: replica and stripe. The following sections describe each.

Replica

replica mode replicates an entire file (or set of files) across all specified destinations. This allows for multiple backups, resiliency, and potentially increased download throughput later if the clients understand how to download from multiple sources.

The logic is straightforward: blobxfer reads each source file from disk in portions, buffers each portion in memory, and then writes it to all specified destinations.

                       Whole File             +---------------------+
                       Replication            |                     |
             +------------------------------> |  Destination 0:     |
             |                                |  Storage Account A  |
             |                                |                     |
             |                                +---------------------+
             |
             |
+------------+---------------+  Whole File    +---------------------+
|                            |  Replication   |                     |
|  10 GiB VHD on Local Disk  +--------------> |  Destination 1:     |
|                            |                |  Storage Account B  |
+------------+---------------+                |                     |
             |                                +---------------------+
             |
             |
             |                                +---------------------+
             |         Whole File             |                     |
             |         Replication            |  Destination 2:     |
             +------------------------------> |  Storage Account C  |
                                              |                     |
                                              +---------------------+

To take advantage of replica Vectored I/O, you must use a YAML configuration file to define multiple destinations.
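
The read, buffer, and replicate flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not blobxfer's actual implementation: the names `read_chunks` and `replicate` are hypothetical, and in-memory file objects stand in for the local disk and the storage accounts.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def read_chunks(source: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from source."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def replicate(source: BinaryIO, destinations: Iterable[BinaryIO],
              chunk_size: int) -> int:
    """Write every chunk of source to every destination.

    Each chunk is read once, held in memory, and then written to all
    destinations (hypothetical stand-ins for storage accounts).
    Returns the number of bytes read from source.
    """
    targets = list(destinations)
    total = 0
    for chunk in read_chunks(source, chunk_size):
        for target in targets:
            target.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"0123456789")
    dests = [io.BytesIO() for _ in range(3)]
    replicate(src, dests, chunk_size=4)
    # Every destination now holds a complete copy of the source.
    print(all(d.getvalue() == b"0123456789" for d in dests))
```

Because each chunk is read from the source only once, the cost of adding a destination in this sketch is an extra write per chunk, not an extra pass over the source file.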

Stripe

stripe mode slices a file into multiple chunks and scatters these chunks across the specified destinations. These destinations can be one or more containers within the same storage account, or containers distributed across multiple storage accounts if the bandwidth limits of a single storage account are insufficient.

blobxfer slices the source file into multiple chunks, where stripe_chunk_size_bytes is the stripe width of each chunk. This parameter lets you control how many blobs/files are created on Azure. blobxfer then round-robins through all of the specified destinations to scatter the slices. The information required to reconstruct the original file is stored in the blob or file metadata. It is important to keep this metadata intact, or reconstruction will fail.

                                                     +---------------------+
                                                     |                     | <-----------------------------------+
                                                     |  Destination 1:     |                                     |
                                                     |  Storage Account B  | <---------------------+             |
                                                     |                     |                       |             |
                                                     +---------------------+ <-------+             |             |
                                                                                     |             |             |
                                                         ^             ^             |             |             |
                                                         |             |             |             |             |
                                 1 GiB Stripe            |             |             |             |             |
+-----------------------------+  Width        +------+---+--+------+---+--+------+---+--+------+---+--+------+---+--+
|                             |               |      |      |      |      |      |      |      |      |      |      |
|  10 GiB File on Local Disk  | +-----------> |  D0  |  D1  |  D0  |  D1  |  D0  |  D1  |  D0  |  D1  |  D0  |  D1  |
|                             |               |      |      |      |      |      |      |      |      |      |      |
+-----------------------------+  10 Vectored  +---+--+------+---+--+------+---+--+------+---+--+------+---+--+------+
                                 Slices           |             |             |             |             |
                                                  |             |             |             |             |
                                                  |             v             |             |             |
                                                  |                           |             |             |
                                                  +> +---------------------+ <+             |             |
                                                     |                     |                |             |
                                                     |  Destination 0:     | <--------------+             |
                                                     |  Storage Account A  |                              |
                                                     |                     | <----------------------------+
                                                     +---------------------+
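
The arithmetic behind the diagram above can be sketched as follows. This is a minimal illustration, assuming slices are cut at fixed `stripe_chunk_size_bytes` boundaries and assigned to destinations in strict round-robin order; the `Slice` type and the `plan_stripes` function are hypothetical names and are not part of blobxfer.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass(frozen=True)
class Slice:
    index: int        # position of the slice within the source file
    offset: int       # byte offset where the slice starts
    length: int       # number of bytes in the slice
    destination: int  # index of the destination that receives the slice


def plan_stripes(file_size: int, stripe_chunk_size_bytes: int,
                 num_destinations: int) -> List[Slice]:
    """Cut a file into slices and assign them round-robin to destinations."""
    if stripe_chunk_size_bytes <= 0 or num_destinations <= 0:
        raise ValueError("chunk size and destination count must be positive")
    slices: List[Slice] = []
    offset = 0
    index = 0
    while offset < file_size:
        # The final slice may be shorter than the stripe width.
        length = min(stripe_chunk_size_bytes, file_size - offset)
        slices.append(Slice(index, offset, length, index % num_destinations))
        offset += length
        index += 1
    return slices


if __name__ == "__main__":
    # The scenario from the diagram: 10 GiB file, 1 GiB stripe width,
    # two destinations.
    plan = plan_stripes(10 * GIB, 1 * GIB, 2)
    print(len(plan))
    print([f"D{s.destination}" for s in plan])
```

In this sketch, the number of slices is the file size divided by the stripe width, rounded up, which is why the stripe width controls how many blobs or files end up on Azure.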

To take advantage of stripe Vectored I/O across multiple destinations, you must use a YAML configuration file. Additionally, when downloading a striped blob, you must specify all storage account locations of the striped blob in the azure_storage section of your YAML configuration file.
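
To see why every storage account location and the slice metadata are needed on download, consider the following reassembly sketch. The metadata keys `slice_id` and `total_slices` are hypothetical placeholders chosen for illustration; they are not claimed to match the metadata format that blobxfer actually writes, and `reassemble` is not a blobxfer function.

```python
from typing import Dict, Iterable, Optional, Tuple

# A slice as fetched from one destination: (metadata, payload).
# The metadata keys used below are hypothetical.
SliceRecord = Tuple[Dict[str, int], bytes]


def reassemble(records: Iterable[SliceRecord]) -> bytes:
    """Rebuild the original bytes from slices gathered from all destinations."""
    by_id: Dict[int, bytes] = {}
    expected: Optional[int] = None
    for metadata, payload in records:
        try:
            slice_id = metadata["slice_id"]
            total = metadata["total_slices"]
        except KeyError as exc:
            # Without intact metadata the slice cannot be placed.
            raise ValueError(f"slice metadata is missing {exc}") from None
        if expected is None:
            expected = total
        elif total != expected:
            raise ValueError("slices disagree on total_slices")
        by_id[slice_id] = payload
    if expected is None:
        raise ValueError("no slices supplied")
    missing = [i for i in range(expected) if i not in by_id]
    if missing:
        # Happens when a destination holding some slices was not consulted.
        raise ValueError(f"missing slices: {missing}")
    return b"".join(by_id[i] for i in range(expected))


if __name__ == "__main__":
    account_a = [({"slice_id": 0, "total_slices": 3}, b"abc"),
                 ({"slice_id": 2, "total_slices": 3}, b"g")]
    account_b = [({"slice_id": 1, "total_slices": 3}, b"def")]
    print(reassemble(account_b + account_a))
```

Passing only the slices from one account, or a slice whose metadata has been stripped, makes this sketch raise an error rather than return a truncated or misordered file.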