# blobxfer YAML Configuration

`blobxfer` accepts YAML configuration files to drive the transfer. YAML
configuration files are specified with the `--config` option to any
`blobxfer` command.

For an in-depth explanation of each option or the associated default value,
please see the [CLI Usage](10-cli-usage.md) documentation for the
corresponding CLI option.

## Schema

The `blobxfer` YAML schema consists of distinct "sections", each described in
the following subsections. You can combine all sections into the same YAML
file if desired, as `blobxfer` will only read the sections required to
execute the specified command.

You can view a complete sample YAML configuration [here](sample_config.yaml).
Note that the sample configuration file is illustrative only and may not
contain all possible options.

#### Configuration Sections

1. [`version`](#version)
2. [`azure_storage`](#azure-storage)
3. [`options`](#options)
4. [`download`](#download)
5. [`upload`](#upload)
6. [`synccopy`](#synccopy)

### <a name="version"></a>`version`

The `version` property specifies the version of the `blobxfer` YAML
configuration schema to use. This property is required.

```yaml
version: 1
```

* `version` specifies the `blobxfer` YAML configuration schema to use.
  Currently the only valid value is `1`.

### <a name="azure-storage"></a>`azure_storage`

The `azure_storage` section specifies Azure Storage credentials that will
be referenced for any transfer while processing the YAML file. This section
is required.

```yaml
azure_storage:
    endpoint: core.windows.net
    accounts:
        mystorageaccount0: ABCDEF...
        mystorageaccount1: ?se...
```

* `endpoint` specifies the Azure Storage endpoint to connect to.
  Generally this can be omitted if using Public Azure regions.
* `accounts` is a dictionary mapping storage account names to either a
  storage account key or a shared access signature token. Note that if you
  are downloading a striped blob (Vectored IO), then all storage accounts
  that the blob is striped across must be populated in this dictionary.

### <a name="options"></a>`options`

The `options` section specifies general options that may be applied across
all other sections in the YAML configuration.

```yaml
options:
    log_file: /path/to/blobxfer.log
    enable_azure_storage_logger: false
    resume_file: /path/to/resumefile.db
    progress_bar: true
    quiet: false
    dry_run: false
    verbose: true
    timeout:
        connect: null
        read: null
        max_retries: null
    concurrency:
        md5_processes: 2
        crypto_processes: 2
        disk_threads: 16
        transfer_threads: 32
    proxy:
        host: myproxyhost:6000
        username: proxyuser
        password: abcd...
```

* `log_file` is the location of the log file to write to
* `enable_azure_storage_logger` controls the Azure Storage logger output
* `resume_file` is the location of the resume database to create
* `progress_bar` controls display of a progress bar output to the console
* `quiet` controls quiet mode
* `dry_run` will perform a dry run
* `verbose` controls if verbose logging is enabled
* `timeout` is a dictionary of timeout values in seconds
    * `connect` is the connect timeout to apply to a request
    * `read` is the read timeout to apply to a request
    * `max_retries` is the maximum number of retries for a request
* `concurrency` is a dictionary of concurrency limits
    * `md5_processes` is the number of MD5 offload processes to create for
      MD5 comparison checking
    * `crypto_processes` is the number of decryption offload processes to
      create
    * `disk_threads` is the number of threads for disk I/O
    * `transfer_threads` is the number of threads for network transfers
* `proxy` defines an HTTP proxy to use, if required to connect to the
  Azure Storage endpoint
    * `host` is the IP:Port of the HTTP Proxy
    * `username` is the username login for the proxy, if required
    * `password` is the password for the username for the proxy, if required

### <a name="download"></a>`download`

The `download` section specifies download sources and destination. Note
that `download` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `download` property as you need.
When the `download` command is run with the YAML config, the list
is iterated and all specified sources are downloaded.

```yaml
download:
    - source:
      - mystorageaccount0: mycontainer
      - mystorageaccount1: someothercontainer/vpath
      destination: /path/to/store/downloads
      include:
      - "*.txt"
      - "*.bxslice-*"
      exclude:
      - "*.bak"
      options:
          check_file_md5: true
          chunk_size_bytes: 16777216
          delete_extraneous_destination: false
          delete_only: false
          max_single_object_concurrency: 8
          mode: auto
          overwrite: true
          recursive: true
          rename: false
          restore_file_properties:
              attributes: true
              lmt: true
          rsa_private_key: myprivatekey.pem
          rsa_private_key_passphrase: myoptionalpassword
          strip_components: 1
          skip_on:
              filesize_match: false
              lmt_ge: false
              md5_match: true
    - source:
      # next if needed...
```

* `source` is a list of storage account to remote path mappings
* `destination` is the local resource path
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are download-specific options
    * `check_file_md5` will integrity check downloaded files using the stored
      MD5
    * `chunk_size_bytes` is the maximum amount of data to download per request
    * `delete_extraneous_destination` will clean up any local files that are
      not found on the remote. Note that this interacts with include and
      exclude filters.
    * `delete_only` will only perform the local cleanup. If this is specified
      as `true`, then `delete_extraneous_destination` must be specified as
      `true` as well.
    * `max_single_object_concurrency` is the maximum number of concurrent
      transfers per object
    * `mode` is the operating mode
    * `overwrite` specifies clobber behavior
    * `recursive` specifies if remote paths should be recursively searched for
      entities to download
    * `rename` will rename a single entity source path to the `destination`
    * `restore_file_properties` restores the following file properties if
      enabled
        * `attributes` will restore POSIX file mode and ownership if stored
          in the entity metadata
        * `lmt` will restore the last modified time of the file
    * `rsa_private_key` is the RSA private key PEM file to use to decrypt
      encrypted blobs or files
    * `rsa_private_key_passphrase` is the RSA private key passphrase, if
      required
    * `strip_components` is the number of leading path components to strip
      from the remote path
    * `skip_on` are skip on options to use
        * `filesize_match` skip if the file sizes match
        * `lmt_ge` skip if local file has a last modified time greater than or
          equal to the remote file
        * `md5_match` skip if the MD5 values match
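
To make the interaction between `include`, `exclude`, and `strip_components` more concrete, the following Python sketch filters a list of remote paths and then strips one leading path component. It is an illustration only, not `blobxfer`'s implementation: the helper names `is_selected` and `strip_components` are hypothetical, and it assumes shell-style wildcard matching against the full path, with an exclude match taking precedence over an include match. Consult the CLI usage documentation for the authoritative matching rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_selected(path, include=(), exclude=()):
    """Return True if path passes the include and exclude patterns.

    Hypothetical helper; assumes shell-style wildcards.
    """
    # An empty include list selects every path.
    included = not include or any(fnmatchcase(path, pat) for pat in include)
    # Assumption: an exclude match overrides an include match.
    excluded = any(fnmatchcase(path, pat) for pat in exclude)
    return included and not excluded


def strip_components(path, count):
    """Drop `count` leading components from a '/'-separated path."""
    parts = PurePosixPath(path).parts
    return "/".join(parts[count:])


# Example remote paths (made up for illustration).
remote_paths = ["vpath/a.txt", "vpath/b.bak", "vpath/sub/c.txt", "vpath/d.bin"]

selected = [
    p for p in remote_paths
    if is_selected(p, include=["*.txt", "*.bxslice-*"], exclude=["*.bak"])
]
local_names = [strip_components(p, 1) for p in selected]
print(local_names)
```

With the patterns from the sample configuration above, only the `.txt` paths remain, and each loses its leading `vpath` component before being placed under `destination`.
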

### <a name="upload"></a>`upload`

The `upload` section specifies upload sources and destinations. Note
that `upload` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `upload` property as you need.
When the `upload` command is run with the YAML config, the list
is iterated and all specified sources are uploaded.

```yaml
upload:
    - source:
      - /path/to/hugefile1
      - /path/to/hugefile2
      destination:
      - mystorageaccount0: mycontainer/vdir
      - mystorageaccount1: someothercontainer/vdir2
      include:
      - "*.bin"
      exclude:
      - "*.tmp"
      options:
          mode: auto
          access_tier: null
          chunk_size_bytes: 0
          delete_extraneous_destination: true
          delete_only: false
          one_shot_bytes: 33554432
          overwrite: true
          recursive: true
          rename: false
          rsa_public_key: mypublickey.pem
          skip_on:
              filesize_match: false
              lmt_ge: false
              md5_match: true
          stdin_as_page_blob_size: 0
          store_file_properties:
              attributes: true
              cache_control: 'max-age=3600'
              content_type: 'text/javascript; charset=utf-8'
              md5: true
          strip_components: 1
          vectored_io:
              stripe_chunk_size_bytes: 1000000
              distribution_mode: stripe
    - source:
      # next if needed...
```

* `source` is a list of local resource paths
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are upload-specific options
    * `mode` is the operating mode
    * `access_tier` is the access tier to set for the object. If not set,
      the default access tier for the storage account is inferred.
    * `chunk_size_bytes` is the maximum amount of data to upload per request.
      This corresponds to the block size for block and append blobs, page size
      for page blobs, and the file chunk for files. Only block blobs can have
      a block size of up to 100MiB; all others have a maximum of 4MiB.
    * `delete_extraneous_destination` will clean up any remote files that are
      not found locally. Note that this interacts with include and
      exclude filters.
    * `delete_only` will only perform the remote cleanup. If this is specified
      as `true`, then `delete_extraneous_destination` must be specified as
      `true` as well.
    * `one_shot_bytes` is the size limit to upload block blobs in a single
      request.
    * `overwrite` specifies clobber behavior
    * `recursive` specifies if local paths should be recursively searched for
      files to upload
    * `rename` will rename a single entity destination path to a single
      `source`
    * `rsa_public_key` is the RSA public key PEM file to use to encrypt files
    * `skip_on` are skip on options to use
        * `filesize_match` skip if the file sizes match
        * `lmt_ge` skip if remote file has a last modified time greater than
          or equal to the local file
        * `md5_match` skip if the MD5 values match
    * `stdin_as_page_blob_size` is the page blob size to preallocate if the
      amount of data to be streamed from stdin is known beforehand and the
      `mode` is `page`
    * `store_file_properties` stores the following file properties if enabled
        * `attributes` will store POSIX file mode and ownership
        * `cache_control` sets the CacheControl property
        * `content_type` sets the ContentType property
        * `md5` will store the MD5 of the file
    * `strip_components` is the number of leading path components to strip
      from the local path
    * `vectored_io` are the Vectored IO options to apply to the upload
        * `stripe_chunk_size_bytes` is the stripe width for each chunk if
          `stripe` `distribution_mode` is selected
        * `distribution_mode` is the Vectored IO mode to use which can be
          one of:
            * `disabled` will disable Vectored IO
            * `replica` which will replicate source files to target
              destinations on upload. Note that more than one destination
              should be specified.
            * `stripe` which will stripe source files to target destinations
              on upload. If more than one destination is specified, striping
              occurs in round-robin order amongst the destinations listed.
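
The round-robin behavior of the `stripe` distribution mode can be pictured with a short Python sketch. This is not `blobxfer`'s implementation; the function `plan_stripes` and the 2,500,000-byte file size are hypothetical and exist only to show how fixed-width stripes might be assigned to the listed destinations in turn.

```python
def plan_stripes(file_size, stripe_chunk_size_bytes, destinations):
    """Assign consecutive stripes of a file to destinations in round-robin order.

    Hypothetical helper; returns a list of (offset, length, destination).
    """
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The final stripe may be shorter than the stripe width.
        length = min(stripe_chunk_size_bytes, file_size - offset)
        destination = destinations[index % len(destinations)]
        plan.append((offset, length, destination))
        offset += length
        index += 1
    return plan


# Illustrative values: stripe width from the sample config, made-up file size.
plan = plan_stripes(
    file_size=2_500_000,
    stripe_chunk_size_bytes=1_000_000,
    destinations=["mystorageaccount0", "mystorageaccount1"],
)
for offset, length, destination in plan:
    print(offset, length, destination)
```

In this example the first and third stripes land on `mystorageaccount0` and the second on `mystorageaccount1`, which is why every account holding a stripe must be listed under `azure_storage` when the blob is later downloaded.
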

### <a name="synccopy"></a>`synccopy`

The `synccopy` section specifies synchronous copy sources and destinations.
Note that `synccopy` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `synccopy` property as you need.
When the `synccopy` command is run with the YAML config, the list
is iterated and all specified sources are synchronously copied.

```yaml
synccopy:
    - source:
      - mystorageaccount0: mycontainer
      destination:
      - mystorageaccount0: othercontainer
      - mystorageaccount1: mycontainer
      include:
      - "*.bin"
      exclude:
      - "*.tmp"
      options:
          mode: auto
          dest_mode: auto
          access_tier: null
          delete_extraneous_destination: true
          delete_only: false
          overwrite: true
          recursive: true
          rename: false
          server_side_copy: true
          skip_on:
              filesize_match: false
              lmt_ge: false
              md5_match: true
```

* `source` is a list of storage account to remote path mappings. All sources
  are copied to each destination specified. To use an arbitrary URL, specify
  the map as `*: https://some.url/path`.
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are synccopy-specific options
    * `mode` is the source mode
    * `dest_mode` is the destination mode
    * `access_tier` is the access tier to set for the object. If not set,
      the default access tier for the storage account is inferred.
    * `delete_extraneous_destination` will clean up any files in remote
      destinations that are not found in the remote sources. Note that this
      interacts with include and exclude filters.
    * `delete_only` will only perform the remote cleanup. If this is specified
      as `true`, then `delete_extraneous_destination` must be specified as
      `true` as well.
    * `overwrite` specifies clobber behavior
    * `recursive` specifies if source remote paths should be recursively
      searched for files to copy
    * `rename` will rename a single remote source entity to the remote
      destination path
    * `server_side_copy` will perform the copy on Azure Storage servers.
      This option is enabled by default and requires destinations to be
      block blobs. If destinations are not block blobs, this option must be
      set to `false`.
    * `skip_on` are skip on options to use
        * `filesize_match` skip if the file sizes match
        * `lmt_ge` skip if source file has a last modified time greater
          than or equal to the destination file
        * `md5_match` skip if the MD5 values match
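
The dependency between `delete_only` and `delete_extraneous_destination` described above can be checked before a configuration is used. The Python sketch below shows one way to do that on an already-parsed `options` dictionary. The function `check_delete_options` is hypothetical and not part of `blobxfer`, and treating a missing key as `false` is an assumption made for this example rather than a documented default.

```python
def check_delete_options(options):
    """Raise ValueError if delete_only is set without delete_extraneous_destination.

    Hypothetical helper operating on a parsed `options` mapping.
    """
    # Assumption: a missing key is treated as false for this check.
    delete_only = options.get("delete_only", False)
    delete_extraneous = options.get("delete_extraneous_destination", False)
    if delete_only and not delete_extraneous:
        raise ValueError(
            "delete_only requires delete_extraneous_destination to be true"
        )


valid = {"delete_extraneous_destination": True, "delete_only": True}
invalid = {"delete_extraneous_destination": False, "delete_only": True}

check_delete_options(valid)  # consistent combination, no exception

try:
    check_delete_options(invalid)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The same check applies to the `download` and `upload` sections, which document the identical constraint.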