# blobxfer YAML Configuration
`blobxfer` accepts YAML configuration files to drive the transfer. YAML
configuration files are specified with the `--config` option to any
`blobxfer` command.
For an in-depth explanation of each option or the associated default value,
please see the [CLI Usage](10-cli-usage.md) documentation for the
corresponding CLI option.
## Schema
The `blobxfer` YAML schema consists of distinct "sections". The following
sub-sections will describe each. You can combine all sections into the
same YAML file if desired as `blobxfer` will only read the required sections
to execute the specified command.
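For example, a minimal sketch combining the `version`, `azure_storage`, and
`download` sections in one file (the account name, key, and paths are the
same placeholders used throughout this document):

```yaml
version: 1
azure_storage:
  accounts:
    mystorageaccount0: ABCDEF...
download:
  - source:
      - mystorageaccount0: mycontainer
    destination: /path/to/store/downloads
```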
You can view a complete sample YAML configuration [here](sample_config.yaml).
Note that the sample configuration file is only an example and may not
contain all possible options.
#### Configuration Sections
1. [`version`](#version)
2. [`azure_storage`](#azure-storage)
3. [`options`](#options)
4. [`download`](#download)
5. [`upload`](#upload)
6. [`synccopy`](#synccopy)
### <a name="version"></a>`version`
The `version` property specifies the version of the `blobxfer` YAML
configuration schema to use. This property is required.
```yaml
version: 1
```
* `version` specifies the `blobxfer` YAML configuration schema to use.
Currently the only valid value is `1`.
### <a name="azure-storage"></a>`azure_storage`
The `azure_storage` section specifies Azure Storage credentials that will
be referenced for any transfer while processing the YAML file. This section
is required.
```yaml
azure_storage:
  endpoint: core.windows.net
  accounts:
    mystorageaccount0: ABCDEF...
    mystorageaccount1: ?se...
```
* `endpoint` specifies the Azure Storage endpoint to connect to. Generally
  this can be omitted if using Public Azure regions.
* `accounts` is a dictionary of storage account names mapped to either a
  storage account key or a shared access signature token. Note that if you
  are downloading a striped blob (Vectored IO), then all storage accounts
  to which the blob is striped must be present in this dictionary.
### <a name="options"></a>`options`
The `options` section specifies general options that may be applied across
all other sections in the YAML configuration.
```yaml
options:
  log_file: /path/to/blobxfer.log
  enable_azure_storage_logger: false
  resume_file: /path/to/resumefile.db
  progress_bar: true
  quiet: false
  dry_run: false
  verbose: true
  timeout:
    connect: null
    read: null
    max_retries: null
  concurrency:
    md5_processes: 2
    crypto_processes: 2
    disk_threads: 16
    transfer_threads: 32
  proxy:
    host: myproxyhost:6000
    username: proxyuser
    password: abcd...
```
* `log_file` is the location of the log file to write to
* `enable_azure_storage_logger` controls the Azure Storage logger output
* `resume_file` is the location of the resume database to create
* `progress_bar` controls display of a progress bar output to the console
* `quiet` controls quiet mode
* `dry_run` will perform a dry run
* `verbose` controls if verbose logging is enabled
* `timeout` is a dictionary of timeout values in seconds
  * `connect` is the connect timeout to apply to a request
  * `read` is the read timeout to apply to a request
  * `max_retries` is the maximum number of retries for a request
* `concurrency` is a dictionary of concurrency limits
  * `md5_processes` is the number of MD5 offload processes to create for
    MD5 comparison checking
  * `crypto_processes` is the number of decryption offload processes to
    create
  * `disk_threads` is the number of threads for disk I/O
  * `transfer_threads` is the number of threads for network transfers
* `proxy` defines an HTTP proxy to use, if required to connect to the
  Azure Storage endpoint
  * `host` is the IP:Port of the HTTP Proxy
  * `username` is the username login for the proxy, if required
  * `password` is the password for the username for the proxy, if required
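Any keys omitted here fall back to the defaults described in the
[CLI Usage](10-cli-usage.md) documentation. As a sketch, a configuration
for non-interactive (scripted) runs might set only a handful of these;
the values below are illustrative, not recommendations:

```yaml
# a minimal sketch for scripted runs; omitted keys fall back to their
# CLI defaults, and the values shown are illustrative only
options:
  log_file: /path/to/blobxfer.log
  # suppress interactive console output
  progress_bar: false
  quiet: true
  timeout:
    max_retries: 10
```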
### <a name="download"></a>`download`
The `download` section specifies download sources and destination. Note
that `download` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `download` property as you need.
When the `download` command with the YAML config is specified, the list
is iterated and all specified sources are downloaded.
```yaml
download:
  - source:
      - mystorageaccount0: mycontainer
      - mystorageaccount1: someothercontainer/vpath
    destination: /path/to/store/downloads
    include:
      - "*.txt"
      - "*.bxslice-*"
    exclude:
      - "*.bak"
    options:
      check_file_md5: true
      chunk_size_bytes: 16777216
      delete_extraneous_destination: false
      delete_only: false
      max_single_object_concurrency: 8
      mode: auto
      overwrite: true
      recursive: true
      rename: false
      restore_file_properties:
        attributes: true
        lmt: true
      rsa_private_key: myprivatekey.pem
      rsa_private_key_passphrase: myoptionalpassword
      strip_components: 1
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
  - source:
    # next if needed...
```
* `source` is a list of storage account to remote path mappings
* `destination` is the local resource path
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are download-specific options
  * `check_file_md5` will integrity check downloaded files using the stored
    MD5
  * `chunk_size_bytes` is the maximum amount of data to download per request
  * `delete_extraneous_destination` will clean up any files locally that are
    not found on the remote. Note that this interacts with include and
    exclude filters.
  * `delete_only` will only perform the local cleanup. If this is specified
    as `true`, then `delete_extraneous_destination` must be specified as
    `true` as well.
  * `max_single_object_concurrency` is the maximum number of concurrent
    transfers per object
  * `mode` is the operating mode
  * `overwrite` specifies clobber behavior
  * `recursive` specifies if remote paths should be recursively searched for
    entities to download
  * `rename` will rename a single entity source path to the `destination`
  * `restore_file_properties` restores the following file properties if
    enabled
    * `attributes` will restore POSIX file mode and ownership if stored
      on the entity metadata
    * `lmt` will restore the last modified time of the file
  * `rsa_private_key` is the RSA private key PEM file to use to decrypt
    encrypted blobs or files
  * `rsa_private_key_passphrase` is the RSA private key passphrase, if
    required
  * `strip_components` is the number of leading path components to strip
    from the remote path (see the sketch after this list)
  * `skip_on` are skip on options to use
    * `filesize_match` skip if the file sizes match
    * `lmt_ge` skip if the local file has a last modified time greater than
      or equal to the remote file
    * `md5_match` skip if the MD5 hashes match
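As a sketch of how `strip_components` applies to remote paths (the container
layout here is hypothetical):

```yaml
# hypothetical layout: mycontainer holds the blob archive/2020/data.bin;
# with strip_components: 1, the leading "archive" component is removed and
# the file is written to /path/to/store/downloads/2020/data.bin
download:
  - source:
      - mystorageaccount0: mycontainer
    destination: /path/to/store/downloads
    options:
      strip_components: 1
```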
### <a name="upload"></a>`upload`
The `upload` section specifies upload sources and destinations. Note
that `upload` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `upload` property as you need.
When the `upload` command with the YAML config is specified, the list
is iterated and all specified sources are uploaded.
```yaml
upload:
  - source:
      - /path/to/hugefile1
      - /path/to/hugefile2
    destination:
      - mystorageaccount0: mycontainer/vdir
      - mystorageaccount1: someothercontainer/vdir2
    include:
      - "*.bin"
    exclude:
      - "*.tmp"
    options:
      mode: auto
      access_tier: null
      chunk_size_bytes: 0
      delete_extraneous_destination: true
      delete_only: false
      one_shot_bytes: 33554432
      overwrite: true
      recursive: true
      rename: false
      rsa_public_key: mypublickey.pem
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
      stdin_as_page_blob_size: 0
      store_file_properties:
        attributes: true
        cache_control: 'max-age=3600'
        content_type: 'text/javascript; charset=utf-8'
        md5: true
      strip_components: 1
      vectored_io:
        stripe_chunk_size_bytes: 1000000
        distribution_mode: stripe
  - source:
    # next if needed...
```
* `source` is a list of local resource paths
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are upload-specific options
  * `mode` is the operating mode
  * `access_tier` is the access tier to set for the object. If not set,
    the default access tier for the storage account is inferred.
  * `chunk_size_bytes` is the maximum amount of data to upload per request.
    This corresponds to the block size for block and append blobs, the page
    size for page blobs, and the file chunk for files. Only block blobs can
    have a block size of up to 100MiB; all others have a maximum of 4MiB.
  * `delete_extraneous_destination` will clean up any files remotely that
    are not found locally. Note that this interacts with include and
    exclude filters.
  * `delete_only` will only perform the remote cleanup. If this is specified
    as `true`, then `delete_extraneous_destination` must be specified as
    `true` as well.
  * `one_shot_bytes` is the size limit for uploading block blobs in a
    single request.
  * `overwrite` specifies clobber behavior
  * `recursive` specifies if local paths should be recursively searched for
    files to upload
  * `rename` will rename a single `source` entity to the specified
    destination path
  * `rsa_public_key` is the RSA public key PEM file to use to encrypt files
  * `skip_on` are skip on options to use
    * `filesize_match` skip if the file sizes match
    * `lmt_ge` skip if the remote file has a last modified time greater than
      or equal to the local file
    * `md5_match` skip if the MD5 hashes match
  * `stdin_as_page_blob_size` is the page blob size to preallocate if the
    amount of data to be streamed from stdin is known beforehand and the
    `mode` is `page`
  * `store_file_properties` stores the following file properties if enabled
    * `attributes` will store POSIX file mode and ownership
    * `cache_control` sets the CacheControl property
    * `content_type` sets the ContentType property
    * `md5` will store the MD5 of the file
  * `strip_components` is the number of leading path components to strip
    from the local path
  * `vectored_io` are the Vectored IO options to apply to the upload (see
    the sketch after this list)
    * `stripe_chunk_size_bytes` is the stripe width of each chunk when the
      `stripe` `distribution_mode` is selected
    * `distribution_mode` is the Vectored IO mode to use, which can be
      one of:
      * `disabled` will disable Vectored IO
      * `replica` will replicate source files to target destinations on
        upload. Note that more than one destination should be specified.
      * `stripe` will stripe source files to target destinations on
        upload. If more than one destination is specified, striping
        occurs in round-robin order amongst the destinations listed.
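As a sketch, replicating each uploaded file to two storage accounts
(reusing the placeholder accounts, containers, and paths from the example
above):

```yaml
upload:
  - source:
      - /path/to/hugefile1
    destination:
      # replica mode expects more than one destination
      - mystorageaccount0: mycontainer/vdir
      - mystorageaccount1: someothercontainer/vdir2
    options:
      vectored_io:
        distribution_mode: replica
```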
### <a name="synccopy"></a>`synccopy`
The `synccopy` section specifies synchronous copy sources and destinations.
Note that `synccopy` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `synccopy` property as you need.
When the `synccopy` command with the YAML config is specified, the list
is iterated and all specified sources are synchronously copied.
```yaml
synccopy:
  - source:
      - mystorageaccount0: mycontainer
    destination:
      - mystorageaccount0: othercontainer
      - mystorageaccount1: mycontainer
    include:
      - "*.bin"
    exclude:
      - "*.tmp"
    options:
      mode: auto
      dest_mode: auto
      access_tier: null
      delete_extraneous_destination: true
      delete_only: false
      overwrite: true
      recursive: true
      rename: false
      server_side_copy: true
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
```
* `source` is a list of storage account to remote path mappings. All sources
  are copied to each destination specified. To use an arbitrary URL as the
  source, specify the map as `*: https://some.url/path` (see the sketch
  after this list).
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are synccopy-specific options
  * `mode` is the source operating mode
  * `dest_mode` is the destination operating mode
  * `access_tier` is the access tier to set for the object. If not set,
    the default access tier for the storage account is inferred.
  * `delete_extraneous_destination` will clean up any files in remote
    destinations that are not found in the remote sources. Note that this
    interacts with include and exclude filters.
  * `delete_only` will only perform the remote cleanup. If this is specified
    as `true`, then `delete_extraneous_destination` must be specified as
    `true` as well.
  * `overwrite` specifies clobber behavior
  * `recursive` specifies if source remote paths should be recursively
    searched for files to copy
  * `rename` will rename a single remote source entity to the remote
    destination path
  * `server_side_copy` will perform the copy on Azure Storage servers.
    This option is enabled by default and destinations must be block blobs.
    If destinations are not block blobs, this option must be set to `false`.
  * `skip_on` are skip on options to use
    * `filesize_match` skip if the file sizes match
    * `lmt_ge` skip if the source file has a last modified time greater
      than or equal to the destination file
    * `md5_match` skip if the MD5 hashes match
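As a sketch of the arbitrary URL source form (the URL is the placeholder
from the note above; note that `*` must be quoted to be valid YAML):

```yaml
synccopy:
  - source:
      # '*' must be quoted since a bare * starts a YAML alias
      - '*': https://some.url/path
    destination:
      - mystorageaccount0: othercontainer
```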