# blobxfer YAML Configuration

`blobxfer` accepts YAML configuration files to drive transfers. YAML
configuration files are specified with the `--config` option to any
`blobxfer` command.

For an in-depth explanation of each option and its associated default value,
please see the CLI Usage documentation for the corresponding CLI option.
## Schema

The `blobxfer` YAML schema consists of distinct "sections". The following
sub-sections describe each one. You can combine all sections into the same
YAML file if desired, as `blobxfer` will only read the sections required to
execute the specified command.

You can view a complete sample YAML configuration here. Note that the sample
configuration file is just a sample and may not contain all possible options.
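For instance, the `version` and `azure_storage` sections described below can
be combined with a `download` section into a single file (the account name,
key, and paths here are placeholders, not working values):

```yaml
version: 1
azure_storage:
  accounts:
    mystorageaccount0: ABCDEF...
download:
  - source:
      - mystorageaccount0: mycontainer
    destination: /path/to/store/downloads
```

Such a file would then be passed to a command via `--config`, for example
`blobxfer download --config config.yaml`.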
## Configuration Sections

### version
The `version` property specifies the version of the `blobxfer` YAML
configuration schema to use. This property is required.

```yaml
version: 1
```

* `version` specifies the `blobxfer` YAML configuration schema version to
use. Currently the only valid value is `1`.
### azure_storage
The `azure_storage` section specifies Azure Storage credentials that will
be referenced for any transfer while processing the YAML file. This section
is required.

```yaml
azure_storage:
  endpoint: core.windows.net
  accounts:
    mystorageaccount0: ABCDEF...
    mystorageaccount1: ?se...
```

* `endpoint` specifies the Azure Storage endpoint to connect to. Generally
this can be omitted if using public Azure regions.
* `accounts` is a dictionary of storage account names mapped to either a
storage account key or a shared access signature (SAS) token. Note that if
you are downloading a striped blob (Vectored IO), then every storage account
that the blob is striped across must be present in this dictionary.
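As a minimal sketch (not blobxfer internals), the parsed `azure_storage`
section can be validated and each credential classified; in the sample above
the SAS token begins with `?`, which this sketch uses as a heuristic:

```python
# Hypothetical validation of a parsed azure_storage section: require at
# least one account and classify each credential as a key or SAS token.
def check_azure_storage(section):
    accounts = section.get("accounts") or {}
    assert accounts, "at least one storage account is required"
    # SAS tokens are query strings and start with "?" in the sample config
    return {
        name: "sas" if value.startswith("?") else "key"
        for name, value in accounts.items()
    }

section = {
    "endpoint": "core.windows.net",
    "accounts": {
        "mystorageaccount0": "ABCDEF...",
        "mystorageaccount1": "?se...",
    },
}
print(check_azure_storage(section))
# -> {'mystorageaccount0': 'key', 'mystorageaccount1': 'sas'}
```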
### options
The `options` section specifies general options that may be applied across
all other sections in the YAML configuration.

```yaml
options:
  log_file: /path/to/blobxfer.log
  enable_azure_storage_logger: false
  resume_file: /path/to/resumefile.db
  progress_bar: true
  quiet: false
  dry_run: false
  verbose: true
  timeout:
    connect: null
    read: null
    max_retries: null
  concurrency:
    md5_processes: 2
    crypto_processes: 2
    disk_threads: 16
    transfer_threads: 32
  proxy:
    host: myproxyhost:6000
    username: proxyuser
    password: abcd...
```

* `log_file` is the location of the log file to write to
* `enable_azure_storage_logger` controls the Azure Storage logger output
* `resume_file` is the location of the resume database to create
* `progress_bar` controls display of a progress bar output to the console
* `quiet` controls quiet mode
* `dry_run` will perform a dry run
* `verbose` controls if verbose logging is enabled
* `timeout` is a dictionary of timeout values in seconds
    * `connect` is the connect timeout to apply to a request
    * `read` is the read timeout to apply to a request
    * `max_retries` is the maximum number of retries for a request
* `concurrency` is a dictionary of concurrency limits
    * `md5_processes` is the number of MD5 offload processes to create for
      MD5 comparison checking
    * `crypto_processes` is the number of decryption offload processes to
      create
    * `disk_threads` is the number of threads for disk I/O
    * `transfer_threads` is the number of threads for network transfers
* `proxy` defines an HTTP proxy to use, if one is required to connect to the
  Azure Storage endpoint
    * `host` is the IP:Port of the HTTP proxy
    * `username` is the username for the proxy, if required
    * `password` is the password for the username, if required
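To illustrate how the `proxy` settings fit together (a hypothetical sketch
only; blobxfer's internal proxy handling may differ), the three values map
onto the conventional `http://user:pass@host:port` proxy URL form:

```python
# Illustrative only: assemble proxy host/username/password into a standard
# HTTP proxy URL. The credential parts are optional, as in the YAML schema.
def proxy_url(host, username=None, password=None):
    auth = ""
    if username:
        auth = username
        if password:
            auth += ":" + password
        auth += "@"
    return "http://" + auth + host

print(proxy_url("myproxyhost:6000", "proxyuser", "abcd"))
# -> http://proxyuser:abcd@myproxyhost:6000
```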
### download
The `download` section specifies download sources and the destination. Note
that `download` refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the `download` property as you need.
When the `download` command is run with the YAML config, the list is
iterated and all specified sources are downloaded.

```yaml
download:
  - source:
      - mystorageaccount0: mycontainer
      - mystorageaccount1: someothercontainer/vpath
    destination: /path/to/store/downloads
    include:
      - "*.txt"
      - "*.bxslice-*"
    exclude:
      - "*.bak"
    options:
      check_file_md5: true
      chunk_size_bytes: 16777216
      delete_extraneous_destination: false
      delete_only: false
      max_single_object_concurrency: 8
      mode: auto
      overwrite: true
      recursive: true
      rename: false
      restore_file_properties:
        attributes: true
        lmt: true
      rsa_private_key: myprivatekey.pem
      rsa_private_key_passphrase: myoptionalpassword
      strip_components: 1
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
  - source:
    # next if needed...
```

* `source` is a list of storage account to remote path mappings
* `destination` is the local resource path
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are download-specific options
    * `check_file_md5` will integrity check downloaded files using the
      stored MD5
    * `chunk_size_bytes` is the maximum amount of data to download per
      request
    * `delete_extraneous_destination` will clean up any local files that are
      not found on the remote. Note that this interacts with include and
      exclude filters.
    * `delete_only` will only perform the local cleanup. If this is
      specified as `true`, then `delete_extraneous_destination` must be
      specified as `true` as well.
    * `max_single_object_concurrency` is the maximum number of concurrent
      transfers per object
    * `mode` is the operating mode
    * `overwrite` specifies clobber behavior
    * `recursive` specifies if remote paths should be recursively searched
      for entities to download
    * `rename` will rename a single entity source path to the `destination`
    * `restore_file_properties` restores the following file properties if
      enabled
        * `attributes` will restore POSIX file mode and ownership if stored
          in the entity metadata
        * `lmt` will restore the last modified time of the file
    * `rsa_private_key` is the RSA private key PEM file to use to decrypt
      encrypted blobs or files
    * `rsa_private_key_passphrase` is the RSA private key passphrase, if
      required
    * `strip_components` is the number of leading path components to strip
      from the remote path
    * `skip_on` are skip on options to use
        * `filesize_match` skips if the file sizes match
        * `lmt_ge` skips if the local file has a last modified time greater
          than or equal to the remote file
        * `md5_match` skips if the MD5 hashes match
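The effect of `strip_components` can be sketched as follows (the function
name is illustrative, not a blobxfer internal): with `strip_components: 1`,
a remote path such as `vpath/subdir/file.txt` is written under the
destination as `subdir/file.txt`.

```python
from pathlib import PurePosixPath

# Illustrative sketch: drop n leading components from a remote path before
# placing it under the local destination directory.
def strip_components(remote_path, n):
    parts = PurePosixPath(remote_path).parts
    return str(PurePosixPath(*parts[n:])) if n < len(parts) else ""

print(strip_components("vpath/subdir/file.txt", 1))
# -> subdir/file.txt
```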
### upload
The `upload` section specifies upload sources and destinations. Note that
`upload` refers to a list of objects, thus you may specify as many of these
sub-configuration blocks on the `upload` property as you need. When the
`upload` command is run with the YAML config, the list is iterated and all
specified sources are uploaded.

```yaml
upload:
  - source:
      - /path/to/hugefile1
      - /path/to/hugefile2
    destination:
      - mystorageaccount0: mycontainer/vdir
      - mystorageaccount1: someothercontainer/vdir2
    include:
      - "*.bin"
    exclude:
      - "*.tmp"
    options:
      mode: auto
      access_tier: null
      chunk_size_bytes: 0
      delete_extraneous_destination: true
      delete_only: false
      one_shot_bytes: 33554432
      overwrite: true
      recursive: true
      rename: false
      rsa_public_key: mypublickey.pem
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
      stdin_as_page_blob_size: 0
      store_file_properties:
        attributes: true
        cache_control: 'max-age=3600'
        content_type: 'text/javascript; charset=utf-8'
        md5: true
      strip_components: 1
      vectored_io:
        stripe_chunk_size_bytes: 1000000
        distribution_mode: stripe
  - source:
    # next if needed...
```

* `source` is a list of local resource paths
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are upload-specific options
    * `mode` is the operating mode
    * `access_tier` is the access tier to set for the object. If not set,
      the default access tier for the storage account is inferred.
    * `chunk_size_bytes` is the maximum amount of data to upload per
      request. This corresponds to the block size for block and append
      blobs, the page size for page blobs, and the file chunk for files.
      Only block blobs can have a block size of up to 100 MiB; all others
      have a maximum of 4 MiB.
    * `delete_extraneous_destination` will clean up any remote files that
      are not found locally. Note that this interacts with include and
      exclude filters.
    * `delete_only` will only perform the remote cleanup. If this is
      specified as `true`, then `delete_extraneous_destination` must be
      specified as `true` as well.
    * `one_shot_bytes` is the size limit for uploading block blobs in a
      single request
    * `overwrite` specifies clobber behavior
    * `recursive` specifies if local paths should be recursively searched
      for files to upload
    * `rename` will rename a single entity destination path to a single
      `source`
    * `rsa_public_key` is the RSA public key PEM file to use to encrypt
      files
    * `skip_on` are skip on options to use
        * `filesize_match` skips if the file sizes match
        * `lmt_ge` skips if the remote file has a last modified time greater
          than or equal to the local file
        * `md5_match` skips if the MD5 hashes match
    * `stdin_as_page_blob_size` is the page blob size to preallocate if the
      amount of data to be streamed from stdin is known beforehand and the
      `mode` is `page`
    * `store_file_properties` stores the following file properties if
      enabled
        * `attributes` will store POSIX file mode and ownership
        * `cache_control` sets the CacheControl property
        * `content_type` sets the ContentType property
        * `md5` will store the MD5 of the file
    * `strip_components` is the number of leading path components to strip
      from the local path
    * `vectored_io` are the Vectored IO options to apply to the upload
        * `stripe_chunk_size_bytes` is the stripe width of each chunk if the
          `stripe` `distribution_mode` is selected
        * `distribution_mode` is the Vectored IO mode to use, which can be
          one of:
            * `disabled` will disable Vectored IO
            * `replica` will replicate source files to target destinations
              on upload. Note that more than one destination should be
              specified.
            * `stripe` will stripe source files to target destinations on
              upload. If more than one destination is specified, striping
              occurs in round-robin order amongst the destinations listed.
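The round-robin behavior of `stripe` mode can be sketched as follows (an
illustration of the documented ordering, not blobxfer's implementation):
with two destinations, stripe chunk 0 goes to the first destination, chunk 1
to the second, chunk 2 back to the first, and so on.

```python
# Illustrative sketch: assign stripe chunks to destinations in round-robin
# order, as described for distribution_mode: stripe.
def assign_stripes(num_chunks, destinations):
    return [destinations[i % len(destinations)] for i in range(num_chunks)]

dests = [
    "mystorageaccount0:mycontainer/vdir",
    "mystorageaccount1:someothercontainer/vdir2",
]
for chunk, dest in enumerate(assign_stripes(4, dests)):
    print(chunk, "->", dest)
```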
### synccopy
The `synccopy` section specifies synchronous copy sources and destinations.
Note that `synccopy` refers to a list of objects, thus you may specify as
many of these sub-configuration blocks on the `synccopy` property as you
need. When the `synccopy` command is run with the YAML config, the list is
iterated and all specified sources are synchronously copied.

```yaml
synccopy:
  - source:
      - mystorageaccount0: mycontainer
    destination:
      - mystorageaccount0: othercontainer
      - mystorageaccount1: mycontainer
    include:
      - "*.bin"
    exclude:
      - "*.tmp"
    options:
      mode: auto
      dest_mode: auto
      access_tier: null
      delete_extraneous_destination: true
      delete_only: false
      overwrite: true
      recursive: true
      rename: false
      server_side_copy: true
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
```

* `source` is a list of storage account to remote path mappings. All sources
  are copied to each destination specified. To use an arbitrary URL, specify
  the map as `*: https://some.url/path`.
* `destination` is a list of storage account to remote path mappings
* `include` is a list of include patterns
* `exclude` is a list of exclude patterns
* `options` are synccopy-specific options
    * `mode` is the source mode
    * `dest_mode` is the destination mode
    * `access_tier` is the access tier to set for the object. If not set,
      the default access tier for the storage account is inferred.
    * `delete_extraneous_destination` will clean up any files in remote
      destinations that are not found in the remote sources. Note that this
      interacts with include and exclude filters.
    * `delete_only` will only perform the remote cleanup. If this is
      specified as `true`, then `delete_extraneous_destination` must be
      specified as `true` as well.
    * `overwrite` specifies clobber behavior
    * `recursive` specifies if source remote paths should be recursively
      searched for files to copy
    * `rename` will rename a single remote source entity to the remote
      destination path
    * `server_side_copy` will perform the copy on Azure Storage servers.
      This option is enabled by default and destinations must be block
      blobs. If destinations are not block blobs, this option must be set to
      `false`.
    * `skip_on` are skip on options to use
        * `filesize_match` skips if the file sizes match
        * `lmt_ge` skips if the source file has a last modified time greater
          than or equal to the destination file
        * `md5_match` skips if the MD5 hashes match
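One plausible reading of how the `skip_on` checks combine (consult the CLI
Usage documentation for the authoritative behavior; the helper names here
are hypothetical) is that an entity is skipped when any enabled check
matches:

```python
# Hypothetical sketch: skip an entity if any enabled skip_on check matches.
# size_match, lmt_src_ge_dst, and md5_equal are precomputed comparison
# results for a single source/destination pair.
def should_skip(skip_on, size_match, lmt_src_ge_dst, md5_equal):
    if skip_on.get("filesize_match") and size_match:
        return True
    if skip_on.get("lmt_ge") and lmt_src_ge_dst:
        return True
    if skip_on.get("md5_match") and md5_equal:
        return True
    return False

# With skip_on: {md5_match: true}, a pair with equal MD5s is skipped even
# though the other checks are disabled.
print(should_skip({"md5_match": True}, False, False, True))
# -> True
```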