Mirror of https://github.com/Azure/AzureStor.git

update for next release

This commit is contained in:
Parent: cda0178881
Commit: f42675c450
DESCRIPTION

@@ -1,6 +1,6 @@
 Package: AzureStor
 Title: Storage Management in 'Azure'
-Version: 2.1.1.9000
+Version: 3.0.0
 Authors@R: c(
     person("Hong", "Ooi", , "hongooi@microsoft.com", role = c("aut", "cre")),
     person("Microsoft", role="cph")
@@ -19,12 +19,10 @@ Imports:
     mime,
     openssl,
     xml2,
-    AzureRMR (>= 2.2.1)
+    AzureRMR (>= 2.3.0)
 Suggests:
     knitr,
     jsonlite,
     testthat
 Roxygen: list(markdown=TRUE)
 RoxygenNote: 6.1.1
-Remotes:
-    Azure/AzureRMR

NEWS.md

@@ -1,4 +1,4 @@
-# AzureStor 2.1.1.9000
+# AzureStor 3.0.0

 ## Significant user-visible changes

@@ -10,7 +10,7 @@
 - Significant changes to file storage methods for greater consistency with the other storage types:
   - The default directory for `list_azure_files` is now the root, mirroring the behaviour for blobs and ADLSgen2.
   - The output of `list_azure_files` now includes the full path as part of the file/directory name.
-- Add `recursive` argument to file storage methods for recursing through subdirectories. Like above, for file storage this can be slow, so try to use a non-recursive solution where possible.
+- Add `recursive` argument to `list_azure_files`, `create_azure_dir` and `delete_azure_dir` for recursing through subdirectories. Like with file transfers, for Azure file storage this can be slow, so try to use a non-recursive solution where possible.
 - Make output format for `list_adls_files`, `list_blobs` and `list_azure_files` more consistent. The first 2 columns for a data frame output are now always `name` and `size`; the size of a directory is NA. The 3rd column for non-blobs is `isdir`, which is TRUE/FALSE depending on whether the object is a directory or file. Any additional columns remain storage type-specific.
 - New `get_storage_metadata` and `set_storage_metadata` methods for managing user-specified properties (metadata) for objects.
 - Revamped methods for getting standard properties, which are now all methods for `get_storage_properties` rather than having specific functions for blobs, files and directories.
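A minimal sketch of the 3.0.0 file-storage API described by these NEWS entries; the share URL, access key and file names below are hypothetical, not taken from this commit:

```r
library(AzureStor)

# hypothetical file share; the endpoint URL and key are placeholders
share <- storage_container("https://mystorage.file.core.windows.net/myshare",
                           key="access_key")

# recursive creation of nested directories, then upload a (hypothetical) local file
create_azure_dir(share, "outer/inner", recursive=TRUE)
storage_upload(share, "iris.csv", "outer/inner/iris.csv")

# listing now defaults to the root and includes the full path in names;
# the first two columns are `name` and `size`, plus `isdir` for file storage
list_azure_files(share, recursive=TRUE)

# standard properties and user-defined metadata via the new generics
get_storage_properties(share, "outer/inner/iris.csv")
set_storage_metadata(share, "outer/inner/iris.csv", origin="example")
get_storage_metadata(share, "outer/inner/iris.csv")

# recursive deletion of a directory tree
delete_azure_dir(share, "outer", recursive=TRUE)
```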

@@ -32,7 +32,7 @@ multicopy_url_to_storage.blob_container <- function(container, src, dest, ...)

 #' @param async For `copy_url_to_blob` and `multicopy_url_to_blob`, whether the copy operation should be asynchronous (proceed in the background).
 #' @details
-#' `copy_url_to_blob` transfers the contents of the file at the specified HTTP\[S\] URL directly to blob storage, without requiring a temporary local copy to be made. `multicopy_url_to_blob1 does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
+#' `copy_url_to_blob` transfers the contents of the file at the specified HTTP\[S\] URL directly to blob storage, without requiring a temporary local copy to be made. `multicopy_url_to_blob` does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
 #' @rdname blob
 #' @export
 copy_url_to_blob <- function(container, src, dest, lease=NULL, async=FALSE)
@@ -69,12 +69,12 @@ multicopy_url_to_blob <- function(container, src, dest, lease=NULL, async=FALSE,
         stop("'dest' must contain one name per file in 'src'", call.=FALSE)

     if(n_src == 1)
-        return(copy_url_to_blob(container, src, dest, ...))
+        return(copy_url_to_blob(container, src, dest, lease=lease, async=async))

     init_pool(max_concurrent_transfers)

     pool_export("container", envir=environment())
-    pool_map(function(s, d, ...) AzureStor::copy_url_to_blob(container, s, d, ...),
+    pool_map(function(s, d, lease, async) AzureStor::copy_url_to_blob(container, s, d, lease=lease, async=async),
             src, dest, MoreArgs=list(lease=lease, async=async))
     invisible(NULL)
 }
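For illustration, a hypothetical call to the functions fixed above might look like the following; the account URL and key are placeholders, and the source URLs mirror the ones used in the package's own tests further down:

```r
library(AzureStor)

# hypothetical blob container; the endpoint URL and key are placeholders
cont <- blob_container("https://mystorage.blob.core.windows.net/mycontainer",
                       key="access_key")

# copy a single URL directly into blob storage (current 256MB size limit)
copy_url_to_blob(cont,
    "https://raw.githubusercontent.com/Azure/AzureStor/master/LICENSE",
    "LICENSE")

# copy several URLs at once; after this fix, lease= and async= are passed
# through to each worker instead of being silently dropped
fnames <- c("LICENSE", "LICENSE.md", "CONTRIBUTING.md")
src_urls <- paste0("https://raw.githubusercontent.com/Azure/AzureStor/master/", fnames)
multicopy_url_to_blob(cont, src_urls, fnames, async=FALSE)
```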

README.md

@@ -59,14 +59,15 @@ These functions for working with objects within a storage container:
 - `delete_storage_file`: delete a file or blob
 - `storage_upload`/`storage_download`: transfer a file to or from a storage container
 - `storage_multiupload`/`storage_multidownload`: transfer multiple files in parallel to or from a storage container
+- `get_storage_properties`: get properties for a storage object
+- `get_storage_metadata`/`set_storage_metadata`: get and set user-defined metadata for a storage object

 ```r
 # example of working with files and directories (ADLSgen2)
 cont <- storage_container(ad_end_tok, "myfilesystem")
 list_storage_files(cont)
 create_storage_dir(cont, "newdir")
-storage_download(cont, "/readme.txt", "~/readme.txt")
+storage_download(cont, "/readme.txt")
 storage_multiupload(cont, "N:/data/*.*", "newdir") # uploading everything in a directory
 ```

@@ -76,7 +77,7 @@ AzureStor includes a number of extra features to make transferring files efficient.

 ### Parallel connections

-As noted above, you can transfer multiple files in parallel using the `multiupload_*`/`multidownload_*` functions. These functions utilise a background process pool supplied by AzureRMR to do the transfers in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.
+As noted above, you can transfer multiple files in parallel using the `storage_multiupload/download` functions. These functions utilise a background process pool supplied by AzureRMR to do the transfers in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.

 ```r
 # uploading/downloading multiple files at once: use a wildcard to specify files to transfer
|
@ -86,22 +87,7 @@ storage_multidownload(cont, src="/monthly/jan*.*", dest="~/data/january")
|
|||
# or supply a vector of file specs as the source and destination
|
||||
src <- c("file1.csv", "file2.csv", "file3.csv")
|
||||
dest <- file.path("data/", src)
|
||||
storage_multiupload(cont, src, dest)
|
||||
```
|
||||
|
||||
You can also use the process pool to parallelise tasks for which there is no built-in function. For example, the following code will delete multiple files in parallel:
|
||||
|
||||
```r
|
||||
files_to_delete <- list_storage_files(cont, "datadir", info="name")
|
||||
|
||||
# initialise the background pool with 10 nodes
|
||||
AzureRMR::init_pool(10)
|
||||
|
||||
# export the container object to the nodes
|
||||
AzureRMR::pool_export("cont")
|
||||
|
||||
# delete the files
|
||||
AzureRMR::pool_sapply(files_to_delete, function(f) AzureStor::delete_storage_file(cont, f))
|
||||
storage_multiupload(cont, src=src, dest=dest)
|
||||
```
|
||||
|
||||
### Transfer to and from connections
|
||||
|

@@ -120,7 +106,7 @@ storage_upload(cont, src=con, dest="iris.rds")

 # downloading files into memory: as a raw vector with dest=NULL, and via a connection
 rawvec <- storage_download(cont, src="iris.json", dest=NULL)
-rawToChar(rawvec)
+rawToChar(rawConnectionValue(rawvec))

 con <- rawConnection(raw(0), "r+")
 storage_download(cont, src="iris.rds", dest=con)
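As context for the connection-based transfers in this hunk, a sketch of a full round trip might look like the following; the container object and file names are assumed for illustration, not taken from this commit:

```r
library(AzureStor)

# hypothetical blob container; the endpoint URL and key are placeholders
cont <- storage_container("https://mystorage.blob.core.windows.net/mycontainer",
                          key="access_key")

# upload an in-memory object: serialize it to a raw connection, pass it as src
con <- rawConnection(serialize(iris, NULL))
storage_upload(cont, src=con, dest="iris.rds")

# download it back into memory via a read-write raw connection
con2 <- rawConnection(raw(0), "r+")
storage_download(cont, src="iris.rds", dest=con2)
iris2 <- unserialize(rawConnectionValue(con2))
identical(iris, iris2)  # expected TRUE
```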

@@ -83,7 +83,7 @@ Upload, download, or delete a blob; list blobs in a container.

 \code{upload_blob} and \code{download_blob} can display a progress bar to track the file transfer. You can control whether to display this with \code{options(azure_storage_progress_bar=TRUE|FALSE)}; the default is TRUE.

-\code{copy_url_to_blob} transfers the contents of the file at the specified HTTP[S] URL directly to blob storage, without requiring a temporary local copy to be made. `multicopy_url_to_blob1 does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
+\code{copy_url_to_blob} transfers the contents of the file at the specified HTTP[S] URL directly to blob storage, without requiring a temporary local copy to be made. \code{multicopy_url_to_blob} does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
 }
 \examples{
 \dontrun{

@@ -144,7 +144,7 @@ test_that("Blob multicopy from URL works",
     contname <- paste0(sample(letters, 10, TRUE), collapse="")
     cont <- create_blob_container(bl, contname)

-    fnames <- c("DESCRIPTION", "LICENSE", "NAMESPACE")
+    fnames <- c("LICENSE", "LICENSE.md", "CONTRIBUTING.md")
     src_urls <- paste0("https://raw.githubusercontent.com/Azure/AzureStor/master/", fnames)
     origs <- paste0("../../", fnames)
     dests <- c(tempfile(), tempfile(), tempfile())

@@ -129,7 +129,7 @@ test_that("Blob copy from URL works",
     # use readLines to workaround GH auto-translating CRLF -> LF
     expect_identical(readLines(orig_file), readLines(new_file))

-    fnames <- c("DESCRIPTION", "LICENSE", "NAMESPACE")
+    fnames <- c("LICENSE", "LICENSE.md", "CONTRIBUTING.md")
     src_urls <- paste0("https://raw.githubusercontent.com/Azure/AzureStor/master/", fnames)
     origs <- paste0("../../", fnames)
     dests <- c(tempfile(), tempfile(), tempfile())

@@ -62,13 +62,12 @@ These functions for working with objects within a storage container:
 - `storage_upload`/`storage_download`: transfer a file to or from a storage container
 - `storage_multiupload`/`storage_multidownload`: transfer multiple files in parallel to or from a storage container
-

 ```r
 # example of working with files and directories (ADLSgen2)
 cont <- storage_container(ad_end_tok, "myfilesystem")
 list_storage_files(cont)
 create_storage_dir(cont, "newdir")
-storage_download(cont, "/readme.txt", "~/readme.txt")
+storage_download(cont, "/readme.txt")
 storage_multiupload(cont, "N:/data/*.*", "newdir") # uploading everything in a directory
 ```

@@ -78,7 +77,7 @@ AzureStor includes a number of extra features to make transferring files efficient.

 ### Parallel connections

-As noted above, you can transfer multiple files in parallel using the `multiupload_*`/`multidownload_*` functions. These functions utilise a background process pool supplied by AzureRMR to do the transfers in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.
+The `storage_multiupload/download` functions transfer multiple files in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.

 ```r
 # uploading/downloading multiple files at once: use a wildcard to specify files to transfer

@@ -91,21 +90,6 @@ dest <- file.path("data/", src)
 storage_multiupload(cont, src, dest)
 ```

-You can also use the process pool to parallelise tasks for which there is no built-in function. For example, the following code will delete multiple files in parallel:
-
-```r
-files_to_delete <- list_storage_files(cont, "datadir", info="name")
-
-# initialise the background pool with 10 nodes
-AzureRMR::init_pool(10)
-
-# export the container object to the nodes
-AzureRMR::pool_export("cont")
-
-# delete the files
-AzureRMR::pool_sapply(files_to_delete, function(f) AzureStor::delete_storage_file(cont, f))
-```
-
 ### Transfer to and from connections

 You can upload a (single) in-memory R object via a _connection_, and similarly, you can download a file to a connection, or return it as a raw vector. This lets you transfer an object without having to create a temporary file as an intermediate step.

@@ -122,7 +106,7 @@ storage_upload(cont, src=con, dest="iris.rds")

 # downloading files into memory: as a raw vector with dest=NULL, and via a connection
 rawvec <- storage_download(cont, src="iris.json", dest=NULL)
-rawToChar(rawvec)
+rawToChar(rawConnectionValue(rawvec))

 con <- rawConnection(raw(0), "r+")
 storage_download(cont, src="iris.rds", dest=con)

@@ -165,6 +149,52 @@ For more information, see the [AzCopy repo on GitHub](https://github.com/Azure/azure-storage-azcopy)
 **Note that AzureStor uses AzCopy version 10. It is incompatible with versions 8.1 and earlier.**

+
+## Other features
+
+### Parallel connections
+
+The `storage_multiupload/download` functions mentioned above use a background process pool supplied by AzureRMR. You can also use this pool to parallelise tasks for which there is no built-in function. For example, the following code will delete multiple files in parallel:
+
+```r
+files_to_delete <- list_storage_files(cont, "datadir", info="name")
+
+# initialise the background pool with 10 nodes
+AzureRMR::init_pool(10)
+
+# export the container object to the nodes
+AzureRMR::pool_export("cont")
+
+# delete the files
+AzureRMR::pool_sapply(files_to_delete, function(f) AzureStor::delete_storage_file(cont, f))
+```
+
+### Metadata
+
+To get and set user-defined properties (metadata) for storage objects, use the `get_storage_metadata` and `set_storage_metadata` functions.
+
+```r
+fs <- storage_container("https://mystorage.dfs.core.windows.net/myshare", key="access_key")
+storage_upload(fs, "iris.csv", "newdir/iris.csv")
+
+set_storage_metadata(fs, "newdir/iris.csv", name1="value1")
+# will be list(name1="value1")
+get_storage_metadata(fs, "newdir/iris.csv")
+
+set_storage_metadata(fs, "newdir/iris.csv", name2="value2")
+# will be list(name1="value1", name2="value2")
+get_storage_metadata(fs, "newdir/iris.csv")
+
+set_storage_metadata(fs, "newdir/iris.csv", name3="value3", keep_existing=FALSE)
+# will be list(name3="value3")
+get_storage_metadata(fs, "newdir/iris.csv")
+
+# deleting all metadata
+set_storage_metadata(fs, "newdir/iris.csv", keep_existing=FALSE)
+
+# if no filename supplied, get/set metadata for the container
+get_storage_metadata(fs)
+```
+
 ## Admin interface

 Finally, AzureStor's admin-side interface allows you to easily create and delete resource accounts, as well as obtain access keys and generate a SAS. Here is a sample workflow:
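The sample workflow itself is cut off in this diff. Under the assumption that it resembles AzureStor's documented admin interface, a sketch might look like this; the subscription, resource group and account names are placeholders:

```r
library(AzureRMR)
library(AzureStor)

# placeholders for a real subscription and resource group
rg <- get_azure_login()$
    get_subscription("subscription_id")$
    get_resource_group("resgroup")

# create a new storage account
stor <- rg$create_storage_account("mynewstorage")

# retrieve the admin access keys and generate a SAS
stor$list_keys()
sas <- stor$get_account_sas(permissions="rw")

# delete the account when done
rg$delete_storage_account("mynewstorage")
```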