This commit is contained in:
Hong Ooi 2019-11-02 03:46:22 +11:00
Parent cda0178881
Commit f42675c450
8 changed files with 65 additions and 51 deletions

View file

@@ -1,6 +1,6 @@
Package: AzureStor
Title: Storage Management in 'Azure'
Version: 2.1.1.9000
Version: 3.0.0
Authors@R: c(
person("Hong", "Ooi", , "hongooi@microsoft.com", role = c("aut", "cre")),
person("Microsoft", role="cph")
@@ -19,12 +19,10 @@ Imports:
mime,
openssl,
xml2,
AzureRMR (>= 2.2.1)
AzureRMR (>= 2.3.0)
Suggests:
knitr,
jsonlite,
testthat
Roxygen: list(markdown=TRUE)
RoxygenNote: 6.1.1
Remotes:
Azure/AzureRMR

View file

@@ -1,4 +1,4 @@
# AzureStor 2.1.1.9000
# AzureStor 3.0.0
## Significant user-visible changes
@@ -10,7 +10,7 @@
- Significant changes to file storage methods for greater consistency with the other storage types:
- The default directory for `list_azure_files` is now the root, mirroring the behaviour for blobs and ADLSgen2.
- The output of `list_azure_files` now includes the full path as part of the file/directory name.
- Add `recursive` argument to file storage methods for recursing through subdirectories. Like above, for file storage this can be slow, so try to use a non-recursive solution where possible.
- Add `recursive` argument to `list_azure_files`, `create_azure_dir` and `delete_azure_dir` for recursing through subdirectories. Like with file transfers, for Azure file storage this can be slow, so try to use a non-recursive solution where possible.
- Make output format for `list_adls_files`, `list_blobs` and `list_azure_files` more consistent. The first 2 columns for a data frame output are now always `name` and `size`; the size of a directory is NA. The 3rd column for non-blobs is `isdir` which is TRUE/FALSE depending on whether the object is a directory or file. Any additional columns remain storage type-specific.
- New `get_storage_metadata` and `set_storage_metadata` methods for managing user-specified properties (metadata) for objects.
- Revamped methods for getting standard properties, which are now all methods for `get_storage_properties` rather than having specific functions for blobs, files and directories.
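As a rough illustrative sketch of the entries above (the storage account, share and file names here are hypothetical), the new arguments and methods might be used like this:
```r
# hypothetical file share; substitute your own account and access key
share <- storage_container("https://mystorage.file.core.windows.net/myshare", key="access_key")

# recursive listing: the first 2 columns are now always name and size, followed by isdir
list_azure_files(share, "/", recursive=TRUE)

# create and delete nested directories in one call
create_azure_dir(share, "outer/inner", recursive=TRUE)
delete_azure_dir(share, "outer", recursive=TRUE)

# user-defined metadata and standard properties for an existing file
set_storage_metadata(share, "dir/data.csv", category="test")
get_storage_metadata(share, "dir/data.csv")
get_storage_properties(share, "dir/data.csv")
```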

View file

@@ -32,7 +32,7 @@ multicopy_url_to_storage.blob_container <- function(container, src, dest, ...)
#' @param async For `copy_url_to_blob` and `multicopy_url_to_blob`, whether the copy operation should be asynchronous (proceed in the background).
#' @details
#' `copy_url_to_blob` transfers the contents of the file at the specified HTTP\[S\] URL directly to blob storage, without requiring a temporary local copy to be made. `multicopy_url_to_blob1 does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
#' `copy_url_to_blob` transfers the contents of the file at the specified HTTP\[S\] URL directly to blob storage, without requiring a temporary local copy to be made. `multicopy_url_to_blob` does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
#' @rdname blob
#' @export
copy_url_to_blob <- function(container, src, dest, lease=NULL, async=FALSE)
@@ -69,12 +69,12 @@ multicopy_url_to_blob <- function(container, src, dest, lease=NULL, async=FALSE,
stop("'dest' must contain one name per file in 'src'", call.=FALSE)
if(n_src == 1)
return(copy_url_to_blob(container, src, dest, ...))
return(copy_url_to_blob(container, src, dest, lease=lease, async=async))
init_pool(max_concurrent_transfers)
pool_export("container", envir=environment())
pool_map(function(s, d, ...) AzureStor::copy_url_to_blob(container, s, d, ...),
pool_map(function(s, d, lease, async) AzureStor::copy_url_to_blob(container, s, d, lease=lease, async=async),
src, dest, MoreArgs=list(lease=lease, async=async))
invisible(NULL)
}
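As an illustrative sketch of the copy-from-URL behaviour documented above (the storage account and container are placeholders; the source URLs point at the AzureStor repository, as in the package tests):
```r
# placeholder blob container; substitute your own account, key and container
cont <- storage_container("https://mystorage.blob.core.windows.net/mycontainer", key="access_key")

# single URL; async=TRUE starts the copy and returns without waiting for it to finish
copy_url_to_blob(cont,
    "https://raw.githubusercontent.com/Azure/AzureStor/master/DESCRIPTION",
    "DESCRIPTION", async=TRUE)

# multiple URLs at once; dest must contain one name per source URL
srcs <- paste0("https://raw.githubusercontent.com/Azure/AzureStor/master/",
               c("LICENSE", "NAMESPACE"))
multicopy_url_to_blob(cont, srcs, c("LICENSE", "NAMESPACE"))
```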

View file

@@ -59,14 +59,15 @@ These functions for working with objects within a storage container:
- `delete_storage_file`: delete a file or blob
- `storage_upload`/`storage_download`: transfer a file to or from a storage container
- `storage_multiupload`/`storage_multidownload`: transfer multiple files in parallel to or from a storage container
- `get_storage_properties`: get properties for a storage object
- `get_storage_metadata`/`set_storage_metadata`: get and set user-defined metadata for a storage object
```r
# example of working with files and directories (ADLSgen2)
cont <- storage_container(ad_end_tok, "myfilesystem")
list_storage_files(cont)
create_storage_dir(cont, "newdir")
storage_download(cont, "/readme.txt", "~/readme.txt")
storage_download(cont, "/readme.txt")
storage_multiupload(cont, "N:/data/*.*", "newdir") # uploading everything in a directory
```
@@ -76,7 +77,7 @@ AzureStor includes a number of extra features to make transferring files efficie
### Parallel connections
As noted above, you can transfer multiple files in parallel using the `multiupload_*`/`multidownload_*` functions. These functions utilise a background process pool supplied by AzureRMR to do the transfers in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.
As noted above, you can transfer multiple files in parallel using the `storage_multiupload/download` functions. These functions utilise a background process pool supplied by AzureRMR to do the transfers in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.
```r
# uploading/downloading multiple files at once: use a wildcard to specify files to transfer
@@ -86,22 +87,7 @@ storage_multidownload(cont, src="/monthly/jan*.*", dest="~/data/january")
# or supply a vector of file specs as the source and destination
src <- c("file1.csv", "file2.csv", "file3.csv")
dest <- file.path("data/", src)
storage_multiupload(cont, src, dest)
```
You can also use the process pool to parallelise tasks for which there is no built-in function. For example, the following code will delete multiple files in parallel:
```r
files_to_delete <- list_storage_files(cont, "datadir", info="name")
# initialise the background pool with 10 nodes
AzureRMR::init_pool(10)
# export the container object to the nodes
AzureRMR::pool_export("cont")
# delete the files
AzureRMR::pool_sapply(files_to_delete, function(f) AzureStor::delete_storage_file(cont, f))
storage_multiupload(cont, src=src, dest=dest)
```
### Transfer to and from connections
@@ -120,7 +106,7 @@ storage_upload(cont, src=con, dest="iris.rds")
# downloading files into memory: as a raw vector with dest=NULL, and via a connection
rawvec <- storage_download(cont, src="iris.json", dest=NULL)
rawToChar(rawvec)
rawToChar(rawConnectionValue(rawvec))
con <- rawConnection(raw(0), "r+")
storage_download(cont, src="iris.rds", dest=con)

View file

@@ -83,7 +83,7 @@ Upload, download, or delete a blob; list blobs in a container.
\code{upload_blob} and \code{download_blob} can display a progress bar to track the file transfer. You can control whether to display this with \code{options(azure_storage_progress_bar=TRUE|FALSE)}; the default is TRUE.
\code{copy_url_to_blob} transfers the contents of the file at the specified HTTP[S] URL directly to blob storage, without requiring a temporary local copy to be made. `multicopy_url_to_blob1 does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
\code{copy_url_to_blob} transfers the contents of the file at the specified HTTP[S] URL directly to blob storage, without requiring a temporary local copy to be made. \code{multicopy_url_to_blob} does the same, for multiple URLs at once. These functions have a current file size limit of 256MB.
}
\examples{
\dontrun{

View file

@@ -144,7 +144,7 @@ test_that("Blob multicopy from URL works",
contname <- paste0(sample(letters, 10, TRUE), collapse="")
cont <- create_blob_container(bl, contname)
fnames <- c("DESCRIPTION", "LICENSE", "NAMESPACE")
fnames <- c("LICENSE", "LICENSE.md", "CONTRIBUTING.md")
src_urls <- paste0("https://raw.githubusercontent.com/Azure/AzureStor/master/", fnames)
origs <- paste0("../../", fnames)
dests <- c(tempfile(), tempfile(), tempfile())

View file

@@ -129,7 +129,7 @@ test_that("Blob copy from URL works",
# use readLines to workaround GH auto-translating CRLF -> LF
expect_identical(readLines(orig_file), readLines(new_file))
fnames <- c("DESCRIPTION", "LICENSE", "NAMESPACE")
fnames <- c("LICENSE", "LICENSE.md", "CONTRIBUTING.md")
src_urls <- paste0("https://raw.githubusercontent.com/Azure/AzureStor/master/", fnames)
origs <- paste0("../../", fnames)
dests <- c(tempfile(), tempfile(), tempfile())

View file

@@ -62,13 +62,12 @@ These functions for working with objects within a storage container:
- `storage_upload`/`storage_download`: transfer a file to or from a storage container
- `storage_multiupload`/`storage_multidownload`: transfer multiple files in parallel to or from a storage container
```r
# example of working with files and directories (ADLSgen2)
cont <- storage_container(ad_end_tok, "myfilesystem")
list_storage_files(cont)
create_storage_dir(cont, "newdir")
storage_download(cont, "/readme.txt", "~/readme.txt")
storage_download(cont, "/readme.txt")
storage_multiupload(cont, "N:/data/*.*", "newdir") # uploading everything in a directory
```
@@ -78,7 +77,7 @@ AzureStor includes a number of extra features to make transferring files efficie
### Parallel connections
As noted above, you can transfer multiple files in parallel using the `multiupload_*`/`multidownload_*` functions. These functions utilise a background process pool supplied by AzureRMR to do the transfers in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.
The `storage_multiupload/download` functions transfer multiple files in parallel, which usually results in major speedups when transferring multiple small files. The pool is created the first time a parallel file transfer is performed, and persists for the duration of the R session; this means you don't have to wait for the pool to be (re-)created each time.
```r
# uploading/downloading multiple files at once: use a wildcard to specify files to transfer
@@ -91,21 +90,6 @@ dest <- file.path("data/", src)
storage_multiupload(cont, src, dest)
```
You can also use the process pool to parallelise tasks for which there is no built-in function. For example, the following code will delete multiple files in parallel:
```r
files_to_delete <- list_storage_files(cont, "datadir", info="name")
# initialise the background pool with 10 nodes
AzureRMR::init_pool(10)
# export the container object to the nodes
AzureRMR::pool_export("cont")
# delete the files
AzureRMR::pool_sapply(files_to_delete, function(f) AzureStor::delete_storage_file(cont, f))
```
### Transfer to and from connections
You can upload a (single) in-memory R object via a _connection_, and similarly, you can download a file to a connection, or return it as a raw vector. This lets you transfer an object without having to create a temporary file as an intermediate step.
@@ -122,7 +106,7 @@ storage_upload(cont, src=con, dest="iris.rds")
# downloading files into memory: as a raw vector with dest=NULL, and via a connection
rawvec <- storage_download(cont, src="iris.json", dest=NULL)
rawToChar(rawvec)
rawToChar(rawConnectionValue(rawvec))
con <- rawConnection(raw(0), "r+")
storage_download(cont, src="iris.rds", dest=con)
@@ -165,6 +149,52 @@ For more information, see the [AzCopy repo on GitHub](https://github.com/Azure/a
**Note that AzureStor uses AzCopy version 10. It is incompatible with versions 8.1 and earlier.**
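Assuming the azcopy v10 executable is installed and on the system path, a transfer can be routed through it via the `use_azcopy` argument (a hedged sketch; the argument name is taken from the package documentation as I understand it):
```r
# route an upload and a download through azcopy instead of the internal transfer code
storage_upload(cont, src="~/data/bigfile.csv", dest="bigfile.csv", use_azcopy=TRUE)
storage_download(cont, src="bigfile.csv", dest="~/data/bigfile_copy.csv", use_azcopy=TRUE)
```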
## Other features
### Parallel connections
The `storage_multiupload/download` functions mentioned above use a background process pool supplied by AzureRMR. You can also use this pool to parallelise tasks for which there is no built-in function. For example, the following code will delete multiple files in parallel:
```r
files_to_delete <- list_storage_files(cont, "datadir", info="name")
# initialise the background pool with 10 nodes
AzureRMR::init_pool(10)
# export the container object to the nodes
AzureRMR::pool_export("cont")
# delete the files
AzureRMR::pool_sapply(files_to_delete, function(f) AzureStor::delete_storage_file(cont, f))
```
### Metadata
To get and set user-defined properties (metadata) for storage objects, use the `get_storage_metadata` and `set_storage_metadata` functions.
```r
fs <- storage_container("https://mystorage.dfs.core.windows.net/myshare", key="access_key")
storage_upload(fs, "iris.csv", "newdir/iris.csv")
set_storage_metadata(fs, "newdir/iris.csv", name1="value1")
# will be list(name1="value1")
get_storage_metadata(fs, "newdir/iris.csv")
set_storage_metadata(fs, "newdir/iris.csv", name2="value2")
# will be list(name1="value1", name2="value2")
get_storage_metadata(fs, "newdir/iris.csv")
set_storage_metadata(fs, "newdir/iris.csv", name3="value3", keep_existing=FALSE)
# will be list(name3="value3")
get_storage_metadata(fs, "newdir/iris.csv")
# deleting all metadata
set_storage_metadata(fs, "newdir/iris.csv", keep_existing=FALSE)
# if no filename supplied, get/set metadata for the container
get_storage_metadata(fs)
```
## Admin interface
Finally, AzureStor's admin-side interface allows you to easily create and delete storage accounts, as well as obtain access keys and generate a SAS. Here is a sample workflow:
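The workflow itself falls outside this hunk; as a hedged sketch of what it might look like (the tenant, subscription and resource names are placeholders, and the methods follow the AzureRMR/AzureStor admin interface as I understand it):
```r
library(AzureRMR)
library(AzureStor)

# authenticate with Azure Resource Manager and drill down to a resource group
rg <- get_azure_login("mytenant")$
    get_subscription("subscription_id")$
    get_resource_group("myresourcegroup")

# create a new storage account
stor <- rg$create_storage_account("mynewstorage")

# retrieve the access keys and generate an account SAS
stor$list_keys()
sas <- stor$get_account_sas(permissions="rw")

# delete the account (asks for confirmation first)
stor$delete()
```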