
Resource Files

The following two samples show how to use resource files to move data onto and off of the nodes in doAzureParallel. Good data movement techniques, especially for large data, are critical for getting your code to run quickly and to scale.

Resource Files example

The resource files example is a good starting point for learning how to manage your files in the cloud and use them in your doAzureParallel cluster. The doAzureParallel package exposes Azure Storage methods that let you create, upload, and download files in cloud storage.
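A resource file pairs a URL that a node can download from with the path the file should have on that node. The base R sketch below shows that idea. It assumes the standard Azure Blob Storage URL form `https://<account>.blob.core.windows.net/<container>/<blob>`, with an optional SAS token appended as the query string. `blobUrl` and `resourceFile` are hypothetical helpers rather than doAzureParallel functions, and the account name, container name, and SAS token are placeholders. The calls used in this directory's `resource_files_example.R` may differ.

```r
# Hypothetical helper (NOT part of doAzureParallel): builds the HTTPS URL of
# a blob in Azure Blob Storage, optionally appending a SAS token so a node
# can download the blob without the storage account key.
blobUrl <- function(accountName, containerName, blobName, sasToken = NULL) {
  url <- sprintf("https://%s.blob.core.windows.net/%s/%s",
                 accountName, containerName, blobName)
  if (!is.null(sasToken) && nzchar(sasToken)) {
    # A SAS token is sent as the query string; drop a leading "?" if present.
    url <- paste0(url, "?", sub("^\\?", "", sasToken))
  }
  url
}

# Hypothetical descriptor: where to download from, and where the file lands
# on the node. These field names are illustrative, not the rAzureBatch ones.
resourceFile <- function(url, filePath) list(url = url, filePath = filePath)

# One resource file per monthly partition (placeholder account, container,
# file names, and SAS token).
months <- c("2016-01", "2016-02")
files <- lapply(months, function(m) {
  name <- sprintf("yellow_tripdata_%s.csv", m)
  resourceFile(blobUrl("mystorageaccount", "taxidata", name, "sv=...&sig=..."),
               name)
})
str(files[[1]])
```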

This sample shows how to work with the well-known NYC Yellow Taxi Cab data set, which is large. It partitions the data set into monthly sets and then iterates over each month individually to create a map of all the pickup locations in NYC. The final result is uploaded back to cloud storage as an image, which can be downloaded with any standard tool or viewed in a browser.
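The split-by-month, process-each-month pattern can be sketched locally in base R. The data frame below is synthetic, with made-up column names and values rather than the real taxi data. A local `lapply` also stands in for the per-node tasks that the sample runs on the cluster. Each month is summarized independently, which is what makes the work easy to spread across nodes.

```r
# Synthetic stand-in for the taxi data (NOT the real NYC data set).
trips <- data.frame(
  pickup_datetime = as.POSIXct(c("2016-01-03 08:15:00",
                                 "2016-01-20 17:40:00",
                                 "2016-02-11 09:05:00"), tz = "UTC"),
  pickup_longitude = c(-73.98, -73.95, -74.00),
  pickup_latitude  = c(40.75, 40.78, 40.72)
)

# Partition into monthly sets keyed by "YYYY-MM".
monthly <- split(trips, format(trips$pickup_datetime, "%Y-%m"))

# Process each month on its own. In the sample, each month would be a
# separate task on a cluster node rather than a local lapply() call.
summaries <- lapply(names(monthly), function(m) {
  d <- monthly[[m]]
  data.frame(month = m,
             pickups = nrow(d),
             mean_lon = mean(d$pickup_longitude),
             mean_lat = mean(d$pickup_latitude))
})

# Combine the per-month results into one table.
result <- do.call(rbind, summaries)
print(result)
```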

NOTE: Cluster setup for this sample may take a while, because each node must download a large amount of data.