Merge pull request #24 from Hong-Revo/master

fix R CMD build
This commit is contained in:
Hong Ooi 2017-10-13 18:47:32 +11:00 committed by GitHub
Parent 39b865f0b0 3553b5581d
Commit 66ee93d013
12 changed files with 79 additions and 83 deletions

View file

@ -10,3 +10,4 @@ Makefile
^test/.*$
^.*\.tar\.gz$
README\.md
misc

.gitignore vendored
View file

@ -27,6 +27,7 @@ inst/doc
.Rproj.user
*_credentials.R
*~
misc
# VS files

View file

@ -10,13 +10,13 @@ The R package `AzureDSVM` aims to offer functions that can be conveniently
To install the package from GitHub:
> devtools::install_github("Azure/AzureDSVM")
devtools::install_github("Azure/AzureDSVM")
Help pages are provided for all functions in the
package. In RStudio, for example, type AzureDSVM into the search box when the
package is loaded to see a list of functions and help pages, or run
> library(help=AzureDSVM)
library(help=AzureDSVM)
Note: the package works with any open-source R session as well as with
Microsoft R extensions.
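Putting this together, a minimal install-and-explore session (assuming the
devtools package is available) looks like:

if(!require("devtools")) install.packages("devtools")
devtools::install_github("Azure/AzureDSVM")
library(AzureDSVM)
library(help=AzureDSVM)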

View file

@ -30,13 +30,7 @@ documentation:
3. `AzureDSVM` assumes R (>= 3.3.1), with dependencies on the packages
`AzureSMR` (>= 0.2.2), `stringr` (>= 1.1.0), `stringi` (>= 1.1.2),
`magrittr` (>= 1.5), `dplyr` (>= 0.5.0). `AzureSMR` can be installed
from its [GitHub repository](https://github.com/Microsoft/AzureSMR):
```{r install azuresmr, eval=FALSE}
if(!require("devtools")) install.packages("devtools")
devtools::install_github("Microsoft/AzureSMR")
```
The other packages are available on CRAN or MRAN.
`magrittr` (>= 1.5), `dplyr` (>= 0.5.0).
4. Before using the functions in `AzureDSVM`, one needs to obtain
authentication credentials for managing Azure resources. Details are
@ -50,14 +44,14 @@ vignette](https://github.com/Microsoft/AzureSMR/blob/master/vignettes/Authentica
`AzureDSVM` is available from its [GitHub
repository](https://github.com/Azure/AzureDSVM). To install,
```{r install azuredsvm, eval=FALSE}
```{r, install azuredsvm, eval=FALSE}
if(!require("devtools")) install.packages("devtools")
devtools::install_github("Azure/AzureDSVM")
```
Help pages can be loaded with
```{r help azuresvm, eval=FALSE}
```{r, help azuresvm, eval=FALSE}
library(help=AzureDSVM)
```

View file

@ -68,7 +68,7 @@ account creation on a Windows DSVM.
We can simply source the credentials file in R.
```{r, setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -90,7 +90,7 @@ devtools::install_github("Azure/AzureDSVM")
We can then load the required packages.
```{r packages}
```{r, packages, eval=FALSE}
# Load the required packages.
library(AzureSMR) # Support for managing Azure resources.
@ -99,7 +99,7 @@ library(magrittr)
library(dplyr)
```
```{r tuning}
```{r, tuning, eval=FALSE}
# Parameters for this script: the name for the new resource group and
# its location across the Azure cloud. The resource name is used to
# name the resource group that we will create transiently for the
@ -143,7 +143,7 @@ HOST
cat("\n")
```
```{r, connect}
```{r, connect, eval=FALSE}
# Connect to the Azure subscription and use this as the context for
# our activities.
@ -165,7 +165,7 @@ cat("Resource group", RG, "at", LOC,
Create the resource group within which all resources we create will be
grouped.
```{r, create resource group}
```{r, create resource group, eval=FALSE}
# Create a new resource group into which we create the VMs and related
# resources. Resource group name is RG. Note that to create a new
# resource group one needs to add access control of Active Directory
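# For example (an illustrative sketch; RG and LOC are defined in the tuning
# chunk above, and the same existsRG()/azureCreateResourceGroup() pattern is
# used later in these vignettes):
if (!existsRG(context, RG, LOC))
  azureCreateResourceGroup(context, RG, LOC)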
@ -197,7 +197,7 @@ default VM size for deployment is chosen for enhanced computation
performance. See the documentation for deployDSVM() for the actual
default.
```{r}
```{r, eval=FALSE}
# List the available VM sizes. May differ with location of the data centre.
getVMSizes(context, LOC) %>%
@ -219,7 +219,7 @@ formals(deployDSVM)$os
The following code deploys a Linux DSVM which will take a few minutes.
```{r, deploy}
```{r, deploy, eval=FALSE}
# Create the required Linux DSVM - generally 4 minutes.
ldsvm <- deployDSVM(context,
@ -238,7 +238,7 @@ azureListVM(context, RG)
Prove that the deployed DSVM exists.
```{r, prove exists}
```{r, prove exists, eval=FALSE}
# Send a simple system() command across to the new server to test its
# existence. Expect a single line with an indication of how long the
@ -273,7 +273,7 @@ server through the secure shell and manually run that command. We
then reboot the server so that, for example, kernel updates take
effect.
```{r, useful tools}
```{r, useful tools, eval=FALSE}
system(paste(ssh, "sudo locale-gen 'en_AU.UTF-8'"))
system(paste(ssh, "sudo apt-get -y install wajig"))
system(paste(ssh, "wajig install -y lsb htop"))
@ -298,7 +298,7 @@ To enable such interaction, a [one-box configuration](https://docs.microsoft.com
key-based authentication method can be achieved via the `mrsOneBoxConfiguration`
function.
```{r}
```{r, eval=FALSE}
mrsOneBoxConfiguration(context,
resource.group=RG,
location=LOC,
@ -346,7 +346,7 @@ operateDSVM(context, RG, HOST, operation="Stop")
Once we have finished with the server we can delete it and all of its
related resources.
```{r optionally_delete_resource_group}
```{r, optionally_delete_resource_group, eval=FALSE}
# Delete the resource group now that we have proved existence. There
# is probably no need to wait. Only delete if it did not pre-exist
# this script. Deletion takes 10 minutes or more.

View file

@ -23,7 +23,7 @@ can be found [here](https://docs.microsoft.com/en-us/azure/virtual-machines/wind
# Setup
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -32,7 +32,7 @@ USER <- Sys.info()[['user']]
source(paste0(USER, "_credentials.R"))
```
```{r packages}
```{r, packages, eval=FALSE}
# Load the required packages.
library(AzureSMR) # Support for managing Azure resources.
@ -41,7 +41,7 @@ library(magrittr)
library(dplyr)
```
```{r tuning}
```{r, tuning, eval=FALSE}
# Parameters for this script: the name for the new resource group and
# its location across the Azure cloud. The resource name is used to
# name the resource group that we will create transiently for the
@ -82,7 +82,7 @@ cat("\n")
Deployment is the same as in the previous sub-section. Here in the demo, a
Linux DSVM with public-key authentication is deployed.
```{r connect}
```{r, connect, eval=FALSE}
context <- createAzureContext(tenantID=TID, clientID=CID, authKey=KEY)
rg_pre_exists <- existsRG(context, RG, LOC)
@ -91,13 +91,13 @@ cat("Resource group", RG, "at", LOC,
ifelse(!existsRG(context, RG, LOC), "does not exist.\n", "exists.\n"), "\n")
```
```{r create resource group}
```{r, create resource group, eval=FALSE}
if (! rg_pre_exists) {
azureCreateResourceGroup(context, RG, LOC) %>% cat("\n\n")
}
```
```{r}
```{r, eval=FALSE}
getVMSizes(context, "southeastasia") %>%
set_names(c("Size", "Cores", "DiskGB", "RAM GB", "Disks"))
@ -106,7 +106,7 @@ formals(deployDSVM)$size
formals(deployDSVM)$os
```
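For reference, a complete deployment call typically looks something like the
following sketch. The `resource.group` and `location` arguments appear in the
chunk below; the remaining argument names are assumptions based on the
variables used in these vignettes, so check `?deployDSVM` before relying on them.
```{r, eval=FALSE}
# A sketch of a full deployDSVM() call (argument names beyond resource.group
# and location are assumptions; values come from the tuning chunk above).
ldsvm <- deployDSVM(context,
                    resource.group = RG,
                    location       = LOC,
                    hostname       = HOST,
                    username       = USER,
                    size           = SIZE,
                    os             = "Ubuntu",
                    authen         = "Key",
                    pubkey         = PUBKEY)
```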
```{r deploy}
```{r, deploy, eval=FALSE}
ldsvm <- deployDSVM(context,
resource.group = RG,
location = LOC,
@ -130,7 +130,7 @@ as inputs.
In our example, [a script](https://github.com/Azure/DataScienceVM/blob/master/Extensions/General/create-user-and-updatecntk.sh) available in the Azure DSVM GitHub repository is used. The script creates a user and updates CNTK on the DSVM. The
command to execute the script is "sudo sh <script>".
```{r}
```{r, eval=FALSE}
URL <- "https://raw.githubusercontent.com/Azure/DataScienceVM/master/Extensions/General/create-user-and-updatecntk.sh"
CMD <- "sudo sh create-user-and-updatecntk.sh"
@ -157,7 +157,7 @@ Once compute resources are no longer needed, stop or delete them by
operateDSVM(context, RG, HOST, operation="Stop")
```
```{r optionally_delete_resource_group}
```{r, optionally_delete_resource_group, eval=FALSE}
if (! rg_pre_exists)
azureDeleteResourceGroup(context, RG)
```

View file

@ -45,7 +45,7 @@ ensure that the interaction with Azure completes.
Refer to
[Deploy](https://github.com/Azure/AzureDSVM/blob/master/vignettes/10Deploy.Rmd) for an explanation of the set up of the virtual machines.
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -145,7 +145,7 @@ the machines.
We can now deploy a cluster of homogeneous DSVMs. Each DSVM will be
named based on the *hostname* provided and sequentially numbered.
```{r deploy a cluster of DSVMs}
```{r, deploy a cluster of DSVMs, eval=FALSE}
# Deploy a cluster of DSVMs.
deployDSVMCluster(context,
@ -195,7 +195,7 @@ We can also create a collection of Linux DSVMs each with a different
user and with public-key-based authentication. Name, username,
and size can also be configured.
```{r deploy a set of DSVMs, eval=FALSE}
```{r, deploy a set of DSVMs, eval=FALSE}
DSVM_NAMES <- paste0(BASE, c(1, 2, 3)) %T>% print()
DSVM_USERS <- paste0("user", c("a", "b", "c")) %T>% print()
@ -236,7 +236,7 @@ for (vm in DSVM_NAMES)
# Delete the Resource Group
```{r optionally delete resource group}
```{r, optionally delete resource group, eval=FALSE}
# Delete the resource group now that we have proved existence. There
# is probably no need to wait. Only delete if it did not pre-exist
# this script. Deletion seems to take 10 minutes or more.

View file

@ -26,7 +26,7 @@ just for the duration.
# Setup
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -160,7 +160,7 @@ here](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/remote-executi
to configure the deployed DSVMs for remote interaction with Microsoft
R Server.**
```{r set R interface}
```{r, set R interface, eval=FALSE}
# Create a script for remote execution.
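# As an illustrative sketch only: once the one-box configuration is in place,
# a remote session can be opened with the mrsdeploy package. The endpoint form
# and the credentials below are assumptions, not part of this vignette.
library(mrsdeploy)
remoteLogin(paste0("http://", HOST, ".", LOC, ".cloudapp.azure.com:12800"),
            username = "admin", password = PASSWORD, session = TRUE)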
@ -225,7 +225,7 @@ performance2
Yet another example with parallel execution by using `rxExec` function
from the Microsoft RevoScaleR package.
```{r}
```{r, eval=FALSE}
# Parallelizing k-means clustering on the iris dataset.
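# A typical rxExec() call of this kind might look as follows (an illustrative
# sketch, assuming the RevoScaleR package from Microsoft R Server is available
# on the DSVM).
rxSetComputeContext(RxLocalParallel())
clusts <- rxExec(kmeans, x = as.matrix(iris[, -5]), centers = 3,
                 nstart = 1, timesToRun = 10)
rxSetComputeContext(RxLocalSeq())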
@ -269,7 +269,7 @@ performance3
Clean up.
```{r}
```{r, eval=FALSE}
file.remove(tmpf1, tmpf2)

View file

@ -18,7 +18,7 @@ In this tutorial, we assume that there is at least one DSVM deployed in a resour
Similar to the previous sections, credentials for authentication are required.
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -27,7 +27,7 @@ USER <- Sys.info()[['user']]
source(paste0(USER, "_credentials.R"))
```
```{r packages}
```{r, packages, eval=FALSE}
# Load the required packages.
library(AzureSMR) # Support for managing Azure resources.
@ -62,13 +62,13 @@ GRA <- "Daily"
Get data consumption of the DSVM.
```{r}
```{r, eval=FALSE}
# authentication with Azure account.
context <- createAzureContext(tenantID=TID, clientID=CID, authKey=KEY)
```
```{r}
```{r, eval=FALSE}
# get data consumption of instance.
data_consum <- dataConsumptionDSVM(context,
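# (The remaining arguments are not shown in this diff. Typically the hostname
#  and the START, END and GRA values defined above are passed, for example
#  hostname=VM, time.start=START, time.end=END, granularity=GRA; these
#  argument names are assumptions, so check ?dataConsumptionDSVM.)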
@ -104,7 +104,7 @@ OFFER <- "MS-AZR-0015P"
Again, this assumes the information is pre-stored in the "credentials" script.
```{r}
```{r, eval=FALSE}
consum <- costDSVM(context,
hostname=VM,
time.start=START,

View file

@ -20,7 +20,7 @@ The demo in this tutorial shows how to perform kmeans clustering on remote DSVMs
Similar to the previous sections, credentials for authentication are required to fire up the DSVMs.
```{r}
```{r, eval=FALSE}
library(AzureDSVM)
library(AzureSMR)
library(dplyr)
@ -32,7 +32,7 @@ library(rattle)
library(ggplot2)
```
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -41,7 +41,7 @@ USER <- Sys.info()[['user']]
source(paste0(USER, "_credentials.R"))
```
```{r}
```{r, eval=FALSE}
COUNT <- 4 # Number of VMs to deploy.
SIZE <- "Standard_DS2_v2"
@ -76,7 +76,7 @@ cat("\n")
Deploy a cluster of DSVMs if none exists; otherwise start the
machines.
```{r}
```{r, eval=FALSE}
# --------------------------------------------------------------------------
# Azure data science resource management
@ -194,7 +194,7 @@ The following is the configuration of computing cluster which is needed for spec
* `master` DNS name of the DSVM to which the worker script will be uploaded for execution.
* `slaves` DNS names of the DSVMs to which execution of the worker script will be distributed.
```{r}
```{r, eval=FALSE}
# specify machine names, master, and slaves.
machines <- unlist(cluster$name)
@ -205,7 +205,7 @@ slaves <- dns_list[-1]
The following script runs the analytics of the worker script in a "local parallel" computing context and retrieves the results from the remote master node into the local R session.
```{r}
```{r, eval=FALSE}
# parallel the analytics with local parallel computing context.
time_1 <- Sys.time()
@ -235,7 +235,7 @@ results_local <-
For comparison, the same analysis is run again in the "cluster parallel" context.
```{r}
```{r, eval=FALSE}
# parallel the analytics across cluster.
time_3 <- Sys.time()
@ -265,14 +265,14 @@ results_cluster <- results
Save time points for later reference
```{r}
```{r, eval=FALSE}
runtime <- list(time_1, time_2, time_3, time_4)
save(runtime, file = "./elapsed.RData")
```
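The four time points can then be compared directly, for example:
```{r, eval=FALSE}
# A small sketch comparing the elapsed wall-clock time of the two computing
# contexts, using the time points captured above.
local_elapsed   <- difftime(time_2, time_1, units = "secs")
cluster_elapsed <- difftime(time_4, time_3, units = "secs")
data.frame(context = c("local parallel", "cluster parallel"),
           seconds = as.numeric(c(local_elapsed, cluster_elapsed)))
```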
Do some visualization with data.
```{r}
```{r, eval=FALSE}
DATA_URL <- "https://zhledata.blob.core.windows.net/mldata/creditcard.xdf"
credit_data <- rxImport(inData=DATA_URL,
@ -295,7 +295,7 @@ ggplot(data=credit_data, aes(x=V1, y=V2, color=cluster_local)) +
Once the analysis is finished, switch off the DSVMs.
```{r}
```{r, eval=FALSE}
# stop machines after the analysis.
for (vm in machines) {
@ -352,7 +352,7 @@ if (length(vm$name) == 1) {
Stop or delete computing resources when they are no longer needed, to avoid unnecessary cost.
```{r}
```{r, eval=FALSE}
if (! rg_pre_exists)
azureDeleteResourceGroup(context, RG)
```

View file

@ -34,13 +34,13 @@ In this demonstration, Hot Spots analysis is used for supervised binary classifi
Similar to the previous sections, credentials for authentication are required to fire up the DSVMs.
```{r}
```{r, eval=FALSE}
library(AzureDSVM)
library(AzureSMR)
library(magrittr)
```
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -51,7 +51,7 @@ source(file.path(paste0(USER, "_credentials.R")))
Specifications of computing resources.
```{r}
```{r, eval=FALSE}
runif(4, 1, 26) %>%
round() %>%
letters[.] %>%
@ -87,7 +87,7 @@ cat("\n")
Deploy a cluster of DSVMs if none exists; otherwise start the
machines.
```{r}
```{r, eval=FALSE}
# Connect to the Azure subscription and use this as the context for
# all of our activities.
@ -115,7 +115,7 @@ if (! rg_pre_exists)
Create one Linux DSVM for running the Hot Spots analytics.
```{r}
```{r, eval=FALSE}
vm <- AzureSMR::azureListVM(context, RG)
if (!is.null(vm))
@ -147,7 +147,7 @@ if (!is.null(vm))
*Setting up a DSVM is not a trivial task, as it is a complete solution for the data scientist and includes the whole stack of open-source data science technology: R, Python, Hadoop, Spark, MRS, (SQL Server, RTVS, ...)*
```{r}
```{r, eval=FALSE}
azureListVM(context, RG)
```
@ -170,7 +170,7 @@ The top-level worker script for Hot Spot analysis is available as [workerHotSpot
Specify the master node that will run the analytic script. In this case, it is the DSVM created just now.
```{r}
```{r, eval=FALSE}
# specify machine names, master, and slaves.
vm <- AzureSMR::azureListVM(context, RG) %T>% print()
@ -185,7 +185,7 @@ The whole end-to-end Hot Spots analysis is run on the remote machine in a parall
Functions used for the analysis are defined in separate scripts and uploaded onto the remote DSVM with `AzureDSVM::fileTransfer`.
```{r}
```{r, eval=FALSE}
worker_scripts <- c("workerHotspotsFuncs.R",
"workerHotspotsSetup.R",
"workerHotspotsTrain.R",
@ -203,7 +203,7 @@ Remote execution of worker script (the analysis takes approximately 10 - 15 minu
*Whilst this is running we continue talking about the algorithm for Hot Spots.*
```{r}
```{r, eval=FALSE}
# parallel the analytics with local parallel computing context.
time_1 <- Sys.time()
@ -221,7 +221,7 @@ time_2 <- Sys.time()
Get the results from remote DSVM.
```{r}
```{r, eval=FALSE}
# get results from remote
AzureDSVM::fileTransfer(from=paste0(master, ":~"),
@ -237,7 +237,7 @@ results_local <-
Save time points for later reference.
```{r}
```{r, eval=FALSE}
elapsed <- list(instances=machines,
time_start=time_1,
time_end=time_2)
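# The wall-clock duration of the remote Hot Spots run can then be reported,
# for example, with:
difftime(elapsed$time_end, elapsed$time_start, units = "mins")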
@ -286,7 +286,7 @@ if (length(vm$name) == 1) {
Once the analysis is finished, switch off the DSVMs.
```{r}
```{r, eval=FALSE}
# stop machines after the analysis.
for (vm in machines) {
@ -296,7 +296,7 @@ for (vm in machines) {
Or delete the resource group to avoid unnecessary cost.
```{r}
```{r, eval=FALSE}
if (! rg_pre_exists)
azureDeleteResourceGroup(context, RG)
```

View file

@ -30,7 +30,7 @@ To automate the whole process of model creation, it is beneficial to have multip
Let's repeat the same steps as in the previous tutorials to deploy DSVMs for the case studies. For comparison, two scenarios, a single DSVM and a cluster of DSVMs, are deployed under two resource groups.
```{r}
```{r, eval=FALSE}
library(AzureDSVM)
library(AzureSMR)
library(dplyr)
@ -41,7 +41,7 @@ library(readr)
library(rattle)
```
```{r setup}
```{r, setup, eval=FALSE}
# Load the required subscription resources: TID, CID, and KEY.
# Also includes the ssh PUBKEY for the user.
@ -50,7 +50,7 @@ USER <- Sys.info()[['user']]
source(file.path("..", paste0(USER, "_credentials.R")))
```
```{r}
```{r, eval=FALSE}
BASE <-
runif(4, 1, 26) %>%
round() %>%
@ -80,7 +80,7 @@ cat("\n")
Check for the existence of the resource group and create one if it does not exist.
```{r connect}
```{r, connect, eval=FALSE}
context <- createAzureContext(tenantID=TID, clientID=CID, authKey=KEY)
@ -92,7 +92,7 @@ rg_pre_exists <- existsRG(context, RG, LOC)
Create the resource group within which all resources we create will be
grouped.
```{r create resource group}
```{r, create resource group, eval=FALSE}
if (! rg_pre_exists) azureCreateResourceGroup(context, RG, LOC)
existsRG(context, RG, LOC)
@ -103,7 +103,7 @@ existsRG(context, RG, LOC)
Create the actual Ubuntu DSVM with public-key-based authentication.
Name, username, and size can also be configured.
```{r deploy}
```{r, deploy, eval=FALSE}
# Create the required Ubuntu DSVM - generally 4 minutes.
@ -124,7 +124,7 @@ azureListVM(context, RG)
## Deploy a cluster of Ubuntu Data Science Virtual Machines.
```{r}
```{r, eval=FALSE}
# Create a set of Ubuntu DSVMs that will form a cluster.
@ -142,7 +142,7 @@ deployDSVMCluster(context,
To start with, candidate model types as well as their parameters are pre-configured as follows. In this case, three different algorithms available in the Microsoft RevoScaleR package are used.
```{r, algorithm selection and parameter tuning}
```{r, algorithm selection and parameter tuning, eval=FALSE}
# make a model config to temporarily preserve model parameters. Parameters are kept fixed.
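# For illustration only: a sketch of what such a configuration might look
# like. The exact structure used by the worker script is not shown in this
# diff; rxLogit, rxDForest and rxBTrees are RevoScaleR modelling functions and
# the parameter values are placeholders.
model_config <- list(
  rxLogit   = list(),
  rxDForest = list(nTree = 50, maxDepth = 10),
  rxBTrees  = list(nTree = 50, learningRate = 0.1)
)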
@ -182,7 +182,7 @@ Codes of solving such a machine learning problem can be obtained from
The following shows snippets of the machine learning process.
```{r}
```{r, eval=FALSE}
# functions used for model building and evaluating.
mlProcess <- function(formula, data, modelName, modelPara) {
@ -251,11 +251,11 @@ The worker script can be executed on a remote Ubuntu DSVM or DSVM cluster with A
The worker script for binary classification is located in the "vignettes/test" directory, under the name "worker_classficiation.R".
```{r}
```{r, eval=FALSE}
VM_URL <- paste(HOST, LOC, "cloudapp.azure.com", sep=".")
```
```{r execution}
```{r, execution, eval=FALSE}
# remote execution on a single DSVM.
@ -294,7 +294,7 @@ time3 <- Sys.time()
Save the time variables into a data file for later reference.
```{r}
```{r, eval=FALSE}
save(time_1, time_2, time_3, file="./elapsed.RData")
```
@ -302,7 +302,7 @@ save(time_1, time_2, time_3, file="./elapsed.RData")
After execution of the analytic job is done, the expense of running the executions on Azure resources can then be obtained.
```{r}
```{r, eval=FALSE}
# calculate expense on computations.
load("./elapsed.RData")
@ -338,7 +338,7 @@ if (length(vm$name) == 1) {
Stop or delete computing resources when they are no longer needed, to avoid unnecessary cost.
```{r}
```{r, eval=FALSE}
if (! rg_pre_exists)
azureDeleteResourceGroup(context, RG)
```