Make vignettes discoverable (#320)
* Make Troubleshooting vignette visible and extend title
* Make "Deploying Models" vignette visible
* Rename and move "Train and deploy first model" to vignettes folder
* Expose "Experiments with R" vignette in vignettes folder
* Move "Hyperparameter Tuning with Keras" vignette
* Move "Deploy to AKS" vignette
* Move "Train with Tensorflow" vignette
* Info about vignettes in READMEs
* Update path in Train with Tensorflow vignette
* Correct metadata
* Typo fix
Parent: 13a5401fc9
Commit: 23fa576f69
@@ -44,7 +44,7 @@ install.packages("azuremlsdk")
# Or the development version from GitHub
install.packages("remotes")
-remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')
+remotes::install_github('https://github.com/Azure/azureml-sdk-for-r', build_vignettes = TRUE)

# Then, use `install_azureml()` to install the compiled code from the AzureML Python SDK.
azuremlsdk::install_azureml()
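With the vignettes now bundled into the package (note the `build_vignettes = TRUE` change above), they can be browsed locally after installation. A small usage sketch, not part of this commit; the vignette name below is an assumption based on the folder names this commit introduces:

```r
# List all vignettes shipped with the installed package and open them in a browser.
browseVignettes("azuremlsdk")

# Open a specific vignette by name (name assumed to match its folder/file name).
vignette("train-and-deploy-first-model", package = "azuremlsdk")
```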
@@ -16,6 +16,6 @@ The following vignettes are included:
For additional examples on using the R SDK, see the [samples](../samples) folder.

### Azure ML guides
-In addition to the end-to-end vignettes, the [guides](guides/) directory contains more detailed documentation for the following:
+In addition to the end-to-end vignettes, we also provide more detailed documentation for the following:
* [Deploying models](deploying-models.Rmd): Where and how to deploy models on Azure ML.
* [Troubleshooting](troubleshooting.Rmd): Known issues and troubleshooting for using R in Azure ML.
@@ -45,13 +45,13 @@ ws <- get_workspace("<your workspace name>", "<your subscription ID>", "<your re
```

## Register the model
-In this tutorial we will deploy a model that was trained in one of the [samples](https://github.com/Azure/azureml-sdk-for-r/blob/master/samples/training/train-on-amlcompute/train-on-amlcompute.R). The model was trained with the Iris dataset and can be used to determine if a flower is one of three Iris flower species (setosa, versicolor, virginica). We have provided the model file (`model.rds`) for the tutorial; it is located in the same directory as this vignette.
+In this tutorial we will deploy a model that was trained in one of the [samples](https://github.com/Azure/azureml-sdk-for-r/blob/master/samples/training/train-on-amlcompute/train-on-amlcompute.R). The model was trained with the Iris dataset and can be used to determine if a flower is one of three Iris flower species (setosa, versicolor, virginica). We have provided the model file (`model.rds`) for the tutorial; it is located in the `deploy-to-aks` subfolder of this vignette.

First, register the model to your workspace with [`register_model()`](https://azure.github.io/azureml-sdk-for-r/reference/register_model.html). A registered model can be any collection of files, but in this case the R model file is sufficient. Azure ML will use the registered model for deployment.

```{r register_model, eval=FALSE}
model <- register_model(ws,
-model_path = "model.rds",
+model_path = "deploy-to-aks/model.rds",
model_name = "iris_model",
description = "Predict an Iris flower type")
```
@@ -92,7 +92,7 @@ Now you have everything you need to create an inference config for encapsulating

``` {r create_inference_config, eval=FALSE}
inference_config <- inference_config(
-entry_script = "score.R",
+entry_script = "deploy-to-aks/score.R",
source_directory = ".",
environment = r_env)
```
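The hunks above cover only registering the model and building the inference config; the deployment step itself is outside this diff. A hedged sketch of how the pieces might then be deployed to AKS with azuremlsdk — the cluster name, service name, and the helpers `aks_webservice_deployment_config()` and `deploy_model()` are assumptions, not taken from this commit:

```r
### Hypothetical sketch - not part of this diff.
### Assumes an existing AKS compute target and the azuremlsdk helpers named below.
aks_target <- get_compute(ws, cluster_name = "myakscluster")   # assumed cluster name

aks_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)

service <- deploy_model(ws,
                        name = "iris-service",                 # assumed service name
                        models = list(model),
                        inference_config = inference_config,
                        deployment_config = aks_config,
                        deployment_target = aks_target)
wait_for_deployment(service, show_output = TRUE)
```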
@@ -1,8 +1,11 @@
---
title: "A Deeper Dive into Experiments with R"
author: "David Smith"
-date: "1/26/2020"
-output: html_document
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{A Deeper Dive into Experiments with R}
+  %\VignetteEngine{knitr::rmarkdown}
+  \use_package{UTF-8}
---

```{r setup, include=FALSE}
@@ -28,7 +31,7 @@ ws <- load_workspace_from_config()
ds <- get_default_datastore(ws)
target_path <- "accidentdata"

-#download_from_datastore(ds, target_path=".",prefix="accidentdata")
+download_from_datastore(ds, target_path=".", prefix="accidentdata")

## Find the compute target
cluster_name <- "rcluster"
@@ -37,12 +40,14 @@ if(is.null(compute_target)) stop("Training cluster not found")
```

## Try out several models
-We have provided three different experiment files: [`accident-glm.R`](accident-glm.R), [`accident-knn.R`](accident-knn.R), [`accident-glmnet.R`](accident-glmnet.R). Each uses the `caret` package to fit a predictive model to the accidents data (GLM, KNN and GLMNET respectively). Here are the parts of `accident-glm.R` that load the data and fit the model.
+We have provided three different experiment files in the `experiments-with-R` folder:
+[`accident-glm.R`](accident-glm.R), [`accident-knn.R`](accident-knn.R), [`accident-glmnet.R`](accident-glmnet.R).
+Each uses the `caret` package to fit a predictive model to the accidents data (GLM, KNN and GLMNET respectively). Here are the parts of `accident-glm.R` that load the data and fit the model.

The script loads the data `accidents` from the shared datastore, and then creates a training partition `accident_trn`. This data is then passed to the caret `train` function to fit a generalized linear model (in this case, a logistic regression).

```{r accident-glm, eval=FALSE}
-### FROM FILE: accident-glm.R - do not run this code
+### FROM FILE: accident-glm.R - do not run this code chunk

## Caret GLM model on training set with 5-fold cross validation
accident_glm_mod <- train(
@@ -67,7 +72,7 @@ Now, train the GLM model by submitting the script `accident-glm.R` to the experi

```{r run-experiment-1, eval=FALSE}
est <- estimator(source_directory=".",
-entry_script = "accident-glm.R",
+entry_script = "experiments-with-R/accident-glm.R",
script_params = list("--data_folder" = ds$path(target_path)),
compute_target = compute_target)
run <- submit_experiment(exp, est)
@@ -83,13 +88,13 @@ Now let's submit scripts fitting K-nearest-neighbors and GLMnet models to the da
accuracy statistic is tracked with this line of code:

```{r tracking_code, eval=FALSE}
-### DO NOT RUN: tracking code from accident-XXX.R scripts
+### DO NOT RUN THIS CODE CHUNK: tracking code from accident-XXX.R scripts
log_metric_to_run("Accuracy",
calc_acc(actual = accident_tst$dead,
predicted = predict(accident_glmnet_mod, newdata = accident_tst))
)
-log_metric_to_run("Method","GLMNET")
-log_metric_to_run("TrainPCT",train.pct)
+log_metric_to_run("Method", "GLMNET")
+log_metric_to_run("TrainPCT", train.pct)
```

We also track the algorithm used with the "Method" metric. (It's not really a metric, but it's useful to track in the Experiment view.) We also track the percentage of data used for the training set (the remainder is used for the test set, where accuracy is calculated). By default it is set to 75%, and we'll see how to change that in the next section.
@@ -98,13 +103,13 @@ For now, submit experiments for the KNN and GLMnet models:

```{r run-experiment-2, eval=FALSE}
est <- estimator(source_directory=".",
-entry_script = "accident-knn.R",
+entry_script = "experiments-with-R/accident-knn.R",
script_params = list("--data_folder" = ds$path(target_path)),
compute_target = compute_target)
run <- submit_experiment(exp, est)

est <- estimator(source_directory=".",
-entry_script = "accident-glmnet.R",
+entry_script = "experiments-with-R/accident-glmnet.R",
script_params = list("--data_folder" = ds$path(target_path)),
compute_target = compute_target)
run <- submit_experiment(exp, est)
|
|||
Speaking of the training percentage, the experiment scripts have been designed to accept a command-line argument to specify the proportion used for the training set. The code, which makes use of the `optparse` package, looks like this:
|
||||
|
||||
```{r options-code, eval=FALSE}
|
||||
## DO NOT RUN: options code from experiment script
|
||||
## DO NOT RUN THIS CODE CHUNK: options code from experiment script
|
||||
options <- list(
|
||||
make_option(c("-d", "--data_folder")),
|
||||
make_option(c("-p", "--percent_train"))
|
||||
|
@@ -136,7 +141,7 @@ train_pct_exp <- 0.80

## GLM model
est <- estimator(source_directory = ".",
-entry_script = "accident-glm.R",
+entry_script = "experiments-with-R/accident-glm.R",
script_params = list("--data_folder" = ds$path(target_path),
"--percent_train" = train_pct_exp),
compute_target = compute_target
@@ -146,7 +151,7 @@ run.glm <- submit_experiment(exp, est)
## KNN model
exp <- experiment(ws, "accident")
est <- estimator(source_directory = ".",
-entry_script = "accident-knn.R",
+entry_script = "experiments-with-R/accident-knn.R",
script_params = list("--data_folder" = ds$path(target_path),
"--percent_train" = train_pct_exp),
compute_target = compute_target
@@ -156,14 +161,14 @@ run.knn <- submit_experiment(exp, est)
## GLMNET model
exp <- experiment(ws, "accident")
est <- estimator(source_directory = ".",
-entry_script = "accident-glmnet.R",
+entry_script = "experiments-with-R/accident-glmnet.R",
script_params = list("--data_folder" = ds$path(target_path),
"--percent_train" = train_pct_exp),
compute_target = compute_target
)
run.glmnet <- submit_experiment(exp, est)
```
-We can check the accuracy for our runs at `ml.azure.com`, or by querying the service directly:
+We can check the accuracy for our runs after they complete at `ml.azure.com`, or by querying the service directly:

```{r check-metrics, eval=FALSE}
get_run_metrics(run.glm)$Accuracy
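The chunk above is cut off after the first call in this hunk. As a small usage note, not part of the diff, the same helper can compare all three runs at once (using the `run.glm`, `run.knn`, and `run.glmnet` objects defined earlier in this vignette):

```r
# Collect the logged accuracy from each completed run for a quick comparison.
sapply(list(glm = run.glm, knn = run.knn, glmnet = run.glmnet),
       function(r) get_run_metrics(r)$Accuracy)
```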
@@ -187,7 +192,7 @@ model <- register_model(ws,
r_env <- r_environment(name = "basic_env")

inference_config <- inference_config(
-entry_script = "accident_predict_caret.R",
+entry_script = "experiments-with-R/accident_predict_caret.R",
source_directory = ".",
environment = r_env)
```
@@ -211,7 +216,7 @@ wait_for_deployment(aci_service, show_output = TRUE)
If you wanted to deploy to Kubernetes, you would use something like this instead:

```{r provis-aks, eval=FALSE}
-## DO NOT RUN: sample code for Kubernetes deployment
+## DO NOT RUN THIS CODE CHUNK: sample code for Kubernetes deployment
aks_target <- create_aks_compute(ws,
cluster_name = 'caretkluster',
vm_size='Standard_D2_v2',
@@ -226,10 +231,10 @@ We have provided the server and UI for a shiny application
in the `accident-app` folder. The app uses the global variable `accident.endpoint` to find the endpoint of the prediction service to call, so set it here:

```{r get_endpoint, eval=FALSE}
-accident.endpoint <- get_webservice(ws, "accident-predict-caret")$scoring_uri
+accident.endpoint <- get_webservice(ws, "accident-pred-caret")$scoring_uri
```
-You can run the app by opening `app.R` in the `accident-app` folder in RStudio and clicking "Run App", or by running the code below. Try out different values of the input variables to see how they affect the predicted probability of a fatal accident.
+You can run the app by opening `app.R` in the `experiments-with-R/accident-app` folder in RStudio and clicking "Run App", or by running the code below. Try out different values of the input variables to see how they affect the predicted probability of a fatal accident.

```{r shiny-app, eval=FALSE}
-shiny::runApp("accident-app")
+shiny::runApp("experiments-with-R/accident-app")
```
@@ -77,7 +77,7 @@ if (is.null(compute_target))
```

## Prepare the training script
-A training script called `cifar10_cnn.R` has been provided for you in the same directory as this tutorial.
+A training script called `cifar10_cnn.R` has been provided for you in the `hyperparameter-tune-with-keras` folder.

In order to leverage HyperDrive, the training script for your model must log the relevant metrics during model training. When you configure the hyperparameter tuning run, you specify the primary metric to use for evaluating run performance. You must log this metric so it is available to the hyperparameter tuning process.
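The hunk ends before any of the logging code is shown. As a hedged illustration of the requirement described above, a training script might record the primary metric with `log_metric_to_run()`, which the other vignettes in this commit use; the metric name and value below are placeholders, not taken from `cifar10_cnn.R`:

```r
### Hypothetical sketch - not part of this diff.
### Assumes the tuning run's primary metric is named "Loss".
library(azuremlsdk)

validation_loss <- 0.42  # placeholder; the real script would compute this from the trained keras model

# Log the primary metric so the hyperparameter tuning process can compare runs.
log_metric_to_run("Loss", validation_loss)
```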
@@ -122,7 +122,7 @@ env <- r_environment("tensorflow-env",
cran_packages = list(cran_package("keras")),
use_gpu = TRUE)
est <- estimator(source_directory = ".",
-entry_script = "cifar10_cnn.R",
+entry_script = "hyperparameter-tune-with-keras/cifar10_cnn.R",
compute_target = compute_target,
environment = env)
```
@@ -33,13 +33,14 @@ The setup for your development work in this tutorial includes the following acti
* Create an experiment to track your runs
* Create a remote compute target to use for training

-If you are using RStudio from a Notebook VM, open this tutorial as a project in RStudio with File > Open Project and select
-your cloned `train-and-deploy-to-aci` folder.
+To run this notebook in an Azure ML Compute Instance, visit the [Azure Machine Learning studio](https://ml.azure.com) and browse to
+Notebooks > Samples > Azure ML gallery > Samples > R > <version> > vignettes. Click the "..." icon next to vignettes and choose "clone". Launch RStudio Server from the link
+in the "Compute" tab. In RStudio, select "File > New Project > Existing Directory" and browse to the cloned "Vignettes" folder.

### Install required packages
This tutorial assumes you already have the Azure ML SDK installed.
(If you are running this vignette from an RStudio instance in an Azure
-ML Compute Instance or Notebook VM, the package is already installed for you.)
+ML Compute Instance, the package is already installed for you.)
Go ahead and load the **azuremlsdk** package.

```{r eval=FALSE}
@@ -76,6 +77,7 @@ if (is.null(compute_target)) {
compute_target <- create_aml_compute(workspace = ws,
cluster_name = cluster_name,
vm_size = vm_size,
min_nodes = 1,
max_nodes = 2)

wait_for_provisioning_completion(compute_target, show_output = TRUE)
@@ -89,7 +91,7 @@ This tutorial uses data from the US [National Highway Traffic Safety Administrat
This dataset includes data from over 25,000 car crashes in the US, with variables you can use to predict the likelihood of a fatality. First, import the data into R and transform it into a new dataframe `accidents` for analysis, and export it to an `Rdata` file.

```{r load_data, eval=FALSE}
-nassCDS <- read.csv("nassCDS.csv",
+nassCDS <- read.csv("train-and-deploy-first-model/nassCDS.csv",
colClasses=c("factor","numeric","factor",
"factor","factor","numeric",
"factor","numeric","numeric",
@@ -127,7 +129,7 @@ For this tutorial, fit a logistic regression model on your uploaded data using y
* Submit the job

### Prepare the training script
-A training script called `accidents.R` has been provided for you in the same directory as this tutorial. Notice the following details **inside the training script** that have been done to leverage the Azure ML service for training:
+A training script called `accidents.R` has been provided for you in the `train-and-deploy-first-model` folder. Notice the following details **inside the training script** that have been done to leverage the Azure ML service for training:

* The training script takes an argument `-d` to find the directory that contains the training data. When you define and submit your job later, you point to the datastore for this argument. Azure ML will mount the storage folder to the remote cluster for the training job.
* The training script logs the final accuracy as a metric to the run record in Azure ML using `log_metric_to_run()`. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record. The metrics can then be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com). See the [reference](https://azure.github.io/azureml-sdk-for-r/reference/index.html#section-training-experimentation) for the full set of logging methods `log_*()`.
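The two bullets above describe responsibilities of `accidents.R` that the diff does not show. A minimal, hypothetical sketch of both pieces (argument parsing with optparse and metric logging) — the data file name and the accuracy value are assumptions, not the committed script:

```r
### Hypothetical sketch of the two pieces described above - not the committed accidents.R.
library(optparse)
library(azuremlsdk)

# Read the data directory passed by the estimator via "--data_folder" / "-d".
options <- list(make_option(c("-d", "--data_folder")))
opts <- parse_args(OptionParser(option_list = options))
load(file.path(opts$data_folder, "accidents.Rd"))  # assumed file name; loads the `accidents` data frame

# ... fit the model and compute accuracy on a held-out test set ...
accuracy <- 0.83  # placeholder for the computed test-set accuracy

# Record the metric on the run so it appears in the Azure ML studio run record.
log_metric_to_run("Accuracy", accuracy)
```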
@@ -147,7 +149,7 @@ To create the estimator, define:

```{r create_estimator, eval=FALSE}
est <- estimator(source_directory = ".",
-entry_script = "accidents.R",
+entry_script = "train-and-deploy-first-model/accidents.R",
script_params = list("--data_folder" = ds$path(target_path)),
compute_target = compute_target
)
@@ -260,7 +262,7 @@ Now you have everything you need to create an **inference config** for encapsula

``` {r create_inference_config, eval=FALSE}
inference_config <- inference_config(
-entry_script = "accident_predict.R",
+entry_script = "train-and-deploy-first-model/accident_predict.R",
source_directory = ".",
environment = r_env)
```
This file cannot be displayed because it is too large.
@@ -69,7 +69,7 @@ if (is.null(compute_target))

## Prepare the training script

-A training script called `tf_mnist.R` has been provided for you in the same directory as this tutorial. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record, and can be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com/).
+A training script called `tf_mnist.R` has been provided for you in the `train-with-tensorflow` subfolder of this vignette. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record, and can be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com/).

In order to collect and upload run metrics, you need to do the following **inside the training script**:
@@ -107,7 +107,7 @@ env <- r_environment("tensorflow-env",
version = "1.14.0")),
use_gpu = TRUE)
est <- estimator(source_directory = ".",
-entry_script = "tf_mnist.R",
+entry_script = "train-with-tensorflow/tf_mnist.R",
compute_target = compute_target,
environment = env)
```
@@ -1,5 +1,5 @@
---
-title: "Troubleshooting"
+title: "Known issues and troubleshooting"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >