Eliminate dependency on DAAG package (#219)

2020-01-18 02:26:25 +08:00 · 2020-01-18 02:26:25 +08:00 · fabcca8a85
--- a/vignettes/train-and-deploy-to-aci/nassCDS.csv
+++ b/vignettes/train-and-deploy-to-aci/nassCDS.csv
--- a/vignettes/train-and-deploy-to-aci/train-and-deploy-to-aci.Rmd
+++ b/vignettes/train-and-deploy-to-aci/train-and-deploy-to-aci.Rmd
@@ -33,6 +33,9 @@ The setup for your development work in this tutorial includes the following acti
 * Create an experiment to track your runs
 * Create a remote compute target to use for training

+If you are using RStudio from a Notebook VM, open this tutorial as a project: choose File > Open Project and select
+your cloned `train-and-deploy-to-aci` folder.
+
 ### Install required packages
 This tutorial assumes you already have the Azure ML SDK installed. Go ahead and import the **azuremlsdk** package.

@@ -40,12 +43,6 @@ This tutorial assumes you already have the Azure ML SDK installed. Go ahead and
 library(azuremlsdk)
 ```

-The tutorial uses data from the [**DAAG** package](https://cran.r-project.org/package=DAAG). Install the package if you don't have it.
-
-```{r eval=FALSE}
-install.packages("DAAG")
-```
-
 The training and scoring scripts (`accidents.R` and `accident_predict.R`) have some additional dependencies. If you plan on running those scripts locally, make sure you have those required packages as well.

 ### Load your workspace
@@ -83,15 +80,23 @@ if (is.null(compute_target)) {
 ```

 ## Prepare data for training
-This tutorial uses data from the **DAAG** package. This dataset includes data from over 25,000 car crashes in the US, with variables you can use to predict the likelihood of a fatality. First, import the data into R and transform it into a new dataframe `accidents` for analysis, and export it to an `Rdata` file.
+This tutorial uses data from the US [National Highway Traffic Safety Administration](https://cdan.nhtsa.gov/tsftables/tsfar.htm)
+(with thanks to [Mary C. Meyer and Tremika Finney](https://www.stat.colostate.edu/~meyer/airbags.htm)).
+The dataset covers over 25,000 car crashes in the US, with variables you can use to predict the likelihood of a fatality. First, import the data into R, transform it into a new data frame `accidents` for analysis, and save it to the file `accidents.Rd` with `saveRDS()`.

 ```{r load_data, eval=FALSE}
-library(DAAG)
-data(nassCDS)
+nassCDS <- read.csv("nassCDS.csv", 
+                     colClasses=c("factor","numeric","factor",
+                                  "factor","factor","numeric",
+                                  "factor","numeric","numeric",
+                                  "numeric","character","character",
+                                  "numeric","numeric","character"))

 accidents <- na.omit(nassCDS[,c("dead","dvcat","seatbelt","frontal","sex","ageOFocc","yearVeh","airbag","occRole")])
 accidents$frontal <- factor(accidents$frontal, labels=c("notfrontal","frontal"))
 accidents$occRole <- factor(accidents$occRole)
+accidents$dvcat <- ordered(accidents$dvcat, 
+                          levels=c("1-9km/h","10-24","25-39","40-54","55+"))

 saveRDS(accidents, file="accidents.Rd")
 ```
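
The change above swaps `data(nassCDS)` for `read.csv()` with explicit `colClasses`, and re-declares `dvcat` as an ordered factor. The sketch below illustrates that pattern on a tiny inline CSV so it can be run without `nassCDS.csv`. The rows and the `toy` data frame are hypothetical stand-ins, not real crash records, and only four of the columns are included.

```r
# Hypothetical stand-in for nassCDS.csv: made-up rows, four columns only.
csv_text <- "dvcat,dead,frontal,occRole
10-24,alive,1,driver
55+,dead,0,pass
1-9km/h,alive,1,driver"

# Declare the column types up front, as the updated vignette does.
toy <- read.csv(text = csv_text,
                colClasses = c("factor", "factor", "numeric", "character"))

# Same transformations as in the load_data chunk.
toy$frontal <- factor(toy$frontal, labels = c("notfrontal", "frontal"))
toy$occRole <- factor(toy$occRole)
toy$dvcat <- ordered(toy$dvcat,
                     levels = c("1-9km/h", "10-24", "25-39", "40-54", "55+"))

# Inspect the resulting column types.
str(toy)
```

Passing `levels` to `ordered()` fixes the order of the speed categories explicitly. Otherwise the levels would be sorted as text when the factor is created, and that order need not match the speed ranges.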