This commit is contained in:
yueguoguo 2017-09-11 11:58:13 +08:00
Parent ae697a97ce
Commit 067c0f0356
23 changed files: 2249 additions and 182 deletions

View file

@ -260,13 +260,13 @@ After balancing the training set, a model can be created for prediction. For com
Three algorithms, support vector machine with radial basis function kernel, random forest, and extreme gradient boosting (xgboost), are used for model building.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
# initialize training control.
-tc <- trainControl(method="boot",
-                   number=3,
-                   repeats=3,
-                   search="grid",
-                   classProbs=TRUE,
-                   savePredictions="final",
-                   summaryFunction=twoClassSummary)
+tc <- trainControl(method="repeatedcv",
+                   number=3,
+                   repeats=1,
+                   search="random",
+                   summaryFunction=twoClassSummary,
+                   classProbs=TRUE,
+                   savePredictions=TRUE)
# SVM model.
@ -274,16 +274,16 @@ time_svm <- system.time(
model_svm <- train(Attrition ~ .,
                   df_train,
                   method="svmRadial",
-                  trainControl=tc)
+                  trControl=tc)
)
# random forest model
time_rf <- system.time(
model_rf <- train(Attrition ~ .,
-                  df_train,
-                  method="rf",
-                  trainControl=tc)
+                  data=df_train,
+                  method="rf",
+                  trControl=tc)
)
# xgboost model.
@ -292,7 +292,7 @@ time_xgb <- system.time(
model_xgb <- train(Attrition ~ .,
                   df_train,
                   method="xgbLinear",
-                  trainControl=tc)
+                  trControl=tc)
)
```
2. Ensemble of models.
@ -642,7 +642,7 @@ SVM with RBF kernel is used as an illustration.
model_svm <- train(Attrition ~ .,
                   df_txt_train,
                   method="svmRadial",
-                  trainControl=tc)
+                  trControl=tc)
```
```{r}
# model evaluation

File differences are hidden because one or more lines are too long

View file

@ -0,0 +1,591 @@
---
title: "Operationalization of Employee Attrition Prediction on Azure Cloud"
author: "Le Zhang, Data Scientist, Microsoft"
date: "August 19, 2017"
output: html_document
---
## Introduction
It is preferable to host AI applications on the cloud, for the obvious benefits
of elasticity, agility, and flexibility in training models and deploying services.
The tutorial in this markdown demonstrates how to operationalize the
[Employee Attrition Prediction](https://github.com/Microsoft/acceleratoRs/tree/master/EmployeeAttritionPrediction)
accelerator on Azure cloud and then deploy the model as well as the analytical
functions as web-based services.
## Data exploration and model training - Azure Data Science Virtual Machine
### Introduction
[Azure Data Science Virtual Machine (DSVM)](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-provision-vm)
is a curated virtual machine image configured with a comprehensive set of
commonly used data analytical tools and software. A DSVM is a desirable workplace
for data scientists to quickly experiment with and prototype a data analytical idea.
The R packages [AzureSMR](https://github.com/Microsoft/AzureSMR) and [AzureDSVM](https://github.com/Azure/AzureDSVM)
simplify the use and operation of DSVMs. One can use functions from these
packages to easily create, stop, and destroy DSVMs in an Azure resource group. To
get started, simply complete the initial setup with an Azure subscription, as instructed
[here](http://htmlpreview.github.io/?https://github.com/Microsoft/AzureSMR/blob/master/inst/doc/Authentication.html).
### Set up a DSVM for employee attrition prediction
#### Pre-requisites
For this tutorial, an Ubuntu Linux DSVM is spun up for the experiment. Since
the analysis is performed on a relatively small data set, a medium-size VM is
sufficient. In this case, a Standard D2 v2 VM is used. It costs roughly 0.158 USD
per hour (more details about pricing can be found [here](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)).
The DSVM can be deployed with the Azure portal, the Azure Command-Line
Interface, or the AzureDSVM R package from within an R session.
The following is the code for deploying a Linux DSVM of Standard D2 v2 size.
```{r, eval=FALSE}
# Load the R packages for resource management.
library(AzureSMR)
library(AzureDSVM)
```
To use the `AzureSMR` and `AzureDSVM` packages for operating Azure resources,
it is required to create and set up an Azure Active Directory App which is
authorized to consume Azure REST APIs. Details can be found in the AzureSMR package [vignette](https://github.com/Microsoft/AzureSMR/blob/master/vignettes/Authentication.Rmd).
After the proper setup, credentials such as the client ID, tenant ID, and secret key
can be obtained.
It is suggested to put the credentials for authentication into a config.json file
located in the "~/.azuresmr" directory. The `read.AzureSMR.config` function then reads
the config JSON file into an R object. The credentials are used to set an
Azure Active Context, which is then used for authentication.
```{r, eval=FALSE}
settingsfile <- getOption("AzureSMR.config")
config <- read.AzureSMR.config()
asc <- createAzureContext()
setAzureContext(asc,
tenantID=config$tenantID,
clientID=config$clientID,
authKey=config$authKey)
```
Authentication.
```{r, eval=FALSE}
azureAuthenticate(asc)
```
#### Deployment of DSVM
Specifications for deploying the DSVM are given as inputs of the deployment
function from `AzureDSVM`.
In this case, a resource group in Southeast Asia is created, and a Ubuntu DSVM
with Standard D2 v2 size is created.
```{r, eval=FALSE}
dsvm_location <- "southeastasia"
dsvm_rg <- paste0("rg", paste(sample(letters, 3), collapse=""))
dsvm_size <- "Standard_D2_v2"
dsvm_os <- "Ubuntu"
dsvm_name <- paste0("dsvm",
paste(sample(letters, 3), collapse=""))
dsvm_authen <- "Password"
dsvm_password <- "Not$ecure123"
dsvm_username <- "dsvmuser"
```
After that, the resource group can be created.
```{r, eval=FALSE}
# create resource group.
azureCreateResourceGroup(asc,
location=dsvm_location,
resourceGroup=dsvm_rg)
```
In the resource group, the DSVM with the above specifications is created.
```{r, eval=FALSE}
# deploy a DSVM.
deployDSVM(asc,
resource.group=dsvm_rg,
location=dsvm_location,
hostname=dsvm_name,
username=dsvm_username,
size=dsvm_size,
os=dsvm_os,
authen=dsvm_authen,
password=dsvm_password,
mode="Sync")
```
#### Adding extension to DSVM
Some R packages (e.g., `caretEnsemble`) used in the accelerator are not
pre-installed on a freshly deployed Linux DSVM. These packages can be installed
post-deployment with [Azure VM Extensions](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/extensions-features), which are also available in `AzureDSVM`.
Basically, the Azure Extensions function runs a remotely located script on the
target VM. In this case, the script, named `script.sh`, is a Linux shell script
that installs the R packages that are needed but missing on the DSVM.
The following R code adds the extension to the deployed DSVM.
```{r, eval=FALSE}
# add extension to the deployed DSVM.
# NOTE extension is installed as root.
dsvm_command <- "sudo sh script.sh"
dsvm_fileurl <- "https://raw.githubusercontent.com/Microsoft/acceleratoRs/master/EmployeeAttritionPrediction/Code/script.sh"
addExtensionDSVM(asc,
location=dsvm_location,
resource.group=dsvm_rg,
hostname=dsvm_name,
os=dsvm_os,
fileurl=dsvm_fileurl,
command=dsvm_command)
```
Once the experiment with the accelerator is finished, deallocate the DSVM by
stopping it so that there is no charge for the machine,
```{r, eval=FALSE}
# Stop the DSVM if it is not needed.
operateDSVM(asc,
resource.group=dsvm_rg,
hostname=dsvm_name,
operation="Stop")
```
or destroy the whole resource group if the instances are not needed.
```{r, eval=FALSE}
# Resource group can be removed if the resources are no longer needed.
azureDeleteResourceGroup(asc, resourceGroup=dsvm_rg)
```
#### Remote access to DSVM
The DSVM can be accessed via several approaches:
* Remote desktop. An [X2Go](https://wiki.x2go.org/doku.php) server is
pre-configured on a DSVM, so one can use an X2Go client to log onto the machine
and use it as a remote desktop.
* RStudio Server. RStudio Server is installed and configured, but not started,
on a Linux DSVM. Starting RStudio Server is embedded in the DSVM extension, so
after running the extension code above, one can access the VM via RStudio Server
("http://<dsvm_name>.<dsvm_location>.cloudapp.azure.com:8787"). The user name
and password used in creating the DSVM can be used for log-in.
* Jupyter notebook. Similar to RStudio Server, R users can also work on the DSVM
within a Jupyter notebook environment. The remote JupyterHub can be accessed at
"https://<dsvm_name>.<dsvm_location>.cloudapp.azure.com:8000". To
enable an R environment, select the R kernel when creating a new notebook.
The accelerator is provided in both `.md` and `.ipynb` formats for convenient
use in the RStudio and Jupyter notebook environments, respectively.
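For reference, these service URLs follow directly from the host name and location chosen at deployment time. The sketch below composes them in shell; `dsvmabc` and `southeastasia` are hypothetical stand-ins for the values generated earlier.
```shell
# Compose the DSVM access URLs from the deployment parameters.
# "dsvmabc" and "southeastasia" are hypothetical example values.
dsvm_name="dsvmabc"
dsvm_location="southeastasia"
fqdn="${dsvm_name}.${dsvm_location}.cloudapp.azure.com"

echo "RStudio Server: http://${fqdn}:8787"
echo "JupyterHub:     https://${fqdn}:8000"
```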
## Service deployment
This section shows how to consume the data analytics in the accelerator as
web-based Shiny applications.
### Deployment of R application
It is usually desirable to deploy R analytics as applications. This allows
data scientists and end users who do not use R to consume the pre-trained model
or analytical results. For instance, the model created in the employee attrition
accelerator can be consumed by end users for either statistical analysis on raw
data or real-time attrition prediction.
#### Ways of deployment
There are various ways of deploying R analytics.
* Deployment as an API. Deploying an API benefits downstream developers, who can
consume the data analytics in other applications. It is flexible and efficient.
R packages such as `AzureML` and `mrsdeploy` allow deployment of R code as an
Azure Machine Learning Studio web service and as a web service hosted on a machine
where Microsoft R Server is installed and configured, respectively. Other
packages such as `plumber` also allow publishing R code on a local host as
a web service.
* Deployment as a GUI application. [R Shiny](https://shiny.rstudio.com/) is the most popular framework
for publishing R code as a GUI-based application. The application can also be
made publicly accessible by hosting it on Shiny Server (the open-source edition
is free; Shiny Server Pro is not). The Shiny framework provides a rich set of
functions to define the UI and server logic for static, responsive, and
graphical interactions with the application.
* Deployment as a container. [Docker](https://www.docker.com/) containers have become increasingly popular along
with the proliferation of microservice architectures. The benefit of running
containers as services is that the different services can be easily modularized
and maintained. For a data analytical or artificial intelligence solution,
models for different purposes can be trained and deployed into different
containers wherever needed.
The following sub-sections show how to create Shiny applications
for the accelerator and then containerize them.
#### Shiny + Docker container
An R Shiny application can be run on either a local host or a server where
Shiny Server is installed.
There is also a [Shiny Server Docker image](https://hub.docker.com/r/rocker/shiny/) available, which makes it easy
to containerize Shiny applications. The Dockerfile for the Shiny Server image is
based on the `r-base` image and is shown as follows.
```
FROM r-base:latest
MAINTAINER Winston Chang "winston@rstudio.com"
# Install dependencies and Download and install shiny server
RUN apt-get update && apt-get install -y -t unstable \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev && \
wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb && \
R -e "install.packages(c('shiny', 'rmarkdown'), repos='https://cran.rstudio.com/')" && \
cp -R /usr/local/lib/R/site-library/shiny/examples/* /srv/shiny-server/ && \
rm -rf /var/lib/apt/lists/*
EXPOSE 3838
COPY shiny-server.sh /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]
```
A Docker image can be built from the Dockerfile with
```
docker build -t <image_name> <path_to_the_dockerfile>
```
and run with
```
docker run --rm -p 3838:3838 <image_name>
```
The Shiny application can then be accessed in a web browser at "http://localhost:3838" (if it is run on the local machine) or "http://<ip_address_of_shiny_server>:3838".
### Container orchestration
When more than one application or service is needed in the whole
pipeline, orchestration of multiple containers becomes useful.
There are multiple ways of orchestrating containers; the three most
representative approaches are [Kubernetes](https://kubernetes.io/), [Docker Swarm](https://docs.docker.com/engine/swarm/), and [DC/OS](https://dcos.io/).
A comparison of these orchestration methods is beyond the scope of this
tutorial. The following sections show how to deploy multiple
Shiny applications on a Kubernetes cluster.
#### Azure Container Service
[Azure Container Service](https://azure.microsoft.com/en-us/services/container-service/) is a cloud-based service on Azure which simplifies the configuration
for orchestrating containers with various orchestration methods such as
Kubernetes, Docker Swarm, and DC/OS. Azure Container Service offers optimized
configurations of these orchestration tools and technologies for Azure. When
deploying an orchestration cluster, the VM size, number of hosts, etc., can be
set to balance scalability, load capacity, and cost efficiency.
#### Deployment of multiple Shiny applications with Azure Container Service
The following illustrates how to deploy two Shiny applications, derived from
the employee attrition prediction accelerator, with Azure Container Service.
While a real-world application may require a more sophisticated architecture,
the demonstration here merely exhibits how to set up the environment.
The two Shiny applications are for (simple) data exploration and model creation,
respectively. The two applications are built on top of two individual images.
Both obtain data from an Azure Storage blob, where the data is persistently
preserved. This reflects the real-world scenario where R-user data scientists and
data analysts work within the same infrastructure while their tasks remain
loosely de-coupled.
The whole architecture is depicted as follows.
##### Step 1 - Create Docker images
Both of the images are created based on the rocker/shiny image.
* Data exploration image
```
FROM r-base:latest
MAINTAINER Le Zhang "zhle@microsoft.com"
RUN apt-get update && apt-get install -y -t unstable \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev \
libssl-dev
# Download and install shiny server
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
RUN R -e "install.packages(c('shiny', 'ggplot2', 'dplyr', 'magrittr', 'markdown'), repos='http://cran.rstudio.com/')"
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /myapp /srv/shiny-server/
EXPOSE 3838
COPY shiny-server.sh /usr/bin/shiny-server.sh
RUN chmod +x /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]
```
* Model creation image
```
FROM r-base:latest
MAINTAINER Le Zhang "zhle@microsoft.com"
RUN apt-get update && apt-get install -y -t unstable \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev \
libssl-dev
# Download and install shiny server
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
RUN R -e "install.packages(c('shiny', 'ggplot2', 'dplyr', 'magrittr', 'caret', 'caretEnsemble', 'kernlab', 'randomForest', 'xgboost', 'DT'), repos='http://cran.rstudio.com/')"
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /myapp /srv/shiny-server/
EXPOSE 3838
COPY shiny-server.sh /usr/bin/shiny-server.sh
RUN chmod +x /usr/bin/shiny-server.sh
# Download pre-trained model
RUN wget --no-verbose https://zhledata.blob.core.windows.net/employee/model.RData -O "/srv/shiny-server/model.RData"
CMD ["/usr/bin/shiny-server.sh"]
```
All of the layers are the same as those in the original rocker/shiny image, except for the installation of additional R packages and their required
runtime libraries (e.g., caretEnsemble, xgboost, etc.).
The Docker images can be built in the same way as the rocker/shiny image. After the images
are built, they can be pushed to a public repository such as [Dockerhub](https://hub.docker.com/) or a private repository on [Azure Container Registry](https://azure.microsoft.com/en-us/services/container-registry/).
The following shows how to do that with Dockerhub.
1. Build the image.
```
docker build -t <name_of_image> <path_to_dockerfile>
```
2. Tag the image.
```
docker tag <name_of_image> <dockerhub_account_name>/<name_of_repo>
```
3. Login with Dockerhub.
```
docker login
```
4. Push image onto Dockerhub repository.
```
docker push <dockerhub_account_name>/<name_of_repo>
```
In this case, both images are pushed to Dockerhub.
##### Step 2 - Create Azure Container Service
Creation of an Azure Container Service can be achieved with either the Azure
portal or the Azure Command-Line Interface (CLI).
The following shows how to create a Kubernetes-type orchestrator in a specified
resource group with the Azure CLI (instructions for installing the Azure CLI can be found [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)).
1. Login with Azure subscription.
```
az login
```
2. Create a resource group where the Azure Container Service cluster resides.
```
az group create --name=<resource_group> --location=<location>
```
3. Create an Azure Container Service with a Kubernetes orchestrator. The
cluster is made of one master node and two agent nodes. The name of the
cluster, the DNS prefix, and the authentication key can also be specified as
needed.
```
az acs create --orchestrator-type=kubernetes --resource-group <resource_group> --name=<cluster_name> --dns-prefix=<dns_prefix> --ssh-key-value ~/.ssh/id_rsa.pub --admin-username=<user_name> --master-count=1 --agent-count=2 --agent-vm-size=<vm_size>
```
##### Step 3 - Deploy Shiny applications on the Azure Container Service
The status of the Azure Container Service deployment can be checked in the Azure
portal. Once it has completed successfully, the resources will be listed in the
resource group.
In this tutorial, two Shiny applications are hosted on the cluster. For
simplicity, the two applications do not depend on each other,
so they are deployed independently and exposed as individual services.
The deployment is done with the [Kubernetes command line tool](https://kubernetes.io/docs/tasks/tools/install-kubectl/) (kubectl), which can be installed on the local machine.
kubectl must be configured properly in order to communicate with the remote
Kubernetes cluster. This can be done by copying the `config` file located in
`~/.kube` on the master node of the Kubernetes cluster to `~/.kube/` on the local
machine.
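As a sketch, the copy can be done with `scp`; the angle-bracket values are placeholders for the admin user name and the master node's fully qualified domain name, which can be read from the Azure portal. For ACS clusters, the Azure CLI also provides a shortcut that fetches the credentials.
```
# Copy the Kubernetes config from the cluster master to the local machine.
scp <user_name>@<master_fqdn>:~/.kube/config ~/.kube/config

# Alternatively, let the Azure CLI fetch the credentials (ACS clusters):
az acs kubernetes get-credentials --resource-group=<resource_group> --name=<cluster_name>
```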
Each of the two applications can be deployed individually as follows.
```
kubectl run <name_of_deployment> --image <dockerhub_account_name>/<name_of_repo> \
--port=3838 --replicas=3
```
The deployment can be exposed as web-based service by the following command:
```
kubectl expose deployments <name_of_deployment> --port=3838 --type=LoadBalancer
```
Status of the deployment and service exposure can be monitored by
```
kubectl get deployments
```
and
```
kubectl get services
```
respectively.
The deployment and exposure of the services can be put together into a yaml file
for operational convenience.
```
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: <name_of_model_app>
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: <name_of_model_app>
    spec:
      containers:
      - name: <name_of_model_app>
        image: <dockerhub_account_name>/<name_of_model_app>
        ports:
        - containerPort: 3838
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: <name_of_model_app>
spec:
  type: LoadBalancer
  ports:
  - port: 3838
  selector:
    app: <name_of_model_app>
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: <name_of_data_app>
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: <name_of_data_app>
    spec:
      containers:
      - name: <name_of_data_app>
        image: <dockerhub_account_name>/<name_of_data_app>
        ports:
        - containerPort: 3030
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: <name_of_data_app>
spec:
  type: LoadBalancer
  ports:
  - port: 3030
  selector:
    app: <name_of_data_app>
```
The deployment and service can then be created simply by
```
kubectl create -f <path_to_the_yaml_file>
```
##### Step 4 - Test the deployed Shiny applications
Once the deployment is finished, the public IP address and port number of each
exposed service can be checked with `kubectl get service --watch`. While the
deployment is in progress, the external IP addresses of the exposed services show
"<pending>". It usually takes a while to finish, depending on the size of the
image and the capability of the cluster.
The deployed Shiny application services can be accessed from a web browser via
the public IP address with the corresponding port number.
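As a small sketch of reading the address off that output, the external IP and port of a service can be extracted with standard shell tools. The table below is hypothetical example output, not real cluster state:
```shell
# Hypothetical `kubectl get services` output, captured for illustration.
services='NAME      TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)          AGE
hrdata    LoadBalancer   10.0.12.34   52.187.10.20   3030:31234/TCP   5m
hrmodel   LoadBalancer   10.0.56.78   52.187.10.21   3838:31567/TCP   5m'

# Extract the external IP and service port of the hrdata service.
ip=$(echo "$services" | awk '$1 == "hrdata" {print $4}')
port=$(echo "$services" | awk '$1 == "hrdata" {split($5, p, ":"); print p[1]}')

echo "http://${ip}:${port}"   # -> http://52.187.10.20:3030
```
In practice, replace the captured string with the live output of `kubectl get services`.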
The following snapshots show the deployed Shiny apps.
Readers can find the Dockerfiles as well as the Shiny R code in the directories.
Images built from them are pre-published on Dockerhub - `yueguoguo/hrdata`
and `yueguoguo/hrmodel`, corresponding to the data exploration application and
the model creation application, respectively. These images are ready for testing
on a deployed Kubernetes-based Azure Container Service cluster.

File differences are hidden because one or more lines are too long

View file

@ -0,0 +1,65 @@
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: hrmodel
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: hrmodel
    spec:
      containers:
      - name: hrmodel
        image: yueguoguo/hrmodel
        ports:
        - containerPort: 3838
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: hrmodel
spec:
  type: LoadBalancer
  ports:
  - port: 3838
  selector:
    app: hrmodel
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: hrdata
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: hrdata
    spec:
      containers:
      - name: hrdata
        image: yueguoguo/hrdata
        ports:
        - containerPort: 3030
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: hrdata
spec:
  type: LoadBalancer
  ports:
  - port: 3030
  selector:
    app: hrdata

View file

@ -0,0 +1,34 @@
FROM r-base:latest
MAINTAINER Le Zhang "zhle@microsoft.com"
RUN apt-get update && apt-get install -y -t unstable \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev \
libssl-dev
# Download and install shiny server
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
RUN R -e "install.packages(c('shiny', 'ggplot2', 'dplyr', 'magrittr', 'markdown', 'DT', 'scales'), repos='http://cran.rstudio.com/')"
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /myapp /srv/shiny-server/
EXPOSE 3030
COPY shiny-server.sh /usr/bin/shiny-server.sh
RUN chmod +x /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]

View file

@ -0,0 +1,30 @@
---
title: "about"
author: "Le Zhang"
date: "August 24, 2017"
output: html_document
---
### Employee Attrition Prediction
This is a demonstration of a case study of employee attrition prediction.
A data science and machine learning development process often consists of multiple
steps. Containerizing each step helps modularize the whole process and thus
makes DevOps easier.
For simplicity, the demo process is composed of just two steps:
data exploration and model creation.
This web-based app shows how to do simple graphical data exploration on
the HR data set.
#### R accelerator
The end-to-end tutorial of the R based template for data processing, model
training, etc. (we call it "acceleratoR") can be found [here](https://github.com/Microsoft/acceleratoRs/blob/master/EmployeeAttritionPrediction).
#### Operationalization
Operationalization of the case on Azure cloud (i.e., data exploration, model creation,
model management, model deployment, etc.) with [Azure Data Science VM](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-provision-vm),
[Azure Storage](https://azure.microsoft.com/en-us/services/storage/), [Azure Container Service](https://azure.microsoft.com/en-us/services/container-service/), etc., can be found [here](https://github.com/Microsoft/acceleratoRs/blob/master/EmployeeAttritionPrediction).

View file

@ -0,0 +1,32 @@
# ------------------------------------------------------------------------------
# R packages needed for the analytics.
# ------------------------------------------------------------------------------
library(shiny)
library(dplyr)
library(magrittr)
library(ggplot2)
library(markdown)
library(scales)
# ------------------------------------------------------------------------------
# Global variables.
# ------------------------------------------------------------------------------
data_url <- "https://zhledata.blob.core.windows.net/employee/DataSet1.csv"
# ------------------------------------------------------------------------------
# Functions.
# ------------------------------------------------------------------------------
# Load HR demographic data.
loadData <- function() {
df <- read.csv(data_url)
return(df)
}
# Load HR data and pre-trained model.
df_hr <- loadData()

View file

@ -0,0 +1,103 @@
source("global.R")
# The actual shiny server function.
shinyServer(function(input, output) {
# Plot a table of the HR data.
output$hrtable <- DT::renderDataTable({
DT::datatable(df_hr[, input$show_vars, drop=FALSE])
})
# Downloadable csv of selected dataset.
output$downloadData <- downloadHandler(
filename = function() {
paste(input$dataset, ".csv", sep = "")
},
content = function(file) {
write.csv(df_hr, file, row.names = FALSE)
}
)
# Plot some general summary statistics for those who are predicted attrition.
output$plot3 <- renderPlot({
if (identical(input$att_vars, "Yes")) {
df_hr %<>% filter(as.character(Attrition) == "Yes")
} else if (identical(input$att_vars, "No")) {
df_hr %<>% filter(as.character(Attrition) == "No")
} else if (identical(input$att_vars, c("Yes", "No"))) {
df_hr
} else {
df_hr <- df_hr[0, ]
}
df_hr <- filter(df_hr, JobRole %in% input$disc_vars)
ggplot(df_hr, aes(JobRole, fill=Attrition)) +
geom_bar(aes(y=(..count..)/sum(..count..)),
position="dodge",
alpha=0.6) +
scale_y_continuous(labels=percent) +
xlab(input$disc_vars) +
ylab("Percentage") +
theme_bw() +
ggtitle(paste("Count for", input$disc_vars))
})
output$plot <- renderPlot({
if (identical(input$att_vars, "Yes")) {
df_hr %<>% filter(as.character(Attrition) == "Yes")
} else if (identical(input$att_vars, "No")) {
df_hr %<>% filter(as.character(Attrition) == "No")
} else if (identical(input$att_vars, c("Yes", "No"))) {
df_hr
} else {
df_hr <- df_hr[0, ]
}
df_hr_final <- select(df_hr, one_of("Attrition", input$plot_vars))
ggplot(df_hr_final,
aes_string(input$plot_vars,
color="Attrition",
fill="Attrition")) +
geom_density(alpha=0.2) +
theme_bw() +
xlab(input$plot_vars) +
ylab("Density") +
ggtitle(paste("Estimated density for", input$plot_vars))
})
# Monthly income, service year, etc.
output$plot2 <- renderPlot({
if (identical(input$att_vars, "Yes")) {
df_hr %<>% filter(as.character(Attrition) == "Yes")
} else if (identical(input$att_vars, "No")) {
df_hr %<>% filter(as.character(Attrition) == "No")
} else if (identical(input$att_vars, c("Yes", "No"))) {
df_hr
} else {
df_hr <- df_hr[0, ]
}
df_hr <- filter(df_hr,
YearsAtCompany >= input$years_service[1] &
YearsAtCompany <= input$years_service[2] &
JobLevel < input$job_level &
JobRole %in% input$job_roles)
ggplot(df_hr,
aes(x=factor(JobRole), y=MonthlyIncome, color=factor(Attrition))) +
geom_boxplot() +
xlab("Job Role") +
ylab("Monthly income") +
scale_fill_discrete(guide=guide_legend(title="Attrition")) +
theme_bw() +
theme(text=element_text(size=13), legend.position="top")
})
})

View file

@ -0,0 +1,109 @@
source("global.R")
navbarPage(
"HR Analytics - data exploration",
tabPanel(
"About",
fluidRow(
column(3, includeMarkdown("about.md")),
column(
6,
img(class="img-polaroid",
src=paste0("https://careers.microsoft.com/content/images/services/HomePage_Hero1_Tim.jpg"))
)
)
),
tabPanel(
"Data",
sidebarLayout(
sidebarPanel(
# Variables to select for displayed demographic data.
checkboxGroupInput(
"show_vars",
"Columns in HR data set to show:",
names(df_hr),
selected=names(df_hr)
),
# Button
downloadButton("downloadData", "Download")
),
mainPanel(
tabsetPanel(
id="dataset",
tabPanel("HR Demographic data", DT::dataTableOutput("hrtable"))
)
)
)
),
tabPanel(
"Plot",
h4("Select employees of attrition or non-attrition to visualize."),
checkboxGroupInput(
"att_vars",
"Attrition or not:",
c("Yes", "No"),
selected=c("Yes", "No")),
fluidRow(
column(
4,
h4("Count of discrete variable."),
plotOutput("plot3"),
checkboxGroupInput(
"disc_vars",
"Job roles:",
unique(df_hr$JobRole),
selected=unique(df_hr$JobRole)[1:5])
),
column(
4,
h4("Distribution of continuous variable."),
plotOutput("plot"),
selectInput(
"plot_vars",
"Variable to visualize:",
names(select_if(df_hr, is.integer)),
selected=names(select_if(df_hr, is.integer)))
),
column(
4,
h4("Comparison on certain factors."),
plotOutput("plot2"),
# Years of service.
sliderInput(
"years_service",
"Years of service:",
min=1,
max=40,
value=c(2, 5)),
# Job level.
sliderInput(
"job_level",
"Job level:",
min=1,
max=5,
value=3
),
checkboxGroupInput(
"job_roles",
"Job roles:",
unique(df_hr$JobRole),
selected=unique(df_hr$JobRole)[1:5])
)
)
)
)

View file

@ -0,0 +1,26 @@
# Define the user we should use when spawning R Shiny processes
run_as shiny;
# Show errors in the browser instead of sanitizing them (useful when
# debugging inside the Docker container).
sanitize_errors off;
# Define a top-level server which will listen on a port
server {
# Instruct this server to listen on port 3030
listen 3030;
# Define the location available at the base URL
location / {
# Run this location in 'site_dir' mode, which hosts the entire directory
# tree at '/srv/shiny-server'
site_dir /srv/shiny-server;
# Define where we should put the log files for this location
log_dir /var/log/shiny-server;
# Should we list the contents of a (non-Shiny-App) directory when the user
# visits the corresponding URL?
directory_index on;
}
}

View file

@ -0,0 +1,7 @@
#!/bin/sh
# Make sure the directory for individual app logs exists
mkdir -p /var/log/shiny-server
chown shiny.shiny /var/log/shiny-server
exec shiny-server >> /var/log/shiny-server.log 2>&1

View file

@ -0,0 +1,37 @@
FROM r-base:latest
MAINTAINER Le Zhang "zhle@microsoft.com"
RUN apt-get update && apt-get install -y -t unstable \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev \
libssl-dev \
libxml2-dev
# Download and install shiny server
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
RUN R -e "install.packages(c('shiny', 'ggplot2', 'dplyr', 'magrittr', 'caret', 'caretEnsemble', 'kernlab', 'randomForest', 'xgboost', 'DT', 'DMwR', 'markdown', 'mlbench', 'devtools', 'XML', 'gridSVG', 'pROC', 'plotROC', 'scales'), repos='http://cran.rstudio.com/')"
RUN R -e "library(devtools);devtools::install_github('sachsmc/plotROC')"
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /myapp /srv/shiny-server/
EXPOSE 3838
COPY shiny-server.sh /usr/bin/shiny-server.sh
RUN chmod +x /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]


@@ -0,0 +1,32 @@
---
title: "about"
author: "Le Zhang"
date: "August 24, 2017"
output: html_document
---
### Employee Attrition Prediction
This is a demonstration of a case study on employee attrition prediction.
A data science and machine learning development process often consists of
multiple steps. Containerizing each step helps modularize the whole process,
which in turn makes DevOps easier.
For simplicity, the demo process is composed of just two steps: data
exploration and model creation.
This web-based app shows how to create a model on the data. The candidate
training algorithms include Support Vector Machine (SVM), Random Forest, and
Extreme Gradient Boosting (XGBoost). For illustration purposes, only a few
high-level parameters can be set.
#### R accelerator
The end-to-end tutorial of the R based template for data processing, model
training, etc. (we call it "acceleratoR") can be found [here](https://github.com/Microsoft/acceleratoRs/blob/master/EmployeeAttritionPrediction).
#### Operationalization
Operationalization of the case on Azure cloud (i.e., data exploration, model
creation, model management, model deployment, etc.) with [Azure Data Science VM](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-provision-vm),
[Azure Storage](https://azure.microsoft.com/en-us/services/storage/), [Azure Container Service](https://azure.microsoft.com/en-us/services/container-service/), etc., can be found [here](https://github.com/Microsoft/acceleratoRs/blob/master/EmployeeAttritionPrediction).
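The three candidate algorithms described above map to caret method strings. A minimal sketch of training them side by side, assuming the `caret` package is loaded and `df_train` is a prepared data frame with an `Attrition` label (both names are assumptions for illustration):

```r
library(caret)

# Map of UI algorithm names to caret method identifiers.
methods <- c("SVM"           = "svmRadial",
             "Random Forest" = "rf",
             "XGBoost"       = "xgbLinear")

# Train one model per algorithm with a shared train control;
# df_train is assumed to hold pre-processed HR data.
models <- lapply(methods, function(m) {
  train(Attrition ~ .,
        data      = df_train,
        method    = m,
        trControl = trainControl(method = "cv", number = 3))
})
```

The resulting list of models can then be compared with `resamples(models)`.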


@@ -0,0 +1,142 @@
# ------------------------------------------------------------------------------
# R packages needed for the analytics.
# ------------------------------------------------------------------------------
library(caret)
library(caretEnsemble)
library(DMwR)
library(dplyr)
library(ggplot2)
library(markdown)
library(magrittr)
library(mlbench)
library(pROC)
library(plotROC)
library(shiny)
# ------------------------------------------------------------------------------
# Global variables.
# ------------------------------------------------------------------------------
data_url <- "https://zhledata.blob.core.windows.net/employee/DataSet1.csv"
# ------------------------------------------------------------------------------
# Functions.
# ------------------------------------------------------------------------------
# Load HR demographic data.
loadData <- function() {
df <- read.csv(data_url)
return(df)
}
# Process data - the same data processing steps apply on the data.
processData <- function(data) {
# 1. Remove zero-variance variables.
pred_no_var <- c("EmployeeCount", "StandardHours")
data %<>% select(-one_of(pred_no_var))
# 2. Convert Integer to Factor type of data.
int_2_ftr_vars <- c("Education",
"EnvironmentSatisfaction",
"JobInvolvement",
"JobLevel",
"JobSatisfaction",
"NumCompaniesWorked",
"PerformanceRating",
"RelationshipSatisfaction",
"StockOptionLevel")
data[, int_2_ftr_vars] <- lapply((data[, int_2_ftr_vars]), as.factor)
# 3. Keep the most salient variables.
least_important_vars <- c("Department", "Gender", "PerformanceRating")
data %<>% select(-one_of(least_important_vars))
return(data)
}
# Data split.
splitData <- function(data, ratio) {
if (!("Attrition" %in% names(data)))
stop("No label found in data set.")
train_index <-
createDataPartition(data$Attrition,
times=1,
p=ratio / 100) %>%
unlist()
data_train <- data[train_index, ]
data_test <- data[-train_index, ]
data_split <- list(train=data_train, test=data_test)
return(data_split)
}
# Model training.
trainModel <- function(data,
smote_over,
smote_under,
method="boot",
number=3,
repeats=3,
search="grid",
algorithm="rf") {
# If the training set is imbalanced, SMOTE will be applied.
data %<>% as.data.frame()
data <- SMOTE(Attrition ~ .,
data,
perc.over=smote_over,
perc.under=smote_under)
# Train control.
tc <- trainControl(method=method,
number=number,
repeats=repeats,
search=search,
classProbs=TRUE,
savePredictions="final",
summaryFunction=twoClassSummary)
# Model training.
model <- train(Attrition ~ .,
data,
method=algorithm,
trControl=tc)
return(model)
}
# Function for predicting attrition based on demographic data.
inference <- function(model, data) {
if ("Attrition" %in% names(data)) {
data %<>% select(-Attrition)
}
labels <- predict(model, newdata=data, type="prob")
return(labels)
}
# Load and pre-process HR data.
df_hr <-
loadData() %>%
processData()
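A minimal end-to-end sketch of the helper functions above, assuming the packages loaded in global.R are installed and `df_hr` has been built as shown:

```r
# Split the pre-processed HR data 70/30 into training and testing sets.
splits <- splitData(df_hr, ratio = 70)

# Balance the training set with SMOTE and fit a random forest.
model <- trainModel(splits$train,
                    smote_over  = 300,
                    smote_under = 150,
                    algorithm   = "rf")

# Class probabilities for the hold-out set; the "Yes" column is the
# predicted attrition probability.
probs <- inference(model, splits$test)
head(probs)
```

This mirrors the flow the Shiny server follows when the Train button is clicked.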


@@ -0,0 +1,127 @@
source("global.R")
# The actual shiny server function.
shinyServer(function(input, output) {
# Training and testing data.
dataSplit <- reactive({
df <- splitData(df_hr, input$ratio)
df
})
# Train a reactive model.
modelTrained <- eventReactive(input$goButton, {
df <- dataSplit()
df_train <- df$train
df_test <- df$test
if (input$algorithm == "SVM") {
method <- "svmRadial"
} else if (input$algorithm == "Random Forest") {
method <- "rf"
} else {
method <- "xgbLinear"
}
model <- trainModel(data=df_train,
smote_over=input$smoteOver,
smote_under=input$smoteDown,
method="boot",
number=input$number,
repeats=input$repeats,
search="grid",
algorithm=method)
model
})
# Print summary of data set.
output$summary <- renderPrint({
df <- dataSplit()
# str(df$train)
table(df$train$Attrition)
})
# Print table of training data set.
output$dataTrain <- DT::renderDataTable({
df <- dataSplit()
DT::datatable(df$train)
})
# Plot some general summary statistics for those who are predicted attrition.
output$plot <- renderPlot({
df <- dataSplit()
df_test <- df$test
# Train a model
model <- modelTrained()
# Use the model for inference on testing data.
results <- inference(model, data=df_test)
results <- mutate(results, label=df_test$Attrition)
# Plot the ROC curve.
basic_plot <-
ggplot(results,
aes(m=Yes, d=factor(label, levels=c("No", "Yes")))) +
geom_roc(n.cuts=0)
basic_plot +
style_roc(theme=theme_grey) +
theme(axis.text=element_text(colour="blue")) +
# annotate("text",
# x=.75,
# y=.25,
# label=paste("AUC =", round(calc_auc(basic_plot)$AUC, 2))) +
ggtitle("Plot of ROC curve") +
scale_x_continuous("1 - Specificity", breaks = seq(0, 1, by = .1))
})
output$auc <- renderPrint({
df <- dataSplit()
df_test <- df$test
# Train a model
model <- modelTrained()
# Use the model for inference on testing data.
results <- inference(model, data=df_test)
results <- mutate(results, label=df_test$Attrition)
basic_plot <-
ggplot(results,
aes(m=Yes, d=factor(label, levels=c("No", "Yes")))) +
geom_roc(n.cuts=0)
sprintf("AUC of the ROC curve is %f", round(calc_auc(basic_plot)$AUC, 2))
})
# # Export the trained model.
#
# output$downloadModel <- downloadHandler(
# filename = function() {
# paste(input$algorithm, "_model", ".rds", sep="")
# },
#
# content = function(file) {
# saveRDS(model, file)
# }
# )
})


@@ -0,0 +1,109 @@
source("global.R")
navbarPage(
"HR Analytics - model creation",
tabPanel(
"About",
fluidRow(
column(3, includeMarkdown("about.md")),
column(6, img(
class="img-polaroid",
src=paste0("https://careers.microsoft.com/content/images/services/HomePage_Hero1_Tim.jpg"))
)
)
),
tabPanel(
"Model",
sidebarLayout(
sidebarPanel(
# Split ratio for training/testing data.
sliderInput(inputId="ratio",
label="Split ratio (%) for training data.",
min=0,
max=100,
value=70),
# SMOTE upsampling percentage.
p("SMOTE is used for balancing the data set."),
numericInput(inputId="smoteOver",
label="Upsampling percentage in SMOTE for minority class.",
value=300),
# SMOTE downsampling percentage.
numericInput(inputId="smoteDown",
label="Downsampling percentage in SMOTE for majority class.",
value=150),
# Repeats in train control.
p("High-level control for cross-validation in training the model."),
numericInput(inputId="repeats",
label="Number of repeats for a k-fold cross-validation.",
min=1,
max=3,
value=1),
# Number of cross-validations in train control.
numericInput(inputId="number",
label="Number of folds in cross-validation.",
min=2,
max=5,
value=3),
# Algorithm for use.
selectInput(inputId="algorithm",
label="Machine learning algorithm to use for training a model:",
choices=c("SVM", "Random Forest", "XGBoost")),
# Train model
p("Click the button to train a model with the above settings (this may
take some time depending on the algorithm used for training). After
training, a ROC curve evaluating model performance on the testing
data set is plotted."),
actionButton("goButton", "Train")
# # Export model
#
# p("Export the trained model"),
#
# downloadButton("downloadModel",
# "Download")
),
mainPanel(
# Summary of the training data set.
p("It should be noted that the data set is not balanced, which may
negatively impact model training if no balancing technique is
applied."),
verbatimTextOutput("summary"),
# Print table of training data.
tabsetPanel(
id="dataset",
tabPanel("HR Demographic data for training",
DT::dataTableOutput("dataTrain"))
),
# Plot the model validation results.
plotOutput("plot"),
verbatimTextOutput("auc")
)
)
)
)
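With global.R, server.R, and ui.R in a single app directory, the dashboard can be run locally before containerizing it. A sketch, where the directory name `myapp` follows the Dockerfile's COPY step:

```r
library(shiny)

# Serve the app on all interfaces, on the same port the
# shiny-server.conf for this app listens on.
runApp("myapp", port = 3838, host = "0.0.0.0")
```

Running it this way makes it quicker to iterate on the UI than rebuilding the Docker image for each change.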


@@ -0,0 +1,26 @@
# Define the user we should use when spawning R Shiny processes
run_as shiny;
# Show actual error messages in the browser; useful when debugging in Docker.
sanitize_errors off;
# Define a top-level server which will listen on a port
server {
# Instruct this server to listen on port 3838.
listen 3838;
# Define the location available at the base URL
location / {
# Run this location in 'site_dir' mode, which hosts the entire directory
# tree at '/srv/shiny-server'
site_dir /srv/shiny-server;
# Define where we should put the log files for this location
log_dir /var/log/shiny-server;
# Should we list the contents of a (non-Shiny-App) directory when the user
# visits the corresponding URL?
directory_index on;
}
}


@@ -0,0 +1,7 @@
#!/bin/sh
# Make sure the directory for individual app logs exists
mkdir -p /var/log/shiny-server
chown shiny:shiny /var/log/shiny-server
exec shiny-server >> /var/log/shiny-server.log 2>&1


@@ -0,0 +1,21 @@
#!/bin/bash
# install R libraries.
sudo mkdir /etc/skel/R
sudo mkdir /etc/skel/R/lib
sudo Rscript -e 'library(devtools);library(withr);withr::with_libpaths(new="/etc/skel/R/lib/", install(c("DMwR", "caretEnsemble", "pROC", "jiebaR")));withr::with_libpaths(new="/etc/skel/R/lib/", install_url("https://github.com/yueguoguo/Azure-R-Interface/raw/master/utils/msLanguageR_0.1.0.tar.gz"))'
# Copy /etc/skel to home directory of all users.
USR=$(ls /home | grep user)
for u in ${USR}; do
DBASE="/home/$u/"
cp -rf /etc/skel/R ${DBASE}/
done
# Start the Rstudio Server
rstudio-server start

Binary file added: EmployeeAttritionPrediction/Docs/Misc/pics/about.png (1.1 MiB, not shown)

Binary file added: EmployeeAttritionPrediction/Docs/Misc/pics/datavisual.png (98 KiB, not shown)

Binary file added: EmployeeAttritionPrediction/Docs/Misc/pics/model.png (62 KiB, not shown)