Merge branch 'master' of github.com:Microsoft/acceleratoRs

2017-05-19 22:23:11 +09:00 · 2017-05-19 22:23:11 +09:00 · 91ec3bea03
--- a/CreditRiskPrediction/Code/CreditRiskDeploy.Rmd
+++ b/CreditRiskPrediction/Code/CreditRiskDeploy.Rmd
@ -0,0 +1,297 @@
+---
+title: "Deploy a Credit Risk Model as a Web Service"
+author: "Fang Zhou, Data Scientist, Microsoft"
+date: "`r Sys.Date()`"
+output: html_document
+---
+
+```{r setup, include=FALSE, purl=FALSE}
+knitr::opts_chunk$set(echo = TRUE,
+                      fig.width = 8,
+                      fig.height = 5,
+                      fig.align='center',
+                      dev = "png")
+```
+
+## 1 Introduction
+
+The `mrsdeploy` package, delivered with Microsoft R Client and R Server, provides functions for:
+
+**1** Establishing a remote session in an R console application in order to execute code on a remote R Server.
+
+**2** Publishing and managing an R web service that is backed by the R code block or script you provided. 
+
+Each feature can be used independently, but the greatest value is achieved when you can leverage both.
+
+This document walks you through deploying a credit risk model as a web service using the `mrsdeploy` package.
+
+It starts by building a model locally, then publishes the model as a web service, shares it with other authenticated users for consumption, and finally manages and updates the web service.
+
+## 2 Automated Credit Risk Model Deployment
+
+### 2.1 Setup
+
+We load the required R packages.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+## Setup
+
+# Load the required packages into the R session.
+
+library(rattle)       # Use normVarNames().
+library(dplyr)        # Wrangling: tbl_df(), group_by(), print(), glimpse().
+library(magrittr)     # Pipe operator %>% %<>% %T>% equals().
+library(scales)       # Include commas in numbers.
+library(MicrosoftML)  # Build models using Microsoft ML algorithms.
+library(mrsdeploy)    # Publish an R model as a web service.
+```
+
+Then, the dataset processedSimu is ingested for demonstration. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+## Data Ingestion
+
+# Identify the source location of the dataset.
+
+#DATA <- "../../Data/"
+#txn_fname <- file.path(DATA, "Raw/processedSimu.csv")
+
+wd <- getwd()
+
+dpath <- "../Data"
+data_fname <- file.path(wd, dpath, "processedSimu.csv")
+
+# Ingest the dataset.
+
+data <- read.csv(file=data_fname) %T>% 
+  {dim(.) %>% comma() %>% cat("\n")}
+
+# A glimpse into the data.
+
+glimpse(data)
+```
+
+### 2.2 Model Locally
+
+Now, let's start building a web service backed by an R model.
+
+First, we train a fast tree model on the dataset processedSimu using the `rxFastTrees()` function from the `MicrosoftML` package. The model can predict whether an account will default, or its probability of default, given transaction statistics and demographic and bank account information as inputs.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+## Variable roles.
+
+# Target variable
+
+target <- "bad_flag"
+
+# Note any identifier.
+
+id <- c("account_id") %T>% print() 
+
+# Note the available variables as model inputs.
+
+vars <- setdiff(names(data), c(target, id))
+```
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Split Data
+
+set.seed(42)
+
+data <- data[order(runif(nrow(data))), ]
+
+train <- sample(nrow(data), 0.70 * nrow(data))
+test <- setdiff(seq_len(nrow(data)), train)
+```
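+
+The split can be sanity-checked on a small example. The sketch below is a minimal illustration, assuming a hypothetical data frame `toy` in place of processedSimu; it mirrors the `sample()` and `setdiff()` logic above and confirms that the two index sets are disjoint and together cover every row.
+
+```r
+# Hypothetical toy data standing in for processedSimu (assumption).
+set.seed(42)
+toy <- data.frame(account_id = paste0("a_", 1:10),
+                  bad_flag   = rep(c(0, 1), 5))
+
+# Same logic as above: draw 70% of the row indices for training,
+# and keep the remaining indices for testing.
+train <- sample(nrow(toy), 0.70 * nrow(toy))
+test  <- setdiff(seq_len(nrow(toy)), train)
+
+# The two index sets should not overlap and should cover all rows.
+stopifnot(length(intersect(train, test)) == 0)
+stopifnot(length(train) + length(test) == nrow(toy))
+```
+
+Note that `test` holds row indices, so held-out rows would be selected with `data[test, ]` in the same way the training rows are selected with `data[train, ]`.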
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Prepare the formula
+
+top_vars <- c("amount_6", "pur_6", "avg_pur_amt_6", "avg_interval_pur_6", "credit_limit", "age", "income", "sex", "education", "marital_status")
+
+form <- as.formula(paste(target, paste(top_vars, collapse="+"), sep="~"))
+form
+```
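+
+The formula construction above can be tried in isolation with base R. This is a small sketch using a shortened, illustrative variable list; `reformulate()` from the `stats` package is shown as an equivalent alternative to pasting strings, not as the approach used in this document.
+
+```r
+# Shortened variable list for illustration only (assumption).
+target   <- "bad_flag"
+top_vars <- c("amount_6", "pur_6", "age")
+
+# Approach used above: paste the pieces together, then convert to a formula.
+form <- as.formula(paste(target, paste(top_vars, collapse = "+"), sep = "~"))
+
+# Base R alternative that builds the same formula without string pasting.
+form_alt <- reformulate(top_vars, response = target)
+
+# Both formulas refer to the same variables, target first.
+stopifnot(identical(all.vars(form), all.vars(form_alt)))
+all.vars(form)
+```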
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Train model: rxFastTrees
+
+model_rxtrees <- rxFastTrees(formula=form,
+                             data=data[train, c(target, vars)],
+                             type="binary",
+                             numTrees=100,
+                             numLeaves=20,
+                             learningRate=0.2,
+                             minSplit=10,
+                             unbalancedSets=FALSE,
+                             verbose=0)
+
+model_rxtrees
+```
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Produce a prediction function that can use the model
+
+creditRiskPrediction <- function(account_id, amount_6, pur_6, avg_pur_amt_6, avg_interval_pur_6, 
+                                 credit_limit, marital_status, sex, education, income, age)
+{ 
+  newdata <- data.frame(account_id=account_id,
+                          amount_6=amount_6, 
+                          pur_6=pur_6, 
+                          avg_pur_amt_6=avg_pur_amt_6, 
+                          avg_interval_pur_6=avg_interval_pur_6, 
+                          credit_limit=credit_limit, 
+                          marital_status=marital_status, 
+                          sex=sex, 
+                          education=education, 
+                          income=income, 
+                          age=age)
+  
+  pred <- rxPredict(modelObject=model_rxtrees, data=newdata)[, c(1, 3)]
+  pred <- cbind(newdata$account_id, pred)
+  names(pred) <- c("account_id", "scored_label", "scored_prob")
+  pred 
+}
+
+# Test function locally by printing results
+
+pred <- creditRiskPrediction(account_id="a_1055521029582310",
+                             amount_6=173.22, 
+                             pur_6=1, 
+                             avg_pur_amt_6=173.22, 
+                             avg_interval_pur_6=0, 
+                             credit_limit=5.26, 
+                             marital_status="married", 
+                             sex="male", 
+                             education="undergraduate", 
+                             income=12.36, 
+                             age=38)
+
+print(pred)
+```
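+
+To make the column selection `[, c(1, 3)]` in `creditRiskPrediction()` easier to follow, the sketch below reshapes a mocked prediction result with base R only. It assumes, without confirmation from this document, that `rxPredict()` on a binary model returns three columns in the order predicted label, raw score, probability; the column names and values in `mock_scores` are illustrative placeholders, not real model output.
+
+```r
+# Mocked stand-in for the rxPredict() result (assumed layout and values).
+mock_scores <- data.frame(PredictedLabel = factor("0", levels = c("0", "1")),
+                          Score.1        = -2.1,
+                          Probability.1  = 0.11)
+
+newdata <- data.frame(account_id = "a_1055521029582310",
+                      stringsAsFactors = FALSE)
+
+# Keep the label (column 1) and the probability (column 3), drop the raw score.
+pred <- mock_scores[, c(1, 3)]
+
+# Attach the account id and give the columns the names the service returns.
+pred <- cbind(newdata$account_id, pred)
+names(pred) <- c("account_id", "scored_label", "scored_prob")
+
+print(pred)
+```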
+
+### 2.2 Publish model as a web service
+
+The second step is to publish the model as a web service by following the steps below.
+
+Step 1: From your local R IDE, log into Microsoft R Server with your credentials using the appropriate authentication function from the `mrsdeploy` package (`remoteLogin()` or `remoteLoginAAD()`).
+
+For simplicity, the code below authenticates with the basic local admin account using `remoteLogin()` and sets `session = FALSE` so that no remote R session is started.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Use `remoteLogin` to authenticate with R Server using
+# the local admin account. Use session = FALSE so that no
+# remote R session is started.
+
+remoteLogin("http://localhost:12800", 
+         username="admin", 
+         password="P@ssw0rd",
+         session=FALSE)
+```
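+
+The login above hardcodes the password for simplicity. Outside of a demo, one option is to read the credentials from environment variables instead. The sketch below is a minimal illustration under stated assumptions: the helper `get_credentials()` and the variable names `MRS_USERNAME` and `MRS_PASSWORD` are hypothetical and not part of `mrsdeploy`.
+
+```r
+# Hypothetical helper (not part of mrsdeploy): read credentials from
+# environment variables and fail early if either one is missing.
+get_credentials <- function(user_var = "MRS_USERNAME",
+                            pass_var = "MRS_PASSWORD") {
+  user <- Sys.getenv(user_var, unset = "")
+  pass <- Sys.getenv(pass_var, unset = "")
+  if (!nzchar(user) || !nzchar(pass)) {
+    stop("Set ", user_var, " and ", pass_var, " before logging in.")
+  }
+  list(username = user, password = pass)
+}
+
+# Usage (needs a reachable R Server, so it is left commented out):
+# cred <- get_credentials()
+# remoteLogin("http://localhost:12800",
+#             username = cred$username,
+#             password = cred$password,
+#             session = FALSE)
+```
+
+This keeps the password out of the script and out of version control; the variables would be set in the shell or in a `.Renviron` file before the document is run.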
+
+Now, you are successfully connected to the remote R Server.
+
+Step 2: Publish the model as a web service to R Server using the `publishService()` function from the `mrsdeploy` package. 
+
+In this example, you publish a web service called "crpService" using the model `model_rxtrees` and the function `creditRiskPrediction()`. As input, the service takes transaction statistics and demographic and bank account information, each represented as a numeric or categorical value. As output, it returns an R data frame containing the account id, the predicted default label, and the probability of default for the given account, as computed by the pre-defined credit risk prediction function.
+
+When publishing, you must specify, among other parameters, a service name and version, the R code, the inputs, as well as the outputs that application developers will need to integrate in their applications. 
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Publish a web service
+
+api <- publishService(
+       "crpService",
+        code=creditRiskPrediction,
+        model=model_rxtrees,
+        inputs=list(account_id="character",
+                    amount_6="numeric", 
+                    pur_6="numeric", 
+                    avg_pur_amt_6="numeric", 
+                    avg_interval_pur_6="numeric", 
+                    credit_limit="numeric", 
+                    marital_status="character", 
+                    sex="character", 
+                    education="character", 
+                    income="numeric", 
+                    age="numeric"),
+        outputs=list(pred="data.frame"),
+        v="v1.0.0")
+```
+
+### 2.3 Test the service by consuming it in R
+
+After publishing the service, we can consume it directly in R to verify that the results are as expected.
+
+```{r, message=FALSE, warning=FALSE, error=TRUE}
+# Get service and assign service to the variable `api`.
+
+api <- getService("crpService", "v1.0.0")
+
+# Consume the service by calling the `creditRiskPrediction` function it contains
+
+result <- api$creditRiskPrediction(account_id="a_1055521029582310",
+                                   amount_6=173.22, 
+                                   pur_6=1, 
+                                   avg_pur_amt_6=173.22, 
+                                   avg_interval_pur_6=0, 
+                                   credit_limit=5.26, 
+                                   marital_status="married", 
+                                   sex="male", 
+                                   education="undergraduate", 
+                                   income=12.36, 
+                                   age=38)
+
+# Print response output named `pred`
+
+print(result$output("pred")) 
+```
+
+### 2.4 Update the web service
+
+In production, we can manage and update the web service as needed, for example when a better model becomes available.
+
+```{r, message=FALSE, warning=FALSE, error=TRUE}
+# Load the pre-trained optimal model obtained from the template of CreditRiskScale.
+
+load(file="model_rxtrees.RData")
+
+model_rxtrees
+
+api <- updateService(name="crpService", 
+                     v="v1.0.0",
+                     model=model_rxtrees,
+                     descr="Update the model hyper-parameters")
+
+# Re-test the updated service by consuming it
+
+result <- api$creditRiskPrediction(account_id="a_1055521029582310",
+                                   amount_6=173.22, 
+                                   pur_6=1, 
+                                   avg_pur_amt_6=173.22, 
+                                   avg_interval_pur_6=0, 
+                                   credit_limit=5.26, 
+                                   marital_status="married", 
+                                   sex="male", 
+                                   education="undergraduate", 
+                                   income=12.36, 
+                                   age=38)
+
+# Print response output named `pred`
+
+print(result$output("pred")) 
+```
+
+### 2.5 Application Integration
+
+Finally, we can retrieve the service's Swagger definition (`swagger.json`), which is needed for application integration.
+
+```{r, message=FALSE, warning=FALSE, error=TRUE}
+# Get this service's `swagger.json` file that is needed for web application integration
+
+swagger <- api$swagger(json = FALSE)
+
+# Delete the service to make the script re-runnable
+
+deleteService(name="crpService", v="v1.0.0")
+```
--- a/CreditRiskPrediction/Code/CreditRiskDeploy.html
+++ b/CreditRiskPrediction/Code/CreditRiskDeploy.html
--- a/CreditRiskPrediction/Code/CreditRiskDeploy.ipynb
+++ b/CreditRiskPrediction/Code/CreditRiskDeploy.ipynb
@ -0,0 +1,518 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "title: \"Deploy a Credit Risk Model as a Web Service\"\n",
+    "author: \"Fang Zhou, Data Scientist, Microsoft\"\n",
+    "date: \"`r Sys.Date()`\"\n",
+    "output: html_document\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "id": "",
+     "include": "FALSE,",
+     "purl": "FALSE"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "knitr::opts_chunk$set(echo = TRUE,\n",
+    "                      fig.width = 8,\n",
+    "                      fig.height = 5,\n",
+    "                      fig.align='center',\n",
+    "                      dev = \"png\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1 Introduction\n",
+    "\n",
+    "The `mrsdeploy` package, delivered with Microsoft R Client and R Server, provides functions for:\n",
+    "\n",
+    "**1** Establishing a remote session in an R console application in order to execute code on a remote R Server.\n",
+    "\n",
+    "**2** Publishing and managing an R web service that is backed by the R code block or script you provided. \n",
+    "\n",
+    "Each feature can be used independently, but the greatest value is achieved when you can leverage both.\n",
+    "\n",
+    "This document walks you through deploying a credit risk model as a web service using the `mrsdeploy` package.\n",
+    "\n",
+    "It starts by building a model locally, then publishes the model as a web service, shares it with other authenticated users for consumption, and finally manages and updates the web service.\n",
+    "\n",
+    "## 2 Automated Credit Risk Model Deployment\n",
+    "\n",
+    "### 2.1 Setup\n",
+    "\n",
+    "We load the required R packages."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "## Setup\n",
+    "\n",
+    "# Load the required packages into the R session.\n",
+    "\n",
+    "library(rattle)       # Use normVarNames().\n",
+    "library(dplyr)        # Wrangling: tbl_df(), group_by(), print(), glimpse().\n",
+    "library(magrittr)     # Pipe operator %>% %<>% %T>% equals().\n",
+    "library(scales)       # Include commas in numbers.\n",
+    "library(MicrosoftML)  # Build models using Microsoft ML algorithms.\n",
+    "library(mrsdeploy)    # Publish an R model as a web service."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then, the dataset processedSimu is ingested for demonstration. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "## Data Ingestion\n",
+    "\n",
+    "# Identify the source location of the dataset.\n",
+    "\n",
+    "#DATA <- \"../../Data/\"\n",
+    "#txn_fname <- file.path(DATA, \"Raw/processedSimu.csv\")\n",
+    "\n",
+    "wd <- getwd()\n",
+    "\n",
+    "dpath <- \"../Data\"\n",
+    "data_fname <- file.path(wd, dpath, \"processedSimu.csv\")\n",
+    "\n",
+    "# Ingest the dataset.\n",
+    "\n",
+    "data <- read.csv(file=data_fname) %T>% \n",
+    "  {dim(.) %>% comma() %>% cat(\"\\n\")}\n",
+    "\n",
+    "# A glimpse into the data.\n",
+    "\n",
+    "glimpse(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.2 Model Locally\n",
+    "\n",
+    "Now, let's start building a web service backed by an R model.\n",
+    "\n",
+    "First, we train a fast tree model on the dataset processedSimu using the `rxFastTrees()` function from the `MicrosoftML` package. The model can predict whether an account will default, or its probability of default, given transaction statistics and demographic and bank account information as inputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "## Variable roles.\n",
+    "\n",
+    "# Target variable\n",
+    "\n",
+    "target <- \"bad_flag\"\n",
+    "\n",
+    "# Note any identifier.\n",
+    "\n",
+    "id <- c(\"account_id\") %T>% print() \n",
+    "\n",
+    "# Note the available variables as model inputs.\n",
+    "\n",
+    "vars <- setdiff(names(data), c(target, id))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Split Data\n",
+    "\n",
+    "set.seed(42)\n",
+    "\n",
+    "data <- data[order(runif(nrow(data))), ]\n",
+    "\n",
+    "train <- sample(nrow(data), 0.70 * nrow(data))\n",
+    "test <- setdiff(seq_len(nrow(data)), train)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Prepare the formula\n",
+    "\n",
+    "top_vars <- c(\"amount_6\", \"pur_6\", \"avg_pur_amt_6\", \"avg_interval_pur_6\", \"credit_limit\", \"age\", \"income\", \"sex\", \"education\", \"marital_status\")\n",
+    "\n",
+    "form <- as.formula(paste(target, paste(top_vars, collapse=\"+\"), sep=\"~\"))\n",
+    "form"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Train model: rxFastTrees\n",
+    "\n",
+    "model_rxtrees <- rxFastTrees(formula=form,\n",
+    "                             data=data[train, c(target, vars)],\n",
+    "                             type=\"binary\",\n",
+    "                             numTrees=100,\n",
+    "                             numLeaves=20,\n",
+    "                             learningRate=0.2,\n",
+    "                             minSplit=10,\n",
+    "                             unbalancedSets=FALSE,\n",
+    "                             verbose=0)\n",
+    "\n",
+    "model_rxtrees"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Produce a prediction function that can use the model\n",
+    "\n",
+    "creditRiskPrediction <- function(account_id, amount_6, pur_6, avg_pur_amt_6, avg_interval_pur_6, \n",
+    "                                 credit_limit, marital_status, sex, education, income, age)\n",
+    "{ \n",
+    "  newdata <- data.frame(account_id=account_id,\n",
+    "                          amount_6=amount_6, \n",
+    "                          pur_6=pur_6, \n",
+    "                          avg_pur_amt_6=avg_pur_amt_6, \n",
+    "                          avg_interval_pur_6=avg_interval_pur_6, \n",
+    "                          credit_limit=credit_limit, \n",
+    "                          marital_status=marital_status, \n",
+    "                          sex=sex, \n",
+    "                          education=education, \n",
+    "                          income=income, \n",
+    "                          age=age)\n",
+    "  \n",
+    "  pred <- rxPredict(modelObject=model_rxtrees, data=newdata)[, c(1, 3)]\n",
+    "  pred <- cbind(newdata$account_id, pred)\n",
+    "  names(pred) <- c(\"account_id\", \"scored_label\", \"scored_prob\")\n",
+    "  pred \n",
+    "}\n",
+    "\n",
+    "# Test function locally by printing results\n",
+    "\n",
+    "pred <- creditRiskPrediction(account_id=\"a_1055521029582310\",\n",
+    "                             amount_6=173.22, \n",
+    "                             pur_6=1, \n",
+    "                             avg_pur_amt_6=173.22, \n",
+    "                             avg_interval_pur_6=0, \n",
+    "                             credit_limit=5.26, \n",
+    "                             marital_status=\"married\", \n",
+    "                             sex=\"male\", \n",
+    "                             education=\"undergraduate\", \n",
+    "                             income=12.36, \n",
+    "                             age=38)\n",
+    "\n",
+    "print(pred)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.2 Publish model as a web service\n",
+    "\n",
+    "The second step is to publish the model as a web service by following the steps below.\n",
+    "\n",
+    "Step 1: From your local R IDE, log into Microsoft R Server with your credentials using the appropriate authentication function from the `mrsdeploy` package (`remoteLogin()` or `remoteLoginAAD()`).\n",
+    "\n",
+    "For simplicity, the code below authenticates with the basic local admin account using `remoteLogin()` and sets `session = FALSE` so that no remote R session is started."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Use `remoteLogin` to authenticate with R Server using\n",
+    "# the local admin account. Use session = FALSE so that no\n",
+    "# remote R session is started.\n",
+    "\n",
+    "remoteLogin(\"http://localhost:12800\", \n",
+    "         username=\"admin\", \n",
+    "         password=\"P@ssw0rd\",\n",
+    "         session=FALSE)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, you are successfully connected to the remote R Server.\n",
+    "\n",
+    "Step 2: Publish the model as a web service to R Server using the `publishService()` function from the `mrsdeploy` package. \n",
+    "\n",
+    "In this example, you publish a web service called \"crpService\" using the model `model_rxtrees` and the function `creditRiskPrediction()`. As input, the service takes transaction statistics and demographic and bank account information, each represented as a numeric or categorical value. As output, it returns an R data frame containing the account id, the predicted default label, and the probability of default for the given account, as computed by the pre-defined credit risk prediction function.\n",
+    "\n",
+    "When publishing, you must specify, among other parameters, a service name and version, the R code, the inputs, as well as the outputs that application developers will need to integrate in their applications."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Publish a web service\n",
+    "\n",
+    "api <- publishService(\n",
+    "       \"crpService\",\n",
+    "        code=creditRiskPrediction,\n",
+    "        model=model_rxtrees,\n",
+    "        inputs=list(account_id=\"character\",\n",
+    "                    amount_6=\"numeric\", \n",
+    "                    pur_6=\"numeric\", \n",
+    "                    avg_pur_amt_6=\"numeric\", \n",
+    "                    avg_interval_pur_6=\"numeric\", \n",
+    "                    credit_limit=\"numeric\", \n",
+    "                    marital_status=\"character\", \n",
+    "                    sex=\"character\", \n",
+    "                    education=\"character\", \n",
+    "                    income=\"numeric\", \n",
+    "                    age=\"numeric\"),\n",
+    "        outputs=list(pred=\"data.frame\"),\n",
+    "        v=\"v1.0.0\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.3 Test the service by consuming it in R\n",
+    "\n",
+    "After publishing the service, we can consume it directly in R to verify that the results are as expected."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "TRUE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Get service and assign service to the variable `api`.\n",
+    "\n",
+    "api <- getService(\"crpService\", \"v1.0.0\")\n",
+    "\n",
+    "# Consume the service by calling the `creditRiskPrediction` function it contains\n",
+    "\n",
+    "result <- api$creditRiskPrediction(account_id=\"a_1055521029582310\",\n",
+    "                                   amount_6=173.22, \n",
+    "                                   pur_6=1, \n",
+    "                                   avg_pur_amt_6=173.22, \n",
+    "                                   avg_interval_pur_6=0, \n",
+    "                                   credit_limit=5.26, \n",
+    "                                   marital_status=\"married\", \n",
+    "                                   sex=\"male\", \n",
+    "                                   education=\"undergraduate\", \n",
+    "                                   income=12.36, \n",
+    "                                   age=38)\n",
+    "\n",
+    "# Print response output named `pred`\n",
+    "\n",
+    "print(result$output(\"pred\")) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.4 Update the web service\n",
+    "\n",
+    "In production, we can manage and update the web service as needed, for example when a better model becomes available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "TRUE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Load the pre-trained optimal model obtained from the template of CreditRiskScale.\n",
+    "\n",
+    "load(file=\"model_rxtrees.RData\")\n",
+    "\n",
+    "model_rxtrees\n",
+    "\n",
+    "api <- updateService(name=\"crpService\", \n",
+    "                     v=\"v1.0.0\",\n",
+    "                     model=model_rxtrees,\n",
+    "                     descr=\"Update the model hyper-parameters\")\n",
+    "\n",
+    "# Re-test the updated service by consuming it\n",
+    "\n",
+    "result <- api$creditRiskPrediction(account_id=\"a_1055521029582310\",\n",
+    "                                   amount_6=173.22, \n",
+    "                                   pur_6=1, \n",
+    "                                   avg_pur_amt_6=173.22, \n",
+    "                                   avg_interval_pur_6=0, \n",
+    "                                   credit_limit=5.26, \n",
+    "                                   marital_status=\"married\", \n",
+    "                                   sex=\"male\", \n",
+    "                                   education=\"undergraduate\", \n",
+    "                                   income=12.36, \n",
+    "                                   age=38)\n",
+    "\n",
+    "# Print response output named `pred`\n",
+    "\n",
+    "print(result$output(\"pred\")) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.5 Application Integration\n",
+    "\n",
+    "Finally, we can retrieve the service's Swagger definition (`swagger.json`), which is needed for application integration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "TRUE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Get this service's `swagger.json` file that is needed for web application integration\n",
+    "\n",
+    "swagger <- api$swagger(json = FALSE)\n",
+    "\n",
+    "# Delete the service to make the script re-runnable\n",
+    "\n",
+    "deleteService(name=\"crpService\", v=\"v1.0.0\")"
+   ]
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
--- a/CreditRiskPrediction/Code/CreditRiskScale.Rmd
+++ b/CreditRiskPrediction/Code/CreditRiskScale.Rmd
@ -0,0 +1,354 @@
+---
+title: "Faster and Scalable Credit Risk Prediction"
+author: "Fang Zhou, Data Scientist, Microsoft"
+date: "`r Sys.Date()`"
+output: html_document
+---
+
+```{r setup, include=FALSE, purl=FALSE}
+knitr::opts_chunk$set(echo = TRUE,
+                      fig.width = 8,
+                      fig.height = 5,
+                      fig.align='center',
+                      dev = "png")
+```
+
+## 1 Introduction
+
+Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R. 
+
+The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.
+
+This document walks you through building faster and more scalable credit risk models using the `MicrosoftML` package, which adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.
+
+## 2 Faster and Scalable Credit Risk Models
+
+### 2.1 Setup
+
+We load the required R packages.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+## Setup
+
+# Load the required packages into the R session.
+
+library(rattle)       # Use normVarNames().
+library(dplyr)        # Wrangling: tbl_df(), group_by(), print(), glimpse().
+library(magrittr)     # Pipe operator %>% %<>% %T>% equals().
+library(scales)       # Include commas in numbers.
+library(RevoScaleR)   # Enable out-of-memory computation in R.
+library(dplyrXdf)     # Wrangling on xdf data format.
+library(MicrosoftML)  # Build models using Microsoft ML algorithms.
+library(caret)        # Calculate confusion matrix by using confusionMatrix().
+library(ROCR)         # Provide functions for model performance evaluation.
+```
+
+Then, the dataset processedSimu is ingested and transformed into a `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+## Data Ingestion
+
+# Identify the source location of the dataset.
+
+#DATA <- "../../Data/"
+#data_fname <- file.path(DATA, "Raw/processedSimu.csv")
+
+wd <- getwd()
+
+dpath <- "../Data"
+data_fname <- file.path(wd, dpath, "processedSimu.csv")
+output_fname <- file.path(wd, dpath, "processedSimu.xdf")
+output <- RxXdfData(file=output_fname)
+
+# Ingest the dataset.
+
+data <- rxImport(inData=data_fname, 
+                 outFile=output,
+                 stringsAsFactors=TRUE,
+                 overwrite=TRUE)
+
+
+# View data information.
+
+rxGetVarInfo(data)
+```
+
+### 2.2 Model Building
+
+Now, let's start building credit risk models using different machine learning algorithms from the `MicrosoftML` package.
+
+First, we create individual machine learning models on the dataset processedSimu.xdf using the functions `rxLogisticRegression()`, `rxFastForest()`, and `rxFastTrees()`.
+
+From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example in terms of overall performance. Therefore, we train several `rxFastTrees()` models, each with a different set of parameters.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+## Variable roles.
+
+# Target variable
+
+target <- "bad_flag"
+
+# Note any identifier.
+
+id <- c("account_id") %T>% print() 
+
+# Note the available variables as model inputs.
+
+vars <- setdiff(names(data), c(target, id))
+```
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Split Data
+
+set.seed(42)
+
+# Add training/testing flag to each observation.
+
+data %<>%
+  mutate(.train=factor(sample(1:2, .rxNumRows,
+                              replace=TRUE,
+                              prob=c(0.70, 0.30)),
+                       levels=1:2))
+
+# Split dataset into training/test.
+
+data_split <- rxSplit(data, splitByFactor=".train")
+```
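+
+The same flag-and-split idea can be sketched on an ordinary in-memory data frame with base R. This is only an illustration: `toy` and `toy_split` are hypothetical names, and base `split()` stands in for `rxSplit()`. In the chunk above, `.rxNumRows` is understood to be the number of rows in the chunk of data currently being processed, so each row receives its own random flag.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Illustrative only: a small in-memory stand-in for the xdf dataset.
+
+set.seed(42)
+toy <- data.frame(account_id=1:1000, x=rnorm(1000))
+
+# Flag each row as 1 (training) or 2 (testing) with probabilities 0.70/0.30.
+
+toy$.train <- factor(sample(1:2, nrow(toy),
+                            replace=TRUE,
+                            prob=c(0.70, 0.30)),
+                     levels=1:2)
+
+# Split into a list of two data frames, one per flag level.
+
+toy_split <- split(toy, toy$.train)
+
+# The flags are random, so the proportions are only approximately 70/30.
+
+sapply(toy_split, nrow) / nrow(toy)
+```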
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Prepare the formula
+
+top_vars <- c("amount_6", "pur_6", "avg_pur_amt_6", "avg_interval_pur_6", "credit_limit", "age", "income", "sex", "education", "marital_status")
+
+form <- as.formula(paste(target, paste(top_vars, collapse="+"), sep="~"))
+form
+```
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Specify the local parallel compute context.
+
+rxSetComputeContext("localpar")
+
+# Train model: rxLogisticRegression
+
+time_rxlogit <- system.time(
+  
+  model_rxlogit <- rxLogisticRegression(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    l1Weight=1,
+    verbose=0
+  )
+)
+
+# Train model: rxFastForest
+
+time_rxforest <- system.time(
+  
+  model_rxforest <- rxFastForest(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    numTrees=100,
+    numLeaves=20,
+    minSplit=10,
+    verbose=0
+  )
+)
+
+# Train model: rxFastTrees
+
+time_rxtrees1 <- system.time(
+  
+  model_rxtrees1 <- rxFastTrees(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    numTrees=100,
+    numLeaves=20,
+    learningRate=0.2,
+    minSplit=10,
+    unbalancedSets=FALSE,
+    verbose=0
+  )
+)
+
+time_rxtrees2 <- system.time(
+  
+  model_rxtrees2 <- rxFastTrees(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    numTrees=500,
+    numLeaves=20,
+    learningRate=0.2,
+    minSplit=10,
+    unbalancedSets=FALSE,
+    verbose=0
+  )
+)
+
+time_rxtrees3 <- system.time(
+  
+  model_rxtrees3 <- rxFastTrees(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    numTrees=500,
+    numLeaves=20,
+    learningRate=0.3,
+    minSplit=10,
+    unbalancedSets=FALSE,
+    verbose=0
+  )
+)
+
+time_rxtrees4 <- system.time(
+  
+  model_rxtrees4 <- rxFastTrees(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    numTrees=500,
+    numLeaves=20,
+    learningRate=0.3,
+    minSplit=10,
+    unbalancedSets=TRUE,
+    verbose=0
+  )
+)
+```
+
+Next, we build an ensemble of fast tree models by using the function `rxEnsemble()`.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Train an ensemble model.
+
+time_ensemble <- system.time(
+  
+  model_ensemble <- rxEnsemble(
+    formula=form,
+    data=data_split[[1]],
+    type="binary",
+    trainers=list(fastTrees(), 
+                  fastTrees(numTrees=500), 
+                  fastTrees(numTrees=500, learningRate=0.3),
+                  fastTrees(numTrees=500, learningRate=0.3, unbalancedSets=TRUE)),
+    combineMethod="vote",
+    replace=TRUE,
+    verbose=0
+  )
+)
+```
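+
+As a rough intuition for `combineMethod="vote"`, the sketch below assumes it amounts to a majority vote over the class labels predicted by the individual trainers. That reading is an assumption made for illustration, not something taken from the `MicrosoftML` documentation, and the `votes` data frame is made up. Ties are resolved as "no" here purely for simplicity.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Hypothetical class predictions from four trainers for five accounts.
+
+votes <- data.frame(
+  m1=c("yes", "no", "no",  "yes", "no"),
+  m2=c("yes", "no", "yes", "yes", "no"),
+  m3=c("no",  "no", "yes", "yes", "no"),
+  m4=c("yes", "no", "no",  "no",  "yes"),
+  stringsAsFactors=FALSE
+)
+
+# Count the "yes" votes per account and predict "yes" on a strict majority.
+
+yes_votes <- rowSums(votes == "yes")
+majority  <- ifelse(yes_votes > ncol(votes) / 2, "yes", "no")
+majority
+```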
+
+### 2.3 Model Evaluation 
+
+Finally, we evaluate the models built above and compare them on accuracy, recall, precision, AUC, and training time.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Predict
+
+models <- list(model_rxlogit, model_rxforest, 
+               model_rxtrees1, model_rxtrees2, model_rxtrees3, model_rxtrees4, 
+               model_ensemble)
+
+# Predict class
+
+predictions <- lapply(models, 
+                      rxPredict, 
+                      data=data_split[[2]]) %>%
+                lapply('[[', 1)
+
+levels(predictions[[7]]) <- c("no", "yes")
+
+# Confusion matrix evaluation results.
+
+cm_metrics <- lapply(predictions,
+                     confusionMatrix, 
+                     reference=data_split[[2]][[target]],
+                     positive="yes")
+
+# Accuracy
+
+acc_metrics <- 
+  lapply(cm_metrics, `[[`, "overall") %>%
+  lapply(`[`, 1) %>%
+  unlist() %>%
+  as.vector()
+
+# Recall
+
+rec_metrics <- 
+  lapply(cm_metrics, `[[`, "byClass") %>%
+  lapply(`[`, 1) %>%
+  unlist() %>%
+  as.vector()
+  
+# Precision
+
+pre_metrics <- 
+  lapply(cm_metrics, `[[`, "byClass") %>%
+  lapply(`[`, 3) %>%
+  unlist() %>%
+  as.vector()
+
+# Predict class probability
+
+probs <- lapply(models[c(1, 2, 3, 4, 5, 6)],
+                rxPredict,
+                data=data_split[[2]]) %>%
+                lapply('[[', 3)
+
+# Create prediction object
+
+preds <- lapply(probs, 
+                ROCR::prediction,
+                labels=data_split[[2]][[target]])
+
+# AUC
+
+auc_metrics <- lapply(preds, 
+                      ROCR::performance,
+                      "auc") %>%
+               lapply(slot, "y.values") %>%
+               lapply('[[', 1) %>%
+               unlist()
+
+# No class probabilities are extracted for the ensemble above, so its AUC is recorded as NaN.
+
+auc_metrics <- c(auc_metrics, NaN)
+
+algo_list <- c("rxLogisticRegression", 
+               "rxFastForest", 
+               "rxFastTrees", 
+               "rxFastTrees(500)", 
+               "rxFastTrees(500, 0.3)", 
+               "rxFastTrees(500, 0.3, ub)",
+               "rxEnsemble")
+
+time_consumption <- c(time_rxlogit[[3]], time_rxforest[[3]], 
+                      time_rxtrees1[[3]], time_rxtrees2[[3]], 
+                      time_rxtrees3[[3]], time_rxtrees4[[3]],
+                      time_ensemble[[3]])
+
+df_comp <- 
+  data.frame(Models=algo_list, 
+             Accuracy=acc_metrics, 
+             Recall=rec_metrics, 
+             Precision=pre_metrics,
+             AUC=auc_metrics,
+             Time=time_consumption) %T>%
+             print()
+```
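+
+To make the indexing above easier to follow: in the object returned by `confusionMatrix()`, the first element of `overall` is the accuracy, and with `positive="yes"` the first and third elements of `byClass` are the sensitivity (recall) and the positive predictive value (precision). The sketch below recomputes the three metrics by hand with base R. The `truth` and `pred` vectors are made up for illustration.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Hypothetical true labels and predicted labels for eight accounts.
+
+truth <- factor(c("yes", "yes", "yes", "no", "no",  "no", "no",  "yes"),
+                levels=c("no", "yes"))
+pred  <- factor(c("yes", "no",  "yes", "no", "yes", "no", "yes", "yes"),
+                levels=c("no", "yes"))
+
+# Cross-tabulate predictions (rows) against the reference (columns).
+
+tab <- table(Prediction=pred, Reference=truth)
+
+tp <- tab["yes", "yes"]  # predicted yes, truly yes
+fp <- tab["yes", "no"]   # predicted yes, truly no
+fn <- tab["no",  "yes"]  # predicted no,  truly yes
+tn <- tab["no",  "no"]   # predicted no,  truly no
+
+accuracy  <- (tp + tn) / sum(tab)
+recall    <- tp / (tp + fn)
+precision <- tp / (tp + fp)
+
+c(Accuracy=accuracy, Recall=recall, Precision=precision)
+```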
+
+### 2.4 Save Models for Deployment
+
+Last but not least, we need to save the model objects in a suitable format (e.g., `.RData`, `SQLServerData`, etc.) for later use in deployment.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# Save model for deployment usage.
+
+model_rxtrees <- model_rxtrees3
+
+save(model_rxtrees, file="model_rxtrees.RData")
+```
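+
+When the model is needed again, `load()` restores the object under the name it was saved with. Below is a minimal sketch of that round trip, using a stand-in list and a temporary file rather than the real model. `toy_model` and `model_file` are hypothetical names.
+
+```{r, message=FALSE, warning=FALSE, error=FALSE}
+# A stand-in object; the real chunk above saves the trained rxFastTrees model.
+
+toy_model  <- list(algorithm="rxFastTrees", numTrees=500, learningRate=0.3)
+model_file <- file.path(tempdir(), "toy_model.RData")
+
+save(toy_model, file=model_file)
+
+# Remove the object, then restore it from disk.
+# load() returns the names of the objects it restored.
+
+rm(toy_model)
+restored <- load(model_file)
+
+restored
+exists("toy_model")
+```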
+
--- a/CreditRiskPrediction/Code/CreditRiskScale.html
+++ b/CreditRiskPrediction/Code/CreditRiskScale.html
--- a/CreditRiskPrediction/Code/CreditRiskScale.ipynb
+++ b/CreditRiskPrediction/Code/CreditRiskScale.ipynb
@ -0,0 +1,525 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "title: \"Faster and Scalable Credit Risk Prediction\"\n",
+    "author: \"Fang Zhou, Data Scientist, Microsoft\"\n",
+    "date: \"`r Sys.Date()`\"\n",
+    "output: html_document\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "id": "",
+     "include": "FALSE,",
+     "purl": "FALSE"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "knitr::opts_chunk$set(echo = TRUE,\n",
+    "                      fig.width = 8,\n",
+    "                      fig.height = 5,\n",
+    "                      fig.align='center',\n",
+    "                      dev = \"png\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1 Introduction\n",
+    "\n",
+    "Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R. \n",
+    "\n",
+    "The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.\n",
+    "\n",
+    "This document walks you through building faster, more scalable credit risk models with the `MicrosoftML` package, which adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.\n",
+    "\n",
+    "## 2 Faster and Scalable Credit Risk Models\n",
+    "\n",
+    "### 2.1 Setup\n",
+    "\n",
+    "We load the required R packages."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "## Setup\n",
+    "\n",
+    "# Load the required packages into the R session.\n",
+    "\n",
+    "library(rattle)       # Use normVarNames().\n",
+    "library(dplyr)        # Wrangling: tbl_df(), group_by(), print(), glimpse().\n",
+    "library(magrittr)     # Pipe operator %>% %<>% %T>% equals().\n",
+    "library(scales)       # Include commas in numbers.\n",
+    "library(RevoScaleR)   # Enable out-of-memory computation in R.\n",
+    "library(dplyrXdf)     # Wrangling on xdf data format.\n",
+    "library(MicrosoftML)  # Build models using Microsoft ML algorithms.\n",
+    "library(caret)        # Calculate confusion matrix by using confusionMatrix().\n",
+    "library(ROCR)         # Provide functions for model performance evaluation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then the dataset `processedSimu` is ingested and converted to the `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "## Data Ingestion\n",
+    "\n",
+    "# Identify the source location of the dataset.\n",
+    "\n",
+    "#DATA <- \"../../Data/\"\n",
+    "#data_fname <- file.path(DATA, \"Raw/processedSimu.csv\")\n",
+    "\n",
+    "wd <- getwd()\n",
+    "\n",
+    "dpath <- \"../Data\"\n",
+    "data_fname <- file.path(wd, dpath, \"processedSimu.csv\")\n",
+    "output_fname <- file.path(wd, dpath, \"processedSimu.xdf\")\n",
+    "output <- RxXdfData(file=output_fname)\n",
+    "\n",
+    "# Ingest the dataset.\n",
+    "\n",
+    "data <- rxImport(inData=data_fname, \n",
+    "                 outFile=output,\n",
+    "                 stringsAsFactors=TRUE,\n",
+    "                 overwrite=TRUE)\n",
+    "\n",
+    "\n",
+    "# View data information.\n",
+    "\n",
+    "rxGetVarInfo(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.2 Model Building\n",
+    "\n",
+    "Now let's build credit risk models using different machine learning algorithms from the `MicrosoftML` package.\n",
+    "\n",
+    "First, we create individual machine learning models on the dataset `processedSimu.xdf` using the functions `rxLogisticRegression()`, `rxFastForest()`, and `rxFastTrees()`.\n",
+    "\n",
+    "From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example in terms of overall performance. Therefore, we train several `rxFastTrees()` models, each with a different set of parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "## Variable roles.\n",
+    "\n",
+    "# Target variable\n",
+    "\n",
+    "target <- \"bad_flag\"\n",
+    "\n",
+    "# Note any identifier.\n",
+    "\n",
+    "id <- c(\"account_id\") %T>% print() \n",
+    "\n",
+    "# Note the available variables as model inputs.\n",
+    "\n",
+    "vars <- setdiff(names(data), c(target, id))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Split Data\n",
+    "\n",
+    "set.seed(42)\n",
+    "\n",
+    "# Add training/testing flag to each observation.\n",
+    "\n",
+    "data %<>%\n",
+    "  mutate(.train=factor(sample(1:2, .rxNumRows,\n",
+    "                              replace=TRUE,\n",
+    "                              prob=c(0.70, 0.30)),\n",
+    "                       levels=1:2))\n",
+    "\n",
+    "# Split dataset into training/test.\n",
+    "\n",
+    "data_split <- rxSplit(data, splitByFactor=\".train\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Prepare the formula\n",
+    "\n",
+    "top_vars <- c(\"amount_6\", \"pur_6\", \"avg_pur_amt_6\", \"avg_interval_pur_6\", \"credit_limit\", \"age\", \"income\", \"sex\", \"education\", \"marital_status\")\n",
+    "\n",
+    "form <- as.formula(paste(target, paste(top_vars, collapse=\"+\"), sep=\"~\"))\n",
+    "form"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Specify the local parallel compute context.\n",
+    "\n",
+    "rxSetComputeContext(\"localpar\")\n",
+    "\n",
+    "# Train model: rxLogisticRegression\n",
+    "\n",
+    "time_rxlogit <- system.time(\n",
+    "  \n",
+    "  model_rxlogit <- rxLogisticRegression(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    l1Weight=1,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")\n",
+    "\n",
+    "# Train model: rxFastForest\n",
+    "\n",
+    "time_rxforest <- system.time(\n",
+    "  \n",
+    "  model_rxforest <- rxFastForest(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    numTrees=100,\n",
+    "    numLeaves=20,\n",
+    "    minSplit=10,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")\n",
+    "\n",
+    "# Train model: rxFastTrees\n",
+    "\n",
+    "time_rxtrees1 <- system.time(\n",
+    "  \n",
+    "  model_rxtrees1 <- rxFastTrees(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    numTrees=100,\n",
+    "    numLeaves=20,\n",
+    "    learningRate=0.2,\n",
+    "    minSplit=10,\n",
+    "    unbalancedSets=FALSE,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")\n",
+    "\n",
+    "time_rxtrees2 <- system.time(\n",
+    "  \n",
+    "  model_rxtrees2 <- rxFastTrees(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    numTrees=500,\n",
+    "    numLeaves=20,\n",
+    "    learningRate=0.2,\n",
+    "    minSplit=10,\n",
+    "    unbalancedSets=FALSE,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")\n",
+    "\n",
+    "time_rxtrees3 <- system.time(\n",
+    "  \n",
+    "  model_rxtrees3 <- rxFastTrees(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    numTrees=500,\n",
+    "    numLeaves=20,\n",
+    "    learningRate=0.3,\n",
+    "    minSplit=10,\n",
+    "    unbalancedSets=FALSE,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")\n",
+    "\n",
+    "time_rxtrees4 <- system.time(\n",
+    "  \n",
+    "  model_rxtrees4 <- rxFastTrees(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    numTrees=500,\n",
+    "    numLeaves=20,\n",
+    "    learningRate=0.3,\n",
+    "    minSplit=10,\n",
+    "    unbalancedSets=TRUE,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we build an ensemble of fast tree models by using the function `rxEnsemble()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Train an ensemble model.\n",
+    "\n",
+    "time_ensemble <- system.time(\n",
+    "  \n",
+    "  model_ensemble <- rxEnsemble(\n",
+    "    formula=form,\n",
+    "    data=data_split[[1]],\n",
+    "    type=\"binary\",\n",
+    "    trainers=list(fastTrees(), \n",
+    "                  fastTrees(numTrees=500), \n",
+    "                  fastTrees(numTrees=500, learningRate=0.3),\n",
+    "                  fastTrees(numTrees=500, learningRate=0.3, unbalancedSets=TRUE)),\n",
+    "    combineMethod=\"vote\",\n",
+    "    replace=TRUE,\n",
+    "    verbose=0\n",
+    "  )\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.3 Model Evaluation \n",
+    "\n",
+    "Finally, we evaluate the models built above and compare them on accuracy, recall, precision, AUC, and training time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Predict\n",
+    "\n",
+    "models <- list(model_rxlogit, model_rxforest, \n",
+    "               model_rxtrees1, model_rxtrees2, model_rxtrees3, model_rxtrees4, \n",
+    "               model_ensemble)\n",
+    "\n",
+    "# Predict class\n",
+    "\n",
+    "predictions <- lapply(models, \n",
+    "                      rxPredict, \n",
+    "                      data=data_split[[2]]) %>%\n",
+    "                lapply('[[', 1)\n",
+    "\n",
+    "levels(predictions[[7]]) <- c(\"no\", \"yes\")\n",
+    "\n",
+    "# Confusion matrix evaluation results.\n",
+    "\n",
+    "cm_metrics <- lapply(predictions,\n",
+    "                     confusionMatrix, \n",
+    "                     reference=data_split[[2]][[target]],\n",
+    "                     positive=\"yes\")\n",
+    "\n",
+    "# Accuracy\n",
+    "\n",
+    "acc_metrics <- \n",
+    "  lapply(cm_metrics, `[[`, \"overall\") %>%\n",
+    "  lapply(`[`, 1) %>%\n",
+    "  unlist() %>%\n",
+    "  as.vector()\n",
+    "\n",
+    "# Recall\n",
+    "\n",
+    "rec_metrics <- \n",
+    "  lapply(cm_metrics, `[[`, \"byClass\") %>%\n",
+    "  lapply(`[`, 1) %>%\n",
+    "  unlist() %>%\n",
+    "  as.vector()\n",
+    "  \n",
+    "# Precision\n",
+    "\n",
+    "pre_metrics <- \n",
+    "  lapply(cm_metrics, `[[`, \"byClass\") %>%\n",
+    "  lapply(`[`, 3) %>%\n",
+    "  unlist() %>%\n",
+    "  as.vector()\n",
+    "\n",
+    "# Predict class probability\n",
+    "\n",
+    "probs <- lapply(models[c(1, 2, 3, 4, 5, 6)],\n",
+    "                rxPredict,\n",
+    "                data=data_split[[2]]) %>%\n",
+    "                lapply('[[', 3)\n",
+    "\n",
+    "# Create prediction object\n",
+    "\n",
+    "preds <- lapply(probs, \n",
+    "                ROCR::prediction,\n",
+    "                labels=data_split[[2]][[target]])\n",
+    "\n",
+    "# AUC\n",
+    "\n",
+    "auc_metrics <- lapply(preds, \n",
+    "                      ROCR::performance,\n",
+    "                      \"auc\") %>%\n",
+    "               lapply(slot, \"y.values\") %>%\n",
+    "               lapply('[[', 1) %>%\n",
+    "               unlist()\n",
+    "\n",
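+    "# No class probabilities are extracted for the ensemble above, so its AUC is recorded as NaN.\n",
+    "\n",
+    "auc_metrics <- c(auc_metrics, NaN)\n",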
+    "\n",
+    "algo_list <- c(\"rxLogisticRegression\", \n",
+    "               \"rxFastForest\", \n",
+    "               \"rxFastTrees\", \n",
+    "               \"rxFastTrees(500)\", \n",
+    "               \"rxFastTrees(500, 0.3)\", \n",
+    "               \"rxFastTrees(500, 0.3, ub)\",\n",
+    "               \"rxEnsemble\")\n",
+    "\n",
+    "time_consumption <- c(time_rxlogit[[3]], time_rxforest[[3]], \n",
+    "                      time_rxtrees1[[3]], time_rxtrees2[[3]], \n",
+    "                      time_rxtrees3[[3]], time_rxtrees4[[3]],\n",
+    "                      time_ensemble[[3]])\n",
+    "\n",
+    "df_comp <- \n",
+    "  data.frame(Models=algo_list, \n",
+    "             Accuracy=acc_metrics, \n",
+    "             Recall=rec_metrics, \n",
+    "             Precision=pre_metrics,\n",
+    "             AUC=auc_metrics,\n",
+    "             Time=time_consumption) %T>%\n",
+    "             print()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.4 Save Models for Deployment\n",
+    "\n",
+    "Last but not least, we need to save the model objects in a suitable format (e.g., `.RData`, `SQLServerData`, etc.) for later use in deployment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "attributes": {
+     "classes": [],
+     "error": "FALSE",
+     "id": "",
+     "message": "FALSE,",
+     "warning": "FALSE,"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Save model for deployment usage.\n",
+    "\n",
+    "model_rxtrees <- model_rxtrees3\n",
+    "\n",
+    "save(model_rxtrees, file=\"model_rxtrees.RData\")"
+   ]
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
--- a/CreditRiskPrediction/Code/README.md
+++ b/CreditRiskPrediction/Code/README.md
@ -9,8 +9,15 @@ Some other critical R packages for the analysis:
 * glmnet >= 2.0-5 Logistic regression model with L1 and L2 regularization.
 * xgboost >= 0.6-4 Extreme gradiant boost model.
 * randomForest >= 4.6-12 Random Forest model.
+* caret >= 6.0-73 Classification and regression training.
 * caretEnsemble >= 2.0.0 Ensemble of caret based models.

+* RevoScaleR >= 9.1 Parallel and chunked data processing and modeling. 
+* dplyrXdf >= 0.9.2 Out-of-memory data wrangling.
+* MicrosoftML >= 9.1 Microsoft machine learning models.
+
+* mrsdeploy >= 9.1 R Server Operationalization.
+
 # Use of template

 The codes for analytics, embedded with step-by-step instructions, are written in R markdown, and can be run interactively within the code chunks of the markdown file.
--- a/CreditRiskPrediction/Code/model_rxtrees.RData
+++ b/CreditRiskPrediction/Code/model_rxtrees.RData
--- a/CreditRiskPrediction/README.md
+++ b/CreditRiskPrediction/README.md
@ -9,7 +9,7 @@ Many banks nowadays are driving innovation to enhance risk management. For examp
 The repository contains three parts

 - **Data** This contains the provided sample data. 
- **Code** This contains the R development code. They are displayed in R markdown files which can yield files of various formats. 
+- **Code** This contains the R development code. It is provided as R markdown files, which can be rendered to various formats, such as html, ipynb, etc. 
 - **Docs** This contains the documents, like blog, installation instructions, etc. 

 ## Business domain
@ -36,4 +36,13 @@ In the data-driven credit risk prediction model, normally two types of data are

 1. Traditional logistic regression model with L1 regularization are built as a baseline.
 2. Machine learning models, such as gradiant boosting and random forest, or their ensembles, are fine tuned to compare the performance at various aspects. 
-3. Innovative convolutionary hotspot method will be pursued in the near future.
+3. Innovative convolutionary hotspot method will be pursued in the near future.
+
+## Scalability
+
+**Faster and scalable credit risk models** are built using the state-of-the-art machine learning algorithms provided by the `MicrosoftML` package.
+
+## Operationalization
+
+An **R model based web service for credit risk prediction** is published and consumed by using the `mrsdeploy` package that ships with Microsoft R Client and R Server 9.1.
+