Merge branch 'master' of github.com:Microsoft/acceleratoRs

This commit is contained in:
yueguoguo 2017-05-19 22:23:11 +09:00
Parent 79bc4a23c3 162d9bcd79
Commit 91ec3bea03
9 changed files: 2689 additions and 2 deletions


@ -0,0 +1,297 @@
---
title: "Deploy a Credit Risk Model as a Web Service"
author: "Fang Zhou, Data Scientist, Microsoft"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE, purl=FALSE}
knitr::opts_chunk$set(echo = TRUE,
fig.width = 8,
fig.height = 5,
fig.align='center',
dev = "png")
```
## 1 Introduction
The `mrsdeploy` package, delivered with Microsoft R Client and R Server, provides functions for:
**1** Establishing a remote session in an R console application for the purpose of executing code on the remote server
**2** Publishing and managing an R web service that is backed by the R code block or script you provided.
Each feature can be used independently, but the greatest value is achieved when you can leverage both.
This document walks you through how to deploy a credit risk model as a web service using the `mrsdeploy` package.
It starts by building a model locally, then publishes the model as a web service, shares the service with other authenticated users for consumption, and finally manages and updates the web service.
## 2 Automated Credit Risk Model Deployment
### 2.1 Setup
We load the required R packages.
```{r, message=FALSE, warning=FALSE, error=FALSE}
## Setup
# Load the required packages into the R session.
library(rattle) # Use normVarNames().
library(dplyr) # Wrangling: tbl_df(), group_by(), print(), glimpse().
library(magrittr) # Pipe operator %>% %<>% %T>% equals().
library(scales) # Include commas in numbers.
library(MicrosoftML) # Build models using Microsoft ML algorithms.
library(mrsdeploy) # Publish an R model as a web service.
```
Then, the dataset processedSimu is ingested for demonstration. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
```{r, message=FALSE, warning=FALSE, error=FALSE}
## Data Ingestion
# Identify the source location of the dataset.
#DATA <- "../../Data/"
#txn_fname <- file.path(DATA, "Raw/processedSimu.csv")
wd <- getwd()
dpath <- "../Data"
data_fname <- file.path(wd, dpath, "processedSimu.csv")
# Ingest the dataset.
data <- read.csv(file=data_fname) %T>%
{dim(.) %>% comma() %>% cat("\n")}
# A glimpse into the data.
glimpse(data)
```
### 2.2 Model Locally
Now, let's build a web service based on an R model.
First of all, we create a machine learning fast tree model on the dataset processedSimu by using the function `rxFastTrees()` from the `MicrosoftML` package. This model could be used to predict whether an account will default or to predict its probability of default, given some transaction statistics and demographic & bank account information as inputs.
```{r, message=FALSE, warning=FALSE, error=FALSE}
## Variable roles.
# Target variable
target <- "bad_flag"
# Note any identifier.
id <- c("account_id") %T>% print()
# Note the available variables as model inputs.
vars <- setdiff(names(data), c(target, id))
```
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Split Data
set.seed(42)
data <- data[order(runif(nrow(data))), ]
train <- sample(nrow(data), 0.70 * nrow(data))
test <- setdiff(seq_len(nrow(data)), train)
```
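The split above can be sanity-checked with a few lines of base R. The sketch below is only an illustration: `toy` is a hypothetical synthetic stand-in for `processedSimu`, and the 70/30 proportion is copied from the chunk above. It verifies that the training and test index sets are disjoint and together cover every row.

```r
# Synthetic stand-in for the real dataset (assumption: 100 rows).
set.seed(42)
toy <- data.frame(account_id = sprintf("a_%03d", 1:100),
                  bad_flag   = sample(c("no", "yes"), 100, replace = TRUE))

# Draw 70% of the row indices for training; the rest go to testing.
n         <- nrow(toy)
train_idx <- sample(n, floor(0.70 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# The two sets should not overlap and should cover all rows.
stopifnot(length(intersect(train_idx, test_idx)) == 0)
stopifnot(length(train_idx) + length(test_idx) == n)
```

Because `sample()` already draws indices at random, a check like this is usually enough; an extra shuffle of the rows beforehand does not change the statistical properties of the split.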
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Prepare the formula
top_vars <- c("amount_6", "pur_6", "avg_pur_amt_6", "avg_interval_pur_6", "credit_limit", "age", "income", "sex", "education", "marital_status")
form <- as.formula(paste(target, paste(top_vars, collapse="+"), sep="~"))
form
```
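As an aside, base R's `reformulate()` builds the same kind of formula without pasting strings together. The sketch below uses a shortened, illustrative subset of the variable names above and compares the two approaches; it is an alternative, not what the accelerator itself does.

```r
target   <- "bad_flag"
top_vars <- c("amount_6", "pur_6", "credit_limit", "age")

# Approach used in this document: paste the pieces and parse them.
form_paste <- as.formula(paste(target, paste(top_vars, collapse = "+"), sep = "~"))

# Alternative: let reformulate() assemble the terms.
form_ref <- reformulate(top_vars, response = target)

# Both formulas should refer to the same variables, in the same order.
all.vars(form_paste)
all.vars(form_ref)
```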
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Train model: rxFastTrees
model_rxtrees <- rxFastTrees(formula=form,
data=data[train, c(target, vars)],
type="binary",
numTrees=100,
numLeaves=20,
learningRate=0.2,
minSplit=10,
unbalancedSets=FALSE,
verbose=0)
model_rxtrees
```
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Produce a prediction function that can use the model
creditRiskPrediction <- function(account_id, amount_6, pur_6, avg_pur_amt_6, avg_interval_pur_6,
credit_limit, marital_status, sex, education, income, age)
{
newdata <- data.frame(account_id=account_id,
amount_6=amount_6,
pur_6=pur_6,
avg_pur_amt_6=avg_pur_amt_6,
avg_interval_pur_6=avg_interval_pur_6,
credit_limit=credit_limit,
marital_status=marital_status,
sex=sex,
education=education,
income=income,
age=age)
pred <- rxPredict(modelObject=model_rxtrees, data=newdata)[, c(1, 3)]
pred <- cbind(newdata$account_id, pred)
names(pred) <- c("account_id", "scored_label", "scored_prob")
pred
}
# Test function locally by printing results
pred <- creditRiskPrediction(account_id="a_1055521029582310",
amount_6=173.22,
pur_6=1,
avg_pur_amt_6=173.22,
avg_interval_pur_6=0,
credit_limit=5.26,
marital_status="married",
sex="male",
education="undergraduate",
income=12.36,
age=38)
print(pred)
```
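The wrapper pattern above (scalar inputs in, one-row data frame out) can be tried without Microsoft R. The following is a minimal sketch under stated assumptions: it swaps in base R's `glm()` logistic regression for `rxFastTrees()`, uses synthetic data, and keeps only two inputs. The names `toy`, `toy_model`, and `toyRiskPrediction` are hypothetical.

```r
# Synthetic training data (assumption: two numeric inputs only).
set.seed(42)
n   <- 200
toy <- data.frame(credit_limit = runif(n, 1, 10),
                  age          = sample(20:70, n, replace = TRUE))
toy$bad_flag <- factor(ifelse(toy$credit_limit + rnorm(n) < 5, "yes", "no"),
                       levels = c("no", "yes"))

# Stand-in model: logistic regression from base R, not rxFastTrees().
toy_model <- glm(bad_flag ~ credit_limit + age, data = toy, family = binomial())

# Wrapper with scalar inputs and a one-row data frame as output,
# mirroring the column names used by creditRiskPrediction().
toyRiskPrediction <- function(account_id, credit_limit, age) {
  newdata <- data.frame(credit_limit = credit_limit, age = age)
  prob    <- unname(predict(toy_model, newdata = newdata, type = "response"))
  data.frame(account_id   = account_id,
             scored_label = ifelse(prob > 0.5, "yes", "no"),
             scored_prob  = prob,
             stringsAsFactors = FALSE)
}

pred_toy <- toyRiskPrediction(account_id = "a_001", credit_limit = 5.26, age = 38)
print(pred_toy)
```

Keeping the function signature flat (one argument per field) is what later allows each argument to be declared with a simple type when the function is published.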
### 2.2 Publish model as a web service
The next step is to publish the model as a web service, as follows.
Step 1: From your local R IDE, log into Microsoft R Server with your credentials using the appropriate authentication function from the `mrsdeploy` package (`remoteLogin()` or `remoteLoginAAD()`).
For simplicity, the code below authenticates with the basic local admin account through `remoteLogin()` and sets `session = FALSE` so that no remote R session is started.
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Use `remoteLogin` to authenticate with R Server using
# the local admin account. Use session = FALSE so that no
# remote R session is started.
remoteLogin("http://localhost:12800",
username="admin",
password="P@ssw0rd",
session=FALSE)
```
Now, you are successfully connected to the remote R Server.
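Hard-coding a password is convenient for a demo but is usually avoided elsewhere. One possible variation, sketched below, reads the connection details from environment variables. The variable names `MRS_ENDPOINT`, `MRS_USERNAME`, and `MRS_PASSWORD` are assumptions made for this example, not something `mrsdeploy` defines, and the login is attempted only when the package is installed and a password is set.

```r
# Hypothetical environment variable names; set them outside the script.
creds <- list(
  endpoint = Sys.getenv("MRS_ENDPOINT", unset = "http://localhost:12800"),
  username = Sys.getenv("MRS_USERNAME", unset = "admin"),
  password = Sys.getenv("MRS_PASSWORD", unset = "")
)

# Only attempt the login when mrsdeploy is installed and a password is set.
if (requireNamespace("mrsdeploy", quietly = TRUE) && nzchar(creds$password)) {
  mrsdeploy::remoteLogin(creds$endpoint,
                         username = creds$username,
                         password = creds$password,
                         session  = FALSE)
} else {
  message("Skipping remoteLogin(): mrsdeploy or MRS_PASSWORD not available.")
}
```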
Step 2: Publish the model as a web service to R Server using the `publishService()` function from the `mrsdeploy` package.
In this example, you publish a web service called "crpService" using the model `model_rxtrees` and the function `creditRiskPrediction()`. As input, the service takes a list of transaction statistics and demographic and bank account information, represented as numerical or categorical values. As output, the service returns an R data frame containing the account id, the predicted default label, and the probability of default for the given account, as computed by the pre-defined credit risk prediction function.
When publishing, you must specify, among other parameters, a service name and version, the R code, the inputs, as well as the outputs that application developers will need to integrate in their applications.
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Publish a web service
api <- publishService(
"crpService",
code=creditRiskPrediction,
model=model_rxtrees,
inputs=list(account_id="character",
amount_6="numeric",
pur_6="numeric",
avg_pur_amt_6="numeric",
avg_interval_pur_6="numeric",
credit_limit="numeric",
marital_status="character",
sex="character",
education="character",
income="numeric",
age="numeric"),
outputs=list(pred="data.frame"),
v="v1.0.0")
```
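Since the service declares a type for every input, it can help to check a set of arguments against that declaration before calling the service. The helper below is a hypothetical illustration in base R (`check_inputs` is not part of `mrsdeploy`), and the schema is a shortened copy of the `inputs` list above.

```r
# Declared input types, copied from the `inputs` list above (subset).
input_schema <- list(account_id   = "character",
                     amount_6     = "numeric",
                     credit_limit = "numeric",
                     sex          = "character")

# Hypothetical helper: returns the names of arguments that are missing
# or whose type does not match the schema.
check_inputs <- function(args, schema) {
  matches <- vapply(names(schema), function(nm) {
    value <- args[[nm]]
    switch(schema[[nm]],
           character = is.character(value),
           numeric   = is.numeric(value),
           FALSE)
  }, logical(1))
  names(schema)[!matches]
}

good_args <- list(account_id = "a_1055521029582310", amount_6 = 173.22,
                  credit_limit = 5.26, sex = "male")
bad_args  <- list(account_id = "a_1055521029582310", amount_6 = "173.22",
                  credit_limit = 5.26)

check_inputs(good_args, input_schema)  # no mismatches expected
check_inputs(bad_args, input_schema)   # amount_6 has the wrong type, sex is missing
```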
### 2.3 Test the service by consuming it in R
After publishing the service, we can consume it directly in R to verify that the results are as expected.
```{r, message=FALSE, warning=FALSE, error=TRUE}
# Get service and assign service to the variable `api`.
api <- getService("crpService", "v1.0.0")
# Consume the service by calling the `creditRiskPrediction` function it contains.
result <- api$creditRiskPrediction(account_id="a_1055521029582310",
amount_6=173.22,
pur_6=1,
avg_pur_amt_6=173.22,
avg_interval_pur_6=0,
credit_limit=5.26,
marital_status="married",
sex="male",
education="undergraduate",
income=12.36,
age=38)
# Print the response output named `pred`
print(result$output("pred"))
```
### 2.4 Update the web service
In production, the web service can be managed and updated as needed, for example when a better model becomes available.
```{r, message=FALSE, warning=FALSE, error=TRUE}
# Load the pre-trained optimal model obtained from the template of CreditRiskScale.
load(file="model_rxtrees.RData")
model_rxtrees
api <- updateService(name="crpService",
v="v1.0.0",
model=model_rxtrees,
descr="Update the model hyper-parameters")
# Re-test the updated service by consuming it
result <- api$creditRiskPrediction(account_id="a_1055521029582310",
amount_6=173.22,
pur_6=1,
avg_pur_amt_6=173.22,
avg_interval_pur_6=0,
credit_limit=5.26,
marital_status="married",
sex="male",
education="undergraduate",
income=12.36,
age=38)
# Print the response output named `pred`
print(result$output("pred"))
```
### 2.5 Application Integration
Last but not least, we can get the Swagger JSON file that is needed for application integration.
```{r, message=FALSE, warning=FALSE, error=TRUE}
# Get this service's `swagger.json` file that is needed for web application integration
swagger <- api$swagger(json = FALSE)
# Delete the service to make the script re-runnable
deleteService(name="crpService", v="v1.0.0")
```

File diff suppressed because one or more lines are too long


@ -0,0 +1,518 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"title: \"Deploy a Credit Risk Model as a Web Service\"\n",
"author: \"Fang Zhou, Data Scientist, Microsoft\"\n",
"date: \"`r Sys.Date()`\"\n",
"output: html_document\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"include": "FALSE,",
"purl": "FALSE"
}
},
"outputs": [],
"source": [
"knitr::opts_chunk$set(echo = TRUE,\n",
" fig.width = 8,\n",
" fig.height = 5,\n",
" fig.align='center',\n",
" dev = \"png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1 Introduction\n",
"\n",
"The `mrsdeploy` package, delivered with Microsoft R Client and R Server, provides functions for:\n",
"\n",
"**1** Establishing a remote session in an R console application for the purpose of executing code on the remote server\n",
"\n",
"**2** Publishing and managing an R web service that is backed by the R code block or script you provided. \n",
"\n",
"Each feature can be used independently, but the greatest value is achieved when you can leverage both.\n",
"\n",
"This document walks you through how to deploy a credit risk model as a web service using the `mrsdeploy` package.\n",
"\n",
"It starts by building a model locally, then publishes the model as a web service, shares the service with other authenticated users for consumption, and finally manages and updates the web service.\n",
"\n",
"## 2 Automated Credit Risk Model Deployment\n",
"\n",
"### 2.1 Setup\n",
"\n",
"We load the required R packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"## Setup\n",
"\n",
"# Load the required packages into the R session.\n",
"\n",
"library(rattle) # Use normVarNames().\n",
"library(dplyr) # Wrangling: tbl_df(), group_by(), print(), glimpse().\n",
"library(magrittr) # Pipe operator %>% %<>% %T>% equals().\n",
"library(scales) # Include commas in numbers.\n",
"library(MicrosoftML) # Build models using Microsoft ML algorithms.\n",
"library(mrsdeploy) # Publish an R model as a web service."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, the dataset processedSimu is ingested for demonstration. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"## Data Ingestion\n",
"\n",
"# Identify the source location of the dataset.\n",
"\n",
"#DATA <- \"../../Data/\"\n",
"#txn_fname <- file.path(DATA, \"Raw/processedSimu.csv\")\n",
"\n",
"wd <- getwd()\n",
"\n",
"dpath <- \"../Data\"\n",
"data_fname <- file.path(wd, dpath, \"processedSimu.csv\")\n",
"\n",
"# Ingest the dataset.\n",
"\n",
"data <- read.csv(file=data_fname) %T>% \n",
" {dim(.) %>% comma() %>% cat(\"\\n\")}\n",
"\n",
"# A glimpse into the data.\n",
"\n",
"glimpse(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Model Locally\n",
"\n",
"Now, let's build a web service based on an R model.\n",
"\n",
"First of all, we create a machine learning fast tree model on the dataset processedSimu by using the function `rxFastTrees()` from the `MicrosoftML` package. This model could be used to predict whether an account will default or to predict its probability of default, given some transaction statistics and demographic & bank account information as inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"## Variable roles.\n",
"\n",
"# Target variable\n",
"\n",
"target <- \"bad_flag\"\n",
"\n",
"# Note any identifier.\n",
"\n",
"id <- c(\"account_id\") %T>% print() \n",
"\n",
"# Note the available variables as model inputs.\n",
"\n",
"vars <- setdiff(names(data), c(target, id))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Split Data\n",
"\n",
"set.seed(42)\n",
"\n",
"data <- data[order(runif(nrow(data))), ]\n",
"\n",
"train <- sample(nrow(data), 0.70 * nrow(data))\n",
"test <- setdiff(seq_len(nrow(data)), train)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Prepare the formula\n",
"\n",
"top_vars <- c(\"amount_6\", \"pur_6\", \"avg_pur_amt_6\", \"avg_interval_pur_6\", \"credit_limit\", \"age\", \"income\", \"sex\", \"education\", \"marital_status\")\n",
"\n",
"form <- as.formula(paste(target, paste(top_vars, collapse=\"+\"), sep=\"~\"))\n",
"form"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Train model: rxFastTrees\n",
"\n",
"model_rxtrees <- rxFastTrees(formula=form,\n",
" data=data[train, c(target, vars)],\n",
" type=\"binary\",\n",
" numTrees=100,\n",
" numLeaves=20,\n",
" learningRate=0.2,\n",
" minSplit=10,\n",
" unbalancedSets=FALSE,\n",
" verbose=0)\n",
"\n",
"model_rxtrees"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Produce a prediction function that can use the model\n",
"\n",
"creditRiskPrediction <- function(account_id, amount_6, pur_6, avg_pur_amt_6, avg_interval_pur_6, \n",
" credit_limit, marital_status, sex, education, income, age)\n",
"{ \n",
" newdata <- data.frame(account_id=account_id,\n",
" amount_6=amount_6, \n",
" pur_6=pur_6, \n",
" avg_pur_amt_6=avg_pur_amt_6, \n",
" avg_interval_pur_6=avg_interval_pur_6, \n",
" credit_limit=credit_limit, \n",
" marital_status=marital_status, \n",
" sex=sex, \n",
" education=education, \n",
" income=income, \n",
" age=age)\n",
" \n",
" pred <- rxPredict(modelObject=model_rxtrees, data=newdata)[, c(1, 3)]\n",
" pred <- cbind(newdata$account_id, pred)\n",
" names(pred) <- c(\"account_id\", \"scored_label\", \"scored_prob\")\n",
" pred \n",
"}\n",
"\n",
"# Test function locally by printing results\n",
"\n",
"pred <- creditRiskPrediction(account_id=\"a_1055521029582310\",\n",
" amount_6=173.22, \n",
" pur_6=1, \n",
" avg_pur_amt_6=173.22, \n",
" avg_interval_pur_6=0, \n",
" credit_limit=5.26, \n",
" marital_status=\"married\", \n",
" sex=\"male\", \n",
" education=\"undergraduate\", \n",
" income=12.36, \n",
" age=38)\n",
"\n",
"print(pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Publish model as a web service\n",
"\n",
"The next step is to publish the model as a web service, as follows.\n",
"\n",
"Step 1: From your local R IDE, log into Microsoft R Server with your credentials using the appropriate authentication function from the `mrsdeploy` package (`remoteLogin()` or `remoteLoginAAD()`).\n",
"\n",
"For simplicity, the code below authenticates with the basic local admin account through `remoteLogin()` and sets `session = FALSE` so that no remote R session is started."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Use `remoteLogin` to authenticate with R Server using\n",
"# the local admin account. Use session = FALSE so that no\n",
"# remote R session is started.\n",
"\n",
"remoteLogin(\"http://localhost:12800\", \n",
" username=\"admin\", \n",
" password=\"P@ssw0rd\",\n",
" session=FALSE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, you are successfully connected to the remote R Server.\n",
"\n",
"Step 2: Publish the model as a web service to R Server using the `publishService()` function from the `mrsdeploy` package. \n",
"\n",
"In this example, you publish a web service called \"crpService\" using the model `model_rxtrees` and the function `creditRiskPrediction()`. As input, the service takes a list of transaction statistics and demographic and bank account information, represented as numerical or categorical values. As output, the service returns an R data frame containing the account id, the predicted default label, and the probability of default for the given account, as computed by the pre-defined credit risk prediction function.\n",
"\n",
"When publishing, you must specify, among other parameters, a service name and version, the R code, the inputs, as well as the outputs that application developers will need to integrate in their applications."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Publish a web service\n",
"\n",
"api <- publishService(\n",
" \"crpService\",\n",
" code=creditRiskPrediction,\n",
" model=model_rxtrees,\n",
" inputs=list(account_id=\"character\",\n",
" amount_6=\"numeric\", \n",
" pur_6=\"numeric\", \n",
" avg_pur_amt_6=\"numeric\", \n",
" avg_interval_pur_6=\"numeric\", \n",
" credit_limit=\"numeric\", \n",
" marital_status=\"character\", \n",
" sex=\"character\", \n",
" education=\"character\", \n",
" income=\"numeric\", \n",
" age=\"numeric\"),\n",
" outputs=list(pred=\"data.frame\"),\n",
" v=\"v1.0.0\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Test the service by consuming it in R\n",
"\n",
"After publishing the service, we can consume it directly in R to verify that the results are as expected."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "TRUE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Get service and assign service to the variable `api`.\n",
"\n",
"api <- getService(\"crpService\", \"v1.0.0\")\n",
"\n",
"# Consume the service by calling the `creditRiskPrediction` function it contains.\n",
"\n",
"result <- api$creditRiskPrediction(account_id=\"a_1055521029582310\",\n",
" amount_6=173.22, \n",
" pur_6=1, \n",
" avg_pur_amt_6=173.22, \n",
" avg_interval_pur_6=0, \n",
" credit_limit=5.26, \n",
" marital_status=\"married\", \n",
" sex=\"male\", \n",
" education=\"undergraduate\", \n",
" income=12.36, \n",
" age=38)\n",
"\n",
"# Print the response output named `pred`\n",
"\n",
"print(result$output(\"pred\")) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 Update the web service\n",
"\n",
"In production, the web service can be managed and updated as needed, for example when a better model becomes available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "TRUE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Load the pre-trained optimal model obtained from the template of CreditRiskScale.\n",
"\n",
"load(file=\"model_rxtrees.RData\")\n",
"\n",
"model_rxtrees\n",
"\n",
"api <- updateService(name=\"crpService\", \n",
" v=\"v1.0.0\",\n",
" model=model_rxtrees,\n",
" descr=\"Update the model hyper-parameters\")\n",
"\n",
"# Re-test the updated service by consuming it\n",
"\n",
"result <- api$creditRiskPrediction(account_id=\"a_1055521029582310\",\n",
" amount_6=173.22, \n",
" pur_6=1, \n",
" avg_pur_amt_6=173.22, \n",
" avg_interval_pur_6=0, \n",
" credit_limit=5.26, \n",
" marital_status=\"married\", \n",
" sex=\"male\", \n",
" education=\"undergraduate\", \n",
" income=12.36, \n",
" age=38)\n",
"\n",
"# Print the response output named `pred`\n",
"\n",
"print(result$output(\"pred\")) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.5 Application Integration\n",
"\n",
"Last but not least, we can get the Swagger JSON file that is needed for application integration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "TRUE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"# Get this service's `swagger.json` file that is needed for web application integration\n",
"\n",
"swagger <- api$swagger(json = FALSE)\n",
"\n",
"# Delete the service to make the script re-runnable\n",
"\n",
"deleteService(name=\"crpService\", v=\"v1.0.0\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 1
}


@ -0,0 +1,354 @@
---
title: "Faster and Scalable Credit Risk Prediction"
author: "Fang Zhou, Data Scientist, Microsoft"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE, purl=FALSE}
knitr::opts_chunk$set(echo = TRUE,
fig.width = 8,
fig.height = 5,
fig.align='center',
dev = "png")
```
## 1 Introduction
Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R.
The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.
This document walks you through how to build faster and more scalable credit risk models using the `MicrosoftML` package, which adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.
## 2 Faster and Scalable Credit Risk Models
### 2.1 Setup
We load the required R packages.
```{r, message=FALSE, warning=FALSE, error=FALSE}
## Setup
# Load the required packages into the R session.
library(rattle) # Use normVarNames().
library(dplyr) # Wrangling: tbl_df(), group_by(), print(), glimpse().
library(magrittr) # Pipe operator %>% %<>% %T>% equals().
library(scales) # Include commas in numbers.
library(RevoScaleR) # Enable out-of-memory computation in R.
library(dplyrXdf) # Wrangling on xdf data format.
library(MicrosoftML) # Build models using Microsoft ML algorithms.
library(caret) # Calculate confusion matrix by using confusionMatrix().
library(ROCR) # Provide functions for model performance evaluation.
```
Then, the dataset processedSimu is ingested and transformed into a `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
```{r, message=FALSE, warning=FALSE, error=FALSE}
## Data Ingestion
# Identify the source location of the dataset.
#DATA <- "../../Data/"
#data_fname <- file.path(DATA, "Raw/processedSimu.csv")
wd <- getwd()
dpath <- "../Data"
data_fname <- file.path(wd, dpath, "processedSimu.csv")
output_fname <- file.path(wd, dpath, "processedSimu.xdf")
output <- RxXdfData(file=output_fname)
# Ingest the dataset.
data <- rxImport(inData=data_fname,
outFile=output,
stringsAsFactors=TRUE,
overwrite=TRUE)
# View data information.
rxGetVarInfo(data)
```
### 2.2 Model Building
Now, let's build credit risk models using different machine learning algorithms from the `MicrosoftML` package.
First of all, we create individual machine learning models on the dataset processedSimu.xdf by using the functions `rxLogisticRegression()`, `rxFastForest()`, and `rxFastTrees()`.
From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example, considering the overall performance. Therefore, several models are trained with `rxFastTrees()`, each with a different set of parameters.
```{r, message=FALSE, warning=FALSE, error=FALSE}
## Variable roles.
# Target variable
target <- "bad_flag"
# Note any identifier.
id <- c("account_id") %T>% print()
# Note the available variables as model inputs.
vars <- setdiff(names(data), c(target, id))
```
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Split Data
set.seed(42)
# Add training/testing flag to each observation.
data %<>%
mutate(.train=factor(sample(1:2, .rxNumRows,
replace=TRUE,
prob=c(0.70, 0.30)),
levels=1:2))
# Split dataset into training/test.
data_split <- rxSplit(data, splitByFactor=".train")
```
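The flag-then-split approach above has a simple in-memory analogue in base R, sketched below on a hypothetical data frame `toy`. Note one consequence of drawing the flag with `sample(..., prob = ...)`: each row is assigned independently, so the resulting proportions are only approximately 70/30 rather than exact.

```r
set.seed(42)
toy <- data.frame(id = 1:1000)

# Flag each row as 1 (training) or 2 (testing) with probabilities 0.7 / 0.3.
toy$.train <- factor(sample(1:2, nrow(toy), replace = TRUE, prob = c(0.70, 0.30)),
                     levels = 1:2)

# split() returns one data frame per factor level, in level order.
parts <- split(toy, toy$.train)
sapply(parts, nrow)
```

As with `data_split[[1]]` and `data_split[[2]]` in this document, the first element holds the training rows and the second the test rows.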
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Prepare the formula
top_vars <- c("amount_6", "pur_6", "avg_pur_amt_6", "avg_interval_pur_6", "credit_limit", "age", "income", "sex", "education", "marital_status")
form <- as.formula(paste(target, paste(top_vars, collapse="+"), sep="~"))
form
```
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Specify the local parallel compute context.
rxSetComputeContext("localpar")
# Train model: rxLogisticRegression
time_rxlogit <- system.time(
model_rxlogit <- rxLogisticRegression(
formula=form,
data=data_split[[1]],
type="binary",
l1Weight=1,
verbose=0
)
)
# Train model: rxFastForest
time_rxforest <- system.time(
model_rxforest <- rxFastForest(
formula=form,
data=data_split[[1]],
type="binary",
numTrees=100,
numLeaves=20,
minSplit=10,
verbose=0
)
)
# Train model: rxFastTrees
time_rxtrees1 <- system.time(
model_rxtrees1 <- rxFastTrees(
formula=form,
data=data_split[[1]],
type="binary",
numTrees=100,
numLeaves=20,
learningRate=0.2,
minSplit=10,
unbalancedSets=FALSE,
verbose=0
)
)
time_rxtrees2 <- system.time(
model_rxtrees2 <- rxFastTrees(
formula=form,
data=data_split[[1]],
type="binary",
numTrees=500,
numLeaves=20,
learningRate=0.2,
minSplit=10,
unbalancedSets=FALSE,
verbose=0
)
)
time_rxtrees3 <- system.time(
model_rxtrees3 <- rxFastTrees(
formula=form,
data=data_split[[1]],
type="binary",
numTrees=500,
numLeaves=20,
learningRate=0.3,
minSplit=10,
unbalancedSets=FALSE,
verbose=0
)
)
time_rxtrees4 <- system.time(
model_rxtrees4 <- rxFastTrees(
formula=form,
data=data_split[[1]],
type="binary",
numTrees=500,
numLeaves=20,
learningRate=0.3,
minSplit=10,
unbalancedSets=TRUE,
verbose=0
)
)
```
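The repeated `system.time()` calls above can also be expressed as a loop over a list of fitting functions, which keeps the timings together in one named vector. The sketch below is an assumption-laden illustration: it uses base R `glm()` models on synthetic data instead of the `MicrosoftML` learners, and the names in `fitters` are hypothetical.

```r
set.seed(42)
n   <- 5000
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(0.5 * toy$x1 - toy$x2))

# Hypothetical candidate models, each wrapped in a function so that
# they can all be timed in the same way.
fitters <- list(
  logit_x1    = function() glm(y ~ x1,      data = toy, family = binomial()),
  logit_x1_x2 = function() glm(y ~ x1 + x2, data = toy, family = binomial())
)

# system.time() returns user, system and elapsed time; keep elapsed only.
elapsed <- vapply(fitters,
                  function(fit) system.time(fit())[["elapsed"]],
                  numeric(1))
print(elapsed)
```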
Next, we build an ensemble of fast tree models by using the function `rxEnsemble()`.
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Train an ensemble model.
time_ensemble <- system.time(
model_ensemble <- rxEnsemble(
formula=form,
data=data_split[[1]],
type="binary",
trainers=list(fastTrees(),
fastTrees(numTrees=500),
fastTrees(numTrees=500, learningRate=0.3),
fastTrees(numTrees=500, learningRate=0.3, unbalancedSets=TRUE)),
combineMethod="vote",
replace=TRUE,
verbose=0
)
)
```
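To make the idea behind a voting ensemble concrete, the sketch below combines hypothetical class predictions from three models by majority vote in base R. It illustrates the general technique only and makes no claim about how `rxEnsemble()` implements `combineMethod="vote"` internally.

```r
# Hypothetical class predictions from three models for five accounts.
votes <- data.frame(
  model_a = c("yes", "no",  "no", "yes", "no"),
  model_b = c("yes", "yes", "no", "no",  "no"),
  model_c = c("no",  "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# For each row, pick the label predicted by most models.
# With an odd number of models and two classes there are no ties.
majority_vote <- apply(votes, 1, function(row) {
  counts <- table(row)
  names(counts)[which.max(counts)]
})
majority_vote
```

A vote of this kind yields only a class label, not a probability.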
### 2.3 Model Evaluation
Finally, we evaluate and compare the models built above on several metrics: accuracy, recall, precision, AUC, and training time.
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Predict
models <- list(model_rxlogit, model_rxforest,
model_rxtrees1, model_rxtrees2, model_rxtrees3, model_rxtrees4,
model_ensemble)
# Predict class
predictions <- lapply(models,
rxPredict,
data=data_split[[2]]) %>%
lapply('[[', 1)
levels(predictions[[7]]) <- c("no", "yes")
# Confusion matrix evaluation results.
cm_metrics <-lapply(predictions,
confusionMatrix,
reference=data_split[[2]][[target]],
positive="yes")
# Accuracy
acc_metrics <-
lapply(cm_metrics, `[[`, "overall") %>%
lapply(`[`, 1) %>%
unlist() %>%
as.vector()
# Recall
rec_metrics <-
lapply(cm_metrics, `[[`, "byClass") %>%
lapply(`[`, 1) %>%
unlist() %>%
as.vector()
# Precision
pre_metrics <-
lapply(cm_metrics, `[[`, "byClass") %>%
lapply(`[`, 3) %>%
unlist() %>%
as.vector()
# Predict class probability
probs <- lapply(models[c(1, 2, 3, 4, 5, 6)],
rxPredict,
data=data_split[[2]]) %>%
lapply('[[', 3)
# Create prediction object
preds <- lapply(probs,
ROCR::prediction,
labels=data_split[[2]][[target]])
# Auc
auc_metrics <- lapply(preds,
ROCR::performance,
"auc") %>%
lapply(slot, "y.values") %>%
lapply('[[', 1) %>%
unlist()
auc_metrics <- c(auc_metrics, NaN)
algo_list <- c("rxLogisticRegression",
"rxFastForest",
"rxFastTrees",
"rxFastTrees(500)",
"rxFastTrees(500, 0.3)",
"rxFastTrees(500, 0.3, ub)",
"rxEnsemble")
time_consumption <- c(time_rxlogit[[3]], time_rxforest[[3]],
time_rxtrees1[[3]], time_rxtrees2[[3]],
time_rxtrees3[[3]], time_rxtrees4[[3]],
time_ensemble[[3]])
df_comp <-
data.frame(Models=algo_list,
Accuracy=acc_metrics,
Recall=rec_metrics,
Precision=pre_metrics,
AUC=auc_metrics,
Time=time_consumption) %T>%
print()
```
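The metrics in the comparison table can be reproduced by hand on a small example, which helps when reading the `caret` and `ROCR` output. The sketch below uses hypothetical labels and probabilities, a 0.5 decision threshold (an assumption), and the rank-sum (Mann-Whitney) formulation of AUC instead of `ROCR`.

```r
# Hypothetical true labels and predicted probabilities of "yes".
truth <- factor(c("yes", "no", "yes", "no", "no", "yes", "no", "no"),
                levels = c("no", "yes"))
prob  <- c(0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5)
pred  <- factor(ifelse(prob > 0.5, "yes", "no"), levels = c("no", "yes"))

# Confusion matrix with "yes" as the positive class.
cm <- table(Predicted = pred, Actual = truth)
tp <- cm["yes", "yes"]; fp <- cm["yes", "no"]
fn <- cm["no", "yes"];  tn <- cm["no", "no"]

accuracy  <- (tp + tn) / sum(cm)
recall    <- tp / (tp + fn)   # reported by caret as Sensitivity
precision <- tp / (tp + fp)   # reported by caret as Pos Pred Value

# AUC as the share of (positive, negative) pairs ranked correctly.
pos <- truth == "yes"
auc <- (sum(rank(prob)[pos]) - sum(pos) * (sum(pos) + 1) / 2) /
  (sum(pos) * sum(!pos))

round(c(accuracy = accuracy, recall = recall, precision = precision, auc = auc), 3)
```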
### 2.4 Save Models for Deployment
Last but not least, we need to save the model objects in a suitable format (e.g., `.RData`, `SQLServerData`, etc.) for later use in deployment.
```{r, message=FALSE, warning=FALSE, error=FALSE}
# Save model for deployment usage.
model_rxtrees <- model_rxtrees3
save(model_rxtrees, file="model_rxtrees.RData")
```
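A quick round trip confirms that a saved model can be restored under the same name. The sketch below uses a small `lm()` fit on the built-in `cars` dataset as a hypothetical stand-in for the trained model, writes it to a temporary file, and loads it into a separate environment so that nothing in the workspace is overwritten.

```r
# Hypothetical stand-in for the trained model object.
model_rxtrees_demo <- lm(dist ~ speed, data = cars)

# Save to a temporary file, then load into a fresh environment so the
# restored object can be compared with the original.
model_file <- tempfile(fileext = ".RData")
save(model_rxtrees_demo, file = model_file)

restored     <- new.env()
loaded_names <- load(model_file, envir = restored)

loaded_names
all.equal(coef(restored$model_rxtrees_demo), coef(model_rxtrees_demo))
```

Because `load()` restores objects under the names they were saved with, a plain `load()` into the global environment replaces any existing object of the same name.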

File diff suppressed because one or more lines are too long


@ -0,0 +1,525 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"title: \"Faster and Scalable Credit Risk Prediction\"\n",
"author: \"Fang Zhou, Data Scientist, Microsoft\"\n",
"date: \"`r Sys.Date()`\"\n",
"output: html_document\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"id": "",
"include": "FALSE,",
"purl": "FALSE"
}
},
"outputs": [],
"source": [
"knitr::opts_chunk$set(echo = TRUE,\n",
" fig.width = 8,\n",
" fig.height = 5,\n",
" fig.align='center',\n",
" dev = \"png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1 Introduction\n",
"\n",
"Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R. \n",
"\n",
"The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.\n",
"\n",
"This document walks you through how to build faster and more scalable credit risk models using the `MicrosoftML` package, which adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.\n",
"\n",
"## 2 Faster and Scalable Credit Risk Models\n",
"\n",
"### 2.1 Setup\n",
"\n",
"We load the required R packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE,",
"warning": "FALSE,"
}
},
"outputs": [],
"source": [
"## Setup\n",
"\n",
"# Load the required packages into the R session.\n",
"\n",
"library(rattle) # Use normVarNames().\n",
"library(dplyr) # Wrangling: tbl_df(), group_by(), print(), glimpse().\n",
"library(magrittr) # Pipe operator %>% %<>% %T>% equals().\n",
"library(scales) # Include commas in numbers.\n",
"library(RevoScaleR) # Enable out-of-memory computation in R.\n",
"library(dplyrXdf) # Wrangling on xdf data format.\n",
"library(MicrosoftML) # Build models using Microsoft ML algorithms.\n",
"library(caret) # Calculate confusion matrix by using confusionMatrix().\n",
"library(ROCR) # Provide functions for model performance evaluation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, the dataset `processedSimu` is ingested and converted to the `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"## Data Ingestion\n",
"\n",
"# Identify the source location of the dataset.\n",
"\n",
"#DATA <- \"../../Data/\"\n",
"#data_fname <- file.path(DATA, \"Raw/processedSimu.csv\")\n",
"\n",
"wd <- getwd()\n",
"\n",
"dpath <- \"../Data\"\n",
"data_fname <- file.path(wd, dpath, \"processedSimu.csv\")\n",
"output_fname <- file.path(wd, dpath, \"processedSimu.xdf\")\n",
"output <- RxXdfData(file=output_fname)\n",
"\n",
"# Ingest the dataset.\n",
"\n",
"data <- rxImport(inData=data_fname, \n",
" outFile=output,\n",
" stringsAsFactors=TRUE,\n",
" overwrite=TRUE)\n",
"\n",
"\n",
"# View data information.\n",
"\n",
"rxGetVarInfo(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Model Building\n",
"\n",
"Now let's build credit risk models using different machine learning algorithms from the `MicrosoftML` package. \n",
"\n",
"First, we create individual machine learning models on the dataset `processedSimu.xdf` using the functions `rxLogisticRegression()`, `rxFastForest()` and `rxFastTrees()`. \n",
"\n",
"From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example, considering the overall performance. Therefore, several `rxFastTrees()` models are trained, each with a different set of parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"## Variable roles.\n",
"\n",
"# Target variable\n",
"\n",
"target <- \"bad_flag\"\n",
"\n",
"# Note any identifier.\n",
"\n",
"id <- c(\"account_id\") %T>% print() \n",
"\n",
"# Note the available variables as model inputs.\n",
"\n",
"vars <- setdiff(names(data), c(target, id))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"# Split Data\n",
"\n",
"set.seed(42)\n",
"\n",
"# Add training/testing flag to each observation.\n",
"\n",
"data %<>%\n",
" mutate(.train=factor(sample(1:2, .rxNumRows,\n",
" replace=TRUE,\n",
" prob=c(0.70, 0.30)),\n",
" levels=1:2))\n",
"\n",
"# Split dataset into training/test.\n",
"\n",
"data_split <- rxSplit(data, splitByFactor=\".train\")"
]
},
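{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the split above, the following base R sketch applies the same idea to a small in-memory data frame: each row is randomly flagged 1 (training) or 2 (testing) with probabilities 0.70 and 0.30, and the data is then split on that flag. The data frame `toy` and its columns are made up for illustration; the notebook itself works on the `.xdf` file via `dplyrXdf` and `rxSplit()`.\n",
"\n",
"```r\n",
"# Hypothetical toy data; not part of the accelerator.\n",
"set.seed(42)\n",
"toy <- data.frame(account_id = 1:1000, income = rnorm(1000))\n",
"\n",
"# Flag each row: 1 = training (about 70%), 2 = testing (about 30%).\n",
"toy$.train <- factor(sample(1:2, nrow(toy),\n",
"                            replace = TRUE,\n",
"                            prob = c(0.70, 0.30)),\n",
"                     levels = 1:2)\n",
"\n",
"# Split into a list of two data frames, one per flag level.\n",
"toy_split <- split(toy, toy$.train)\n",
"\n",
"# Share of rows in each part; close to 0.70 and 0.30, but not exact.\n",
"sapply(toy_split, nrow) / nrow(toy)\n",
"```\n",
"\n",
"Because the flag is sampled per row, the realised proportions only approximate the requested 70/30 split."
]
},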
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"# Prepare the formula\n",
"\n",
"top_vars <- c(\"amount_6\", \"pur_6\", \"avg_pur_amt_6\", \"avg_interval_pur_6\", \"credit_limit\", \"age\", \"income\", \"sex\", \"education\", \"marital_status\")\n",
"\n",
"form <- as.formula(paste(target, paste(top_vars, collapse=\"+\"), sep=\"~\"))\n",
"form"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"# Specify the local parallel compute context.\n",
"\n",
"rxSetComputeContext(\"localpar\")\n",
"\n",
"# Train model: rxLogisticRegression\n",
"\n",
"time_rxlogit <- system.time(\n",
" \n",
" model_rxlogit <- rxLogisticRegression(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" l1Weight=1,\n",
" verbose=0\n",
" )\n",
")\n",
"\n",
"# Train model: rxFastForest\n",
"\n",
"time_rxforest <- system.time(\n",
" \n",
" model_rxforest <- rxFastForest(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" numTrees=100,\n",
" numLeaves=20,\n",
" minSplit=10,\n",
" verbose=0\n",
" )\n",
")\n",
"\n",
"# Train model: rxFastTrees\n",
"\n",
"time_rxtrees1 <- system.time(\n",
" \n",
" model_rxtrees1 <- rxFastTrees(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" numTrees=100,\n",
" numLeaves=20,\n",
" learningRate=0.2,\n",
" minSplit=10,\n",
" unbalancedSets=FALSE,\n",
" verbose=0\n",
" )\n",
")\n",
"\n",
"time_rxtrees2 <- system.time(\n",
" \n",
" model_rxtrees2 <- rxFastTrees(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" numTrees=500,\n",
" numLeaves=20,\n",
" learningRate=0.2,\n",
" minSplit=10,\n",
" unbalancedSets=FALSE,\n",
" verbose=0\n",
" )\n",
")\n",
"\n",
"time_rxtrees3 <- system.time(\n",
" \n",
" model_rxtrees3 <- rxFastTrees(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" numTrees=500,\n",
" numLeaves=20,\n",
" learningRate=0.3,\n",
" minSplit=10,\n",
" unbalancedSets=FALSE,\n",
" verbose=0\n",
" )\n",
")\n",
"\n",
"time_rxtrees4 <- system.time(\n",
" \n",
" model_rxtrees4 <- rxFastTrees(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" numTrees=500,\n",
" numLeaves=20,\n",
" learningRate=0.3,\n",
" minSplit=10,\n",
" unbalancedSets=TRUE,\n",
" verbose=0\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we build an ensemble of fast tree models by using the function `rxEnsemble()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"# Train an ensemble model.\n",
"\n",
"time_ensemble <- system.time(\n",
" \n",
" model_ensemble <- rxEnsemble(\n",
" formula=form,\n",
" data=data_split[[1]],\n",
" type=\"binary\",\n",
" trainers=list(fastTrees(), \n",
" fastTrees(numTrees=500), \n",
" fastTrees(numTrees=500, learningRate=0.3),\n",
" fastTrees(numTrees=500, learningRate=0.3, unbalancedSets=TRUE)),\n",
" combineMethod=\"vote\",\n",
" replace=TRUE,\n",
" verbose=0\n",
" )\n",
")"
]
},
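{
"cell_type": "markdown",
"metadata": {},
"source": [
"To give an intuition for combining models by voting, here is a minimal base R sketch. It assumes that voting means a simple majority over the class labels predicted by the individual models, which may differ in detail from what `rxEnsemble()` does internally. The `votes` data frame is invented for illustration.\n",
"\n",
"```r\n",
"# Hypothetical class predictions from three models for five cases.\n",
"votes <- data.frame(m1 = c('yes', 'no', 'no',  'yes', 'no'),\n",
"                    m2 = c('yes', 'no', 'yes', 'no',  'no'),\n",
"                    m3 = c('no',  'no', 'yes', 'yes', 'no'),\n",
"                    stringsAsFactors = FALSE)\n",
"\n",
"# For each case (row), pick the label predicted by most models.\n",
"majority <- apply(votes, 1, function(v) names(which.max(table(v))))\n",
"majority\n",
"```\n",
"\n",
"With an odd number of models and two classes, a tie cannot occur, so the majority label is always well defined."
]
},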
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Model Evaluation \n",
"\n",
"Finally, we evaluate and compare the models built above on several metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"# Predict\n",
"\n",
"models <- list(model_rxlogit, model_rxforest, \n",
" model_rxtrees1, model_rxtrees2, model_rxtrees3, model_rxtrees4, \n",
" model_ensemble)\n",
"\n",
"# Predict class\n",
"\n",
"predictions <- lapply(models, \n",
" rxPredict, \n",
" data=data_split[[2]]) %>%\n",
" lapply('[[', 1)\n",
"\n",
"levels(predictions[[7]]) <- c(\"no\", \"yes\")\n",
"\n",
"# Confusion matrix evaluation results.\n",
"\n",
"cm_metrics <- lapply(predictions,\n",
" confusionMatrix, \n",
" reference=data_split[[2]][[target]],\n",
" positive=\"yes\")\n",
"\n",
"# Accuracy\n",
"\n",
"acc_metrics <- \n",
" lapply(cm_metrics, `[[`, \"overall\") %>%\n",
" lapply(`[`, 1) %>%\n",
" unlist() %>%\n",
" as.vector()\n",
"\n",
"# Recall\n",
"\n",
"rec_metrics <- \n",
" lapply(cm_metrics, `[[`, \"byClass\") %>%\n",
" lapply(`[`, 1) %>%\n",
" unlist() %>%\n",
" as.vector()\n",
" \n",
"# Precision\n",
"\n",
"pre_metrics <- \n",
" lapply(cm_metrics, `[[`, \"byClass\") %>%\n",
" lapply(`[`, 3) %>%\n",
" unlist() %>%\n",
" as.vector()\n",
"\n",
"# Predict class probability\n",
"\n",
"probs <- lapply(models[c(1, 2, 3, 4, 5, 6)],\n",
" rxPredict,\n",
" data=data_split[[2]]) %>%\n",
" lapply('[[', 3)\n",
"\n",
"# Create prediction object\n",
"\n",
"preds <- lapply(probs, \n",
" ROCR::prediction,\n",
" labels=data_split[[2]][[target]])\n",
"\n",
"# AUC\n",
"\n",
"auc_metrics <- lapply(preds, \n",
" ROCR::performance,\n",
" \"auc\") %>%\n",
" lapply(slot, \"y.values\") %>%\n",
" lapply('[[', 1) %>%\n",
" unlist()\n",
"\n",
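"# The ensemble is excluded from the probability predictions above, so its AUC is recorded as NaN.\n",
"\n",
"auc_metrics <- c(auc_metrics, NaN)\n",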
"\n",
"algo_list <- c(\"rxLogisticRegression\", \n",
" \"rxFastForest\", \n",
" \"rxFastTrees\", \n",
" \"rxFastTrees(500)\", \n",
" \"rxFastTrees(500, 0.3)\", \n",
" \"rxFastTrees(500, 0.3, ub)\",\n",
" \"rxEnsemble\")\n",
"\n",
"# Elapsed time (third element of system.time()) for each model.\n",
"\n",
"time_consumption <- c(time_rxlogit[[3]], time_rxforest[[3]], \n",
"                      time_rxtrees1[[3]], time_rxtrees2[[3]], \n",
"                      time_rxtrees3[[3]], time_rxtrees4[[3]],\n",
"                      time_ensemble[[3]])\n",
"\n",
"df_comp <- \n",
" data.frame(Models=algo_list, \n",
" Accuracy=acc_metrics, \n",
" Recall=rec_metrics, \n",
" Precision=pre_metrics,\n",
" AUC=auc_metrics,\n",
" Time=time_consumption) %T>%\n",
" print()"
]
},
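{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the metrics in the comparison above concrete, here is a small base R sketch that computes accuracy, recall and precision directly from a confusion table. The `truth` and `pred` vectors are invented for illustration. In the `caret` output used above, recall corresponds to `Sensitivity` (the first element of `byClass`) and precision to `Pos Pred Value` (the third).\n",
"\n",
"```r\n",
"# Hypothetical labels and predictions; 'yes' is the positive class.\n",
"truth <- factor(c('yes', 'yes', 'yes', 'no', 'no', 'no', 'no', 'no'),\n",
"                levels = c('no', 'yes'))\n",
"pred  <- factor(c('yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'no'),\n",
"                levels = c('no', 'yes'))\n",
"\n",
"cm <- table(Predicted = pred, Actual = truth)\n",
"\n",
"tp <- cm['yes', 'yes']  # true positives\n",
"fp <- cm['yes', 'no']   # false positives\n",
"fn <- cm['no', 'yes']   # false negatives\n",
"tn <- cm['no', 'no']    # true negatives\n",
"\n",
"accuracy  <- (tp + tn) / sum(cm)\n",
"recall    <- tp / (tp + fn)\n",
"precision <- tp / (tp + fp)\n",
"\n",
"c(accuracy = accuracy, recall = recall, precision = precision)\n",
"```\n",
"\n",
"For a credit risk model, recall on the bad accounts is often the metric of most interest, since a missed bad account is usually costlier than a false alarm."
]
},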
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 Save Models for Deployment\n",
"\n",
"Finally, we save the model objects in a suitable format (e.g., `.RData`, `SQLServerData`, etc.) for later use in deployment."
]
},
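{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the `.RData` route, the sketch below saves a model object and restores it in the way a deployment script might. The `lm()` model on the built-in `cars` data and the temporary file path are stand-ins for illustration only; the notebook saves its own `rxFastTrees()` model in the next cell.\n",
"\n",
"```r\n",
"# Stand-in model; any R object is saved the same way.\n",
"model_demo <- lm(dist ~ speed, data = cars)\n",
"\n",
"# Save to a temporary .RData file (hypothetical path).\n",
"path <- file.path(tempdir(), 'model_demo.RData')\n",
"save(model_demo, file = path)\n",
"\n",
"# Later, e.g. in a deployment script: load() restores the object\n",
"# under its original name and returns that name.\n",
"rm(model_demo)\n",
"loaded <- load(path)\n",
"loaded\n",
"\n",
"# The restored model can be used for scoring straight away.\n",
"predict(model_demo, newdata = data.frame(speed = 10))\n",
"```\n",
"\n",
"Note that `load()` restores objects under the names they were saved with, so the saved name should be one the deployment code expects."
]
},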
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"attributes": {
"classes": [],
"error": "FALSE",
"id": "",
"message": "FALSE",
"warning": "FALSE"
}
},
"outputs": [],
"source": [
"# Save model for deployment usage.\n",
"\n",
"model_rxtrees <- model_rxtrees3\n",
"\n",
"save(model_rxtrees, file=\"model_rxtrees.RData\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 1
}


@ -9,8 +9,15 @@ Some other critical R packages for the analysis:
* glmnet >= 2.0-5 Logistic regression model with L1 and L2 regularization.
* xgboost >= 0.6-4 Extreme gradient boosting model.
* randomForest >= 4.6-12 Random Forest model.
* caret >= 6.0-73 Classification and regression training.
* caretEnsemble >= 2.0.0 Ensemble of caret based models.
* RevoScaleR >= 9.1 Parallel and chunked data processing and modeling.
* dplyrXdf >= 0.9.2 Out-of-memory data wrangling.
* MicrosoftML >= 9.1 Microsoft machine learning models.
* mrsdeploy >= 9.1 R Server Operationalization.
# Use of template
The analytics code, with embedded step-by-step instructions, is written in R Markdown and can be run interactively within the code chunks of the markdown file.

Binary data
CreditRiskPrediction/Code/model_rxtrees.RData Normal file

Binary file not shown.


@ -9,7 +9,7 @@ Many banks nowadays are driving innovation to enhance risk management. For examp
The repository contains three parts
- **Data** This contains the provided sample data.
- **Code** This contains the R development code. They are displayed in R markdown files which can yield files of various formats.
- **Code** This contains the R development code. They are displayed in R markdown files which can yield files of various formats, like html, ipynb, etc.
- **Docs** This contains the documents, like blog, installation instructions, etc.
## Business domain
@ -36,4 +36,13 @@ In the data-driven credit risk prediction model, normally two types of data are
1. Traditional logistic regression models with L1 regularization are built as a baseline.
2. Machine learning models, such as gradient boosting and random forest, or their ensembles, are fine-tuned to compare performance on various aspects.
3. Innovative convolutionary hotspot method will be pursued in the near future.
3. Innovative convolutionary hotspot method will be pursued in the near future.
## Scalability
**Faster and scalable credit risk models** are built using the state-of-the-art machine learning algorithms provided by the `MicrosoftML` package.
## Operationalization
An **R model based web service for credit risk prediction** is published and consumed by using the `mrsdeploy` package that ships with Microsoft R Client and R Server 9.1.