Review and minor changes
Parent b9b209e3ae
Commit e02efe40b5
@@ -15,11 +15,23 @@ knitr::opts_chunk$set(echo = TRUE,
 
 ## 1 Introduction
 
-Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R.
+Microsoft R is a collection of servers and tools that extend the
+capabilities of R, making it easier and faster to build and deploy
+R-based solutions. Microsoft R brings you the ability to do parallel
+and chunked data processing and modelling that relaxes the
+restrictions on dataset size imposed by in-memory open source R.
 
-The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.
+The `MicrosoftML` package brings new machine learning functionality
+with increased speed, performance and scalability, especially for
+handling a large corpus of text data or high-dimensional categorical
+data. The `MicrosoftML` package is installed with **Microsoft R
+Client**, **Microsoft R Server** and with the **SQL Server Machine
+Learning Services**.
 
-This document will walk through you how to build faster and scalable credit risk models, using the `MicrosoftML` package that adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.
+This document will walk through you how to build faster and scalable
+credit risk models, using the `MicrosoftML` package that adds
+state-of-the-art machine learning algorithms and data transforms to
+Microsoft R Server.
 
 ## 2 Faster and Scalable Credit Risk Models
 
@@ -43,16 +55,15 @@ library(caret) # Calculate confusion matrix by using confusionMatrix().
 library(ROCR) # Provide functions for model performance evaluation.
 ```
 
-Then, the dataset processedSimu is ingested and transformed into a `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
+Then, the dataset processedSimu is ingested and transformed into a
+`.xdf` data format. This random dataset was created to simulate real
+world banking transaction data.
 
 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Data Ingestion
 
 # Identify the source location of the dataset.
 
-#DATA <- "../../Data/"
-#data_fname <- file.path(DATA, "Raw/processedSimu.csv")
-
 wd <- getwd()
 
 dpath <- "../Data"
@@ -75,11 +86,18 @@ rxGetVarInfo(data)
 
 ### 2.2 Model Building
 
-Now, let's get started to build credit risk models by leveraging different machine learning algorithms from the `MicrosoftML` package.
+Now, let's get started to build credit risk models by leveraging
+different machine learning algorithms from the `MicrosoftML` package.
 
-First of all, we create individual machine learning models on the dataset processedSimu.xdf by using the functions `rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`.
+First of all, we create individual machine learning models on the
+dataset processedSimu.xdf by using the functions
+`rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`.
 
-From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example, considering the overall performance. Therefore, the models implemented by the function `rxFastTrees()` with different sets of parameters are trained respectively.
+From the credit risk prediction template, we know that gradient
+boosting is the most suitable algorithm for this example, considering
+the overall performance. Therefore, the models implemented by the
+function `rxFastTrees()` with different sets of parameters are trained
+respectively.
 
 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Variable roles.
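Review note: the last hunk describes training `rxLogisticRegression()`, `rxFastForest()` and several `rxFastTrees()` variants, but the chunk that does so lies outside the diff context. The sketch below is only an illustration of what such a step might look like, not the code in this commit. It assumes a small simulated data frame in place of processedSimu.xdf, and the column names (`amount`, `n_trans`, `segment`, `bad_flag`) and the parameter values are hypothetical. The `MicrosoftML` calls run only when that package is installed.

```r
# Sketch under stated assumptions: the data and column names are hypothetical
# and simulated here; the real document trains on processedSimu.xdf.
set.seed(42)
n <- 500
sim <- data.frame(
  amount  = rexp(n, rate = 1 / 100),   # hypothetical numeric feature
  n_trans = rpois(n, lambda = 5),      # hypothetical count feature
  segment = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
# Hypothetical binary label, stored as logical.
sim$bad_flag <- (sim$amount + 10 * sim$n_trans + rnorm(n, sd = 30)) > 150

# Build the formula label ~ all other columns with base R.
form <- reformulate(setdiff(names(sim), "bad_flag"), response = "bad_flag")

# The MicrosoftML part is guarded, because the package ships only with
# Microsoft R Client / Server and SQL Server Machine Learning Services.
if (requireNamespace("MicrosoftML", quietly = TRUE)) {
  library(MicrosoftML)

  fits <- list(
    logit  = rxLogisticRegression(form, data = sim, type = "binary"),
    forest = rxFastForest(form, data = sim, type = "binary", numTrees = 100),
    # Two gradient boosting variants with different (illustrative) settings.
    boost_a = rxFastTrees(form, data = sim, type = "binary",
                          numTrees = 100, learningRate = 0.2),
    boost_b = rxFastTrees(form, data = sim, type = "binary",
                          numTrees = 500, learningRate = 0.05)
  )

  # Score the training data with one of the boosted models.
  scores <- rxPredict(fits$boost_a, data = sim, extraVarsToWrite = "bad_flag")
  print(head(scores))
}
```

Keeping the fitted models in a named list makes it straightforward to loop over them later when comparing performance, for example with the `caret` and `ROCR` helpers loaded earlier in the document.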