Review and minor changes
Parent: b9b209e3ae
Commit: e02efe40b5
@@ -15,11 +15,23 @@ knitr::opts_chunk$set(echo = TRUE,

 ## 1 Introduction

-Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R.
+Microsoft R is a collection of servers and tools that extend the
+capabilities of R, making it easier and faster to build and deploy
+R-based solutions. Microsoft R brings you the ability to do parallel
+and chunked data processing and modelling that relaxes the
+restrictions on dataset size imposed by in-memory open source R.

-The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.
+The `MicrosoftML` package brings new machine learning functionality
+with increased speed, performance and scalability, especially for
+handling a large corpus of text data or high-dimensional categorical
+data. The `MicrosoftML` package is installed with **Microsoft R
+Client**, **Microsoft R Server** and with the **SQL Server Machine
+Learning Services**.

-This document will walk you through how to build faster and scalable credit risk models, using the `MicrosoftML` package that adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.
+This document will walk you through how to build faster and scalable
+credit risk models, using the `MicrosoftML` package that adds
+state-of-the-art machine learning algorithms and data transforms to
+Microsoft R Server.

 ## 2 Faster and Scalable Credit Risk Models

@@ -43,16 +55,15 @@ library(caret) # Calculate confusion matrix by using confusionMatrix().
 library(ROCR) # Provide functions for model performance evaluation.
 ```

-Then, the dataset processedSimu is ingested and transformed into a `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
+Then, the dataset processedSimu is ingested and transformed into a
+`.xdf` data format. This random dataset was created to simulate real
+world banking transaction data.

 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Data Ingestion

 # Identify the source location of the dataset.

-#DATA <- "../../Data/"
-#data_fname <- file.path(DATA, "Raw/processedSimu.csv")
-
 wd <- getwd()

 dpath <- "../Data"
@@ -75,11 +86,18 @@ rxGetVarInfo(data)

 ### 2.2 Model Building

-Now, let's start building credit risk models by leveraging different machine learning algorithms from the `MicrosoftML` package.
+Now, let's start building credit risk models by leveraging
+different machine learning algorithms from the `MicrosoftML` package.

-First of all, we create individual machine learning models on the dataset processedSimu.xdf by using the functions `rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`.
+First of all, we create individual machine learning models on the
+dataset processedSimu.xdf by using the functions
+`rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`.

-From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example, considering the overall performance. Therefore, the models implemented by the function `rxFastTrees()` with different sets of parameters are trained respectively.
+From the credit risk prediction template, we know that gradient
+boosting is the most suitable algorithm for this example, considering
+the overall performance. Therefore, the models implemented by the
+function `rxFastTrees()` with different sets of parameters are trained
+respectively.

 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Variable roles.