Review and minor changes

2017-06-23 08:26:12 +08:00 · 2017-06-23 08:26:12 +08:00 · e02efe40b5
--- a/CreditRiskPrediction/Code/CreditRiskScale.Rmd
+++ b/CreditRiskPrediction/Code/CreditRiskScale.Rmd
@@ -15,11 +15,23 @@ knitr::opts_chunk$set(echo = TRUE,
 ## 1 Introduction
-Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R. 
+Microsoft R is a collection of servers and tools that extend the
+capabilities of R, making it easier and faster to build and deploy
+R-based solutions. Microsoft R brings you the ability to do parallel
+and chunked data processing and modelling that relaxes the
+restrictions on dataset size imposed by in-memory open source R.
-The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.
+The `MicrosoftML` package brings new machine learning functionality
+with increased speed, performance and scalability, especially for
+handling a large corpus of text data or high-dimensional categorical
+data. The `MicrosoftML` package is installed with **Microsoft R
+Client**, **Microsoft R Server** and with the **SQL Server Machine
+Learning Services**.
-This document will walk through you how to build faster and scalable credit risk models, using the `MicrosoftML` package that adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.
+This document will walk through you how to build faster and scalable
+credit risk models, using the `MicrosoftML` package that adds
+state-of-the-art machine learning algorithms and data transforms to
+Microsoft R Server.
 ## 2 Faster and Scalable Credit Risk Models
@@ -43,16 +55,15 @@ library(caret)        # Calculate confusion matrix by using confusionMatrix().
 library(ROCR)         # Provide functions for model performance evaluation.
 ```
-Then, the dataset processedSimu is ingested and transformed into a `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
+Then, the dataset processedSimu is ingested and transformed into a
+`.xdf` data format. This random dataset was created to simulate real
+world banking transaction data.
 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Data Ingestion
 # Identify the source location of the dataset.
 #DATA <- "../../Data/"
 #data_fname <- file.path(DATA, "Raw/processedSimu.csv")
 wd <- getwd()
 dpath <- "../Data"
@@ -75,11 +86,18 @@ rxGetVarInfo(data)
 ### 2.2 Model Building
-Now, let's get started to build credit risk models by leveraging different machine learning algorithms from the `MicrosoftML` package. 
+Now, let's get started to build credit risk models by leveraging
+different machine learning algorithms from the `MicrosoftML` package.
-First of all, we create individual machine learning models on the dataset processedSimu.xdf by using the functions `rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`. 
+First of all, we create individual machine learning models on the
+dataset processedSimu.xdf by using the functions
+`rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`.
-From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example, considering the overall performance. Therefore, the models implemented by the function `rxFastTrees()` with different sets of parameters are trained respectively. 
+From the credit risk prediction template, we know that gradient
+boosting is the most suitable algorithm for this example, considering
+the overall performance. Therefore, the models implemented by the
+function `rxFastTrees()` with different sets of parameters are trained
+respectively.
 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Variable roles.