From e02efe40b5a1e4135aa913e951fe35dc032442b6 Mon Sep 17 00:00:00 2001
From: Graham Williams <graham.williams@togaware.com>
Date: Fri, 23 Jun 2017 08:26:12 +0800
Subject: [PATCH] Review and minor changes

---
 CreditRiskPrediction/Code/CreditRiskScale.Rmd | 38 ++++++++++++++-----
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/CreditRiskPrediction/Code/CreditRiskScale.Rmd b/CreditRiskPrediction/Code/CreditRiskScale.Rmd
index 47fe16f..0b15974 100644
--- a/CreditRiskPrediction/Code/CreditRiskScale.Rmd
+++ b/CreditRiskPrediction/Code/CreditRiskScale.Rmd
@@ -15,11 +15,23 @@ knitr::opts_chunk$set(echo = TRUE,
 
 ## 1 Introduction
 
-Microsoft R is a collection of servers and tools that extend the capabilities of R, making it easier and faster to build and deploy R-based solutions. Microsoft R brings you the ability to do parallel and chunked data processing and modelling that relax the restrictions on dataset size imposed by in-memory open source R. 
+Microsoft R is a collection of servers and tools that extend the
+capabilities of R, making it easier and faster to build and deploy
+R-based solutions. Microsoft R brings you the ability to do parallel
+and chunked data processing and modelling that relaxes the
+restrictions on dataset size imposed by in-memory open source R.
 
-The `MicrosoftML` package brings new machine learning functionality with increased speed, performance and scalability, especially for handling a large corpus of text data or high-dimensional categorical data. The `MicrosoftML` package is installed with **Microsoft R Client**, **Microsoft R Server** and with the **SQL Server Machine Learning Services**.
+The `MicrosoftML` package brings new machine learning functionality
+with increased speed, performance and scalability, especially for
+handling a large corpus of text data or high-dimensional categorical
+data. The `MicrosoftML` package is installed with **Microsoft R
+Client**, **Microsoft R Server** and with the **SQL Server Machine
+Learning Services**.
 
-This document will walk through you how to build faster and scalable credit risk models, using the `MicrosoftML` package that adds state-of-the-art machine learning algorithms and data transforms to Microsoft R Server.
+This document walks you through how to build faster and more scalable
+credit risk models using the `MicrosoftML` package, which adds
+state-of-the-art machine learning algorithms and data transforms to
+Microsoft R Server.
 
 ## 2 Faster and Scalable Credit Risk Models
 
@@ -43,16 +55,15 @@ library(caret)        # Calculate confusion matrix by using confusionMatrix().
 library(ROCR)         # Provide functions for model performance evaluation.
 ```
 
-Then, the dataset processedSimu is ingested and transformed into a `.xdf` data format. This dataset was created by the data preprocessing steps in the data science accelerator for credit risk prediction.
+Then, the dataset processedSimu is ingested and converted to the
+`.xdf` data format. This random dataset was created to simulate
+real-world banking transaction data.
 
 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Data Ingestion
 
 # Identify the source location of the dataset.
 
-#DATA <- "../../Data/"
-#data_fname <- file.path(DATA, "Raw/processedSimu.csv")
-
 wd <- getwd()
 
 dpath <- "../Data"
@@ -75,11 +86,18 @@ rxGetVarInfo(data)
 
 ### 2.2 Model Building
 
-Now, let's get started to build credit risk models by leveraging different machine learning algorithms from the `MicrosoftML` package. 
+Now, let's start building credit risk models by leveraging
+different machine learning algorithms from the `MicrosoftML` package.
 
-First of all, we create individual machine learning models on the dataset processedSimu.xdf by using the functions `rxLogisticRegression()`, `rxFastForest()`, `rxFastTrees()`. 
+First, we create individual machine learning models on the
+dataset processedSimu.xdf by using the functions
+`rxLogisticRegression()`, `rxFastForest()` and `rxFastTrees()`.
 
-From the credit risk prediction template, we know that gradient boosting is the most suitable algorithm for this example, considering the overall performance. Therefore, the models implemented by the function `rxFastTrees()` with different sets of parameters are trained respectively. 
+From the credit risk prediction template, we know that gradient
+boosting is the most suitable algorithm for this example, considering
+the overall performance. Therefore, we train several models with the
+function `rxFastTrees()`, each using a different set of
+parameters.
 
 ```{r, message=FALSE, warning=FALSE, error=FALSE}
 ## Variable roles.