yueguoguo 2017-03-30 17:36:48 +08:00
Parent e1e6abc6a1
Commit 42723072b8
4 changed files with 15 additions and 15 deletions

@@ -21,25 +21,25 @@ if (rxGetComputeContext()@description == "dopar") {
# make a function to do clustering of given data set.
clusterAnalysis <- function(data, numClusters) {
xdf <- RxXdfData(data)
# create formula.
names <- rxGetVarNames(data=xdf)
names <- names[!(names %in% c("Class", "Time"))] # the original data set is labelled so remove the label.
formula <- as.formula(paste0("~", paste(names, collapse="+")))
# to scale data.
df <- rxImport(xdf,
varsToDrop=c("Time", "Class"))
df <- as.data.frame(scale(df))
clusters <- rxKmeans(formula,
df,
numClusters=numClusters)
clusters$cluster
}
@@ -49,4 +49,4 @@ results <- rxExec(FUN=clusterAnalysis,
data="data.xdf",
numClusters=rxElemArg(c(2:5)))
save(results, file="./results.RData")
save(results, file="./results.RData")


@@ -167,7 +167,7 @@ data are labelled and so in our clustering analysis the label is
removed.
The R code for clustering is available from github as
[worker_cluster.R]{...test/worker_cluster.R}. The analysis basically
[workerCluster.R]{...test/workerCluster.R}. The analysis basically
normalises the credit transaction data and then performs 10 repeated
clustering analyses (targeting 2 clusters) for each using the k-means
algorithm. The repetition is completed in parallel with the specified
@@ -177,7 +177,7 @@ computing context information will be automatically added by the
The script can then be saved and later on path to the script is used
as reference. For example, in this demo, the script is saved with name
"worker_cluster.R" which is located in the "/test" directory.
"workerCluster.R" which is located in the "/test" directory.
The following code is to run the clustering analysis on a specified
computing environment. This is achieved by setting computing
@@ -212,7 +212,7 @@ AzureDSVM::executeScript(context=context,
machines=machines,
remote=master,
user=USER,
script="./test/worker_cluster.R",
script="./test/workerCluster.R",
master=master,
slaves=slaves,
computeContext="localParallel")
@@ -244,7 +244,7 @@ AzureDSVM::executeScript(context=context,
machines=machines,
remote=master,
user=USER,
script="./test/worker_cluster.R",
script="./test/workerCluster.R",
master=master,
slaves=slaves,
computeContext="clusterParallel")


@@ -179,7 +179,7 @@ directly from
[togaware]{https://access.togaware.com/creditcard.xdf} in XDF format. The data consists both normal and fraudulent transactions, which are indicated by the label "Class", and the problem is to detect a potential fraudulent transaction based on patterns "learnt" by the trained model.
Codes of solving such a machine learning problem can be obtained from
[worker_classification.R]{...test/worker_classification.R}. The function `mlProcess` takes data, formula, and model specs as inputs. Considering scalability and performance efficiency, data of xdf format is used, which allows parallel computation outside memory. Area-under-curve is used as performance metric to evaluate quality of model. The function returns a model object (based on the training results) and evaluation result of the model.
[workerClassification.R]{...test/workerClassification.R}. The function `mlProcess` takes data, formula, and model specs as inputs. Considering scalability and performance efficiency, data of xdf format is used, which allows parallel computation outside memory. Area-under-curve is used as performance metric to evaluate quality of model. The function returns a model object (based on the training results) and evaluation result of the model.
Following shows snippets of the machine learning process.
@@ -263,7 +263,7 @@ executeScript(context,
machines=LDSVM,
remote=VM_URL,
user=USER,
script="./worker_classification.R",
script="./workerClassification.R",
master=VM_URL,
slaves=VM_URL,
computeContext="localParallel")
@@ -277,7 +277,7 @@ executeScript(context,
machines=LDSVMS,
remote=VMS_URL[1],
user=USER,
script="./worker_classification.R",
script="./workerClassification.R",
master=VMS_URL[1],
slaves=VMS_URL[-1],
computeContext="clusterParallel")