Merge branch 'master' of https://github.com/Microsoft/acceleratoRs
This commit is contained in:
Commit e32719a791
@ -19,7 +19,7 @@ Each of accelerators shared in this repo is structured following the project tem

* `Docs` - Related documentation, references, and any generated reports are normally put in this directory.

* An accelerator should be able to run interactively as an R notebook in RStudio.
* An accelerator should be able to run interactively in an IDE that supports R markdown such as [R Tools for Visual Studio (RTVS)](https://docs.microsoft.com/en-us/visualstudio/rtvs/rmarkdown) or RStudio.
* A Makefile is provided by default to generate documents in other formats; alternatively, `rmarkdown::render` can be used for the same purpose.

# Contributing

@ -0,0 +1,44 @@

# R markdown sources and the output targets derived from them.
RMD=$(wildcard *_*.Rmd)

RCD=$(RMD:.Rmd=.R)
HTM=$(RMD:.Rmd=.html)
PDF=$(RMD:.Rmd=.pdf)
ODT=$(RMD:.Rmd=.odt)
DOC=$(RMD:.Rmd=.docx)
MDN=$(RMD:.Rmd=.md)
IPY=$(RMD:.Rmd=.ipynb)

# Extract the pure R code from an R markdown file.
%.R: %.Rmd
	Rscript -e 'knitr::purl("$*.Rmd")'

%.md: %.Rmd
	Rscript -e 'rmarkdown::render("$*.Rmd", "rmarkdown::md_document")'

%.html: %.Rmd
	Rscript -e 'rmarkdown::render("$*.Rmd", "rmarkdown::html_document")'

.PRECIOUS: %.pdf
%.pdf: %.Rmd
	Rscript -e 'rmarkdown::render("$*.Rmd", "rmarkdown::pdf_document")'

# Open the generated PDF in a viewer.
%.view: %.pdf
	evince $^ &

# Convert to a Jupyter notebook via notedown, then clean up the result.
%.ipynb: %.Rmd
	notedown $^ --nomagic > $@
	sh support/fix_ipynb.sh $@

%.docx: %.Rmd
	Rscript -e 'rmarkdown::render("$*.Rmd", "rmarkdown::word_document")'

%.odt: %.Rmd
	Rscript -e 'rmarkdown::render("$*.Rmd", "rmarkdown::odt_document")'

clean:
	rm -f *.docx *.R *.odt *.pdf *.html *.md *.ipynb

realclean: clean
	rm -f *~
	rm -rf _book _site _html data models
	rm -rf app_education_files

@ -0,0 +1,35 @@

# Prerequisites

*Place the prerequisites for running the code here.*

* R >= 3.3.1
* rmarkdown >= 1.3
* AzureSMR >= 0.2.6
* AzureDSVM >= 0.2.0
* keras >= 2.0.6
* ggplot2 >= 2.2.1
* magrittr >= 1.5
* dplyr >= 0.7.1.9000
* readr >= 0.2.2

# Use of template

The analytics code, embedded with step-by-step instructions, is written in R markdown and can be run interactively within the code chunks of the markdown file.

The Makefile in the folder can be used to produce reports in various formats from the R markdown script. Supported output formats include

* R - pure R code,
* md - markdown,
* html - HTML,
* pdf - PDF,
* ipynb - Jupyter notebook,
* docx - Microsoft Word document, and
* odt - OpenDocument document.

To generate output in one of the above formats, simply run

```
make <filename>.<supported format>
```
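
For example, assuming the folder contains a script named `lstm.Rmd` (a hypothetical file name), `make lstm.html` renders it to an HTML report and `make lstm.R` extracts the pure R code.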

The generated files can be removed with `make clean` or `make realclean`.

@ -0,0 +1,419 @@

---
title: "Solar power forecasting with Long Short-Term Memory (LSTM)"
author: "Le Zhang, Data Scientist, Cloud and AI, Microsoft"
date: '`r Sys.Date()`'
output:
  html_notebook: default
---

This accelerator is a reproduction of CNTK tutorial 106 B - using LSTM
for time series forecasting - in R. The original tutorial can be found [here](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_106B_LSTM_Timeseries_with_IOT_Data.ipynb).

The accelerator mainly demonstrates how one can use the `keras` R interface
together with the CNTK backend to train an LSTM model for solar power forecasting
on an Azure Data Science Virtual Machine (DSVM).

## 1 Introduction

### 1.1 Context

[Solar power forecasting](https://en.wikipedia.org/wiki/Solar_power_forecasting)
is a challenging and important problem. Analyzing historical time-series data of
solar power generation may help predict the total amount of energy produced by
solar panels.

More discussion of solar power forecasting can be found on the Wikipedia page. The
model illustrated in this accelerator is a simplified one, intended merely to demonstrate
how an R-based LSTM model can be trained on an Azure DSVM.

### 1.2 Overall introduction

An overall introduction to the modeling techniques, training framework, and cloud
computing resource management can be found in the accompanying markdown file.

## 2 Step-by-step tutorial

### 2.1 Set up

Load the following R packages for this tutorial.

```{r}
library(keras)
library(magrittr)
library(dplyr)
library(readr)
library(ggplot2)
```

### 2.2 Data pre-processing

#### 2.2.1 Data downloading

The original data set is hosted [here](https://guschmueds.blob.core.windows.net/datasets/solar.csv).

For reproducibility, the data is downloaded onto the local system.

```{r}
data_url <- "https://guschmueds.blob.core.windows.net/datasets/solar.csv"

data_dir <- tempdir()
data_file <- tempfile(tmpdir=data_dir, fileext=".csv")

# Download the data.

download.file(url=data_url,
              destfile=data_file)
```
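
As a quick sanity check (an addition for illustration, not part of the original tutorial), one can confirm that the download succeeded before reading the file:

```{r}
# Illustrative check: the file should exist and be roughly 1.1 MB,
# per the data sheet in the Data folder.
stopifnot(file.exists(data_file))
file.size(data_file)
```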

```{r}
# Read the data into memory.

df_panel <- read_csv(data_file)
```

#### 2.2.2 Data understanding

The original data set is in the form of

|Time | solar.current | solar.total|
|------------------|----------|-----------|
|2013-12-01 7:00|6.30|1.69|
|2013-12-01 7:30|44.30|11.36|
|2013-12-01 8:00|208.00|67.50|
|...|...|...|
|2016-12-01 12:00|1815.00|5330.00|

The first column is the time stamp at which the solar panel reading was recorded;
readings are taken once every half hour. The second and third columns are the current
power at the time of reading and the cumulative total reading so far on that day.

The data can be explored interactively with the following code.

```{r}
# Take a glimpse of the data.

glimpse(df_panel)

ggplot(df_panel, aes(x=solar.current)) +
  geom_histogram()
```
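
A further illustrative check (not in the original tutorial) is the time span covered by the readings, which should run from 2013-12-01 to 2016-12-01:

```{r}
# Illustrative check: the recorded period of the data set.
range(df_panel$time)
```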

#### 2.2.3 Data re-formatting

The objective is to predict the maximum value of the total power reading on a day,
using a sequence of historical solar power readings.

Since the number of solar panel power readings may differ from day to day, a fixed
length of 14 is used for each day. That is, on a daily basis, a univariate
time series of 14 elements (14 readings of solar panel power) is
formed as input data, in order to predict the maximum value of total power
generation on that day.

Following this principle, the data of a day is re-formatted as

|Time series | Predicted target|
|-------------------|-----------|
|1.7, 11.4|10300|
|1.7, 11.4, 67.5|10300|
|1.7, 11.4, 67.5, 250.5|10300|
|1.7, 11.4, 67.5, 250.5, 573.5|10300|
|...|...|

For training purposes, the time stamp is not necessary, so the re-formed data are
aggregated as a set of sequences.

The following code accomplishes the processing task, which also includes
sub-tasks such as normalization, computing daily maxima, and grouping.

1. Normalize the data, as LSTM does not perform well on unscaled data.

```{r}
# Functions for 0-1 normalization and its inverse.

normalizeData <- function(data) {
  (data - min(data)) / (max(data) - min(data))
}

denormalizeData <- function(data, max, min) {
  data * (max - min) + min
}

df_panel_norm <-
  mutate(df_panel, solar.current=normalizeData(solar.current)) %>%
  mutate(solar.total=normalizeData(solar.total)) %T>%
  print()

# Save max and min values for later reference, to recover the original scale
# when necessary.

normal_ref <- list(current_max=max(df_panel$solar.current),
                   current_min=min(df_panel$solar.current),
                   total_max=max(df_panel$solar.total),
                   total_min=min(df_panel$solar.total))
```
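
A quick illustration (added here, not part of the original) of how the two helpers behave:

```{r}
# Illustrative check: normalizeData maps onto [0, 1], and denormalizeData
# inverts it given the saved max and min.
v <- c(0, 5, 10)
normalizeData(v)                                    # 0.0 0.5 1.0
denormalizeData(normalizeData(v), max(v), min(v))   # 0 5 10
```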

2. Group the data by day.

```{r}
df_panel_group <-
  mutate(df_panel_norm, date = as.Date(time)) %>%
  group_by(date) %>%
  arrange(date) %T>%
  print()
```

3. Append the columns "solar.current.max" and "solar.total.max".

```{r}
# Compute the max values of current and total power generation for each day.

df_panel_current_max <-
  summarise(df_panel_group, solar.current.max = max(solar.current)) %T>%
  print()

df_panel_total_max <-
  summarise(df_panel_group, solar.total.max = max(solar.total)) %T>%
  print()

# Append the max values of power generation.

df_panel_max <-
  df_panel_current_max %>%
  mutate(solar.total.max=df_panel_total_max$solar.total.max) %>%
  mutate(day_id=row_number())

df_panel_group$solar.current.max <- df_panel_max$solar.current.max[match(df_panel_group$date, df_panel_max$date)]
df_panel_group$solar.total.max <- df_panel_max$solar.total.max[match(df_panel_group$date, df_panel_max$date)]

# Also carry the day index over; it is needed later for sequence generation
# and data splitting (this line is missing from the original listing).
df_panel_group$day_id <- df_panel_max$day_id[match(df_panel_group$date, df_panel_max$date)]

df_panel_all <-
  df_panel_group %T>%
  print()
```

4. Generate the time series sequences for each day.

NOTE: **According to the original [CNTK tutorial](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_106B_LSTM_Timeseries_with_IOT_Data.ipynb), days with fewer than 8 readings are
omitted from the data, and days with more than 14 readings are truncated to exactly 14.**

```{r}
# Find the days that have more than 8 readings.

day_more_than_8 <-
  summarise(df_panel_all, group_size = n()) %>%
  filter(group_size > 8) %>%
  select(date)

# Keep those days with more than 8 readings, and truncate the number of
# readings to be at most 14.

df_panel_seq <-
  df_panel_all[which(as.Date(df_panel_all$date) %in% as.Date(day_more_than_8$date)), ] %>%
  filter(row_number() <= 14) %>%
  mutate(ndata = n()) %T>%
  print()
```
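
As an illustrative check (not in the original tutorial), the number of retained readings per day can be inspected after filtering and truncation:

```{r}
# Illustrative check: every retained day should have between 9 and 14 readings.
range(df_panel_seq$ndata)
```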

According to the data format, for each day the first sequence is composed of
the initial two readings, and each subsequent sequence is generated by appending
the power reading at the next time step. The process iterates until all the readings
on that day form the last sequence.

The function to generate the sequences is as follows.

```{r}
genSequence <- function(data) {
  if (!"day_id" %in% names(data))
    stop("Input data frame does not have Day ID (day_id) column!")

  # Since a day has at most 14 readings and the first sequence already
  # contains 2 readings, each day yields at most 13 sequences.
  # NOTE: unlike the approach in the official CNTK tutorial, here the meter
  # readings are padded with 0s, because the keras interface does not take a
  # list as input.

  date <- as.character(0)
  x <- array(0, dim=c(14 * n_groups(data), 14, 1))
  y <- array(0, dim=c(14 * n_groups(data), 1))

  index <- 1

  cat("Generating data ...")

  for (j in unique(data$day_id)) {
    readings <- select(filter(data, day_id == j),
                       solar.total,
                       solar.total.max,
                       date)

    readings_date <- readings$date
    readings_x <- as.vector(readings$solar.total)
    readings_y <- as.vector(readings$solar.total.max)

    reading_date <- unique(readings_date)
    reading_y <- unique(readings_y)

    for (i in 2:nrow(readings)) {
      x[index, 1:i, 1] <- readings_x[1:i]
      y[index, 1] <- reading_y
      date[index] <- as.character(reading_date)

      # day_id is different from the group index, so a separate iterator
      # is used.

      index <- index + 1
    }
  }

  return(list(x=array(x[1:(index - 1), 1:14, 1], dim=c(index - 1, 14, 1)),
              y=y[1:(index - 1)],
              date=date[1:(index - 1)]))
}
```

#### 2.2.4 Data splitting

The whole data set is split into training, validation, and test sets,
sampled in the following pattern:

|Day1|Day2|...|DayN-1|DayN|DayN+1|DayN+2|...|Day2N-1|Day2N|
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
|Train|Train|...|Val|Test|Train|Train|...|Val|Test|

To follow the original tutorial, in every 10 days of the original data set
the training, validation, and test data are sampled as 8 sequential days,
1 day, and 1 day, respectively.

```{r}
df_panel_seq_sample <-
  mutate(df_panel_seq, sample_index = day_id %% 10) %T>%
  print()

df_train <- filter(df_panel_seq_sample, sample_index <= 8 & sample_index > 0)
df_val <- filter(df_panel_seq_sample, sample_index == 9)
df_test <- filter(df_panel_seq_sample, sample_index == 0)
```

The data sets are then processed with the `genSequence` function to convert the
time series data into the required format.

```{r}
seq_train <- genSequence(df_train)
seq_val <- genSequence(df_val)
seq_test <- genSequence(df_test)

x_train <- seq_train$x
y_train <- seq_train$y

x_val <- seq_val$x
y_val <- seq_val$y

x_test <- seq_test$x
y_test <- seq_test$y
date_test <- as.Date(seq_test$date)
```
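
As an illustrative check (added, not part of the original tutorial), the generated arrays can be inspected to confirm the expected shape of (number of sequences) x 14 time steps x 1 feature:

```{r}
# Illustrative check: x_train should be a 3-D array with 14 time steps and
# 1 feature, and y_train should hold one target per sequence.
dim(x_train)
length(y_train)
```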

### 2.3 Model definition and creation

The overall structure of the LSTM neural network is shown below.

![](../Docs/Figs/lstm.png)

There are 14 LSTM cells, each taking as input a solar power reading from a
sequence. To mitigate overfitting, a dropout layer with a 0.2 dropout rate is added. The final layer is
a single neuron densely connected to the dropout layer, and its output is the predicted solar power value.

#### 2.3.1 Model definition

In Keras, one common type of neural network model is built by stacking basic layers;
this type of model starts with the `keras_model_sequential()` function. Following
the model description, the R code to define the model is

```{r}
# The neural network topology is the same as that in the original CNTK tutorial.

model <-
  keras_model_sequential() %>%
  layer_lstm(units=14,
             input_shape=c(14, 1)) %>%
  layer_dropout(rate=0.2) %>%
  layer_dense(units=1)
```

The defined model is then compiled, with the loss function (mean squared error)
and the optimization method (Adam) specified.

```{r}
model %>% compile(loss='mse', optimizer='adam')
```
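
If an additional, more interpretable metric is desired during training, `compile` also accepts a `metrics` argument; a possible variant (not used in the original tutorial):

```{r eval=FALSE}
# Optional variant (for illustration only): also track mean absolute error.
model %>% compile(loss='mse', optimizer='adam', metrics='mae')
```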

After compilation, basic information about the model can be displayed with `summary`.

```{r}
summary(model)
```

#### 2.3.2 Model training

After model definition and data pre-processing, the model is trained on the
training set. The number of epochs and the batch size can be varied to
fine-tune the model performance.

```{r}
# Larger epoch counts and batch sizes lead to longer training times.

epoch_size <- 200
batch_size <- 1

# The validation set is used to monitor the model during training.

model %>% fit(x_train,
              y_train,
              validation_data=list(x_val, y_val),
              batch_size=batch_size,
              epochs=epoch_size)
```
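
As a sketch of a possible refinement (not part of the original tutorial), a keras callback can stop training early once the validation loss stops improving, instead of always running the full 200 epochs:

```{r eval=FALSE}
# Optional sketch: stop when validation loss has not improved for 10 epochs.
model %>% fit(x_train,
              y_train,
              validation_data=list(x_val, y_val),
              batch_size=batch_size,
              epochs=epoch_size,
              callbacks=list(callback_early_stopping(monitor="val_loss",
                                                     patience=10)))
```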

#### 2.3.3 Model scoring

After training, the model can be scored on the test set with the loss metric.

```{r}
# Evaluation on the test data.

score <-
  evaluate(model, x_test, y_test) %T>%
  print()
```
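
Since the loss is the MSE on the normalized scale, an illustrative addition (not in the original tutorial) is to express the error in the original power units by undoing the 0-1 normalization:

```{r eval=FALSE}
# Illustrative addition: RMSE back on the original scale of solar.total.
rmse_norm <- sqrt(unlist(score)[[1]])
rmse_norm * (normal_ref$total_max - normal_ref$total_min)
```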

#### 2.3.4 Result visualization

```{r eval=FALSE}
# Use the model for prediction.

y_pred <- predict(model,
                  x_test)

# Recover the original scale of the data.

y_pred <- denormalizeData(y_pred, normal_ref$total_max, normal_ref$total_min)
y_test <- denormalizeData(y_test, normal_ref$total_max, normal_ref$total_min)

# Plot the comparison results.

df_plot <- data.frame(
  date=date_test,
  index=1:length(y_test),
  true=y_test,
  pred=y_pred)

ggplot(df_plot, aes(x=date)) +
  geom_line(aes(y=true, color="True")) +
  geom_line(aes(y=pred, color="Pred")) +
  theme_bw() +
  ggtitle("Solar power forecasting") +
  xlab("Date") +
  ylab("Max of total solar power")
```

The result comparing predicted and ground-truth power values is shown below.
![](../Docs/Figs/result.png)

The plot shows that the predictions align well with the true values.
There are several ways to improve the model, such as

* increasing the number of epochs,
* further preprocessing the training data to smooth out missing values, and
* using a more complex network topology (see the sketch below).
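
As one possible instance of the last suggestion, here is a sketch (an illustration, not from the original tutorial) that stacks a second LSTM layer; `return_sequences=TRUE` makes the first layer emit its full output sequence for the second layer to consume:

```{r eval=FALSE}
# A deeper variant of the network: two stacked LSTM layers with dropout.
model_deep <-
  keras_model_sequential() %>%
  layer_lstm(units=14,
             input_shape=c(14, 1),
             return_sequences=TRUE) %>%
  layer_dropout(rate=0.2) %>%
  layer_lstm(units=14) %>%
  layer_dropout(rate=0.2) %>%
  layer_dense(units=1)
```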
@ -0,0 +1,264 @@

---
title: "Solar power forecasting with Long Short-Term Memory (LSTM)"
author: "Le Zhang, Data Scientist, Cloud and AI, Microsoft"
date: '`r Sys.Date()`'
output:
  html_notebook: default
---

This accelerator is a reproduction of CNTK tutorial 106 B - using LSTM
for time series forecasting - in R. The original tutorial can be found [here](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_106B_LSTM_Timeseries_with_IOT_Data.ipynb).

The accelerator mainly demonstrates how one can use the `keras` R interface
together with the Cognitive Toolkit backend to train an LSTM model for solar power forecasting
on an Azure Data Science Virtual Machine (DSVM).

## 1 Introduction

### 1.1 Context

[Solar power forecasting](https://en.wikipedia.org/wiki/Solar_power_forecasting)
is a challenging and important problem. Analyzing historical time-series data of
solar power generation may help predict the total amount of energy produced by
solar panels.

More discussion of solar power forecasting can be found on the Wikipedia page. The
model illustrated in this accelerator is a simplified one, intended merely to demonstrate
how an R-based LSTM model can be trained with the Cognitive Toolkit backend on an Azure DSVM.

### 1.2 LSTM

LSTM is a type of recurrent neural network, distinguished by its capability to
model long-term dependencies. It has been applied in practice in many fields, such
as natural language processing (NLP), action recognition, and time series prediction.

While a comprehensive discussion of LSTM is not the focus of this accelerator,
more information can be found in [Chris Olah's blog](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).

### 1.3 Cognitive Toolkit and Keras

#### 1.3.1 Cognitive Toolkit

[Microsoft Cognitive Toolkit (previously known as CNTK)](https://www.microsoft.com/en-us/cognitive-toolkit/) is a free, easy-to-use,
open-source, commercial-grade toolkit that trains deep learning algorithms
to learn like the human brain.

Its features include

* highly optimized built-in components that handle multi-dimensional data
from different language environments, support various types of deep learning
algorithms, allow user-defined core components on the GPU, etc.,
* efficient resource usage that enables parallelism across multiple GPUs/machines,
* easy expression of neural networks with full APIs for Python, C++, and BrainScript, and
* training support on Azure.

#### 1.3.2 Keras

[Keras](https://keras.io/) is a high-level neural networks API that is capable of running on various
backends such as Cognitive Toolkit, TensorFlow, and Theano. It makes experimenting with
deep learning neural networks, from idea to result, easier than ever before.

#### 1.3.3 Cognitive Toolkit + Keras in R

Since version 2.0, Cognitive Toolkit supports Keras.

Cognitive Toolkit does not yet support R directly. However, by using the [Keras R interface](https://rstudio.github.io/keras/), one can train neural network
models through the Keras API with the Cognitive Toolkit backend.
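
A minimal sketch (assuming the conventions of the keras R package) of how to request the Cognitive Toolkit backend explicitly:

```{r eval=FALSE}
# Request the CNTK backend before any model-building function is first used.
library(keras)
use_backend("cntk")
backend()   # should report that the CNTK backend is in use
```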

## 2 Cloud resource deployment

The Azure cloud platform offers a variety of resources for elastically running
scalable analytical jobs. In particular, VMs and VM clusters equipped with
high-performance computing engines make it convenient for researchers and
developers to prototype and validate models.

The following sections demonstrate how to train an LSTM model on a DSVM with
Cognitive Toolkit and the Keras R interface.

NOTE: **The script demonstrating Cognitive Toolkit + Keras can also be run
in a local environment, but one then needs to manually download and install
Cognitive Toolkit, Keras, the Keras R package, the CUDA Toolkit (if a GPU device is available and GPU acceleration
is wanted), and their dependencies.**

### 2.1 Data Science Virtual Machine (DSVM)

The [Azure DSVM](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-provision-vm) is a curated VM image that comes pre-installed with a
rich set of commonly used data science and AI development tools, such as R/Python
environments, Cognitive Toolkit, SQL Server, etc.

The DSVM is a desirable workspace in which to experiment, prototype, and productize
data analytics and AI solutions. The elasticity of the offering also keeps costs
under control, making it more economical than on-premise servers.

### 2.2 Configuration and setup

Both Cognitive Toolkit and Keras are pre-installed on the DSVM. However, at the time
of writing, the keras R package and its dependencies are not pre-installed, so
preliminary installation and configuration are required.
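
A minimal sketch of what such an installation might look like (an assumption for illustration; at the time of writing the keras R package was distributed from GitHub, and the Cognitive Toolkit backend itself must be installed separately):

```{r eval=FALSE}
# Install the keras R package from GitHub, then its Python-side dependencies.
install.packages("devtools")
devtools::install_github("rstudio/keras")
library(keras)
install_keras()
```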

### 2.3 Resource deployment with `AzureDSVM`

[`AzureDSVM`](https://github.com/Azure/AzureDSVM) is an R package that allows R users to interact directly with an Azure
account to administer DSVM instances.

To fire up a DSVM, one just needs to specify information such as the DSVM name, user
name, operating system, VM size, etc. For example, the following script fires up
an Ubuntu DSVM of size D1_v2, located in Southeast Asia. NC-series VMs, which are
equipped with GPU devices, are available in certain regions such as East US,
West Europe, etc. Compared to D-series VMs, NC-series VMs have higher pricing rates, so
there is a trade-off in choosing an appropriate machine for the training work.

NOTE: **Keras and the Keras R interface promise seamless utilization of a GPU device
that is properly configured in the VM for accelerating deep learning model training.**

```{r}
# Load the packages.

library(AzureSMR)
library(AzureDSVM)
```

```{r}
# Credentials for authentication against the Azure account are preserved in a
# JSON-formatted file, named "config.json", which is located at ~/.azuresmr.

# The credentials needed for authentication include the client ID, tenant ID,
# authentication key, password, and public key.

settingsfile <- getOption("AzureSMR.config")
config <- read.AzureSMR.config()
```

```{r}
# Authentication with the credential information.

asc <- createAzureContext()

with(config,
     setAzureContext(asc, tenantID=tenantID, clientID=clientID, authKey=authKey)
)
azureAuthenticate(asc)
```

```{r}
# Location and resource group name.

dsvm_location <- "southeastasia"
dsvm_rg <- paste0("rg", paste(sample(letters, 3), collapse=""))

# VM size, operating system, and VM name.

dsvm_size <- "Standard_D1_v2"
dsvm_os <- "Ubuntu"
dsvm_name <- paste0("dsvm",
                    paste(sample(letters, 3), collapse=""))

# VM user name, authentication method (password in this case), and login password.

dsvm_username <- "dsvmuser"
dsvm_authen <- "Password"
dsvm_password <- config$PASSWORD
```

```{r eval=FALSE}
# Deploy the DSVM.

deployDSVM(asc,
           resource.group=dsvm_rg,
           location=dsvm_location,
           hostname=dsvm_name,
           username=dsvm_username,
           size=dsvm_size,
           os=dsvm_os,
           authen=dsvm_authen,
           password=dsvm_password,
           mode="Sync")
```

A freshly deployed DSVM does not have the Keras R interface installed and
configured. A post-deployment installation and configuration of the package can
be achieved by adding an extension to the deployed DSVM, which runs a
shell script located at a remote URL.

```{r}
# URL of the shell script and the command to run the script.

dsvm_fileurl <- "https://raw.githubusercontent.com/yueguoguo/Azure-R-Interface/master/demos/demo-5/script.sh"
dsvm_command <- "sudo sh script.sh"
```

```{r eval=FALSE}
# Add the extension to the DSVM.

addExtensionDSVM(asc,
                 location=dsvm_location,
                 resource.group=dsvm_rg,
                 hostname=dsvm_name,
                 os=dsvm_os,
                 fileurl=dsvm_fileurl,
                 command=dsvm_command)
```

### 2.4 Remote access to the DSVM

After a successful deployment and extension, the DSVM can be remotely accessed by

1. RStudio Server - `http://<dsvm_name>.<location>.cloudapp.azure.com:8787`
2. Jupyter Notebook - `https://<dsvm_name>.<location>.cloudapp.azure.com:8000`
3. X2Go client.

NOTE: **It was found that the Keras R interface does not work well in RStudio Server,
owing to an SSL certificate issue. This may be related to the "http" protocol; Jupyter
Notebook, which is served over "https", works well.**

Ideally, typing the following in an R session on the remote DSVM

```{r eval=FALSE}
library(keras)

backend()
```

will show the message "Using CNTK backend...", which means the interface can
detect and load the Cognitive Toolkit backend. If the DSVM is an NC-series one,
the GPU device will be detected and used.

### 2.5 Model building

After all the set-up, the model can be created using the Keras R interface
functions. As the model building follows the original CNTK tutorial, the
introduction and description are not replicated here.

The script for the whole step-by-step tutorial is available [here](../Code/lstm.R).

### 2.6 Run the script on the DSVM

The script can be run on the deployed DSVM in various ways.

1. Jupyter Notebook - access the Jupyter Hub hosted on the DSVM via
`https://<dsvm_name>.<location>.cloudapp.azure.com:8000` and create an R-kernel
notebook to run the script.
2. X2Go client - create a new X2Go session for a remote desktop on the machine.
The script can be copied onto the DSVM via SSH or any SSH-based file
transfer software, and then run in the RStudio desktop version.

NOTE: **It was found that if the script is run in the R console or with Rscript on
the command line, the GPU device is not activated for acceleration, while running
the script in the RStudio IDE does not have this problem.**

## 3 Closing

After the experiment, it is recommended to either stop and deallocate, or
delete, the computing resources if they are no longer needed.

```{r eval=FALSE}
# Stop and deallocate the DSVM.

operateDSVM(asc, dsvm_rg, dsvm_name, "Stop")
```

```{r eval=FALSE}
# Delete the resource group.

azureDeleteResourceGroup(asc, dsvm_rg)
```
@ -0,0 +1,9 @@

# List of data sets

| Data Set Name | Link to the Full Data Set | Full Data Set Size (MB) | Link to Report |
| ---:| ---: | ---: | ---: |
| Data Set 1 | [link](https://guschmueds.blob.core.windows.net/datasets/solar.csv) | 1.1 | N/A |

# Description of data sets

* Data Set 1 - *Solar panel readings from 2013-12-01 to 2016-12-01,
sampled every half hour.*
Binary file not shown. After: Width | Height | Size: 9.4 KiB
Binary file not shown. After: Width | Height | Size: 12 KiB

@ -0,0 +1,4 @@

# Documents

*This folder contains documents such as blogs, installation instructions, etc. It is also the default directory where the generated reports from R markdown are placed.*

@ -0,0 +1,62 @@

# Data Science Accelerator - *Solar power forecasting with Cognitive Toolkit in R*

## Overview

This repo reproduces [CNTK tutorial 106
B](https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_106B_LSTM_Timeseries_with_IOT_Data.ipynb)
- deep learning time series forecasting with Long Short-Term Memory
(LSTM) - in R, by using the Keras R interface with Microsoft Cognitive
Toolkit in an Azure Data Science Virtual Machine (DSVM).

An Azure account can be created for free by visiting [Microsoft
Azure](https://azure.microsoft.com/free). This will then allow you to
deploy an [Ubuntu Data Science Virtual
Machine](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-virtual-machine-overview)
through the [Azure Portal](https://ms.portal.azure.com). You can then
connect to the server's [RStudio
Server](https://www.rstudio.com/products/rstudio/#Server) instance
through a local web browser via ```http://<ip address>:8787```.

The repository contains three parts:

- **Data** Solar panel readings collected from Internet-of-Things (IoT)
devices are used.
- **Code** Two R markdown files are available - the first, titled
[SolarPanelForecastingTutorial](https://github.com/Microsoft/acceleratoRs/blob/master/SolarPanelForecasting/Code/SolarPanelForecastingTutorial.Rmd), provides a general introduction to
the accelerator and code for setting up an experimental environment
on an Azure DSVM; the second, titled [SolarPanelForecastingCode](https://github.com/Microsoft/acceleratoRs/blob/master/SolarPanelForecasting/Code/SolarPanelForecastingCode.Rmd),
wraps the code and a step-by-step tutorial on building an LSTM model for
forecasting from end to end.
- **Docs** Blogs and decks will be added soon.

## Business domain

The accelerator presents a tutorial on forecasting solar panel power
readings by using an LSTM-based neural network model trained on
historical data. Solar power forecasting is a critical problem, and a
model with desirable estimation accuracy potentially benefits many
domain-specific businesses such as energy trading, management, etc.

## Data science problem

The problem is to predict the maximum value of total power generation in
a day from the solar panel, given the sequential readings of solar
power generation at the current and past sampling moments.

## Data understanding

The data set used in the accelerator was collected from IoT devices
incorporated in solar panels. The data is available at this
[URL](https://guschmueds.blob.core.windows.net/datasets/solar.csv).

## Modeling

The model used in this accelerator is based on LSTM, which is capable of
modeling long-term dependencies in time series data. By properly
processing the original data into sequences of power readings, a deep
neural network formed by LSTM cells and dropout layers can capture the
patterns in the time series so as to predict the output.

## Solution architecture

The experiment is conducted on an Ubuntu DSVM.