hong-revo 2017-03-12 05:16:42 +11:00
Parent fdb68aa2f2
Commit a44b5adef4
4 changed files with 55 additions and 28 deletions

View file

@@ -7,36 +7,17 @@ This is the repo for the data science workflow demo presented at Microsoft Ignit
You can see a video of the presentation on [Channel 9](https://channel9.msdn.com/events/Ignite/Australia-2017/DA334), and the slide deck itself is at [docs/galaxy-ignite.pptx](docs/galaxy-ignite.pptx).
The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). See [cite.md](cite.md) for a list of acknowledgements and citations. To avoid spamming the SDSS website, the data has been copied to Azure blob storage.
The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). To avoid spamming the SDSS website, the data has been copied to Azure blob storage. See [cite.md](cite.md) for a list of acknowledgements and citations.
## Setup and configuration
## Setting up the backend
This demo assumes that you have access to a SQL Server instance, and have already setup SQL Server R Services and R Server Operationalisation. For more information about these, see the following MSDN pages:
* [Setting up SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt696069.aspx)
* [Configuring R Server for Operationalisation](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/configuration-initial)
The MicrosoftML package can use GPU acceleration to fit neural network models. To enable this, see the help for `MicrosoftML::rxNeuralNet`. In a nutshell, you install the NVidia [CUDA Toolkit 6.5](https://developer.nvidia.com/cuda-toolkit-65) and [cuDNN v2 Library](https://developer.nvidia.com/rdp/cudnn-archive), and then copy some .dlls to the MicrosoftML mxLibs directory. Currently only CUDA acceleration is supported.
You'll also need the following R packages installed, other than those that come with R Server: dplyr, imager, purrr, shiny, shinyjs, RMLtools. All of these are available on CRAN, except for RMLtools which is on Github. You can install this package using devtools:
install.packages("devtools")
devtools::install_github("andrie/RMLtools")
The script `code/settings.R` is used to set some configuration options. It will read two additional files which you should put into the `code` directory:
* `deployDbConnStr.txt` containing the ODBC connection string for your SQL Server instance.
* `deployCred.txt` containing your RServe login credentials (for R Server operationalisation).
You should also edit `settings.R` to set the paths where you want the raw and processed galaxy images saved. The image files total about 2.4GB after processing, so make sure you point this to a location that has enough space.
Finally, run the script `code/dataprep/initDataDirs.R`. This will create the `data` directory structure in which model objects, R data frames etc are stored.
For a description of the steps required, see [setup.md](setup.md).
## Running the demo
Once you've setup your backend, carry out the following steps to run the demo.
Once you've set up the backend, carry out the following steps to run the demo.
1. Run `code/dataprep/downloadCatalog.R` to download the galaxy catalog.
@@ -48,7 +29,7 @@ Once you've setup your backend, carry out the following steps to run the demo.
5. (Optional) Modify `code/model/trainModel.R` to fit only the specific neural network model you want to keep. As supplied, the script will fit three models which are minor variations on each other.
6. Run `code/model/trainModel.R` to fit the chosen model(s). It's highly recommended that you have GPU acceleration enabled (see above).
6. Run `code/model/trainModel.R` to fit the chosen model(s).
7. Run `code/deploy/deploySqlModel.R`. This will serialise the model(s) to SQL Server, and also create a stored procedure for scoring new data.
@@ -56,9 +37,11 @@ Once you've setup your backend, carry out the following steps to run the demo.
9. (Optional) Run `code/testing/testApi.R` to test that everything works.
10. (Optional, if you have [autorest](https://www.nuget.org/packages/autorest/) installed) Run `autorest -CodeGenerator <yourLanguage> -Input "<path-to-demo>\data\output\swagPredictGalaxyClass.json"` to generate code to consume the API.
10. At the R prompt, run the shiny frontend:
```
``` r
library(shiny)
runApp("code/frontend")
```

View file

@@ -72,7 +72,7 @@ spBasePredictGalaxyClass <- function(model, imgData)
img <- as.raw(strtoi(img, base=16))
writeBin(img, inFile)
outFile <- basename(outFile) # workaround bug in imager::save.image: save to home dir, manually delete on exit
outFile <- basename(outFile) # workaround bug in imager::save.image 0.31: save to workdir, delete on exit
imager::load.image(inFile) %>%
resizeGalaxy(c(424, 424)) %>% # match dimensions of training images
cropGalaxy(0.26) %>%

View file

@@ -3,9 +3,9 @@
settings <- as.environment(list())
settings$imgPath <- 'd:/data/galaxyzoo/img'
settings$imgPath <- 'data/img'
settings$procImgPath <- 'd:/data/galaxyzoo/processed'
settings$procImgPath <- 'data/imgProcessed'
settings$deployDbConnStr <- if(file.exists("code/deployDbConnStr.txt")) readLines("code/deployDbConnStr.txt")[1] else NULL

View file

@@ -0,0 +1,44 @@
## Setup
### Backend configuration
This demo assumes that you have access to a SQL Server instance, and have already set up SQL Server R Services and R Server Operationalisation. For more information about these, see the following MSDN pages:
* [Setting up SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt696069.aspx)
* [Configuring R Server for Operationalisation](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/configuration-initial)
You'll also need to ensure that the R Server Operationalisation accounts can access the database. See your database administrator for help if you are using a shared SQL Server instance.
### R configuration
The MicrosoftML package can use GPU acceleration to fit neural network models, and this is highly recommended. To enable this, see the help for `MicrosoftML::rxNeuralNet`. In a nutshell, you install the NVIDIA [CUDA Toolkit 6.5](https://developer.nvidia.com/cuda-toolkit-65) and [cuDNN v2 Library](https://developer.nvidia.com/rdp/cudnn-archive), and then copy some DLLs to the MicrosoftML mxLibs directory. Currently only CUDA acceleration is supported.
The demo uses the following R packages, other than those that come with R Server:
* [dplyr](https://cloud.r-project.org/package=dplyr)
* [imager](https://cloud.r-project.org/package=imager)
* [purrr](https://cloud.r-project.org/package=purrr)
* [shiny](https://cloud.r-project.org/package=shiny)
* [shinyjs](https://cloud.r-project.org/package=shinyjs)
* [RMLtools](https://github.com/andrie/RMLtools)
All of these are available on CRAN, except for RMLtools, which is on GitHub. You can install this package using devtools:
``` r
install.packages("devtools")
devtools::install_github("andrie/RMLtools")
```
In addition, the imager package and its dependencies must be installed into a location accessible by SQL Server R Services, so that they are available to R code running inside a stored procedure.
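To check which of the CRAN packages listed above still need installing in your own R library, something like the following may help. This is a minimal sketch: the helper name `missingPkgs` is hypothetical and not part of the repo, and it only inspects the library paths of the current R session, not the library used by SQL Server R Services.

``` r
# CRAN packages named in this section (RMLtools comes from GitHub, see above).
cranPkgs <- c("dplyr", "imager", "purrr", "shiny", "shinyjs")

# Hypothetical helper: return the names in `pkgs` that are not installed
# in any library on the current session's library path.
missingPkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

toInstall <- missingPkgs(cranPkgs)
print(toInstall)

# To install whatever is missing, uncomment the next line:
# install.packages(toInstall)
```

The install call is left commented out so that running the snippet only reports what is missing and never changes your library.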
### Project settings
The script `code/settings.R` is used to set project options. It will read two additional files which you should put into the `code` directory:
* `deployDbConnStr.txt` containing the ODBC connection string for your SQL Server instance.
* `deployCred.txt` containing your RServe login credentials (for R Server operationalisation).
You should also edit `settings.R` to set the paths where you want the raw and processed galaxy images saved. The image files total about 2.4 GB after processing, so make sure you point this to a location that has enough space.
Finally, run the script `code/dataprep/initDataDirs.R`. This will create the `data` directory structure in which model objects, R data frames, etc. are stored.
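As a rough illustration of how the connection string file is consumed, the snippet below mirrors the `file.exists`/`readLines` pattern that `settings.R` uses for `deployDbConnStr.txt`. The helper name `readFirstLine`, the temporary directory standing in for `code`, and the connection string itself are assumptions made for the example. The format of `deployCred.txt` is not described here, so the sketch only shows what happens when that file is absent.

``` r
# Hypothetical helper (not part of the repo): return the first line of a
# file, or NULL if the file does not exist.
readFirstLine <- function(path) {
  if (file.exists(path)) readLines(path, warn = FALSE)[1] else NULL
}

# A temporary directory stands in for the project's `code` directory.
codeDir <- tempfile("code")
dir.create(codeDir)
connFile <- file.path(codeDir, "deployDbConnStr.txt")

# Placeholder ODBC connection string; substitute your own driver,
# server and database names.
placeholder <- "Driver=SQL Server;Server=<server>;Database=<database>;Trusted_Connection=yes"
writeLines(placeholder, connFile)

connStr <- readFirstLine(connFile)                               # the string written above
credStr <- readFirstLine(file.path(codeDir, "deployCred.txt"))   # NULL: file not created
```

Only the first line of each file is read, so the connection string should sit on a single line.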