documentation tweak
Parent: fdb68aa2f2
Commit: a44b5adef4
@@ -7,36 +7,17 @@ This is the repo for the data science workflow demo presented at Microsoft Ignit
You can see a video of the presentation on [Channel 9](https://channel9.msdn.com/events/Ignite/Australia-2017/DA334), and the slide deck itself is at [docs/galaxy-ignite.pptx](docs/galaxy-ignite.pptx).
The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). See [cite.md](cite.md) for a list of acknowledgements and citations. To avoid spamming the SDSS website, the data has been copied to Azure blob storage.
The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). To avoid spamming the SDSS website, the data has been copied to Azure blob storage. See [cite.md](cite.md) for a list of acknowledgements and citations.
## Setup and configuration
## Setting up the backend
This demo assumes that you have access to a SQL Server instance, and have already set up SQL Server R Services and R Server Operationalisation. For more information about these, see the following MSDN pages:
* [Setting up SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt696069.aspx)
* [Configuring R Server for Operationalisation](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/configuration-initial)
The MicrosoftML package can use GPU acceleration to fit neural network models. To enable this, see the help for `MicrosoftML::rxNeuralNet`. In a nutshell, you install the NVIDIA [CUDA Toolkit 6.5](https://developer.nvidia.com/cuda-toolkit-65) and [cuDNN v2 Library](https://developer.nvidia.com/rdp/cudnn-archive), and then copy some DLLs to the MicrosoftML mxLibs directory. Currently only CUDA acceleration is supported.
You'll also need the following R packages installed, other than those that come with R Server: dplyr, imager, purrr, shiny, shinyjs, RMLtools. All of these are available on CRAN, except for RMLtools, which is on GitHub. You can install this package using devtools:
install.packages("devtools")
devtools::install_github("andrie/RMLtools")
The script `code/settings.R` is used to set some configuration options. It will read two additional files which you should put into the `code` directory:
* `deployDbConnStr.txt` containing the ODBC connection string for your SQL Server instance.
* `deployCred.txt` containing your RServe login credentials (for R Server operationalisation).
You should also edit `settings.R` to set the paths where you want the raw and processed galaxy images saved. The image files total about 2.4GB after processing, so make sure you point this to a location that has enough space.
Finally, run the script `code/dataprep/initDataDirs.R`. This will create the `data` directory structure in which model objects, R data frames, etc. are stored.
For a description of the steps required, see [setup.md](setup.md).
## Running the demo
Once you've set up your backend, carry out the following steps to run the demo.
Once you've set up the backend, carry out the following steps to run the demo.
1. Run `code/dataprep/downloadCatalog.R` to download the galaxy catalog.
@@ -48,7 +29,7 @@ Once you've setup your backend, carry out the following steps to run the demo.
5. (Optional) Modify `code/model/trainModel.R` to fit only the specific neural network model you want to keep. As supplied, the script will fit three models which are minor variations on each other.
6. Run `code/model/trainModel.R` to fit the chosen model(s). It's highly recommended that you have GPU acceleration enabled (see above).
6. Run `code/model/trainModel.R` to fit the chosen model(s).
7. Run `code/deploy/deploySqlModel.R`. This will serialise the model(s) to SQL Server, and also create a stored procedure for scoring new data.
@@ -56,9 +37,11 @@ Once you've setup your backend, carry out the following steps to run the demo.
9. (Optional) Run `code/testing/testApi.R` to test that everything works.
10. (Optional, if you have [autorest](https://www.nuget.org/packages/autorest/) installed) Run `autorest -CodeGenerator <yourLanguage> -Input "<path-to-demo>\data\output\swagPredictGalaxyClass.json"` to generate code to consume the API.
10. At the R prompt, run the shiny frontend:
```
``` r
library(shiny)
runApp("code/frontend")
```
@@ -72,7 +72,7 @@ spBasePredictGalaxyClass <- function(model, imgData)
img <- as.raw(strtoi(img, base=16))
writeBin(img, inFile)
outFile <- basename(outFile) # workaround bug in imager::save.image: save to home dir, manually delete on exit
outFile <- basename(outFile) # workaround bug in imager::save.image 0.31: save to workdir, delete on exit
imager::load.image(inFile) %>%
resizeGalaxy(c(424, 424)) %>% # match dimensions of training images
cropGalaxy(0.26) %>%
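The scoring function receives the image as hex text and converts it back to bytes before `imager` loads it. Below is a minimal base-R sketch of that round trip. It assumes the image arrives as a single hex string (the real `imgData` may already be split into byte pairs). `hexToRaw`, `writeHexToWorkdir` and `roundTrip` are hypothetical helpers and are not part of the project. The `on.exit()` call follows the "save to workdir, delete on exit" workaround described in the comment above.

```r
# Hypothetical helper (not the project's code): decode a hex string such as
# "89504e47" into a raw vector, two hex digits per byte.
hexToRaw <- function(hex) {
  n <- nchar(hex)
  if (n == 0) return(raw(0))
  if (n %% 2 != 0) stop("hex string must have an even number of digits")
  starts <- seq(1, n - 1, by = 2)
  pairs <- substring(hex, starts, starts + 1)   # e.g. "89", "50", "4e", "47"
  as.raw(strtoi(pairs, base = 16L))
}

# Write the bytes into the working directory; basename() drops any directory part.
writeHexToWorkdir <- function(hex, path) {
  outFile <- basename(path)
  writeBin(hexToRaw(hex), outFile)
  outFile
}

# Write, read back, and delete the file when the function exits (even on error).
roundTrip <- function(hex, path = "galaxy-tmp.bin") {
  f <- writeHexToWorkdir(hex, path)
  on.exit(unlink(f), add = TRUE)
  readBin(f, what = "raw", n = file.size(f))
}
```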
@@ -3,9 +3,9 @@
settings <- as.environment(list())
settings$imgPath <- 'd:/data/galaxyzoo/img'
settings$imgPath <- 'data/img'
settings$procImgPath <- 'd:/data/galaxyzoo/processed'
settings$procImgPath <- 'data/imgProcessed'
settings$deployDbConnStr <- if(file.exists("code/deployDbConnStr.txt")) readLines("code/deployDbConnStr.txt")[1] else NULL
@@ -0,0 +1,44 @@
## Setup
### Backend configuration
This demo assumes that you have access to a SQL Server instance, and have already set up SQL Server R Services and R Server Operationalisation. For more information about these, see the following MSDN pages:
* [Setting up SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt696069.aspx)
* [Configuring R Server for Operationalisation](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/configuration-initial)
You'll also need to ensure that the R Server Operationalisation accounts can access the database. See your database administrator for help if you are using a shared SQL Server instance.
### R configuration
The MicrosoftML package can use GPU acceleration to fit neural network models, and this is highly recommended. To enable this, see the help for `MicrosoftML::rxNeuralNet`. In a nutshell, you install the NVIDIA [CUDA Toolkit 6.5](https://developer.nvidia.com/cuda-toolkit-65) and [cuDNN v2 Library](https://developer.nvidia.com/rdp/cudnn-archive), and then copy some DLLs to the MicrosoftML mxLibs directory. Currently only CUDA acceleration is supported.
The demo uses the following R packages, other than those that come with R Server:
* [dplyr](https://cloud.r-project.org/package=dplyr)
* [imager](https://cloud.r-project.org/package=imager)
* [purrr](https://cloud.r-project.org/package=purrr)
* [shiny](https://cloud.r-project.org/package=shiny)
* [shinyjs](https://cloud.r-project.org/package=shinyjs)
* [RMLtools](https://github.com/andrie/RMLtools)
All of these are available on CRAN, except for RMLtools, which is on GitHub. You can install this package using devtools:
``` r
install.packages("devtools")
devtools::install_github("andrie/RMLtools")
```
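Before running the demo, you can check which of the packages listed above are still missing with a short base-R sketch like the one below. `missingPkgs` and `installMissing` are hypothetical helpers. The `lib` argument of `missingPkgs` can point at the library tree that SQL Server R Services uses, which matters for the imager requirement described next.

```r
# CRAN packages used by the demo (RMLtools comes from GitHub instead).
cranPkgs <- c("dplyr", "imager", "purrr", "shiny", "shinyjs")

# Return the entries of `pkgs` that are not installed in the library tree(s) `lib`.
missingPkgs <- function(pkgs, lib = .libPaths()) {
  setdiff(pkgs, rownames(installed.packages(lib.loc = lib)))
}

# Install whatever is missing: CRAN packages first, then RMLtools via devtools.
installMissing <- function() {
  todo <- missingPkgs(cranPkgs)
  if (length(todo) > 0) install.packages(todo)
  if (length(missingPkgs("RMLtools")) > 0) {
    if (length(missingPkgs("devtools")) > 0) install.packages("devtools")
    devtools::install_github("andrie/RMLtools")
  }
  invisible(todo)
}
```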
In addition, the imager package and its dependencies must be installed into a library location that SQL Server R Services can access, so that they are available to R code running inside a stored procedure.
### Project settings
The script `code/settings.R` is used to set project options. It will read two additional files which you should put into the `code` directory:
* `deployDbConnStr.txt` containing the ODBC connection string for your SQL Server instance.
* `deployCred.txt` containing your RServe login credentials (for R Server operationalisation).
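`code/settings.R` uses only the first line of `deployDbConnStr.txt` and leaves the setting as `NULL` if the file is missing. Below is a minimal sketch of creating and reading that file from R. The connection string is a placeholder in the usual ODBC `key=value;` form, so replace the driver, server and database names with your own. `writeConnStr` and `readConnStr` are hypothetical helpers. The format of `deployCred.txt` is not specified, so it is not covered here.

```r
# Placeholder ODBC connection string: substitute your own server and database.
connStr <- "Driver=SQL Server;Server=myServer;Database=myDatabase;Trusted_Connection=Yes"

# Write the connection string as the only line of the file.
writeConnStr <- function(connStr, path = "code/deployDbConnStr.txt") {
  writeLines(connStr, path)
  invisible(path)
}

# Same rule as code/settings.R: first line if the file exists, otherwise NULL.
readConnStr <- function(path = "code/deployDbConnStr.txt") {
  if (file.exists(path)) readLines(path)[1] else NULL
}
```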
You should also edit `settings.R` to set the paths where you want the raw and processed galaxy images saved. The image files total about 2.4GB after processing, so make sure you point this to a location that has enough space.
Finally, run the script `code/dataprep/initDataDirs.R`. This will create the `data` directory structure in which model objects, R data frames, etc. are stored.
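This section does not list the folders that `initDataDirs.R` creates. The sketch below builds a `data` tree with `dir.create()`. The subdirectory names are guesses based on the relative paths in `settings.R` (`data/img`, `data/imgProcessed`) and the `data\output` path in the autorest step. `initDataDirs` is a hypothetical function and does not reproduce the script's actual contents.

```r
# Hypothetical sketch: the real initDataDirs.R may create a different layout.
initDataDirs <- function(root = "data",
                         subdirs = c("img", "imgProcessed", "output")) {
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # recursive = TRUE also creates `root`; showWarnings = FALSE keeps reruns quiet
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}
```

Because existing folders are skipped without a warning, running the function again is harmless.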