documentation tweak

2017-03-12 05:16:42 +11:00 · 2017-03-12 05:16:42 +11:00 · a44b5adef4
--- a/GalaxyClassificationWorkflow/README.md
+++ b/GalaxyClassificationWorkflow/README.md
@@ -7,36 +7,17 @@ This is the repo for the data science workflow demo presented at Microsoft Ignit

 You can see a video of the presentation on [Channel 9](https://channel9.msdn.com/events/Ignite/Australia-2017/DA334), and the slide deck itself is at [docs/galaxy-ignite.pptx](docs/galaxy-ignite.pptx).

-The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). See [cite.md](cite.md) for a list of acknowledgements and citations. To avoid spamming the SDSS website, the data has been copied to Azure blob storage.
+The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). To avoid spamming the SDSS website, the data has been copied to Azure blob storage. See [cite.md](cite.md) for a list of acknowledgements and citations.


-## Setup and configuration
+## Setting up the backend

-This demo assumes that you have access to a SQL Server instance, and have already setup SQL Server R Services and R Server Operationalisation. For more information about these, see the following MSDN pages:
-
-* [Setting up SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt696069.aspx)
-* [Configuring R Server for Operationalisation](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/configuration-initial)
-
-The MicrosoftML package can use GPU acceleration to fit neural network models. To enable this, see the help for `MicrosoftML::rxNeuralNet`. In a nutshell, you install the NVidia [CUDA Toolkit 6.5](https://developer.nvidia.com/cuda-toolkit-65) and [cuDNN v2 Library](https://developer.nvidia.com/rdp/cudnn-archive), and then copy some .dlls to the MicrosoftML mxLibs directory. Currently only CUDA acceleration is supported.
-
-You'll also need the following R packages installed, other than those that come with R Server: dplyr, imager, purrr, shiny, shinyjs, RMLtools. All of these are available on CRAN, except for RMLtools which is on Github. You can install this package using devtools:
-
-    install.packages("devtools")
-    devtools::install_github("andrie/RMLtools")
-
-The script `code/settings.R` is used to set some configuration options. It will read two additional files which you should put into the `code` directory:
-
-* `deployDbConnStr.txt` containing the ODBC connection string for your SQL Server instance.
-* `deployCred.txt` containing your RServe login credentials (for R Server operationalisation).
-
-You should also edit `settings.R` to set the paths where you want the raw and processed galaxy images saved. The image files total about 2.4GB after processing, so make sure you point this to a location that has enough space.
-
-Finally, run the script `code/dataprep/initDataDirs.R`. This will create the `data` directory structure in which model objects, R data frames etc are stored.
+For a description of the steps required, see [setup.md](setup.md).


 ## Running the demo

-Once you've setup your backend, carry out the following steps to run the demo.
+Once you've set up the backend, carry out the following steps to run the demo.

 1. Run `code/dataprep/downloadCatalog.R` to download the galaxy catalog.

@@ -48,7 +29,7 @@ Once you've setup your backend, carry out the following steps to run the demo.

 5. (Optional) Modify `code/model/trainModel.R` to fit only the specific neural network model you want to keep. As supplied, the script will fit three models which are minor variations on each other.

-6. Run `code/model/trainModel.R` to fit the chosen model(s). It's highly recommended that you have GPU acceleration enabled (see above).
+6. Run `code/model/trainModel.R` to fit the chosen model(s).

 7. Run `code/deploy/deploySqlModel.R`. This will serialise the model(s) to SQL Server, and also create a stored procedure for scoring new data.

@@ -56,9 +37,11 @@ Once you've setup your backend, carry out the following steps to run the demo.

 9. (Optional) Run `code/testing/testApi.R` to test that everything works.

+10. (Optional, if you have [autorest](https://www.nuget.org/packages/autorest/) installed) Run `autorest -CodeGenerator <yourLanguage> -Input "<path-to-demo>\data\output\swagPredictGalaxyClass.json"` to generate code to consume the API.
+
 10. At the R prompt, run the shiny frontend:

-```
+``` r
 library(shiny)
 runApp("code/frontend")
 ```
--- a/GalaxyClassificationWorkflow/code/deploy/deploySqlModel.R
+++ b/GalaxyClassificationWorkflow/code/deploy/deploySqlModel.R
@@ -72,7 +72,7 @@ spBasePredictGalaxyClass <- function(model, imgData)
        img <- as.raw(strtoi(img, base=16))
        writeBin(img, inFile)
        
-        outFile <- basename(outFile)  # workaround bug in imager::save.image: save to home dir, manually delete on exit
+        outFile <- basename(outFile)  # workaround bug in imager::save.image 0.31: save to workdir, delete on exit
        imager::load.image(inFile) %>% 
            resizeGalaxy(c(424, 424)) %>%  # match dimensions of training images
            cropGalaxy(0.26) %>%
--- a/GalaxyClassificationWorkflow/code/settings.R
+++ b/GalaxyClassificationWorkflow/code/settings.R
@@ -3,9 +3,9 @@

 settings <- as.environment(list())

-settings$imgPath <- 'd:/data/galaxyzoo/img'
+settings$imgPath <- 'data/img'

-settings$procImgPath <- 'd:/data/galaxyzoo/processed'
+settings$procImgPath <- 'data/imgProcessed'

 settings$deployDbConnStr <- if(file.exists("code/deployDbConnStr.txt")) readLines("code/deployDbConnStr.txt")[1] else NULL

--- a/GalaxyClassificationWorkflow/setup.md
+++ b/GalaxyClassificationWorkflow/setup.md
@@ -0,0 +1,44 @@
+## Setup
+
+### Backend configuration
+
+This demo assumes that you have access to a SQL Server instance, and have already set up SQL Server R Services and R Server Operationalisation. For more information about these, see the following MSDN pages:
+
+* [Setting up SQL Server R Services](https://msdn.microsoft.com/en-us/library/mt696069.aspx)
+* [Configuring R Server for Operationalisation](https://msdn.microsoft.com/en-us/microsoft-r/operationalize/configuration-initial)
+
+You'll also need to ensure that the R Server Operationalisation accounts can access the database. See your database administrator for help if you are using a shared SQL Server instance.
+
+
+### R configuration
+
+The MicrosoftML package can use GPU acceleration to fit neural network models, and this is highly recommended. To enable it, see the help for `MicrosoftML::rxNeuralNet`. In a nutshell, you install the NVIDIA [CUDA Toolkit 6.5](https://developer.nvidia.com/cuda-toolkit-65) and [cuDNN v2 Library](https://developer.nvidia.com/rdp/cudnn-archive), and then copy some .dll files to the MicrosoftML mxLibs directory. Currently only CUDA acceleration is supported.
+
+The demo uses the following R packages, other than those that come with R Server:
+
+* [dplyr](https://cloud.r-project.org/package=dplyr)
+* [imager](https://cloud.r-project.org/package=imager)
+* [purrr](https://cloud.r-project.org/package=purrr)
+* [shiny](https://cloud.r-project.org/package=shiny)
+* [shinyjs](https://cloud.r-project.org/package=shinyjs)
+* [RMLtools](https://github.com/andrie/RMLtools)
+
+All of these are available on CRAN, except for RMLtools, which is on GitHub. You can install RMLtools using devtools:
+``` r
+install.packages("devtools")
+devtools::install_github("andrie/RMLtools")
+```
+In addition, the imager package and its dependencies must be installed into a location accessible by SQL Server R Services, so that they are available to R code running inside a stored procedure.
+
+
+### Project settings
+
+The script `code/settings.R` is used to set project options. It will read two additional files which you should put into the `code` directory:
+
+* `deployDbConnStr.txt` containing the ODBC connection string for your SQL Server instance.
+* `deployCred.txt` containing your RServe login credentials (for R Server operationalisation).
+
+You should also edit `settings.R` to set the paths where you want the raw and processed galaxy images saved. The image files total about 2.4 GB after processing, so make sure these paths point to a location that has enough space.
+
+Finally, run the script `code/dataprep/initDataDirs.R`. This will create the `data` directory structure in which model objects, R data frames, etc. are stored.
+