acceleratoRs/GalaxyClassificationWorkflow
hong-revo ab408c31da add slide deck 2017-04-20 21:13:13 +10:00
code project files 2017-03-14 13:31:55 +11:00
docs add slide deck 2017-04-20 21:13:13 +10:00
.gitattributes project files 2017-03-14 13:31:55 +11:00
.gitignore project files 2017-03-14 13:31:55 +11:00
README.md documentation tweak 2017-03-12 05:16:42 +11:00
cite.md refactor for RTVS compat 2017-03-11 18:56:17 +11:00
galaxy.rproj project files 2017-03-14 13:31:55 +11:00
galaxy.rxproj project files 2017-03-14 13:31:55 +11:00
galaxy.sln project files 2017-03-14 13:31:55 +11:00
setup.md documentation tweak 2017-03-12 05:16:42 +11:00

README.md

Galaxy classification: a data science workflow with Microsoft R Server 9

This is the repo for the data science workflow demo presented at Microsoft Ignite 2017. This demo showcases a number of features released as part of Microsoft R Server 9, combined into an example workflow for classifying galaxy images:

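  • MicrosoftML, a package of machine learning algorithms and data transforms (used here to fit neural network models to the galaxy images)
  • Deployment of models to SQL Server R Services, where they are stored and scored through a stored procedure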
  • Creating web service APIs with R Server Operationalisation (previously known as DeployR)

You can see a video of the presentation on Channel 9, and the slide deck itself is at docs/galaxy-ignite.pptx.

The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). To avoid putting unnecessary load on the SDSS website, the data has been copied to Azure blob storage. See cite.md for a list of acknowledgements and citations.

Setting up the backend

For a description of the steps required, see setup.md.

Running the demo

Once you've set up the backend, carry out the following steps to run the demo.

  1. Run code/dataprep/downloadCatalog.R to download the galaxy catalog.

  2. Run code/dataprep/downloadImages.R. This will download a zip archive containing the individual galaxy images, and then extract them.

  3. Run code/dataprep/processImages.R to process the images for analysis. (This will take a while.)

  4. Run code/dataprep/genDataset.R to create the analysis dataset. This is a data frame that contains the names of the image files that will be used by the modelling functions, not the image data as such.

  5. (Optional) Modify code/model/trainModel.R to fit only the specific neural network model you want to keep. As supplied, the script will fit three models which are minor variations on each other.

  6. Run code/model/trainModel.R to fit the chosen model(s).

  7. Run code/deploy/deploySqlModel.R. This will serialise the model(s) to SQL Server, and also create a stored procedure for scoring new data.

  8. Run code/deploy/deployModel.R to create a web service that will call the SQL stored procedure.

  9. (Optional) Run code/testing/testApi.R to test that everything works.

  10. (Optional, if you have autorest installed) Run autorest -CodeGenerator <yourLanguage> -Input "<path-to-demo>\data\output\swagPredictGalaxyClass.json" to generate code to consume the API.

  11. At the R prompt, run the Shiny frontend:

library(shiny)
runApp("code/frontend")
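You may prefer to run the non-optional scripts from a single R session. The sketch below drives them in the order listed above. It rests on assumptions that this README does not state. The working directory must be the repository root, and each script must run non-interactively via `source()`. If a script needs manual input, run it by hand instead. `demo_scripts` and `run_demo()` are hypothetical helpers and are not part of the repo.

```r
# Hypothetical helper, not part of the repo: source the demo scripts in order.
# Assumes the working directory is the repository root and that each script
# runs non-interactively when sourced.
demo_scripts <- c(
  "code/dataprep/downloadCatalog.R",  # step 1: download the galaxy catalog
  "code/dataprep/downloadImages.R",   # step 2: download and extract the images
  "code/dataprep/processImages.R",    # step 3: process the images (slow)
  "code/dataprep/genDataset.R",       # step 4: build the analysis dataset
  "code/model/trainModel.R",          # step 6: fit the chosen model(s)
  "code/deploy/deploySqlModel.R",     # step 7: serialise model(s) to SQL Server
  "code/deploy/deployModel.R"         # step 8: create the web service
)

run_demo <- function(scripts = demo_scripts, from = 1L, dry_run = FALSE) {
  stopifnot(from >= 1, from <= length(scripts))
  # Fail early if any script is missing, e.g. when run from the wrong directory
  missing <- scripts[!file.exists(scripts)]
  if (length(missing) > 0L) {
    stop("Scripts not found (is the working directory the repo root?): ",
         paste(missing, collapse = ", "))
  }
  todo <- scripts[seq.int(from, length(scripts))]
  for (s in todo) {
    message("Running ", s)
    # source() evaluates each script in the global environment by default
    if (!dry_run) source(s)
  }
  invisible(todo)
}
```

For example, `run_demo(from = 5)` resumes at model training once the slow image processing has finished. `run_demo(dry_run = TRUE)` only lists the scripts that would run. Step 5 is a manual edit and steps 9 to 11 are optional or interactive, so they are left out.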
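Step 4 notes that the analysis dataset holds the names of the image files rather than the image data. The sketch below illustrates that idea: a data frame indexes images on disk by path, and the pixels are read only when a model needs them. The function `make_image_index()`, the column names, the `.jpg` extension and the class labels are all hypothetical. They are not what genDataset.R actually produces.

```r
# Hypothetical illustration: a dataset that stores image file paths, not pixels.
# Column names, file extension and class labels are assumptions.
make_image_index <- function(img_dir, labels) {
  files <- list.files(img_dir, pattern = "\\.jpg$", full.names = TRUE)
  ids <- sub("\\.jpg$", "", basename(files))
  data.frame(
    id = ids,
    path = normalizePath(files),   # absolute path, to be read later by the modelling code
    class = unname(labels[ids]),   # NA for any image without a label
    stringsAsFactors = FALSE
  )
}

# Usage: two empty placeholder files stand in for galaxy images
img_dir <- file.path(tempdir(), "galaxy_demo")
dir.create(img_dir, showWarnings = FALSE)
invisible(file.create(file.path(img_dir, c("1001.jpg", "1002.jpg"))))
index <- make_image_index(img_dir, c("1001" = "spiral", "1002" = "elliptical"))
print(index)
```

Keeping only paths in the data frame keeps it small. The same dataset can then be reused with different image preprocessing without reloading the pixels into R.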

More information