Galaxy classification: a data science workflow with Microsoft R Server 9
This is the repo for the data science workflow demo presented at Microsoft Ignite 2017. This demo showcases a number of features released as part of Microsoft R Server 9, combined into an example workflow for classifying galaxy images:
- MicrosoftML, a powerful package for machine learning
- Easy deployment of models using SQL Server R Services
- Creating web service APIs with R Server Operationalisation (previously known as DeployR)
You can see a video of the presentation on Channel 9, and the slide deck itself is at docs/galaxy-ignite.pptx.
The original data for the demo was obtained from the Sloan Digital Sky Survey (http://www.sdss.org). To avoid spamming the SDSS website, the data has been copied to Azure blob storage. See cite.md for a list of acknowledgements and citations.
Setting up the backend
For a description of the steps required, see setup.md.
Running the demo
Once you've set up the backend, carry out the following steps to run the demo.
- Run `code/dataprep/downloadCatalog.R` to download the galaxy catalog.
- Run `code/dataprep/downloadImages.R`. This will download a zip archive containing the individual galaxy images, and then extract them.
- Run `code/dataprep/processImages.R` to process the images for analysis. (This will take a while.)
- Run `code/dataprep/genDataset.R` to create the analysis dataset. This is a data frame that contains the names of the image files that will be used by the modelling functions, not the image data as such.
- (Optional) Modify `code/model/trainModel.R` to fit only the specific neural network model you want to keep. As supplied, the script will fit three models which are minor variations on each other.
- Run `code/model/trainModel.R` to fit the chosen model(s).
- Run `code/deploy/deploySqlModel.R`. This will serialise the model(s) to SQL Server, and also create a stored procedure for scoring new data.
- Run `code/deploy/deployModel.R` to create a web service that will call the SQL stored procedure.
- (Optional) Run `code/testing/testApi.R` to test that everything works.
- (Optional, if you have autorest installed) Run `autorest -CodeGenerator <yourLanguage> -Input "<path-to-demo>\data\output\swagPredictGalaxyClass.json"` to generate code to consume the API.
- At the R prompt, run the shiny frontend:

      library(shiny)
      runApp("code/frontend")
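The required script-running steps above can also be chained from a single R session. The following is a minimal sketch, not part of the repo: `pipeline_scripts` and `run_pipeline()` are hypothetical names, and it assumes that each script can be sourced with the repo root as the working directory, which the scripts themselves may not guarantee.

```r
# Script paths as listed in the steps above, in run order.
pipeline_scripts <- c(
  "code/dataprep/downloadCatalog.R",
  "code/dataprep/downloadImages.R",
  "code/dataprep/processImages.R",
  "code/dataprep/genDataset.R",
  "code/model/trainModel.R",
  "code/deploy/deploySqlModel.R",
  "code/deploy/deployModel.R"
)

# run_pipeline() is a hypothetical helper, not something the repo provides.
# `root` is assumed to be the repo root; `runner` defaults to source(),
# and can be swapped for another function to do a dry run.
run_pipeline <- function(scripts, root = ".", runner = source) {
  paths <- file.path(root, scripts)

  # Fail before running anything if a script cannot be found.
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Missing script(s): ", paste(not_found, collapse = ", "))
  }

  # Run the scripts one at a time, in the order given.
  for (p in paths) {
    message("Running ", p)
    runner(p)
  }
  invisible(paths)
}

# Usage, from the repo root (commented out because it takes a long time):
# run_pipeline(pipeline_scripts)
```

The optional API test and the shiny frontend are left out of the sketch, so they can be run by hand once deployment has succeeded.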
More information
- About Microsoft R Server
- Team Data Science Process
- Sloan Digital Sky Survey
- Galaxy Zoo
- Mapping the Universe with SQL Server -- in-depth article on TechNet