Sample Project for creating an RRS Recommendation System using AzureML
dciborow 4b9f92100b working web service 2018-02-12 15:54:33 -05:00
.azureml Initial commit 2018-02-12 02:17:20 +00:00
aml_config refactoring a little bit 2018-02-12 12:00:13 -05:00
assets refactoring a little bit 2018-02-12 12:00:13 -05:00
code working web service 2018-02-12 15:54:33 -05:00
docs Initial commit 2018-02-12 02:17:20 +00:00
sample_data adding sample files and test notebook\\ 2018-02-12 08:11:59 -05:00
.gitignore refactoring a little bit 2018-02-12 12:00:13 -05:00
LICENSE Initial commit 2018-02-12 02:17:20 +00:00
Readme.md Initial commit 2018-02-12 02:17:20 +00:00
Train.ipynb adding sample files and test notebook\\ 2018-02-12 08:11:59 -05:00
ratings.dsource adding sample files and test notebook\\ 2018-02-12 08:11:59 -05:00
ratings.dsource.user working web service 2018-02-12 15:54:33 -05:00
ratings.ipynb refactoring a little bit 2018-02-12 12:00:13 -05:00
score.py working web service 2018-02-12 15:54:33 -05:00
score_iris.py refactoring a little bit 2018-02-12 12:00:13 -05:00
service_schema.json refactoring a little bit 2018-02-12 12:00:13 -05:00

Readme.md

TDSP Project Dashboard

Summary

This is the project dashboard where you put key project information (for example, a project summary with relevant links). In your actual project, replace the rest of the content with a project-specific summary.

Team Data Science Process From Microsoft (TDSP)

This repository contains an instantiation of the Team Data Science Process (TDSP) from Microsoft for Azure Machine Learning projects. The TDSP is an agile, iterative data science methodology designed to improve team collaboration and learning. It facilitates better-coordinated and more productive data science enterprises by providing:

  • a lifecycle that defines the steps in project development
  • a standard project structure
  • artifact templates for reporting
  • tools to assist with data science tasks and project execution

Information About TDSP In Azure Machine Learning

When you instantiate the TDSP from Azure Machine Learning, you get the TDSP-recommended standardized directory structure and document templates for project execution and delivery. The workflow then consists of the following steps:

  • modify the documentation templates provided here for your project
  • execute your project (fill in with your project's code, documents, and artifact outputs)
  • prepare the Data Science deliverables for your client or customer, including the ProjectReport.md report.

We provide instructions on how to instantiate and use TDSP in Azure Machine Learning.

The Data Science Lifecycle

TDSP uses the data science lifecycle to structure projects. The lifecycle defines the steps that a project typically must execute, from start to finish. This lifecycle is valid for data science projects that build data products and intelligent applications that include predictive analytics. The goal is to incorporate machine learning or artificial intelligence (AI) models into commercial products. Exploratory data science projects or ad hoc/one-off analytics projects can also use this process, but in those cases some steps of the lifecycle may not be needed.

Here is a depiction of the TDSP lifecycle.

The TDSP data science lifecycle is composed of four major stages that are executed iteratively:

  • Business Understanding
  • Data Acquisition and Understanding
  • Modeling
  • Deployment

These stages should, ideally, be followed by customer acceptance for successful projects.

If you are using a different lifecycle schema, such as CRISP-DM, KDD, or your own custom process that is working well in your organization, you can still use the TDSP in the context of those development lifecycles.

For reference, see a more detailed description of the TDSP lifecycle. That version also provides additional documentation templates that are associated with each phase of the TDSP lifecycle.

Documenting Your Project

Refer to TDSP documentation templates to see how you can document your project for efficient collaboration and reproducibility. In the current Azure Machine Learning TDSP documentation template, we recommend that you include all the information in the ProjectReport file. This template should be filled out with information that is specific to your project.

In addition to the ProjectReport, which serves as the primary project document, we provide another template, ProjectLearnings, for any learnings and information that may not belong in the primary project document but are still useful to record.

Documents received from a customer can be stored in .\docs\customer_docs. Documents prepared for sharing information with a customer (for example, ProjectReport, graphs, tables, etc.) can be stored in .\docs\deliverable_docs.

Project Folder Structure

The TDSP project template contains the following top-level folders:

  1. code: Contains code
  2. docs: Contains necessary documentation about the project
  3. sample_data: Contains SAMPLE (small) data that can be used for early development or testing. Typically no more than a few (about 5) MB. Not for full or large datasets.
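The folder layout listed above can be created or checked with a few lines of Python. This is a minimal sketch, not part of the template itself: the function names `scaffold_project` and `missing_folders` are hypothetical, and only the three folder names come from the list.

```python
from pathlib import Path

# Folder names taken from the list above.
TOP_LEVEL_FOLDERS = ("code", "docs", "sample_data")


def scaffold_project(root):
    """Create the top-level TDSP folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in TOP_LEVEL_FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def missing_folders(root):
    """Return the names of expected top-level folders that do not exist."""
    root = Path(root)
    return [name for name in TOP_LEVEL_FOLDERS if not (root / name).is_dir()]
```

Calling `missing_folders(".")` from the project root gives a quick check that a cloned project still follows the template.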

NOTE: Other than the readme.md file, all documentation-related content (text, Markdown files, images, and other documents) that is NOT used during project execution must reside in the folder named “docs” (all lowercase). Azure Machine Learning ignores this special folder during execution, so its contents are not copied to the compute target unnecessarily. Objects in this folder also don't count toward the 25-MB cap on project size, so you can, for example, store large image files needed for your documentation there. They are still tracked by Git through Run History.
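To see roughly how close a project is to the size cap mentioned in the note, the files outside the docs folder can be summed locally. This is a rough sketch under stated assumptions: the helper names are hypothetical, 1 MB is taken as 1024 × 1024 bytes, and only the top-level docs folder is excluded. How Azure Machine Learning itself measures project size (for example, whether hidden folders count) is not specified here, so treat the result as an estimate.

```python
import os
from pathlib import Path

# Assumption: the 25-MB cap is interpreted as 25 * 1024 * 1024 bytes.
PROJECT_CAP_BYTES = 25 * 1024 * 1024
# Only the top-level "docs" folder (all lowercase) is excluded, per the note.
EXCLUDED_TOP_LEVEL = {"docs"}


def project_size_bytes(root):
    """Sum file sizes under root, skipping excluded top-level folders."""
    root = Path(root)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if Path(dirpath) == root:
            # Prune in place so os.walk never descends into excluded folders.
            dirnames[:] = [d for d in dirnames if d not in EXCLUDED_TOP_LEVEL]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file():  # skips broken symlinks
                total += path.stat().st_size
    return total


def within_cap(root, cap=PROJECT_CAP_BYTES):
    """Return True if the estimated project size does not exceed cap."""
    return project_size_bytes(root) <= cap
```

Running `within_cap(".")` from the project root before submitting a run is one way to catch large files that belong in docs or outside the project.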

Project Planning And Execution

Detailed instructions for deploying Visual Studio Online (Team Services) to plan, manage, and execute your data science projects are provided here.

Release Notes

This release of the template is associated with the preview release of Azure Machine Learning (September 2017). We are continuously improving TDSP based on customer experience and feedback, and releasing new features. Refer to the TDSP page for more information.

Ask Questions

We would love to hear about your experience with the TDSP. If you have any questions or suggestions, create a new discussion thread on the Issues tab.