Data Science Accelerator - Spark based movie recommender
Overview
This accelerator illustrates how to build a movie recommendation system efficiently, within 30 minutes.
The repository contains three parts:
- Data: schemas and references to the sample data used in the accelerator.
- Code: code for training and scoring a movie recommender.
- Docs: documents that help build a recommender with Azure Machine Learning Service.
Business domain
Recommendation (e-commerce, entertainment, retail, etc.).
Data science problem
The problem a recommendation system tries to solve is:
Given historical observations of user preferences (i.e., ratings) on a set of items, predict and generate the set of items that each user is most likely to like.
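As a minimal sketch of the "generate a set of items" step, assume a model has already produced a predicted score per item for one user. The helper `top_n`, the item IDs, and the scores below are hypothetical and only illustrate the idea of ranking items the user has not rated yet.

```python
import heapq


def top_n(scores, seen, n=3):
    """Return the n highest-scoring item IDs that the user has not rated yet."""
    candidates = ((item, s) for item, s in scores.items() if item not in seen)
    return [item for item, _ in heapq.nlargest(n, candidates, key=lambda p: p[1])]


# Hypothetical predicted scores for a single user (not real model output).
predicted = {"A": 4.6, "B": 3.1, "C": 4.9, "D": 2.0}
already_rated = {"C"}  # items the user has rated are excluded from the result

recommended = top_n(predicted, already_rated, n=2)
print(recommended)
```

Excluding already-rated items is a design choice made for this sketch; some applications may prefer to keep them.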
Data understanding
Typically, data in a recommendation system has the schema |user|item|rating|[timestamp]|, where user, item, and rating refer to the user ID, the item ID, and the rating given by the user to the item, respectively. The timestamp column is optional.
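The schema above could be represented as follows. This is a sketch only: it assumes comma-separated rows without a header, integer IDs, and an integer timestamp, none of which are stated by the accelerator. The `Rating` class, the `parse_ratings` helper, and the sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rating:
    user: int                        # user ID
    item: int                        # item ID
    rating: float                    # rating the user gave to the item
    timestamp: Optional[int] = None  # optional, matching [timestamp] in the schema


def parse_ratings(text: str) -> List[Rating]:
    """Parse comma-separated user,item,rating[,timestamp] rows (assumed format)."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        ts = int(fields[3]) if len(fields) > 3 else None
        rows.append(Rating(int(fields[0]), int(fields[1]), float(fields[2]), ts))
    return rows


# Made-up sample rows; the second row omits the optional timestamp.
sample = "1,10,4.0,964982703\n1,20,3.5\n2,10,5.0,964981247\n"
ratings = parse_ratings(sample)
print(ratings[0])
```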
Modeling
The recommender is built with Spark's built-in collaborative filtering algorithm, a matrix factorization method whose regularized objective is optimized with the alternating least squares (ALS) technique.
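To make the idea of alternating least squares concrete, here is a toy pure-Python sketch. It is not Spark's implementation and not the accelerator's code: the function names (`solve`, `update`, `als`, `predict`, `objective`), the hyperparameter values, and the ratings are all assumptions chosen for illustration. It factorizes the ratings into user and item vectors by fixing one side and solving a small regularized least-squares problem for the other, then swapping.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, ratings_by_key, rank, reg):
    """Recompute one side's factors while the other side stays fixed."""
    out = {}
    for key, pairs in ratings_by_key.items():
        # Normal equations: (sum v v^T + reg * I) x = sum r * v
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        out[key] = solve(a, b)
    return out


def als(ratings, rank=2, reg=0.1, iters=20, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, rank, reg)  # items fixed
        items = update(users, by_item, rank, reg)  # users fixed
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of the user and item vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))


def objective(ratings, users, items, reg):
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    err = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    vectors = list(users.values()) + list(items.values())
    return err + reg * sum(x * x for vec in vectors for x in vec)


# Made-up (user, item, rating) triples for illustration.
toy = [(1, 10, 5.0), (1, 20, 3.0), (2, 10, 4.0),
       (2, 30, 1.0), (3, 20, 2.0), (3, 30, 5.0)]
users, items = als(toy)
print(round(predict(users, items, 1, 10), 2))
```

Because each half-step exactly minimizes the objective over one side's factors, the regularized loss never increases from one iteration to the next. In practice the accelerator relies on Spark's distributed implementation rather than a loop like this.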
Solution architecture
The recommendation solution is built from Azure services, including Azure Data Science Virtual Machine, Azure Blob Storage, Azure Container Registry, and Azure Container Services. The build process is carried out with Azure Machine Learning Service.