Best Practices on Recommendation Systems

Recommenders

What's New (June 21, 2021)

We have a new release: Recommenders 0.6.0!

Recommenders is now on PyPI and can be installed using pip! In addition, the release includes many bug fixes and utility improvements.

Here you can find the PyPI page: https://pypi.org/project/recommenders/

Here you can find the package documentation: https://microsoft-recommenders.readthedocs.io/en/latest/

Introduction

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

Prepare Data: Preparing and loading data for each recommender algorithm
Model: Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
Evaluate: Evaluating algorithms with offline metrics
Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
Operationalize: Operationalizing models in a production environment on Azure

Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the recommenders documentation.

For a more detailed overview of the repository, please see the documents on the wiki page.

Getting Started

Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks.

The installation of the recommenders package has been tested with

Python version 3.6 and venv
Python versions 3.6, 3.7 and conda

and currently does not support version 3.8 and above. It is recommended to install the package and its dependencies inside a clean environment (such as conda or venv).
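The version constraint above can be checked programmatically before installing. The following is a minimal sketch, assuming only the versions listed in this section; the names `SUPPORTED_MINORS` and `is_supported` are hypothetical and not part of the recommenders package.

```python
import sys

# Versions this README reports as tested; 3.8 and above are stated as unsupported.
SUPPORTED_MINORS = {(3, 6), (3, 7)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the tested versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "in the tested set" if is_supported() else "not in the tested set"
    print(f"Python {major}.{minor} is {status}")
```

Running this inside the freshly created conda or venv environment shows whether the interpreter matches one of the tested versions.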

To set up on your local machine:

To install core utilities, CPU-based algorithms, and dependencies:

Ensure that the software required for compilation and the Python libraries are installed. On Linux, this can be done with:

sudo apt-get install -y build-essential libpython<version>

where <version> should be 3.6 or 3.7 as appropriate.

On Windows you will need Microsoft C++ Build Tools.

Create a conda or virtual environment. See the setup guide for more details.
Within the created environment, install the package from PyPI:

pip install --upgrade pip
pip install --upgrade setuptools
pip install recommenders[examples]

In the case of conda, you also need to

conda install numpy-base

after the pip installation.

Register your (conda or virtual) environment with Jupyter:

python -m ipykernel install --user --name my_environment_name --display-name "Python (reco)"

Start the Jupyter notebook server:

jupyter notebook

Run the SAR Python CPU MovieLens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".
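Before opening the notebook, it can help to confirm from inside the environment that the package is importable. The sketch below uses only the Python standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate a top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the current interpreter can find the package at all.
print("recommenders importable:", is_installed("recommenders"))
```

If this prints `False`, the kernel is most likely running in a different environment from the one where `pip install` was executed.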

For additional options to install the package (support for GPU, Spark etc.) see this guide.

NOTE - The Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Please follow the steps in the setup guide to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine and to follow the steps in the setup guide to set up Nvidia libraries.

NOTE for DSVM Users - If you encounter any issues, please follow the steps in the "Dependencies setup - Set PySpark environment variables on Linux or MacOS" and "Troubleshooting for the DSVM" sections.

DOCKER - Another easy way to try the recommenders repository and get started quickly is to build Docker images suitable for different environments.

Algorithms

The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.

Algorithm	Environment	Type	Description
Alternating Least Squares (ALS)	PySpark	Collaborative Filtering	Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability
Attentive Asynchronous Singular Value Decomposition (A2SVD)^*	Python CPU / Python GPU	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism
Cornac/Bayesian Personalized Ranking (BPR)	Python CPU	Collaborative Filtering	Matrix factorization algorithm for predicting item ranking with implicit feedback
Cornac/Bilateral Variational Autoencoder (BiVAE)	Python CPU / Python GPU	Collaborative Filtering	Generative model for dyadic data (e.g., user-item interactions)
Convolutional Sequence Embedding Recommendation (Caser)	Python CPU / Python GPU	Collaborative Filtering	Algorithm based on convolutions that aim to capture both user’s general preferences and sequential patterns
Deep Knowledge-Aware Network (DKN)^*	Python CPU / Python GPU	Content-Based Filtering	Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations
Extreme Deep Factorization Machine (xDeepFM)^*	Python CPU / Python GPU	Hybrid	Deep learning based algorithm for implicit and explicit feedback with user/item features
FastAI Embedding Dot Bias (FAST)	Python CPU / Python GPU	Collaborative Filtering	General purpose algorithm with embeddings and biases for users and items
LightFM/Hybrid Matrix Factorization	Python CPU	Hybrid	Hybrid matrix factorization algorithm for both implicit and explicit feedback
LightGBM/Gradient Boosting Tree^*	Python CPU / PySpark	Content-Based Filtering	Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems
LightGCN	Python CPU / Python GPU	Collaborative Filtering	Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback
GeoIMC^*	Python CPU	Hybrid	Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach.
GRU4Rec	Python CPU / Python GPU	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks
Multinomial VAE	Python CPU / Python GPU	Collaborative Filtering	Generative Model for predicting user/item interactions
Neural Recommendation with Long- and Short-term User Representations (LSTUR)^*	Python CPU / Python GPU	Content-Based Filtering	Neural recommendation algorithm with long- and short-term user interest modeling
Neural Recommendation with Attentive Multi-View Learning (NAML)^*	Python CPU / Python GPU	Content-Based Filtering	Neural recommendation algorithm with attentive multi-view learning
Neural Collaborative Filtering (NCF)	Python CPU / Python GPU	Collaborative Filtering	Deep learning algorithm with enhanced performance for implicit feedback
Neural Recommendation with Personalized Attention (NPA)^*	Python CPU / Python GPU	Content-Based Filtering	Neural recommendation algorithm with personalized attention network
Neural Recommendation with Multi-Head Self-Attention (NRMS)^*	Python CPU / Python GPU	Content-Based Filtering	Neural recommendation algorithm with multi-head self-attention
Next Item Recommendation (NextItNet)	Python CPU / Python GPU	Collaborative Filtering	Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns
Restricted Boltzmann Machines (RBM)	Python CPU / Python GPU	Collaborative Filtering	Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback
Riemannian Low-rank Matrix Completion (RLRMC)^*	Python CPU	Collaborative Filtering	Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption.
Simple Algorithm for Recommendation (SAR)^*	Python CPU	Collaborative Filtering	Similarity-based algorithm for implicit feedback dataset
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)^*	Python CPU / Python GPU	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller
Multi-Interest-Aware Sequential User Modeling (SUM)^*	Python CPU / Python GPU	Collaborative Filtering	An enhanced memory network-based sequential user model which aims to capture users' multiple interests.
Standard VAE	Python CPU / Python GPU	Collaborative Filtering	Generative Model for predicting user/item interactions
Surprise/Singular Value Decomposition (SVD)	Python CPU	Collaborative Filtering	Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large
Term Frequency - Inverse Document Frequency (TF-IDF)	Python CPU	Content-Based Filtering	Simple similarity-based algorithm for content-based recommendations with text datasets
Vowpal Wabbit (VW)^*	Python CPU (online training)	Content-Based Filtering	Fast online learning algorithms, great for scenarios where user features / context are constantly changing
Wide and Deep	Python CPU / Python GPU	Hybrid	Deep learning algorithm that can memorize feature interactions and generalize user features
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM)	Python CPU	Hybrid	Quick and memory efficient algorithm to predict labels with user/item features

NOTE: ^* indicates algorithms invented/contributed by Microsoft.
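To make the "similarity-based algorithm for implicit feedback" description in the table more concrete, here is a minimal sketch of item-to-item recommendation from co-occurrence counts with Jaccard similarity. It is an illustration of the general idea only, not the SAR implementation shipped in recommenders; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(interactions):
    """Build item-item Jaccard similarity from (user, item) implicit feedback pairs."""
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)

    item_count = defaultdict(int)  # number of users who interacted with each item
    pair_count = defaultdict(int)  # number of users who interacted with both items
    for items in items_by_user.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    similarity = {}
    for (a, b), both in pair_count.items():
        score = both / (item_count[a] + item_count[b] - both)
        similarity[(a, b)] = score
        similarity[(b, a)] = score
    return similarity, items_by_user


def recommend(user, similarity, items_by_user, top_k=10):
    """Score unseen items by summed similarity to the items the user has seen."""
    seen = items_by_user.get(user, set())
    scores = defaultdict(float)
    for (a, b), score in similarity.items():
        if a in seen and b not in seen:
            scores[b] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


# Toy implicit-feedback data (hypothetical).
interactions = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "C"), ("u3", "D"),
]
similarity, items_by_user = jaccard_item_similarity(interactions)
print(recommend("u1", similarity, items_by_user))
```

In this toy data, item C co-occurs with both items that u1 has seen, so it ranks ahead of D.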

Independent or incubating algorithms and utilities are candidates for the contrib folder. It houses contributions that may not easily fit into the core repository, or whose code needs time to be refactored, to mature, and to gain the necessary tests.

Algorithm	Environment	Type	Description
SARplus ^*	PySpark	Collaborative Filtering	Optimized implementation of SAR for Spark

Preliminary Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below. We use empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. The table below shows the results on MovieLens 100k, running the algorithms for 15 epochs.
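The 75/25 stratified split can be pictured with the following minimal sketch. It assumes stratification by user, so that each user keeps roughly 75% of their ratings in the training set; the benchmark notebook relies on the splitting utilities in recommenders, and the function below is a hypothetical stand-in written with the standard library only.

```python
import random
from collections import defaultdict


def stratified_split(ratings, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so each user keeps about `ratio` of rows in train."""
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Toy data (hypothetical): 5 users with 8 ratings each.
ratings = [(user, item, 1.0) for user in range(5) for item in range(8)]
train, test = stratified_split(ratings)
print(len(train), len(test))
```

Splitting per user rather than over the whole table means every user appears in both sets, which the ranking metrics need in order to score recommendations against held-out items.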

Algo	MAP	nDCG@k	Precision@k	Recall@k	RMSE	MAE	R²	Explained Variance
ALS	0.004732	0.044239	0.048462	0.017796	0.965038	0.753001	0.255647	0.251648
BiVAE	0.146126	0.475077	0.411771	0.219145	N/A	N/A	N/A	N/A
BPR	0.132478	0.441997	0.388229	0.212522	N/A	N/A	N/A	N/A
FastAI	0.025503	0.147866	0.130329	0.053824	0.943084	0.744337	0.285308	0.287671
LightGCN	0.088526	0.419846	0.379626	0.144336	N/A	N/A	N/A	N/A
NCF	0.107720	0.396118	0.347296	0.180775	N/A	N/A	N/A	N/A
SAR	0.110591	0.382461	0.330753	0.176385	1.253805	1.048484	-0.569363	0.030474
SVD	0.012873	0.095930	0.091198	0.032783	0.938681	0.742690	0.291967	0.291971
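For readers unfamiliar with the column headers, the sketch below gives common per-user definitions of Precision@k, Recall@k and nDCG@k with binary relevance, plus RMSE for rating prediction. Definitions vary between libraries, so these are assumptions for illustration and may differ in detail from the evaluation utilities that produced the table; the benchmark also averages ranking metrics over users, which is omitted here.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Discounted gain of the top-k list, normalized by the best achievable gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy example (hypothetical): 2 of the 4 recommended items are relevant.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
print(precision_at_k(recommended, relevant, k=4))
print(recall_at_k(recommended, relevant, k=4))
```

The N/A entries in the table correspond to algorithms that are evaluated only on ranking, so rating metrics such as RMSE and MAE are not reported for them.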

Code of Conduct

This project adheres to Microsoft's Open Source Code of Conduct in order to foster a welcoming and inspiring community for all.

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

These tests are the nightly builds, which run the smoke and integration tests. main is our principal branch and staging is our development branch. We use pytest for testing Python utilities in recommenders and papermill for the notebooks. For more information about the testing pipelines, please see the test documentation.

DSVM Build Status

The following tests run on a Linux DSVM daily. These machines run 24/7.

Build Type	Branch	Branch
Linux CPU	main	staging
Linux GPU	main	staging
Linux Spark	main	staging

Related projects

Microsoft AI GitHub: Find other Best Practice projects, and Azure AI design patterns in our central repository.
NLP best practices: Best practices and examples on NLP.
Computer vision best practices: Best practices and examples on computer vision.
Forecasting best practices: Best practices and examples on time series forecasting.

Reference papers

A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", WWW 2020: International World Wide Web Conference Taipei, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692
L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019), 2019.
S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967
