Best Practices on Recommendation Systems



Recommenders

What's New (April, 2023)

We reached 15,000 stars!!

Our latest release is Recommenders 1.1.1!

We have introduced a new way of testing our repository using AzureML. With AzureML we are able to distribute our tests to different machines and run them in parallel. This allows us to test our repository on a wider range of machines and provides us with a much faster test cycle. Our total computation time went from around 9h to 35min, and we were able to reduce the costs by half. See more details here.

We also made other improvements, such as faster evaluation metrics and an improved SAR algorithm.

Starting with release 0.6.0, Recommenders has been available on PyPI and can be installed using pip!

Here you can find the PyPI page: https://pypi.org/project/recommenders/

Here you can find the package documentation: https://microsoft-recommenders.readthedocs.io/en/latest/

Introduction

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

Prepare Data: Preparing and loading data for each recommender algorithm
Model: Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
Evaluate: Evaluating algorithms with offline metrics
Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
Operationalize: Operationalizing models in a production environment on Azure

Several utilities are provided in recommenders to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the Recommenders documentation.
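As a rough illustration of the training/test splitting mentioned above, the following sketch splits interactions per user so that every user appears in both sets. It uses only the Python standard library; the function name `stratified_split_by_user` and the `(user, item, rating)` tuple layout are assumptions made for this example and are not the `recommenders` API.

```python
import random
from collections import defaultdict


def stratified_split_by_user(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so each user keeps about `ratio` of rows in train.

    Hypothetical helper for illustration; not part of the recommenders package.
    """
    rng = random.Random(seed)

    # Group the interactions by user so the split is done within each user.
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Toy data: two users with four interactions each.
data = [(u, i, 1.0) for u in ("u1", "u2") for i in ("a", "b", "c", "d")]
train, test = stratified_split_by_user(data, ratio=0.75)
print(len(train), len(test))
```

Splitting within each user, rather than over the whole dataset at once, avoids ending up with users that exist only in the test set, which many collaborative filtering models cannot score.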

For a more detailed overview of the repository, please see the documents on the wiki page.

Getting Started

We recommend conda for environment management, and VS Code for development. To install the recommenders package and run an example notebook on Linux/WSL:

# 1. Install gcc if it is not installed already. On Ubuntu, this can be done by using the command
# sudo apt install gcc

# 2. Create and activate a new conda environment
conda create -n <environment_name> python=3.9
conda activate <environment_name>

# 3. Install the recommenders package with examples
pip install recommenders[examples]

# 4. Create a Jupyter kernel
python -m ipykernel install --user --name <environment_name> --display-name <kernel_name>

# 5. Clone this repo within VS Code or using the command:
git clone https://github.com/microsoft/recommenders.git

# 6. Within VS Code:
#   a. Open a notebook, e.g., examples/00_quick_start/sar_movielens.ipynb;  
#   b. Select Jupyter kernel <kernel_name>;
#   c. Run the notebook.

For more information about setup on other platforms (e.g., Windows and macOS) and different configurations (e.g., GPU, Spark and experimental features), see the Setup Guide.

In addition to the core package, several extras are also provided, including:

[examples]: Needed for running examples.
[gpu]: Needed for running GPU models.
[spark]: Needed for running Spark models.
[dev]: Needed for development of the repo.
[all]: [examples]|[gpu]|[spark]|[dev]
[experimental]: Models that are not thoroughly tested and/or may require additional steps in installation.
[nni]: Needed for running models integrated with NNI.

Algorithms

The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Example column as Quick start, showcasing an easy-to-run example of the algorithm, or as Deep dive, explaining in detail the math and implementation of the algorithm.

Algorithm	Type	Description	Example
Alternating Least Squares (ALS)	Collaborative Filtering	Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized for scalability and distributed computing capability. It works in the PySpark environment.	Quick start / Deep dive
Attentive Asynchronous Singular Value Decomposition (A2SVD)^*	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism. It works in the CPU/GPU environment.	Quick start
Cornac/Bayesian Personalized Ranking (BPR)	Collaborative Filtering	Matrix factorization algorithm for predicting item ranking with implicit feedback. It works in the CPU environment.	Deep dive
Cornac/Bilateral Variational Autoencoder (BiVAE)	Collaborative Filtering	Generative model for dyadic data (e.g., user-item interactions). It works in the CPU/GPU environment.	Deep dive
Convolutional Sequence Embedding Recommendation (Caser)	Collaborative Filtering	Algorithm based on convolutions that aims to capture both the user's general preferences and sequential patterns. It works in the CPU/GPU environment.	Quick start
Deep Knowledge-Aware Network (DKN)^*	Content-Based Filtering	Deep learning algorithm incorporating a knowledge graph and article embeddings for providing news or article recommendations. It works in the CPU/GPU environment.	Quick start / Deep dive
Extreme Deep Factorization Machine (xDeepFM)^*	Hybrid	Deep learning based algorithm for implicit and explicit feedback with user/item features. It works in the CPU/GPU environment.	Quick start
FastAI Embedding Dot Bias (FAST)	Collaborative Filtering	General purpose algorithm with embeddings and biases for users and items. It works in the CPU/GPU environment.	Quick start
LightFM/Hybrid Matrix Factorization	Hybrid	Hybrid matrix factorization algorithm for both implicit and explicit feedback. It works in the CPU environment.	Quick start
LightGBM/Gradient Boosting Tree^*	Content-Based Filtering	Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments.	Quick start in CPU / Deep dive in PySpark
LightGCN	Collaborative Filtering	Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment.	Deep dive
GeoIMC^*	Hybrid	Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach. It works in the CPU environment.	Quick start
GRU4Rec	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment.	Quick start
Multinomial VAE	Collaborative Filtering	Generative model for predicting user/item interactions. It works in the CPU/GPU environment.	Deep dive
Neural Recommendation with Long- and Short-term User Representations (LSTUR)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment.	Quick start
Neural Recommendation with Attentive Multi-View Learning (NAML)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with attentive multi-view learning. It works in the CPU/GPU environment.	Quick start
Neural Collaborative Filtering (NCF)	Collaborative Filtering	Deep learning algorithm with enhanced performance for user/item implicit feedback. It works in the CPU/GPU environment.	Quick start / Deep dive
Neural Recommendation with Personalized Attention (NPA)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with personalized attention network. It works in the CPU/GPU environment.	Quick start
Neural Recommendation with Multi-Head Self-Attention (NRMS)^*	Content-Based Filtering	Neural recommendation algorithm for recommending news articles with multi-head self-attention. It works in the CPU/GPU environment.	Quick start
Next Item Recommendation (NextItNet)	Collaborative Filtering	Algorithm based on dilated convolutions and residual network that aims to capture sequential patterns. It considers both user/item interactions and features. It works in the CPU/GPU environment.	Quick start
Restricted Boltzmann Machines (RBM)	Collaborative Filtering	Neural network based algorithm for learning the underlying probability distribution for explicit or implicit user/item feedback. It works in the CPU/GPU environment.	Quick start / Deep dive
Riemannian Low-rank Matrix Completion (RLRMC)^*	Collaborative Filtering	Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption to predict user/item interactions. It works in the CPU environment.	Quick start
Simple Algorithm for Recommendation (SAR)^*	Collaborative Filtering	Similarity-based algorithm for implicit user/item feedback. It works in the CPU environment.	Quick start / Deep dive
Self-Attentive Sequential Recommendation (SASRec)	Collaborative Filtering	Transformer based algorithm for sequential recommendation. It works in the CPU/GPU environment.	Quick start
Short-term and Long-term Preference Integrated Recommender (SLi-Rec)^*	Collaborative Filtering	Sequential-based algorithm that aims to capture both long and short-term user preferences using attention mechanism, a time-aware controller and a content-aware controller. It works in the CPU/GPU environment.	Quick start
Multi-Interest-Aware Sequential User Modeling (SUM)^*	Collaborative Filtering	An enhanced memory network-based sequential user model which aims to capture users' multiple interests. It works in the CPU/GPU environment.	Quick start
Sequential Recommendation Via Personalized Transformer (SSEPT)	Collaborative Filtering	Transformer based algorithm for sequential recommendation with user embeddings. It works in the CPU/GPU environment.	Quick start
Standard VAE	Collaborative Filtering	Generative model for predicting user/item interactions. It works in the CPU/GPU environment.	Deep dive
Surprise/Singular Value Decomposition (SVD)	Collaborative Filtering	Matrix factorization algorithm for predicting explicit rating feedback in small datasets. It works in the CPU/GPU environment.	Deep dive
Term Frequency - Inverse Document Frequency (TF-IDF)	Content-Based Filtering	Simple similarity-based algorithm for content-based recommendations with text datasets. It works in the CPU environment.	Quick start
Vowpal Wabbit (VW)^*	Content-Based Filtering	Fast online learning algorithms, great for scenarios where user features / context are constantly changing. It uses the CPU for online learning.	Deep dive
Wide and Deep	Hybrid	Deep learning algorithm that can memorize feature interactions and generalize user features. It works in the CPU/GPU environment.	Quick start
xLearn/Factorization Machine (FM) & Field-Aware FM (FFM)	Hybrid	Quick and memory efficient algorithm to predict labels with user/item features. It works in the CPU/GPU environment.	Deep dive

NOTE: ^* indicates algorithms invented/contributed by Microsoft.
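To give a feel for what a "similarity-based algorithm for implicit user/item feedback" can look like, here is a minimal sketch that builds item-item Jaccard similarities from co-occurrence counts and scores unseen items for a user. It is a simplified illustration using only the standard library, not the SAR implementation shipped in this repository; the function names and the toy data are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(user_items):
    """Return {(item_a, item_b): jaccard} from a {user: set(items)} mapping."""
    item_count = defaultdict(int)  # users who interacted with each item
    pair_count = defaultdict(int)  # users who interacted with both items
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    sim = {}
    for (a, b), both in pair_count.items():
        score = both / (item_count[a] + item_count[b] - both)
        sim[(a, b)] = score
        sim[(b, a)] = score
    return sim


def recommend(user, user_items, sim, k=10):
    """Rank unseen items by their summed similarity to the user's seen items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for (a, b), s in sim.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Sort by descending score, breaking ties by item name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Toy implicit feedback: which items each user interacted with.
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a"},
}
sim = jaccard_item_similarity(user_items)
print(recommend("u4", user_items, sim, k=2))
```

For the algorithms as actually implemented, refer to the Quick start and Deep dive notebooks listed in the table.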

Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.

Algorithm	Type	Description	Example
SARplus ^*	Collaborative Filtering	Optimized implementation of SAR for Spark	Quick start

Algorithm Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below. We utilize empirical parameter values reported in literature here. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k, running the algorithms for 15 epochs.

Algo	MAP	nDCG@k	Precision@k	Recall@k	RMSE	MAE	R²	Explained Variance
ALS	0.004732	0.044239	0.048462	0.017796	0.965038	0.753001	0.255647	0.251648
BiVAE	0.146126	0.475077	0.411771	0.219145	N/A	N/A	N/A	N/A
BPR	0.132478	0.441997	0.388229	0.212522	N/A	N/A	N/A	N/A
FastAI	0.025503	0.147866	0.130329	0.053824	0.943084	0.744337	0.285308	0.287671
LightGCN	0.088526	0.419846	0.379626	0.144336	N/A	N/A	N/A	N/A
NCF	0.107720	0.396118	0.347296	0.180775	N/A	N/A	N/A	N/A
SAR	0.110591	0.382461	0.330753	0.176385	1.253805	1.048484	-0.569363	0.030474
SVD	0.012873	0.095930	0.091198	0.032783	0.938681	0.742690	0.291967	0.291971
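For readers unfamiliar with the column headers in the table above, the sketch below computes a few of them for a single user with plain Python. It uses common textbook definitions with binary relevance; exact definitions vary between libraries, so these functions are assumptions for illustration and may not match the evaluation code in `recommenders` term by term.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Ranking metrics for one user: a ranked recommendation list vs. held-out items.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_at_k(recommended, relevant, k=4))
print(recall_at_k(recommended, relevant, k=4))
print(ndcg_at_k(recommended, relevant, k=4))

# Rating metrics: true vs. predicted ratings.
print(rmse([4, 3, 5], [3, 3, 4]), mae([4, 3, 5], [3, 3, 4]))
```

In a benchmark, the ranking metrics would typically be averaged over all test users. Rating metrics apply only to algorithms that predict explicit ratings, which is presumably why some rows in the table show N/A.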

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

This project adheres to Microsoft's Open Source Code of Conduct in order to foster a welcoming and inspiring community for all.

Build Status

These tests are the nightly builds, which run the smoke and integration tests. main is our principal branch and staging is our development branch. We use pytest for testing Python utilities in recommenders and Papermill and Scrapbook for the notebooks.

For more information about the testing pipelines, please see the test documentation.

AzureML Nightly Build Status

Smoke and integration tests are run daily on AzureML.

Build Type	Branch	Branch
Linux CPU	main	staging
Linux GPU	main	staging
Linux Spark	main	staging

References

D. Li, J. Lian, L. Zhang, K. Ren, D. Lu, T. Wu, X. Xie, "Recommender Systems: Frontiers and Practices" (in Chinese), Publishing House of Electronics Industry, Beijing 2022.
A. Argyriou, M. González-Fierro, and L. Zhang, "Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems", WWW 2020: International World Wide Web Conference Taipei, 2020. Available online: https://dl.acm.org/doi/abs/10.1145/3366424.3382692
L. Zhang, T. Wu, X. Xie, A. Argyriou, M. González-Fierro and J. Lian, "Building Production-Ready Recommendation System at Scale", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2019 (KDD 2019), 2019.
S. Graham, J.K. Min, T. Wu, "Microsoft recommenders: tools to accelerate developing recommender systems", RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, 2019. Available online: https://dl.acm.org/doi/10.1145/3298689.3346967
