# Training and Inference of Hugging Face models on Azure Databricks
This repository contains the code for the blog post series Optimized Training and Inference of Hugging Face Models on Azure Databricks.
To reproduce the Databricks notebooks, first follow the steps below to set up your environment:

- Create an Azure Databricks Workspace: you can create one by following these instructions and selecting the Standard pricing tier.
- Create a Cluster: you can follow these instructions to create your cluster (a programmatic sketch follows this list). Your cluster configuration should be based on nodes of type Standard_NC4as_T4_v3. Make sure your subscription has enough vCPU quota for that VM family; otherwise, work with your Azure subscription administrator to request a quota increase. Use the information below when creating your cluster:
  - Databricks runtime version should be at least 11.2 ML (GPU, Scala 2.12, Spark 3.3.0)
  - Worker type should be Standard_NC4as_T4_v3, and the number of workers should be at least 2 (the notebooks here were run with 8 worker nodes)
  - Driver type should be the same as the worker type
  - Disable autoscaling
- Install Python libraries in your cluster: you can follow these instructions to install libraries. Install the following PyPI libraries in your cluster:
  - transformers==4.20.1
  - sentencepiece
  - datasets
  - deepspeed
  - mpi4py
  - ninja
- Install a cluster-scoped init script in your cluster. This is needed to install the ninja Linux build tool, which DeepSpeed uses to compile its ops. The script to install is ninja_install.sh; you can follow these instructions to learn how to install it (see the staging sketch after this list).
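
As an illustration of the init-script step, here is a minimal sketch of staging a ninja install script on DBFS from a notebook. The script body here is an assumption; the real ninja_install.sh ships in this repository's init_scripts folder and may differ:

```python
# Sketch only: the real script is init_scripts/ninja_install.sh in this repo.
script = """#!/bin/bash
set -e
# Assumption: install the ninja build tool system-wide; DeepSpeed uses it
# to JIT-compile its fused CUDA ops.
apt-get update -y
apt-get install -y ninja-build
"""

# dbutils is available in any Databricks notebook; no import is needed.
dbutils.fs.mkdirs("dbfs:/databricks/init_scripts")
dbutils.fs.put("dbfs:/databricks/init_scripts/ninja_install.sh", script, True)
```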
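
The same cluster spec can also be created programmatically. This is a hedged sketch against the Databricks Clusters and Libraries REST APIs; the workspace URL, token, and cluster name are placeholders for your own values:

```python
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                      # placeholder PAT
headers = {"Authorization": f"Bearer {TOKEN}"}

# Cluster spec matching the list above: 11.2 ML GPU runtime, Standard_NC4as_T4_v3
# driver and workers, a fixed worker count (no autoscaling), and the init script.
cluster_spec = {
    "cluster_name": "hf-training-inference",  # placeholder name
    "spark_version": "11.2.x-gpu-ml-scala2.12",
    "node_type_id": "Standard_NC4as_T4_v3",
    "driver_node_type_id": "Standard_NC4as_T4_v3",
    "num_workers": 8,  # at least 2; the notebooks were run with 8
    "init_scripts": [
        {"dbfs": {"destination": "dbfs:/databricks/init_scripts/ninja_install.sh"}}
    ],
}
resp = requests.post(f"{HOST}/api/2.0/clusters/create", headers=headers, json=cluster_spec)
resp.raise_for_status()
cluster_id = resp.json()["cluster_id"]

# Install the PyPI libraries from the list above on the new cluster.
libraries = [
    {"pypi": {"package": p}}
    for p in ["transformers==4.20.1", "sentencepiece", "datasets",
              "deepspeed", "mpi4py", "ninja"]
]
requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers=headers,
    json={"cluster_id": cluster_id, "libraries": libraries},
).raise_for_status()
```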
The notebooks should be run in the following order:
- data_preparation.ipynb: downloads and prepares the datasets needed for model training and inference (a hedged sketch of this flow follows the list).
- model_training_hvd.ipynb: performs distributed fine-tuning of the pre-trained Hugging Face model using PyTorch and Horovod (see the training sketch after this list).
- model_training_hvd_deepspeed.ipynb: performs distributed fine-tuning of the pre-trained Hugging Face model using PyTorch and Horovod, optimized with DeepSpeed.
- model_inference_hvd.ipynb: performs distributed inference on the fine-tuned model using PyTorch and Horovod.
- model_inference_hvd_deepspeed.ipynb: performs distributed inference on the fine-tuned model using PyTorch and Horovod, optimized with DeepSpeed.
- model_inference_pudf.ipynb: performs distributed inference on the fine-tuned model using the Transformers pipeline and a Pandas UDF (see the inference sketch after this list).
- model_inference_pudf_deepspeed.ipynb: performs distributed inference on the fine-tuned model using the Transformers pipeline and a Pandas UDF, optimized with DeepSpeed.
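
For a feel of the overall flow, the sketches below mirror the three main techniques. First, data preparation; the dataset name and output path here are placeholders, not the notebook's actual choices:

```python
from datasets import load_dataset

# Placeholder dataset; data_preparation.ipynb prepares its own datasets.
dataset = load_dataset("imdb", split="train")

# Persist as Parquet on DBFS so the Spark-based notebooks can read it back.
spark.createDataFrame(dataset.to_pandas()) \
     .write.mode("overwrite") \
     .parquet("dbfs:/tmp/hf_demo/train")
```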
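
Next, the Horovod training pattern used on Databricks: HorovodRunner fans the training function out to the workers. The model checkpoint and hyperparameters are placeholders:

```python
import horovod.torch as hvd
import torch
from sparkdl import HorovodRunner
from transformers import AutoModelForSequenceClassification

def train():
    hvd.init()
    torch.cuda.set_device(hvd.local_rank())

    # Placeholder model; the notebooks fine-tune a specific pre-trained checkpoint.
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5 * hvd.size())

    # Horovod wraps the optimizer to all-reduce gradients across workers and
    # broadcasts the initial state so every rank starts identically.
    optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    # ... training loop over a DistributedSampler-sharded DataLoader ...
    # Assumption: the *_deepspeed variants instead hand the model and optimizer
    # to deepspeed.initialize(...) to get fused kernels and ZeRO optimizations.

# np=8 matches the 8-worker cluster; each worker runs one copy of train().
hr = HorovodRunner(np=8)
hr.run(train)
```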
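
Finally, a sketch of the Pandas UDF inference approach: each Spark task loads the pipeline once, then scores its partition in batches. The model checkpoint, paths, and column name are placeholders:

```python
from typing import Iterator

import pandas as pd
from pyspark.sql.functions import pandas_udf
from transformers import pipeline

@pandas_udf("string")
def predict_udf(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the pipeline once per executor task, then reuse it for every batch.
    # Placeholder checkpoint; the notebooks load their fine-tuned model instead.
    nlp = pipeline("sentiment-analysis",
                   model="distilbert-base-uncased-finetuned-sst-2-english",
                   device=0)  # device=0 pins the pipeline to the worker's GPU
    for texts in batches:
        yield pd.Series([r["label"] for r in nlp(texts.tolist(), batch_size=32)])

# Placeholder input path and column name.
df = spark.read.parquet("dbfs:/tmp/hf_demo/train")
df.withColumn("prediction", predict_udf("text")).show()
```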