PyTorch Lightning + Hydra Template

A clean and scalable template to kickstart your deep learning project 🚀🔥
Click on the Use this template button above to initialize a new repository.

This template tries to be as generic as possible. You should be able to easily modify the behavior of train.py in case you need some unconventional configuration wiring.

This is a work in progress. I'm still figuring out the best workflow for a scalable experimentation process. Suggestions are always welcome!


Contents

  • Why Lightning + Hydra?
  • Main Ideas
  • Project Structure
  • Features
  • Main Project Configuration
  • Experiment Configuration
  • Workflow
  • Logs
  • Experiment Tracking
  • Tests
  • Distributed Training
  • Tricks

Why Lightning + Hydra?

  • PyTorch Lightning provides great abstractions for well-structured ML code and advanced features like checkpointing, gradient accumulation, distributed training, etc.
  • Hydra provides a convenient way to manage experiment configurations and advanced features like overriding any config parameter from the command line, scheduling execution of many runs, etc. (a minimal wiring sketch follows this list)
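
To make this concrete, here is a minimal sketch of how the two wire together; a simplified stand-in for (not a copy of) this template's train.py:

import hydra
from omegaconf import DictConfig
from pytorch_lightning import Trainer

@hydra.main(config_path="configs/", config_name="config.yaml")
def main(config: DictConfig):
    # Hydra composes `config` from the defaults list plus any command line overrides,
    # then `instantiate` builds each object from its `_target_` class path
    model = hydra.utils.instantiate(config.model)
    datamodule = hydra.utils.instantiate(config.datamodule)
    trainer: Trainer = hydra.utils.instantiate(config.trainer)
    trainer.fit(model=model, datamodule=datamodule)

if __name__ == "__main__":
    main()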

Main Ideas

  • Predefined Structure: clean and scalable so that work can easily be extended and replicated (see #Project Structure)
  • Rapid Experimentation: thanks to an automated pipeline with config files and Hydra command-line superpowers
  • Little Boilerplate: so the pipeline can be easily modified (see train.py)
  • Main Configuration: the main config file specifies the default training configuration (see #Main Project Configuration)
  • Experiment Configurations: stored in a separate folder, they can be composed out of smaller configs, override chosen parameters, or define everything from scratch (see #Experiment Configuration)
  • Experiment Tracking: many logging frameworks can be easily integrated! (see #Experiment Tracking)
  • Logs: all logs (checkpoints, data from loggers, chosen hparams, etc.) are stored in a convenient folder structure imposed by Hydra (see #Logs)
  • Smoke Tests: simple bash scripts running 1-2 epoch experiments to verify that your model doesn't crash under different conditions (see tests)
  • Hyperparameter Search: made easier with Hydra's built-in plugins like the Optuna Sweeper
  • Workflow: comes down to 4 simple steps (see #Workflow)

Project Structure

The directory structure of a new project looks like this:

├── configs                 <- Hydra configuration files
│   ├── trainer                 <- Configurations of Lightning trainers
│   ├── model                   <- Configurations of Lightning models
│   ├── datamodule              <- Configurations of Lightning datamodules
│   ├── callbacks               <- Configurations of Lightning callbacks
│   ├── logger                  <- Configurations of Lightning loggers
│   ├── experiment              <- Configurations of experiments
│   │         
│   ├── config.yaml             <- Main project configuration file
│   └── config_optuna.yaml      <- Configuration of Optuna hyperparameter search
│
├── data                    <- Project data
│
├── logs                    <- Logs generated by Hydra and PyTorch Lightning loggers
│
├── notebooks               <- Jupyter notebooks
│
├── tests                   <- Tests of any kind
│   ├── quick_tests.sh          <- A couple of quick experiments to test if your model
│   │                              doesn't crash under different training conditions
│   └── ...
│
├── src
│   ├── architectures           <- PyTorch model architectures
│   ├── callbacks               <- PyTorch Lightning callbacks
│   ├── datamodules             <- PyTorch Lightning datamodules
│   ├── datasets                <- PyTorch datasets
│   ├── models                  <- PyTorch Lightning models
│   ├── transforms              <- Data transformations
│   └── utils                   <- Utility scripts
│       ├── inference_example.py    <- Example of inference with trained model 
│       └── template_utils.py       <- Some extra template utilities
│
├── train.py                <- Train model with chosen experiment configuration
│
├── .gitignore
├── LICENSE
├── README.md
├── conda_env_gpu.yaml      <- File for installing conda env for GPU
├── conda_env_cpu.yaml      <- File for installing conda env for CPU
├── requirements.txt        <- File for installing python dependencies
└── setup.py                <- File for installing project as a package

Features

  • Hydra superpowers
    • Override any config parameter from the command line
    • Easily switch between different loggers, callback sets, optimizers, etc. from the command line
    • Sweep over hyperparameters from the command line
    • Automatic logging of run history
    • Sweeper integrations for Optuna, Ray and others
  • Optional callbacks for Weights&Biases (wandb_callbacks.py); a sketch of one follows this list
    • To support reproducibility:
      • UploadCodeToWandbAsArtifact
      • UploadCheckpointsToWandbAsArtifact
      • WatchModelWithWandb
    • To provide examples of logging custom visualisations and metrics with callbacks:
      • LogBestMetricScoresToWandb
      • LogF1PrecisionRecallHeatmapToWandb
      • LogConfusionMatrixToWandb
  • Validating correctness of config with Hydra schemas (TODO)
  • Method to pretty print the configuration composed by Hydra at the start of the run, using the Rich library (template_utils.py)
  • Method to log chosen parts of the Hydra config to all loggers (template_utils.py)
  • Example of hyperparameter search with Optuna sweeps (config_optuna.yaml)
  • Example of hyperparameter search with Weights&Biases sweeps (TODO)
  • Examples of simple bash scripts to verify that your model doesn't crash under different training conditions (tests/)
  • Example of inference with trained model (inference_example.py)
  • Built in requirements (requirements.txt)
  • Built in conda environment initialization (conda_env_gpu.yaml, conda_env_cpu.yaml)
  • Built in python package setup (setup.py)
  • Example with MNIST classification (mnist_model.py, mnist_datamodule.py)
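
For illustration, a hedged sketch of what one of the reproducibility callbacks above might look like (the real implementations live in wandb_callbacks.py):

import wandb
from pytorch_lightning import Callback

class WatchModelWithWandb(Callback):
    """Make wandb watch the model's gradients and parameters during training."""

    def __init__(self, log: str = "gradients", log_freq: int = 100):
        self.log = log
        self.log_freq = log_freq

    def on_train_start(self, trainer, pl_module):
        # this Lightning hook runs once, right before the first epoch
        wandb.watch(pl_module, log=self.log, log_freq=self.log_freq)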

Main Project Configuration

Location: configs/config.yaml
The main project config contains the default training configuration.
It determines how the config is composed when you simply execute the command: python train.py

# to execute run with default training configuration simply run: 
# python train.py


# specify here default training configuration
defaults:
    - trainer: default_trainer.yaml
    - model: mnist_model.yaml
    - datamodule: mnist_datamodule.yaml
    - callbacks: default_callbacks.yaml  # set this to null if you don't want to use callbacks
    - logger: null  # set logger here or use command line (e.g. `python train.py logger=wandb`)


# path to original working directory (that `train.py` was executed from in command line)
# hydra hijacks working directory by changing it to the current log directory,
# so it's useful to have path to original working directory as a special variable
# read more here: https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
work_dir: ${hydra:runtime.cwd}


# path to folder with data
data_dir: ${work_dir}/data/


# pretty print config at the start of the run using Rich library
print_config: True


# output paths for hydra logs
hydra:
    run:
        dir: logs/runs/${now:%Y-%m-%d}/${now:%H-%M-%S}
    sweep:
        dir: logs/multiruns/${now:%Y-%m-%d_%H-%M-%S}
        subdir: ${hydra.job.num}
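
The `print_config: True` flag above triggers a Rich-based pretty print of the composed config. A minimal sketch of such a utility, assuming (not copying) the approach in template_utils.py:

import rich.syntax
import rich.tree
from omegaconf import DictConfig, OmegaConf

def print_config(config: DictConfig):
    # render each top-level config group as a YAML branch of a Rich tree
    tree = rich.tree.Tree("CONFIG")
    for field, value in config.items():
        branch = tree.add(str(field))
        if isinstance(value, DictConfig):
            branch.add(rich.syntax.Syntax(OmegaConf.to_yaml(value), "yaml"))
        else:
            branch.add(str(value))
    rich.print(tree)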

Experiment Configuration

Location: configs/experiment
You can store many experiment configurations in this folder.
Example experiment configuration:

# to execute this experiment run:
# python train.py +experiment=exp_example_simple

defaults:
    - override /trainer: default_trainer.yaml
    - override /model: mnist_model.yaml
    - override /datamodule: mnist_datamodule.yaml
    - override /callbacks: default_callbacks.yaml
    - override /logger: null

# all parameters below will be merged with parameters from default configurations set above
# this allows you to overwrite only specified parameters

seed: 12345 

trainer:
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    lr: 0.001
    lin1_size: 128
    lin2_size: 256
    lin3_size: 64

datamodule:
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]

More advanced experiment configuration:

# to execute this experiment run:
# python train.py +experiment=exp_example_full

defaults:
    - override /trainer: null
    - override /model: null
    - override /datamodule: null 
    - override /callbacks: null
    - override /logger: null

# we override the default configurations with nulls to prevent them from loading at all
# instead we define all modules and their paths directly in this config,
# so everything is stored in one place for better readability

seed: 12345

trainer:
    _target_: pytorch_lightning.Trainer
    gpus: 0
    min_epochs: 1
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    _target_: src.models.mnist_model.LitModelMNIST
    optimizer: adam
    lr: 0.001
    weight_decay: 0.00005
    architecture: SimpleDenseNet
    input_size: 784
    lin1_size: 256
    dropout1: 0.30
    lin2_size: 256
    dropout2: 0.25
    lin3_size: 128
    dropout3: 0.20
    output_size: 10

datamodule:
    _target_: src.datamodules.mnist_datamodule.MNISTDataModule
    data_dir: ${data_dir}
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]
    num_workers: 0
    pin_memory: False

logger:
    wandb:
        tags: ["best_model", "uwu"]
        notes: "Description of this model."
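
For reference, `_target_` entries like those above are turned into live objects by Hydra's instantiate utility. A small self-contained sketch of the mechanism:

from hydra.utils import instantiate
from omegaconf import OmegaConf

# any config node with a `_target_` key can be instantiated directly;
# the remaining keys are passed as keyword arguments to the target class
cfg = OmegaConf.create({
    "_target_": "pytorch_lightning.Trainer",
    "gpus": 0,
    "min_epochs": 1,
    "max_epochs": 10,
    "gradient_clip_val": 0.5,
})
trainer = instantiate(cfg)  # equivalent to pytorch_lightning.Trainer(gpus=0, ...)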

Workflow

  1. Write your PyTorch Lightning model (see mnist_model.py for an example; a minimal sketch follows this list)
  2. Write your PyTorch Lightning datamodule (see mnist_datamodule.py for an example)
  3. Write your experiment config, containing paths to your model and datamodule (see configs/experiment for examples)
  4. Run training with the chosen experiment config:
    python train.py +experiment=experiment_name.yaml
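
A bare-bones sketch of step 1 (illustrative only; the hypothetical LitModel below is simplified, not the template's mnist_model.py):

import torch
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def __init__(self, lr: float = 0.001):
        super().__init__()
        self.save_hyperparameters()  # exposes hparams to the chosen loggers
        self.model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
        self.criterion = torch.nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.criterion(self.model(x), y)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)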
    

Logs

Hydra creates a new working directory for every executed run.
By default, logs have the following structure:

│
├── logs
│   ├── runs                    # Folder for logs generated from single runs
│   │   ├── 2021-02-15              # Date of executing run
│   │   │   ├── 16-50-49                # Hour of executing run
│   │   │   │   ├── .hydra                  # Hydra logs
│   │   │   │   ├── wandb                   # Weights&Biases logs
│   │   │   │   ├── checkpoints             # Training checkpoints
│   │   │   │   └── ...                     # Any other thing saved during training
│   │   │   ├── ...
│   │   │   └── ...
│   │   ├── ...
│   │   └── ...
│   │
│   └── multiruns               # Folder for logs generated from multiruns (sweeps)
│       ├── 2021-02-15_16-50-49     # Date and hour of executing sweep
│       │   ├── 0                       # Job number
│       │   │   ├── .hydra                  # Hydra logs
│       │   │   ├── wandb                   # Weights&Biases logs
│       │   │   ├── checkpoints             # Training checkpoints
│       │   │   └── ...                     # Any other thing saved during training
│       │   ├── 1
│       │   ├── 2
│       │   └── ...
│       ├── ...
│       └── ...
│    

You can change this structure by modifying paths in config.yaml.

Experiment Tracking

PyTorch Lightning provides built-in loggers for Weights&Biases, Neptune, Comet, MLflow, TensorBoard and CSV. To use one of them, simply add its config to configs/logger and run:

python train.py logger=logger_config.yaml

You can use many of them at once (see configs/logger/many_loggers.yaml for an example).

Tests

(TODO)

Distributed Training

(TODO)

Tricks

(TODO)






DELETE EVERYTHING ABOVE FOR YOUR PROJECT


Your Project Name

Some short description.

Description

What it does

How to run

First, install dependencies:

# clone project
git clone https://github.com/YourGithubName/your-repo-name
cd your-repo-name

# optionally create conda environment
conda env create -f conda_env_gpu.yaml -n your_env_name
conda activate your_env_name

# install requirements
pip install -r requirements.txt

Next, you can train a model with the default configuration and no logging:

python train.py

Or train the model with a chosen logger like Weights&Biases:

# set project and entity names in `configs/logger/wandb.yaml`
wandb:
    project: "your_project_name"
    entity: "your_wandb_team_name"
# train model with Weights&Biases
python train.py logger=wandb

Or train the model with a chosen experiment config:

# experiment configurations are placed in folder `configs/experiment/`
python train.py +experiment=exp_example_simple

To execute all experiments from the folder, run:

# execute all experiments from folder `configs/experiment/`
python train.py -m '+experiment=glob(*)'

You can override any parameter from command line like this:

python train.py trainer.max_epochs=20 model.lr=0.0005

To train on GPU:

python train.py trainer.gpus=1

Attach a callback set to the run:

# callback set configurations are placed in `configs/callbacks/`
python train.py callbacks=default_callbacks

Combining it all:

python train.py -m '+experiment=glob(*)' trainer.max_epochs=10 logger=wandb

To create a sweep over some hyperparameters run:

# this will run 6 experiments one after the other,
# each with a different combination of batch_size and learning rate
python train.py -m datamodule.batch_size=32,64,128 model.lr=0.001,0.0005

To sweep with Optuna:

# this will run hyperparameter search defined in `configs/config_optuna.yaml`
python train.py -m --config-name config_optuna.yaml +experiment=exp_example_simple

Resume from checkpoint:

# the checkpoint can be either a path or a URL
# a path should be either absolute or prefixed with `${work_dir}/`
# use single quotes '' around the argument, otherwise the shell tries to expand the $ symbol
python train.py '+trainer.resume_from_checkpoint=${work_dir}/logs/runs/2021-02-28/16-50-49/checkpoints/last.ckpt'

Installing project as a package

Optionally, you can install the project as a package with setup.py:

# install from local files
pip install -e .

# or install from git repo
pip install git+https://github.com/YourGithubName/your-repo-name.git --upgrade

So you can easily import any module into any other file, like so:

from src.models.mnist_model import LitModelMNIST
from src.datamodules.mnist_datamodule import MNISTDataModule
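
For example, a minimal inference sketch using those imports (the checkpoint path is a placeholder, and the input shape is assumed from the MNIST example):

import torch
from src.models.mnist_model import LitModelMNIST

# load a trained model from a checkpoint produced during training
model = LitModelMNIST.load_from_checkpoint("path/to/checkpoint.ckpt")
model.eval()

with torch.no_grad():
    dummy_batch = torch.randn(1, 1, 28, 28)  # one MNIST-shaped input
    logits = model(dummy_batch)

See inference_example.py for the template's own, fuller version.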