Update documentation for submodules (#481)

The current documentation uses the submodule setup in many places. However, we found that it can easily mislead users, so we are switching to recommending a plain fork.
This commit is contained in:
Anton Schwaighofer 2021-06-09 17:35:51 +01:00 committed by GitHub
Parent a74c8e2b8c
Commit 5af8b01dc1
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 158 additions and 89 deletions


@@ -100,9 +100,18 @@ Further detailed instructions, including setup in Azure, are here:
1. [Debugging and monitoring models](docs/debugging_and_monitoring.md)
1. [Model diagnostics](docs/model_diagnostics.md)
1. [Move a model to a different workspace](docs/move_model.md)
1. [Deployment](docs/deploy_on_aml.md)
1. [Working with FastMRI models](docs/fastmri.md)
## Deployment
We offer a companion set of open-sourced tools that help to integrate trained CT segmentation models with clinical
software systems:
- The [InnerEye-Gateway](https://github.com/microsoft/InnerEye-Gateway) is a Windows service running in a DICOM network
that can route anonymized DICOM images to an inference service.
- The [InnerEye-Inference](https://github.com/microsoft/InnerEye-Inference) component offers a REST API that integrates
with the InnerEye-Gateway, to run inference on InnerEye-DeepLearning models.
Details can be found [here](docs/deploy_on_aml.md).
![docs/deployment.png](docs/deployment.png)
## More information


@@ -4,30 +4,14 @@
In order to work with the solution, your OS environment will need [git](https://git-scm.com/) and [git lfs](https://git-lfs.github.com/) installed. Depending on the OS you are running, the installation instructions may vary. Please refer to the respective documentation sections on the tools' websites for detailed instructions.
## Using the InnerEye code as a git submodule of your project
You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that.
* Or you can create your project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule.
If you go down the second route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code and model configurations.
* A file `ML/runner.py` that invokes the InnerEye training runner, but that points the code to your environment and Azure
settings; see the [Building models](building_models.md) instructions for details.
You then need to add the InnerEye code as a git submodule, in folder `innereye-submodule`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-submodule
```
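Note that a plain `git clone` of your project will not fetch the submodule contents; this is standard git behaviour, not
specific to InnerEye. Collaborators can pull in the submodule like this:
```shell script
# Clone your project including all submodules:
git clone --recursive <url_of_your_repository>
# Or initialize the submodule inside an existing clone:
git submodule update --init
```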
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-submodule` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure. Mark your repository root as "Sources", and
`innereye-submodule` as well.
We recommend using PyCharm or VSCode as the Python editor.
You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that. We recommend that because it is easiest to set up.
* Or you can create your project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule. We only recommend that if you are very comfortable with Python. More details about this option
[are here](innereye_as_submodule.md).
## Windows Subsystem for Linux Setup
When developing on a Windows machine, we recommend using [the Windows Subsystem for Linux, WSL2](https://docs.microsoft.com/en-us/windows/wsl/about).
That's because PyTorch has better support for Linux.
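On recent Windows versions, WSL2 plus a Linux distribution can be installed with a single command (this assumes
Windows 10 version 2004 or later; see the linked WSL documentation for other routes):
```shell script
# Run from an elevated PowerShell or Command Prompt, then restart the machine:
wsl --install -d Ubuntu
```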


@@ -0,0 +1,95 @@
# Using the InnerEye code as a git submodule of your project
You can use InnerEye as a submodule in your own project.
If you go down that route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code and model configurations.
* A file like `myrunner.py` that invokes the InnerEye training runner, but that points the code to your environment
and Azure settings; see the [Building models](building_models.md) instructions for details. See below for what
`myrunner.py` should look like.
You then need to add the InnerEye code as a git submodule, in folder `innereye-deeplearning`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-deeplearning
```
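After this step, your repository layout would look roughly like this (using the example names from the list above):
```
<repository root>
├── environment.yml
├── settings.yml
├── myrunner.py
├── ML/                      # your additional code and model configurations
└── innereye-deeplearning/   # the InnerEye-DeepLearning submodule
```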
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-deeplearning` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure. Mark your repository root as "Sources", and
`innereye-deeplearning` as well.
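In VS Code, the equivalent is to add the `innereye-deeplearning` folder to the Python extension's search paths, for
example via the `python.analysis.extraPaths` setting (assuming the standard Python/Pylance extensions).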
Example commandline runner that uses the InnerEye runner (called `myrunner.py` above):
```python
import sys
from pathlib import Path

# This file here mimics how the InnerEye code would be used as a git submodule.

# Ensure that this path correctly points to the root folder of your repository.
repository_root = Path(__file__).absolute().parent


def add_package_to_sys_path_if_needed() -> None:
    """
    Checks if the Python paths in sys.path already contain the /innereye-deeplearning folder. If not, add it.
    """
    is_package_in_path = False
    innereye_submodule_folder = repository_root / "innereye-deeplearning"
    for path_str in sys.path:
        path = Path(path_str)
        if path == innereye_submodule_folder:
            is_package_in_path = True
            break
    if not is_package_in_path:
        print(f"Adding {innereye_submodule_folder} to sys.path")
        sys.path.append(str(innereye_submodule_folder))


def main() -> None:
    try:
        # If InnerEye is already importable (for example, installed as a package), use it as-is.
        from InnerEye import ML  # noqa: F401
    except ImportError:
        # Otherwise, fall back to the copy in the git submodule.
        add_package_to_sys_path_if_needed()
    from InnerEye.ML import runner
    print(f"Repository root: {repository_root}")
    # Check here that yaml_config_file correctly points to your settings file
    runner.run(project_root=repository_root,
               yaml_config_file=Path("settings.yml"),
               post_cross_validation_hook=None)


if __name__ == '__main__':
    main()
```
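The try/except pattern in `main` means the script works both when InnerEye is already importable (for example, installed
as a package into the Conda environment) and when it is only available via the submodule folder added to `sys.path`.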
## Adding new models
1. Set up a directory outside of InnerEye to hold your configs. In your repository root, you could have a folder
`InnerEyeLocal`, parallel to the InnerEye submodule, alongside `settings.yml` and `myrunner.py`.
The example below creates a new flavour of the Glaucoma model in `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is to change the dataset. We will do this by subclassing `GlaucomaPublic` in a new config
stored in `InnerEyeLocal/configs`:
1. Create folder `InnerEyeLocal/configs`
1. Create a config file `InnerEyeLocal/configs/GlaucomaPublicExt.py` which extends the `GlaucomaPublic` class
like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
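Concretely, the relevant entries in `settings.yml` would look roughly like this (the exact layout is assumed here; start
from a copy of `InnerEye/settings.yml` and keep all other Azure settings in place):
```yaml
model_configs_namespace: InnerEyeLocal.configs
extra_code_directory: InnerEyeLocal
```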
#### Start Training
Run the following to start a job on AzureML:
```
python myrunner.py --azureml=True --model=MyGlaucomaModel
```
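If you omit `--azureml=True`, the same command should run training locally, which is a quick way to check that the
submodule wiring and config discovery work before submitting to AzureML.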
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.


@@ -1,7 +1,8 @@
# Sample Tasks
Two sample tasks for the classification and segmentation pipelines.
This document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
This document contains two sample tasks for the classification and segmentation pipelines.
The document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
Before trying to train these models, you should have followed the steps to set up an [environment](environment.md) and [AzureML](setting_up_aml.md).
## Sample classification task: Glaucoma Detection on OCT volumes
@@ -9,61 +10,42 @@ Before trying to train these models, you should have followed the steps to set up an
This example is based on the paper [A feature agnostic approach for glaucoma detection in OCT volumes](https://arxiv.org/pdf/1807.04855v3.pdf).
### Downloading and preparing the dataset
1. The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.
The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.
1. After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
script on the extracted folder.
```
python create_glaucoma_dataset_csv.py /path/to/extracted/folder
```
This will convert the dataset to csv form and create a file `dataset.csv`.
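To sanity-check the result before uploading, you can print the first few rows of the generated file (a minimal sketch;
adjust the path to your extracted folder):
```python
import csv
from pathlib import Path

# Print the header plus the first few data rows of the generated dataset.csv.
with Path("/path/to/extracted/folder/dataset.csv").open(newline="") as f:
    for i, row in enumerate(csv.reader(f)):
        print(row)
        if i >= 4:
            break
```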
1. Upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
Finally, upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). The dataset should go
into a container called `datasets`, with a folder name of your choice (`name_of_your_dataset_on_azure` in the
description below).
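One way of doing the upload is via the `azcopy` tool (a sketch; the storage account name and SAS token are placeholders
that you need to fill in, and the target folder must match the dataset name used in the model configuration):
```shell script
azcopy copy "/path/to/extracted/folder" \
    "https://<your_storage_account>.blob.core.windows.net/datasets/name_of_your_dataset_on_azure?<SAS_token>" \
    --recursive
```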
### Setting up training
### Creating the model configuration and starting training
You have two options for running the Glaucoma model:
- You can directly work on a fork of the InnerEye repository. In this case, you need to modify `AZURE_DATASET_ID`
in `GlaucomaPublic.py` to match the dataset upload location, called `name_of_your_dataset_on_azure` above.
If you choose that, you can start training via
```
python InnerEye/ML/runner.py --model=GlaucomaPublic --azureml=True
```
- Alternatively, you can create a separate runner and a separate model configuration folder. The steps described
below refer to this route.
#### Setting up a second runner
1. Set up a directory outside of InnerEye to hold your configs, as in
[Setting Up Training](building_models.md#setting-up-training). After this step, you should have a folder InnerEyeLocal
beside InnerEye with files `settings.yml` and `ML/runner.py`.
#### Creating the classification model configuration
The full configuration for the Glaucoma model is at `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is to change the dataset. We will do this by subclassing GlaucomaPublic in a new config
stored in `InnerEyeLocal/ML`:
1. Create folder `configs/classification` under `InnerEyeLocal/ML`
1. Create a config file called `GlaucomaPublicExt.py` there which extends the GlaucomaPublic class, like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class GlaucomaPublicExt(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
Next, you need to create a configuration file `InnerEye/ML/configs/MyGlaucoma.py`
which extends the GlaucomaPublic class like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
The value for `self.azure_dataset_id` should match the dataset upload location, called
`name_of_your_dataset_on_azure` above.
#### Start Training
Run the following to start a job on AzureML:
```
python InnerEyeLocal/ML/runner.py --azureml=True --model=GlaucomaPublicExt
```
Once that config is in place, you can start training in AzureML via
```
python InnerEye/ML/runner.py --model=MyGlaucomaModel --azureml=True
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.
As an alternative to working with a fork of the repository, you can use InnerEye-DeepLearning via a submodule.
Please check [here](innereye_as_submodule.md) for details.
## Sample segmentation task: Segmentation of Lung CT
@@ -71,46 +53,45 @@ This example is based on the [Lung CT Segmentation Challenge 2017](https://wiki.
### Downloading and preparing the dataset
1. The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).
1. The next step is to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.
1. Use the [InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) to create a NIFTI dataset
from the downloaded (DICOM) files.
The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).
You need to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.
Next, use the
[InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) commandline tools to create a
NIFTI dataset from the downloaded (DICOM) files.
After installing the tool, run
```batch
InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to the 'datasets' folder> --niftiDatasetDirectory=<output folder name for converted dataset> --dicomDatasetDirectory=<name of downloaded folder inside 'datasets'> --geoNorm 1;1;3
```
Now, you should have another folder under `datasets` with the converted Nifti files.
The `geoNorm` option tells the tool to normalize the voxel sizes during conversion.
1. Upload this folder (with the images and dataset.csv) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets).
### Setting up training
1. Set up a directory outside of InnerEye to hold your configs, as in
[Setting Up Training](building_models.md#setting-up-training). After this step, you should have a folder InnerEyeLocal
beside InnerEye with files settings.yml and ML/runner.py.
### Creating the segmentation model configuration
The full configuration for the Lung model is at InnerEye/ML/configs/segmentation/Lung.
All that needs to be done is to change the dataset. We will do this by subclassing Lung in a new config
stored in InnerEyeLocal/ML:
1. Create folder configs/segmentation under InnerEyeLocal/ML
1. Create a config file called LungExt.py there which extends the Lung class, like this:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung


class LungExt(Lung):
    def __init__(self) -> None:
        super().__init__(azure_dataset_id="name_of_your_dataset_on_azure")
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
### Start Training
Run the following to start a job on AzureML:
```
python InnerEyeLocal/ML/runner.py --azureml=True --model=LungExt --train=True
```
Finally, upload this folder (with the images and dataset.csv) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). All files should go
into a folder in the `datasets` container, for example `my_lung_dataset`. This folder name will need to go into the
`azure_dataset_id` field of the model configuration, see below.
### Creating the model configuration and starting training
You can then create a new model configuration, based on the template
[Lung.py](../InnerEye/ML/configs/segmentation/Lung.py). To do this, create a file
`InnerEye/ML/configs/segmentation/MyLungModel.py`, where you create a subclass of the template Lung model, and
add the `azure_dataset_id` field (i.e., the name of the folder that contains the uploaded data from above),
so that it looks like:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung


class MyLungModel(Lung):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "my_lung_dataset"
```
If you are using InnerEye as a submodule, please add this configuration in your private configuration folder,
as described for the Glaucoma model [here](innereye_as_submodule.md).
You can now run the following command to start a job on AzureML:
```
python InnerEye/ML/runner.py --azureml=True --model=MyLungModel
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.