Update documentation for submodules (#481)
The current documentation uses the submodule setup in many places. However, we found that it can easily mislead users. Switching to suggesting plain forking.
This commit is contained in:
Parent a74c8e2b8c
Commit 5af8b01dc1
README.md | 11

@@ -100,9 +100,18 @@ Further detailed instructions, including setup in Azure, are here:
1. [Debugging and monitoring models](docs/debugging_and_monitoring.md)
1. [Model diagnostics](docs/model_diagnostics.md)
1. [Move a model to a different workspace](docs/move_model.md)
1. [Deployment](docs/deploy_on_aml.md)
1. [Working with FastMRI models](docs/fastmri.md)

## Deployment

We offer a companion set of open-sourced tools that help to integrate trained CT segmentation models with clinical
software systems:
- The [InnerEye-Gateway](https://github.com/microsoft/InnerEye-Gateway) is a Windows service running in a DICOM network,
that can route anonymized DICOM images to an inference service.
- The [InnerEye-Inference](https://github.com/microsoft/InnerEye-Inference) component offers a REST API that integrates
with the InnerEye-Gateway, to run inference on InnerEye-DeepLearning models.

Details can be found [here](docs/deploy_on_aml.md).

![docs/deployment.png](docs/deployment.png)

## More information
@@ -4,30 +4,14 @@
In order to work with the solution, your OS environment will need [git](https://git-scm.com/) and [git lfs](https://git-lfs.github.com/) installed. Depending on the OS that you are running, the installation instructions may vary. Please refer to the respective documentation sections on the tools' websites for detailed instructions.

## Using the InnerEye code as a git submodule of your project
You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that.
* Or you can create your project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule.

If you go down the second route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code, and model configurations.
* A file `ML/runner.py` that invokes the InnerEye training runner, but that points the code to your environment and Azure
settings; see the [Building models](building_models.md) instructions for details.

You then need to add the InnerEye code as a git submodule, in folder `innereye-submodule`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-submodule
```
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-submodule` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure. Mark your repository root as "Source", and
`innereye-submodule` as well.

We recommend using PyCharm or VSCode as the Python editor.

You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that. We recommend that because it is easiest to set up.
* Or you can create your project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule. We only recommend that if you are very handy with Python. More details about this option
[are here](innereye_as_submodule.md).

## Windows Subsystem for Linux Setup
When developing on a Windows machine, we recommend using [the Windows Subsystem for Linux, WSL2](https://docs.microsoft.com/en-us/windows/wsl/about).
That's because PyTorch has better support for Linux.
@@ -0,0 +1,95 @@
# Using the InnerEye code as a git submodule of your project

You can use InnerEye as a submodule in your own project.
If you go down that route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code, and model configurations.
* A file like `myrunner.py` that invokes the InnerEye training runner, but that points the code to your environment
and Azure settings; see the [Building models](building_models.md) instructions for details. Please see below for what
`myrunner.py` should look like.

You then need to add the InnerEye code as a git submodule, in folder `innereye-deeplearning`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-deeplearning
```
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-deeplearning` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure. Mark your repository root as "Source", and
`innereye-deeplearning` as well.
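If you prefer not to rely on per-IDE settings, a sketch of an alternative (assuming the submodule folder name used above) is to put the submodule on `PYTHONPATH` in your shell:

```shell
# Assumes the submodule lives in ./innereye-deeplearning, as created by the
# `git submodule add` command above. Any Python process started from this
# shell can then import the InnerEye package.
export PYTHONPATH="$PWD/innereye-deeplearning:$PYTHONPATH"
```

This achieves the same effect as the IDE configuration, but only for processes launched from that shell.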

Example commandline runner that uses the InnerEye runner (called `myrunner.py` above):
```python
import sys
from pathlib import Path


# This file mimics how the InnerEye code would be used as a git submodule.

# Ensure that this path correctly points to the root folder of your repository.
repository_root = Path(__file__).absolute().parent


def add_package_to_sys_path_if_needed() -> None:
    """
    Checks if the Python paths in sys.path already contain the /innereye-deeplearning folder. If not, add it.
    """
    is_package_in_path = False
    innereye_submodule_folder = repository_root / "innereye-deeplearning"
    for path_str in sys.path:
        path = Path(path_str)
        if path == innereye_submodule_folder:
            is_package_in_path = True
            break
    if not is_package_in_path:
        print(f"Adding {innereye_submodule_folder} to sys.path")
        sys.path.append(str(innereye_submodule_folder))


def main() -> None:
    try:
        from InnerEye import ML  # noqa: F401
    except ImportError:
        add_package_to_sys_path_if_needed()

    from InnerEye.ML import runner
    print(f"Repository root: {repository_root}")
    # Check here that yaml_config_file correctly points to your settings file
    runner.run(project_root=repository_root,
               yaml_config_file=Path("settings.yml"),
               post_cross_validation_hook=None)


if __name__ == '__main__':
    main()
```

## Adding new models

1. Set up a directory outside of InnerEye to hold your configs. In your repository root, you could have a folder
`InnerEyeLocal`, parallel to the InnerEye submodule, alongside `settings.yml` and `myrunner.py`.

The example below creates a new flavour of the Glaucoma model in `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is change the dataset. We will do this by subclassing GlaucomaPublic in a new config
stored in `InnerEyeLocal/configs`.
1. Create a folder `InnerEyeLocal/configs`.
1. Create a config file `InnerEyeLocal/configs/GlaucomaPublicExt.py` which extends the `GlaucomaPublic` class
like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
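The relevant part of `settings.yml` would then contain the two fields named above (a minimal sketch; any other Azure settings your file needs are omitted here):

```yaml
# Point the runner at your own config package and code folder.
model_configs_namespace: InnerEyeLocal.configs
extra_code_directory: InnerEyeLocal
```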

#### Start Training
Run the following to start a job on AzureML:
```
python myrunner.py --azureml=True --model=MyGlaucomaModel
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.
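For illustration, a name-based lookup of the kind that `model_configs_namespace` implies could look roughly like this. This is a hypothetical sketch, not the actual InnerEye runner code; the function name and search strategy are invented:

```python
# Hypothetical sketch of how a training runner could resolve a model name like
# "MyGlaucomaModel" inside the package named by `model_configs_namespace`.
# This is NOT the actual InnerEye implementation; it only illustrates the idea.
import importlib
import pkgutil


def find_model_class(namespace: str, model_name: str):
    """Search the package `namespace` and its direct submodules for a class
    called `model_name`, and return it."""
    package = importlib.import_module(namespace)
    module_names = [namespace] + [
        f"{namespace}.{info.name}" for info in pkgutil.iter_modules(package.__path__)
    ]
    for module_name in module_names:
        module = importlib.import_module(module_name)
        if hasattr(module, model_name):
            return getattr(module, model_name)
    raise ValueError(f"Model '{model_name}' not found in namespace '{namespace}'")
```

This is why the config file name (`GlaucomaPublicExt.py`) does not have to match the class name (`MyGlaucomaModel`): what matters is that the class is reachable under the configured namespace.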

@@ -1,7 +1,8 @@
# Sample Tasks

Two sample tasks for the classification and segmentation pipelines.
This document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
This document contains two sample tasks for the classification and segmentation pipelines.

The document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
Before trying to train these models, you should have followed the steps to set up an [environment](environment.md) and [AzureML](setting_up_aml.md).

## Sample classification task: Glaucoma Detection on OCT volumes

@@ -9,61 +10,42 @@ Before trying to train these models, you should have followed the steps to set up an
This example is based on the paper [A feature agnostic approach for glaucoma detection in OCT volumes](https://arxiv.org/pdf/1807.04855v3.pdf).

### Downloading and preparing the dataset
1. The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.
The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.

1. After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
script on the extracted folder:
```
python create_glaucoma_dataset_csv.py /path/to/extracted/folder
```
This will convert the dataset to csv form and create a file `dataset.csv`.
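To illustrate what "csv form" means here: the dataset folder ends up with a `dataset.csv` listing one image file per row. The exact schema is defined by the script linked above; the column names used below (`subject`, `filePath`, `label`) are assumptions for illustration only:

```python
# Minimal sketch of writing a dataset.csv for a folder of images.
# Column names are illustrative; see create_glaucoma_dataset_csv.py for the
# schema that InnerEye actually expects.
import csv
from pathlib import Path


def write_dataset_csv(rows, dataset_folder: Path) -> Path:
    """Write one (subject, file path, label) row per scan into dataset.csv."""
    csv_path = dataset_folder / "dataset.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["subject", "filePath", "label"])
        writer.writerows(rows)
    return csv_path
```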

1. Upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
Finally, upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). The dataset should go
into a container called `datasets`, with a folder name of your choice (`name_of_your_dataset_on_azure` in the
description below).

### Setting up training
### Creating the model configuration and starting training

You have two options for running the Glaucoma model:
- You can directly work on a fork of the InnerEye repository. In this case, you need to modify `AZURE_DATASET_ID`
in `GlaucomaPublic.py` to match the dataset upload location, called `name_of_your_dataset_on_azure` above.
If you choose that, you can start training via
```
python InnerEye/ML/runner.py --model=GlaucomaPublic --azureml=True
```
- Alternatively, you can create a separate runner and a separate model configuration folder. The steps described
below refer to this route.

#### Setting up a second runner
1. Set up a directory outside of InnerEye to hold your configs, as in
[Setting Up Training](building_models.md#setting-up-training). After this step, you should have a folder `InnerEyeLocal`
beside `InnerEye` with files `settings.yml` and `ML/runner.py`.

#### Creating the classification model configuration
The full configuration for the Glaucoma model is at `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is change the dataset. We will do this by subclassing GlaucomaPublic in a new config
stored in `InnerEyeLocal/ML`.
1. Create a folder `configs/classification` under `InnerEyeLocal/ML`.
1. Create a config file called `GlaucomaPublicExt.py` there which extends the `GlaucomaPublic` class.
Next, you need to create a configuration file `InnerEye/ML/configs/MyGlaucoma.py`
which extends the GlaucomaPublic class like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class GlaucomaPublicExt(GlaucomaPublic):
class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
The value for `self.azure_dataset_id` should match the dataset upload location, called
`name_of_your_dataset_on_azure` above.

#### Start Training
Run the following to start a job on AzureML:
Once that config is in place, you can start training in AzureML via
```
python InnerEyeLocal/ML/runner.py --azureml=True --model=GlaucomaPublicExt
python InnerEye/ML/runner.py --model=MyGlaucomaModel --azureml=True
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.

As an alternative to working with a fork of the repository, you can use InnerEye-DeepLearning via a submodule.
Please check [here](innereye_as_submodule.md) for details.


## Sample segmentation task: Segmentation of Lung CT

@@ -71,46 +53,45 @@ This example is based on the [Lung CT Segmentation Challenge 2017](https://wiki.

### Downloading and preparing the dataset

1. The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).
1. The next step is to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.
1. Use the [InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) tool to create a NIFTI dataset
from the downloaded (DICOM) files.
The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).

You need to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.

Next, use the
[InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) commandline tools to create a
NIFTI dataset from the downloaded (DICOM) files.
After installing the tool, run
```batch
InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to the 'datasets' folder> --niftiDatasetDirectory=<output folder name for converted dataset> --dicomDatasetDirectory=<name of downloaded folder inside 'datasets'> --geoNorm 1;1;3
```
Now, you should have another folder under `datasets` with the converted NIFTI files.
The `geoNorm` tag tells the tool to normalize the voxel sizes during conversion.
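Conceptually, geometric normalization resamples the volume so that every voxel covers a fixed physical size (here 1 x 1 x 3 mm). A rough, pure-Python sketch of nearest-neighbour resampling along a single axis, assuming uniform input spacing (the real InnerEye-CreateDataset tool handles full 3D volumes and segmentation masks, and is far more sophisticated):

```python
# Rough sketch of what voxel-size normalization amounts to: nearest-neighbour
# resampling of one axis to a new physical spacing. Illustration only; this is
# not the InnerEye-CreateDataset implementation.
def resample_axis(values, old_spacing_mm, new_spacing_mm):
    """Resample a 1-D intensity profile to a new voxel spacing (nearest neighbour)."""
    physical_length = len(values) * old_spacing_mm
    new_count = int(round(physical_length / new_spacing_mm))
    resampled = []
    for i in range(new_count):
        # Physical position of the centre of new voxel i.
        position = (i + 0.5) * new_spacing_mm
        source_index = min(int(position / old_spacing_mm), len(values) - 1)
        resampled.append(values[source_index])
    return resampled
```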

1. Upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets).

### Setting up training
1. Set up a directory outside of InnerEye to hold your configs, as in
[Setting Up Training](building_models.md#setting-up-training). After this step, you should have a folder `InnerEyeLocal`
beside `InnerEye` with files `settings.yml` and `ML/runner.py`.

### Creating the segmentation model configuration
The full configuration for the Lung model is at `InnerEye/ML/configs/segmentation/Lung`.
All that needs to be done is change the dataset. We will do this by subclassing Lung in a new config
stored in `InnerEyeLocal/ML`.
1. Create a folder `configs/segmentation` under `InnerEyeLocal/ML`.
1. Create a config file called `LungExt.py` there which extends the `Lung` class like this:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung

class LungExt(Lung):
    def __init__(self) -> None:
        super().__init__(azure_dataset_id="name_of_your_dataset_on_azure")
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.

### Start Training
Run the following to start a job on AzureML:
```
python InnerEyeLocal/ML/runner.py --azureml=True --model=LungExt --train=True
```

Finally, upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). All files should go
into a folder in the `datasets` container, for example `my_lung_dataset`. This folder name will need to go into the
`azure_dataset_id` field of the model configuration, see below.

### Creating the model configuration and starting training
You can then create a new model configuration, based on the template
[Lung.py](../InnerEye/ML/configs/segmentation/Lung.py). To do this, create a file
`InnerEye/ML/configs/segmentation/MyLungModel.py`, where you create a subclass of the template Lung model, and
add the `azure_dataset_id` field (i.e., the name of the folder that contains the uploaded data from above),
so that it looks like:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung

class MyLungModel(Lung):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "my_lung_dataset"
```
If you are using InnerEye as a submodule, please add this configuration in your private configuration folder,
as described for the Glaucoma model [here](innereye_as_submodule.md).

You can now run the following command to start a job on AzureML:
```
python InnerEye/ML/runner.py --azureml=True --model=MyLungModel
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.