Update documentation for submodules (#481)

The current documentation uses the submodule setup in many places. However, we found that it can easily mislead users, so we are switching to recommending a plain fork.
This commit is contained in:
Anton Schwaighofer 2021-06-09 17:35:51 +01:00 committed by GitHub
Parent a74c8e2b8c
Commit 5af8b01dc1
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 158 additions and 89 deletions


@@ -100,9 +100,18 @@ Further detailed instructions, including setup in Azure, are here:
1. [Debugging and monitoring models](docs/debugging_and_monitoring.md)
1. [Model diagnostics](docs/model_diagnostics.md)
1. [Move a model to a different workspace](docs/move_model.md)
1. [Deployment](docs/deploy_on_aml.md)
1. [Working with FastMRI models](docs/fastmri.md)
## Deployment
We offer a companion set of open-sourced tools that help to integrate trained CT segmentation models with clinical
software systems:
- The [InnerEye-Gateway](https://github.com/microsoft/InnerEye-Gateway) is a Windows service running in a DICOM network
that can route anonymized DICOM images to an inference service.
- The [InnerEye-Inference](https://github.com/microsoft/InnerEye-Inference) component offers a REST API that integrates
with the InnerEye-Gateway, to run inference on InnerEye-DeepLearning models.
Details can be found [here](docs/deploy_on_aml.md).
![docs/deployment.png](docs/deployment.png)
## More information


@@ -4,30 +4,14 @@
In order to work with the solution, your OS environment will need [git](https://git-scm.com/) and [git lfs](https://git-lfs.github.com/) installed. Depending on the OS you are running, the installation instructions may vary. Please refer to the respective documentation sections on the tools' websites for detailed instructions.
## Using the InnerEye code as a git submodule of your project
You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that.
* Or you can create your project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule.
If you go down the second route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code and model configurations.
* A file `ML/runner.py` that invokes the InnerEye training runner, but that points the code to your environment and Azure
settings; see the [Building models](building_models.md) instructions for details.
You then need to add the InnerEye code as a git submodule, in folder `innereye-submodule`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-submodule
```
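Note that a plain `git clone` of your project will not fetch the submodule contents; this is standard git behaviour, not
specific to InnerEye. Collaborators can pull in the submodule like this:
```shell script
# Clone your project including all submodules:
git clone --recursive <url_of_your_repository>
# Or initialize the submodule inside an existing clone:
git submodule update --init
```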
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-submodule` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure. Mark your repository root as "Sources", and
`innereye-submodule` as well.
We recommend using PyCharm or VSCode as the Python editor.
You have two options for working with our codebase:
* You can fork the InnerEye-DeepLearning repository, and work off that. We recommend that because it is easiest to set up.
* Or you can create your project that uses the InnerEye-DeepLearning code, and include InnerEye-DeepLearning as a git
submodule. We only recommend that if you are very comfortable with Python. More details about this option
[are here](innereye_as_submodule.md).
## Windows Subsystem for Linux Setup
When developing on a Windows machine, we recommend using [the Windows Subsystem for Linux, WSL2](https://docs.microsoft.com/en-us/windows/wsl/about).
That's because PyTorch has better support for Linux.
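On recent Windows versions, WSL2 plus a Linux distribution can be installed with a single command (this assumes
Windows 10 version 2004 or later; see the linked WSL documentation for other routes):
```shell script
# Run from an elevated PowerShell or Command Prompt, then restart the machine:
wsl --install -d Ubuntu
```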


@@ -0,0 +1,95 @@
# Using the InnerEye code as a git submodule of your project
You can use InnerEye as a submodule in your own project.
If you go down that route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with python, pip, pytorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code and model configurations.
* A file like `myrunner.py` that invokes the InnerEye training runner, but that points the code to your environment
and Azure settings; see the [Building models](building_models.md) instructions for details. See below for what
`myrunner.py` should look like.
You then need to add the InnerEye code as a git submodule, in folder `innereye-deeplearning`:
```shell script
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-deeplearning
```
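After this step, your repository layout would look roughly like this (using the example names from the list above):
```
<repository root>
├── environment.yml
├── settings.yml
├── myrunner.py
├── ML/                      # your additional code and model configurations
└── innereye-deeplearning/   # the InnerEye-DeepLearning submodule
```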
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-deeplearning` subfolder as inputs.
In PyCharm, you would do that by going to Settings / Project Structure. Mark your repository root as "Sources", and
`innereye-deeplearning` as well.
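In VS Code, the equivalent is to add the `innereye-deeplearning` folder to the Python extension's search paths, for
example via the `python.analysis.extraPaths` setting (assuming the standard Python/Pylance extensions).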
Example commandline runner that uses the InnerEye runner (called `myrunner.py` above):
```python
import sys
from pathlib import Path

# This file here mimics how the InnerEye code would be used as a git submodule.

# Ensure that this path correctly points to the root folder of your repository.
repository_root = Path(__file__).absolute().parent


def add_package_to_sys_path_if_needed() -> None:
    """
    Checks if the Python paths in sys.path already contain the /innereye-deeplearning folder. If not, add it.
    """
    is_package_in_path = False
    innereye_submodule_folder = repository_root / "innereye-deeplearning"
    for path_str in sys.path:
        path = Path(path_str)
        if path == innereye_submodule_folder:
            is_package_in_path = True
            break
    if not is_package_in_path:
        print(f"Adding {innereye_submodule_folder} to sys.path")
        sys.path.append(str(innereye_submodule_folder))


def main() -> None:
    try:
        # If InnerEye is already importable (for example, installed as a package), use it as-is.
        from InnerEye import ML  # noqa: F401
    except ImportError:
        # Otherwise, fall back to the copy in the git submodule.
        add_package_to_sys_path_if_needed()
    from InnerEye.ML import runner
    print(f"Repository root: {repository_root}")
    # Check here that yaml_config_file correctly points to your settings file
    runner.run(project_root=repository_root,
               yaml_config_file=Path("settings.yml"),
               post_cross_validation_hook=None)


if __name__ == '__main__':
    main()
```
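The try/except pattern in `main` means the script works both when InnerEye is already importable (for example, installed
as a package into the Conda environment) and when it is only available via the submodule folder added to `sys.path`.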
## Adding new models
1. Set up a directory outside of InnerEye to hold your configs. In your repository root, you could have a folder
`InnerEyeLocal`, parallel to the InnerEye submodule, alongside `settings.yml` and `myrunner.py`.
The example below creates a new flavour of the Glaucoma model in `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is to change the dataset. We will do this by subclassing `GlaucomaPublic` in a new config
stored in `InnerEyeLocal/configs`:
1. Create folder `InnerEyeLocal/configs`
1. Create a config file `InnerEyeLocal/configs/GlaucomaPublicExt.py` which extends the `GlaucomaPublic` class
like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
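Concretely, the relevant entries in `settings.yml` would look roughly like this (the exact layout is assumed here; start
from a copy of `InnerEye/settings.yml` and keep all other Azure settings in place):
```yaml
model_configs_namespace: InnerEyeLocal.configs
extra_code_directory: InnerEyeLocal
```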
#### Start Training
Run the following to start a job on AzureML:
```
python myrunner.py --azureml=True --model=MyGlaucomaModel
```
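If you omit `--azureml=True`, the same command should run training locally, which is a quick way to check that the
submodule wiring and config discovery work before submitting to AzureML.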
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.


@@ -1,7 +1,8 @@
# Sample Tasks
Two sample tasks for the classification and segmentation pipelines.
This document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
This document contains two sample tasks for the classification and segmentation pipelines.
The document will walk through the steps in [Training Steps](building_models.md), but with specific examples for each task.
Before trying to train these models, you should have followed the steps to set up an [environment](environment.md) and [AzureML](setting_up_aml.md).
## Sample classification task: Glaucoma Detection on OCT volumes
@@ -9,61 +10,42 @@ Before trying to train these models, you should have followed the steps to set up an
This example is based on the paper [A feature agnostic approach for glaucoma detection in OCT volumes](https://arxiv.org/pdf/1807.04855v3.pdf).
### Downloading and preparing the dataset
1. The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.
The dataset is available [here](https://zenodo.org/record/1481223#.Xs-ehzPiuM_) <sup>[[1]](#1)</sup>.
1. After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
After downloading and extracting the zip file, run the [create_glaucoma_dataset_csv.py](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/InnerEye/Scripts/create_glaucoma_dataset_csv.py)
script on the extracted folder.
```
python create_glaucoma_dataset_csv.py /path/to/extracted/folder
```
This will convert the dataset to csv form and create a file `dataset.csv`.
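To sanity-check the result before uploading, you can print the first few rows of the generated file (a minimal sketch;
adjust the path to your extracted folder):
```python
import csv
from pathlib import Path

# Print the header plus the first few data rows of the generated dataset.csv.
with Path("/path/to/extracted/folder/dataset.csv").open(newline="") as f:
    for i, row in enumerate(csv.reader(f)):
        print(row)
        if i >= 4:
            break
```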
1. Upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
Finally, upload this folder (with the images and `dataset.csv`) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). The dataset should go
into a container called `datasets`, with a folder name of your choice (`name_of_your_dataset_on_azure` in the
description below).
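One way of doing the upload is via the `azcopy` tool (a sketch; the storage account name and SAS token are placeholders
that you need to fill in, and the target folder must match the dataset name used in the model configuration):
```shell script
azcopy copy "/path/to/extracted/folder" \
    "https://<your_storage_account>.blob.core.windows.net/datasets/name_of_your_dataset_on_azure?<SAS_token>" \
    --recursive
```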
### Setting up training
### Creating the model configuration and starting training
You have two options for running the Glaucoma model:
- You can directly work on a fork of the InnerEye repository. In this case, you need to modify `AZURE_DATASET_ID`
in `GlaucomaPublic.py` to match the dataset upload location, called `name_of_your_dataset_on_azure` above.
If you choose that, you can start training via
```
python InnerEye/ML/runner.py --model=GlaucomaPublic --azureml=True
```
- Alternatively, you can create a separate runner and a separate model configuration folder. The steps described
below refer to this route.
#### Setting up a second runner
1. Set up a directory outside of InnerEye to hold your configs, as in
[Setting Up Training](building_models.md#setting-up-training). After this step, you should have a folder InnerEyeLocal
beside InnerEye with files `settings.yml` and `ML/runner.py`.
#### Creating the classification model configuration
The full configuration for the Glaucoma model is at `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is to change the dataset. We will do this by subclassing GlaucomaPublic in a new config
stored in `InnerEyeLocal/ML`:
1. Create folder `configs/classification` under `InnerEyeLocal/ML`
1. Create a config file called `GlaucomaPublicExt.py` there which extends the GlaucomaPublic class, like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class GlaucomaPublicExt(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
Next, you need to create a configuration file `InnerEye/ML/configs/MyGlaucoma.py`
which extends the GlaucomaPublic class like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
The value for `self.azure_dataset_id` should match the dataset upload location, called
`name_of_your_dataset_on_azure` above.
#### Start Training
Run the following to start a job on AzureML:
```
python InnerEyeLocal/ML/runner.py --azureml=True --model=GlaucomaPublicExt
```
Once that config is in place, you can start training in AzureML via
```
python InnerEye/ML/runner.py --model=MyGlaucomaModel --azureml=True
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.
As an alternative to working with a fork of the repository, you can use InnerEye-DeepLearning via a submodule.
Please check [here](innereye_as_submodule.md) for details.
## Sample segmentation task: Segmentation of Lung CT
@@ -71,46 +53,45 @@ This example is based on the [Lung CT Segmentation Challenge 2017](https://wiki.
### Downloading and preparing the dataset
1. The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).
1. The next step is to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.
1. Use the [InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) to create a NIFTI dataset
from the downloaded (DICOM) files.
The dataset <sup>[[3]](#3)[[4]](#4)</sup> can be downloaded [here](https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017#021ca3c9a0724b0d9df784f1699d35e2).
You need to convert the dataset from DICOM-RT to NIFTI. Before this, place the downloaded dataset in another
parent folder, which we will call `datasets`. This file structure is expected by the conversion tool.
Next, use the
[InnerEye-CreateDataset](https://github.com/microsoft/InnerEye-createdataset) commandline tools to create a
NIFTI dataset from the downloaded (DICOM) files.
After installing the tool, run
```batch
InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to the 'datasets' folder> --niftiDatasetDirectory=<output folder name for converted dataset> --dicomDatasetDirectory=<name of downloaded folder inside 'datasets'> --geoNorm 1;1;3
```
Now, you should have another folder under `datasets` with the converted Nifti files.
The `geoNorm` option tells the tool to normalize the voxel sizes during conversion.
1. Upload this folder (with the images and dataset.csv) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets).
### Setting up training
1. Set up a directory outside of InnerEye to hold your configs, as in
[Setting Up Training](building_models.md#setting-up-training). After this step, you should have a folder InnerEyeLocal
beside InnerEye with files settings.yml and ML/runner.py.
### Creating the segmentation model configuration
The full configuration for the Lung model is at InnerEye/ML/configs/segmentation/Lung.
All that needs to be done is to change the dataset. We will do this by subclassing Lung in a new config
stored in InnerEyeLocal/ML:
1. Create folder configs/segmentation under InnerEyeLocal/ML
1. Create a config file called LungExt.py there which extends the Lung class, like this:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung


class LungExt(Lung):
    def __init__(self) -> None:
        super().__init__(azure_dataset_id="name_of_your_dataset_on_azure")
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
### Start Training
Run the following to start a job on AzureML:
```
python InnerEyeLocal/ML/runner.py --azureml=True --model=LungExt --train=True
```
Finally, upload this folder (with the images and dataset.csv) to Azure Blob Storage. For details on creating a storage account,
see [Setting up AzureML](setting_up_aml.md#step-4-create-a-storage-account-for-your-datasets). All files should go
into a folder in the `datasets` container, for example `my_lung_dataset`. This folder name will need to go into the
`azure_dataset_id` field of the model configuration, see below.
### Creating the model configuration and starting training
You can then create a new model configuration, based on the template
[Lung.py](../InnerEye/ML/configs/segmentation/Lung.py). To do this, create a file
`InnerEye/ML/configs/segmentation/MyLungModel.py`, where you create a subclass of the template Lung model, and
add the `azure_dataset_id` field (i.e., the name of the folder that contains the uploaded data from above),
so that it looks like:
```python
from InnerEye.ML.configs.segmentation.Lung import Lung


class MyLungModel(Lung):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "my_lung_dataset"
```
If you are using InnerEye as a submodule, please add this configuration in your private configuration folder,
as described for the Glaucoma model [here](innereye_as_submodule.md).
You can now run the following command to start a job on AzureML:
```
python InnerEye/ML/runner.py --azureml=True --model=MyLungModel
```
See [Model Training](building_models.md) for details on training outputs, resuming training, testing models and model ensembles.