* Fix shell lexer name

* Update CHANGELOG

* Fix CHANGELOG

* Fix "html_static_path entry '_static' does not exist"

* Clean up preprocess script

* Fix link to InnerEye-DataQuality

* Use shutil.copy to copy files

* Remove extra info from CHANGELOG

* Fix broken link to LICENSE

* Fix lexer name for YAML

* Remove colons from headers

* Fix InnerEye module not being found
This commit is contained in:
Fernando Pérez-García 2022-03-22 09:46:15 +00:00 committed by GitHub
Parent 95d8b72ae6
Commit 45e7d5ff4d
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
10 changed files: 142 additions and 143 deletions


@@ -37,7 +37,7 @@ jobs that run in AzureML.
`NIH_COVID_BYOL` to specify the name of the SSL training dataset.
- ([#560](https://github.com/microsoft/InnerEye-DeepLearning/pull/560)) Added pre-commit hooks.
- ([#619](https://github.com/microsoft/InnerEye-DeepLearning/pull/619)) Add DeepMIL PANDA
- ([#559](https://github.com/microsoft/InnerEye-DeepLearning/pull/559)) Adding the accompanying code for the ["Active label cleaning: Improving dataset quality under resource constraints"](https://arxiv.org/abs/2109.00574) paper.
- ([#589](https://github.com/microsoft/InnerEye-DeepLearning/pull/589)) Add `LightningContainer.update_azure_config()`
hook to enable overriding `AzureConfig` parameters from a container (e.g. `experiment_name`, `cluster`, `num_nodes`).
- ([#617](https://github.com/microsoft/InnerEye-DeepLearning/pull/617)) Commandline flag `pl_check_val_every_n_epoch` to control how often validation happens
@@ -97,6 +97,7 @@ gets uploaded to AzureML, by skipping all test folders.

### Fixed

- ([#699](https://github.com/microsoft/InnerEye-DeepLearning/pull/699)) Fix Sphinx warnings.
- ([#682](https://github.com/microsoft/InnerEye-DeepLearning/pull/682)) Ensure the shape of input patches is compatible with model constraints.
- ([#681](https://github.com/microsoft/InnerEye-DeepLearning/pull/681)) Pad model outputs if they are smaller than the inputs.
- ([#683](https://github.com/microsoft/InnerEye-DeepLearning/pull/683)) Fix missing separator error in docs Makefile.


@@ -106,7 +106,7 @@ Further detailed instructions, including setup in Azure, are here:
1. [Model diagnostics](docs/model_diagnostics.md)
1. [Move a model to a different workspace](docs/move_model.md)
1. [Working with FastMRI models](docs/fastmri.md)
1. [Active label cleaning and noise robust learning toolbox](https://github.com/microsoft/InnerEye-DeepLearning/blob/1606729c7a16e1bfeb269694314212b6e2737939/InnerEye-DataQuality/README.md)

## Deployment

@@ -133,7 +133,7 @@ Details can be found [here](docs/deploy_on_aml.md).

## Licensing

[MIT License](/LICENSE)

**You are responsible for the performance, the necessary testing, and if needed any regulatory clearance for
any of the models produced by this toolbox.**

@@ -157,7 +157,7 @@ Oktay O., Nanavati J., Schwaighofer A., Carter D., Bristow M., Tanno R., Jena R.
Bannur S., Oktay O., Bernhardt M., Schwaighofer A., Jena R., Nushi B., Wadhwani S., Nori A., Natarajan K., Ashraf S., Alvarez-Valle J., Castro D. C.: Hierarchical Analysis of Visual COVID-19 Features from Chest Radiographs. ICML 2021 Workshop on Interpretable Machine Learning in Healthcare. [https://arxiv.org/abs/2107.06618](https://arxiv.org/abs/2107.06618)

Bernhardt M., Castro D. C., Tanno R., Schwaighofer A., Tezcan K. C., Monteiro M., Bannur S., Lungren M., Nori S., Glocker B., Alvarez-Valle J., Oktay O.: Active label cleaning for improved dataset quality under resource constraints. [https://www.nature.com/articles/s41467-022-28818-3](https://www.nature.com/articles/s41467-022-28818-3). Accompanying code: [InnerEye-DataQuality](https://github.com/microsoft/InnerEye-DeepLearning/blob/1606729c7a16e1bfeb269694314212b6e2737939/InnerEye-DataQuality/README.md)

## Contributing

@@ -175,5 +175,3 @@ contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additio

## This toolbox is maintained by the
[Microsoft Medical Image Analysis team](https://www.microsoft.com/en-us/research/project/medical-image-analysis/).


@@ -9,8 +9,8 @@ create a directory `InnerEyeLocal` beside `InnerEye`.

As well as your configurations (dealt with below) you will need these files:

* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings.
The value of `extra_code_directory` should (in our example) be `'InnerEyeLocal'`,
and `model_configs_namespace` should be `'InnerEyeLocal.ML.configs'`.
* A folder like `InnerEyeLocal` that contains your additional code, and model configurations.
* A file `InnerEyeLocal/ML/runner.py` that invokes the InnerEye training runner, but that points the code to your
environment and Azure settings; see the sketch below.
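A minimal sketch of such a runner follows. This is not the verbatim repository sample: the path arithmetic and the exact signature of `runner.run` are assumptions, so adjust them to your checkout.

```python
# InnerEyeLocal/ML/runner.py - hypothetical sketch; verify runner.run's
# signature against the InnerEye version you have checked out.
import sys
from pathlib import Path

# Repository root, assuming this file lives at InnerEyeLocal/ML/runner.py.
repo_root = Path(__file__).absolute().parent.parent.parent
sys.path.insert(0, str(repo_root))

from InnerEye.ML import runner  # noqa: E402


def main() -> None:
    # Point the InnerEye runner at this project and at your settings file.
    runner.run(project_root=repo_root,
               yaml_config_file=Path("InnerEyeLocal/settings.yml"))


if __name__ == '__main__':
    main()
```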
@@ -38,7 +38,7 @@ You will find a variety of model configurations [here](/InnerEye/ML/configs/segm
in `Base.py` reference open-sourced data and can be used as they are. Those ending in `Base.py`
are partially specified, and can be used by having other model configurations inherit from them and supply the missing
parameter values: a dataset ID at least, and optionally other values. For example, a `Prostate` model might inherit
very simply from `ProstateBase` by creating `Prostate.py` in the directory `InnerEyeLocal/ML/configs/segmentation`
with the following contents:

```python
from InnerEye.ML.configs.segmentation.ProstateBase import ProstateBase
@@ -51,8 +51,8 @@ class Prostate(ProstateBase):
        azure_dataset_id="name-of-your-AML-dataset-with-prostate-data")
```
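Pieced together across the hunk boundary, the full file is presumably close to the following sketch; the `__init__` wiring shown here is an assumption based on the surrounding text, not the verbatim repository sample:

```python
# Prostate.py - hypothetical reconstruction of the example above.
from InnerEye.ML.configs.segmentation.ProstateBase import ProstateBase


class Prostate(ProstateBase):
    def __init__(self) -> None:
        # The dataset ID is the one required parameter; other values are optional.
        super().__init__(
            azure_dataset_id="name-of-your-AML-dataset-with-prostate-data")
```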
The allowed parameters and their meanings are defined in [`SegmentationModelBase`](/InnerEye/ML/config.py).
The class name must be the same as the basename of the file containing it, so `Prostate.py` must contain `Prostate`.
In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.ML.configs` so this config
is found by the runner.

A `Head and Neck` model might inherit from `HeadAndNeckBase` by creating `HeadAndNeck.py` with the following contents:
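The diff elides the actual file body. By analogy with the `Prostate` example above, a hypothetical sketch (the dataset ID and any extra parameters are placeholders):

```python
# HeadAndNeck.py - hypothetical sketch by analogy with Prostate.py.
from InnerEye.ML.configs.segmentation.HeadAndNeckBase import HeadAndNeckBase


class HeadAndNeck(HeadAndNeckBase):
    def __init__(self) -> None:
        super().__init__(
            azure_dataset_id="name-of-your-AML-dataset-with-head-and-neck-data")
```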
@@ -70,11 +70,11 @@ class HeadAndNeck(HeadAndNeckBase):

### Training a new model

* Set up your model configuration as above and update `azure_dataset_id` to the name of your Dataset in the AML workspace.
It is enough to put your dataset into blob storage. The dataset should be contained in a folder at the root of the datasets container.
The InnerEye runner will check if there is a dataset in the AzureML workspace already, and if not, generate it directly from blob storage.
* Train a new model, for example `Prostate`:

```shell
python InnerEyeLocal/ML/runner.py --azureml --model=Prostate
```
@@ -101,12 +101,12 @@ Conversely, for command line options that take a boolean argument, and that are

### Training using multiple machines

To speed up training in AzureML, you can use multiple machines, by specifying the additional
`--num_nodes` argument. For example, to use 2 machines to train, specify:

```shell
python InnerEyeLocal/ML/runner.py --azureml --model=Prostate --num_nodes=2
```

On each of the 2 machines, all available GPUs will be used. Model inference will always use only one machine.
For the Prostate model, we observed a 2.8x speedup for model training when using 4 nodes, and a 1.65x speedup
when using 2 nodes.

### AzureML Run Hierarchy
@@ -127,8 +127,8 @@ at the same time (provided that the cluster has capacity). This means that a com
takes as long as a single training run.

To start cross validation, you can either modify the `number_of_cross_validation_splits` property of your model
(see the sketch below), or supply it on the command line: provide all the usual switches, and add
`--number_of_cross_validation_splits=N`, for some `N` greater than 1; a value of 5 is typical. This will start a
[HyperDrive run](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters): a parent
AzureML job, with `N` child runs that will execute in parallel. You can see the child runs in the AzureML UI in the
"Child Runs" tab.
@@ -144,12 +144,12 @@ To train further with an already-created model, give the above command with the

```
--run_recovery_id=foo_bar:foo_bar_12345_abcd
```

The run recovery ID is of the form "experiment_id:run_id". When you trained your original model, it will have been
queued as a "Run" inside of an "Experiment". The experiment will be given a name derived from the branch name - for
example, branch `foo/bar` will queue a run in experiment `foo_bar`. Inside the "Tags" section of your run, you should
see an element `run_recovery_id`. It will look something like `foo_bar:foo_bar_12345_abcd`.

If you are recovering a HyperDrive run, the value of `--run_recovery_id` should be that of the parent,
and `--number_of_cross_validation_splits` should have the same value as in the recovered run.
For example:

```
--run_recovery_id=foo_bar:HD_55d4beef-7be9-45d7-89a5-1acf1f99078a --start_epoch=120 --number_of_cross_validation_splits=5
```
@@ -169,38 +169,38 @@ You will need to specify the registered model to run on using the `model_id` arg
version by clicking on `Registered Models` on the Details tab of a run in the AzureML UI.
The model id is of the form "model_name:model_version". Thus your command should look like this:

```shell
python InnerEye/ML/runner.py --azureml --model=Prostate --cluster=my_cluster_name \
  --no-train --model_id=Prostate:1
```

#### From local checkpoints

To evaluate a model using one or more local checkpoints, use the `local_weights_path` argument to specify the path(s) to the
model checkpoint(s) on the local disk.

```shell
python InnerEye/ML/runner.py --model=Prostate --no-train --local_weights_path=path_to_your_checkpoint
```

To run on multiple checkpoints (if you have trained an ensemble model), specify each checkpoint using the argument
`local_weights_path`.

```shell
python InnerEye/ML/runner.py --model=Prostate --no-train --local_weights_path=path_to_first_checkpoint,path_to_second_checkpoint
```

#### From URLs

To evaluate a model using one or more checkpoints each specified by a URL, use the `weights_url` argument to specify the
URL(s) from which the model checkpoint(s) should be downloaded.

```shell
python InnerEye/ML/runner.py --model=Prostate --no-train --weights_url=url_for_your_checkpoint
```

To run on multiple checkpoints (if you have trained an ensemble model), specify each checkpoint using the argument
`weights_url`.

```shell
python InnerEye/ML/runner.py --model=Prostate --no-train --weights_url=url_for_first_checkpoint,url_for_second_checkpoint
```

#### Running a registered AzureML model on a single image on the local disk

To submit an AzureML run to apply a model to a single image on your local disk,
you can use the script `submit_for_inference.py`, with a command of this form:

```shell
python InnerEye/Scripts/submit_for_inference.py --image_file ~/somewhere/ct.nii.gz --model_id Prostate:555 \
  --settings ../somewhere_else/settings.yml --download_folder ~/my_existing_folder
```
@@ -208,8 +208,8 @@ python InnerEye/Scripts/submit_for_inference.py --image_file ~/somewhere/ct.nii.

### Model Ensembles

An ensemble model will be created automatically and registered in the AzureML model registry whenever cross-validation
models are trained. The ensemble model creation is done by the child whose `cross_validation_split_index` is 0;
you can identify this child by looking at the "Child Runs" tab in the parent run page in AzureML.

To find the registered ensemble model, find the Hyperdrive parent run in AzureML. In the "Details" tab, there is an
entry for "Registered models", that links to the ensemble model that was just created. Note that each of the child runs
@@ -225,12 +225,12 @@ and the generated posteriors are passed to the usual model testing downstream pi

Once your HyperDrive AzureML runs are completed, you can visualize the results by running the
[`plot_cross_validation.py`](/InnerEye/ML/visualizers/plot_cross_validation.py) script locally:

```shell
python InnerEye/ML/visualizers/plot_cross_validation.py --run_recovery_id ... --epoch ...
```

filling in the run recovery ID of the parent run and the epoch number (one of the test epochs, e.g. the last epoch)
for which you want results plotted. The script will also output several `..._outliers.txt` files with all of the outliers
across the splits and a portal query to
find them in the production portal, and run statistical tests to compute the significance of differences between scores
across the splits and with respect to other runs that you specify. This is done for you during
the run itself (see below), but you can use the script post hoc to compare arbitrary runs
@@ -241,18 +241,18 @@ and [`mann_whitney_test.py`](/InnerEye/Common/Statistics/mann_whitney_test.py).

## Where are my outputs and models?

* AzureML writes all its results to the storage account you have specified. Inside of that account, you will
find a container named `azureml`. You can access that with
[Azure StorageExplorer](https://azure.microsoft.com/en-us/features/storage-explorer/). The checkpoints and other
files of a run will be in folder `azureml/ExperimentRun/dcid.my_run_id`, where `my_run_id` is the "Run Id" visible in
the "Details" section of the run. If you want to download all the results files or a large subset of them,
we recommend you access them this way (or programmatically, as in the sketch after this list).
* The results can also be viewed in the "Outputs and Logs" section of the run. This is likely to be more
convenient for viewing and inspecting single files.
* All files that the model training writes to the `./outputs` folder are automatically uploaded at the end of
the AzureML training job, and are put into `outputs` in Blob Storage and in the run itself.
Similarly, what the model training writes to the `./logs` folder gets uploaded to `logs`.
* You can monitor the file system that is mounted on the compute node, by navigating to your
storage account in Azure. In the blade, click on "Files" and navigate through to `azureml/azureml/my_run_id`. This
will show all files that are mounted as the working directory on the compute VM.
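For programmatic access, a minimal sketch using the `azure-storage-blob` package; the connection string and run ID are placeholders, and the blob layout is the one described above:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values - fill in from your own storage account and run.
CONNECTION_STRING = "DefaultEndpointsProtocol=...;EndpointSuffix=core.windows.net"
RUN_ID = "my_run_id"  # the "Run Id" shown in the run's "Details" section

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client("azureml")
# Checkpoints and other files of the run live under ExperimentRun/dcid.<run id>.
for blob in container.list_blobs(name_starts_with=f"ExperimentRun/dcid.{RUN_ID}"):
    print(blob.name)
```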
The organization of the `outputs` directory is as follows:

@@ -281,25 +281,25 @@ the `metrics.csv` files of the current run and the comparison run(s).
and `test_dataset.csv`, `train_dataset.csv` and `val_dataset.csv` for those subsets of it.
* `BaselineComparisonWilcoxonSignedRankTestResults.txt`, containing the results of comparisons
between the current run and any specified baselines (earlier runs) to compare with. Each paragraph of that file compares two models and
indicates, for each structure, when the Dice scores for the second model are significantly better
or worse than the first. For full details, see the
[source code](../InnerEye/Common/Statistics/wilcoxon_signed_rank_test.py).
* A directory `scatterplots`, containing a `png` file for every pairing of the current model
with one of the baselines. Each one is named `AAA_vs_BBB.png`, where `AAA` and `BBB` are the run IDs
of the two models. Each plot shows the Dice scores on the test set for the models.
* For both segmentation and classification models an IPython Notebook `report.ipynb` will be generated in the
`outputs` directory.
* For segmentation models, this report is based on the full image results of the model checkpoint that performed
the best on the validation set. This report will contain detailed metrics per structure, and outliers to help
model development.
* For classification models, the report is based on the validation and test results from the last epoch. It shows
metrics on the validation and test sets, ROC and PR Curves, and a list of the best and worst performing images
from the test set.

Ensemble models are created by the zeroth child (with `cross_validation_split_index=0`) in each
cross-validation run. Results from inference on the test and validation sets are uploaded to the
parent run, and can be found in `epoch_NNN` directories as above.
In addition, various scores and plots from the ensemble and from individual child
runs are uploaded to the parent run, in the `CrossValResults` directory. This contains:

* Subdirectories named 0, 1, 2, ... for all the child runs including the zeroth one, as well
as `ENSEMBLE`, containing their respective `epoch_NNN` directories.
@@ -320,24 +320,24 @@ scatterplots for the ensemble, as described above for single runs.

### Augmentations for classification models.

For classification models, you can define an augmentation pipeline to apply to your image inputs (resp. segmentations) at
training, validation and test time. To define such a series of transformations, you will need to overload the
`get_image_transform` (resp. `get_segmentation_transform`) method of your config class. This method expects you to return
a `ModelTransformsPerExecutionMode` that maps each execution mode to one transform function. We also provide
`ImageTransformationPipeline`, a class that creates a pipeline of transforms from a list of individual transforms and
ensures the correct conversion of 2D or 3D PIL.Image or tensor inputs to the obtained pipeline.

`ImageTransformationPipeline` takes two arguments for its constructor:

* `transforms`: a list of image transforms; in particular, you can feed in standard
[torchvision transforms](https://pytorch.org/vision/0.8/transforms.html) or any other transforms, as long as they
support an input of shape `[Z, C, H, W]`, where Z is the 3rd dimension (1 for 2D images), C the number of channels,
and H and W the height and width of each 2D slice. You can also define your own transforms, as long as they expect
such a `[Z, C, H, W]` input. You can find some examples of custom transform classes in
`InnerEye/ML/augmentation/image_transforms.py`.
* `use_different_transformation_per_channel`: if True, apply a different version of the augmentation pipeline
for each channel. If False, apply the same transformation to each channel, separately. Defaults to False.

Below you can find an example of `get_image_transform` that would resize your input images to 256 x 256, and at
training time only apply random rotation of +/- 10 degrees and some brightness distortion,
using standard torchvision transforms.
@@ -353,9 +353,9 @@ def get_image_transform(self) -> ModelTransformsPerExecutionMode:
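The example itself is truncated by the diff. A hedged reconstruction matching the description above - the InnerEye import paths are assumptions, so check them against your checkout:

```python
# Hypothetical reconstruction - verify the InnerEye import paths in your checkout.
from torchvision.transforms import ColorJitter, RandomAffine, Resize

from InnerEye.ML.augmentation.transform_pipeline import ImageTransformationPipeline
from InnerEye.ML.lightning_container import ModelTransformsPerExecutionMode


def get_image_transform(self) -> ModelTransformsPerExecutionMode:
    """Resize to 256 x 256 everywhere; rotate and jitter brightness at training time only."""
    train = ImageTransformationPipeline(
        transforms=[Resize(256),
                    RandomAffine(degrees=10),      # random rotation of +/- 10 degrees
                    ColorJitter(brightness=0.2)])  # some brightness distortion
    eval_only = ImageTransformationPipeline(transforms=[Resize(256)])
    return ModelTransformsPerExecutionMode(train=train, val=eval_only, test=eval_only)
```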
### Segmentation Models and Inference.

By default, when building a segmentation model, a full image inference will be performed on the validation and test data sets;
and when building an ensemble model, a full image inference will be performed on the test data set only (because the
training and validation sets are first combined before being split into each of the folds).
There are a total of six command line options for controlling this in more detail.

For non-ensemble models, use any of the following command line options to enable or disable inference on the training, test, or validation data sets:


@@ -2,21 +2,21 @@

### Using TensorBoard to monitor AzureML jobs

* **Existing jobs**: execute [`InnerEye/Azure/tensorboard_monitor.py`](/InnerEye/Azure/tensorboard_monitor.py)
with either an experiment id `--experiment_name` or a list of run ids `--run_ids job1,job2,job3`.
If an experiment id is provided, then all of the runs in that experiment will be monitored. Additionally, you can
filter runs by the run's status, setting the `--filters Running,Completed` parameter to a subset of
`[Running, Completed, Failed, Canceled]`. By default, Failed and Canceled runs are excluded.
To quickly access this script from PyCharm, there is a template PyCharm run configuration
`Template: Tensorboard monitoring` in the repository. Create a copy of that, and modify the commandline
arguments with your jobs to monitor.
* **New jobs**: when queuing a new AzureML job, pass `--tensorboard`, which will automatically start a new TensorBoard
session, monitoring the newly queued job.

### Resource Monitor

GPU and CPU usage can be monitored throughout the execution of a run (local and AML) by setting the monitoring interval
for the resource monitor, e.g. `--monitoring_interval_seconds=5`. This will spawn a separate process at the start of the
run which will log both GPU and CPU utilization and memory consumption. These metrics will be written to AzureML as
well as a separate TensorBoard logs file under `Diagnostics`.
@@ -26,12 +26,12 @@ well as a separate TensorBoard logs file under `Diagnostics`.

For full debugging of any non-trivial model, you will need a GPU. Some basic debugging can also be carried out on
standard Linux or Windows machines.

The main entry point into the code is [`InnerEye/ML/runner.py`](/InnerEye/ML/runner.py). The code takes its
configuration elements from commandline arguments and a settings file,
[`InnerEye/settings.yml`](/InnerEye/settings.yml).

A password for the (optional) Azure Service
Principal is read from `InnerEyeTestVariables.txt` in the repository root directory. The file
is expected to contain a line of the form

```
APPLICATION_KEY=<app key for your AML workspace>
```
@@ -48,7 +48,7 @@ create a copy of the template run configuration, and change the arguments to sui

Here are a few hints on how you can reduce the complexity of training if you need to debug an issue. In most cases,
you should then be able to rely on a CPU machine.

* Reduce the number of feature channels in your model. If you run a UNet, for example, you can set
`feature_channels = [1]` in your model definition file.
* Train only for a single epoch. You can set `--num_epochs=1` via the commandline or the `more_switches` variable
if you start your training via a build definition. This will only create a model checkpoint at epoch 1, and ignore
@@ -63,7 +63,7 @@ With the above settings, you should be able to get a model training run to compl
### Verify your changes using a simplified fast model

If you made any changes to the code that submits experiments (either `azure_runner.py` or `runner.py` or code
imported by those), validate them using a model training run in Azure. You can queue a model training run for the
simplified `BasicModel2Epochs` model.

@@ -71,8 +71,8 @@ simplified `BasicModel2Epochs` model.

It is sometimes possible to get a Python debugging (pdb) session on the main process for a model
training run on an AzureML compute cluster, for example if a run produces unexpected output,
or is silent for what seems like an unreasonably long time. For this to work, you will need to
have created the cluster with SSH access enabled; it is not currently possible to add this
after the cluster is created. The steps are as follows.

* From the "Details" tab in the run's page, note the Run ID, then click on the target name under
@@ -82,13 +82,13 @@ after the cluster is created. The steps are as follows.
supply the password chosen when the cluster was created.
* Type "bash" for a nicer command shell (optional).
* Identify the main python process with a command such as
```shell
ps aux | grep 'python.*runner.py' | egrep -wv 'bash|grep'
```
You may need to vary this if it does not yield exactly one line of output.
* Note the process identifier (the value in the PID column, generally the second one).
* Issue the commands
```shell
kill -TRAP nnnn
nc 127.0.0.1 4444
```


@@ -17,7 +17,7 @@ AWS into Azure blob storage.

## Registering for the challenge

In order to download the dataset, you need to register [here](https://fastmri.org/dataset/).
You will shortly receive an email with links to the dataset. In that email, there are two sections containing
scripts to download the data, like this:

```
To download Knee MRI files, we recommend using curl with recovery mode turned on:
@@ -25,22 +25,22 @@ curl -C "https://....amazonaws.com/knee_singlecoil_train.tar.gz?AWSAccessKeyId=.
...
```

There are two sections of that kind, one for the knee data and one for the brain data. Copy and paste *all* the lines
with `curl` commands into a text file, for example called `curl.txt`. In total, there should be 10 lines with `curl`
commands for the knee data, and 7 for the brain data (including the SHA256 file).
## Download the dataset directly to blob storage via Azure Data Factory

We are providing a script that will bulk download all files in the FastMRI dataset from AWS to Azure blob storage.
To start that script, you need:

- The file that contains all the `curl` commands to download the data (see above). The downloading script will
extract all the AWS access tokens from the `curl` commands.
- The connection string to the Azure storage account that stores your dataset.
  - To get that, navigate to the [Azure Portal](https://portal.azure.com), and search for the storage account
that you created to hold your datasets (Step 4 in [AzureML setup](setting_up_aml.md)).
  - On the left hand navigation, there is a section "Access Keys". Select that and copy out the connection string
(sanity check: it should look something like `DefaultEndpointsProtocol=....==;EndpointSuffix=core.windows.net`).
- The Azure location where the Data Factory should be created (for example "westeurope"). The Data Factory should
live in the same Azure location as your AzureML workspace and storage account. To check the location,
find the workspace in the [Azure Portal](https://portal.azure.com); the location is shown on the overview page.

Then run the script to download the dataset as follows, providing the path to the file with the curl commands
@@ -57,21 +57,21 @@ you supplied, and uncompress them.
- Run all the pipelines and delete the Data Factory.

This whole process can take a few hours to complete. It will print progress information every 30 seconds to the console.
Alternatively, find the Data Factory "fastmri-copy-data" in your Azure portal, and click on the "Monitor" icon to
drill down into all running pipelines.

Once the script is complete, you will have the following datasets in Azure blob storage:

- `knee_singlecoil`, `knee_multicoil`, and `brain_multicoil` with all files unpacked
- `knee_singlecoil_compressed`, `knee_multicoil_compressed`, and `brain_multicoil_compressed` with the `.tar` and
`.tar.gz` files as downloaded. NOTE: The raw challenge data files all have a `.tar.gz` extension, even though some
of them are plain (uncompressed) `.tar` files. The pipeline corrects these mistakes and puts the files into blob storage
with their corrected extension.
- The DICOM files are stored in the folders `knee_DICOMs` and `brain_DICOMs` (uncompressed) and
`knee_DICOMs_compressed` and `brain_DICOMs_compressed` (as `.tar` files)
### Troubleshooting the data downloading

If you see a runtime error saying "The subscription is not registered to use namespace 'Microsoft.DataFactory'", then
follow the steps described [here](https://stackoverflow.com/a/48419951/5979993) to enable DataFactory for your
subscription.
@@ -83,19 +83,19 @@ If set up correctly, this is the Azure storage account that holds all datasets u

Hence, after the downloading completes, you are ready to use the InnerEye toolbox to submit an AzureML job that uses
the FastMRI data.

There are 2 example models already coded up in the InnerEye toolbox, defined in
[fastmri_varnet.py](../InnerEye/ML/configs/other/fastmri_varnet.py): `KneeMulticoil` and
`BrainMulticoil`. As with all InnerEye models, you can start a training run by specifying the name of the class
that defines the model, like this:

```shell
python InnerEye/ML/runner.py --model KneeMulticoil --azureml --num_nodes=4
```
This will start an AzureML job with 4 nodes training at the same time. Depending on how you set up your compute
cluster, this will use a different number of GPUs: for example, if your cluster uses ND24 virtual machines, where
each VM has 4 Tesla P40 cards, training will use a total of 16 GPUs.

As is common with multiple nodes, training time will not scale linearly with an increased number of nodes. The following
table gives a rough overview of the time to train 1 epoch of the FastMri model in the InnerEye toolbox
on our cluster (`Standard_ND24s` nodes with 4 Tesla P40 cards):

| Step | 1 node (4 GPUs) | 2 nodes (8 GPUs) | 4 nodes (16 GPUs) | 8 nodes (32 GPUs) |
|---|---|---|---|---|
@@ -106,7 +106,7 @@ on our cluster (`Standard_ND24s` nodes with 4 Tesla P40 cards):
| Total time for 1 epoch | 5h 5min | 3h 5min | 1h 58min | 1h 26min |
| Total time for 50 epochs | 9 days | 4.6 days | 2.3 days | 1.2 days |

Note that the download times depend on the type of Azure storage account that your workspace is using. We recommend
using Premium storage accounts for optimal performance.

You can avoid the time to download the dataset, by specifying that the data is always read on-the-fly from the network.
@@ -120,36 +120,36 @@ when training on 8 nodes in parallel. For more details around dataset mounting p

Training a FastMri model on the `brain_multicoil` dataset is particularly challenging because the dataset is larger.
Downloading the dataset can - depending on the types of nodes - already make the nodes go out of disk space.

The InnerEye toolbox has a way of working around that problem, by reading the dataset on-the-fly from the network,
rather than downloading it at the start of the job. You can trigger this behaviour by supplying an additional
commandline argument `--use_dataset_mount`, for example:

```shell
python InnerEye/ML/runner.py --model BrainMulticoil --azureml --num_nodes=4 --use_dataset_mount
```

With this flag, the InnerEye training script will start immediately, without downloading data beforehand.
However, the fastMRI data module generates a cache file before training, and to build that, it needs to traverse the
full dataset. This will lead to a long (1-2 hours) startup time before starting the first epoch, while it is
creating this cache file. This can be avoided by copying the cache file from a previous run into the dataset folder.
More specifically, you need to follow these steps:
* Start a training job, training for only 1 epoch, like * Start a training job, training for only 1 epoch, like
```shell script ```shell
python InnerEye/ML/runner.py --model BrainMulticoil --azureml --use_dataset_mount --num_epochs=1 python InnerEye/ML/runner.py --model BrainMulticoil --azureml --use_dataset_mount --num_epochs=1
``` ```
* Wait until the job has finished creating the cache file - the job will print out a message
"Saving dataset cache to dataset_cache.pkl", visible in the log file `azureml-logs/70_driver_log.txt`, about 1-2 hours
after start. At that point, you can cancel the job.
* In the "Outputs + logs" section of the AzureML job, you will now see a file `outputs/dataset_cache.pkl` that has
been produced by the job. Download that file.
* Upload the file `dataset_cache.pkl` to the storage account that holds the fastMRI datasets, in the `brain_multicoil`
folder that was previously created by the Azure Data Factory. You can do that via the Azure Portal or Azure Storage
Explorer. Via the Azure Portal, you can search for the storage account that holds your data, then select
"Data storage: Containers" in the left hand navigation. You should see a folder named `datasets`, and inside of that
`brain_multicoil`. Once in that folder, press the "Upload" button at the top and select the `dataset_cache.pkl` file.
A scripted alternative to the Portal upload is sketched below, after these steps.
* Start the training job again; this time you can start multi-node training right away, like this:
```shell
python InnerEye/ML/runner.py --model BrainMulticoil --azureml --use_dataset_mount --num_nodes=8
```
This job should pick up the existing cache file, and output a message like "Copying a pre-computed dataset cache
file ..."
The same trick can of course be applied to other models as well (`KneeMulticoil`).
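If you would rather script the cache file upload than click through the Portal, a minimal sketch is below, assuming the `azure-storage-blob` (v12) package; the placeholder account name and key are the same ones used for `azcopy` later in this document:
```python
from azure.storage.blob import BlobServiceClient

# Placeholders: substitute your storage account name and access key.
service = BlobServiceClient(
    account_url="https://<your_storage_account>.blob.core.windows.net",
    credential="<storage_account_key>",
)
container = service.get_container_client("datasets")
# Upload the cache file next to the dataset, so that later runs can pick it up.
with open("dataset_cache.pkl", "rb") as data:
    container.upload_blob(name="brain_multicoil/dataset_cache.pkl", data=data, overwrite=True)
```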
@@ -157,20 +157,20 @@ The same trick can of course be applied to other models as well (`KneeMulticoil`
# Running on a GPU machine
You can of course run the InnerEye fastMRI models on a reasonably large machine with a GPU for development and
debugging purposes. Before running, we recommend downloading the datasets using a tool
like [azcopy](http://aka.ms/azcopy) into a folder, for example the `datasets` folder at the repository root.
To use `azcopy`, you will need the access key to the storage account that holds your data - it's the same storage
account that was used when creating the Data Factory that downloaded the data.
- To get that, navigate to the [Azure Portal](https://portal.azure.com), and search for the storage account
that you created to hold your datasets (Step 4 in [AzureML setup](setting_up_aml.md)).
- On the left hand navigation, there is a section "Access Keys". Select that and copy out one of the two keys (_not_
the connection strings). The key is a base64-encoded string; it should not contain any special characters apart from
`+`, `/`, `.` and `=`.
Then run this script in the repository root folder:
```shell
mkdir datasets
azcopy --source-key <storage_account_key> --source https://<your_storage_account>.blob.core.windows.net/datasets/brain_multicoil --destination datasets/brain_multicoil --recursive
```
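If `azcopy` is not at hand, the same download can be scripted with the `azure-storage-blob` (v12) package. This is a sketch under the same placeholder names as above, not part of the InnerEye toolbox:
```python
from pathlib import Path

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<your_storage_account>.blob.core.windows.net",
    credential="<storage_account_key>",
)
container = service.get_container_client("datasets")
# Mirror the brain_multicoil folder of the container into the local datasets folder.
for blob in container.list_blobs(name_starts_with="brain_multicoil/"):
    target = Path("datasets") / blob.name
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as file:
        container.download_blob(blob.name).readinto(file)
```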
@@ -178,7 +178,7 @@ Replace `brain_multicoil` with any of the other dataset names if needed.
If you follow these suggested folder structures, there is no further change necessary to the models. You can then
run, for example, the `BrainMulticoil` model by dropping the `--azureml` flag like this:
```shell
python InnerEye/ML/runner.py --model BrainMulticoil
```
The code will recognize that an Azure dataset named `brain_multicoil` is already present in the `datasets` folder,
and will use it instead of downloading the data again.


@@ -1,21 +1,21 @@
# Using the InnerEye code as a git submodule of your project
You can use InnerEye as a submodule in your own project.
If you go down that route, here's the list of files you will need in your project (the same as those
given in [this document](building_models.md)):
* `environment.yml`: Conda environment with Python, pip, PyTorch
* `settings.yml`: A file similar to `InnerEye\settings.yml` containing all your Azure settings
* A folder like `ML` that contains your additional code, and model configurations.
* A file like `myrunner.py` that invokes the InnerEye training runner, but that points the code to your environment
and Azure settings; see the [Building models](building_models.md) instructions for details. Please see below for
what `myrunner.py` should look like.
You then need to add the InnerEye code as a git submodule, in folder `innereye-deeplearning`:
```shell
git submodule add https://github.com/microsoft/InnerEye-DeepLearning innereye-deeplearning
```
Then configure your Python IDE to consume *both* your repository root *and* the `innereye-deeplearning` subfolder as inputs.
In PyCharm, you would do that by going to Settings/Project Structure. Mark your repository root as "Source", and
`innereye-deeplearning` as well.
Example commandline runner that uses the InnerEye runner (called `myrunner.py` above):
@@ -24,7 +24,7 @@ import sys
from pathlib import Path
# This file here mimics how the InnerEye code would be used as a git submodule.
# Ensure that this path correctly points to the root folder of your repository.
repository_root = Path(__file__).absolute()
@@ -70,11 +70,11 @@ if __name__ == '__main__':
1. Set up a directory outside of InnerEye to hold your configs. In your repository root, you could have a folder
`InnerEyeLocal`, parallel to the InnerEye submodule, alongside `settings.yml` and `myrunner.py`.
The example below creates a new flavour of the Glaucoma model in `InnerEye/ML/configs/classification/GlaucomaPublic`.
All that needs to be done is change the dataset. We will do this by subclassing `GlaucomaPublic` in a new config
stored in `InnerEyeLocal/configs`.
1. Create folder `InnerEyeLocal/configs`
1. Create a config file `InnerEyeLocal/configs/GlaucomaPublicExt.py` which extends the `GlaucomaPublic` class
like this:
```python
from InnerEye.ML.configs.classification.GlaucomaPublic import GlaucomaPublic


class MyGlaucomaModel(GlaucomaPublic):
    def __init__(self) -> None:
        super().__init__()
        self.azure_dataset_id = "name_of_your_dataset_on_azure"
```
1. In `settings.yml`, set `model_configs_namespace` to `InnerEyeLocal.configs` so this config
is found by the runner. Set `extra_code_directory` to `InnerEyeLocal`.
#### Start Training
Run the following to start a job on AzureML:
```shell
python myrunner.py --azureml --model=MyGlaucomaModel
```


@@ -200,7 +200,7 @@ Leave all other fields as they are for now.
Summarizing, here is what the file should look like:
```yaml
variables:
  tenant_id: '<Azure tenant ID of your company>'
  subscription_id: '<Azure subscription ID that your project is using>'


@@ -10,29 +10,28 @@ def replace_in_file(filepath: Path, original_str: str, replace_str: str) -> None
""" """
Replace all occurences of the original_str with replace_str in the file provided. Replace all occurences of the original_str with replace_str in the file provided.
""" """
with filepath.open('r') as file: text = filepath.read_text()
text = file.read()
text = text.replace(original_str, replace_str) text = text.replace(original_str, replace_str)
with filepath.open('w') as file: filepath.write_text(text)
file.write(text)
if __name__ == '__main__':
    sphinx_root = Path(__file__).absolute().parent
    repository_root = sphinx_root.parent
    markdown_root = sphinx_root / "source" / "md"
    repository_url = "https://github.com/microsoft/InnerEye-DeepLearning"
    # Create directories source/md and source/md/docs where files will be copied to
    if markdown_root.exists():
        shutil.rmtree(markdown_root)
    markdown_root.mkdir()
    # copy README.md and doc files
    shutil.copy(repository_root / "README.md", markdown_root)
    shutil.copy(repository_root / "CHANGELOG.md", markdown_root)
    shutil.copytree(repository_root / "docs", markdown_root / "docs")
    # replace links to files in repository with urls
    md_files = markdown_root.rglob("*.md")
    for filepath in md_files:
        replace_in_file(filepath, "](/", f"]({repository_url}/blob/main/")
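As a quick, self-contained sanity check of the link rewriting above (re-declaring `replace_in_file` with the same logic, so the snippet runs standalone):
```python
import tempfile
from pathlib import Path


def replace_in_file(filepath: Path, original_str: str, replace_str: str) -> None:
    # Same logic as the script above: read, substitute, write back.
    filepath.write_text(filepath.read_text().replace(original_str, replace_str))


with tempfile.TemporaryDirectory() as tmp:
    md_file = Path(tmp) / "example.md"
    md_file.write_text("[testing](/docs/testing.md)")
    replace_in_file(md_file, "](/", "](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/")
    assert md_file.read_text() == (
        "[testing](https://github.com/microsoft/InnerEye-DeepLearning/blob/main/docs/testing.md)"
    )
```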


@@ -13,11 +13,12 @@
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, make it absolute.
#
import sys
from pathlib import Path

repo_dir = Path(__file__).absolute().parents[2]
sys.path.insert(0, str(repo_dir))

# -- Imports -----------------------------------------------------------------
@@ -64,7 +65,7 @@ html_theme = 'sphinx_rtd_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']

source_parsers = {
    '.md': CommonMarkParser,


@@ -8,7 +8,7 @@ InnerEye-DeepLearning Documentation
.. toctree::
   :maxdepth: 1
   :caption: Contents

   md/README.md
   md/docs/WSL.md
@@ -21,13 +21,13 @@ InnerEye-DeepLearning Documentation
.. toctree::
   :maxdepth: 1
   :caption: About Model Configs

   rst/configs.rst

.. toctree::
   :maxdepth: 1
   :caption: Further reading for contributors

   md/docs/pull_requests.md
   md/docs/testing.md