DOC: Better onboarding instructions in Readme files (#223)

Also cleaned up many .md files with a linter
Anton Schwaighofer 2022-03-16 14:31:38 +00:00, committed by GitHub
Parent 3986632282
Commit 3c443abf72
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
13 changed files with 159 additions and 57 deletions

5
.gitignore (vendored)

@@ -159,4 +159,9 @@ temp_environment-*
temp_config_for_unittests.py
# Temp file from building requirements for histo
temp_requirements.txt
# Temp folders created from SSL tests
cifar-10-python.tar.gz
cifar-10-batches-py/
None/
hi-ml-histopathology/testSSL/test_outputs
test_outputs/

3
.vscode/extensions.json (vendored)

@@ -3,6 +3,7 @@
"njpwerner.autodocstring",
"doi.fileheadercomment",
"ms-python.python",
"ms-python.vscode-pylance"
"ms-python.vscode-pylance",
"DavidAnson.vscode-markdownlint"
]
}

1
.vscode/settings.json (vendored)

@@ -43,6 +43,7 @@
"${workspaceFolder}/hi-ml/testhiml/testhiml",
"${workspaceFolder}/hi-ml-azure/testazure/testazure",
"${workspaceFolder}/hi-ml-histopathology/testhisto",
"${workspaceFolder}/hi-ml-histopathology/testSSL",
],
"python.testing.unittestEnabled": false,
"python.testing.nosetestsEnabled": false,


@@ -10,6 +10,7 @@ Each release contains a link for "Full Changelog"
## 0.1.14
### Added
- ([#227](https://github.com/microsoft/hi-ml/pull/227)) Add TransformerPooling.
- ([#179](https://github.com/microsoft/hi-ml/pull/179)) Add GaussianBlur and RotationByMultiplesOf90 augmentations. Added torchvision and opencv to
the environment file since it is necessary for the augmentations.
@@ -22,6 +23,7 @@ the environment file since it is necessary for the augmentations.
- ([#198](https://github.com/microsoft/hi-ml/pull/198)) Improved editor setup for VSCode.
### Changed
- ([#227](https://github.com/microsoft/hi-ml/pull/227)) Pooling constructor is outside of DeepMIL and inside of BaseMIL now.
- ([#198](https://github.com/microsoft/hi-ml/pull/198)) Model config loader is now more flexible, can accept fully qualified class name or just top-level module name and class (like histopathology.DeepSMILECrck)
- ([#198](https://github.com/microsoft/hi-ml/pull/198)) Runner raises an error when Conda environment file contains a pip include (-r) statement
@@ -29,6 +31,7 @@ the environment file since it is necessary for the augmentations.
- ([#196](https://github.com/microsoft/hi-ml/pull/196)) Show current workspace name in error message.
### Fixed
- ([#198](https://github.com/microsoft/hi-ml/pull/198)) Dependencies for histopathology folder are no longer specified in `test_requirements.txt`, but correctly in the histopathology Conda environment.
- ([#188](https://github.com/microsoft/hi-ml/pull/188)) Updated DeepSMILES models. Now they are up to date with innereye-dl.
- ([#179](https://github.com/microsoft/hi-ml/pull/179)) HEDJitter was jittering the D channel as well. StainNormalization was relying on skimage.
@@ -38,37 +41,40 @@ the environment file since it is necessary for the augmentations.
### Deprecated
## 0.1.13
### Added
- ([#170](https://github.com/microsoft/hi-ml/pull/170)) Add utils including bag sampling, bounding boxes, HEDJitter, StainNormalisation and add attention layers
### Changed
- ([#173](https://github.com/microsoft/hi-ml/pull/173)) Improve report tool: allow lists of tables, option for zipping report folder, option for base64 encoding images
### Fixed
- ([#169](https://github.com/microsoft/hi-ml/pull/169)) Fix a test that was failing occasionally
### Removed
### Deprecated
## 0.1.12
### Added
- ([#159](https://github.com/microsoft/hi-ml/pull/159)) Add profiling for loading png image files as numpy arrays.
- ([#152](https://github.com/microsoft/hi-ml/pull/152)) Add a custom HTML reporting tool
- ([#167](https://github.com/microsoft/hi-ml/pull/167)) Ability to log to an AzureML run when outside of AzureML
### Changed
- ([#164](https://github.com/microsoft/hi-ml/pull/164)) Look in more locations for std out from AzureML run.
- ([#167](https://github.com/microsoft/hi-ml/pull/167)) The AzureMLLogger has one mandatory argument now, that controls
whether it should log to AzureML also when running on a VM.
### Fixed
- ([#161](https://github.com/microsoft/hi-ml/pull/161)) Empty string as target folder for a dataset creates an invalid mounting path for the dataset in AzureML (fixes #160)
- ([#167](https://github.com/microsoft/hi-ml/pull/167)) Fix bugs in logging hyperparameters: logging as name/value
table, rather than one column per hyperparameter. Use string logging for all hyperparameters
@@ -81,6 +87,7 @@ the environment file since it is necessary for the augmentations.
## 0.1.11
### Added
- ([#145](https://github.com/microsoft/hi-ml/pull/145)) Add ability to mount datasets when running locally.
- ([#149](https://github.com/microsoft/hi-ml/pull/149)) Add a k-fold cross validation wrapper around HyperDrive
- ([#132](https://github.com/microsoft/hi-ml/pull/132)) Profile methods for loading png image files.
@@ -88,6 +95,7 @@ the environment file since it is necessary for the augmentations.
### Changed
### Fixed
- ([#156](https://github.com/microsoft/hi-ml/pull/156)) AzureML Runs should use registered environment after retrieval
### Removed
@@ -97,11 +105,13 @@ the environment file since it is necessary for the augmentations.
## 0.1.10
### Added
- ([#142](https://github.com/microsoft/hi-ml/pull/142)) Adding AzureML progress bar and diagnostics for batch loading
- ([#138](https://github.com/microsoft/hi-ml/pull/138)) Guidelines and profiling for whole slide images.
### Changed
- ([#129])https://github.com/microsoft/hi-ml/pull/129)) Refactor command line tools' arguments. Refactor health_azure.utils' various get_run functions. Replace
- ([#129])<https://github.com/microsoft/hi-ml/pull/129>)) Refactor command line tools' arguments. Refactor health_azure.utils' various get_run functions. Replace
argparsing with parametrized classes.
### Fixed
@@ -110,53 +120,58 @@ argparsing with parametrized classes.
### Deprecated
## 0.1.9 (2021-10-20)
### Added
- ([#133](https://github.com/microsoft/hi-ml/pull/133)) PyTorch Lightning logger for AzureML. Helper functions for consistent logging
- ([#136](https://github.com/microsoft/hi-ml/pull/136)) Documentation for using low priority nodes
### Changed
- ([#133](https://github.com/microsoft/hi-ml/pull/133)) Made _**large breaking changes**_ to module names,
from `health.azure` to `health_azure`.
- ([#144](https://github.com/microsoft/hi-ml/pull/141)) Update changelog for release and increase scope of test_register_environment to ensure that by default environments are registered with a version number
### Fixed
- ([#134](https://github.com/microsoft/hi-ml/pull/134)) Fixed repo references and added pyright to enforce global checking
- ([#139](https://github.com/microsoft/hi-ml/pull/139)) Fix register_environment, which was ignoring existing environments
previously. Also ensure that the environment is given version 1 by default instead of "autosave"
## 0.1.8 (2021-10-06)
### Added
- ([#123](https://github.com/microsoft/hi-ml/pull/123)) Add helper function to download checkpoint files
- ([#128](https://github.com/microsoft/hi-ml/pull/128)) When downloading files in a distributed PyTorch job, a barrier is used to synchronize the processes.
### Changed
- ([#127](https://github.com/microsoft/hi-ml/pull/127)) The field `is_running_in_azure` of `AzureRunInfo` has been renamed to `is_running_in_azure_ml`
### Fixed
- ([#127](https://github.com/microsoft/hi-ml/pull/127)) Fixing bug #126: get_workspace was assuming it runs in AzureML, when it was running on a plain Azure build agent.
## 0.1.7 (2021-10-04)
### Added
- ([#111](https://github.com/microsoft/hi-ml/pull/111)) Adding changelog. Displaying changelog in sphinx docu. Ensure changelog is updated.
### Changed
- ([#112](https://github.com/microsoft/hi-ml/pull/112)) Update himl_tensorboard to work with files not in 'logs' directory
- ([#106](https://github.com/microsoft/hi-ml/pull/106)) Split into two packages. Most of the existing package was renamed to hi-ml-azure; the remainder stays as hi-ml.
- ([#113](https://github.com/microsoft/hi-ml/pull/113)) Add helper function to download files from AML Run, tidied up some command line args, and moved some functions from himl.py to azure_util.py
- ([#122](https://github.com/microsoft/hi-ml/pull/122)) Add helper functions to upload to and download from AML Datastores
### Fixed
- ([#117](https://github.com/microsoft/hi-ml/pull/117)) Bug fix: Config.json file was expected to be present, even if workspace was provided explicitly.
- ([#119](https://github.com/microsoft/hi-ml/pull/119)) Bug fix: Code coverage wasn't formatted correctly.
## 0.1.4 (2021-09-15)
- This is the baseline release.
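The #198 entry above says the model config loader accepts either a fully qualified class name or a top-level module name plus class (such as `histopathology.DeepSMILECrck`). As a rough illustration only, a lookup of that kind could be built on `importlib` and `pkgutil`. `resolve_class` is a hypothetical name; this is a sketch, not the hi-ml implementation, and it imports every submodule it searches.

```python
import importlib
import pkgutil
from typing import Any


def resolve_class(name: str) -> Any:
    """Hypothetical sketch: resolve "package.module.Class" or the short form "package.Class"."""
    module_name, _, class_name = name.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted name like 'module.Class', got {name!r}")
    module = importlib.import_module(module_name)
    # Fully qualified form: the class is an attribute of the named module.
    if hasattr(module, class_name):
        return getattr(module, class_name)
    # Short form: the named module is a package, so search its submodules (importing each one).
    # Plain modules have no __path__, and an empty path makes walk_packages yield nothing.
    search_path = getattr(module, "__path__", [])
    for info in pkgutil.walk_packages(search_path, prefix=module_name + "."):
        submodule = importlib.import_module(info.name)
        if hasattr(submodule, class_name):
            return getattr(submodule, class_name)
    raise ValueError(f"Class {class_name!r} not found in {module_name!r} or its submodules")


# Short form: "Message" is defined in email.message, not in the email package itself.
print(resolve_class("email.Message"))
```

With the short form, the first matching submodule that is found wins, so a real loader may want to report ambiguous matches instead.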


@@ -1,3 +1,11 @@
# Make commands for the toolbox users
# Create a Conda environment for use with both the hi-ml and hi-ml-azure folder
env:
conda env create --file environment.yml
# Make commands that are used in the build pipeline
# call make for each sub package
define call_packages
cd hi-ml-histopathology && ${MAKE} $(1)


@@ -31,13 +31,13 @@ If you would like to contribute to the code, please check the [developer guide](
The detailed package documentation, with examples and API reference, is on
[readthedocs](https://hi-ml.readthedocs.io/en/latest/).
## Quick start: Using the Azure layer
Use case: you have a Python script that does something - that could be training a model, or pre-processing some data.
The `hi-ml-azure` package can help easily run that on Azure Machine Learning (AML) services.
Here is an example script that reads images from a folder, resizes and saves them to an output folder:
```python
from pathlib import Path
if __name__ == '__main__':
@@ -48,6 +48,7 @@ if __name__ == '__main__':
resized = contents.resize(0.5)
write_image(output_folder / file.name)
```
Doing that at scale can take a long time. **We'd like to run that script in AzureML, consume the data from a folder in
blob storage, and write the results back to blob storage**.
@@ -92,16 +93,19 @@ For details, please refer to the [onboarding page](docs/source/first_steps.md).
For more examples, please see [examples.md](docs/source/examples.md).
## Issues
If you've found a bug in the code, please check the [issues](https://github.com/microsoft/hi-ml/issues) page.
If no existing issue exists, please open a new one. Be sure to include
- A descriptive title
- Expected behaviour (including a code sample if possible)
- Actual behavior
* A descriptive title
* Expected behaviour (including a code sample if possible)
* Actual behavior
## Contributing
We welcome all contributions that help us achieve our aim of speeding up ML/AI research in health and life sciences.
Examples of contributions are
* Data loaders for specific health & life sciences data
* Network architectures and components for deep learning models
* Tools to analyze and/or visualize data
@@ -121,11 +125,11 @@ Please check the [detailed page about contributions](./CONTRIBUTING.md).
If you have any feature requests, or find issues in the code, please create an
[issue on GitHub](https://github.com/microsoft/hi-ml/issues).
## Contributing
## Contribution Licensing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
the rights to use your contribution. For details, visit <https://cla.opensource.microsoft.com>.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions


@@ -5,46 +5,63 @@
We suggest using Visual Studio Code (VSCode), available for multiple platforms [here](https://code.visualstudio.com/).
On Windows systems, we recommend using WSL, the Windows Subsystem for Linux, because some PyTorch features are not available on Windows.
Inside VSCode, please install the extensions that are recommended for this project - they are available in `.vscode/extensions.json` in the
repository root.
## Creating a Conda environment
To create a separate Conda environment with all packages that `hi-ml` requires for running and testing,
use the provided `environment.yml` file. Create a Conda environment called `himl` from that via
use the provided `environment.yml` file. You can create a Conda environment called `himl` from that via either
```shell script
conda env create --file environment.yml
conda activate himl
```
or
```shell script
make env
```
Afterwards, please activate this environment via `conda activate himl`. Also select this Python interpreter inside VSCode,
by choosing "Python: Select Interpreter" from the command palette (Ctrl-Shift-P on VSCode for Windows).
## Installing `pyright`
We are using static typechecking for our code via `mypy` and `pyright`. The latter requires a separate installation
outside the Conda environment. For WSL, these are the required steps (see also
[here](https://docs.microsoft.com/en-us/windows/dev-environment/javascript/nodejs-on-wsl)):
```shell
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
```
Close your terminal and re-open it, then run:
```shell
nvm install node
npm install -g pyright
```
## Using specific versions `hi-ml` in your Python environments
## Using specific versions of `hi-ml` in your Python environments
If you'd like to test specific changes to the `hi-ml` package in your code, you can use two different routes:
* You can clone the `hi-ml` repository on your machine, and use `hi-ml` in your Python environment via a local package
install:
```shell script
```shell
pip install -e <your_git_folder>/hi-ml
```
* You can consume an early version of the package from `test.pypi.org` via `pip`:
```shell script
```shell
pip install --extra-index-url https://test.pypi.org/simple/ hi-ml==0.1.0.post165
```
* If you are using Conda, you can add an additional parameter for `pip` into the Conda `environment.yml` file like this:
```
```yml
name: foo
dependencies:
- pip=20.1.1
@@ -56,25 +73,28 @@ dependencies:
## Common things to do
The repository contains a makefile with definitions for common operations.
* `make check`: Run `flake8` and `mypy` on the repository.
* `make test`: Run `flake8` and `mypy` on the repository, then all tests via `pytest`
* `make pip`: Install all packages for running and testing in the current interpreter.
* `make conda`: Update the hi-ml Conda environment and activate it
## Building documentation
To build the sphinx documentation, you must have sphinx and related packages installed
(see `build_requirements.txt` in the repository root). Then run:
```
```shell
cd docs
make html
```
This will build all your documentation in `docs/build/html`.
## Setting up your AzureML workspace
* In the browser, navigate to the AzureML workspace that you want to use for running your tests.
* In the top right section, there will be a dropdown menu showing the name of your AzureML workspace. Expand that.
* In the panel, there is a link "Download config file". Click that.
* This will download a file `config.json`. Move that file to both of the folders `hi-ml/testhiml` and `hi-ml/testazure`
@@ -96,24 +116,29 @@ run the example in `src/health/azure/examples` (i.e. run `python elevate_this.py
When running the tests locally, they can either be run against the source directly, or the source built into a package.
- To run the tests against the source directly in the local `src` folder, ensure that there is no wheel in the `dist` folder (for example by running `make clean`). If a wheel is not detected, then the local `src` folder will be copied into the temporary test folder as part of the test process.
* To run the tests against the source directly in the local `src` folder, ensure that there is no wheel in the `dist` folder (for example by running `make clean`). If a wheel is not detected, then the local `src` folder will be copied into the temporary test folder as part of the test process.
- To run the tests against the source as a package, build it with `make build`. This will build the local `src` folder into a new wheel in the `dist` folder. This wheel will be detected and passed to AzureML as a private package as part of the test process.
* To run the tests against the source as a package, build it with `make build`. This will build the local `src` folder into a new wheel in the `dist` folder. This wheel will be detected and passed to AzureML as a private package as part of the test process.
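The two bullets describe a switch on whether a wheel exists in the `dist` folder. A minimal sketch of that check, assuming at most one wheel is expected; `find_wheel` is a hypothetical helper, not a function from the hi-ml test code.

```python
from pathlib import Path
from typing import Optional


def find_wheel(dist_folder: Path) -> Optional[Path]:
    """Hypothetical helper: return the single wheel in dist_folder, or None if there is none."""
    if not dist_folder.is_dir():
        return None
    wheels = sorted(dist_folder.glob("*.whl"))
    if not wheels:
        return None
    if len(wheels) > 1:
        # More than one wheel would make it ambiguous which build to test against.
        raise ValueError(f"Expected at most one wheel in {dist_folder}, found {len(wheels)}")
    return wheels[0]
```

A test harness could copy the local `src` folder when this returns `None`, and otherwise pass the returned wheel on to AzureML as a private package.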
### Test discovery in VSCode
All tests in the repository should be picked up automatically by VSCode. In particular, this includes the tests in the `hi-ml-histopathology` folder, which
are not always necessary when working on the core `hi-ml` projects.
You can exclude a set of tests from test discovery by modifying `python.testing.pytestArgs` in the VSCode `.vscode/settings.json` file.
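VSCode settings files use JSON with comments and may contain trailing commas, so the standard `json` module cannot always parse them directly. Assuming the settings have already been loaded into a dictionary, excluding a group of test folders amounts to filtering the `python.testing.pytestArgs` list. `exclude_test_folders` is a hypothetical helper for illustration; the paths are the ones listed in the repository's settings.

```python
from typing import Any, Dict, List


def exclude_test_folders(settings: Dict[str, Any], fragments: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of settings without the pytestArgs entries
    that contain any of the given path fragments."""
    args = settings.get("python.testing.pytestArgs", [])
    kept = [arg for arg in args if not any(fragment in arg for fragment in fragments)]
    return {**settings, "python.testing.pytestArgs": kept}


settings = {
    "python.testing.pytestArgs": [
        "${workspaceFolder}/hi-ml/testhiml/testhiml",
        "${workspaceFolder}/hi-ml-azure/testazure/testazure",
        "${workspaceFolder}/hi-ml-histopathology/testhisto",
        "${workspaceFolder}/hi-ml-histopathology/testSSL",
    ]
}
# Skip discovery of the histopathology tests when working on the core packages.
print(exclude_test_folders(settings, ["hi-ml-histopathology"])["python.testing.pytestArgs"])
```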
## Creating a New Release
To create a new package release, follow these steps:
* On the repository's github page, click on "Releases", then "Draft a new release"
* In the "Draft a new release" page, click "Choose a tag". In the text box, enter a (new) tag name that has
the desired version number, plus a "v" prefix. For example, to create package version 0.12.17, create a
tag `v0.12.17`. Then choose "+ Create new tag" below the text box.
* Enter a "Release title" that highlights the main feature(s) of this new package version.
* Click "Auto-generate release notes" to pull in the titles of the Pull Requests since the last release.
* Before the auto-generated "What's changed" section, add a few sentences that summarize what's new.
* Click "Publish release"
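The tag naming rule (a "v" prefix in front of the package version) is easy to get wrong by hand. A small sketch of the rule, assuming plain `MAJOR.MINOR.PATCH` versions; `release_tag` is hypothetical and not part of any release tooling in the repository.

```python
import re

# Plain numeric versions such as 0.12.17 (assumption: pre-release suffixes are not handled).
_VERSION = re.compile(r"\d+\.\d+\.\d+")


def release_tag(version: str) -> str:
    """Hypothetical helper: build the git tag for a package version, e.g. 0.12.17 -> v0.12.17."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"Not a plain MAJOR.MINOR.PATCH version: {version!r}")
    return f"v{version}"


print(release_tag("0.12.17"))  # v0.12.17
```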
## Troubleshooting
### Debugging a test in VSCode fails on Windows


@@ -7,20 +7,20 @@ practitioners. ML components can be found in the sibling package `hi-ml`.
## Installation
You can install the latest version from `pypi` via
```
```shell
pip install hi-ml-azure
```
## Documentation
The detailed package documentation, with examples and API reference, is on
[readthedocs](https://hi-ml.readthedocs.io/en/latest/).
## Getting started
Examples that illustrate the use of the `hi-ml` toolbox can be found on
[readthedocs](https://hi-ml.readthedocs.io/en/latest/).
## Changelog


@@ -465,7 +465,7 @@ def create_from_matching_params(from_object: param.Parameterized, cls_: Type[T])
c = cls_()
if not isinstance(c, param.Parameterized):
raise ValueError(f"The created object must be a subclass of param.Parameterized, but got {type(c)}")
for param_name, p in c.param.params().items():
for param_name, p in c.param.params().items(): # type: ignore
if not p.constant and not p.readonly:
setattr(c, param_name, getattr(from_object, param_name))
return c
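`create_from_matching_params` creates a fresh `cls_()` and copies every parameter that is neither constant nor read-only from `from_object`. The same idea can be sketched with standard-library dataclasses instead of `param`; here a hypothetical `"constant"` metadata flag stands in for param's `constant`/`readonly` attributes, and `copy_matching_fields`, `Source` and `Target` are illustrative names only.

```python
from dataclasses import dataclass, field, fields
from typing import Type, TypeVar

T = TypeVar("T")


def copy_matching_fields(from_object: object, cls_: Type[T]) -> T:
    """Sketch: create cls_() and copy over every field that is not flagged as constant."""
    c = cls_()
    for f in fields(c):  # raises TypeError if cls_ is not a dataclass
        # The "constant" metadata flag is an assumption standing in for param's constant/readonly.
        if not f.metadata.get("constant", False):
            setattr(c, f.name, getattr(from_object, f.name))
    return c


@dataclass
class Source:
    lr: float = 0.1
    epochs: int = 5
    name: str = "src"


@dataclass
class Target:
    lr: float = 1e-3
    epochs: int = 1
    name: str = field(default="fixed", metadata={"constant": True})


print(copy_matching_fields(Source(), Target))  # Target(lr=0.1, epochs=5, name='fixed')
```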


@@ -1,3 +1,10 @@
# Make commands for the toolbox users
# Create a Conda environment for this folder only
env:
conda env create --file environment.yml
pip install -r ../test_requirements.txt
# call make for parent
define call_parent
cd .. && $(MAKE) $(1)


@@ -2,8 +2,48 @@
## Getting started
- Build environment
- Download the config file for the AzureML workspace
- Run a first workflow.
### Setting up Python
To be completed.
For working on the histopathology folder, please create a separate Conda environment.
```shell
cd hi-ml-histopathology
make env
```
You can then activate the environment via `conda activate HimlHisto`. Set VSCode to use this Conda environment, by choosing "Python: Select Interpreter"
from the command palette.
### Setting up AzureML
In addition, please download an AzureML workspace configuration file for the workspace that you wish to use:
* In the browser, navigate to the workspace in question
* Click on the drop-down menu on the upper right of the page, to the left of your account picture.
* Select "Download config file".
* Save that file into the repository root.
Once that config file is in place, all Python runs that you start inside the `hi-ml-histopathology` folder will automatically use this config file.
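This implies that the config file in the repository root is found even when Python is started from the `hi-ml-histopathology` subfolder, which is what a search through the parent folders would give. The sketch below shows such a search; it is an assumption about the mechanism, and `find_config_json` is a hypothetical helper rather than the hi-ml lookup.

```python
from pathlib import Path
from typing import Optional


def find_config_json(start: Path, file_name: str = "config.json") -> Optional[Path]:
    """Hypothetical sketch: look for file_name in start and then in each of its parent folders."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        candidate = folder / file_name
        if candidate.is_file():
            return candidate
    return None


# Started inside hi-ml-histopathology, this would return <repository root>/config.json.
print(find_config_json(Path.cwd()))
```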
## Running histopathology models
To test your setup, please execute in the `hi-ml-histopathology` folder:
```shell
conda activate HimlHisto
python ../hi-ml/src/health_ml/runner.py --model histopathology.DeepSMILECrck --cluster=training-nd24
```
This should start an AzureML job in the AzureML workspace that you configured above via `config.json`. You may need to adjust the name of
the compute cluster (`training-nd24` in the above example).
## Running histopathology tests
In the `hi-ml-histopathology` folder, run
```shell
make call_pytest
```
Inside of VSCode, all tests in the repository should be picked up automatically. You can exclude the tests for the `hi-ml` and `hi-ml-azure` packages by
modifying `python.testing.pytestArgs` in the VSCode `.vscode/settings.json` file.


@@ -14,7 +14,7 @@ from typing import Generator
import pytest
# temporary workaround until the hi-ml package is released
testSSL_root_dir = Path(__file__).parent
testSSL_root_dir = Path(__file__).resolve().parent
print(f"Adding {testSSL_root_dir} to sys path")
sys.path.insert(0, str(testSSL_root_dir))
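The change from `Path(__file__).parent` to `Path(__file__).resolve().parent` matters when `__file__` is not an absolute path: only the resolved form yields an absolute folder (with symlinks resolved) that stays valid in `sys.path` if the working directory changes later. A minimal illustration, with `folder_for_sys_path` as a hypothetical helper:

```python
from pathlib import Path


def folder_for_sys_path(file_path: str) -> str:
    """Return the folder of file_path as an absolute path, suitable for sys.path.insert."""
    # resolve() makes the path absolute (relative to the working directory at call time)
    # and resolves symlinks; .parent alone would keep a relative path relative.
    return str(Path(file_path).resolve().parent)


print(Path("testSSL/conftest.py").parent)          # testSSL (still relative)
print(folder_for_sys_path("testSSL/conftest.py"))  # an absolute path ending in testSSL
```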


@@ -41,7 +41,7 @@ from health_ml.utils.fixed_paths import repository_root_directory, OutputFolderF
from health_ml.utils.lightning_loggers import StoringLogger
from testSSL.configs_for_tests import DummyContainerWithModel, DummySimCLR
from testSSL.utils import check_config_json, TEST_OUTPUTS_PATH, write_test_dicom
from testSSL.utils import TEST_OUTPUTS_PATH, write_test_dicom
common_test_args = ["",
@@ -119,9 +119,8 @@ def test_ssl_container_cifar10_resnet_simclr() -> None:
model_namespace_simclr = "SSL.configs.CIFAR10SimCLR"
args = common_test_args + [f"--model={model_namespace_simclr}"]
runner = default_runner()
with check_config_json(Path.cwd()):
with mock.patch("sys.argv", args):
loaded_config, actual_run = runner.run()
with mock.patch("sys.argv", args):
loaded_config, _ = runner.run()
assert loaded_config is not None
assert isinstance(loaded_config.model, SimClrHiml)
assert loaded_config.encoder_output_dim == 2048
@@ -163,9 +162,8 @@ def test_ssl_container_cifar10_resnet_simclr() -> None:
model_namespace_cifar = "SSL.configs.SSLClassifierCIFAR"
args = common_test_args + [f"--model={model_namespace_cifar}",
f"--local_ssl_weights_path={checkpoint_path}"]
with check_config_json(Path.cwd()):
with mock.patch("sys.argv", args):
loaded_config2, actual_run = default_runner().run()
with mock.patch("sys.argv", args):
loaded_config2, actual_run = default_runner().run()
assert loaded_config2 is not None
assert isinstance(loaded_config2.model, SSLClassifier)
assert loaded_config2.model.class_weights is None
@@ -204,9 +202,8 @@ def test_ssl_container_rsna() -> None:
f"--local_datasets={str(path_to_cxr_test_dataset)},{str(path_to_cxr_test_dataset)}",
"--use_balanced_binary_loss_for_linear_head=True",
f"--ssl_encoder={EncoderName.densenet121.value}"]
with check_config_json(Path.cwd()):
with mock.patch("sys.argv", args):
loaded_config, actual_run = runner.run()
with mock.patch("sys.argv", args):
loaded_config, _ = runner.run()
assert loaded_config is not None
assert isinstance(loaded_config.model, BootstrapYourOwnLatent)
assert loaded_config.online_eval.dataset == SSLDatasetName.RSNAKaggleCXR.value
@@ -254,9 +251,8 @@ def test_ssl_container_rsna() -> None:
f"--local_datasets={str(path_to_cxr_test_dataset)}",
"--use_balanced_binary_loss_for_linear_head=True",
f"--local_ssl_weights_path={checkpoint_path}"]
with check_config_json(Path.cwd()):
with mock.patch("sys.argv", args):
loaded_config2, actual_run = runner.run()
with mock.patch("sys.argv", args):
loaded_config2, _ = runner.run()
assert loaded_config2 is not None
assert isinstance(loaded_config2, CXRImageClassifier)
assert loaded_config2.model.freeze_encoder
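The tests drive the runner by temporarily replacing `sys.argv` with `mock.patch`; the empty string at the start of `common_test_args` appears to stand in for the program name in `sys.argv[0]`. A self-contained sketch of the pattern, where `parse_model_arg` is a hypothetical stand-in for the runner's argument parsing:

```python
import argparse
import sys
from unittest import mock


def parse_model_arg() -> str:
    """Hypothetical stand-in for a runner that reads --model from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    # parse_args() without arguments reads sys.argv[1:], so sys.argv[0] is never parsed.
    return parser.parse_args().model


common_test_args = [""]  # placeholder for the program name in sys.argv[0]
args = common_test_args + ["--model=SSL.configs.CIFAR10SimCLR"]
with mock.patch("sys.argv", args):
    model = parse_model_arg()
# Outside the with block, the original sys.argv is restored.
print(model)  # SSL.configs.CIFAR10SimCLR
```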