* Merged PR 42: Python package structure

Created Python package structure

* Merged PR 50: Röth-Tarantola generative model for velocities

- Created Python package structure for generative models for velocities
- Implemented the [Röth-Tarantola model](https://doi.org/10.1029/93JB01563)

* Merged PR 51: Isotropic AWE forward modelling using Devito

Implemented forward modelling for the isotropic acoustic wave equation using [Devito](https://www.devitoproject.org/)
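
For reference, a minimal isotropic acoustic forward-modelling step in Devito looks roughly like the sketch below (illustrative only; the grid size, velocity, and time-stepping values are made up and are not the parameters used in this package, and source/receiver terms are omitted):

```
from devito import Grid, TimeFunction, Eq, Operator, solve

# Illustrative 2D grid and constant velocity (km/s); not this package's actual setup
grid = Grid(shape=(101, 101), extent=(1000.0, 1000.0))
v = 1.5
u = TimeFunction(name="u", grid=grid, time_order=2, space_order=4)

# Isotropic acoustic wave equation: u_tt = v^2 * laplace(u)
pde = u.dt2 - v**2 * u.laplace
stencil = Eq(u.forward, solve(pde, u.forward))

Operator([stencil])(time=100, dt=0.5)  # run 100 time steps (no source term in this sketch)
```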

* Merged PR 52: PRNG seed

Exposed PRNG seed in generative models for velocities
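
Exposing the seed typically just means threading it through to the NumPy generator, along these lines (an illustrative sketch; the function name and signature are hypothetical, not the package's actual API):

```
import numpy as np

def generate_velocities(shape, seed=None):
    """Hypothetical example: random velocities in km/s with a reproducible seed."""
    rng = np.random.default_rng(seed)
    return rng.uniform(1.5, 5.0, size=shape)

v = generate_velocities((10, 10), seed=42)  # same seed -> same model
```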

* Merged PR 53: Docs update

- Updated LICENSE
- Added Microsoft Open Source Code of Conduct
- Added Contributing section to README

* Merged PR 54: CLI for velocity generators

Implemented CLI for velocity generators

* Merged PR 69: CLI subpackage using Click

Reimplemented CLI as subpackage using Click
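
A Click-based CLI subpackage generally follows the pattern below (a generic sketch; the group, command, and option names are hypothetical and not the package's actual interface):

```
import click

@click.group()
def cli():
    """Entry point for the (hypothetical) velocity-generation commands."""

@cli.command()
@click.option("--shape", nargs=2, type=int, default=(100, 100), help="Model shape.")
@click.option("--seed", type=int, default=None, help="PRNG seed.")
def generate(shape, seed):
    """Generate a random velocity model (placeholder implementation)."""
    click.echo(f"Generating velocity model with shape={shape}, seed={seed}")

if __name__ == "__main__":
    cli()
```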

* Merged PR 70: VS Code settings

Added VS Code settings

* Merged PR 73: CLI for forward modelling

Implemented CLI for forward modelling

* Merged PR 76: Unit fixes

- Changed to use km/s instead of m/s for velocities
- Fixed CLI interface

* Merged PR 78: Forward modelling CLI fix

* Merged PR 85: Version 0.1.0

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependencies

* Update

* Updates the environment files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* adding cgmanifest to staging

* adding a yml file with CG build task

* added prelim NOTICE file

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Merged PR 126: updated notice file with previously excluded components

updated notice file with previously excluded components

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following changes are required in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate functions to load section and patch loaders.
- SectionLoader now swaps the H and W dims. When loading test data in patch mode, this line can be removed (and tested) in test.py:
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* Merged PR 220: Adds Horovod and fixes

- Adds Horovod training script
- Updates dependencies in the Horovod docker file
- Removes hard-coding of a path in data.py
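
For context, the usual Horovod pattern in a PyTorch training script is roughly the following (a generic sketch with a placeholder model, not this repo's actual train.py):

```
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(10, 2).cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer and synchronise initial state across workers
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```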

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris>, @<Ilia Karmanov>, @<Max Kaznady>, please check whether this PR will affect your experiments.

The main change is in the initialization of the sections/patches attributes of the loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. The same applies to test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```
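
After this change, a loader's sections/patches attribute should hold only the data for its own split, so the split lookup above is no longer needed; roughly (a sketch in the same style as the snippets above, assuming the attribute names are unchanged):

```
train_set = TrainPatchLoader(...)  # constructor arguments elided, as above
patches = train_set.patches        # now contains only this loader's split
```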

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code, and the rest of the functions did not work with the versions of the libraries listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds the following for Penobscot:
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Inline visualisation for TensorBoard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version: 1.0.65; running devito in AzureML Estimators

Azure ML SDK Version: 1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting we are using black, and for style checking, flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files, like the line lengths or the error messages we exclude or include?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, they will only affect the files you commit in the future. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it works if train.py is run without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* added cela copyright headers to all non-empty .py files (#3)

* switched to ACR instead of docker hub (#4)

* sdk.v1.0.69, plus switched to ACR push. ACR pull coming next

* full acr use, push and pull, and use in Estimator

* temp fix for docker image bug

* fixed the az acr login --username and --password issue

* full switch to ACR for docker image storage

* Vapaunic/metrics (#1)

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* refactored prepare_data.py

* added scripts for section train test

* section train/test works for single channel input

* train and test script for section based training/testing

* removing experiments from deep_seismic, following the new struct

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Updates requirements

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* training testing for sections works

* minor changes

* reverting changes on dutchf3/local/default.py file

* added config file

* sync with new experiment structure

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified metrics with ignore_index

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955
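
A metric test in pytest style would look something like the sketch below (a generic example with a made-up metric function, not the actual tests added in this PR):

```
import numpy as np
import pytest

def pixelwise_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical metric: fraction of matching pixels."""
    return float((pred == target).mean())

def test_pixelwise_accuracy_perfect_match():
    a = np.ones((4, 4), dtype=np.int64)
    assert pixelwise_accuracy(a, a) == pytest.approx(1.0)

def test_pixelwise_accuracy_half_match():
    pred = np.array([[0, 1], [0, 1]])
    target = np.array([[0, 0], [1, 1]])
    assert pixelwise_accuracy(pred, target) == pytest.approx(0.5)
```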

* merged tests into this branch

* moved prepare data under scripts

* removed untested model configs

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* Removed redundant extract_metric_from

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* BUILD: added build setup files. (#5)

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py
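
A NumPy-backed PyTorch dataset like the one described in (#7) above is typically along the lines of the sketch below (generic and illustrative; the class name and array shapes are placeholders, not the loader added here):

```
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class NumpyDataset(Dataset):
    """Minimal dataset over a pair of NumPy arrays (images, labels)."""

    def __init__(self, images: np.ndarray, labels: np.ndarray):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return torch.from_numpy(self.images[idx]), torch.from_numpy(self.labels[idx])

# Usage with placeholder data
ds = NumpyDataset(np.zeros((8, 1, 64, 64), dtype=np.float32), np.zeros((8, 64, 64), dtype=np.int64))
loader = DataLoader(ds, batch_size=4, shuffle=True)
```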

* Log config file now experiment specific (#8)

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking disclaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation
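
The itkwidgets-based view in a notebook is essentially a one-liner, along these lines (illustrative; `seismic_volume` is a placeholder array, not a variable from the actual notebook):

```
import numpy as np
from itkwidgets import view

seismic_volume = np.random.rand(100, 100, 100).astype(np.float32)  # placeholder volume
view(seismic_volume)  # interactive 3D viewer widget in the notebook
```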

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* added pytest to environment, and pytest job to the main build (#18)

* Update main_build.yml for Azure Pipelines

* minor stylistic changes (#19)

* Update main_build.yml for Azure Pipelines
Added template for integration tests for scripts and experiments
Added setup and env
Increased job timeout
added complete set of tests

* BUILD: placeholder for Azure pipelines for notebooks build.
BUILD: added notebooks job placeholders.
BUILD: added github badges for notebook builds

* CLEANUP: moved non-release items to contrib (#20)

* Updates HRNet notebook 🚀  (#25)

* Modifies pre-commit hook to modify output

* Modifies the HRNet notebook to use Penobscot dataset
Adds parameters to limit iterations
Adds parameters meta tag for papermill

* Fixing merge peculiarities

* Updates environment.yaml (#21)

* Pins main libraries
Adds cudatoolkit version based on issues faced during workshop

* removing files

* Updates Readme (#22)

* Adds model instructions to readme

* Update README.md (#24)

I have collected pointers to all of our BP repos in this central place. We are trying to create links between everything to draw people from one to the other.

Can we please add a pointer here to the readme?
I have spoken with Max and will be adding Deep Seismic there once you have gone public.

* CONTRIB: cleanup for imaging. (#28)

* Create Unit Test Build.yml (#29)

Adding Unit Test Build.

* Update README.md

* Update README.md

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* TESTS: added notebook integration tests. (#65)

* TESTS: added notebook integration tests.

* TEST: typo in env name

* Addressing a number of minor issues with README and broken links (#67)

* fix for segyviewer and mkdir splits in README + broken link in F3 notebook

* issue edits to README

* download complete message

* Added Yacs info to README.md (#69)

* added info on yacs files
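
For context, a hedged sketch of typical YACS usage of the sort this README info presumably covers; the keys, defaults, and file path below are placeholders:

```
# Illustrative only: typical yacs usage with placeholder keys and paths.
from yacs.config import CfgNode as CN

_C = CN()
_C.TRAIN = CN()
_C.TRAIN.BATCH_SIZE_PER_GPU = 16
_C.TRAIN.END_EPOCH = 300


def get_cfg_defaults():
    """Return a clone so callers never mutate the defaults."""
    return _C.clone()


cfg = get_cfg_defaults()
# cfg.merge_from_file("configs/my_experiment.yaml")  # per-experiment overrides (placeholder path)
cfg.merge_from_list(["TRAIN.END_EPOCH", 10])  # command-line style overrides
cfg.freeze()
print(cfg)
```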

* MODEL.PRETRAINED key missing in default.py (#70)

* added MODEL.PRETRAINED key to default.py
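
A hedged sketch of the kind of key this adds and how it might be consumed, assuming default.py builds its defaults with yacs; the helper name and defaults are assumptions:

```
# Hypothetical illustration; key structure, default value, and helper name are assumed.
import os

import torch
from yacs.config import CfgNode as CN

_C = CN()
_C.MODEL = CN()
_C.MODEL.PRETRAINED = ""  # path to pretrained weights; empty string means train from scratch


def maybe_load_pretrained(model, cfg):
    """Load weights only when MODEL.PRETRAINED points at an existing checkpoint."""
    if cfg.MODEL.PRETRAINED and os.path.isfile(cfg.MODEL.PRETRAINED):
        model.load_state_dict(torch.load(cfg.MODEL.PRETRAINED), strict=False)
    return model
```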

* Update README.md (#59)

* Update README.md (#58)

* MINOR: addressing broken F3 download link (#73)

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds for penobscot
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Iinline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting, we are using black, and style checking flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, error messages we exclude or include, or anything like that.
- Do we want to have a requirements-dev.txt file for contributors? This setup uses pre-commit package, I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, it will only affect the files you are committing in the future. A big chunk of our codebase does not conform to the formatting/style settings. We will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in the Penobscot data loader
- fixed a bug in the section loader (_add_extra_channel was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it works if train.py is run without a config file (see the sketch after this entry)

Related work items: #20694
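
A rough sketch of the default-config pattern the last bullet refers to, assuming a yacs-style default.py; the option names here are illustrative, not the repo's actual configuration.

```
# Illustrative yacs-style default.py sketch (option names are made up for illustration).
from yacs.config import CfgNode as CN

_C = CN()
_C.TRAIN = CN()
_C.TRAIN.BATCH_SIZE = 16
_C.TRAIN.MAX_EPOCHS = 60
_C.DATASET = CN()
_C.DATASET.ROOT = "data/penobscot"

def get_cfg_defaults():
    """Return a clone so callers can mutate it without touching the defaults."""
    return _C.clone()

def load_config(config_file=None):
    # Use the defaults as-is when no config file is passed, otherwise merge the file on top.
    cfg = get_cfg_defaults()
    if config_file is not None:
        cfg.merge_from_file(config_file)
    cfg.freeze()
    return cfg
```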

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550
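
For reference, a minimal sketch of wiring ignite's confusion-matrix-based metrics into an evaluator, in the spirit of these metric commits; the model and class count are placeholders.

```
# Minimal sketch of ignite.metrics-based evaluation (placeholder model and class count).
import torch
from ignite.engine import create_supervised_evaluator
from ignite.metrics import ConfusionMatrix, IoU, mIoU

num_classes = 6  # placeholder: number of facies classes
model = torch.nn.Conv2d(1, num_classes, kernel_size=1)  # stand-in segmentation model

cm = ConfusionMatrix(num_classes=num_classes)
evaluator = create_supervised_evaluator(
    model,
    metrics={"IoU": IoU(cm), "mIoU": mIoU(cm)},
)

# state = evaluator.run(val_loader)   # val_loader yields (image, mask) batches
# print(state.metrics["mIoU"])
```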

* Removed redundant extract_metric_from

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* modified penobscot test script to use new metrics

* using the pytorch-ignite pre-release with the new metrics until a stable release is available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment
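
A small sketch of what such a per-experiment logging helper could look like (hypothetical helper name and paths, not the repo's actual util):

```
# Hypothetical per-experiment logging helper (illustrative, not the repo's actual util).
import logging
from pathlib import Path

def setup_experiment_logging(output_dir, name="train"):
    """Log to the console and to a file stored inside the experiment's own directory."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
        handlers=[
            logging.StreamHandler(),
            logging.FileHandler(Path(output_dir) / f"{name}.log"),
        ],
    )
    return logging.getLogger(name)

# logger = setup_experiment_logging("experiments/interpretation/penobscot/output")
```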

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking disclaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation
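
As a rough illustration of the interactive view this enables (array name and path are made up), itkwidgets can render a NumPy volume directly in a notebook:

```
# Rough illustration: interactive 3D view of a seismic cube in a Jupyter notebook.
import numpy as np
from itkwidgets import view

volume = np.load("f3_cube.npy")  # hypothetical path to a prepared seismic volume
view(volume)                     # returns an interactive 3D widget when run in a notebook
```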

* Adds jupytext to pre-commit (#12)

* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* added system requirements to readme

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py
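
A minimal sketch of a numpy-backed PyTorch dataset in the spirit of the dataloader added in #7 (class and file names are illustrative, not the repo's actual loader):

```
# Illustrative numpy-backed dataset for section images and masks (names are made up).
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class NumpySectionDataset(Dataset):
    def __init__(self, volume_path, labels_path):
        # volume: (n_sections, H, W) float array; labels: (n_sections, H, W) int array
        self.volume = np.load(volume_path)
        self.labels = np.load(labels_path)

    def __len__(self):
        return len(self.volume)

    def __getitem__(self, idx):
        image = torch.from_numpy(self.volume[idx]).float().unsqueeze(0)  # add channel dim
        label = torch.from_numpy(self.labels[idx]).long()
        return image, label

# loader = DataLoader(NumpySectionDataset("seismic.npy", "labels.npy"), batch_size=8, shuffle=True)
```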

* Log config file now experiment specific (#8)

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Adds premium storage (#79)

* Adds premium storage method

* update test.py for section based approach to use command line arguments (#76)

* added README documentation per bug bash feedback (#78)

* sdk 1.0.76; tested conda env vs docker image; extended readme

* removed reference to imaging

* minor md formatting

* https://github.com/microsoft/DeepSeismic/issues/71 (#80)

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds for penobscot
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Iinline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting, we are using black, and style checking flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, error messages we exclude or include, or anything like that.
- Do we want to have a requirements-dev.txt file for contributors? This setup uses pre-commit package, I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, it will only affect the files you are committing in the future. A big chunk of our codebase does not conform to the formatting/style settings. We will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it's working if train.py ran without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* Removed redundant extract_metric_from

* formatting changes in metrics

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified penobscot experiment to use new local metrics

* modified section experimen to pass device to metrics

* modified section experimen to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking dislaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds for penobscot
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Iinline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required configuration files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting we are using black, and for style checking, flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in these configuration files, such as the line length or the error codes we exclude or include?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, they will only affect the files you commit from then on. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955
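
To illustrate the shape of these tests, here is a minimal pytest-style example for a pixelwise-accuracy helper; the helper itself is hypothetical and is not the actual cv_lib metrics API:
```
import torch

def pixelwise_accuracy(y_pred, y):
    """Hypothetical metric helper: fraction of correctly labelled pixels."""
    return (y_pred == y).float().mean().item()

def test_pixelwise_accuracy_on_known_input():
    y = torch.tensor([[0, 1], [1, 0]])
    y_pred = torch.tensor([[0, 1], [0, 0]])
    assert pixelwise_accuracy(y_pred, y) == 0.75
```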

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* removed untested model configs

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it works if train.py is run without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking disclaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation
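
For reference, viewing a volume interactively with itkwidgets inside a notebook looks roughly like the sketch below; the random volume is just a stand-in for real seismic data:
```
import numpy as np
from itkwidgets import view

# Toy stand-in for a seismic volume (inline, crossline, depth).
volume = np.random.rand(100, 100, 100).astype(np.float32)

# In a Jupyter notebook this renders an interactive 3D viewer widget.
view(volume)
```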

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* added system requirements to readme

* sdk 1.0.76; tested conda env vs docker image; extended readme

* removed reference to imaging

* minor md formatting

* addressing multiple issues from first bug bash (#81)

* added README documentation per bug bash feedback

* DOC: added HRNET download info to README

* added hrnet download script and tested it

* added legal headers to a few scripts.

* changed /data to ~data in the main README

* added Troubleshooting section to the README

* Dciborow/build bug (#68)

* Update unit_test_steps.yml

* Update environment.yml

* Update setup_step.yml

* Update setup_step.yml

* Update unit_test_steps.yml

* Update setup_step.yml

* Adds AzureML libraries (#82)

* Adds azure dependencies

* Adds AzureML components

* Fixes download script  (#84)

* Fixes download script
* Updates readme

* clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12 - Issue #83

* Add Troubleshooting section for DSVM warnings #89

* Add Troubleshooting section for DSVM warnings, plus typo #89

* modified hrnet notebook, addressing bug bash issues (#95)

* Update environment.yml (#93)

* Update environment.yml

* Update environment.yml

* tested both yml conda env and docker; updated conda yml to have docker sdk

* tested both yml conda env and docker; updated conda yml to have docker sdk; added

* NVIDIA Tesla K80 (or V100 GPU for NCv2 series) - per Vanja's comment

* notebook integration tests complete (#106)

* added README documentation per bug bash feedback

* HRNet notebook works with tests now

* removed debug material from the notebook

* corrected duplicate build names

* conda init fix

* changed setup deps

* fixed F3 notebook - merge conflict and pytorch bug

* main and notebook builds have functional setup now

* Mat/test (#105)

* added README documentation per bug bash feedback

* Modifies scripts to run for only a few iterations when in debug/test mode

* Updates training scripts and build

* Making names unique

* Fixes conda issue

* HRNet notebook works with tests now

* removed debug material from the notebook

* corrected duplicate build names

* conda init fix

* Adds docstrings to training script

* Testing something out

* testing

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* adds seresnet

* Modifies to work outside of git env

* test

* test

* Fixes typo in DATASET

* reducing steps

* test

* test

* fixes the argument

* Altering batch size to fit k80

* reducing batch size further

* test

* test

* test

* test

* fixes distributed

* test

* test

* adds missing import

* Adds further tests

* test

* updates

* test

* Fixes section script

* test

* testing everything once through

* Final run for badge

* changed setup deps, fixed F3 notebook

* Adds missing tests (#111)

* added missing tests

* Adding fixes for test

* reinstating all tests

* Maxkaz/issues (#110)

* added README documentation per bug bash feedback

* added missing tests

* closing out multiple post bug bash issues with single PR

* Addressed comments

* minor change

* Adds Readme information to experiments (#112)

* Adds readmes to experiments

* Updates instructions based on feedback

* Update README.md

* BugBash2 Issue #83 and #89: clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12  (#88)

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* refactored prepare_data.py

* added scripts for section train test

* section train/test works for single channel input

* train and test script for section based training/testing

* removing experiments from deep_seismic, following the new struct

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Updates requirements

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds for penobscot
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Iinline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting, we are using black, and style checking flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, error messages we exclude or include, or anything like that.
- Do we want to have a requirements-dev.txt file for contributors? This setup uses pre-commit package, I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, it will only affect the files you are committing in the future. A big chunk of our codebase does not conform to the formatting/style settings. We will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it's working if train.py ran without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* Removed redundant extract_metric_from

* formatting changes in metrics

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified penobscot experiment to use new local metrics

* modified section experimen to pass device to metrics

* modified section experimen to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking dislaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* added system requirements to readme

* sdk 1.0.76; tested conda env vs docker image; extented readme

* removed reference to imaging

* minor md formatting

* minor md formatting

* clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12 - Issue #83

* Add Troubleshooting section for DSVM warnings #89

* Add Troubleshooting section for DSVM warnings, plus typo #89

* tested both yml conda env and docker; udated conda yml to have docker sdk

* tested both yml conda env and docker; udated conda yml to have docker sdk; added

* NVIDIA Tesla K80 (or V100 GPU for NCv2 series) - per Vanja's comment

* Update README.md

* BugBash2 Issue #83 and #89: clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12  (#88) (#2)

* azureml sdk 1.0.74; foxed a few issues around ACR access; added nb 030 for scalability testing

* azureml sdk 1.0.74; foxed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds for penobscot
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Iinline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting, we are using black, and style checking flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, error messages we exclude or include, or anything like that.
- Do we want to have a requirements-dev.txt file for contributors? This setup uses pre-commit package, I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, it will only affect the files you are committing in the future. A big chunk of our codebase does not conform to the formatting/style settings. We will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into vapaunic/metrics branch (so the changed files below only include the diff between these two branches. However, I can change this once the vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it's working if train.py ran without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* Removed redundant extract_metric_from

* formatting changes in metrics

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified penobscot experiment to use new local metrics

* modified section experimen to pass device to metrics

* modified section experimen to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking dislaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation
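
For context, the itkwidgets-based visualisation boils down to handing a NumPy volume to view() inside a notebook; a minimal sketch with stand-in data (the variable names are illustrative):

```
import numpy as np
from itkwidgets import view

# stand-in for a seismic cube; any 3D NumPy array works here
seismic_volume = np.random.rand(100, 200, 200).astype(np.float32)

viewer = view(seismic_volume)  # interactive volume rendering widget in Jupyter
viewer
```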

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py
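
The numpy data loader itself is not shown in this log, but the general shape of such a loader is a thin Dataset over .npy volumes; a generic sketch under that assumption (class and file names are illustrative, not the repo's actual loader):

```
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NumpySectionDataset(Dataset):
    def __init__(self, volume_path, labels_path):
        # memory-map so large volumes are not loaded eagerly
        self.volume = np.load(volume_path, mmap_mode="r")
        self.labels = np.load(labels_path, mmap_mode="r")

    def __len__(self):
        return self.volume.shape[0]  # iterate over inline sections

    def __getitem__(self, idx):
        section = np.asarray(self.volume[idx], dtype=np.float32)
        label = np.asarray(self.labels[idx], dtype=np.int64)
        return torch.from_numpy(section).unsqueeze(0), torch.from_numpy(label)


# loader = DataLoader(NumpySectionDataset("volume.npy", "labels.npy"), batch_size=8)
```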

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependencies

* Update

* Updates the environment files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes them take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* refactored prepare_data.py

* added scripts for section train test

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following changes are required in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate functions to load section and patch loaders (see the sketch below).
- SectionLoader now swaps the H and W dims. When loading test data in the patch scripts, this line can be removed (and tested) from test.py:
h, w = img.shape[-2], img.shape[-1]  # height and width
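
A sketch of the loader split referred to above; only the names get_patch_loader() and get_section_loader() come from this PR description, and the dummy datasets are illustrative stand-ins:

```
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_section_loader(batch_size=4):
    # stand-in for the real section dataset: single-channel sections + masks
    data = TensorDataset(
        torch.zeros(10, 1, 200, 700), torch.zeros(10, 200, 700, dtype=torch.long)
    )
    return DataLoader(data, batch_size=batch_size, shuffle=True)


def get_patch_loader(batch_size=32):
    # stand-in for the real patch dataset: single-channel patches + masks
    data = TensorDataset(
        torch.zeros(100, 1, 99, 99), torch.zeros(100, 99, 99, dtype=torch.long)
    )
    return DataLoader(data, batch_size=batch_size, shuffle=True)


# patch train.py now calls this instead of the old get_train_loader():
train_loader = get_patch_loader()
```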

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new structure

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py
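
The usual Horovod wiring that a distributed PyTorch training script follows is summarised below; this is the generic pattern, not the repo's train.py:

```
import horovod.torch as hvd
import torch

hvd.init()
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())  # one GPU per process

model = torch.nn.Conv2d(1, 6, kernel_size=3)                            # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())   # scale LR with world size

# average gradients across workers
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# make every worker start from identical weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```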

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris>, @<Ilia Karmanov>, @<Max Kaznady>, please check whether this PR will affect your experiments.

The main change is in the initialization of the sections/patches attributes of the loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. The same applies to test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```
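
If your code does access these attributes, my reading of the change (worth verifying against the new loader code) is that the split indexing is no longer needed, i.e. you would read `train_set.patches` or `train_set.sections` directly, since each loader now only holds the entries for its own split.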

* training testing for sections works

* minor changes

* reverting changes on dutchf3/local/default.py file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* modified patch train.py with new metrics
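
For orientation, the generic ignite pattern these commits move towards is deriving the segmentation metrics from a shared ConfusionMatrix; a minimal sketch using the public ignite.metrics API (the class count is illustrative):

```
from ignite.metrics import ConfusionMatrix, IoU, mIoU

num_classes = 6  # illustrative; use the experiment's actual number of classes

cm = ConfusionMatrix(num_classes=num_classes)
class_iou = IoU(cm)   # per-class IoU derived from the confusion matrix
mean_iou = mIoU(cm)   # scalar mean IoU

# typically attached to an evaluator engine, e.g.:
# class_iou.attach(evaluator, "IoU")
# mean_iou.attach(evaluator, "mIoU")
```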

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code, and that the rest of the functions did not work with the library versions listed in our conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds the following for Penobscot:
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Inline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700
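
The depth augmentations are not spelled out in this entry; one common formulation is to append a normalised-depth channel to each patch, and the sketch below illustrates that idea only (it is not the repo's augmentation code):

```
import numpy as np


def add_depth_channel(patch, top_row=0, total_rows=255):
    """Stack a normalised-depth channel onto a (H, W) amplitude patch."""
    h, w = patch.shape
    rows = np.arange(top_row, top_row + h, dtype=np.float32)
    depth = np.repeat((rows / float(total_rows))[:, None], w, axis=1)  # (H, W), values in 0..1
    return np.stack([patch, depth], axis=0)  # (2, H, W): amplitude + depth


patch = np.random.rand(99, 99).astype(np.float32)  # stand-in amplitude patch
augmented = add_depth_channel(patch, top_row=50)
print(augmented.shape)  # (2, 99, 99)
```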

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362
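
For readers unfamiliar with that SDK generation, the Estimator pattern used for this kind of run looks roughly as follows; workspace, script, and package names are placeholders rather than the notebook's actual configuration (the Estimator API has since been superseded by ScriptRunConfig):

```
from azureml.core import Workspace, Experiment
from azureml.train.estimator import Estimator

ws = Workspace.from_config()  # assumes a local config.json describing the workspace

est = Estimator(
    source_directory="src",           # placeholder folder containing the script
    entry_script="forward_model.py",  # placeholder script name
    compute_target="gpu-cluster",     # placeholder compute target name
    pip_packages=["devito"],          # run devito inside the estimator
)

run = Experiment(ws, "fwi-devito").submit(est)
run.wait_for_completion(show_output=True)
```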

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required configuration files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting we are using black, and for style checking, flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in these configuration files - like the line lengths, the error codes we exclude or include, or anything like that?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, they will only affect the files you commit in the future. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* added system requirements to readme

* sdk 1.0.76; tested conda env vs docker image; extended readme

* removed reference to imaging

* minor md formatting

* clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12 - Issue #83

* Add Troubleshooting section for DSVM warnings #89

* Add Troubleshooting section for DSVM warnings, plus typo #89

* tested both yml conda env and docker; updated conda yml to have docker sdk

* NVIDIA Tesla K80 (or V100 GPU for NCv2 series) - per Vanja's comment

* Update README.md

* BugBash2 Issue #83 and #89: clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12  (#88) (#3)

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Remove related projects on AI Labs

* Added a reference to Azure machine learning (#115)

Added a reference to Azure machine learning to show how folks can get started with using Azure Machine Learning

* Update README.md

* update fork from upstream (#4)

* fixed merge conflict resolution in LICENSE

* BugBash2 Issue #83 and #89: clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12  (#88)

* azureml sdk 1.0.74; foxed a few issues around ACR access; added nb 030 for scalability testing

* azureml sdk 1.0.74; foxed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds for penobscot
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Iinline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting, we are using black, and style checking flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, the error messages we exclude or include, or anything like that?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.MD file.
- Once you have the hooks installed, they will only affect the files you commit in the future. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once vapaunic/metrics is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955
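
As a schematic illustration of the layout described above, a test module under cv_lib/ would contain plain pytest functions like the following; the metric here is a made-up stand-in, not one of the actual cv_lib metrics.

```
# e.g. cv_lib/tests/test_metrics.py (illustrative path and metric)
import numpy as np
import pytest


def pixelwise_accuracy(pred, target):
    # stand-in for a metric implemented in cv_lib
    return float((pred == target).mean())


def test_pixelwise_accuracy_perfect_prediction():
    labels = np.array([[0, 1], [1, 0]])
    assert pixelwise_accuracy(labels, labels) == pytest.approx(1.0)
```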

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* removed untested model configs

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it works if train.py is run without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking disclaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependencies

* Update

* Updates the environment files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* refactored prepare_data.py

* added scripts for section train test

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate functions to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width
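
As a rough sketch of the two adjustments above (the surrounding code in train.py and test.py is elided and will differ in the actual scripts):

```
# train.py: patch experiments now obtain their loader from the dedicated helper
train_loader = get_patch_loader(...)  # was: get_train_loader(...)

# test.py: SectionLoader already swaps H and W, so this line can likely be removed
# h, w = img.shape[-2], img.shape[-1]  # height and width
```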

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py
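
For context, the canonical Horovod wiring for a PyTorch training script looks roughly like the sketch below; the model and hyperparameters are placeholders, not the script added in this PR.

```
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

# Placeholder model; the real script trains a segmentation network
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer and broadcast initial state from rank 0
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```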

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* added system requirements to readme

* sdk 1.0.76; tested conda env vs docker image; extended readme

* removed reference to imaging

* minor md formatting

* minor md formatting

* clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12 - Issue #83

* Add Troubleshooting section for DSVM warnings #89

* Add Troubleshooting section for DSVM warnings, plus typo #89

* tested both yml conda env and docker; updated conda yml to have docker sdk

* tested both yml conda env and docker; updated conda yml to have docker sdk; added

* NVIDIA Tesla K80 (or V100 GPU for NCv2 series) - per Vanja's comment

* Update README.md

* Remove related projects on AI Labs

* Added a reference to Azure machine learning (#115)

Added a reference to Azure machine learning to show how folks can get started with using Azure Machine Learning

* Update README.md

* Update AUTHORS.md (#117)

* Update AUTHORS.md (#118)

* pre-release items (#119)

* added README documentation per bug bash feedback

* added missing tests

* closing out multiple post bug bash issues with single PR

* new badges in README

* cleared notebook output

* notebooks links

* fixed bad merge

* forked branch name is misleading.  (#116)

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* BugBash2 Issue #83 and #89: clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12  (#88) (#2)

* azureml sdk 1.0.74; foxed a few issues around ACR access; added nb 030 for scalability testing

* azureml sdk 1.0.74; foxed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependecies

* Update

* Updates the environemnt files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)
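
For readers unfamiliar with the two splitting modes discussed above, here is an illustrative sketch of how section-based splits differ from patch-based splits of a 3D seismic volume. The volume dimensions, naming scheme, and split ratio are made up for illustration; this is not the logic of prepare_data.py itself.

```
# Illustrative only: invented dimensions and naming, not the repo's prepare_data.py.
import numpy as np

n_iline, n_xline, n_depth = 401, 701, 255            # made-up F3-like volume shape

# Section-based split: each example is a whole inline or crossline slice.
sections = [f"i_{i}" for i in range(n_iline)] + [f"x_{x}" for x in range(n_xline)]
rng = np.random.default_rng(0)
rng.shuffle(sections)
n_val = int(0.1 * len(sections))
val_sections, train_sections = sections[:n_val], sections[n_val:]

# Patch-based split: each example is a small 2D window cut from a slice.
patch_size, stride = 99, 50
patch_ids = [
    (iline, x)
    for iline in range(n_iline)
    for x in range(0, n_xline - patch_size + 1, stride)
]

print(len(train_sections), len(val_sections), len(patch_ids))
```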

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width
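
As a reading aid, the sketch below restates the two required changes with placeholder stubs; the factory names follow the PR text, but the bodies are hypothetical and not the repo's real implementations.

```
# Hypothetical stand-ins for the loader factories named above; bodies are placeholders.
def get_section_loader(config):
    raise NotImplementedError  # placeholder

def get_patch_loader(config):
    raise NotImplementedError  # placeholder

def make_train_loader(config, mode="patch"):
    # Patch scripts previously called a generic get_train_loader(); after this PR
    # they call the patch-specific factory, and section scripts the section one.
    # Likewise, with SectionLoader swapping H and W itself, the manual
    # h, w = img.shape[-2], img.shape[-1] read-out in test.py becomes removable.
    return get_patch_loader(config) if mode == "patch" else get_section_loader(config)
```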

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py
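
For context, this is the standard Horovod + PyTorch pattern such a training script typically follows; the model, data, and hyperparameters below are dummies (and a CUDA device is assumed), so treat it as a sketch rather than the repo's actual script.

```
# Minimal Horovod data-parallel training sketch (illustrative model and data).
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = nn.Conv2d(1, 2, kernel_size=3).cuda()                     # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Average gradients across workers and start every worker from the same weights.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for _ in range(2):                                                # dummy training loop
    x = torch.randn(4, 1, 8, 8).cuda()
    loss = model(x).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Launched with `horovodrun -np <num_gpus> python train.py`, each process drives one GPU and gradients are averaged across processes.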

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics
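
For context, a hedged sketch of computing segmentation metrics from an ignite confusion matrix, in the spirit of the commits above; the class count and tensors are invented, and this is not the repo's metric module.

```
# Per-class IoU and mean IoU derived from ignite's ConfusionMatrix (toy inputs).
import torch
from ignite.metrics import ConfusionMatrix, IoU, mIoU

num_classes = 6
cm = ConfusionMatrix(num_classes=num_classes)
iou = IoU(cm)        # per-class IoU computed from the confusion matrix
mean_iou = mIoU(cm)  # mean IoU over classes

# Feed one dummy batch directly; in training these metrics attach to an ignite Engine.
logits = torch.randn(2, num_classes, 4, 4)             # (batch, classes, H, W)
labels = torch.randint(0, num_classes, (2, 4, 4))      # (batch, H, W)
cm.update((logits, labels))

print(iou.compute(), mean_iou.compute())
```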

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds the following for Penobscot:
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Inline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362
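
A hedged sketch of what running a devito script through an AzureML Estimator generally looks like with that SDK generation; the workspace config, compute target, folder, and script names here are invented placeholders, not the repo's notebook code.

```
# Submitting a (hypothetical) devito forward-modelling script as an AzureML Estimator run.
from azureml.core import Workspace, Experiment
from azureml.train.estimator import Estimator

ws = Workspace.from_config()                     # expects a local config.json

est = Estimator(
    source_directory="fwi",                      # hypothetical folder containing the script
    entry_script="run_devito_forward.py",        # hypothetical entry point
    compute_target="gpu-cluster",                # hypothetical AML compute name
    pip_packages=["devito"],
)

run = Experiment(ws, "fwi-devito").submit(est)
run.wait_for_completion(show_output=True)
```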

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting we are using black, and for style checking, flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, the error messages we exclude or include, or anything like that?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.md file.
- Once you have the hooks installed, they will only affect the files you commit in the future. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356
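
Not the repo's Waldeland/TextureNet script, but a generic ignite supervised-training skeleton of the kind this PR describes, with a dummy model and random 3D "voxel" data standing in for the real ones.

```
# Generic ignite training loop sketch (dummy model/data, illustrative only).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from ignite.engine import Events, create_supervised_trainer

model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8 * 8, 2))    # stand-in for TextureNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 1, 8, 8, 8)                  # fake 3D voxel cubes
y = torch.randint(0, 2, (16,))
loader = DataLoader(TensorDataset(x, y), batch_size=4)

trainer = create_supervised_trainer(model, optimizer, loss_fn)

@trainer.on(Events.ITERATION_COMPLETED)
def log_loss(engine):
    print(f"iter {engine.state.iteration}: loss {engine.state.output:.4f}")

trainer.run(loader, max_epochs=1)
```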

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once the vapaunic/metrics branch is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once the vapaunic/metrics branch is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955
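
A minimal pytest-style sketch of the kind of metric test described above; the metric function here is a toy stand-in, not cv_lib's actual API, and serves only to illustrate the test structure.

```
# tests/test_metrics_sketch.py -- hypothetical file name, toy metric under test.
import numpy as np
import pytest


def pixelwise_accuracy(pred, target):
    """Toy metric used only to illustrate the test structure."""
    return float((pred == target).mean())


def test_pixelwise_accuracy_perfect_match():
    pred = np.array([[0, 1], [1, 0]])
    assert pixelwise_accuracy(pred, pred) == pytest.approx(1.0)


def test_pixelwise_accuracy_half_match():
    pred = np.array([[0, 1], [1, 0]])
    target = np.array([[0, 1], [0, 1]])
    assert pixelwise_accuracy(pred, target) == pytest.approx(0.5)
```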

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it's working if train.py ran without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* Removed redundant extract_metric_from

* formatting changes in metrics

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking disclaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* fixed link for F3 download

* MINOR: python version fix to 3.6.7 (#72)

* Adding system requirements in README (#74)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependencies

* Update

* Updates the environment files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds the following for Penobscot:
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Inline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting we are using black, and for style checking, flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, the error messages we exclude or include, or anything like that?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.md file.
- Once you have the hooks installed, they will only affect the files you commit in the future. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once the vapaunic/metrics branch is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once the vapaunic/metrics branch is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it's working if train.py ran without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* Removed redundant extract_metric_from

* formatting changes in metrics

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* DOC: forking disclaimer and new build names. (#9)

* Updating README.md with introduction material (#10)

* Update README with introduction to DeepSeismic

Add intro material for DeepSeismic

* Adding logo file

* Adding image to readme

* Update README.md

* Updates the 3D visualisation to use itkwidgets (#11)

* Updates notebook to use itkwidgets for interactive visualisation

* Adds jupytext to pre-commit (#12)


* Add jupytext

* Adds demo notebook for HRNet (#13)

* Adding TF 2.0 to allow for tensorboard vis in notebooks

* Modifies hrnet config for notebook

* Add HRNet notebook for demo

* Updates HRNet notebook and tidies F3

* removed my username references (#15)

* moving 3D models into contrib folder (#16)

* Weetok (#17)

* Update it to include sections for imaging

* Update README.md

* Update README.md

* added system requirements to readme

* sdk 1.0.76; tested conda env vs docker image; extended readme

* removed reference to imaging

* minor md formatting

* minor md formatting

* clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12 - Issue #83

* Add Troubleshooting section for DSVM warnings #89

* Add Troubleshooting section for DSVM warnings, plus typo #89

* tested both yml conda env and docker; updated conda yml to have docker sdk

* tested both yml conda env and docker; updated conda yml to have docker sdk; added

* NVIDIA Tesla K80 (or V100 GPU for NCv2 series) - per Vanja's comment

* Update README.md

* BugBash2 Issue #83 and #89: clarify which DSVM we want to use - Ubuntu GPU-enabled VM, preferably NC12  (#88) (#3)

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* azureml sdk 1.0.74; fixed a few issues around ACR access; added nb 030 for scalability testing

* merge upstream into my fork (#1)

* MINOR: addressing broken F3 download link (#73)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build stat…

* Minor fix: broken links in README (#120)

* added links to fully-run notebooks and fixed contrib voxel models (#123)

* added README documentation per bug bash feedback

* added missing tests

* - added notebook links
- made sure original voxel2pixel code runs

* update ignite port of texturenet

* resolved merge conflict

* formatting change

* Adds reproduction instructions to readme (#122)

* Update main_build.yml for Azure Pipelines

* Update main_build.yml for Azure Pipelines

* BUILD: added build status badges (#6)

* Adds dataloader for numpy datasets as well as demo pipeline for such a dataset (#7)

* Finished version of numpy data loader

* Working training script for demo

* Adds the new metrics

* Fixes docstrings and adds header

* Removing extra setup.py

* Log config file now experiment specific (#8)

* Merging work on salt dataset

* Adds computer vision to dependencies

* Updates dependencies

* Update

* Updates the environment files

* Updates readme and envs

* Initial running version of dutchf3

* INFRA: added structure templates.

* VOXEL: initial rough code push - need to clean up before PRing.

* Working version

* Working version before refactor

* quick minor fixes in README

* 3D SEG: first commit for PR.

* 3D SEG: removed data files to avoid redistribution.

* Updates

* 3D SEG: restyled batch file, moving onto others.

* Working HRNet

* 3D SEG: finished going through Waldeland code

* Updates test scripts and makes it take processing arguments

* minor update

* Fixing imports

* Refactoring the experiments

* Removing .vscode

* Updates gitignore

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* added instructions for running f3dutch experiments, and fixed some issues in prepare_data.py script

* minor wording fix

* minor wording fix

* enabled splitting dataset into sections, rather than only patches

* enabled splitting dataset into sections, rather than only patches

* merged duplicate ifelse blocks

* merged duplicate ifelse blocks

* refactored prepare_data.py

* refactored prepare_data.py

* added scripts for section train test

* added scripts for section train test

* section train/test works for single channel input

* section train/test works for single channel input

* Merged PR 174: F3 Dutch README, and fixed issues in prepare_data.py

This PR includes the following changes:
- added README instructions for running f3dutch experiments
- prepare_dataset.py didn't work for creating section-based splits, so I fixed a few issues. There are no changes to the patch-based splitting logic.
- ran black formatter on the file, which created all the formatting changes (sorry!)

* Merged PR 204: Adds loaders to deepseismic from cv_lib

* train and test script for section based training/testing

* train and test script for section based training/testing

* Merged PR 209: changes to section loaders in data.py

Changes in this PR will affect patch scripts as well. The following are required changes in patch scripts:
- get_train_loader() in train.py should be changed to get_patch_loader(). I created separate function to load section and patch loaders.
- SectionLoader now swaps H and W dims. When loading test data in patch, this line can be removed (and tested) from test.py
h, w = img.shape[-2], img.shape[-1]  # height and width

* Merged PR 210: BENCHMARKS: added placeholder for benchmarks.

BENCHMARKS: added placeholder for benchmarks.

* Merged PR 211: Fixes issues left over from changes to data.py

* removing experiments from deep_seismic, following the new struct

* removing experiments from deep_seismic, following the new struct

* Merged PR 220: Adds Horovod and fixes

Add Horovod training script
Updates dependencies in Horovod docker file
Removes hard coding of path in data.py

* section train/test scripts

* section train/test scripts

* Add cv_lib to repo and updates instructions

* Add cv_lib to repo and updates instructions

* Removes data.py and updates readme

* Removes data.py and updates readme

* Updates requirements

* Updates requirements

* Merged PR 222: Moves cv_lib into repo and updates setup instructions

* renamed train/test scripts

* renamed train/test scripts

* train test works on alaudah section experiments, a few minor bugs left

* train test works on alaudah section experiments, a few minor bugs left

* cleaning up loaders

* cleaning up loaders

* Merged PR 236: Cleaned up dutchf3 data loaders

@<Mathew Salvaris> , @<Ilia Karmanov> , @<Max Kaznady> , please check out if this PR will affect your experiments.

The main change is with the initialization of sections/patches attributes of loaders. Previously, we were unnecessarily assigning all train/val splits to train loaders, rather than only those belonging to the given split for that loader. Similar for test loaders.

This will affect your code if you access these attributes. E.g. if you have something like this in your experiments:
```
train_set = TrainPatchLoader(…)
patches = train_set.patches[train_set.split]
```

or
```
train_set = TrainSectionLoader(…)
sections = train_set.sections[train_set.split]
```

* training testing for sections works

* training testing for sections works

* minor changes

* minor changes

* reverting changes on dutchf3/local/default.py file

* reverting changes on dutchf3/local/default.py file

* added config file

* added config file

* Updates the repo with preliminary results for 2D segmentation

* Merged PR 248: Experiment: section-based Alaudah training/testing

This PR includes the section-based experiments on dutchf3 to replicate Alaudah's work. No changes were introduced to the code outside this experiment.

* Merged PR 253: Waldeland based voxel loaders and TextureNet model

Related work items: #16357

* Merged PR 290: A demo notebook on local train/eval on F3 data set

Notebook and associated files + minor change in a patch_deconvnet_skip.py model file.

Related work items: #17432

* Merged PR 312: moved dutchf3_section to experiments/interpretation

moved dutchf3_section to experiments/interpretation

Related work items: #17683

* Merged PR 309: minor change to README to reflect the changes in prepare_data script

minor change to README to reflect the changes in prepare_data script

Related work items: #17681

* Merged PR 315: Removing voxel exp

Related work items: #17702

* sync with new experiment structure

* sync with new experiment structure

* added a logging handler for array metrics

* added a logging handler for array metrics

* first draft of metrics based on the ignite confusion matrix

* first draft of metrics based on the ignite confusion matrix

* metrics now based on ignite.metrics

* metrics now based on ignite.metrics

* modified patch train.py with new metrics

* modified patch train.py with new metrics

* Merged PR 361: VOXEL: fixes to original voxel2pixel code to make it work with the rest of the repo.

Realized there was one bug in the code and the rest of the functions did not work with the different versions of libraries which we have listed in the conda yaml file. Also updated the download script.

Related work items: #18264

* modified metrics with ignore_index

* modified metrics with ignore_index

* Merged PR 405: minor mods to notebook, more documentation

A very small PR - Just a few more lines of documentation in the notebook, to improve clarity.

Related work items: #17432

* Merged PR 368: Adds penobscot

Adds the following for Penobscot:
- Dataset reader
- Training script
- Testing script
- Section depth augmentation
- Patch depth augmentation
- Inline visualisation for Tensorboard

Related work items: #14560, #17697, #17699, #17700

* Merged PR 407: Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Azure ML SDK Version:  1.0.65; running devito in AzureML Estimators

Related work items: #16362

* Merged PR 452: decouple docker image creation from azureml

removed all azureml dependencies from 010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb

All other changes are due to trivial reruns

Related work items: #18346

* Merged PR 512: Pre-commit hooks for formatting and style checking

Opening this PR to start the discussion -

I added the required dotenv files and instructions for setting up pre-commit hooks for formatting and style checking. For formatting we are using black, and for style checking, flake8. The following files are added:
- .pre-commit-config.yaml - defines git hooks to be installed
- .flake8 - settings for flake8 linter
- pyproject.toml - settings for black formatter

The last two files define the formatting and linting style we want to enforce on the repo.

All of us would set up the pre-commit hooks locally, so regardless of what formatting/linting settings we have in our local editors, the settings specified by the git hooks would still be enforced prior to the commit, to ensure consistency among contributors.

Some questions to start the discussion:
- Do you want to change any of the default settings in the dotenv files - like the line lengths, the error messages we exclude or include, or anything like that?
- Do we want to have a requirements-dev.txt file for contributors? This setup uses the pre-commit package; I didn't include it in the environment.yaml file, but instead instructed the user to install it in the CONTRIBUTING.md file.
- Once you have the hooks installed, they will only affect the files you commit in the future. A big chunk of our codebase does not conform to the formatting/style settings, so we will have to run the hooks on the codebase retrospectively. I'm happy to do that, but it will create many changes and a significant-looking PR :) Any thoughts on how we should approach this?

Thanks!

Related work items: #18350

* Merged PR 513: 3D training script for Waldeland's model with Ignite

Related work items: #16356

* Merged PR 565: Demo notebook updated with 3D graph

Changes:
1) Updated demo notebook with the 3D visualization
2) Formatting changes due to new black/flake8 git hook

Related work items: #17432

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once the vapaunic/metrics branch is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* Merged PR 341: Tests for cv_lib/metrics

This PR is dependent on the tests created in the previous branch !333. That's why the PR is to merge tests into the vapaunic/metrics branch (so the changed files below only include the diff between these two branches). However, I can change this once the vapaunic/metrics branch is merged.

I created these tests under cv_lib/ since metrics are a part of that library. I imagine we will have tests under deepseismic_interpretation/, and the top level /tests for integration testing.

Let me know if you have any comments on this test, or the structure. As agreed, I'm using pytest.

Related work items: #16955

* merged tests into this branch

* merged tests into this branch

* Merged PR 569: Minor PR: change to pre-commit configuration files

Related work items: #18350

* Merged PR 586: Purging unused files and experiments

Purging unused files and experiments

Related work items: #20499

* moved prepare data under scripts

* moved prepare data under scripts

* removed untested model configs

* removed untested model configs

* fixed weird bug in penobscot data loader

* fixed weird bug in penobscot data loader

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* penobscot experiments working for hrnet, seresnet, no depth and patch depth

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* removed a section loader bug in the penobscot loader

* fixed bugs in my previous 'fix'

* fixed bugs in my previous 'fix'

* removed redundant _open_mask from subclasses

* removed redundant _open_mask from subclasses

* Merged PR 601: Fixes to penobscot experiments

A few changes:
- Instructions in README on how to download and process Penobscot and F3 2D data sets
- moved prepare_data scripts to the scripts/ directory
- fixed a weird issue with a class method in Penobscot data loader
- fixed a bug in section loader (_add_extra_channel in section loader was not necessary and was causing an issue)
- removed config files that were not tested or working in Penobscot experiments
- modified default.py so it's working if train.py ran without a config file

Related work items: #20694

* Merged PR 605: added common metrics to Waldeland model in Ignite

Related work items: #19550

* Removed redundant extract_metric_from

* Removed redundant extract_metric_from

* formatting changes in metrics

* formatting changes in metrics

* modified penobscot experiment to use new local metrics

* modified penobscot experiment to use new local metrics

* modified section experiment to pass device to metrics

* modified section experiment to pass device to metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* moved metrics out of dutchf3, modified distributed to work with the new metrics

* fixed other experiments after new metrics

* fixed other experiments after new metrics

* removed apex metrics from distributed train.py

* removed apex metrics from distributed train.py

* added ignite-based metrics to dutch voxel experiment

* added ignite-based metrics to dutch voxel experiment

* removed apex metrics

* removed apex metrics

* modified penobscot test script to use new metrics

* pytorch-ignite pre-release with new metrics until stable available

* removed cell output from the F3 notebook

* deleted .vscode

* modified metric import in test_metrics.py

* separated metrics out as a module

* relative logger file path, modified section experiment

* removed the REPO_PATH from init

* created util logging function, and moved logging file to each experiment

* modified demo experiment

* modified penobscot experiment

* modified dutchf3_voxel experiment

* no logging in voxel2pixel

* modified dutchf3 patch local experiment

* modified patch distributed experiment

* modified interpretation notebook

* minor changes to comments

* Updates notebook to use itkwidgets for interactive visualisation

* Further updates

* Fixes merge conflicts

* removing files

* Adding reproduction experiment instructions to readme

* checking in ablation study from ilkarman (#124)

Tests pass, but final results aren't communicated to GitHub; there is no way to trigger another commit other than making a dummy commit.
Commit by maxkazmsft on 2019-12-17 07:14:43 -05:00 (parent 341bb01b3b, commit b75e6476c9); 178 files changed, 23,796 additions and 821 deletions.

.ci/steps/setup_step.yml (new file, +27 lines)
parameters:
  storagename: #
  storagekey: #
  conda: seismic-interpretation

steps:
- bash: |
    echo "##vso[task.prependpath]$CONDA/bin"
- bash: |
    echo "Running setup..."
    # make sure we have the latest and greatest
    conda env create -f environment/anaconda/local/environment.yml python=3.6 --force
    conda init bash
    source activate ${{parameters.conda}}
    pip install -e interpretation
    pip install -e cv_lib
    # add this if pytorch stops detecting GPU
    # conda install pytorch torchvision cudatoolkit=9.2 -c pytorch
    # copy your model files like so - using dummy file to illustrate
    azcopy --quiet --source:https://${{parameters.storagename}}.blob.core.windows.net/models/model --source-key ${{parameters.storagekey}} --destination ./models/your_model_name
  displayName: Setup
  failOnStderr: True

.ci/steps/unit_test_steps.yml (new file, +18 lines)
parameters:
  conda: seismic-interpretation

steps:
- bash: |
    echo "Starting unit tests"
    source activate ${{parameters.conda}}
    pytest --durations=0 --junitxml 'reports/test-unit.xml' cv_lib/tests/
    echo "Unit test job passed"
  displayName: Unit Tests Job
  failOnStderr: True
- task: PublishTestResults@2
  displayName: 'Publish Test Results **/test-*.xml'
  inputs:
    testResultsFiles: '**/test-*.xml'
    failTaskOnFailedTests: true
  condition: succeededOrFailed()

.ci/unit_test_build.yml (new file, +28 lines)
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

# Pull request against these branches will trigger this build
pr:
- master
- staging

# Any commit to this branch will trigger the build.
trigger:
- master
- staging

jobs:
# partially disable setup for now - done manually on build VM
- job: DeepSeismic
  displayName: Deep Seismic Main Build
  pool:
    name: $(AgentName)
  steps:
  - template: steps/setup_step.yml
    parameters:
      storagename: $(storageaccoutname)
      storagekey: $(storagekey)
  - template: steps/unit_test_steps.yml

.flake8 (new file, +17 lines)
[flake8]
max-line-length = 120
max-complexity = 18
select = B,C,E,F,W,T4,B9
ignore =
    # slice notation whitespace, invalid
    E203
    # too many leading # for block comment
    E266
    # module level import not at top of file
    E402
    # line break before binary operator
    W503
    # blank line contains whitespace
    W293
    # line too long
    E501

.gitignore (modified, 24 lines changed)
@ -89,6 +89,24 @@ venv/
ENV/
env.bak/
venv.bak/
wheels/
.dev_env
.azureml
# Logs
*.tfevents.*
**/runs
**/log
**/output
#
interpretation/environment/anaconda/local/src/*
interpretation/environment/anaconda/local/src/cv-lib
.code-workspace.code-workspace
**/.vscode
**/.idea
# Spyder project settings
.spyderproject
@ -97,8 +115,4 @@ venv.bak/
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
*.pth

.pre-commit-config.yaml (new file, +17 lines)
repos:
- repo: https://github.com/psf/black
  rev: stable
  hooks:
  - id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v1.2.3
  hooks:
  - id: flake8
- repo: local
  hooks:
  - id: jupytext
    name: jupytext
    entry: jupytext --from ipynb --pipe black --check flake8
    pass_filenames: true
    files: .ipynb
    language: python

.vscode/settings.json (deleted, 6 lines; previous contents shown below)
{
    "python.formatting.provider": "black",
    "python.linting.enabled": true,
    "python.linting.flake8Enabled": true,
    "python.linting.pylintEnabled": false,
}

AUTHORS.md (new file, +32 lines)
Contributor
============
All names are sorted alphabetically by last name.
Contributors, please add your name to the list when you submit a patch to the project.
Contributors (sorted alphabetically)
-------------------------------------
To contributors: please add your name to the list when you submit a patch to the project.
* Ashish Bhatia
* Daniel Ciborowski
* George Iordanescu
* Ilia Karmanov
* Max Kaznady
* Vanja Paunic
* Mathew Salvaris
## How to be a contributor to the repository
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

CONTRIBUTING.md (new file, +88 lines)
# Contribution Guidelines
Contributions are welcome! Here are a few things to know:
* [Steps to Contributing](#steps-to-contributing)
* [Coding Guidelines](#coding-guidelines)
* [Microsoft Contributor License Agreement](#microsoft-contributor-license-agreement)
* [Code of Conduct](#code-of-conduct)
## Steps to Contributing
**TL;DR for contributing: We use the staging branch to land all new features and fixes. To make a contribution, please create a branch from staging, make a modification in the code and create a PR to staging.**
Here are the basic steps to get started with your first contribution. Please reach out with any questions.
1. Use [open issues](https://github.com/Microsoft/DeepSeismic/issues) to discuss the proposed changes. Create an issue describing changes if necessary to collect feedback. Also, please use provided labels to tag issues so everyone can easily sort issues of interest.
2. [Fork the repo](https://help.github.com/articles/fork-a-repo/) so you can make and test local changes.
3. Create a new branch **from staging branch** for the issue (please do not create a branch from master). We suggest prefixing the branch with your username and then a descriptive title: (e.g. username/update_contributing_docs)
4. Create a test that replicates the issue.
5. Make code changes.
6. Ensure unit tests pass and code style / formatting is consistent TODO: add docstring links.
7. Create a pull request against **staging** branch.
Once the features included in a [milestone](https://github.com/Microsoft/DeepSeismic/milestones) are completed, we will merge contrib into staging. TODO: make a wiki with coding guidelines.
## Coding Guidelines
We strive to maintain high quality code to make the utilities in the repository easy to understand, use, and extend. We also work hard to maintain a friendly and constructive environment. We've found that having clear expectations on the development process and consistent style helps to ensure everyone can contribute and collaborate effectively.
### Code formatting and style checking
We use `git-hooks` to automate the process of formatting and style checking the code. In particular, we use `black` as a code formatter, `flake8` for style checking, and the `pre-commit` Python framework, which ensures that both the code formatter and the checker are run on the code during commit. If they run with no issues, the commit is made; otherwise, the commit is denied until stylistic or formatting changes are made.
Please follow these instructions to set up `pre-commit` in your environment.
```
pip install pre-commit
pre-commit install
```
The above will install the pre-commit package, and install git hooks specified in `.pre-commit-config.yaml` into your `.git/` directory.
## Microsoft Contributor License Agreement
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
TODO: add CLA-bot
## Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
Apart from the official Code of Conduct developed by Microsoft, in the Computer Vision team we adopt the following behaviors, to ensure a great working environment:
#### Do not point fingers
Let's be constructive.
<details>
<summary><em>Click here to see some examples</em></summary>
"This method is missing docstrings" instead of "YOU forgot to put docstrings".
</details>
#### Provide code feedback based on evidence
When making code reviews, try to support your ideas based on evidence (papers, library documentation, stackoverflow, etc) rather than your personal preferences.
<details>
<summary><em>Click here to see some examples</em></summary>
"When reviewing this code, I saw that the Python implementation the metrics are based on classes, however, [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics) and [tensorflow](https://www.tensorflow.org/api_docs/python/tf/metrics) use functions. We should follow the standard in the industry."
</details>
#### Ask questions - do not give answers
Try to be empathic.
<details>
<summary><em>Click here to see some examples</em></summary>
* Would it make more sense if ...?
* Have you considered this ... ?
</details>

DeepSeismicLogo.jpg (new binary file, 151 KiB; not shown)

LICENSE (modified, 43 lines; old and new text shown below)
MIT License
Copyright (c) Microsoft Corporation.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE
MIT License
Copyright (c) Microsoft Corporation. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE

2058
NOTICE.txt Executable file

File diff suppressed because it is too large.

471
README.md

@ -1,69 +1,402 @@
---
page_type: sample
languages:
- csharp
products:
- dotnet
description: "Add 150 character max description"
urlFragment: "update-this-to-unique-url-stub"
---
# DeepSeismic
![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.DeepSeismic?branchName=master)
[![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.DeepSeismic?branchName=master)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=108&branchName=master)
# Official Microsoft Sample
<!--
Guidelines on README format: https://review.docs.microsoft.com/help/onboard/admin/samples/concepts/readme-template?branch=master
Guidance on onboarding samples to docs.microsoft.com/samples: https://review.docs.microsoft.com/help/onboard/admin/samples/process/onboarding?branch=master
Taxonomies for products and languages: https://review.docs.microsoft.com/new-hope/information-architecture/metadata/taxonomies?branch=master
-->
Give a short description for your sample here. What does it do and why is it important?
## Contents
Outline the file contents of the repository. It helps users navigate the codebase, build configuration and any related assets.
| File/folder | Description |
|-------------------|--------------------------------------------|
| `src` | Sample source code. |
| `.gitignore` | Define what to ignore at commit time. |
| `CHANGELOG.md` | List of changes to the sample. |
| `CONTRIBUTING.md` | Guidelines for contributing to the sample. |
| `README.md` | This README file. |
| `LICENSE` | The license for the sample. |
## Prerequisites
Outline the required components and tools that a user might need to have on their machine in order to run the sample. This can be anything from frameworks, SDKs, OS versions or IDE releases.
## Setup
Explain how to prepare the sample once the user clones or downloads the repository. The section should outline every step necessary to install dependencies and set up any settings (for example, API keys and output folders).
## Running the sample
Outline step-by-step instructions to execute the sample and see its output. Include steps for executing the sample from the IDE, starting specific services in the Azure portal or anything related to the overall launch of the code.
## Key concepts
Provide users with more context on the tools and services used in the sample. Explain some of the code that is being used and how services interact with each other.
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
# DeepSeismic
![DeepSeismic](./assets/DeepSeismicLogo.jpg)
This repository shows you how to perform seismic imaging and interpretation on Azure. It empowers geophysicists and data scientists to run seismic experiments using state-of-art DSL-based PDE solvers and segmentation algorithms on Azure.
The repository provides sample notebooks, data loaders for seismic data, utilities, and out-of-the-box ML pipelines, organized as follows:
- **sample notebooks**: these can be found in the `examples` folder - they are standard Jupyter notebooks which highlight how to use the codebase by walking the user through a set of pre-made examples
- **experiments**: the goal is to provide runnable Python scripts which train and test (score) our machine learning models in `experiments` folder. The models themselves are swappable, meaning a single train script can be used to run a different model on the same dataset by simply swapping out the configuration file which defines the model. Experiments are organized by model types and datasets - for example, "2D segmentation on Dutch F3 dataset", "2D segmentation on Penobscot dataset" and "3D segmentation on Penobscot dataset" are all different experiments. As another example, if one is swapping 2D segmentation models on Dutch F3 dataset, one would just point the train and test scripts to a different configuration file within the same experiment.
- **pip installable utilities**: we provide `cv_lib` and `deepseismic_interpretation` utilities (more info below) which are used by both sample notebooks and experiments mentioned above
DeepSeismic currently focuses on Seismic Interpretation (3D segmentation aka facies classification) with experimental code provided around Seismic Imaging.
### Quick Start
There are two ways to get started with the DeepSeismic codebase, which currently focuses on Interpretation:
- if you'd like to get an idea of how our interpretation (segmentation) models are used, simply review the [HRNet demo notebook](https://github.com/microsoft/DeepSeismic/blob/master/examples/interpretation/notebooks/HRNet_Penobscot_demo_notebook.ipynb)
- to actually run the code, you'll need to set up a compute environment (which includes setting up a GPU-enabled Linux VM and downloading the appropriate Anaconda Python packages) and download the datasets which you'd like to work with - detailed steps for doing this are provided in the next `Interpretation` section below.
If you run into any problems, chances are your problem has already been solved in the [Troubleshooting](#troubleshooting) section.
### Pre-run notebooks
Notebooks stored in the repository have their outputs intentionally cleared - you can find fully executed, auto-generated versions of the notebooks here:
- **HRNet Penobscot demo**: [[HTML](https://deepseismicstore.blob.core.windows.net/shared/HRNet_Penobscot_demo_notebook.html)] [[.ipynb](https://deepseismicstore.blob.core.windows.net/shared/HRNet_Penobscot_demo_notebook.ipynb)]
- **Dutch F3 dataset**: [[HTML](https://deepseismicstore.blob.core.windows.net/shared/F3_block_training_and_evaluation_local.html)] [[.ipynb](https://deepseismicstore.blob.core.windows.net/shared/F3_block_training_and_evaluation_local.ipynb)]
### Azure Machine Learning
[Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/) enables you to train and deploy your machine learning models and pipelines at scale, and to leverage open-source Python frameworks such as PyTorch, TensorFlow, and scikit-learn. To get started using the code in this repository with Azure Machine Learning, refer to the [Azure Machine Learning How-to](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml).
## Interpretation
For seismic interpretation, the repository provides extensible machine learning pipelines that show how you can leverage state-of-the-art segmentation algorithms (UNet, SEResNET, HRNet) for seismic interpretation, along with benchmark results from running these algorithms on various seismic datasets (Dutch F3 and Penobscot).
To run the examples available in the repo, please follow the instructions below to:
1) [Set up the environment](#setting-up-environment)
2) [Download the data sets](#dataset-download-and-preparation)
3) [Run example notebooks and scripts](#run-examples)
### Setting up Environment
Follow the instructions below to read about compute requirements and install the required libraries.
#### Compute environment
We recommend using a virtual machine to run the example notebooks and scripts. Specifically, you will need a GPU powered Linux machine, as this repository is developed and tested on __Linux only__. The easiest way to get started is to use the [Azure Data Science Virtual Machine (DSVM) for Linux (Ubuntu)](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro). This VM will come installed with all the system requirements that are needed to create the conda environment described below and then run the notebooks in this repository.
For this repo, we recommend selecting a multi-GPU Ubuntu VM of type [Standard_NC12](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nc-series). The machine is powered by NVIDIA Tesla K80 (or V100 GPU for NCv2 series) which can be found in most Azure regions.
> NOTE: For users new to Azure, your subscription may not come with a quota for GPUs. You may need to go into the Azure portal to increase your quota for GPU VMs. Learn more about how to do this here: https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits.
#### Package Installation
To install the packages contained in this repository, navigate to the directory where you cloned the DeepSeismic repo and run:
```bash
conda env create -f environment/anaconda/local/environment.yml
```
This will create the appropriate conda environment to run experiments.
Next you will need to install the common package for interpretation:
```bash
conda activate seismic-interpretation
pip install -e interpretation
```
Then you will also need to install `cv_lib` which contains computer vision related utilities:
```bash
pip install -e cv_lib
```
Both packages are installed in development mode with the `-e` flag. This means that to update them, simply go to the folder and pull the appropriate commit or branch.
During development, in case you need to update the environment due to a conda env file change, you can run
```
conda env update --file environment/anaconda/local/environment.yml
```
from the root of DeepSeismic repo.
### Dataset download and preparation
This repository provides examples on how to run seismic interpretation on two publicly available annotated seismic datasets: [Penobscot](https://zenodo.org/record/1341774) and [F3 Netherlands](https://github.com/olivesgatech/facies_classification_benchmark). Their respective sizes (uncompressed on disk in your folder after downloading and pre-processing) are:
- **Penobscot**: 7.9 GB
- **Dutch F3**: 2.2 GB
Please make sure you have enough disk space to download either dataset.
We have experiments and notebooks which use either one dataset or the other. Depending on which experiment/notebook you want to run you'll need to download the corresponding dataset. We suggest you start by looking at [HRNet demo notebook](https://github.com/microsoft/DeepSeismic/blob/master/examples/interpretation/notebooks/HRNet_Penobscot_demo_notebook.ipynb) which requires the Penobscot dataset.
#### Penobscot
To download the Penobscot dataset run the [download_penobscot.sh](scripts/download_penobscot.sh) script, e.g.
```
data_dir="$HOME/data/penobscot"
mkdir -p "$data_dir"
./scripts/download_penobscot.sh "$data_dir"
```
Note that the specified download location must be configured with appropriate `write` permissions. On some Linux virtual machines, you may want to place the data into the `/mnt` or `/data` folder, in which case you need to make sure you have write access there.
To make things easier, we suggest you use your home directory, although you might run out of space there. If this happens on an [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/) you can resize the disk quite easily from the [Azure Portal](https://portal.azure.com) - please see the [Troubleshooting](#troubleshooting) section at the end of this README for [how to do this](#how-to-resize-data-science-virtual-machine-disk).
To prepare the data for the experiments (e.g. split into train/val/test), please run the following script (modifying arguments as desired):
```
python scripts/prepare_penobscot.py split_inline --data-dir="$HOME/data/penobscot" --val-ratio=.1 --test-ratio=.2
```
#### F3 Netherlands
To download the F3 Netherlands dataset for 2D experiments, please follow the data download instructions at
[this github repository](https://github.com/yalaudah/facies_classification_benchmark) (section Dataset).
Once you've downloaded the data set, make sure to create an empty `splits` directory, under the downloaded `data` directory; you can re-use the same data directory as the one for Penobscot dataset created earlier. This is where your training/test/validation splits will be saved.
```
cd data
mkdir splits
```
At this point, your `data` directory tree should look like this:
```
data
├── splits
├── test_once
│ ├── test1_labels.npy
│ ├── test1_seismic.npy
│ ├── test2_labels.npy
│ └── test2_seismic.npy
└── train
├── train_labels.npy
└── train_seismic.npy
```
To prepare the data for the experiments (e.g. split into train/val/test), please run the following script:
```
# For section-based experiments
python scripts/prepare_dutchf3.py split_train_val section --data-dir=/mnt/dutchf3
# For patch-based experiments
python scripts/prepare_dutchf3.py split_train_val patch --data-dir=/mnt/dutchf3 --stride=50 --patch=100
```
Refer to the script itself for more argument options.
### Run Examples
#### Notebooks
We provide example notebooks under `examples/interpretation/notebooks/` to demonstrate how to train seismic interpretation models and evaluate them on Penobscot and F3 datasets.
Make sure to run the notebooks in the conda environment we previously set up (`seismic-interpretation`). To register the conda environment in Jupyter, please run:
```
python -m ipykernel install --user --name seismic-interpretation
```
#### Experiments
We also provide scripts for a number of experiments we conducted using different segmentation approaches. These experiments are available under `experiments/interpretation`, and can be used as examples. Within each experiment start from the `train.sh` and `test.sh` scripts under the `local/` (single GPU) and `distributed/` (multiple GPUs) directories, which invoke the corresponding python scripts, `train.py` and `test.py`. Take a look at the experiment configurations (see Experiment Configuration Files section below) for experiment options and modify if necessary.
Please refer to individual experiment README files for more information.
- [Penobscot](experiments/interpretation/penobscot/README.md)
- [F3 Netherlands Patch](experiments/interpretation/dutchf3_patch/README.md)
- [F3 Netherlands Section](experiments/interpretation/dutchf3_section/README.md)
#### Configuration Files
We use the [YACS](https://github.com/rbgirshick/yacs) configuration library to manage configuration options for the experiments. There are three ways to pass arguments to the experiment scripts (e.g. `train.py` or `test.py`):
- __default.py__ - A project config file `default.py` is a one-stop reference point for all configurable options, and provides sensible defaults for all arguments. If no arguments are passed to `train.py` or `test.py` script (e.g. `python train.py`), the arguments are by default loaded from `default.py`. Please take a look at `default.py` to familiarize yourself with the experiment arguments the script you run uses.
- __yml config files__ - YAML configuration files under `configs/` are typically created one for each experiment. These are meant to be used for repeatable experiment runs and reproducible settings. Each configuration file only overrides the options that are changing in that experiment (e.g. options loaded from `default.py` during an experiment run will be overridden by arguments loaded from the yaml file). As an example, to use a yml configuration file with the training script, run:
```
python train.py --cfg "configs/hrnet.yaml"
```
- __command line__ - Finally, options can be passed in through the `options` argument, and those will override arguments loaded from the configuration file. We created CLIs for all our scripts (using the Python Fire library), so you can pass these options via command-line arguments, like so:
```
python train.py DATASET.ROOT "/mnt/dutchf3" TRAIN.END_EPOCH 10
```
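For illustration, here is a minimal sketch (not code from this repo) of how the three levels combine in YACS: defaults defined in code, overrides merged from a yml file, and final overrides merged from a list of command-line style options. The default values below are made up for the example; in the repo the defaults live in each experiment's `default.py`.
```python
# Minimal YACS precedence sketch: defaults -> yml file -> command-line options.
# The keys mirror ones used in this repo; the values here are illustrative only.
from yacs.config import CfgNode as CN

_C = CN()
_C.DATASET = CN()
_C.DATASET.ROOT = ""           # default (as in default.py)
_C.TRAIN = CN()
_C.TRAIN.END_EPOCH = 300       # hypothetical default

cfg = _C.clone()
# A yml file only needs to list the keys it overrides (and it may only use
# keys that already exist in the defaults):
cfg.merge_from_file("configs/hrnet.yaml")
# Command-line options override both the defaults and the yml file:
cfg.merge_from_list(["DATASET.ROOT", "/mnt/dutchf3", "TRAIN.END_EPOCH", "10"])
cfg.freeze()
print(cfg.DATASET.ROOT, cfg.TRAIN.END_EPOCH)  # -> /mnt/dutchf3 10
```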
### Pretrained Models
#### HRNet
To achieve the same results as the benchmarks above you will need to download the HRNet model [pretrained](https://github.com/HRNet/HRNet-Image-Classification) on ImageNet. We are specifically using the [HRNet-W48-C](https://1drv.ms/u/s!Aus8VCZ_C_33dKvqI6pBZlifgJk) pre-trained model; other HRNet variants are also available [here](https://github.com/HRNet/HRNet-Image-Classification) - you can navigate to those from the [main HRNet landing page](https://github.com/HRNet/HRNet-Object-Detection) for object detection.
Unfortunately the OneDrive location which is used to host the model uses a temporary authentication token, so there is no way for us to script the model download. There are two ways to obtain and use the pre-trained HRNet model on the DSVM:
- download the model to your local drive using a web browser of your choice and then upload it to the DSVM using something like `scp`; navigate to the Azure Portal and copy the DSVM's public IP from the Overview panel (you can search for your DSVM by name in the Portal's search bar), then use `scp local_model_location username@DS_VM_public_IP:./model/save/path` to upload the model
- alternatively, you can use the same public IP to open a remote desktop session over SSH to your Linux VM using [X2Go](https://wiki.x2go.org/doku.php/download:start); this way you can open a web browser on the VM and download the model directly to the VM's disk
### Viewers (optional)
For seismic interpretation (segmentation), if you want to visualize cross-sections of a 3D volume (both the input velocity model and the segmented output) you can use
[segyviewer](https://github.com/equinor/segyviewer). To install and use segyviewer, please follow the instructions below.
#### segyviewer
To install [segyviewer](https://github.com/equinor/segyviewer) run:
```bash
conda create -n segyviewer python=2.7
conda activate segyviewer
conda install -c anaconda pyqt=4.11.4
pip install segyviewer
```
To visualize cross-sections of a 3D volume, you can run
[segyviewer](https://github.com/equinor/segyviewer) like so:
```bash
segyviewer "${HOME}/data/dutchf3/data.segy"
```
### Benchmarks
#### Dense Labels
This section contains benchmarks of different algorithms for seismic interpretation on 3D seismic datasets with densely-annotated data.
Below are the results from the models contained in this repo. To run them, check the instructions in the `benchmarks` folder. Alternatively, take a look in the `examples` folder for how to run them on your own dataset.
#### Netherlands F3
| Source | Experiment | PA | FW IoU | MCA |
|------------------|-----------------------------------|-------------|--------------|------------|
| Alaudah et al.   | Section-based                     | 0.905       | 0.817        | 0.832      |
|                  | Patch-based                       | 0.852       | 0.743        | 0.689      |
| DeepSeismic      | Patch-based+fixed                 | 0.869       | 0.761        | 0.775      |
|                  | SEResNet UNet+section depth       | 0.917       | 0.849        | 0.834      |
|                  | HRNet(patch)+patch_depth          | 0.908       | 0.843        | 0.837      |
|                  | HRNet(patch)+section_depth        | 0.928       | 0.871        | 0.871      |
#### Penobscot
Trained and tested on the full dataset. Inlines with artefacts were left in for training, validation and testing.
The dataset was split 70% training, 10% validation and 20% test. The results below are from the test set.
| Source | Experiment | PA | IoU | MCA |
|------------------|-------------------------------------|-------------|--------------|------------|
| DeepSeismic      | SEResNet UNet + section depth       | 1.0         | 0.98         | 0.99       |
|                  | HRNet(patch) + section depth        | 1.0         | 0.97         | 0.98       |
![Best Penobscot SEResNet](assets/penobscot_seresnet_best.png "Best performing inlines, Mask and Predictions from SEResNet")
![Worst Penobscot SEResNet](assets/penobscot_seresnet_worst.png "Worst performing inlines Mask and Predictions from SEResNet")
#### Reproduce benchmarks
In order to reproduce the benchmarks you will need to navigate to the [experiments](experiments) folder, where each of the experiments
is split into a different folder. To run the Netherlands F3 experiment, navigate to the [dutchf3_patch/local](experiments/dutchf3_patch/local) folder. There you will find a training script ([train.sh](experiments/dutchf3_patch/local/train.sh))
which will run the training for any configuration you pass in. Once you have run the training, you will need to run the [test.sh](experiments/dutchf3_patch/local/test.sh) script. Make sure you specify
the path to the best performing model from your training run, either by passing it in as an argument or by altering the YACS config file.
To reproduce the benchmarks
for the Penobscot dataset, follow the same instructions but navigate to the [penobscot](penobscot) folder.
#### Scripts
- [parallel_training.sh](scripts/parallel_training.sh): Script to launch multiple jobs in parallel. Used mainly for local hyperparameter tuning. Look at the script for further instructions.
- [kill_windows.sh](scripts/kill_windows.sh): Script to kill multiple tmux windows. Used to kill jobs that parallel_training.sh might have started.
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
### Submitting a Pull Request
We try to keep the repo in a clean state, which means that we only enable read access to the repo - read access still enables one to submit a PR or an issue. To do so, fork the repo, and submit a PR from a branch in your forked repo into our staging branch.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## Build Status
| Build | Branch | Status |
| --- | --- | --- |
| **Legal Compliance** | staging | [![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.ComponentGovernance%20(seismic-deeplearning)?branchName=staging)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=124&branchName=staging) |
| **Legal Compliance** | master | [![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.ComponentGovernance%20(seismic-deeplearning)?branchName=master)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=124&branchName=master) |
| **Tests** | staging | [![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.Notebooks%20(seismic-deeplearning)?branchName=staging)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=125&branchName=staging) |
| **Tests** | master | [![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.Notebooks%20(seismic-deeplearning)?branchName=master)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=125&branchName=master) |
| **Notebook Tests** | staging | [![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.Tests%20(seismic-deeplearning)?branchName=staging)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=126&branchName=staging) |
| **Notebook Tests** | master | [![Build Status](https://dev.azure.com/best-practices/deepseismic/_apis/build/status/microsoft.Tests%20(seismic-deeplearning)?branchName=master)](https://dev.azure.com/best-practices/deepseismic/_build/latest?definitionId=126&branchName=master) |
# Troubleshooting
For Data Science Virtual Machine conda package installation issues, first locate the Anaconda installation on the DSVM, for example by running:
```bash
which python
```
A typical output will be:
```bash
someusername@somevm:/projects/DeepSeismic$ which python
/anaconda/envs/py35/bin/python
```
which indicates that the Anaconda folder is __/anaconda__. We'll refer to this location in the instructions below, but you should update the commands according to your local Anaconda folder.
<details>
<summary><b>Data Science Virtual Machine conda package installation errors</b></summary>
It could happen that you don't have sufficient permissions to run conda commands / install packages in an Anaconda packages directory. To remedy the situation, please run the following commands
```bash
rm -rf /anaconda/pkgs/*
sudo chown -R $(whoami) /anaconda
```
After these commands complete, try installing the packages again.
</details>
<details>
<summary><b>Data Science Virtual Machine conda package installation warnings</b></summary>
While creating the conda environment defined by environment/anaconda/local/environment.yml on an Ubuntu DSVM, you may get multiple warnings like so:
```
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /anaconda/pkgs/ipywidgets-7.5.1-py_0/site-packages/ipywidgets-7.5.1.dist-info/LICENSE. Please remove this file manually (you may need to reboot to free file handles)
```
If this happens, similarly to the instructions above, stop the conda environment creation (type ```Ctrl+C```) and then recursively change the ownership of the /anaconda directory from root to the current user by running this command:
```bash
sudo chown -R $USER /anaconda
```
After this command completes, try creating the conda environment defined in __environment/anaconda/local/environment.yml__ again.
</details>
<details>
<summary><b>Model training or scoring is not using GPU</b></summary>
To see if the GPU is being used while your model is being trained or used for inference, run
```bash
nvidia-smi
```
and confirm that you see your Python process using the GPU.
If not, you may want to try reverting to an older version of CUDA for use with PyTorch. After the environment has been set up, activate it with `conda activate seismic-interpretation` and then run the following command (by default we use CUDA 10):
```bash
conda install pytorch torchvision cudatoolkit=9.2 -c pytorch
```
To test whether this setup worked, open `ipython` right afterwards and execute the following code:
```python
import torch
torch.cuda.is_available()
```
The output should say "True".
If the output is still "False", you may want to try setting your environment variable to specify the device manually - to test this, start a new `ipython` session and type:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES']='0'
import torch
torch.cuda.is_available()
```
Output should say "True" this time. If it does, you can make the change permanent by adding
```bash
export CUDA_VISIBLE_DEVICES=0
```
to your `$HOME/.bashrc` file.
</details>
<details>
<summary><b>GPU out of memory errors</b></summary>
You should be able to see how much GPU memory your process is using by running
```bash
nvidia-smi
```
and seeing if this amount is close to the physical memory limit specified by the GPU manufacturer.
If you're getting close to the memory limit, you may want to lower the batch size in the model configuration file, specifically the `TRAIN.BATCH_SIZE_PER_GPU` and `VALIDATION.BATCH_SIZE_PER_GPU` settings.
</details>
<details>
<summary><b>How to resize Data Science Virtual Machine disk</b></summary>
1. Go to the [Azure Portal](https://portal.azure.com) and find your virtual machine by typing its name in the search bar at the very top of the page.
2. In the Overview panel on the left hand side, click the Stop button to stop the virtual machine.
3. Next, select Disks in the same panel on the left hand side.
4. Click the Name of the OS Disk - you'll be navigated to the Disk view. From this view, select Configuration on the left hand side and then increase Size in GB and hit the Save button.
5. Navigate back to the Virtual Machine view in Step 2 and click the Start button to start the virtual machine.
</details>

51
WORKERS Normal file

@ -0,0 +1,51 @@
AUTO_RESUME: False
CUDNN:
BENCHMARK: True
DETERMINISTIC: False
ENABLED: True
DATASET:
CLASS_WEIGHTS: [0.7151, 0.8811, 0.5156, 0.9346, 0.9683, 0.9852]
NUM_CLASSES: 6
ROOT:
GPUS: (0,)
LOG_CONFIG: logging.conf
LOG_DIR:
MODEL:
IN_CHANNELS: 1
NAME: patch_deconvnet
OUTPUT_DIR: output
PIN_MEMORY: True
PRINT_FREQ: 20
SEED: 42
TEST:
CROSSLINE: True
INLINE: True
MODEL_PATH:
SPLIT: Both
TEST_STRIDE: 10
TRAIN:
AUGMENTATION: True
AUGMENTATIONS:
PAD:
HEIGHT: 256
WIDTH: 256
RESIZE:
HEIGHT: 200
WIDTH: 200
BATCH_SIZE_PER_GPU: 32
BEGIN_EPOCH: 0
DEPTH: no
END_EPOCH: 484
MAX_LR: 0.01
MEAN: 0.0009997
MIN_LR: 0.001
MODEL_DIR: models
MOMENTUM: 0.9
PATCH_SIZE: 99
SNAPSHOTS: 5
STD: 0.20977
STRIDE: 50
WEIGHT_DECAY: 0.0001
VALIDATION:
BATCH_SIZE_PER_GPU: 32
WORKERS: 4

assets/DeepSeismicLogo.jpg: new binary file (151 KiB), not shown.

assets/penobscot_seresnet_best.png: new binary file (48 KiB), not shown.

assets/penobscot_seresnet_worst.png: new binary file (50 KiB), not shown.


@ -1,19 +0,0 @@
# Starter pipeline
# Start with a minimal pipeline that you can customize to build and deploy your code.
# Add steps that build, run tests, deploy, and more:
# https://aka.ms/yaml
trigger:
- master
pool:
vmImage: 'ubuntu-latest'
steps:
- script: echo Hello, world!
displayName: 'Run a one-line script'
- script: |
echo Add other tasks to build, test, and deploy your project.
echo See https://aka.ms/yaml
displayName: 'Run a multi-line script'

6
bin/ds

@ -1,6 +0,0 @@
#!/usr/bin/env python
from deepseismic import cli
if __name__ == "__main__":
cli.main()

64
cgmanifest.json Normal file

@ -0,0 +1,64 @@
{"Registrations":[
{
"component": {
"type": "git",
"git": {
"repositoryUrl": "https://github.com/olivesgatech/facies_classification_benchmark",
"commitHash": "12102683a1ae78f8fbc953823c35a43b151194b3"
}
},
"license": "MIT"
},
{
"component": {
"type": "git",
"git": {
"repositoryUrl": "https://github.com/waldeland/CNN-for-ASI",
"commitHash": "6f985cccecf9a811565d0b7cd919412569a22b7b"
}
},
"license": "MIT"
},
{
"component": {
"type": "git",
"git": {
"repositoryUrl": "https://github.com/opesci/devito",
"commitHash": "f6129286d9c0b3a8bfe07e724ac5b00dc762efee"
}
},
"license": "MIT"
},
{
"component": {
"type": "git",
"git": {
"repositoryUrl": "https://github.com/pytorch/ignite",
"commitHash": "38a4f37de759e33bc08441bde99bcb50f3d81f55"
}
},
"license": "BSD-3-Clause"
},
{
"component": {
"type": "git",
"git": {
"repositoryUrl": "https://github.com/HRNet/HRNet-Semantic-Segmentation",
"commitHash": "06142dc1c7026e256a7561c3e875b06622b5670f"
}
},
"license": "MIT"
},
{
"component": {
"type": "git",
"git": {
"repositoryUrl": "https://github.com/dask/dask",
"commitHash": "54019e9c05134585c9c40e4195206aa78e2ea61a"
}
},
"license": "IPL-1.0"
}
],
"Version": 1
}

8
contrib/README.md Normal file

@ -0,0 +1,8 @@
### Contrib folder
Code in this folder has not been tested and is meant for exploratory work only.
We encourage submissions to the contrib folder; once they are well tested, submit a pull request and work with the repository owners to graduate them to the main DeepSeismic repository.
Thank you.


@ -0,0 +1,6 @@
# Benchmarks
In this folder we show benchmarks using different algorithms. To facilitate the benchmark computation, we provide a set of wrapper functions that can be found in the file [benchmark_utils.py](benchmark_utils.py).
TODO


@ -0,0 +1,17 @@
First, make sure that `${HOME}/data/dutch_f3` folder exists and you have write access.
Next, to get the main input dataset which is the [Dutch F3 dataset](https://terranubis.com/datainfo/Netherlands-Offshore-F3-Block-Complete),
navigate to [MalenoV](https://github.com/bolgebrygg/MalenoV) project website and follow the links (which will lead to
[this](https://drive.google.com/drive/folders/0B7brcf-eGK8CbGhBdmZoUnhiTWs) download). Save this file as
`${HOME}/data/dutch_f3/data.segy`
To download the train and validation masks, from the root of the repo, run
```bash
./contrib/scripts/get_F3_voxel.sh ${HOME}/data/dutch_f3
```
This will also download train and validation masks to the same location as data.segy.
That's it!
To run the training script, run `python train.py --cfg=configs/texture_net.yaml`.


@ -0,0 +1,41 @@
# TextureNet configuration
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
GPUS: (0,)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 10
LOG_CONFIG: logging.conf
SEED: 2019
WINDOW_SIZE: 65
DATASET:
NUM_CLASSES: 2
ROOT: /home/maxkaz/data/dutchf3
FILENAME: data.segy
MODEL:
NAME: texture_net
IN_CHANNELS: 1
NUM_FILTERS: 50
TRAIN:
BATCH_SIZE_PER_GPU: 32
END_EPOCH: 5000
LR: 0.02
MOMENTUM: 0.9
WEIGHT_DECAY: 0.0001
DEPTH: "voxel" # Options are No, Patch, Section and Voxel
MODEL_DIR: "models"
VALIDATION:
BATCH_SIZE_PER_GPU: 32
TEST:
MODEL_PATH: ""
SPLIT: 'Both' # Can be Both, Test1, Test2


@ -0,0 +1,82 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# ------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from yacs.config import CfgNode as CN
_C = CN()
# Cudnn related params
_C.CUDNN = CN()
_C.CUDNN.BENCHMARK = True
_C.CUDNN.DETERMINISTIC = False
_C.CUDNN.ENABLED = True
_C.GPUS = (0,)
_C.OUTPUT_DIR = "output" # This will be the base directory for all output, such as logs and saved models
_C.LOG_DIR = "" # This will be a subdirectory inside OUTPUT_DIR
_C.WORKERS = 4
_C.PRINT_FREQ = 20
_C.LOG_CONFIG = "logging.conf"
_C.SEED = 42
# size of voxel cube: WINDOW_SIZE x WINDOW_SIZE x WINDOW_SIZE; used for 3D models only
_C.WINDOW_SIZE = 65
# DATASET related params
_C.DATASET = CN()
_C.DATASET.NUM_CLASSES = 2
_C.DATASET.ROOT = ""
_C.DATASET.FILENAME = "data.segy"
# common params for NETWORK
_C.MODEL = CN()
_C.MODEL.NAME = "texture_net"
_C.MODEL.IN_CHANNELS = 1
_C.MODEL.NUM_FILTERS = 50
_C.MODEL.EXTRA = CN(new_allowed=True)
# training
_C.TRAIN = CN()
_C.TRAIN.BATCH_SIZE_PER_GPU = 32
# number of batches per epoch
_C.TRAIN.BATCH_PER_EPOCH = 10
# total number of epochs
_C.TRAIN.END_EPOCH = 200
_C.TRAIN.LR = 0.01
_C.TRAIN.MOMENTUM = 0.9
_C.TRAIN.WEIGHT_DECAY = 0.0001
_C.TRAIN.DEPTH = "voxel" # Options are None, Patch and Section
_C.TRAIN.MODEL_DIR = "models" # This will be a subdirectory inside OUTPUT_DIR
# validation
_C.VALIDATION = CN()
_C.VALIDATION.BATCH_SIZE_PER_GPU = 32
# TEST
_C.TEST = CN()
_C.TEST.MODEL_PATH = ""
_C.TEST.SPLIT = "Both" # Can be Both, Test1, Test2
def update_config(cfg, options=None, config_file=None):
cfg.defrost()
if config_file:
cfg.merge_from_file(config_file)
if options:
cfg.merge_from_list(options)
cfg.freeze()
if __name__ == "__main__":
import sys
with open(sys.argv[1], "w") as f:
print(_C, file=f)


@ -0,0 +1,34 @@
[loggers]
keys=root,__main__,event_handlers
[handlers]
keys=consoleHandler
[formatters]
keys=simpleFormatter
[logger_root]
level=INFO
handlers=consoleHandler
[logger___main__]
level=INFO
handlers=consoleHandler
qualname=__main__
propagate=0
[logger_event_handlers]
level=INFO
handlers=consoleHandler
qualname=event_handlers
propagate=0
[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)
[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s


@ -0,0 +1,230 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# /* spell-checker: disable */
import logging
import logging.config
from os import path
import fire
import numpy as np
import torch
from torch.utils import data
from ignite.engine import Events
from ignite.handlers import ModelCheckpoint
from ignite.metrics import Loss
from ignite.utils import convert_tensor
from tqdm import tqdm
from deepseismic_interpretation.dutchf3.data import get_voxel_loader
from deepseismic_interpretation.models.texture_net import TextureNet
from cv_lib.utils import load_log_configuration
from cv_lib.event_handlers import (
SnapshotHandler,
logging_handlers,
tensorboard_handlers,
)
from cv_lib.event_handlers.logging_handlers import Evaluator
from cv_lib.event_handlers.tensorboard_handlers import create_summary_writer
from cv_lib.segmentation.metrics import (
pixelwise_accuracy,
class_accuracy,
mean_class_accuracy,
class_iou,
mean_iou,
)
from cv_lib.segmentation import extract_metric_from
# from cv_lib.segmentation.dutchf3.engine import (
# create_supervised_evaluator,
# create_supervised_trainer,
# )
# Use ignite generic versions for now
from ignite.engine import create_supervised_trainer, create_supervised_evaluator
from default import _C as config
from default import update_config
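# _prepare_batch below reshapes a voxel batch and its labels into the tensor shapes and
# types the loss expects, and moves them onto the GPU when training on one.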
def _prepare_batch(batch, device=None, non_blocking=False, t_type=torch.FloatTensor):
x, y = batch
new_x = convert_tensor(torch.squeeze(x, 1), device=device, non_blocking=non_blocking)
new_y = convert_tensor(torch.unsqueeze(y, 2), device=device, non_blocking=non_blocking)
if device == "cuda":
return (
new_x.type(t_type).cuda(),
torch.unsqueeze(new_y, 3).type(torch.LongTensor).cuda(),
)
else:
return new_x.type(t_type), torch.unsqueeze(new_y, 3).type(torch.LongTensor)
def run(*options, cfg=None):
"""Run training and validation of model
Notes:
Options can be passed in via the options argument and loaded from the cfg file
Options from default.py will be overridden by options loaded from cfg file
Options passed in via options argument will override option loaded from cfg file
Args:
*options (str, int, optional): Options used to override what is loaded from the
config. To see what options are available consult
default.py
cfg (str, optional): Location of config file to load. Defaults to None.
"""
update_config(config, options=options, config_file=cfg)
# Start logging
load_log_configuration(config.LOG_CONFIG)
logger = logging.getLogger(__name__)
logger.debug(config.WORKERS)
torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK
torch.manual_seed(config.SEED)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(config.SEED)
np.random.seed(seed=config.SEED)
# load the data
TrainVoxelLoader = get_voxel_loader(config)
train_set = TrainVoxelLoader(
config.DATASET.ROOT,
config.DATASET.FILENAME,
split="train",
window_size=config.WINDOW_SIZE,
len=config.TRAIN.BATCH_SIZE_PER_GPU * config.TRAIN.BATCH_PER_EPOCH,
batch_size=config.TRAIN.BATCH_SIZE_PER_GPU,
)
val_set = TrainVoxelLoader(
config.DATASET.ROOT,
config.DATASET.FILENAME,
split="val",
window_size=config.WINDOW_SIZE,
len=config.TRAIN.BATCH_SIZE_PER_GPU * config.TRAIN.BATCH_PER_EPOCH,
batch_size=config.TRAIN.BATCH_SIZE_PER_GPU,
)
n_classes = train_set.n_classes
# set dataset length to batch size to be consistent with 5000 iterations
# each of size 32 in the original Waldeland implementation
train_loader = data.DataLoader(
train_set, batch_size=config.TRAIN.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS, shuffle=False,
)
val_loader = data.DataLoader(
val_set, batch_size=config.VALIDATION.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS, shuffle=False,
)
# this is how we import model for CV - here we're importing a seismic
# segmentation model
model = TextureNet(n_classes=config.DATASET.NUM_CLASSES)
optimizer = torch.optim.Adam(
model.parameters(),
lr=config.TRAIN.LR,
# momentum=config.TRAIN.MOMENTUM,
weight_decay=config.TRAIN.WEIGHT_DECAY,
)
device = "cpu"
if torch.cuda.is_available():
device = "cuda"
model = model.cuda()
loss = torch.nn.CrossEntropyLoss()
trainer = create_supervised_trainer(model, optimizer, loss, prepare_batch=_prepare_batch, device=device)
desc = "ITERATION - loss: {:.2f}"
pbar = tqdm(initial=0, leave=False, total=len(train_loader), desc=desc.format(0))
# add model checkpointing
output_dir = path.join(config.OUTPUT_DIR, config.TRAIN.MODEL_DIR)
checkpoint_handler = ModelCheckpoint(
output_dir, "model", save_interval=1, n_saved=3, create_dir=True, require_empty=False,
)
criterion = torch.nn.CrossEntropyLoss(reduction="mean")
# save model at each epoch
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler, {config.MODEL.NAME: model})
def _select_pred_and_mask(model_out):
# receive a tuple of (x, y_pred), y
# so actually in line 51 of
# cv_lib/cv_lib/segmentation/dutch_f3/metrics/__init__.py
# we do the following line, so here we just select the model
# _, y_pred = torch.max(model_out[0].squeeze(), 1, keepdim=True)
y_pred = model_out[0].squeeze()
y = model_out[1].squeeze()
return (y_pred.squeeze(), y)
evaluator = create_supervised_evaluator(
model,
metrics={
"nll": Loss(criterion, device=device),
"pixa": pixelwise_accuracy(n_classes, output_transform=_select_pred_and_mask, device=device),
"cacc": class_accuracy(n_classes, output_transform=_select_pred_and_mask, device=device),
"mca": mean_class_accuracy(n_classes, output_transform=_select_pred_and_mask, device=device),
"ciou": class_iou(n_classes, output_transform=_select_pred_and_mask, device=device),
"mIoU": mean_iou(n_classes, output_transform=_select_pred_and_mask, device=device),
},
device=device,
prepare_batch=_prepare_batch,
)
# Set the validation run to start on the epoch completion of the training run
trainer.add_event_handler(Events.EPOCH_COMPLETED, Evaluator(evaluator, val_loader))
summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR))
evaluator.add_event_handler(
Events.EPOCH_COMPLETED,
logging_handlers.log_metrics(
"Validation results",
metrics_dict={
"mIoU": "Avg IoU :",
"nll": "Avg loss :",
"pixa": "Pixelwise Accuracy :",
"mca": "Mean Class Accuracy :",
},
),
)
evaluator.add_event_handler(
Events.EPOCH_COMPLETED,
tensorboard_handlers.log_metrics(
summary_writer,
trainer,
"epoch",
metrics_dict={"mIoU": "Validation/IoU", "nll": "Validation/Loss", "mca": "Validation/MCA",},
),
)
summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR))
snapshot_duration = 1
def snapshot_function():
return (trainer.state.iteration % snapshot_duration) == 0
checkpoint_handler = SnapshotHandler(
path.join(output_dir, config.TRAIN.MODEL_DIR),
config.MODEL.NAME,
extract_metric_from("mIoU"),
snapshot_function,
)
evaluator.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler, {"model": model})
logger.info("Starting training")
trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH // config.TRAIN.BATCH_PER_EPOCH)
pbar.close()
if __name__ == "__main__":
fire.Fire(run)


@ -0,0 +1,54 @@
# Voxel to Pixel approach to Seismic Interpretation
The code used in this approach is described in detail in the paper
<br />
**Convolutional Neural Networks for Automated Seismic Interpretation**,<br />
A. U. Waldeland, A. C. Jensen, L. Gelius and A. H. S. Solberg <br />
[*The Leading Edge, July 2018*](https://library.seg.org/doi/abs/10.1190/tle37070529.1)
There is also an
EAGE E-lecture which you can watch: [*Seismic interpretation with deep learning*](https://www.youtube.com/watch?v=lm85Ap4OstM) (YouTube)
### Setup to get started
- make sure you follow the `README.md` file in the root of the repo to install all the proper dependencies.
- downgrade TensorFlow and PyTorch's CUDA version:
- downgrade TensorFlow by running `pip install tensorflow-gpu==1.14`
- make sure PyTorch uses the downgraded CUDA by running `pip install torch==1.3.1+cu92 torchvision==0.4.2+cu92 -f https://download.pytorch.org/whl/torch_stable.html`
- download the data by running `contrib/scripts/get_F3_voxel.sh` from the `contrib` folder of this repo.
This will download the training and validation labels/masks.
- to get the main input dataset which is the [Dutch F3 dataset](https://terranubis.com/datainfo/Netherlands-Offshore-F3-Block-Complete),
navigate to [MalenoV](https://github.com/bolgebrygg/MalenoV) project website and follow the links (which will lead to
[this](https://drive.google.com/drive/folders/0B7brcf-eGK8CbGhBdmZoUnhiTWs) download). Save this file as
`interpretation/voxel2pixel/F3/data.segy`
If you want to revert downgraded packages, just run `conda env update -f environment/anaconda/local/environment.yml` from the root folder of the repo.
### Monitoring progress with TensorBoard
- from the `voxel2pixel` directory, run `tensorboard --logdir='log'` (all runtime logging information is
written to the `log` folder)<br />
- open a web-browser and go to localhost:6006<br />
More information can be found [here](https://www.tensorflow.org/get_started/summaries_and_tensorboard#launching_tensorboard).
### Usage
- `python train.py` will train the CNN and produce a model after a few hours on a decent gaming GPU
with at least 6GB of onboard memory<br />
- `python test_parallel.py` - Example of how the trained CNN can be applied to predict salt in a slice or
the full cube in distributed fashion on a single multi-GPU machine (single GPU mode is also supported).
In addition it shows how learned attributes can be extracted.<br />
### Files
In addition, it may be useful to have a look at these files<br/>
- texture_net.py - this is where the network is defined <br/>
- batch.py - provides functionality to generate training batches with random augmentation <br/>
- data.py - load/save data sets with segy-format and labeled slices as images <br/>
- tb_logger.py - connects to the tensorboard functionality <br/>
- utils.py - some help functions <br/>
- test_parallel.py - multi-GPU prediction script for scoring<br />
### Using a different data set and custom training labels
If you want to use a different data set, do the following:
- Make a new folder where you place the segy-file
- Make a folder for the training labels
- Save images of the slices you want to train on as 'SLICETYPE_SLICENO.png' (or jpg), where SLICETYPE is either 'inline', 'crossline', or 'timeslice' and SLICENO is the slice number.
- Draw the classes on top of the seismic data, using a simple image editing program with the class colors. Currently up to six classes are supported, indicated by the colors: red, blue, green, cyan, magenta and yellow.
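As an illustration of this convention (not part of the repo), the sketch below creates a label image for a hypothetical inline slice using numpy and Pillow; in practice you would draw the class colors on top of an exported image of the seismic slice, and the image size must match the slice dimensions.
```python
# Sketch: build a training-label image for inline 339 following the convention above.
# Unpainted (black) pixels match none of the class colors and are treated as unlabeled.
import numpy as np
from PIL import Image

height, width = 255, 701                       # hypothetical slice dimensions
label = np.zeros((height, width, 3), dtype=np.uint8)
label[40:80, 100:300] = [255, 0, 0]            # a red region marks one class
label[150:200, 350:600] = [0, 255, 0]          # a green region marks another class

Image.fromarray(label).save("inline_339.png")  # SLICETYPE_SLICENO.png naming
```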


@ -0,0 +1,351 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
import numpy as np
def get_random_batch(
data_cube,
label_coordinates,
im_size,
num_batch_size,
random_flip=False,
random_stretch=None,
random_rot_xy=None,
random_rot_z=None,
):
"""
Returns a batch of augmented samples with center pixels randomly drawn from label_coordinates
Args:
data_cube: 3D numpy array with floating point velocity values
label_coordinates: 3D coordinates of the labeled training slice
im_size: size of the 3D voxel which we're cutting out around each label_coordinate
num_batch_size: size of the batch
random_flip: bool to perform random voxel flip
random_stretch: bool to enable random stretch
random_rot_xy: bool to enable random rotation of the voxel around dim-0 and dim-1
random_rot_z: bool to enable random rotation around dim-2
Returns:
a tuple of a batch numpy array of data with dimensions
(batch, 1, im_size[0], im_size[1], im_size[2]) and the associated labels as an array
of size (batch).
"""
# Make 3 im_size elements
if isinstance(im_size, int):
im_size = [im_size, im_size, im_size]
# Output arrays
batch = np.zeros([num_batch_size, 1, im_size[0], im_size[1], im_size[2]])
ret_labels = np.zeros([num_batch_size])
class_keys = list(label_coordinates)
n_classes = len(class_keys)
# Loop through batch
n_for_class = 0
class_ind = 0
for i in range(num_batch_size):
# Start by getting a grid centered around (0,0,0)
grid = get_grid(im_size)
# Apply random flip
if random_flip:
grid = augment_flip(grid)
# Apply random rotations
if random_rot_xy:
grid = augment_rot_xy(grid, random_rot_xy)
if random_rot_z:
grid = augment_rot_z(grid, random_rot_z)
# Apply random stretch
if random_stretch:
grid = augment_stretch(grid, random_stretch)
# Pick random location from the label_coordinates for this class:
coords_for_class = label_coordinates[class_keys[class_ind]]
random_index = rand_int(0, coords_for_class.shape[1])
coord = coords_for_class[:, random_index : random_index + 1]
# Move grid to be centered around this location
grid += coord
# Interpolate samples at grid from the data:
sample = trilinear_interpolation(data_cube, grid)
# Insert in output arrays
ret_labels[i] = class_ind
batch[i, 0, :, :, :] = np.reshape(sample, (im_size[0], im_size[1], im_size[2]))
# We seek to have a balanced batch with equally many samples from each class.
n_for_class += 1
if n_for_class + 1 > int(0.5 + num_batch_size / float(n_classes)):
if class_ind < n_classes - 1:
class_ind += 1
n_for_class = 0
return batch, ret_labels
def get_grid(im_size):
"""
get_grid returns z,x,y coordinates centered around (0,0,0)
Args:
im_size: size of window
Returns
numpy int array with size: 3 x im_size**3
"""
win0 = np.linspace(-im_size[0] // 2, im_size[0] // 2, im_size[0])
win1 = np.linspace(-im_size[1] // 2, im_size[1] // 2, im_size[1])
win2 = np.linspace(-im_size[2] // 2, im_size[2] // 2, im_size[2])
x0, x1, x2 = np.meshgrid(win0, win1, win2, indexing="ij")
ex0 = np.expand_dims(x0.ravel(), 0)
ex1 = np.expand_dims(x1.ravel(), 0)
ex2 = np.expand_dims(x2.ravel(), 0)
grid = np.concatenate((ex0, ex1, ex2), axis=0)
return grid
def augment_flip(grid):
"""
Random flip of non-depth axes.
Args:
grid: 3D coordinates of the voxel
Returns:
flipped grid coordinates
"""
# Flip x axis
if rand_bool():
grid[1, :] = -grid[1, :]
# Flip y axis
if rand_bool():
grid[2, :] = -grid[2, :]
return grid
def augment_stretch(grid, stretch_factor):
"""
Random stretch/scale
Args:
grid: 3D coordinate grid of the voxel
stretch_factor: this is actually a boolean which triggers stretching
TODO: change this to just call the function and not do -1,1 in rand_float
Returns:
stretched grid coordinates
"""
stretch = rand_float(-stretch_factor, stretch_factor)
grid *= 1 + stretch
return grid
def augment_rot_xy(grid, random_rot_xy):
"""
Random rotation
Args:
grid: coordinate grid list of 3D points
random_rot_xy: this is actually a boolean which triggers rotation
TODO: change this to just call the function and not do -1,1 in rand_float
Returns:
randomly rotated grid
"""
theta = np.deg2rad(rand_float(-random_rot_xy, random_rot_xy))
x = grid[2, :] * np.cos(theta) - grid[1, :] * np.sin(theta)
y = grid[2, :] * np.sin(theta) + grid[1, :] * np.cos(theta)
grid[1, :] = x
grid[2, :] = y
return grid
def augment_rot_z(grid, random_rot_z):
"""
Random tilt around z-axis (dim-2)
Args:
grid: coordinate grid list of 3D points
random_rot_z: this is actually a boolean which triggers rotation
TODO: change this to just call the function and not do -1,1 in rand_float
Returns:
randomly tilted coordinate grid
"""
theta = np.deg2rad(rand_float(-random_rot_z, random_rot_z))
z = grid[0, :] * np.cos(theta) - grid[1, :] * np.sin(theta)
x = grid[0, :] * np.sin(theta) + grid[1, :] * np.cos(theta)
grid[0, :] = z
grid[1, :] = x
return grid
def trilinear_interpolation(input_array, indices):
"""
Linear interpolation
code taken from
http://stackoverflow.com/questions/6427276/3d-interpolation-of-numpy-arrays-without-scipy
Args:
input_array: 3D data array
indices: 3D grid coordinates
Returns:
interpolated input array
"""
x_indices, y_indices, z_indices = indices[0:3]
n0, n1, n2 = input_array.shape
x0 = x_indices.astype(np.integer)
y0 = y_indices.astype(np.integer)
z0 = z_indices.astype(np.integer)
x1 = x0 + 1
y1 = y0 + 1
z1 = z0 + 1
# put all samples outside datacube to 0
inds_out_of_range = (
(x0 < 0)
| (x1 < 0)
| (y0 < 0)
| (y1 < 0)
| (z0 < 0)
| (z1 < 0)
| (x0 >= n0)
| (x1 >= n0)
| (y0 >= n1)
| (y1 >= n1)
| (z0 >= n2)
| (z1 >= n2)
)
x0[inds_out_of_range] = 0
y0[inds_out_of_range] = 0
z0[inds_out_of_range] = 0
x1[inds_out_of_range] = 0
y1[inds_out_of_range] = 0
z1[inds_out_of_range] = 0
x = x_indices - x0
y = y_indices - y0
z = z_indices - z0
output = (
input_array[x0, y0, z0] * (1 - x) * (1 - y) * (1 - z)
+ input_array[x1, y0, z0] * x * (1 - y) * (1 - z)
+ input_array[x0, y1, z0] * (1 - x) * y * (1 - z)
+ input_array[x0, y0, z1] * (1 - x) * (1 - y) * z
+ input_array[x1, y0, z1] * x * (1 - y) * z
+ input_array[x0, y1, z1] * (1 - x) * y * z
+ input_array[x1, y1, z0] * x * y * (1 - z)
+ input_array[x1, y1, z1] * x * y * z
)
output[inds_out_of_range] = 0
return output
def rand_float(low, high):
"""
Generate random floating point number between two limits
Args:
low: low limit
high: high limit
Returns:
single random floating point number
"""
return (high - low) * np.random.random_sample() + low
def rand_int(low, high):
"""
Generate random integer between two limits
Args:
low: low limit
high: high limit
Returns:
random integer between two limits
"""
return np.random.randint(low, high)
def rand_bool():
"""
Generate random boolean.
Returns:
Random boolean
"""
return bool(np.random.randint(0, 2))
"""
TODO: the following is not needed and should be added as tests later.
# Test the batch-functions
if __name__ == "__main__":
from data import read_segy, read_labels, get_slice
import tb_logger
import numpy as np
import os
data, data_info = read_segy(os.path.join("F3", "data.segy"))
train_coordinates = {"1": np.expand_dims(np.array([50, 50, 50]), 1)}
logger = tb_logger.TBLogger("log", "batch test")
[batch, labels] = get_random_batch(data, train_coordinates, 65, 32)
logger.log_images("normal", batch)
[batch, labels] = get_random_batch(
data, train_coordinates, 65, 32, random_flip=True
)
logger.log_images("flipping", batch)
[batch, labels] = get_random_batch(
data, train_coordinates, 65, 32, random_stretch=0.50
)
logger.log_images("stretching", batch)
[batch, labels] = get_random_batch(
data, train_coordinates, 65, 32, random_rot_xy=180
)
logger.log_images("rot", batch)
[batch, labels] = get_random_batch(
data, train_coordinates, 65, 32, random_rot_z=15
)
logger.log_images("dip", batch)
train_cls_imgs, train_coordinates = read_labels(
os.path.join("F3", "train"), data_info
)
[batch, labels] = get_random_batch(data, train_coordinates, 65, 32)
logger.log_images("salt", batch[:16, :, :, :, :])
logger.log_images("not salt", batch[16:, :, :, :, :])
logger.log_images("data", data[:, :, 50])
"""


@ -0,0 +1,326 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
from __future__ import print_function
from os.path import isfile, join
import segyio
from os import listdir
import numpy as np
import scipy.misc
def read_segy(filename):
"""
Read in a SEGY-format file given a filename
Args:
filename: input filename
Returns:
numpy data array and its info as a dictionary (tuple)
"""
print("Loading data cube from", filename, "with:")
# Read full data cube
data = segyio.tools.cube(filename)
# Put temporal axis first
data = np.moveaxis(data, -1, 0)
# Make data cube fast to access
data = np.ascontiguousarray(data, "float32")
# Read meta data
segyfile = segyio.open(filename, "r")
print(" Crosslines: ", segyfile.xlines[0], ":", segyfile.xlines[-1])
print(" Inlines: ", segyfile.ilines[0], ":", segyfile.ilines[-1])
print(" Timeslices: ", "1", ":", data.shape[0])
# Make dict with cube-info
# TODO: read this from segy
# Read dt and other params needed to do create a new
data_info = {
"crossline_start": segyfile.xlines[0],
"inline_start": segyfile.ilines[0],
"timeslice_start": 1,
"shape": data.shape,
}
return data, data_info
def write_segy(out_filename, in_filename, out_cube):
"""
Writes out_cube to a segy-file (out_filename) with same header/size as in_filename
Args:
out_filename:
in_filename:
out_cube:
Returns:
"""
# Select last channel
if type(out_cube) is list:
out_cube = out_cube[-1]
print("Writing interpretation to " + out_filename)
# Copy segy file
from shutil import copyfile
copyfile(in_filename, out_filename)
# Moving temporal axis back again
out_cube = np.moveaxis(out_cube, 0, -1)
# Open out-file
with segyio.open(out_filename, "r+") as src:
iline_start = src.ilines[0]
dtype = src.iline[iline_start].dtype
# loop through inlines and insert output
for i in src.ilines:
iline = out_cube[i - iline_start, :, :]
src.iline[i] = np.ascontiguousarray(iline.astype(dtype))
# TODO: rewrite this whole function
# Moving temporal axis first again - just in case the user wants to keep working on it
out_cube = np.moveaxis(out_cube, -1, 0)
print("Writing interpretation - Finished")
return
# Alternative writings for slice-type
inline_alias = ["inline", "in-line", "iline", "y"]
crossline_alias = ["crossline", "cross-line", "xline", "x"]
timeslice_alias = ["timeslice", "time-slice", "t", "z", "depthslice", "depth"]
def read_labels(fname, data_info):
"""
Read labels from an image.
Args:
fname: filename of labelling mask (image)
data_info: dictionary describing the data
Returns:
list of labels and list of coordinates
"""
label_imgs = []
label_coordinates = {}
# Find image files in folder
tmp = fname.split("/")[-1].split("_")
slice_type = tmp[0].lower()
tmp = tmp[1].split(".")
slice_no = int(tmp[0])
if slice_type not in inline_alias + crossline_alias + timeslice_alias:
print(
"File:", fname, "could not be loaded.", "Unknown slice type",
)
return None
if slice_type in inline_alias:
slice_type = "inline"
if slice_type in crossline_alias:
slice_type = "crossline"
if slice_type in timeslice_alias:
slice_type = "timeslice"
# Read file
print("Loading labels for", slice_type, slice_no, "with")
img = scipy.misc.imread(fname)
img = interpolate_to_fit_data(img, slice_type, slice_no, data_info)
label_img = parse_labels_in_image(img)
# Get coordinates for slice
coords = get_coordinates_for_slice(slice_type, slice_no, data_info)
# Loop through labels in label_img and append to label_coordinates
for cls in np.unique(label_img):
if cls > -1:
if str(cls) not in label_coordinates.keys():
label_coordinates[str(cls)] = np.array(np.zeros([3, 0]))
inds_with_cls = label_img == cls
cords_with_cls = coords[:, inds_with_cls.ravel()]
label_coordinates[str(cls)] = np.concatenate((label_coordinates[str(cls)], cords_with_cls), 1)
print(
" ", str(np.sum(inds_with_cls)), "labels for class", str(cls),
)
if len(np.unique(label_img)) == 1:
print(" ", 0, "labels", str(cls))
# Add label_img to output
label_imgs.append([label_img, slice_type, slice_no])
return label_imgs, label_coordinates
# Add colors to this table to make it possible to have more classes
class_color_coding = [
[0, 0, 255], # blue
[0, 255, 0], # green
[0, 255, 255], # cyan
[255, 0, 0], # red
[255, 0, 255], # magenta
[255, 255, 0], # yellow
]
def parse_labels_in_image(img):
"""
Convert RGB image to class img.
Args:
img: 3-channel image array
Returns:
monotonically increasing class labels
"""
label_img = np.int16(img[:, :, 0]) * 0 - 1 # -1 = no class
# decompose color channels
r = img[:, :, 0]
g = img[:, :, 1]
b = img[:, :, 2]
# Scale RGB by the alpha channel (if present) so fully transparent pixels match no class
if img.shape[2] == 4:
a = img[:, :, 3] / 255
r = r * a
g = g * a
b = b * a
tolerance = 1
# Go through classes and find pixels with this class
cls = 0
for color in class_color_coding:
# Find pixels with these labels
inds = (
(np.abs(r - color[0]) < tolerance) & (np.abs(g - color[1]) < tolerance) & (np.abs(b - color[2]) < tolerance)
)
label_img[inds] = cls
cls += 1
return label_img
def interpolate_to_fit_data(img, slice_type, slice_no, data_info):
"""
Function to resize image if needed
Args:
img: image array
slice_type: inline, crossline or timeslice slice type
slice_no: slice number
data_info: data info dictionary extracted from the SEG-Y file
Returns:
resized image array
"""
# Get wanted output size
if slice_type == "inline":
n0 = data_info["shape"][0]
n1 = data_info["shape"][2]
elif slice_type == "crossline":
n0 = data_info["shape"][0]
n1 = data_info["shape"][1]
elif slice_type == "timeslice":
n0 = data_info["shape"][1]
n1 = data_info["shape"][2]
return scipy.misc.imresize(img, (n0, n1), interp="nearest")
def get_coordinates_for_slice(slice_type, slice_no, data_info):
"""
Get coordinates for slice in the full cube
Args:
slice_type: type of slice, e.g. inline, crossline, etc
slice_no: slice number
data_info: data dictionary array
Returns:
index coordinates of the voxel
"""
ds = data_info["shape"]
# Coordinates for cube
x0, x1, x2 = np.meshgrid(
np.linspace(0, ds[0] - 1, ds[0]),
np.linspace(0, ds[1] - 1, ds[1]),
np.linspace(0, ds[2] - 1, ds[2]),
indexing="ij",
)
if slice_type == "inline":
start = data_info["inline_start"]
slice_no = slice_no - start
x0 = x0[:, slice_no, :]
x1 = x1[:, slice_no, :]
x2 = x2[:, slice_no, :]
elif slice_type == "crossline":
start = data_info["crossline_start"]
slice_no = slice_no - start
x0 = x0[:, :, slice_no]
x1 = x1[:, :, slice_no]
x2 = x2[:, :, slice_no]
elif slice_type == "timeslice":
start = data_info["timeslice_start"]
slice_no = slice_no - start
x0 = x0[slice_no, :, :]
x1 = x1[slice_no, :, :]
x2 = x2[slice_no, :, :]
# Collect indexes
x0 = np.expand_dims(x0.ravel(), 0)
x1 = np.expand_dims(x1.ravel(), 0)
x2 = np.expand_dims(x2.ravel(), 0)
coords = np.concatenate((x0, x1, x2), axis=0)
return coords
def get_slice(data, data_info, slice_type, slice_no, window=0):
"""
Return data-slice
Args:
data: input 3D voxel numpy array
data_info: data info dictionary
slice_type: type of slice, like inline, crossline, etc
slice_no: slice number
window: window size around center pixel
Returns:
2D slice of the voxel as a numpy array
"""
if slice_type == "inline":
start = data_info["inline_start"]
elif slice_type == "crossline":
start = data_info["crossline_start"]
elif slice_type == "timeslice":
start = data_info["timeslice_start"]
slice_no = slice_no - start
slice = data[:, slice_no - window : slice_no + window + 1, :]
return np.squeeze(slice)


@ -0,0 +1,181 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
from __future__ import print_function
from os.path import join
# TODO: make this nicer and remove the bare except for PEP8 compliance
try:
import tensorflow as tf
except:
print("Tensorflow could not be imported, therefore tensorboard cannot be used.")
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np
import torch
import datetime
# TODO: it looks like the majority of the methods of this class are static and as such they should be in utils
class TBLogger(object):
"""
TensorBoard logger class
"""
def __init__(self, log_dir, folder_name=""):
self.log_dir = join(log_dir, folder_name + " " + datetime.datetime.now().strftime("%I%M%p, %B %d, %Y"),)
self.log_dir = self.log_dir.replace("//", "/")
self.writer = tf.summary.FileWriter(self.log_dir)
def log_scalar(self, tag, value, step=0):
"""
Add scalar
Args:
tag: tag
value: simple_value
step: step
"""
summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
self.writer.add_summary(summary, step)
# TODO: this should probably be a static method - take care of this when re-writing the whole thing
def make_list_of_2d_array(self, im):
"""
Convert a 2D/3D/4D image array into a list of 2D arrays
Args:
im: image array (or list of images)
Returns:
list of 2D image arrays
"""
if isinstance(im, list):
return im
ims = []
if len(im.shape) == 2:
ims.append(im)
elif len(im.shape) == 3:
for i in range(im.shape[0]):
ims.append(np.squeeze(im[i, :, :]))
elif len(im.shape) == 4:
for i in range(im.shape[0]):
ims.append(np.squeeze(im[i, 0, :, :]))
return ims
def log_images(self, tag, images, step=0, dim=2, max_imgs=50, cm="jet"):
"""
Log images to TensorBoard
Args:
tag: image tag
images: list of images
step: training step
dim: image shape (3 for voxel)
max_imgs: max number of images
cm: colormap
"""
# Make sure images are in numpy format in case the input is a Torch variable
images = self.convert_to_numpy(images)
if len(images.shape) > 2:
dim = 3
# Make list of images
if dim == 2:
images = self.make_list_of_2d_array(images)
# If 3D we make one list for each slice-type
if dim == 3:
new_images_ts, new_images_il, new_images_cl = self.get_slices_from_3d(images)
self.log_images(tag + "_timeslice", new_images_ts, step, 2, max_imgs)
self.log_images(tag + "_inline", new_images_il, step, 2, max_imgs)
self.log_images(tag + "_crossline", new_images_cl, step, 2, max_imgs)
return
im_summaries = []
for nr, img in enumerate(images):
# Grayscale
if cm == "gray" or cm == "grey":
img = img.astype("float")
img = np.repeat(np.expand_dims(img, 2), 3, 2)
img -= img.min()
img /= img.max()
img *= 255
img = img.astype("uint8")
# Write the image to a string
s = BytesIO()
plt.imsave(s, img, format="png")
# Create an Image object
img_sum = tf.Summary.Image(encoded_image_string=s.getvalue(), height=img.shape[0], width=img.shape[1],)
# Create a Summary value
im_summaries.append(tf.Summary.Value(tag="%s/%d" % (tag, nr), image=img_sum))
# if nr == max_imgs-1:
# break
# Create and write Summary
summary = tf.Summary(value=im_summaries)
self.writer.add_summary(summary, step)
# TODO: probably another static method
def get_slices_from_3d(self, img):
"""
Cuts out middle slices from image
Args:
img: image array
"""
new_images_ts = []
new_images_il = []
new_images_cl = []
if len(img.shape) == 3:
new_images_ts.append(np.squeeze(img[img.shape[0] // 2, :, :]))
new_images_il.append(np.squeeze(img[:, img.shape[1] // 2, :]))
new_images_cl.append(np.squeeze(img[:, :, img.shape[2] // 2]))
elif len(img.shape) == 4:
for i in range(img.shape[0]):
new_images_ts.append(np.squeeze(img[i, img.shape[1] // 2, :, :]))
new_images_il.append(np.squeeze(img[i, :, img.shape[2] // 2, :]))
new_images_cl.append(np.squeeze(img[i, :, :, img.shape[3] // 2]))
elif len(img.shape) == 5:
for i in range(img.shape[0]):
new_images_ts.append(np.squeeze(img[i, 0, img.shape[2] // 2, :, :]))
new_images_il.append(np.squeeze(img[i, 0, :, img.shape[3] // 2, :]))
new_images_cl.append(np.squeeze(img[i, 0, :, :, img.shape[4] // 2]))
return new_images_ts, new_images_il, new_images_cl
# TODO: another static method most likely
def convert_to_numpy(self, im):
"""
Convert torch to numpy
Args:
im: image array
"""
if type(im) == torch.autograd.Variable:
# Put on CPU
im = im.cpu()
# Get np-data
im = im.data.numpy()
return im


@ -0,0 +1,426 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
from __future__ import print_function
import os
# set default number of GPUs which are discoverable
N_GPU = 4
DEVICE_IDS = list(range(N_GPU))
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join([str(x) for x in DEVICE_IDS])
# static parameters
RESOLUTION = 1
# these match how the model is trained
N_CLASSES = 2
IM_SIZE = 65
import random
import argparse
import json
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
from torch.utils.data import Dataset, DataLoader
import torch.distributed as dist
if torch.cuda.is_available():
device_str = os.environ["CUDA_VISIBLE_DEVICES"]
device = torch.device("cuda:" + device_str)
else:
raise Exception("No GPU detected for parallel scoring!")
# ability to perform multiprocessing
import multiprocessing
from os.path import join
from data import read_segy, get_slice
from texture_net import TextureNet
import itertools
import numpy as np
import tb_logger
from data import write_segy
# graphical progress bar
from tqdm import tqdm
class ModelWrapper(nn.Module):
"""
Wrap TextureNet for (Distributed)DataParallel to invoke classify method
"""
def __init__(self, texture_model):
super(ModelWrapper, self).__init__()
self.texture_model = texture_model
def forward(self, input_net):
return self.texture_model.classify(input_net)
class MyDataset(Dataset):
def __init__(self, data, window, coord_list):
# main array
self.data = data
self.coord_list = coord_list
self.window = window
self.len = len(coord_list)
def __getitem__(self, index):
# TODO: can we specify a pixel mathematically by index?
pixel = self.coord_list[index]
x, y, z = pixel
# TODO: current bottleneck - can we slice out voxels any faster
small_cube = self.data[
x - self.window : x + self.window + 1,
y - self.window : y + self.window + 1,
z - self.window : z + self.window + 1,
]
return small_cube[np.newaxis, :, :, :], pixel
def __len__(self):
return self.len
def main_worker(gpu, ngpus_per_node, args):
"""
Main worker function, given the gpu parameter and how many GPUs there are per node
it can figure out its rank
:param gpu: rank of the process; the GPU used is gpu modulo ngpus_per_node, so ranks wrap around the available GPUs.
:param ngpus_per_node: total number of GPU available on this node.
:param args: various arguments for the code in the worker.
:return: nothing
"""
print("I got GPU", gpu)
args.rank = gpu
# loop around in round-robin fashion if we want to run multiple processes per GPU
args.gpu = gpu % ngpus_per_node
# initialize the distributed process and join the group
print(
"setting rank", args.rank, "world size", args.world_size, args.dist_backend, args.dist_url,
)
dist.init_process_group(
backend=args.dist_backend, init_method=args.dist_url, world_size=args.world_size, rank=args.rank,
)
# set default GPU device for this worker
torch.cuda.set_device(args.gpu)
# set up device for the rest of the code
local_device = torch.device("cuda:" + str(args.gpu))
# Load trained model (run train.py to create the trained model file)
network = TextureNet(n_classes=N_CLASSES)
model_state_dict = torch.load(join(args.data, "saved_model.pt"), map_location=local_device)
network.load_state_dict(model_state_dict)
network.eval()
network.cuda(args.gpu)
# set the scoring wrapper also to eval mode
model = ModelWrapper(network)
model.eval()
model.cuda(args.gpu)
# When using a single GPU per process and per
# DistributedDataParallel, we need to divide the batch size
# ourselves based on the total number of GPUs we have.
# Min batch size is 1
args.batch_size = max(int(args.batch_size / ngpus_per_node), 1)
# obsolete: number of data loading workers - this is only used when reading from disk, which we're not
# args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
# wrap the model for distributed use - for scoring this is not needed
# model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
# set to benchmark mode because we're running the same workload multiple times
cudnn.benchmark = True
# Read 3D cube
# NOTE: we cannot pass this data manually as serialization of data into each python process is costly,
# so each worker has to load the data on its own.
data, data_info = read_segy(join(args.data, "data.segy"))
# Get half window size
window = IM_SIZE // 2
# reduce data size for debugging
if args.debug:
data = data[0 : 3 * window]
# generate full list of coordinates
# memory footprint of this isn't large yet, so no need to wrap it in a generator
nx, ny, nz = data.shape
x_list = range(window, nx - window)
y_list = range(window, ny - window)
z_list = range(window, nz - window)
print("-- generating coord list --")
# TODO: is there any way to use a generator with pyTorch data loader?
coord_list = list(itertools.product(x_list, y_list, z_list))
# we need to map the data manually to each rank - DistributedDataParallel doesn't do this at score time
print("take a subset of coord_list by chunk")
coord_list = list(np.array_split(np.array(coord_list), args.world_size)[args.rank])
coord_list = [tuple(x) for x in coord_list]
# we only score first batch in debug mode
if args.debug:
coord_list = coord_list[0 : args.batch_size]
# prepare the data
print("setup dataset")
# TODO: RuntimeError: cannot pin 'torch.cuda.FloatTensor' only dense CPU tensors can be pinned
data_torch = torch.cuda.FloatTensor(data).cuda(args.gpu, non_blocking=True)
dataset = MyDataset(data_torch, window, coord_list)
# not sampling like in training
# datasampler = DistributedSampler(dataset)
# just set some default epoch
# datasampler.set_epoch(1)
# we use 0 workers because we're reading from memory
print("setting up loader")
my_loader = DataLoader(
dataset=dataset,
batch_size=args.batch_size,
shuffle=False,
num_workers=0,
pin_memory=False,
sampler=None
# sampler=datasampler
)
print("running loop")
pixels_x = []
pixels_y = []
pixels_z = []
predictions = []
# Loop through center pixels in output cube
with torch.no_grad():
print("no grad")
for (chunk, pixel) in tqdm(my_loader):
data_input = chunk.cuda(args.gpu, non_blocking=True)
output = model(data_input)
# save and deal with it later on CPU
# we want to make sure order is preserved
pixels_x += pixel[0].tolist()
pixels_y += pixel[1].tolist()
pixels_z += pixel[2].tolist()
predictions += output.tolist()
# just score a single batch in debug mode
if args.debug:
break
# TODO: legacy Queue Manager code from multiprocessing which we left here for illustration purposes
# result_queue.append([deepcopy(coord_list), deepcopy(predictions)])
# result_queue.append([coord_list, predictions])
# transform pixels into x, y, z list format
with open("results_{}.json".format(args.rank), "w") as f:
json.dump(
{
"pixels_x": pixels_x,
"pixels_y": pixels_y,
"pixels_z": pixels_z,
"preds": [int(x[0][0][0][0]) for x in predictions],
},
f,
)
# TODO: we cannot use pickle to dump from multiprocess - processes lock up
# with open("result_predictions_{}.pkl".format(args.rank), "wb") as f:
# print ("dumping predictions pickle file")
# pickle.dump(predictions, f)
parser = argparse.ArgumentParser(description="Seismic Distributed Scoring")
parser.add_argument("-d", "--data", default="/home/maxkaz/data/dutchf3", type=str, help="default dataset folder name")
parser.add_argument(
"-s",
"--slice",
default="inline",
type=str,
choices=["inline", "crossline", "timeslice", "full"],
help="slice type which we want to score on",
)
parser.add_argument(
"-n", "--slice-num", default=339, type=int, help="slice number which we want to score",
)
parser.add_argument(
"-b", "--batch-size", default=2 ** 11, type=int, help="batch size which we use for scoring",
)
parser.add_argument(
"-p", "--n-proc-per-gpu", default=1, type=int, help="number of multiple processes to run per each GPU",
)
parser.add_argument(
"--dist-url", default="tcp://127.0.0.1:12345", type=str, help="url used to set up distributed training",
)
parser.add_argument("--dist-backend", default="nccl", type=str, help="distributed backend")
parser.add_argument("--seed", default=0, type=int, help="default random number seed")
parser.add_argument(
"--debug", action="store_true", help="debug flag - if on we will only process one batch",
)
def main():
# use distributed scoring
if RESOLUTION != 1:
raise Exception("Currently we only support pixel-level scoring")
args = parser.parse_args()
args.gpu = None
args.rank = 0
# world size is the total number of processes we want to run across all nodes and GPUs
args.world_size = N_GPU * args.n_proc_per_gpu
if args.debug:
args.batch_size = 4
# fix away any kind of randomness - although for scoring it should not matter
random.seed(args.seed)
torch.manual_seed(args.seed)
cudnn.deterministic = True
print("RESOLUTION {}".format(RESOLUTION))
##########################################################################
print("-- scoring on GPU --")
ngpus_per_node = torch.cuda.device_count()
print("nGPUs per node", ngpus_per_node)
"""
First, read this: https://thelaziestprogrammer.com/python/a-multiprocessing-pool-pickle
OK, so there are a few ways in which we can spawn a running process with pyTorch:
1) Default mp.spawn should work just fine but won't let us access internals
2) So we copied out the code from mp.spawn below to control how processes get created
3) One could spawn their own processes but that would not be thread-safe with CUDA, line
"mp = multiprocessing.get_context('spawn')" guarantees we use the proper pyTorch context
Input data serialization is too costly; in general, so is output data serialization, as noted here:
https://docs.python.org/3/library/multiprocessing.html
Feeding data into each process is too costly, so each process loads its own data.
For getting results back out of each worker process, we tried (and failed) to use:
1) Multiprocessing queue manager
manager = Manager()
return_dict = manager.dict()
OR
result_queue = multiprocessing.Queue()
CALLING
with Manager() as manager:
results_list = manager.list()
mp.spawn(main_worker, nprocs=args.world_size, args=(ngpus_per_node, results_list/dict/queue, args))
results = deepcopy(results_list)
2) pickling results to disc.
Turns out that for the reasons mentioned in the first article both approaches are too costly.
The only reasonable way to deserialize data from a Python process is to write it to text, in which case
writing to JSON is a saner approach: https://www.datacamp.com/community/tutorials/pickle-python-tutorial
"""
# invoke processes manually suppressing error queue
mp = multiprocessing.get_context("spawn")
# error_queues = []
processes = []
for i in range(args.world_size):
# error_queue = mp.SimpleQueue()
process = mp.Process(target=main_worker, args=(i, ngpus_per_node, args), daemon=False)
process.start()
# error_queues.append(error_queue)
processes.append(process)
# block on wait
for process in processes:
process.join()
print("-- aggregating results --")
# Read 3D cube
data, data_info = read_segy(join(args.data, "data.segy"))
# Log to tensorboard - input slice
logger = tb_logger.TBLogger("log", "Test")
logger.log_images(
args.slice + "_" + str(args.slice_num), get_slice(data, data_info, args.slice, args.slice_num), cm="gray",
)
x_coords = []
y_coords = []
z_coords = []
predictions = []
for i in range(args.world_size):
with open("results_{}.json".format(i), "r") as f:
results_dict = json.load(f)
x_coords += results_dict["pixels_x"]
y_coords += results_dict["pixels_y"]
z_coords += results_dict["pixels_z"]
predictions += results_dict["preds"]
"""
Because of Python's GIL, having multiple workers write to the same array is not efficient: the only way to get
shared memory is threading, but with the GIL only one thread executes at a time, so we pay the overhead of
managing multiple threads while the writes still happen sequentially.
A much faster alternative is to just invoke the underlying compiled (C) code through numpy array indexing.
So basically instead of the following:
NUM_CORES = multiprocessing.cpu_count()
print("Post-processing will run on {} CPU cores on your machine.".format(NUM_CORES))
def worker(classified_cube, coord):
x, y, z = coord
ind = new_coord_list.index(coord)
# print (coord, ind)
pred_class = predictions[ind]
classified_cube[x, y, z] = pred_class
# launch workers in parallel with memory sharing ("threading" backend)
_ = Parallel(n_jobs=4*NUM_CORES, backend="threading")(
delayed(worker)(classified_cube, coord) for coord in tqdm(pixels)
)
We do this:
"""
# placeholder for results
classified_cube = np.zeros(data.shape)
# store final results
classified_cube[x_coords, y_coords, z_coords] = predictions
print("-- writing segy --")
in_file = join(args.data, "data.segy")
out_file = join(args.data, "salt_{}.segy".format(RESOLUTION))
write_segy(out_file, in_file, classified_cube)
print("-- logging prediction --")
# log prediction to tensorboard
logger = tb_logger.TBLogger("log", "Test_scored")
logger.log_images(
args.slice + "_" + str(args.slice_num),
get_slice(classified_cube, data_info, args.slice, args.slice_num),
cm="binary",
)
if __name__ == "__main__":
main()


@ -0,0 +1,157 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
import torch
from torch import nn
from utils import gpu_no_of_var
class TextureNet(nn.Module):
def __init__(self, n_classes=2, n_filters=50):
super(TextureNet, self).__init__()
# Network definition
# Parameters #in_channels, #out_channels, filter_size, stride (downsampling factor)
self.net = nn.Sequential(
nn.Conv3d(1, n_filters, 5, 4, padding=2),
nn.BatchNorm3d(n_filters),
# nn.Dropout3d()  # Dropout can be added like this ...
nn.ReLU(),
nn.Conv3d(n_filters, n_filters, 3, 2, padding=1, bias=False),
nn.BatchNorm3d(n_filters),
nn.ReLU(),
nn.Conv3d(n_filters, n_filters, 3, 2, padding=1, bias=False),
nn.BatchNorm3d(n_filters),
nn.ReLU(),
nn.Conv3d(n_filters, n_filters, 3, 2, padding=1, bias=False),
nn.BatchNorm3d(n_filters),
nn.ReLU(),
nn.Conv3d(n_filters, n_filters, 3, 3, padding=1, bias=False),
nn.BatchNorm3d(n_filters),
nn.ReLU(),
nn.Conv3d(
n_filters, n_classes, 1, 1
), # This is the equivalent of a fully connected layer since input has width/height/depth = 1
nn.ReLU(),
)
# The filter weights are initialized at random by default
def forward(self, x):
"""
Is called to compute network output
Args:
x: network input - torch tensor
Returns:
output from the neural network
"""
return self.net(x)
def classify(self, x):
"""
Classification wrapper
Args:
x: input tensor for classification
Returns:
classification result
"""
x = self.net(x)
_, class_no = torch.max(x, 1, keepdim=True)
return class_no
# Functions to get output from intermediate feature layers
def f1(self, x):
"""
Wrapper to obtain a particular network layer
Args:
x: input tensor for classification
Returns:
requested layer
"""
return self.getFeatures(x, 0)
def f2(self, x):
"""
Wrapper to obtain a particular network layer
Args:
x: input tensor for classification
Returns:
requested layer
"""
return self.getFeatures(x, 1)
def f3(self, x):
"""
Wrapper to obtain a particular network layer
Args:
x: input tensor for classification
Returns:
requested layer
"""
return self.getFeatures(x, 2)
def f4(self, x):
"""
Wrapper to obtain a particular network layer
Args:
x: input tensor for classification
Returns:
requested layer
"""
return self.getFeatures(x, 3)
def f5(self, x):
"""
Wrapper to obtain a particular network layer
Args:
x: input tensor for classification
Returns:
requested layer
"""
return self.getFeatures(x, 4)
def getFeatures(self, x, layer_no):
"""
Main call method to call the wrapped layers
Args:
x: input tensor for classification
layer_no: number of hidden layer we want to extract
Returns:
requested layer
"""
layer_indexes = [0, 3, 6, 9, 12]
# Make new network that has the layers up to the requested output
tmp_net = nn.Sequential()
layers = list(self.net.children())[0 : layer_indexes[layer_no] + 1]
for i in range(len(layers)):
tmp_net.add_module(str(i), layers[i])
if type(gpu_no_of_var(self)) == int:
tmp_net.cuda(gpu_no_of_var(self))
return tmp_net(x)
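
As a quick orientation (an editor's sketch, not part of the original file): with the kernel sizes and strides above, a 65x65x65 input is reduced to a 1x1x1 spatial output before the final 1x1x1 convolution, which is why that layer behaves like a fully connected layer. A hypothetical smoke test:
```
import torch
from texture_net import TextureNet  # assumes this file is importable as texture_net

net = TextureNet(n_classes=2)
net.eval()  # eval mode so BatchNorm uses running stats for a single example
x = torch.randn(1, 1, 65, 65, 65)  # one single-channel 65^3 mini-cube
with torch.no_grad():
    print(net(x).shape)           # torch.Size([1, 2, 1, 1, 1])
    print(net.classify(x).shape)  # torch.Size([1, 1, 1, 1, 1])
```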


@ -0,0 +1,136 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
from __future__ import print_function
from os.path import join
import torch
from torch import nn
from data import read_segy, read_labels, get_slice
from batch import get_random_batch
from torch.autograd import Variable
from texture_net import TextureNet
import tb_logger
import utils
# Parameters
ROOT_PATH = "/home/maxkaz/data/dutchf3"
INPUT_VOXEL = "data.segy"
TRAIN_MASK = "inline_339.png"
VAL_MASK = "inline_405.png"
IM_SIZE = 65
# If you have a GPU with little memory, try reducing this to 16 (may degrade results)
BATCH_SIZE = 32
# Switch to toggle the use of GPU or not
USE_GPU = True
# Log progress on tensor board
LOG_TENSORBOARD = True
# the rest of the code
if LOG_TENSORBOARD:
logger = tb_logger.TBLogger("log", "Train")
# This is the network definition proposed in the paper
network = TextureNet(n_classes=2)
# Loss function - Softmax function is included
cross_entropy = nn.CrossEntropyLoss()
# Optimizer to control step size in gradient descent
optimizer = torch.optim.Adam(network.parameters())
# Transfer model to gpu
if USE_GPU and torch.cuda.is_available():
network = network.cuda()
# Load the data cube and labels
data, data_info = read_segy(join(ROOT_PATH, INPUT_VOXEL))
train_class_imgs, train_coordinates = read_labels(join(ROOT_PATH, TRAIN_MASK), data_info)
val_class_imgs, _ = read_labels(join(ROOT_PATH, VAL_MASK), data_info)
# Plot training/validation data with labels
if LOG_TENSORBOARD:
for class_img in train_class_imgs + val_class_imgs:
logger.log_images(
class_img[1] + "_" + str(class_img[2]), get_slice(data, data_info, class_img[1], class_img[2]), cm="gray",
)
logger.log_images(
class_img[1] + "_" + str(class_img[2]) + "_true_class", class_img[0],
)
# Training loop
for i in range(5000):
# Get random training batch with augmentation
# This is the bottleneck for training and could be done more efficiently on the GPU...
[batch, labels] = get_random_batch(
data,
train_coordinates,
IM_SIZE,
BATCH_SIZE,
random_flip=True,
random_stretch=0.2,
random_rot_xy=180,
random_rot_z=15,
)
# Format data to torch-variable
batch = Variable(torch.Tensor(batch).float())
labels = Variable(torch.Tensor(labels).long())
# Transfer data to gpu
if USE_GPU and torch.cuda.is_available():
batch = batch.cuda()
labels = labels.cuda()
# Set network to training phase
network.train()
# Run the samples through the network
output = network(batch)
# Compute loss
loss = cross_entropy(torch.squeeze(output), labels)
# Reset gradients accumulated from the previous iteration
optimizer.zero_grad()
# Do back-propagation to get gradients of weights w.r.t. loss
loss.backward()
# Ask the optimizer to adjust the parameters in the direction of lower loss
optimizer.step()
# Every 10th iteration - print training loss
if i % 10 == 0:
network.eval()
# Log training loss/acc
print("Iteration:", i, "Training loss:", utils.var_to_np(loss))
if LOG_TENSORBOARD:
logger.log_scalar("training_loss", utils.var_to_np(loss), i)
for k, v in utils.compute_accuracy(torch.argmax(output, 1), labels).items():
if LOG_TENSORBOARD:
logger.log_scalar("training_" + k, v, i)
print(" -", k, v, "%")
# every 100th iteration
if i % 100 == 0 and LOG_TENSORBOARD:
network.eval()
# Output predicted train/validation class/probability images
for class_img in train_class_imgs + val_class_imgs:
slice = class_img[1]
slice_no = class_img[2]
class_img = utils.interpret(
network.classify, data, data_info, slice, slice_no, IM_SIZE, 16, return_full_size=True, use_gpu=USE_GPU,
)
logger.log_images(slice + "_" + str(slice_no) + "_pred_class", class_img[0], step=i)
class_img = utils.interpret(
network, data, data_info, slice, slice_no, IM_SIZE, 16, return_full_size=True, use_gpu=USE_GPU,
)
logger.log_images(slice + "_" + str(slice_no) + "_pred_prob", class_img[0], i)
# Store trained network
torch.save(network.state_dict(), join(ROOT_PATH, "saved_model.pt"))


@ -0,0 +1,337 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
# code modified from https://github.com/waldeland/CNN-for-ASI
from __future__ import print_function
import torch
import numpy as np
from torch.autograd import Variable
from scipy.interpolate import interpn
import sys
import time
# global parameters
ST = 0
LAST_UPDATE = 0
def interpret(
network, data, data_info, slice, slice_no, im_size, subsampl, return_full_size=True, use_gpu=True,
):
"""
Down-samples a slice from the classified image and upsamples to full resolution if needed. Basically,
given a full 3D-classified voxel at a particular resolution (say we classify every n-th pixel, as given by the
subsampl variable below), we take a particular slice from the voxel and optionally blow it up to full resolution
as if we had classified every single pixel.
Args:
network: pytorch model definition
data: input voxel
data_info: input voxel information
slice: slice type which we want to interpret
slice_no: slice number
im_size: size of the mini-cube window around each center pixel that is fed to the network
subsampl: at what resolution do we want to subsample, e.g. we move across every subsampl pixels
return_full_size: boolean flag, enable if you want to return full size without downsampling
use_gpu: boolean flag to use the GPU
Returns:
upsampled slice
"""
# Wrap np.linspace in compact function call
ls = lambda N: np.linspace(0, N - 1, N, dtype="int")
# Size of cube
N0, N1, N2 = data.shape
# Coords for full cube
x0_range = ls(N0)
x1_range = ls(N1)
x2_range = ls(N2)
# Coords for subsampled cube
pred_points = (x0_range[::subsampl], x1_range[::subsampl], x2_range[::subsampl])
# Select slice
if slice == "full":
class_cube = data[::subsampl, ::subsampl, ::subsampl] * 0
elif slice == "inline":
slice_no = slice_no - data_info["inline_start"]
class_cube = data[::subsampl, 0:1, ::subsampl] * 0
x1_range = np.array([slice_no])
pred_points = (pred_points[0], pred_points[2])
elif slice == "crossline":
slice_no = slice_no - data_info["crossline_start"]
class_cube = data[::subsampl, ::subsampl, 0:1,] * 0
x2_range = np.array([slice_no])
pred_points = (pred_points[0], pred_points[1])
elif slice == "timeslice":
slice_no = slice_no - data_info["timeslice_start"]
class_cube = data[0:1, ::subsampl, ::subsampl] * 0
x0_range = np.array([slice_no])
pred_points = (pred_points[1], pred_points[2])
# Grid for small class slice/cube
n0, n1, n2 = class_cube.shape
x0_grid, x1_grid, x2_grid = np.meshgrid(ls(n0,), ls(n1), ls(n2), indexing="ij")
# Grid for full slice/cube
X0_grid, X1_grid, X2_grid = np.meshgrid(x0_range, x1_range, x2_range, indexing="ij")
# Indexes for large cube at small cube pixels
X0_grid_sub = X0_grid[::subsampl, ::subsampl, ::subsampl]
X1_grid_sub = X1_grid[::subsampl, ::subsampl, ::subsampl]
X2_grid_sub = X2_grid[::subsampl, ::subsampl, ::subsampl]
# Get half window size
w = im_size // 2
# Loop through center pixels in output cube
for i in range(X0_grid_sub.size):
# Get coordinates in small and large cube
x0 = x0_grid.ravel()[i]
x1 = x1_grid.ravel()[i]
x2 = x2_grid.ravel()[i]
X0 = X0_grid_sub.ravel()[i]
X1 = X1_grid_sub.ravel()[i]
X2 = X2_grid_sub.ravel()[i]
# Only compute when a full 65x65x65 cube can be extracted around center pixel
if X0 > w and X1 > w and X2 > w and X0 < N0 - w + 1 and X1 < N1 - w + 1 and X2 < N2 - w + 1:
# Get mini-cube around center pixel
mini_cube = data[X0 - w : X0 + w + 1, X1 - w : X1 + w + 1, X2 - w : X2 + w + 1]
# Get predicted "probabilities"
mini_cube = Variable(torch.FloatTensor(mini_cube[np.newaxis, np.newaxis, :, :, :]))
if use_gpu:
mini_cube = mini_cube.cuda()
out = network(mini_cube)
out = out.data.cpu().numpy()
out = out[:, :, out.shape[2] // 2, out.shape[3] // 2, out.shape[4] // 2]
out = np.squeeze(out)
# Make one output per output channel
if not isinstance(class_cube, list):
class_cube = np.split(np.repeat(class_cube[:, :, :, np.newaxis], out.size, 3), out.size, axis=3,)
# Insert into output
if out.size == 1:
class_cube[0][x0, x1, x2] = out
else:
for i in range(out.size):
class_cube[i][x0, x1, x2] = out[i]
# Keep user informed about progress
if slice == "full":
print_progress_bar(i, x0_grid.size)
# Resize to input size
if return_full_size:
if slice == "full":
print("Interpolating down sampled results to fit input cube")
N = X0_grid.size
# Output grid
if slice == "full":
grid_output_cube = np.concatenate(
[X0_grid.reshape([N, 1]), X1_grid.reshape([N, 1]), X2_grid.reshape([N, 1]),], 1,
)
elif slice == "inline":
grid_output_cube = np.concatenate([X0_grid.reshape([N, 1]), X2_grid.reshape([N, 1])], 1)
elif slice == "crossline":
grid_output_cube = np.concatenate([X0_grid.reshape([N, 1]), X1_grid.reshape([N, 1])], 1)
elif slice == "timeslice":
grid_output_cube = np.concatenate([X1_grid.reshape([N, 1]), X2_grid.reshape([N, 1])], 1)
# Interpolation
for i in range(len(class_cube)):
is_int = (
np.sum(
np.unique(class_cube[i]).astype("float") - np.unique(class_cube[i]).astype("int32").astype("float")
)
== 0
)
class_cube[i] = interpn(
pred_points,
class_cube[i].astype("float").squeeze(),
grid_output_cube,
method="linear",
fill_value=0,
bounds_error=False,
)
class_cube[i] = class_cube[i].reshape([x0_range.size, x1_range.size, x2_range.size])
# If output is class labels we convert the interpolated array to ints
if is_int:
class_cube[i] = class_cube[i].astype("int32")
if slice == "full":
print("Finished interpolating")
# Squeeze outputs
for i in range(len(class_cube)):
class_cube[i] = class_cube[i].squeeze()
return class_cube
# TODO: this should probably be replaced with TQDM
def print_progress_bar(iteration, total, prefix="", suffix="", decimals=1, length=100, fill="="):
"""
Provides a progress bar implementation.
Adapted from https://stackoverflow.com/questions/3173320/text-progress-bar-in-the-console/14879561#14879561
Args:
iteration: iteration number
total: total number of iterations
prefix: comment prefix in display
suffix: comment suffix in display
decimals: how many decimals to display
length: character length of progress bar
fill: character to display as progress bar
"""
global ST, LAST_UPDATE
# Expect iteration to go from 0 to N-1
iteration = iteration + 1
# Only update every 5 seconds
if time.time() - LAST_UPDATE < 5:
if iteration == total:
time.sleep(1)
else:
return
if iteration <= 1:
ST = time.time()
exp_h = ""
exp_m = ""
exp_s = ""
elif iteration == total:
exp_time = time.time() - ST
exp_h = int(exp_time / 3600)
exp_m = int(exp_time / 60 - exp_h * 60.0)
exp_s = int(exp_time - exp_m * 60.0 - exp_h * 3600.0)
else:
exp_time = (time.time() - ST) / (iteration - 1) * total - (time.time() - ST)
exp_h = int(exp_time / 3600)
exp_m = int(exp_time / 60 - exp_h * 60.0)
exp_s = int(exp_time - exp_m * 60.0 - exp_h * 3600.0)
percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
filled_length = int(length * iteration // total)
bar = fill * filled_length + "-" * (length - filled_length)
if iteration != total:
print("\r%s |%s| %s%% %s - %sh %smin %ss left" % (prefix, bar, percent, suffix, exp_h, exp_m, exp_s))
else:
print("\r%s |%s| %s%% %s - %sh %smin %ss " % (prefix, bar, percent, suffix, exp_h, exp_m, exp_s))
sys.stdout.write("\033[F")
# Print New Line on Complete
if iteration == total:
print("")
LAST_UPDATE = time.time()
# TODO: rewrite this whole function to get rid of excepts
# TODO: also not sure what this function is for - it's almost as if it's not needed - try to remove it.
def gpu_no_of_var(var):
"""
Returns the GPU device index that a torch tensor or module is on, or False if it is on the CPU
Args:
var: torch tensor or module
Returns:
The CUDA device index, or False if the tensor/module is not on a GPU
"""
try:
is_cuda = next(var.parameters()).is_cuda
except:
is_cuda = var.is_cuda
if is_cuda:
try:
return next(var.parameters()).get_device()
except:
return var.get_device()
else:
return False
# TODO: remove all the try except statements
def var_to_np(var):
"""
Take a pyTorch tensor and convert it to numpy array of the same shape, as the name suggests.
Args:
var: input variable
Returns:
numpy array of the tensor
"""
if type(var) in [np.array, np.ndarray]:
return var
# If input is list we do this for all elements
if type(var) == type([]):
out = []
for v in var:
out.append(var_to_np(v))
return out
try:
var = var.cpu()
except:
pass
try:
var = var.data
except:
pass
try:
var = var.numpy()
except:
pass
if type(var) == tuple:
var = var[0]
return var
def compute_accuracy(predicted_class, labels):
"""
Computes per-class accuracy and the average accuracy across classes
Args:
predicted_class: pyTorch tensor with predictions
labels: pyTorch tensor with ground truth labels
Returns:
Accuracy calculation as a dictionary per class and average class accuracy across classes
"""
labels = var_to_np(labels)
predicted_class = var_to_np(predicted_class)
accuracies = {}
for cls in np.unique(labels):
if cls >= 0:
accuracies["accuracy_class_" + str(cls)] = int(np.mean(predicted_class[labels == cls] == cls) * 100)
accuracies["average_class_accuracy"] = np.mean([acc for acc in accuracies.values()])
return accuracies
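
For illustration (an editor's sketch, not part of the original file), the dictionary returned by compute_accuracy looks like this on a tiny hand-made example:
```
import numpy as np
from utils import compute_accuracy  # assumes this file is importable as utils

preds = np.array([0, 0, 1, 1])   # predicted classes
labels = np.array([0, 1, 1, 1])  # ground truth
print(compute_accuracy(preds, labels))
# -> accuracy_class_0: 100, accuracy_class_1: 66, average_class_accuracy: 83.0
```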


@ -0,0 +1,60 @@
# DeepSeismic
## Imaging
This tutorial shows how to run [devito](https://www.devitoproject.org/) tutorial [notebooks](https://github.com/opesci/devito/tree/master/examples/seismic/tutorials) in Azure Machine Learning ([Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/)) using [Azure Machine Learning Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup).
For the best experience, use a Linux (Ubuntu) Azure [DSVM](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro) and run the notebooks in Jupyter Notebook with the AzureML Python SDK and the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) (see the __Setting up Environment__ section below).
Devito is a domain-specific language (DSL) and code generation framework for the design of highly optimized finite difference kernels via symbolic computation, for use in inversion methods. Here we show how ```devito``` can be used in the cloud by leveraging the AzureML experimentation framework as a transparent and scalable platform for generic computation workloads. We focus on full waveform inversion (__FWI__) problems, where non-linear data-fitting procedures are applied to compute estimates of subsurface properties from seismic data.
### Setting up Environment
The [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html) that encapsulates all the dependencies needed to run the notebooks described above can be created using the fwi_dev_conda_environment.yml file. See [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/NBSETUP.md) for generic instructions on how to install and run the AzureML Python SDK in Jupyter notebooks.
To create the conda environment, run:
```
conda env create -f fwi_dev_conda_environment.yml
```
Then, one can see the created environment in the list of available environments and export it as a .yml file:
```
conda env list
conda env export --name fwi_dev_conda_environment -f ./contrib/fwi/azureml_devito/fwi_dev_conda_environment_exported.yml
```
The created conda environment needs to be activated, followed by the installation of its corresponding IPython kernel:
```
conda activate fwi_dev_conda_environment
python -m ipykernel install --user --name fwi_dev_conda_environment --display-name "fwi_dev_conda_environment Python"
```
Finally, start Jupyter notebook from within the activated environment:
```
jupyter notebook
```
One can then choose the __fwi_dev_conda_environment Python__ kernel defined above either when a notebook is opened for the first time, or by using the "Kernel/Change kernel" notebook menu.
[Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) is also used to create an ACR in notebook 000_Setup_GeophysicsTutorial_FWI_Azure_devito, and then push and pull docker images. One can also create the ACR via Azure [portal](https://azure.microsoft.com/).
### Run devito in Azure
The devito fwi examples are run in AzureML using 4 notebooks:
- ```000_Setup_GeophysicsTutorial_FWI_Azure_devito.ipynb```: sets up Azure resources (like resource groups, AzureML [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace), Azure (docker) [container registry](https://azure.microsoft.com/en-us/services/container-registry/)).
- ```010_CreateExperimentationDockerImage_GeophysicsTutorial_FWI_Azure_devito.ipynb```: Creates a custom docker file and the associated image that contains ```devito``` [github repository](https://github.com/opesci/devito.git) (including devito fwi tutorial [notebooks](https://github.com/opesci/devito/tree/master/examples/seismic/tutorials)) and runs the official devito install [tests](https://github.com/opesci/devito/tree/master/tests).
- ```020_UseAzureMLEstimatorForExperimentation_GeophysicsTutorial_FWI_Azure_devito.ipynb```: shows how the devito fwi tutorial [notebooks](https://github.com/opesci/devito/tree/master/examples/seismic/tutorials) can be run in AzureML using Azure Machine Learning [generic](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py) [estimators](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) with custom docker images. FWI computation takes place on a managed AzureML [remote compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets).
```Devito``` fwi computation artifacts (images and notebooks with data processing output results) are tracked under the AzureML workspace, and can be later downloaded and visualized.
Two ways of running devito code are shown:
(1) using __custom code__ (slightly modified graphing functions that save images to files). The AzureML experimentation job is defined by the devito code packaged as a py file. The experimentation job (defined by the [azureml.core.experiment.Experiment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment.experiment?view=azure-ml-py) class) can be used to track metrics or other artifacts (images) that are then available in the Azure portal.
(2) using [__papermill__](https://github.com/nteract/papermill) invoked via its Python API to run unedited devito demo notebooks (including the [dask](https://dask.org/) local cluster [example](https://github.com/opesci/devito/blob/master/examples/seismic/tutorials/04_dask.ipynb)) on the remote compute target; the results are saved as notebooks that are available in the Azure portal.
- ```030_ScaleJobsUsingAzuremL_GeophysicsTutorial_FWI_Azure_devito.ipynb```: shows how the devito fwi tutorial notebooks can be run in parallel on the elastically allocated AzureML [remote compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) created before. By submitting multiple jobs via azureml.core.Experiment.submit(azureml.train.estimator.Estimator), one can use the [portal](https://portal.azure.com) to visualize the elastic allocation of AzureML [remote compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) nodes (a minimal submission sketch follows this list).
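
For orientation, below is a minimal job-submission sketch (not taken from the notebooks) using the AzureML SDK; the compute target name, source directory and entry script are placeholder assumptions, and the notebooks additionally configure a custom docker image with ```devito``` pre-installed rather than installing it via pip:
```
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget
from azureml.train.estimator import Estimator

ws = Workspace.from_config()  # assumes a workspace config file created by notebook 000
compute_target = ComputeTarget(workspace=ws, name="gpuclusterfwi")  # placeholder cluster name

est = Estimator(
    source_directory="./src",       # placeholder folder holding the script below
    entry_script="fwi_run.py",      # hypothetical devito FWI driver script
    compute_target=compute_target,
    pip_packages=["devito"],        # the notebooks bake devito into a custom docker image instead
)

run = Experiment(workspace=ws, name="fwi-devito-demo").submit(est)
run.wait_for_completion(show_output=True)
```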


@ -0,0 +1,17 @@
name: fwi_dev_conda_environment
channels:
- anaconda
dependencies:
- python=3.7
- numpy
- notebook
- ipykernel #nb_conda
- scikit-learn
- pip
- pip:
- python-dotenv
- papermill[azure]
- azureml-sdk[notebooks,automl,explain]==1.0.76
- docker


@ -0,0 +1,211 @@
name: fwi_dev_conda_environment
channels:
- anaconda
- defaults
dependencies:
- attrs=19.3.0=py_0
- backcall=0.1.0=py37_0
- blas=1.0=mkl
- bleach=3.1.0=py37_0
- ca-certificates=2019.11.27=0
- certifi=2019.11.28=py37_0
- decorator=4.4.1=py_0
- defusedxml=0.6.0=py_0
- entrypoints=0.3=py37_0
- gmp=6.1.2=hb3b607b_0
- importlib_metadata=1.1.0=py37_0
- intel-openmp=2019.5=281
- ipykernel=5.1.3=py37h39e3cac_0
- ipython=7.10.1=py37h39e3cac_0
- ipython_genutils=0.2.0=py37_0
- jedi=0.15.1=py37_0
- jinja2=2.10.3=py_0
- joblib=0.14.0=py_0
- jsonschema=3.2.0=py37_0
- jupyter_client=5.3.4=py37_0
- jupyter_core=4.6.1=py37_0
- libedit=3.1.20181209=hc058e9b_0
- libffi=3.2.1=h4deb6c0_3
- libgcc-ng=9.1.0=hdf63c60_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=9.1.0=hdf63c60_0
- markupsafe=1.1.1=py37h7b6447c_0
- mistune=0.8.4=py37h7b6447c_0
- mkl=2019.5=281
- mkl-service=2.3.0=py37he904b0f_0
- mkl_fft=1.0.15=py37ha843d7b_0
- mkl_random=1.1.0=py37hd6b4f25_0
- more-itertools=7.2.0=py37_0
- nbconvert=5.6.1=py37_0
- nbformat=4.4.0=py37_0
- ncurses=6.1=he6710b0_1
- notebook=6.0.2=py37_0
- openssl=1.1.1=h7b6447c_0
- pandoc=2.2.3.2=0
- pandocfilters=1.4.2=py37_1
- parso=0.5.1=py_0
- pexpect=4.7.0=py37_0
- pickleshare=0.7.5=py37_0
- pip=19.3.1=py37_0
- prometheus_client=0.7.1=py_0
- prompt_toolkit=3.0.2=py_0
- ptyprocess=0.6.0=py37_0
- pygments=2.5.2=py_0
- pyrsistent=0.15.6=py37h7b6447c_0
- python=3.7.5=h0371630_0
- python-dateutil=2.8.1=py_0
- pyzmq=18.1.0=py37he6710b0_0
- readline=7.0=h7b6447c_5
- send2trash=1.5.0=py37_0
- setuptools=42.0.2=py37_0
- six=1.13.0=py37_0
- sqlite=3.30.1=h7b6447c_0
- terminado=0.8.3=py37_0
- testpath=0.4.4=py_0
- tk=8.6.8=hbc83047_0
- tornado=6.0.3=py37h7b6447c_0
- traitlets=4.3.3=py37_0
- wcwidth=0.1.7=py37_0
- webencodings=0.5.1=py37_1
- xz=5.2.4=h14c3975_4
- zeromq=4.3.1=he6710b0_3
- zipp=0.6.0=py_0
- zlib=1.2.11=h7b6447c_3
- pip:
- adal==1.2.2
- ansiwrap==0.8.4
- applicationinsights==0.11.9
- azure-common==1.1.23
- azure-core==1.1.1
- azure-datalake-store==0.0.48
- azure-graphrbac==0.61.1
- azure-mgmt-authorization==0.60.0
- azure-mgmt-containerregistry==2.8.0
- azure-mgmt-keyvault==2.0.0
- azure-mgmt-resource==7.0.0
- azure-mgmt-storage==7.0.0
- azure-storage-blob==12.1.0
- azureml-automl-core==1.0.76
- azureml-automl-runtime==1.0.76.1
- azureml-contrib-notebook==1.0.76
- azureml-core==1.0.76
- azureml-dataprep==1.1.33
- azureml-dataprep-native==13.1.0
- azureml-defaults==1.0.76
- azureml-explain-model==1.0.76
- azureml-interpret==1.0.76
- azureml-model-management-sdk==1.0.1b6.post1
- azureml-pipeline==1.0.76
- azureml-pipeline-core==1.0.76
- azureml-pipeline-steps==1.0.76
- azureml-sdk==1.0.76
- azureml-telemetry==1.0.76
- azureml-train==1.0.76
- azureml-train-automl==1.0.76
- azureml-train-automl-client==1.0.76
- azureml-train-automl-runtime==1.0.76.1
- azureml-train-core==1.0.76
- azureml-train-restclients-hyperdrive==1.0.76
- azureml-widgets==1.0.76
- backports-tempfile==1.0
- backports-weakref==1.0.post1
- boto==2.49.0
- boto3==1.10.37
- botocore==1.13.37
- cffi==1.13.2
- chardet==3.0.4
- click==7.0
- cloudpickle==1.2.2
- configparser==3.7.4
- contextlib2==0.6.0.post1
- cryptography==2.8
- cycler==0.10.0
- cython==0.29.14
- dill==0.3.1.1
- distro==1.4.0
- docker==4.1.0
- docutils==0.15.2
- dotnetcore2==2.1.11
- fire==0.2.1
- flake8==3.7.9
- flask==1.0.3
- fusepy==3.0.1
- future==0.18.2
- gensim==3.8.1
- gunicorn==19.9.0
- idna==2.8
- imageio==2.6.1
- interpret-community==0.2.3
- interpret-core==0.1.19
- ipywidgets==7.5.1
- isodate==0.6.0
- itsdangerous==1.1.0
- jeepney==0.4.1
- jmespath==0.9.4
- json-logging-py==0.2
- jsonform==0.0.2
- jsonpickle==1.2
- jsonsir==0.0.2
- keras2onnx==1.6.0
- kiwisolver==1.1.0
- liac-arff==2.4.0
- lightgbm==2.3.0
- matplotlib==3.1.2
- mccabe==0.6.1
- msrest==0.6.10
- msrestazure==0.6.2
- ndg-httpsclient==0.5.1
- networkx==2.4
- nimbusml==1.6.1
- numpy==1.16.2
- oauthlib==3.1.0
- onnx==1.6.0
- onnxconverter-common==1.6.0
- onnxmltools==1.4.1
- packaging==19.2
- pandas==0.23.4
- papermill==1.2.1
- pathspec==0.6.0
- patsy==0.5.1
- pillow==6.2.1
- pmdarima==1.1.1
- protobuf==3.11.1
- psutil==5.6.7
- pyasn1==0.4.8
- pycodestyle==2.5.0
- pycparser==2.19
- pyflakes==2.1.1
- pyjwt==1.7.1
- pyopenssl==19.1.0
- pyparsing==2.4.5
- python-dotenv==0.10.3
- python-easyconfig==0.1.7
- pytz==2019.3
- pywavelets==1.1.1
- pyyaml==5.2
- requests==2.22.0
- requests-oauthlib==1.3.0
- resource==0.2.1
- ruamel-yaml==0.15.89
- s3transfer==0.2.1
- scikit-image==0.16.2
- scikit-learn==0.20.3
- scipy==1.1.0
- secretstorage==3.1.1
- shap==0.29.3
- skl2onnx==1.4.9
- sklearn-pandas==1.7.0
- smart-open==1.9.0
- statsmodels==0.10.2
- tenacity==6.0.0
- termcolor==1.1.0
- textwrap3==0.9.2
- tqdm==4.40.2
- typing-extensions==3.7.4.1
- urllib3==1.25.7
- websocket-client==0.56.0
- werkzeug==0.16.0
- wheel==0.30.0
- widgetsnbextension==3.5.1
prefix: /data/anaconda/envs/fwi_dev_conda_environment


@ -0,0 +1,923 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# FWI in Azure project\n",
"\n",
"## Set-up AzureML resources\n",
"\n",
"This project ports devito (https://github.com/opesci/devito) into Azure and runs tutorial notebooks at:\n",
"https://nbviewer.jupyter.org/github/opesci/devito/blob/master/examples/seismic/tutorials/\n",
"\n",
"\n",
"\n",
"In this notebook we setup AzureML resources. This notebook should be run once and will enable all subsequent notebooks.\n",
"\n",
"<a id='user_input_requiring_steps'></a>\n",
"User input requiring steps:\n",
" - [Fill in and save sensitive information](#dot_env_description)\n",
" - [Azure login](#Azure_login) (may be required first time the notebook is run) \n",
" - [Set __create_ACR_FLAG__ to true to trigger ACR creation and to save of ACR login info](#set_create_ACR_flag)\n",
" - [Azure CLI login ](#Azure_cli_login) (may be required once to create an [ACR](https://azure.microsoft.com/en-us/services/container-registry/)) \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Allow multiple displays per cell\n",
"from IPython.core.interactiveshell import InteractiveShell\n",
"InteractiveShell.ast_node_interactivity = \"all\" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Azure Machine Learning and Pipeline SDK-specific imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"import shutil\n",
"import urllib\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"import platform, dotenv\n",
"import pathlib"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Azure ML SDK Version: 1.0.76\n"
]
},
{
"data": {
"text/plain": [
"'Linux-4.15.0-1064-azure-x86_64-with-debian-stretch-sid'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"'/datadrive01/prj/DeepSeismic/contrib/fwi/azureml_devito/notebooks'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)\n",
"platform.platform()\n",
"os.getcwd()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1. Create utilities file\n",
"\n",
"##### 1.1 Define utilities file (project_utils.py) path\n",
"Utilities file created here has code for Azure resources access authorization, project configuration settings like directories and file names in __project_consts__ class."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"utils_file_name = 'project_utils'\n",
"auxiliary_files_dir = os.path.join(*(['.', 'src']))\n",
"\n",
"\n",
"utils_path_name = os.path.join(os.getcwd(), auxiliary_files_dir)\n",
"utils_full_name = os.path.join(utils_path_name, os.path.join(*([utils_file_name+'.py'])))\n",
"os.makedirs(utils_path_name, exist_ok=True)\n",
" \n",
"def ls_l(a_dir):\n",
" return ([f for f in os.listdir(a_dir) if os.path.isfile(os.path.join(a_dir, f))]) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 1.2. Edit/create project_utils.py file"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting /datadrive01/prj/DeepSeismic/contrib/fwi/azureml_devito/notebooks/./src/project_utils.py\n"
]
}
],
"source": [
"%%writefile $utils_full_name\n",
"\n",
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
"from azureml.core.authentication import AzureCliAuthentication\n",
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"from azureml.core.authentication import AuthenticationException\n",
"import dotenv, logging, pathlib, os\n",
"\n",
"\n",
"# credit Mathew Salvaris\n",
"def get_auth(env_path):\n",
" \"\"\"Tries to get authorization info by first trying to get Service Principal info, then CLI, then interactive. \n",
" \"\"\"\n",
" logger = logging.getLogger(__name__)\n",
" crt_sp_pwd = os.environ.get(\"SP_PASSWORD\", None)\n",
" if crt_sp_pwd:\n",
" logger.debug(\"Trying to create Workspace with Service Principal\")\n",
" aml_sp_password = crt_sp_pwd\n",
" aml_sp_tennant_id = dotenv.get_key(env_path, 'SP_TENANT_ID')\n",
" aml_sp_username = dotenv.get_key(env_path, 'SP_APPLICATION_ID')\n",
" auth = ServicePrincipalAuthentication(\n",
" tenant_id=aml_sp_tennant_id,\n",
" username=aml_sp_username,\n",
" password=aml_sp_password,\n",
" )\n",
" else:\n",
" logger.debug(\"Trying to create Workspace with CLI Authentication\")\n",
" try:\n",
" auth = AzureCliAuthentication()\n",
" auth.get_authentication_header()\n",
" except AuthenticationException:\n",
" logger.debug(\"Trying to create Workspace with Interactive login\")\n",
" auth = InteractiveLoginAuthentication()\n",
"\n",
" return auth \n",
"\n",
"\n",
"def set_dotenv_info(dotenv_file_path, env_dict):\n",
" \"\"\"Use dict loop to set multiple keys in dotenv file.\n",
" Minimal file error management.\n",
" \"\"\"\n",
" logger = logging.getLogger(__name__)\n",
" if bool(env_dict):\n",
" dotenv_file = pathlib.Path(dotenv_file_path)\n",
" if not dotenv_file.is_file():\n",
" logger.debug('dotenv file not found, will create \"{}\" using the sensitive info you provided.'.format(dotenv_file_path))\n",
" dotenv_file.touch()\n",
" else:\n",
" logger.debug('dotenv file \"{}\" found, will (over)write it with current sensitive info you provided.'.format(dotenv_file_path))\n",
" \n",
" for crt_key, crt_val in env_dict.items():\n",
" dotenv.set_key(dotenv_file_path, crt_key, crt_val)\n",
"\n",
" else:\n",
" logger.debug(\\\n",
" 'Trying to save empty env_dict variable into {}, please set your sensitive info in a dictionary.'\\\n",
" .format(dotenv_file_path)) \n",
" \n",
"\n",
"class project_consts(object):\n",
" \"\"\"Keep project's file names and directory structure in one place.\n",
" Minimal setattr error management.\n",
" \"\"\"\n",
" \n",
" AML_WORKSPACE_CONFIG_DIR = ['.', '..', 'not_shared']\n",
" AML_EXPERIMENT_DIR = ['.', '..', 'temp']\n",
" AML_WORKSPACE_CONFIG_FILE_NAME = 'aml_ws_config.json'\n",
" DOTENV_FILE_PATH = AML_WORKSPACE_CONFIG_DIR + ['general.env'] \n",
" DOCKER_DOTENV_FILE_PATH = AML_WORKSPACE_CONFIG_DIR + ['dockerhub.env'] \n",
"\n",
" def __setattr__(self, *_):\n",
" raise TypeError\n",
"\n",
" \n",
"if __name__==\"__main__\":\n",
" \"\"\"Basic function/class tests.\n",
" \"\"\"\n",
" import sys, os\n",
" prj_consts = project_consts()\n",
" logger = logging.getLogger(__name__)\n",
" logging.basicConfig(level=logging.DEBUG) # Logging Levels: DEBUG\t10, NOTSET\t0\n",
" logger.debug('AML ws file = {}'.format(os.path.join(*([os.path.join(*(prj_consts.AML_WORKSPACE_CONFIG_DIR)),\n",
" prj_consts.AML_WORKSPACE_CONFIG_FILE_NAME]))))\n",
"\n",
" crt_dotenv_file_path = os.path.join(*(prj_consts.DOTENV_FILE_PATH))\n",
" set_dotenv_info(crt_dotenv_file_path, {})\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 1.3. Import utilities functions defined above"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[None]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def add_path_to_sys_path(path_to_append):\n",
" if not (any(path_to_append in paths for paths in sys.path)):\n",
" sys.path.append(path_to_append)\n",
" \n",
"paths_to_append = [os.path.join(os.getcwd(), auxiliary_files_dir)]\n",
"[add_path_to_sys_path(crt_path) for crt_path in paths_to_append]\n",
"\n",
"\n",
"import project_utils\n",
"prj_consts = project_utils.project_consts()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2. Set-up the AML SDK infrastructure\n",
"\n",
"* Create Azure resource group (rsg), workspaces, \n",
"* save sensitive info using [python-dotenv](https://github.com/theskumar/python-dotenv) \n",
" \n",
"Notebook repeateability notes:\n",
"* The notebook tries to find and use an existing Azure resource group (rsg) defined by __crt_resource_group__. It creates a new one if needed. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='set_create_ACR_flag'></a>\n",
"\n",
"##### Create [ACR]() first time this notebook is run. \n",
"Either docker hub or ACR can be used to store the experimentation image. To create the ACR, set: \n",
"```\n",
"create_ACR_FLAG=True \n",
"```\n",
"It will create an ACR by running severral steps described below in section 2.7. __Create an [ACR]__ \n",
" \n",
" \n",
"[Back](#user_input_requiring_steps) to summary of user input requiring steps."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"create_ACR_FLAG = False #True False"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"sensitive_info = {}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='dot_env_description'></a>\n",
"##### 2.1. Input here sensitive and configuration information\n",
"[dotenv](https://github.com/theskumar/python-dotenv) is used to hide sensitive info, like Azure subscription name/ID. The serialized info needs to be manually input once. \n",
" \n",
"* REQUIRED ACTION for the 2 cells below: uncomment them, add the required info in first cell below, run both cells one. \n",
" The sensitive information will be packed in __sensitive_info__ dictionary variable, which that will then be saved in a following cell in an .env file (__dotenv_file_path__) that should likely be git ignored. \n",
"\n",
"* OPTIONAL STEP: After running once the two cells below to save __sensitive_info__ dictionary variable with your custom info, you can comment them and leave the __sensitive_info__ variable defined above as an empty python dictionary. \n",
" \n",
" \n",
"__Notes__:\n",
"* An empty __sensitive_info__ dictionary is ignored by the __set_dotenv_info__ function defined above in project_utils.py . \n",
"* The saved .env file will be used thereafter in each cell that starts with %dotenv. \n",
"* The saved .env file contains user specific information and it shoulld __not__ be version-controlled in git.\n",
"* If you would like to [use service principal authentication](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azure-ml.ipynb) make sure you provide the optional values as well (see get_auth function definition in project_utils.py file created above for details).\n",
"\n",
"[Back](#user_input_requiring_steps) to summary of user input requiring steps."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# subscription_id = \"\"\n",
"# resource_group = \"ghiordanfwirsg01\"\n",
"# workspace_name = \"ghiordanfwiws\"\n",
"# workspace_region = \"eastus2\"\n",
"# gpu_cluster_name = \"gpuclstfwi02\"\n",
"# gpucluster_admin_user_name = \"\"\n",
"# gpucluster_admin_user_password = \"\"\n",
"\n",
"# experimentation_docker_image_name = \"fwi01_azureml\"\n",
"# experimentation_docker_image_tag = \"sdk.v1.0.60\"\n",
"# docker_container_mount_point = os.getcwd() # use project directory or a subdirectory\n",
"\n",
"# docker_login = \"georgedockeraccount\"\n",
"# docker_pwd = \"\"\n",
"\n",
"# acr_name=\"fwi01acr\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# sensitive_info = {\n",
"# 'SUBSCRIPTION_ID':subscription_id,\n",
"# 'RESOURCE_GROUP':resource_group, \n",
"# 'WORKSPACE_NAME':workspace_name, \n",
"# 'WORKSPACE_REGION':workspace_region,\n",
"# 'GPU_CLUSTER_NAME':gpu_cluster_name,\n",
"# 'GPU_CLUSTER_ADMIN_USER_NAME':gpucluster_admin_user_name,\n",
"# 'GPU_CLUSTER_ADMIN_USER_PASSWORD':gpucluster_admin_user_password,\n",
"# 'EXPERIMENTATION_DOCKER_IMAGE_NAME':experimentation_docker_image_name,\n",
"# 'EXPERIMENTATION_DOCKER_IMAGE_TAG':experimentation_docker_image_tag,\n",
"# 'DOCKER_CONTAINER_MOUNT_POINT':docker_container_mount_point,\n",
"# 'DOCKER_LOGIN':docker_login,\n",
"# 'DOCKER_PWD':docker_pwd,\n",
"# 'ACR_NAME':acr_name\n",
"# }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2.2. Save sensitive info\n",
"An empty __sensitive_info__ variable will be ingored. \n",
"A non-empty __sensitive_info__ variable will overwrite info in an existing .env file."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'./../not_shared/general.env'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%load_ext dotenv\n",
"dotenv_file_path = os.path.join(*(prj_consts.DOTENV_FILE_PATH)) \n",
"os.makedirs(os.path.join(*(prj_consts.DOTENV_FILE_PATH[:-1])), exist_ok=True)\n",
"pathlib.Path(dotenv_file_path).touch()\n",
"\n",
"# # show .env file path\n",
"# !pwd\n",
"dotenv_file_path\n",
"\n",
"#save your sensitive info\n",
"project_utils.set_dotenv_info(dotenv_file_path, sensitive_info)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2.3. Use (load) saved sensitive info\n",
"THis is how sensitive info will be retrieved in other notebooks"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"%dotenv $dotenv_file_path\n",
"\n",
"subscription_id = os.getenv('SUBSCRIPTION_ID')\n",
"# # print a bit of subscription ID, to show dotenv file was found and loaded \n",
"# subscription_id[:2]\n",
"\n",
"crt_resource_group = os.getenv('RESOURCE_GROUP')\n",
"crt_workspace_name = os.getenv('WORKSPACE_NAME')\n",
"crt_workspace_region = os.getenv('WORKSPACE_REGION') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2.4. Access your workspace\n",
"\n",
"* In AML SDK we can get a ws in two ways: \n",
" - via Workspace(subscription_id = ...) \n",
" - via Workspace.from_config(path=some_file_path). \n",
" \n",
"For demo purposes, both ways are shown in this notebook.\n",
"\n",
"* At first notebook run:\n",
" - the AML workspace ws is typically not found, so a new ws object is created and persisted on disk.\n",
" - If the ws has been created other ways (e.g. via Azure portal), it may be persisted on disk by calling ws1.write_config(...)."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"workspace_config_dir = os.path.join(*(prj_consts.AML_WORKSPACE_CONFIG_DIR))\n",
"workspace_config_file = prj_consts.AML_WORKSPACE_CONFIG_FILE_NAME\n",
"\n",
"# # print debug info if needed \n",
"# workspace_config_dir \n",
"# ls_l(os.path.join(os.getcwd(), os.path.join(*([workspace_config_dir]))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='Azure_login'></a>\n",
"###### Login into Azure may be required here\n",
"[Back](#user_input_requiring_steps) to summary of user input requiring steps."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING - Warning: Falling back to use azure cli login credentials.\n",
"If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.\n",
"Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Workspace configuration loading succeeded. \n"
]
}
],
"source": [
"try:\n",
" ws1 = Workspace(\n",
" subscription_id = subscription_id, \n",
" resource_group = crt_resource_group, \n",
" workspace_name = crt_workspace_name,\n",
" auth=project_utils.get_auth(dotenv_file_path))\n",
" print(\"Workspace configuration loading succeeded. \")\n",
"# ws1.write_config(path=os.path.join(os.getcwd(), os.path.join(*([workspace_config_dir]))),\n",
"# file_name=workspace_config_file)\n",
" del ws1 # ws will be (re)created later using from_config() function\n",
"except Exception as e :\n",
" print('Exception msg: {}'.format(str(e )))\n",
" print(\"Workspace not accessible. Will create a new workspace below\")\n",
" \n",
" workspace_region = crt_workspace_region\n",
"\n",
" # Create the workspace using the specified parameters\n",
" ws2 = Workspace.create(name = crt_workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = crt_resource_group, \n",
" location = workspace_region,\n",
" create_resource_group = True,\n",
" exist_ok = False)\n",
" ws2.get_details()\n",
"\n",
" # persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
" ws2.write_config(path=os.path.join(os.getcwd(), os.path.join(*([workspace_config_dir]))),\n",
" file_name=workspace_config_file)\n",
" \n",
" #Delete ws2 and use ws = Workspace.from_config() as shwon below to recover the ws, rather than rely on what we get from one time creation\n",
" del ws2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2.5. Demo access to created workspace\n",
"\n",
"From now on, even in other notebooks, the provisioned AML workspace will be accesible using Workspace.from_config() as shown below:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# path arg is:\n",
"# - a file path which explictly lists aml_config subdir for function from_config() \n",
"# - a dir path with a silently added <<aml_config>> subdir for function write_config(). \n",
"ws = Workspace.from_config(path=os.path.join(os.getcwd(), \n",
" os.path.join(*([workspace_config_dir, '.azureml', workspace_config_file]))))\n",
"# # print debug info if needed\n",
"# print(ws.name, ws.resource_group, ws.location, ws.subscription_id[0], sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2.6. Create compute cluster used in following notebooks"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'gpuclstfwi02'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gpu_cluster_name = os.getenv('GPU_CLUSTER_NAME')\n",
"gpu_cluster_name"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found existing gpu cluster\n"
]
}
],
"source": [
"max_nodes_value = 3\n",
"\n",
"try:\n",
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
" print(\"Found existing gpu cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Could not find gpu cluster, please create one\")\n",
" \n",
"# # Specify the configuration for the new cluster, add admin_user_ssh_key='ssh-rsa ... ghiordan@microsoft.com' if needed\n",
"# compute_config = AmlCompute.provisioning_configuration(vm_size=\"Standard_NC12\",\n",
"# min_nodes=0,\n",
"# max_nodes=max_nodes_value,\n",
"# admin_username=os.getenv('GPU_CLUSTER_ADMIN_USER_NAME'), \n",
"# admin_user_password=os.getenv('GPU_CLUSTER_ADMIN_USER_NAME'))\n",
"# # Create the cluster with the specified name and configuration\n",
"# gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"\n",
"# # Wait for the cluster to complete, show the output log\n",
"# gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 2.7. Create an [ACR](https://docs.microsoft.com/en-us/azure/container-registry/) if you have not done so using the [portal](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-get-started-portal) \n",
" - Follow the 4 ACR steps described below. \n",
" - Uncomment cells' lines as needed to login and see commands responses while you set the right subscription and then create the ACR. \n",
" - You need [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) to run the commands below. \n",
"\n",
"<a id='Azure_cli_login'></a>\n",
"##### ACR Step 1. Select ACR subscription (az cli login into Azure may be required here)\n",
"[Back](#user_input_requiring_steps) to summary of user input requiring steps."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"azure-cli 2.0.58 *\r\n",
"\r\n",
"acr 2.2.0 *\r\n",
"acs 2.3.17 *\r\n",
"advisor 2.0.0 *\r\n",
"ams 0.4.1 *\r\n",
"appservice 0.2.13 *\r\n",
"backup 1.2.1 *\r\n",
"batch 3.4.1 *\r\n",
"batchai 0.4.7 *\r\n",
"billing 0.2.0 *\r\n",
"botservice 0.1.6 *\r\n",
"cdn 0.2.0 *\r\n",
"cloud 2.1.0 *\r\n",
"cognitiveservices 0.2.4 *\r\n",
"command-modules-nspkg 2.0.2 *\r\n",
"configure 2.0.20 *\r\n",
"consumption 0.4.2 *\r\n",
"container 0.3.13 *\r\n",
"core 2.0.58 *\r\n",
"cosmosdb 0.2.7 *\r\n",
"dla 0.2.4 *\r\n",
"dls 0.1.8 *\r\n",
"dms 0.1.2 *\r\n",
"eventgrid 0.2.1 *\r\n",
"eventhubs 0.3.3 *\r\n",
"extension 0.2.3 *\r\n",
"feedback 2.1.4 *\r\n",
"find 0.2.13 *\r\n",
"hdinsight 0.3.0 *\r\n",
"interactive 0.4.1 *\r\n",
"iot 0.3.6 *\r\n",
"iotcentral 0.1.6 *\r\n",
"keyvault 2.2.11 *\r\n",
"kusto 0.1.0 *\r\n",
"lab 0.1.5 *\r\n",
"maps 0.3.3 *\r\n",
"monitor 0.2.10 *\r\n",
"network 2.3.2 *\r\n",
"nspkg 3.0.3 *\r\n",
"policyinsights 0.1.1 *\r\n",
"profile 2.1.3 *\r\n",
"rdbms 0.3.7 *\r\n",
"redis 0.4.0 *\r\n",
"relay 0.1.3 *\r\n",
"reservations 0.4.1 *\r\n",
"resource 2.1.10 *\r\n",
"role 2.4.0 *\r\n",
"search 0.1.1 *\r\n",
"security 0.1.0 *\r\n",
"servicebus 0.3.3 *\r\n",
"servicefabric 0.1.12 *\r\n",
"signalr 1.0.0 *\r\n",
"sql 2.1.9 *\r\n",
"sqlvm 0.1.0 *\r\n",
"storage 2.3.1 *\r\n",
"telemetry 1.0.1 *\r\n",
"vm 2.2.15 *\r\n",
"\r\n",
"Extensions:\r\n",
"azure-ml-admin-cli 0.0.1\r\n",
"azure-cli-ml Unknown\r\n",
"\r\n",
"Python location '/opt/az/bin/python3'\r\n",
"Extensions directory '/opt/az/extensions'\r\n",
"\r\n",
"Python (Linux) 3.6.5 (default, Feb 12 2019, 02:10:43) \r\n",
"[GCC 5.4.0 20160609]\r\n",
"\r\n",
"Legal docs and information: aka.ms/AzureCliLegal\r\n",
"\r\n",
"\r\n",
"\u001b[33mYou have 57 updates available. Consider updating your CLI installation.\u001b[0m\r\n"
]
}
],
"source": [
"!az --version\n",
"if create_ACR_FLAG:\n",
" !az login\n",
" response01 = ! az account list --all --refresh -o table\n",
" response02 = ! az account set --subscription $subscription_id\n",
" response03 = ! az account list -o table\n",
" response04 = ! $cli_command\n",
"\n",
" response01\n",
" response02\n",
" response03\n",
" response04"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ACR Step 2. Create the ACR"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'az acr create --resource-group ghiordanfwirsg01 --name fwi01acr --sku Basic'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"[' \"loginServer\": \"fwi01acr.azurecr.io\",',\n",
" ' \"name\": \"fwi01acr\",',\n",
" ' \"networkRuleSet\": null,',\n",
" ' \"provisioningState\": \"Succeeded\",',\n",
" ' \"resourceGroup\": \"ghiordanfwirsg01\",',\n",
" ' \"sku\": {',\n",
" ' \"name\": \"Basic\",',\n",
" ' \"tier\": \"Basic\"',\n",
" ' },',\n",
" ' \"status\": null,',\n",
" ' \"storageAccount\": null,',\n",
" ' \"tags\": {},',\n",
" ' \"type\": \"Microsoft.ContainerRegistry/registries\"',\n",
" '}']"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%dotenv $dotenv_file_path\n",
"acr_name = os.getenv('ACR_NAME')\n",
"\n",
"cli_command='az acr create --resource-group '+ crt_resource_group +' --name ' + acr_name + ' --sku Basic'\n",
"cli_command\n",
"\n",
"response = !$cli_command\n",
"response[-14:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ACR Step 3. Also enable password and login via __ [--admin-enabled true](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication) __ and then use the az cli or portal to set up the credentials"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'az acr update -n fwi01acr --admin-enabled true'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# per https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication\n",
"cli_command='az acr update -n '+acr_name+' --admin-enabled true'\n",
"cli_command\n",
"\n",
"response = !$cli_command\n",
"# response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### ACR Step 4. Save the ACR password and login"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# create_ACR_FLAG=False\n",
"if create_ACR_FLAG:\n",
" import subprocess\n",
" cli_command = 'az acr credential show -n '+acr_name\n",
"\n",
"acr_username = subprocess.Popen(cli_command+' --query username',shell=True,stdout=subprocess.PIPE, stderr=subprocess.PIPE).\\\n",
"communicate()[0].decode(\"utf-8\").split()[0].strip('\\\"')\n",
"\n",
"acr_password = subprocess.Popen(cli_command+' --query passwords[0].value',shell=True,stdout=subprocess.PIPE, stderr=subprocess.PIPE).\\\n",
"communicate()[0].decode(\"utf-8\").split()[0].strip('\\\"')\n",
"\n",
"response = dotenv.set_key(dotenv_file_path, 'ACR_PASSWORD', acr_password)\n",
"response = dotenv.set_key(dotenv_file_path, 'ACR_USERNAME', acr_username)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"%reload_ext dotenv\n",
"%dotenv -o $dotenv_file_path\n",
"\n",
"# print acr password and login info saved in dotenv file\n",
"if create_ACR_FLAG:\n",
" os.getenv('ACR_PASSWORD')\n",
" os.getenv('ACR_USERNAME')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"print('Finished running 000_Setup_GeophysicsTutorial_FWI_Azure_devito!')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:fwi_dev_conda_environment] *",
"language": "python",
"name": "conda-env-fwi_dev_conda_environment-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,6 @@
This folder contains a variety of scripts which might be useful.
# Ablation Study
Contained in `ablation.sh`, the script demonstrates running the HRNet model with various patch sizes.

contrib/scripts/ablation.sh Executable file

@@ -0,0 +1,24 @@
#!/bin/bash
source activate seismic-interpretation
# Patch_Size 100: Patch vs Section Depth
python scripts/prepare_dutchf3.py split_train_val patch --data-dir=/mnt/dutch --stride=50 --patch=100
python train.py OUTPUT_DIR /data/output/hrnet_patch TRAIN.DEPTH patch TRAIN.PATCH_SIZE 100 --cfg 'configs/hrnet.yaml'
python train.py OUTPUT_DIR /data/output/hrnet_section TRAIN.DEPTH section TRAIN.PATCH_SIZE 100 --cfg 'configs/hrnet.yaml'
# Patch_Size 150: Patch vs Section Depth
python scripts/prepare_dutchf3.py split_train_val patch --data-dir=/mnt/dutch --stride=50 --patch=150
python train.py OUTPUT_DIR /data/output/hrnet_patch TRAIN.DEPTH patch TRAIN.PATCH_SIZE 150 --cfg 'configs/hrnet.yaml'
python train.py OUTPUT_DIR /data/output/hrnet_section TRAIN.DEPTH section TRAIN.PATCH_SIZE 150 --cfg 'configs/hrnet.yaml'
# Patch_Size 200: Patch vs Section Depth
python scripts/prepare_dutchf3.py split_train_val patch --data-dir=/mnt/dutch --stride=50 --patch=200
python train.py OUTPUT_DIR /data/output/hrnet_patch TRAIN.DEPTH patch TRAIN.PATCH_SIZE 200 --cfg 'configs/hrnet.yaml'
python train.py OUTPUT_DIR /data/output/hrnet_section TRAIN.DEPTH section TRAIN.PATCH_SIZE 200 --cfg 'configs/hrnet.yaml'
# Patch_Size 250: Patch vs Section Depth
python scripts/prepare_dutchf3.py split_train_val patch --data-dir=/mnt/dutch --stride=50 --patch=250
python train.py OUTPUT_DIR /data/output/hrnet_patch TRAIN.DEPTH patch TRAIN.PATCH_SIZE 250 TRAIN.AUGMENTATIONS.RESIZE.HEIGHT 250 TRAIN.AUGMENTATIONS.RESIZE.WIDTH 250 --cfg 'configs/hrnet.yaml'
python train.py OUTPUT_DIR /data/output/hrnet_section TRAIN.DEPTH section TRAIN.PATCH_SIZE 250 TRAIN.AUGMENTATIONS.RESIZE.HEIGHT 250 TRAIN.AUGMENTATIONS.RESIZE.WIDTH 250 --cfg 'configs/hrnet.yaml'

@@ -0,0 +1,27 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
#
# Example:
# download_hrnet.sh /data/models hrnet.pth
#
echo Using "$1" as the download directory
if [ ! -d "$1" ]
then
echo "Directory does not exist - creating..."
mkdir -p "$1"
fi
full_path=$1/$2
echo "Downloading to ${full_path}"
wget --header 'Host: optgaw.dm.files.1drv.com' \
--user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0' \
--header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
--header 'Accept-Language: en-GB,en;q=0.5' \
--referer 'https://onedrive.live.com/' \
--header 'Upgrade-Insecure-Requests: 1' 'https://optgaw.dm.files.1drv.com/y4m14W1OEuoniQMCT4m64UV8CSQT-dFe2ZRhU0LAZSal80V4phgVIlTYxI2tUi6BPVOy7l5rK8MKpZNywVvtz-NKL2ZWq-UYRL6MAjbLgdFA6zyW8RRrKBe_FcqcWr4YTXeJ18xfVqco6CdGZHFfORBE6EtFxEIrHWNjM032dWZLdqZ0eXd7RZTrHs1KKYa92zcs0Rj91CAyIK4hIaOomzEWA/hrnetv2_w48_imagenet_pretrained.pth?download&psid=1' \
--output-document ${full_path}

contrib/scripts/get_F3_voxel.sh Executable file

@@ -0,0 +1,24 @@
#!/bin/bash
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
echo "Make sure you also download Dutch F3 data from https://github.com/bolgebrygg/MalenoV"
# fetch Dutch F3 from Malenov project.
# wget https://drive.google.com/open?id=0B7brcf-eGK8CUUZKLXJURFNYeXM -O interpretation/voxel2pixel/F3/data.segy
if [ $# -eq 0 ]
then
downdirtrain='experiments/interpretation/voxel2pixel/F3/train'
downdirval='experiments/interpretation/voxel2pixel/F3/val'
else
downdirtrain=$1
downdirval=$1
fi
mkdir -p ${downdirtrain}
mkdir -p ${downdirval}
echo "Downloading train label to $downdirtrain and validation label to $downdirval"
wget https://github.com/waldeland/CNN-for-ASI/raw/master/F3/train/inline_339.png -O ${downdirtrain}/inline_339.png
wget https://github.com/waldeland/CNN-for-ASI/raw/master/F3/val/inline_405.png -O ${downdirval}/inline_405.png
echo "Download complete"

cv_lib/AUTHORS.md Normal file

@@ -0,0 +1 @@
[Mathew Salvaris] [@msalvaris](http://github.com/msalvaris/)

cv_lib/README.md Normal file

@@ -0,0 +1,11 @@
# CVLib
A set of utility functions for computer vision
## Install
```bash
pip install -e .
```
This will install the package cv_lib
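A minimal import sketch, assuming the editable install above and the module layout under `cv_lib/segmentation` shown later in this diff (torch and the other dependencies of those modules are assumed to be installed):
```python
# Minimal sketch, assuming `pip install -e .` has been run from the cv_lib folder.
# The module path follows cv_lib/segmentation/models/__init__.py later in this diff.
from cv_lib.segmentation.models.patch_deconvnet import patch_deconvnet

model = patch_deconvnet(n_classes=6)  # 6 classes is an illustrative assumption
```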

@@ -0,0 +1,4 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
__version__ = "0.0.1"

@@ -0,0 +1,42 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from ignite.handlers import ModelCheckpoint
import glob
import os
from shutil import copyfile
class SnapshotHandler:
def __init__(self, dir_name, filename_prefix, score_function, snapshot_function):
self._model_save_location = dir_name
self._running_model_prefix = filename_prefix + "_running"
self._snapshot_prefix = filename_prefix + "_snapshot"
self._snapshot_function = snapshot_function
self._snapshot_num = 1
self._score_function = score_function
self._checkpoint_handler = self._create_checkpoint_handler()
def _create_checkpoint_handler(self):
return ModelCheckpoint(
self._model_save_location,
self._running_model_prefix,
score_function=self._score_function,
n_saved=1,
create_dir=True,
save_as_state_dict=True,
require_empty=False,
)
def __call__(self, engine, to_save):
self._checkpoint_handler(engine, to_save)
if self._snapshot_function():
files = glob.glob(os.path.join(self._model_save_location, self._running_model_prefix + "*"))
print(files)
name_postfix = os.path.basename(files[0]).lstrip(self._running_model_prefix)
copyfile(
files[0],
os.path.join(self._model_save_location, f"{self._snapshot_prefix}{self._snapshot_num}{name_postfix}",),
)
self._checkpoint_handler = self._create_checkpoint_handler() # Reset the checkpoint handler
self._snapshot_num += 1
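A minimal wiring sketch for the handler above, assuming an ignite `trainer` and `model` already exist; the output directory, prefix and score/snapshot policies are illustrative assumptions rather than the repo's training configuration:
```python
# Minimal sketch: attach SnapshotHandler to a trainer at the end of every epoch.
# `trainer` and `model` are assumed to exist; the score function assumes the trainer's
# output is a {"loss": ...} dict, as in the engine factories elsewhere in this diff.
from ignite.engine import Events

snapshot_handler = SnapshotHandler(
    "/data/output/checkpoints",                                   # assumed checkpoint directory
    "model",                                                      # filename prefix
    score_function=lambda engine: -engine.state.output["loss"],   # lower loss is better
    snapshot_function=lambda: True,                               # always copy a snapshot (assumption)
)
trainer.add_event_handler(Events.EPOCH_COMPLETED, snapshot_handler, {"model": model})
```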

@@ -0,0 +1,90 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import logging
import logging.config
from toolz import curry
import numpy as np
np.set_printoptions(precision=3)
@curry
def log_training_output(engine, log_interval=100):
logger = logging.getLogger(__name__)
if engine.state.iteration % log_interval == 0:
logger.info(f"Epoch: {engine.state.epoch} Iter: {engine.state.iteration} loss {engine.state.output['loss']}")
@curry
def log_lr(optimizer, engine):
logger = logging.getLogger(__name__)
lr = [param_group["lr"] for param_group in optimizer.param_groups]
logger.info(f"lr - {lr}")
_DEFAULT_METRICS = {"pixacc": "Avg accuracy :", "nll": "Avg loss :"}
@curry
def log_metrics(log_msg, engine, metrics_dict=_DEFAULT_METRICS):
logger = logging.getLogger(__name__)
metrics = engine.state.metrics
metrics_msg = " ".join([f"{metrics_dict[k]} {metrics[k]:.2f}" for k in metrics_dict])
logger.info(f"{log_msg} - Epoch {engine.state.epoch} [{engine.state.max_epochs}] " + metrics_msg)
@curry
def log_class_metrics(log_msg, engine, metrics_dict):
logger = logging.getLogger(__name__)
metrics = engine.state.metrics
metrics_msg = "\n".join(f"{metrics_dict[k]} {metrics[k].numpy()}" for k in metrics_dict)
logger.info(f"{log_msg} - Epoch {engine.state.epoch} [{engine.state.max_epochs}]\n" + metrics_msg)
class Evaluator:
def __init__(self, evaluation_engine, data_loader):
self._evaluation_engine = evaluation_engine
self._data_loader = data_loader
def __call__(self, engine):
self._evaluation_engine.run(self._data_loader)
class HorovodLRScheduler:
"""
Horovod: using `lr = base_lr * hvd.size()` from the very beginning leads to worse final
accuracy. Scale the learning rate `lr = base_lr` ---> `lr = base_lr * hvd.size()` during
the first five epochs. See https://arxiv.org/abs/1706.02677 for details.
After the warmup reduce learning rate by 10 on the 30th, 60th and 80th epochs.
"""
def __init__(
self, base_lr, warmup_epochs, cluster_size, data_loader, optimizer, batches_per_allreduce,
):
self._warmup_epochs = warmup_epochs
self._cluster_size = cluster_size
self._data_loader = data_loader
self._optimizer = optimizer
self._base_lr = base_lr
self._batches_per_allreduce = batches_per_allreduce
self._logger = logging.getLogger(__name__)
def __call__(self, engine):
epoch = engine.state.epoch
if epoch < self._warmup_epochs:
epoch += float(engine.state.iteration + 1) / len(self._data_loader)
lr_adj = 1.0 / self._cluster_size * (epoch * (self._cluster_size - 1) / self._warmup_epochs + 1)
elif epoch < 30:
lr_adj = 1.0
elif epoch < 60:
lr_adj = 1e-1
elif epoch < 80:
lr_adj = 1e-2
else:
lr_adj = 1e-3
for param_group in self._optimizer.param_groups:
param_group["lr"] = self._base_lr * self._cluster_size * self._batches_per_allreduce * lr_adj
self._logger.debug(f"Adjust learning rate {param_group['lr']}")
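A minimal sketch of attaching these curried console-logging handlers, assuming `trainer`, `evaluator`, `optimizer` and `val_loader` already exist and that the evaluator exposes the metric keys in `_DEFAULT_METRICS`:
```python
# Minimal sketch: wire the handlers defined above into ignite engines.
# The engines, optimizer and loaders are assumed to exist; the keys logged by
# log_metrics must match the metrics attached to the evaluator ("pixacc", "nll").
from ignite.engine import Events

trainer.add_event_handler(Events.ITERATION_COMPLETED, log_training_output(log_interval=100))
trainer.add_event_handler(Events.EPOCH_STARTED, log_lr(optimizer))
trainer.add_event_handler(Events.EPOCH_COMPLETED, Evaluator(evaluator, val_loader))
evaluator.add_event_handler(Events.EPOCH_COMPLETED, log_metrics("Validation results"))
```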

@@ -0,0 +1,69 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from toolz import curry
import torchvision
import logging
import logging.config
try:
from tensorboardX import SummaryWriter
except ImportError:
raise RuntimeError("No tensorboardX package is found. Please install with the command: \npip install tensorboardX")
def create_summary_writer(log_dir):
writer = SummaryWriter(logdir=log_dir)
return writer
def _log_model_output(log_label, summary_writer, engine):
summary_writer.add_scalar(log_label, engine.state.output["loss"], engine.state.iteration)
@curry
def log_training_output(summary_writer, engine):
_log_model_output("training/loss", summary_writer, engine)
@curry
def log_validation_output(summary_writer, engine):
_log_model_output("validation/loss", summary_writer, engine)
@curry
def log_lr(summary_writer, optimizer, log_interval, engine):
"""[summary]
Args:
optimizer ([type]): [description]
log_interval ([type]): iteration or epoch
summary_writer ([type]): [description]
engine ([type]): [description]
"""
lr = [param_group["lr"] for param_group in optimizer.param_groups]
summary_writer.add_scalar("lr", lr[0], getattr(engine.state, log_interval))
_DEFAULT_METRICS = {"accuracy": "Avg accuracy :", "nll": "Avg loss :"}
@curry
def log_metrics(summary_writer, train_engine, log_interval, engine, metrics_dict=_DEFAULT_METRICS):
metrics = engine.state.metrics
for m in metrics_dict:
summary_writer.add_scalar(metrics_dict[m], metrics[m], getattr(train_engine.state, log_interval))
def create_image_writer(summary_writer, label, output_variable, normalize=False, transform_func=lambda x: x):
logger = logging.getLogger(__name__)
def write_to(engine):
try:
data_tensor = transform_func(engine.state.output[output_variable])
image_grid = torchvision.utils.make_grid(data_tensor, normalize=normalize, scale_each=True)
summary_writer.add_image(label, image_grid, engine.state.epoch)
except KeyError:
logger.warning("Predictions and or ground truth labels not available to report")
return write_to
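A matching TensorBoard sketch, assuming a `trainer` and `optimizer` as above; the log directory and event choices are illustrative:
```python
# Minimal sketch: send training loss and learning rate to TensorBoard using the
# handlers defined above; the log directory is an assumption.
from ignite.engine import Events

summary_writer = create_summary_writer("/data/output/tensorboard")
trainer.add_event_handler(Events.ITERATION_COMPLETED, log_training_output(summary_writer))
trainer.add_event_handler(Events.EPOCH_COMPLETED, log_lr(summary_writer, optimizer, "epoch"))
```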

@@ -0,0 +1,17 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from toolz import curry
import torch.nn.functional as F
@curry
def extract_metric_from(metric, engine):
metrics = engine.state.metrics
return metrics[metric]
@curry
def padded_val_transform(pad_left, fine_size, x, y, y_pred):
y_pred = y_pred[:, :, pad_left : pad_left + fine_size, pad_left : pad_left + fine_size].contiguous()
return {"image": x, "y_pred": F.sigmoid(y_pred).detach(), "mask": y.detach()}

@@ -0,0 +1,221 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import math
import numbers
import random
import numpy as np
from PIL import Image, ImageOps
class Compose(object):
def __init__(self, augmentations):
self.augmentations = augmentations
def __call__(self, img, mask):
img, mask = Image.fromarray(img, mode=None), Image.fromarray(mask, mode="L")
assert img.size == mask.size
for a in self.augmentations:
img, mask = a(img, mask)
return np.array(img), np.array(mask, dtype=np.uint8)
class AddNoise(object):
def __call__(self, img, mask):
noise = np.random.normal(loc=0, scale=0.02, size=(img.size[1], img.size[0]))
return img + noise, mask
class RandomCrop(object):
def __init__(self, size, padding=0):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
self.padding = padding
def __call__(self, img, mask):
if self.padding > 0:
img = ImageOps.expand(img, border=self.padding, fill=0)
mask = ImageOps.expand(mask, border=self.padding, fill=0)
assert img.size == mask.size
w, h = img.size
th, tw = self.size
if w == tw and h == th:
return img, mask
if w < tw or h < th:
return (
img.resize((tw, th), Image.BILINEAR),
mask.resize((tw, th), Image.NEAREST),
)
x1 = random.randint(0, w - tw)
y1 = random.randint(0, h - th)
return (
img.crop((x1, y1, x1 + tw, y1 + th)),
mask.crop((x1, y1, x1 + tw, y1 + th)),
)
class CenterCrop(object):
def __init__(self, size):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
th, tw = self.size
x1 = int(round((w - tw) / 2.0))
y1 = int(round((h - th) / 2.0))
return (
img.crop((x1, y1, x1 + tw, y1 + th)),
mask.crop((x1, y1, x1 + tw, y1 + th)),
)
class RandomHorizontallyFlip(object):
def __call__(self, img, mask):
if random.random() < 0.5:
# Note: we use FLIP_TOP_BOTTOM here intentionally. Due to the dimensions of the image,
# it ends up being a horizontal flip.
return (
img.transpose(Image.FLIP_TOP_BOTTOM),
mask.transpose(Image.FLIP_TOP_BOTTOM),
)
return img, mask
class RandomVerticallyFlip(object):
def __call__(self, img, mask):
if random.random() < 0.5:
return (
img.transpose(Image.FLIP_LEFT_RIGHT),
mask.transpose(Image.FLIP_LEFT_RIGHT),
)
return img, mask
class FreeScale(object):
def __init__(self, size):
self.size = tuple(reversed(size)) # size: (h, w)
def __call__(self, img, mask):
assert img.size == mask.size
return (
img.resize(self.size, Image.BILINEAR),
mask.resize(self.size, Image.NEAREST),
)
class Scale(object):
def __init__(self, size):
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
if (w >= h and w == self.size) or (h >= w and h == self.size):
return img, mask
if w > h:
ow = self.size
oh = int(self.size * h / w)
return (
img.resize((ow, oh), Image.BILINEAR),
mask.resize((ow, oh), Image.NEAREST),
)
else:
oh = self.size
ow = int(self.size * w / h)
return (
img.resize((ow, oh), Image.BILINEAR),
mask.resize((ow, oh), Image.NEAREST),
)
class RandomSizedCrop(object):
def __init__(self, size):
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
for attempt in range(10):
area = img.size[0] * img.size[1]
target_area = random.uniform(0.45, 1.0) * area
aspect_ratio = random.uniform(0.5, 2)
w = int(round(math.sqrt(target_area * aspect_ratio)))
h = int(round(math.sqrt(target_area / aspect_ratio)))
if random.random() < 0.5:
w, h = h, w
if w <= img.size[0] and h <= img.size[1]:
x1 = random.randint(0, img.size[0] - w)
y1 = random.randint(0, img.size[1] - h)
img = img.crop((x1, y1, x1 + w, y1 + h))
mask = mask.crop((x1, y1, x1 + w, y1 + h))
assert img.size == (w, h)
return (
img.resize((self.size, self.size), Image.BILINEAR),
mask.resize((self.size, self.size), Image.NEAREST),
)
# Fallback
scale = Scale(self.size)
crop = CenterCrop(self.size)
return crop(*scale(img, mask))
class RandomRotate(object):
def __init__(self, degree):
self.degree = degree
def __call__(self, img, mask):
"""
PIL automatically adds zeros to the borders of images that are rotated. To fix this
issue, the code at the bottom sets anywhere in the labels (mask) that is zero to
255 (the value used for ignore_index).
"""
rotate_degree = random.random() * 2 * self.degree - self.degree
img = img.rotate(rotate_degree, Image.BILINEAR)
mask = mask.rotate(rotate_degree, Image.NEAREST)
binary_mask = Image.fromarray(np.ones([mask.size[1], mask.size[0]]))
binary_mask = binary_mask.rotate(rotate_degree, Image.NEAREST)
binary_mask = np.array(binary_mask)
mask_arr = np.array(mask)
mask_arr[binary_mask == 0] = 255
mask = Image.fromarray(mask_arr)
return img, mask
class RandomSized(object):
def __init__(self, size):
self.size = size
self.scale = Scale(self.size)
self.crop = RandomCrop(self.size)
def __call__(self, img, mask):
assert img.size == mask.size
w = int(random.uniform(0.5, 2) * img.size[0])
h = int(random.uniform(0.5, 2) * img.size[1])
img, mask = (
img.resize((w, h), Image.BILINEAR),
mask.resize((w, h), Image.NEAREST),
)
return self.crop(*self.scale(img, mask))
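A runnable sketch composing these augmentations on a NumPy image/mask pair; the shapes, dtypes and 99x99 crop size are illustrative assumptions:
```python
# Minimal sketch: apply a composed augmentation pipeline to a float image and a
# uint8 label mask. Sizes and the chosen augmentations are assumptions.
import numpy as np

augment = Compose([RandomRotate(10), RandomHorizontallyFlip(), RandomSizedCrop(99)])

image = np.random.rand(200, 200).astype(np.float32)  # stand-in seismic patch
mask = np.zeros((200, 200), dtype=np.uint8)          # stand-in label mask
image_aug, mask_aug = augment(image, mask)
print(image_aug.shape, mask_aug.shape)               # (99, 99) (99, 99)
```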

@@ -0,0 +1,130 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch
from ignite.engine.engine import Engine, State, Events
from ignite.utils import convert_tensor
import torch.nn.functional as F
from toolz import curry
from torch.nn import functional as F
import numpy as np
def _upscale_model_output(y_pred, y):
ph, pw = y_pred.size(2), y_pred.size(3)
h, w = y.size(2), y.size(3)
if ph != h or pw != w:
y_pred = F.upsample(input=y_pred, size=(h, w), mode="bilinear")
return y_pred
def create_supervised_trainer(
model,
optimizer,
loss_fn,
prepare_batch,
device=None,
non_blocking=False,
output_transform=lambda x, y, y_pred, loss: {"loss": loss.item()},
):
if device:
model.to(device)
def _update(engine, batch):
model.train()
optimizer.zero_grad()
x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
y_pred = model(x)
y_pred = _upscale_model_output(y_pred, y)
loss = loss_fn(y_pred.squeeze(1), y.squeeze(1))
loss.backward()
optimizer.step()
return output_transform(x, y, y_pred, loss)
return Engine(_update)
@curry
def val_transform(x, y, y_pred):
return {"image": x, "y_pred": y_pred.detach(), "mask": y.detach()}
def create_supervised_evaluator(
model, prepare_batch, metrics=None, device=None, non_blocking=False, output_transform=val_transform,
):
metrics = metrics or {}
if device:
model.to(device)
def _inference(engine, batch):
model.eval()
with torch.no_grad():
x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
y_pred = model(x)
y_pred = _upscale_model_output(y_pred, x)
return output_transform(x, y, y_pred)
engine = Engine(_inference)
for name, metric in metrics.items():
metric.attach(engine, name)
return engine
def create_supervised_trainer_apex(
model,
optimizer,
loss_fn,
prepare_batch,
device=None,
non_blocking=False,
output_transform=lambda x, y, y_pred, loss: {"loss": loss.item()},
):
from apex import amp
if device:
model.to(device)
def _update(engine, batch):
model.train()
optimizer.zero_grad()
x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
y_pred = model(x)
loss = loss_fn(y_pred.squeeze(1), y.squeeze(1))
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
optimizer.step()
return output_transform(x, y, y_pred, loss)
return Engine(_update)
# def create_supervised_evaluator_apex(
# model,
# prepare_batch,
# metrics=None,
# device=None,
# non_blocking=False,
# output_transform=lambda x, y, y_pred: (x, y, pred),
# ):
# metrics = metrics or {}
# if device:
# model.to(device)
# def _inference(engine, batch):
# model.eval()
# with torch.no_grad():
# x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
# y_pred = model(x)
# return output_transform(x, y, y_pred)
# engine = Engine(_inference)
# for name, metric in metrics.items():
# metric.attach(engine, name)
# return engine
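A self-contained sketch exercising `create_supervised_trainer` on toy tensors; the tiny conv model, random data and loss are stand-ins, not the repo's Dutch F3 setup:
```python
# Minimal sketch: drive the trainer factory above with toy data. Everything below
# (model, tensors, loss, hyperparameters) is an illustrative stand-in.
import torch
from torch.utils.data import DataLoader, TensorDataset
from ignite.utils import convert_tensor


def prepare_batch(batch, device=None, non_blocking=False):
    x, y = batch
    return (
        convert_tensor(x, device=device, non_blocking=non_blocking),
        convert_tensor(y, device=device, non_blocking=non_blocking),
    )


model = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1)  # stand-in "segmentation model"
images = torch.randn(8, 1, 32, 32)                       # (N, C, H, W)
labels = torch.randint(0, 4, (8, 1, 32, 32))             # (N, 1, H, W) class indices
loader = DataLoader(TensorDataset(images, labels), batch_size=4)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
trainer = create_supervised_trainer(model, optimizer, torch.nn.CrossEntropyLoss(), prepare_batch)
trainer.run(loader, max_epochs=1)
```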

@@ -0,0 +1,46 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import numpy as np
import torch
from git import Repo
from datetime import datetime
import os
def np_to_tb(array):
# if 2D :
if array.ndim == 2:
# HW => CHW
array = np.expand_dims(array, axis=0)
# CHW => NCHW
array = np.expand_dims(array, axis=0)
elif array.ndim == 3:
# HWC => CHW
array = array.transpose(2, 0, 1)
# CHW => NCHW
array = np.expand_dims(array, axis=0)
array = torch.from_numpy(array)
return array
def current_datetime():
return datetime.now().strftime("%b%d_%H%M%S")
def git_branch():
repo = Repo(search_parent_directories=True)
return repo.active_branch.name
def git_hash():
repo = Repo(search_parent_directories=True)
return repo.active_branch.commit.hexsha
def generate_path(base_path, *directories):
path = os.path.join(base_path, *directories)
if not os.path.exists(path):
os.makedirs(path)
return path
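A small sketch using these helpers to build a run-specific output directory and a TensorBoard-ready tensor; the base path is an assumption and the git helpers require running inside a git checkout:
```python
# Minimal sketch: the git helpers above walk parent directories for a .git folder,
# so this must run inside a checkout; the base output path is an assumption.
import numpy as np

run_dir = generate_path("/data/output", git_branch(), current_datetime())
section = np.random.rand(99, 99)   # HW array stand-in
tb_image = np_to_tb(section)       # -> torch tensor with shape (1, 1, 99, 99) (NCHW)
```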

@@ -0,0 +1,94 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch
import ignite
def pixelwise_accuracy(num_classes, output_transform=lambda x: x, device=None):
"""Calculates class accuracy
Args:
num_classes (int): number of classes
output_transform (callable, optional): a callable that is used to transform the
output into the form expected by the metric.
Returns:
MetricsLambda
"""
cm = ignite.metrics.ConfusionMatrix(num_classes=num_classes, output_transform=output_transform, device=device)
# Increase floating point precision and pass to CPU
cm = cm.type(torch.DoubleTensor)
pix_cls = ignite.metrics.confusion_matrix.cmAccuracy(cm)
return pix_cls
def class_accuracy(num_classes, output_transform=lambda x: x, device=None):
"""Calculates class accuracy
Args:
num_classes (int): number of classes
output_transform (callable, optional): a callable that is used to transform the
output into the form expected by the metric.
Returns:
MetricsLambda
"""
cm = ignite.metrics.ConfusionMatrix(num_classes=num_classes, output_transform=output_transform, device=device)
# Increase floating point precision and pass to CPU
cm = cm.type(torch.DoubleTensor)
acc_cls = cm.diag() / (cm.sum(dim=1) + 1e-15)
return acc_cls
def mean_class_accuracy(num_classes, output_transform=lambda x: x, device=None):
"""Calculates mean class accuracy
Args:
num_classes (int): number of classes
output_transform (callable, optional): a callable that is used to transform the
output into the form expected by the metric.
Returns:
MetricsLambda
"""
return class_accuracy(num_classes=num_classes, output_transform=output_transform, device=device).mean()
def class_iou(num_classes, output_transform=lambda x: x, device=None, ignore_index=None):
"""Calculates per-class intersection-over-union
Args:
num_classes (int): number of classes
output_transform (callable, optional): a callable that is used to transform the
output into the form expected by the metric.
Returns:
MetricsLambda
"""
cm = ignite.metrics.ConfusionMatrix(num_classes=num_classes, output_transform=output_transform, device=device)
return ignite.metrics.IoU(cm, ignore_index=ignore_index)
def mean_iou(num_classes, output_transform=lambda x: x, device=None, ignore_index=None):
"""Calculates mean intersection-over-union
Args:
num_classes (int): number of classes
output_transform (callable, optional): a callable that is used to transform the
output into the form expected by the metric.
Returns:
MetricsLambda
"""
cm = ignite.metrics.ConfusionMatrix(num_classes=num_classes, output_transform=output_transform, device=device)
return ignite.metrics.mIoU(cm, ignore_index=ignore_index)
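A minimal sketch attaching these metric factories to an evaluation engine, assuming the evaluator emits the `{"image", "y_pred", "mask"}` dict produced by `val_transform` in the engine module earlier in this diff; the class count is an assumption:
```python
# Minimal sketch: build a metrics dict for create_supervised_evaluator.
# The transform assumes output["y_pred"] is (B, C, H, W) logits and output["mask"]
# is (B, 1, H, W) labels; 6 classes is an illustrative assumption.
n_classes = 6


def metrics_transform(output):
    return output["y_pred"], output["mask"].squeeze(1).long()


metrics = {
    "pixacc": pixelwise_accuracy(n_classes, output_transform=metrics_transform),
    "cacc": class_accuracy(n_classes, output_transform=metrics_transform),
    "mIoU": mean_iou(n_classes, output_transform=metrics_transform),
}
# e.g. evaluator = create_supervised_evaluator(model, prepare_batch, metrics=metrics)
```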

@@ -0,0 +1,10 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import cv_lib.segmentation.models.seg_hrnet # noqa: F401
import cv_lib.segmentation.models.resnet_unet # noqa: F401
import cv_lib.segmentation.models.unet # noqa: F401
import cv_lib.segmentation.models.section_deconvnet # noqa: F401
import cv_lib.segmentation.models.patch_deconvnet # noqa: F401
import cv_lib.segmentation.models.patch_deconvnet_skip # noqa: F401
import cv_lib.segmentation.models.section_deconvnet_skip # noqa: F401

@@ -0,0 +1,308 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch.nn as nn
class patch_deconvnet(nn.Module):
def __init__(self, n_classes=4, learned_billinear=False):
super(patch_deconvnet, self).__init__()
self.learned_billinear = learned_billinear
self.n_classes = n_classes
self.unpool = nn.MaxUnpool2d(2, stride=2)
self.conv_block1 = nn.Sequential(
# conv1_1
nn.Conv2d(1, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv1_2
nn.Conv2d(64, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool1
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_1
# 48*48
self.conv_block2 = nn.Sequential(
# conv2_1
nn.Conv2d(64, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv2_2
nn.Conv2d(128, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool2
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_2
# 24*24
self.conv_block3 = nn.Sequential(
# conv3_1
nn.Conv2d(128, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_2
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_3
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool3
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_3
# 12*12
self.conv_block4 = nn.Sequential(
# conv4_1
nn.Conv2d(256, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool4
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_4
# 6*6
self.conv_block5 = nn.Sequential(
# conv5_1
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool5
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_5
# 3*3
self.conv_block6 = nn.Sequential(
# fc6
nn.Conv2d(512, 4096, 3),
# set the filter size and no padding to make the output 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 1*1
self.conv_block7 = nn.Sequential(
# fc7
nn.Conv2d(4096, 4096, 1),
# set the filter size to make output into 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.deconv_block8 = nn.Sequential(
# fc6-deconv
nn.ConvTranspose2d(4096, 512, 3, stride=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 3*3
self.unpool_block9 = nn.Sequential(
# unpool5
nn.MaxUnpool2d(2, stride=2),
)
# usage unpool(output, indices)
# 6*6
self.deconv_block10 = nn.Sequential(
# deconv5_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_3
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block11 = nn.Sequential(
# unpool4
nn.MaxUnpool2d(2, stride=2),
)
# 12*12
self.deconv_block12 = nn.Sequential(
# deconv4_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_3
nn.ConvTranspose2d(512, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block13 = nn.Sequential(
# unpool3
nn.MaxUnpool2d(2, stride=2),
)
# 24*24
self.deconv_block14 = nn.Sequential(
# deconv3_1
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_2
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_3
nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block15 = nn.Sequential(
# unpool2
nn.MaxUnpool2d(2, stride=2),
)
# 48*48
self.deconv_block16 = nn.Sequential(
# deconv2_1
nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv2_2
nn.ConvTranspose2d(128, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block17 = nn.Sequential(
# unpool1
nn.MaxUnpool2d(2, stride=2),
)
# 96*96
self.deconv_block18 = nn.Sequential(
# deconv1_1
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv1_2
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.seg_score19 = nn.Sequential(
# seg-score
nn.Conv2d(64, self.n_classes, 1),
)
if self.learned_billinear:
raise NotImplementedError
def forward(self, x):
size0 = x.size()
conv1, indices1 = self.conv_block1(x)
size1 = conv1.size()
conv2, indices2 = self.conv_block2(conv1)
size2 = conv2.size()
conv3, indices3 = self.conv_block3(conv2)
size3 = conv3.size()
conv4, indices4 = self.conv_block4(conv3)
size4 = conv4.size()
conv5, indices5 = self.conv_block5(conv4)
conv6 = self.conv_block6(conv5)
conv7 = self.conv_block7(conv6)
conv8 = self.deconv_block8(conv7)
conv9 = self.unpool(conv8, indices5, output_size=size4)
conv10 = self.deconv_block10(conv9)
conv11 = self.unpool(conv10, indices4, output_size=size3)
conv12 = self.deconv_block12(conv11)
conv13 = self.unpool(conv12, indices3, output_size=size2)
conv14 = self.deconv_block14(conv13)
conv15 = self.unpool(conv14, indices2, output_size=size1)
conv16 = self.deconv_block16(conv15)
conv17 = self.unpool(conv16, indices1, output_size=size0)
conv18 = self.deconv_block18(conv17)
out = self.seg_score19(conv18)
return out
def init_vgg16_params(self, vgg16, copy_fc8=True):
blocks = [
self.conv_block1,
self.conv_block2,
self.conv_block3,
self.conv_block4,
self.conv_block5,
]
ranges = [[0, 4], [5, 9], [10, 16], [17, 23], [24, 29]]
features = list(vgg16.features.children())
i_layer = 0
# copy convolutional filters from vgg16
for idx, conv_block in enumerate(blocks):
for l1, l2 in zip(features[ranges[idx][0] : ranges[idx][1]], conv_block):
if isinstance(l1, nn.Conv2d) and isinstance(l2, nn.Conv2d):
if i_layer == 0:
l2.weight.data = (
(l1.weight.data[:, 0, :, :] + l1.weight.data[:, 1, :, :] + l1.weight.data[:, 2, :, :]) / 3.0
).view(l2.weight.size())
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
else:
assert l1.weight.size() == l2.weight.size()
assert l1.bias.size() == l2.bias.size()
l2.weight.data = l1.weight.data
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
def get_seg_model(cfg, **kwargs):
assert (
cfg.MODEL.IN_CHANNELS == 1
), f"Patch deconvnet is not implemented to accept {cfg.MODEL.IN_CHANNELS} channels. Please only pass 1 for cfg.MODEL.IN_CHANNELS"
model = patch_deconvnet(n_classes=cfg.DATASET.NUM_CLASSES)
return model
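A forward-pass smoke test for the network above; the 96x96 patch size is inferred from the pooling comments (96 -> 48 -> 24 -> 12 -> 6 -> 3 -> 1x1 at fc6), and the repo's actual patch size may differ:
```python
# Minimal sketch: smoke-test patch_deconvnet with an assumed 96x96 patch size.
import torch

net = patch_deconvnet(n_classes=4).eval()
with torch.no_grad():
    scores = net(torch.randn(2, 1, 96, 96))
print(scores.shape)  # expected: torch.Size([2, 4, 96, 96])
```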

@@ -0,0 +1,307 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch.nn as nn
class patch_deconvnet_skip(nn.Module):
def __init__(self, n_classes=4, learned_billinear=False):
super(patch_deconvnet_skip, self).__init__()
self.learned_billinear = learned_billinear
self.n_classes = n_classes
self.unpool = nn.MaxUnpool2d(2, stride=2)
self.conv_block1 = nn.Sequential(
# conv1_1
nn.Conv2d(1, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv1_2
nn.Conv2d(64, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool1
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_1
# 48*48
self.conv_block2 = nn.Sequential(
# conv2_1
nn.Conv2d(64, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv2_2
nn.Conv2d(128, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool2
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_2
# 24*24
self.conv_block3 = nn.Sequential(
# conv3_1
nn.Conv2d(128, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_2
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_3
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool3
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_3
# 12*12
self.conv_block4 = nn.Sequential(
# conv4_1
nn.Conv2d(256, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool4
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_4
# 6*6
self.conv_block5 = nn.Sequential(
# conv5_1
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool5
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_5
# 3*3
self.conv_block6 = nn.Sequential(
# fc6
nn.Conv2d(512, 4096, 3),
# set the filter size and no padding to make the output 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 1*1
self.conv_block7 = nn.Sequential(
# fc7
nn.Conv2d(4096, 4096, 1),
# set the filter size to make output into 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.deconv_block8 = nn.Sequential(
# fc6-deconv
nn.ConvTranspose2d(4096, 512, 3, stride=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 3*3
self.unpool_block9 = nn.Sequential(
# unpool5
nn.MaxUnpool2d(2, stride=2),
)
# usage unpool(output, indices)
# 6*6
self.deconv_block10 = nn.Sequential(
# deconv5_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_3
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block11 = nn.Sequential(
# unpool4
nn.MaxUnpool2d(2, stride=2),
)
# 12*12
self.deconv_block12 = nn.Sequential(
# deconv4_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_3
nn.ConvTranspose2d(512, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block13 = nn.Sequential(
# unpool3
nn.MaxUnpool2d(2, stride=2),
)
# 24*24
self.deconv_block14 = nn.Sequential(
# deconv3_1
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_2
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_3
nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block15 = nn.Sequential(
# unpool2
nn.MaxUnpool2d(2, stride=2),
)
# 48*48
self.deconv_block16 = nn.Sequential(
# deconv2_1
nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv2_2
nn.ConvTranspose2d(128, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block17 = nn.Sequential(
# unpool1
nn.MaxUnpool2d(2, stride=2),
)
# 96*96
self.deconv_block18 = nn.Sequential(
# deconv1_1
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv1_2
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.seg_score19 = nn.Sequential(
# seg-score
nn.Conv2d(64, self.n_classes, 1),
)
if self.learned_billinear:
raise NotImplementedError
def forward(self, x):
size0 = x.size()
conv1, indices1 = self.conv_block1(x)
size1 = conv1.size()
conv2, indices2 = self.conv_block2(conv1)
size2 = conv2.size()
conv3, indices3 = self.conv_block3(conv2)
size3 = conv3.size()
conv4, indices4 = self.conv_block4(conv3)
size4 = conv4.size()
conv5, indices5 = self.conv_block5(conv4)
conv6 = self.conv_block6(conv5)
conv7 = self.conv_block7(conv6)
conv8 = self.deconv_block8(conv7) + conv5
conv9 = self.unpool(conv8, indices5, output_size=size4)
conv10 = self.deconv_block10(conv9) + conv4
conv11 = self.unpool(conv10, indices4, output_size=size3)
conv12 = self.deconv_block12(conv11) + conv3
conv13 = self.unpool(conv12, indices3, output_size=size2)
conv14 = self.deconv_block14(conv13) + conv2
conv15 = self.unpool(conv14, indices2, output_size=size1)
conv16 = self.deconv_block16(conv15) + conv1
conv17 = self.unpool(conv16, indices1, output_size=size0)
conv18 = self.deconv_block18(conv17)
out = self.seg_score19(conv18)
return out
def init_vgg16_params(self, vgg16, copy_fc8=True):
blocks = [
self.conv_block1,
self.conv_block2,
self.conv_block3,
self.conv_block4,
self.conv_block5,
]
ranges = [[0, 4], [5, 9], [10, 16], [17, 23], [24, 29]]
features = list(vgg16.features.children())
i_layer = 0
# copy convolutional filters from vgg16
for idx, conv_block in enumerate(blocks):
for l1, l2 in zip(features[ranges[idx][0] : ranges[idx][1]], conv_block):
if isinstance(l1, nn.Conv2d) and isinstance(l2, nn.Conv2d):
if i_layer == 0:
l2.weight.data = (
(l1.weight.data[:, 0, :, :] + l1.weight.data[:, 1, :, :] + l1.weight.data[:, 2, :, :]) / 3.0
).view(l2.weight.size())
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
else:
assert l1.weight.size() == l2.weight.size()
assert l1.bias.size() == l2.bias.size()
l2.weight.data = l1.weight.data
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
def get_seg_model(cfg, **kwargs):
assert (
cfg.MODEL.IN_CHANNELS == 1
), f"Patch deconvnet is not implemented to accept {cfg.MODEL.IN_CHANNELS} channels. Please only pass 1 for cfg.MODEL.IN_CHANNELS"
model = patch_deconvnet_skip(n_classes=cfg.DATASET.NUM_CLASSES)
return model
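# Illustrative sketch, not part of the original file: the shape comments above imply
# single-channel 96*96 patches, and 6 classes is used here only as an example value for
# cfg.DATASET.NUM_CLASSES (the F3 facies labels).
import torch
model = patch_deconvnet_skip(n_classes=6)
scores = model(torch.rand(1, 1, 96, 96))  # -> (1, 6, 96, 96) per-class scores
# model.init_vgg16_params(vgg16) can optionally copy encoder weights from a torchvision VGG16.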


@ -0,0 +1,365 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
class FPAv2(nn.Module):
def __init__(self, input_dim, output_dim):
super(FPAv2, self).__init__()
self.glob = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(input_dim, output_dim, kernel_size=1, bias=False),)
self.down2_1 = nn.Sequential(
nn.Conv2d(input_dim, input_dim, kernel_size=5, stride=2, padding=2, bias=False),
nn.BatchNorm2d(input_dim),
nn.ELU(True),
)
self.down2_2 = nn.Sequential(
nn.Conv2d(input_dim, output_dim, kernel_size=5, padding=2, bias=False),
nn.BatchNorm2d(output_dim),
nn.ELU(True),
)
self.down3_1 = nn.Sequential(
nn.Conv2d(input_dim, input_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(input_dim),
nn.ELU(True),
)
self.down3_2 = nn.Sequential(
nn.Conv2d(input_dim, output_dim, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(output_dim),
nn.ELU(True),
)
self.conv1 = nn.Sequential(
nn.Conv2d(input_dim, output_dim, kernel_size=1, bias=False), nn.BatchNorm2d(output_dim), nn.ELU(True),
)
def forward(self, x):
# x shape: 512, 16, 16
x_glob = self.glob(x) # 256, 1, 1
x_glob = F.upsample(x_glob, scale_factor=16, mode="bilinear", align_corners=True) # 256, 16, 16
d2 = self.down2_1(x) # 512, 8, 8
d3 = self.down3_1(d2) # 512, 4, 4
d2 = self.down2_2(d2) # 256, 8, 8
d3 = self.down3_2(d3) # 256, 4, 4
d3 = F.upsample(d3, scale_factor=2, mode="bilinear", align_corners=True) # 256, 8, 8
d2 = d2 + d3
d2 = F.upsample(d2, scale_factor=2, mode="bilinear", align_corners=True) # 256, 16, 16
x = self.conv1(x) # 256, 16, 16
x = x * d2
x = x + x_glob
return x
def conv3x3(input_dim, output_dim, rate=1):
return nn.Sequential(
nn.Conv2d(input_dim, output_dim, kernel_size=3, dilation=rate, padding=rate, bias=False,),
nn.BatchNorm2d(output_dim),
nn.ELU(True),
)
class SpatialAttention2d(nn.Module):
def __init__(self, channel):
super(SpatialAttention2d, self).__init__()
self.squeeze = nn.Conv2d(channel, 1, kernel_size=1, bias=False)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
z = self.squeeze(x)
z = self.sigmoid(z)
return x * z
class GAB(nn.Module):
def __init__(self, input_dim, reduction=4):
super(GAB, self).__init__()
self.global_avgpool = nn.AdaptiveAvgPool2d(1)
self.conv1 = nn.Conv2d(input_dim, input_dim // reduction, kernel_size=1, stride=1)
self.conv2 = nn.Conv2d(input_dim // reduction, input_dim, kernel_size=1, stride=1)
self.relu = nn.ReLU(inplace=True)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
z = self.global_avgpool(x)
z = self.relu(self.conv1(z))
z = self.sigmoid(self.conv2(z))
return x * z
class Decoder(nn.Module):
def __init__(self, in_channels, channels, out_channels):
super(Decoder, self).__init__()
self.conv1 = conv3x3(in_channels, channels)
self.conv2 = conv3x3(channels, out_channels)
self.s_att = SpatialAttention2d(out_channels)
self.c_att = GAB(out_channels, 16)
def forward(self, x, e=None):
x = F.upsample(input=x, scale_factor=2, mode="bilinear", align_corners=True)
if e is not None:
x = torch.cat([x, e], 1)
x = self.conv1(x)
x = self.conv2(x)
s = self.s_att(x)
c = self.c_att(x)
output = s + c
return output
class Decoderv2(nn.Module):
def __init__(self, up_in, x_in, n_out):
super(Decoderv2, self).__init__()
up_out = x_out = n_out // 2
self.x_conv = nn.Conv2d(x_in, x_out, 1, bias=False)
self.tr_conv = nn.ConvTranspose2d(up_in, up_out, 2, stride=2)
self.bn = nn.BatchNorm2d(n_out)
self.relu = nn.ReLU(True)
self.s_att = SpatialAttention2d(n_out)
self.c_att = GAB(n_out, 16)
def forward(self, up_p, x_p):
up_p = self.tr_conv(up_p)
x_p = self.x_conv(x_p)
cat_p = torch.cat([up_p, x_p], 1)
cat_p = self.relu(self.bn(cat_p))
s = self.s_att(cat_p)
c = self.c_att(cat_p)
return s + c
class SCse(nn.Module):
def __init__(self, dim):
super(SCse, self).__init__()
self.satt = SpatialAttention2d(dim)
self.catt = GAB(dim)
def forward(self, x):
return self.satt(x) + self.catt(x)
# stage1 model
class Res34Unetv4(nn.Module):
def __init__(self, n_classes=1):
super(Res34Unetv4, self).__init__()
self.resnet = torchvision.models.resnet34(True)
self.conv1 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu)
self.encode2 = nn.Sequential(self.resnet.layer1, SCse(64))
self.encode3 = nn.Sequential(self.resnet.layer2, SCse(128))
self.encode4 = nn.Sequential(self.resnet.layer3, SCse(256))
self.encode5 = nn.Sequential(self.resnet.layer4, SCse(512))
self.center = nn.Sequential(FPAv2(512, 256), nn.MaxPool2d(2, 2))
self.decode5 = Decoderv2(256, 512, 64)
self.decode4 = Decoderv2(64, 256, 64)
self.decode3 = Decoderv2(64, 128, 64)
self.decode2 = Decoderv2(64, 64, 64)
self.decode1 = Decoder(64, 32, 64)
self.logit = nn.Sequential(
nn.Conv2d(320, 64, kernel_size=3, padding=1),
nn.ELU(True),
nn.Conv2d(64, n_classes, kernel_size=1, bias=False),
)
def forward(self, x):
# x: (batch_size, 3, 256, 256)
x = self.conv1(x) # 64, 128, 128
e2 = self.encode2(x) # 64, 128, 128
e3 = self.encode3(e2) # 128, 64, 64
e4 = self.encode4(e3) # 256, 32, 32
e5 = self.encode5(e4) # 512, 16, 16
f = self.center(e5) # 256, 8, 8
d5 = self.decode5(f, e5) # 64, 16, 16
d4 = self.decode4(d5, e4) # 64, 32, 32
d3 = self.decode3(d4, e3) # 64, 64, 64
d2 = self.decode2(d3, e2) # 64, 128, 128
d1 = self.decode1(d2) # 64, 256, 256
f = torch.cat(
(
d1,
F.upsample(d2, scale_factor=2, mode="bilinear", align_corners=True),
F.upsample(d3, scale_factor=4, mode="bilinear", align_corners=True),
F.upsample(d4, scale_factor=8, mode="bilinear", align_corners=True),
F.upsample(d5, scale_factor=16, mode="bilinear", align_corners=True),
),
1,
) # 320, 256, 256
logit = self.logit(f) # 1, 256, 256
return logit
# stage2 model
class Res34Unetv3(nn.Module):
def __init__(self):
super(Res34Unetv3, self).__init__()
self.resnet = torchvision.models.resnet34(True)
self.conv1 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu)
self.encode2 = nn.Sequential(self.resnet.layer1, SCse(64))
self.encode3 = nn.Sequential(self.resnet.layer2, SCse(128))
self.encode4 = nn.Sequential(self.resnet.layer3, SCse(256))
self.encode5 = nn.Sequential(self.resnet.layer4, SCse(512))
self.center = nn.Sequential(FPAv2(512, 256), nn.MaxPool2d(2, 2))
self.decode5 = Decoderv2(256, 512, 64)
self.decode4 = Decoderv2(64, 256, 64)
self.decode3 = Decoderv2(64, 128, 64)
self.decode2 = Decoderv2(64, 64, 64)
self.decode1 = Decoder(64, 32, 64)
self.dropout2d = nn.Dropout2d(0.4)
self.dropout = nn.Dropout(0.4)
self.fuse_pixel = conv3x3(320, 64)
self.logit_pixel = nn.Conv2d(64, 1, kernel_size=1, bias=False)
self.fuse_image = nn.Sequential(nn.Linear(512, 64), nn.ELU(True))
self.logit_image = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())
self.logit = nn.Sequential(
nn.Conv2d(128, 64, kernel_size=3, padding=1, bias=False),
nn.ELU(True),
nn.Conv2d(64, 1, kernel_size=1, bias=False),
)
def forward(self, x):
# x: (batch_size, 3, 256, 256)
batch_size, c, h, w = x.shape
x = self.conv1(x) # 64, 128, 128
e2 = self.encode2(x) # 64, 128, 128
e3 = self.encode3(e2) # 128, 64, 64
e4 = self.encode4(e3) # 256, 32, 32
e5 = self.encode5(e4) # 512, 16, 16
e = F.adaptive_avg_pool2d(e5, output_size=1).view(batch_size, -1) # 512
e = self.dropout(e)
f = self.center(e5) # 256, 8, 8
d5 = self.decode5(f, e5) # 64, 16, 16
d4 = self.decode4(d5, e4) # 64, 32, 32
d3 = self.decode3(d4, e3) # 64, 64, 64
d2 = self.decode2(d3, e2) # 64, 128, 128
d1 = self.decode1(d2) # 64, 256, 256
f = torch.cat(
(
d1,
F.upsample(d2, scale_factor=2, mode="bilinear", align_corners=True),
F.upsample(d3, scale_factor=4, mode="bilinear", align_corners=True),
F.upsample(d4, scale_factor=8, mode="bilinear", align_corners=True),
F.upsample(d5, scale_factor=16, mode="bilinear", align_corners=True),
),
1,
) # 320, 256, 256
f = self.dropout2d(f)
# segmentation process
fuse_pixel = self.fuse_pixel(f) # 64, 256, 256
logit_pixel = self.logit_pixel(fuse_pixel) # 1, 256, 256
# classification process
fuse_image = self.fuse_image(e) # 64
logit_image = self.logit_image(fuse_image) # 1
# combine segmentation and classification
fuse = torch.cat(
[
fuse_pixel,
F.upsample(
fuse_image.view(batch_size, -1, 1, 1), scale_factor=256, mode="bilinear", align_corners=True,
),
],
1,
) # 128, 256, 256
logit = self.logit(fuse) # 1, 256, 256
return logit, logit_pixel, logit_image.view(-1)
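# Note, not in the original file: unlike Res34Unetv4 above, this stage-2 model returns three
# outputs (the fused segmentation logit, a pixel-level logit and an image-level logit), so its
# training loss is expected to combine segmentation and image-level supervision.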
# stage3 model
class Res34Unetv5(nn.Module):
def __init__(self):
super(Res34Unetv5, self).__init__()
self.resnet = torchvision.models.resnet34(True)
self.conv1 = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False), self.resnet.bn1, self.resnet.relu,
)
self.encode2 = nn.Sequential(self.resnet.layer1, SCse(64))
self.encode3 = nn.Sequential(self.resnet.layer2, SCse(128))
self.encode4 = nn.Sequential(self.resnet.layer3, SCse(256))
self.encode5 = nn.Sequential(self.resnet.layer4, SCse(512))
self.center = nn.Sequential(FPAv2(512, 256), nn.MaxPool2d(2, 2))
self.decode5 = Decoderv2(256, 512, 64)
self.decode4 = Decoderv2(64, 256, 64)
self.decode3 = Decoderv2(64, 128, 64)
self.decode2 = Decoderv2(64, 64, 64)
self.logit = nn.Sequential(
nn.Conv2d(256, 32, kernel_size=3, padding=1), nn.ELU(True), nn.Conv2d(32, 1, kernel_size=1, bias=False),
)
def forward(self, x):
# x: batch_size, 3, 128, 128
x = self.conv1(x) # 64, 128, 128
e2 = self.encode2(x) # 64, 128, 128
e3 = self.encode3(e2) # 128, 64, 64
e4 = self.encode4(e3) # 256, 32, 32
e5 = self.encode5(e4) # 512, 16, 16
f = self.center(e5) # 256, 8, 8
d5 = self.decode5(f, e5) # 64, 16, 16
d4 = self.decode4(d5, e4) # 64, 32, 32
d3 = self.decode3(d4, e3) # 64, 64, 64
d2 = self.decode2(d3, e2) # 64, 128, 128
f = torch.cat(
(
d2,
F.upsample(d3, scale_factor=2, mode="bilinear", align_corners=True),
F.upsample(d4, scale_factor=4, mode="bilinear", align_corners=True),
F.upsample(d5, scale_factor=8, mode="bilinear", align_corners=True),
),
1,
) # 256, 128, 128
f = F.dropout2d(f, p=0.4)
logit = self.logit(f) # 1, 128, 128
return logit
def get_seg_model(cfg, **kwargs):
assert (
cfg.MODEL.IN_CHANNELS == 3
), f"SEResnet Unet deconvnet is not implemented to accept {cfg.MODEL.IN_CHANNELS} channels. Please only pass 3 for cfg.MODEL.IN_CHANNELS"
model = Res34Unetv4(n_classes=cfg.DATASET.NUM_CLASSES)
return model
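# Illustrative sketch, not part of the original file: the shape comments in forward() imply
# 3-channel 256*256 inputs (FPAv2 hard-codes a 16x upsample of its global branch), and the
# pretrained ResNet-34 weights are downloaded on first use.
import torch
model = Res34Unetv4(n_classes=1)
logit = model(torch.rand(2, 3, 256, 256))  # -> (2, 1, 256, 256)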


@ -0,0 +1,307 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch.nn as nn
class section_deconvnet(nn.Module):
def __init__(self, n_classes=4, learned_billinear=False):
super(section_deconvnet, self).__init__()
self.learned_billinear = learned_billinear
self.n_classes = n_classes
self.unpool = nn.MaxUnpool2d(2, stride=2)
self.conv_block1 = nn.Sequential(
# conv1_1
nn.Conv2d(1, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv1_2
nn.Conv2d(64, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool1
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_1
# 48*48
self.conv_block2 = nn.Sequential(
# conv2_1
nn.Conv2d(64, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv2_2
nn.Conv2d(128, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool2
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_2
# 24*24
self.conv_block3 = nn.Sequential(
# conv3_1
nn.Conv2d(128, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_2
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_3
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool3
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_3
# 12*12
self.conv_block4 = nn.Sequential(
# conv4_1
nn.Conv2d(256, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool4
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_4
# 6*6
self.conv_block5 = nn.Sequential(
# conv5_1
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool5
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_5
# 3*3
self.conv_block6 = nn.Sequential(
# fc6
nn.Conv2d(512, 4096, 3),
# set the filter size and no padding to make the output 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 1*1
self.conv_block7 = nn.Sequential(
# fc7
nn.Conv2d(4096, 4096, 1),
# set the filter size to make output into 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.deconv_block8 = nn.Sequential(
# fc6-deconv
nn.ConvTranspose2d(4096, 512, 3, stride=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 3*3
self.unpool_block9 = nn.Sequential(
# unpool5
nn.MaxUnpool2d(2, stride=2),
)
# usage unpool(output, indices)
# 6*6
self.deconv_block10 = nn.Sequential(
# deconv5_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_3
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block11 = nn.Sequential(
# unpool4
nn.MaxUnpool2d(2, stride=2),
)
# 12*12
self.deconv_block12 = nn.Sequential(
# deconv4_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_3
nn.ConvTranspose2d(512, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block13 = nn.Sequential(
# unpool3
nn.MaxUnpool2d(2, stride=2),
)
# 24*24
self.deconv_block14 = nn.Sequential(
# deconv3_1
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_2
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_3
nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block15 = nn.Sequential(
# unpool2
nn.MaxUnpool2d(2, stride=2),
)
# 48*48
self.deconv_block16 = nn.Sequential(
# deconv2_1
nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv2_2
nn.ConvTranspose2d(128, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block17 = nn.Sequential(
# unpool1
nn.MaxUnpool2d(2, stride=2),
)
# 96*96
self.deconv_block18 = nn.Sequential(
# deconv1_1
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv1_2
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.seg_score19 = nn.Sequential(
# seg-score
nn.Conv2d(64, self.n_classes, 1),
)
if self.learned_billinear:
raise NotImplementedError
def forward(self, x):
size0 = x.size()
conv1, indices1 = self.conv_block1(x)
size1 = conv1.size()
conv2, indices2 = self.conv_block2(conv1)
size2 = conv2.size()
conv3, indices3 = self.conv_block3(conv2)
size3 = conv3.size()
conv4, indices4 = self.conv_block4(conv3)
size4 = conv4.size()
conv5, indices5 = self.conv_block5(conv4)
conv6 = self.conv_block6(conv5)
conv7 = self.conv_block7(conv6)
conv8 = self.deconv_block8(conv7)
conv9 = self.unpool(conv8, indices5, output_size=size4)
conv10 = self.deconv_block10(conv9)
conv11 = self.unpool(conv10, indices4, output_size=size3)
conv12 = self.deconv_block12(conv11)
conv13 = self.unpool(conv12, indices3, output_size=size2)
conv14 = self.deconv_block14(conv13)
conv15 = self.unpool(conv14, indices2, output_size=size1)
conv16 = self.deconv_block16(conv15)
conv17 = self.unpool(conv16, indices1, output_size=size0)
conv18 = self.deconv_block18(conv17)
out = self.seg_score19(conv18)
return out
def init_vgg16_params(self, vgg16, copy_fc8=True):
blocks = [
self.conv_block1,
self.conv_block2,
self.conv_block3,
self.conv_block4,
self.conv_block5,
]
ranges = [[0, 4], [5, 9], [10, 16], [17, 23], [24, 29]]
features = list(vgg16.features.children())
i_layer = 0
# copy convolutional filters from vgg16
for idx, conv_block in enumerate(blocks):
for l1, l2 in zip(features[ranges[idx][0] : ranges[idx][1]], conv_block):
if isinstance(l1, nn.Conv2d) and isinstance(l2, nn.Conv2d):
if i_layer == 0:
l2.weight.data = (
(l1.weight.data[:, 0, :, :] + l1.weight.data[:, 1, :, :] + l1.weight.data[:, 2, :, :]) / 3.0
).view(l2.weight.size())
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
else:
assert l1.weight.size() == l2.weight.size()
assert l1.bias.size() == l2.bias.size()
l2.weight.data = l1.weight.data
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
def get_seg_model(cfg, **kwargs):
assert (
cfg.MODEL.IN_CHANNELS == 1
), f"Section deconvnet is not implemented to accept {cfg.MODEL.IN_CHANNELS} channels. Please only pass 1 for cfg.MODEL.IN_CHANNELS"
model = section_deconvnet(n_classes=cfg.DATASET.NUM_CLASSES)
return model
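# Note, not in the original file: section_deconvnet is architecturally identical to the
# *_skip variants, except that its decoder does not add the encoder feature maps back in
# (there are no "+ convN" terms in forward() above).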


@ -0,0 +1,307 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch.nn as nn
class section_deconvnet_skip(nn.Module):
def __init__(self, n_classes=4, learned_billinear=False):
super(section_deconvnet_skip, self).__init__()
self.learned_billinear = learned_billinear
self.n_classes = n_classes
self.unpool = nn.MaxUnpool2d(2, stride=2)
self.conv_block1 = nn.Sequential(
# conv1_1
nn.Conv2d(1, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv1_2
nn.Conv2d(64, 64, 3, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool1
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_1
# 48*48
self.conv_block2 = nn.Sequential(
# conv2_1
nn.Conv2d(64, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv2_2
nn.Conv2d(128, 128, 3, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool2
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_2
# 24*24
self.conv_block3 = nn.Sequential(
# conv3_1
nn.Conv2d(128, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_2
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv3_3
nn.Conv2d(256, 256, 3, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool3
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_3
# 12*12
self.conv_block4 = nn.Sequential(
# conv4_1
nn.Conv2d(256, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv4_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool4
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_4
# 6*6
self.conv_block5 = nn.Sequential(
# conv5_1
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_2
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# conv5_3
nn.Conv2d(512, 512, 3, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# pool5
nn.MaxPool2d(2, stride=2, return_indices=True, ceil_mode=True),
)
# it returns outputs and pool_indices_5
# 3*3
self.conv_block6 = nn.Sequential(
# fc6
nn.Conv2d(512, 4096, 3),
# set the filter size and no padding to make the output 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 1*1
self.conv_block7 = nn.Sequential(
# fc7
nn.Conv2d(4096, 4096, 1),
# set the filter size to make output into 1*1
nn.BatchNorm2d(4096, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.deconv_block8 = nn.Sequential(
# fc6-deconv
nn.ConvTranspose2d(4096, 512, 3, stride=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
# 3*3
self.unpool_block9 = nn.Sequential(
# unpool5
nn.MaxUnpool2d(2, stride=2),
)
# usage unpool(output, indices)
# 6*6
self.deconv_block10 = nn.Sequential(
# deconv5_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv5_3
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block11 = nn.Sequential(
# unpool4
nn.MaxUnpool2d(2, stride=2),
)
# 12*12
self.deconv_block12 = nn.Sequential(
# deconv4_1
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_2
nn.ConvTranspose2d(512, 512, 3, stride=1, padding=1),
nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv4_3
nn.ConvTranspose2d(512, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block13 = nn.Sequential(
# unpool3
nn.MaxUnpool2d(2, stride=2),
)
# 24*24
self.deconv_block14 = nn.Sequential(
# deconv3_1
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_2
nn.ConvTranspose2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv3_3
nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block15 = nn.Sequential(
# unpool2
nn.MaxUnpool2d(2, stride=2),
)
# 48*48
self.deconv_block16 = nn.Sequential(
# deconv2_1
nn.ConvTranspose2d(128, 128, 3, stride=1, padding=1),
nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv2_2
nn.ConvTranspose2d(128, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.unpool_block17 = nn.Sequential(
# unpool1
nn.MaxUnpool2d(2, stride=2),
)
# 96*96
self.deconv_block18 = nn.Sequential(
# deconv1_1
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
# deconv1_2
nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True),
nn.ReLU(inplace=True),
)
self.seg_score19 = nn.Sequential(
# seg-score
nn.Conv2d(64, self.n_classes, 1),
)
if self.learned_billinear:
raise NotImplementedError
def forward(self, x):
size0 = x.size()
conv1, indices1 = self.conv_block1(x)
size1 = conv1.size()
conv2, indices2 = self.conv_block2(conv1)
size2 = conv2.size()
conv3, indices3 = self.conv_block3(conv2)
size3 = conv3.size()
conv4, indices4 = self.conv_block4(conv3)
size4 = conv4.size()
conv5, indices5 = self.conv_block5(conv4)
conv6 = self.conv_block6(conv5)
conv7 = self.conv_block7(conv6)
conv8 = self.deconv_block8(conv7) + conv5
conv9 = self.unpool(conv8, indices5, output_size=size4)
conv10 = self.deconv_block10(conv9) + conv4
conv11 = self.unpool(conv10, indices4, output_size=size3)
conv12 = self.deconv_block12(conv11) + conv3
conv13 = self.unpool(conv12, indices3, output_size=size2)
conv14 = self.deconv_block14(conv13) + conv2
conv15 = self.unpool(conv14, indices2, output_size=size1)
conv16 = self.deconv_block16(conv15) + conv1
conv17 = self.unpool(conv16, indices1, output_size=size0)
conv18 = self.deconv_block18(conv17)
out = self.seg_score19(conv18)
return out
def init_vgg16_params(self, vgg16, copy_fc8=True):
blocks = [
self.conv_block1,
self.conv_block2,
self.conv_block3,
self.conv_block4,
self.conv_block5,
]
ranges = [[0, 4], [5, 9], [10, 16], [17, 23], [24, 29]]
features = list(vgg16.features.children())
i_layer = 0
# copy convolutional filters from vgg16
for idx, conv_block in enumerate(blocks):
for l1, l2 in zip(features[ranges[idx][0] : ranges[idx][1]], conv_block):
if isinstance(l1, nn.Conv2d) and isinstance(l2, nn.Conv2d):
if i_layer == 0:
l2.weight.data = (
(l1.weight.data[:, 0, :, :] + l1.weight.data[:, 1, :, :] + l1.weight.data[:, 2, :, :]) / 3.0
).view(l2.weight.size())
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
else:
assert l1.weight.size() == l2.weight.size()
assert l1.bias.size() == l2.bias.size()
l2.weight.data = l1.weight.data
l2.bias.data = l1.bias.data
i_layer = i_layer + 1
def get_seg_model(cfg, **kwargs):
assert (
cfg.MODEL.IN_CHANNELS == 1
), f"Section deconvnet is not implemented to accept {cfg.MODEL.IN_CHANNELS} channels. Please only pass 1 for cfg.MODEL.IN_CHANNELS"
model = section_deconvnet_skip(n_classes=cfg.DATASET.NUM_CLASSES)
return model


@ -0,0 +1,446 @@
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Ke Sun (sunk@mail.ustc.edu.cn)
# ------------------------------------------------------------------------------
"""HRNET for segmentation taken from https://github.com/HRNet/HRNet-Semantic-Segmentation
pytorch-v1.1 branch
hash: 06142dc1c7026e256a7561c3e875b06622b5670f
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import os
import numpy as np
import torch
import torch._utils
import torch.nn as nn
import torch.nn.functional as F
BatchNorm2d = nn.BatchNorm2d
BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)
def conv3x3(in_planes, out_planes, stride=1):
"""3x3 convolution with padding"""
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
self.bn1 = BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
self.bn2 = BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
self.bn3 = BatchNorm2d(planes * self.expansion, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class HighResolutionModule(nn.Module):
def __init__(
self, num_branches, blocks, num_blocks, num_inchannels, num_channels, fuse_method, multi_scale_output=True,
):
super(HighResolutionModule, self).__init__()
self._check_branches(num_branches, blocks, num_blocks, num_inchannels, num_channels)
self.num_inchannels = num_inchannels
self.fuse_method = fuse_method
self.num_branches = num_branches
self.multi_scale_output = multi_scale_output
self.branches = self._make_branches(num_branches, blocks, num_blocks, num_channels)
self.fuse_layers = self._make_fuse_layers()
self.relu = nn.ReLU(inplace=True)
def _check_branches(self, num_branches, blocks, num_blocks, num_inchannels, num_channels):
if num_branches != len(num_blocks):
error_msg = "NUM_BRANCHES({}) <> NUM_BLOCKS({})".format(num_branches, len(num_blocks))
logger.error(error_msg)
raise ValueError(error_msg)
if num_branches != len(num_channels):
error_msg = "NUM_BRANCHES({}) <> NUM_CHANNELS({})".format(num_branches, len(num_channels))
logger.error(error_msg)
raise ValueError(error_msg)
if num_branches != len(num_inchannels):
error_msg = "NUM_BRANCHES({}) <> NUM_INCHANNELS({})".format(num_branches, len(num_inchannels))
logger.error(error_msg)
raise ValueError(error_msg)
def _make_one_branch(self, branch_index, block, num_blocks, num_channels, stride=1):
downsample = None
if stride != 1 or self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(
self.num_inchannels[branch_index],
num_channels[branch_index] * block.expansion,
kernel_size=1,
stride=stride,
bias=False,
),
BatchNorm2d(num_channels[branch_index] * block.expansion, momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(self.num_inchannels[branch_index], num_channels[branch_index], stride, downsample,))
self.num_inchannels[branch_index] = num_channels[branch_index] * block.expansion
for i in range(1, num_blocks[branch_index]):
layers.append(block(self.num_inchannels[branch_index], num_channels[branch_index]))
return nn.Sequential(*layers)
def _make_branches(self, num_branches, block, num_blocks, num_channels):
branches = []
for i in range(num_branches):
branches.append(self._make_one_branch(i, block, num_blocks, num_channels))
return nn.ModuleList(branches)
def _make_fuse_layers(self):
if self.num_branches == 1:
return None
num_branches = self.num_branches
num_inchannels = self.num_inchannels
fuse_layers = []
for i in range(num_branches if self.multi_scale_output else 1):
fuse_layer = []
for j in range(num_branches):
if j > i:
fuse_layer.append(
nn.Sequential(
nn.Conv2d(num_inchannels[j], num_inchannels[i], 1, 1, 0, bias=False,),
BatchNorm2d(num_inchannels[i], momentum=BN_MOMENTUM),
)
)
elif j == i:
fuse_layer.append(None)
else:
conv3x3s = []
for k in range(i - j):
if k == i - j - 1:
num_outchannels_conv3x3 = num_inchannels[i]
conv3x3s.append(
nn.Sequential(
nn.Conv2d(num_inchannels[j], num_outchannels_conv3x3, 3, 2, 1, bias=False,),
BatchNorm2d(num_outchannels_conv3x3, momentum=BN_MOMENTUM),
)
)
else:
num_outchannels_conv3x3 = num_inchannels[j]
conv3x3s.append(
nn.Sequential(
nn.Conv2d(num_inchannels[j], num_outchannels_conv3x3, 3, 2, 1, bias=False,),
BatchNorm2d(num_outchannels_conv3x3, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True),
)
)
fuse_layer.append(nn.Sequential(*conv3x3s))
fuse_layers.append(nn.ModuleList(fuse_layer))
return nn.ModuleList(fuse_layers)
def get_num_inchannels(self):
return self.num_inchannels
def forward(self, x):
if self.num_branches == 1:
return [self.branches[0](x[0])]
for i in range(self.num_branches):
x[i] = self.branches[i](x[i])
x_fuse = []
for i in range(len(self.fuse_layers)):
y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
for j in range(1, self.num_branches):
if i == j:
y = y + x[j]
elif j > i:
width_output = x[i].shape[-1]
height_output = x[i].shape[-2]
y = y + F.interpolate(
self.fuse_layers[i][j](x[j]), size=[height_output, width_output], mode="bilinear",
)
else:
y = y + self.fuse_layers[i][j](x[j])
x_fuse.append(self.relu(y))
return x_fuse
blocks_dict = {"BASIC": BasicBlock, "BOTTLENECK": Bottleneck}
class HighResolutionNet(nn.Module):
def __init__(self, config, **kwargs):
extra = config.MODEL.EXTRA
super(HighResolutionNet, self).__init__()
# stem net
self.conv1 = nn.Conv2d(config.MODEL.IN_CHANNELS, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.bn1 = BatchNorm2d(64, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.bn2 = BatchNorm2d(64, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.layer1 = self._make_layer(Bottleneck, 64, 64, 4)
self.stage2_cfg = extra["STAGE2"]
num_channels = self.stage2_cfg["NUM_CHANNELS"]
block = blocks_dict[self.stage2_cfg["BLOCK"]]
num_channels = [num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition1 = self._make_transition_layer([256], num_channels)
self.stage2, pre_stage_channels = self._make_stage(self.stage2_cfg, num_channels)
self.stage3_cfg = extra["STAGE3"]
num_channels = self.stage3_cfg["NUM_CHANNELS"]
block = blocks_dict[self.stage3_cfg["BLOCK"]]
num_channels = [num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition2 = self._make_transition_layer(pre_stage_channels, num_channels)
self.stage3, pre_stage_channels = self._make_stage(self.stage3_cfg, num_channels)
self.stage4_cfg = extra["STAGE4"]
num_channels = self.stage4_cfg["NUM_CHANNELS"]
block = blocks_dict[self.stage4_cfg["BLOCK"]]
num_channels = [num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition3 = self._make_transition_layer(pre_stage_channels, num_channels)
self.stage4, pre_stage_channels = self._make_stage(self.stage4_cfg, num_channels, multi_scale_output=True)
last_inp_channels = int(np.sum(pre_stage_channels))
self.last_layer = nn.Sequential(
nn.Conv2d(
in_channels=last_inp_channels, out_channels=last_inp_channels, kernel_size=1, stride=1, padding=0,
),
BatchNorm2d(last_inp_channels, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True),
nn.Conv2d(
in_channels=last_inp_channels,
out_channels=config.DATASET.NUM_CLASSES,
kernel_size=extra.FINAL_CONV_KERNEL,
stride=1,
padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0,
),
)
def _make_transition_layer(self, num_channels_pre_layer, num_channels_cur_layer):
num_branches_cur = len(num_channels_cur_layer)
num_branches_pre = len(num_channels_pre_layer)
transition_layers = []
for i in range(num_branches_cur):
if i < num_branches_pre:
if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
transition_layers.append(
nn.Sequential(
nn.Conv2d(num_channels_pre_layer[i], num_channels_cur_layer[i], 3, 1, 1, bias=False,),
BatchNorm2d(num_channels_cur_layer[i], momentum=BN_MOMENTUM),
nn.ReLU(inplace=True),
)
)
else:
transition_layers.append(None)
else:
conv3x3s = []
for j in range(i + 1 - num_branches_pre):
inchannels = num_channels_pre_layer[-1]
outchannels = num_channels_cur_layer[i] if j == i - num_branches_pre else inchannels
conv3x3s.append(
nn.Sequential(
nn.Conv2d(inchannels, outchannels, 3, 2, 1, bias=False),
BatchNorm2d(outchannels, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True),
)
)
transition_layers.append(nn.Sequential(*conv3x3s))
return nn.ModuleList(transition_layers)
def _make_layer(self, block, inplanes, planes, blocks, stride=1):
downsample = None
if stride != 1 or inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False,),
BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(inplanes, planes, stride, downsample))
inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(inplanes, planes))
return nn.Sequential(*layers)
def _make_stage(self, layer_config, num_inchannels, multi_scale_output=True):
num_modules = layer_config["NUM_MODULES"]
num_branches = layer_config["NUM_BRANCHES"]
num_blocks = layer_config["NUM_BLOCKS"]
num_channels = layer_config["NUM_CHANNELS"]
block = blocks_dict[layer_config["BLOCK"]]
fuse_method = layer_config["FUSE_METHOD"]
modules = []
for i in range(num_modules):
# multi_scale_output is only used by the last module
if not multi_scale_output and i == num_modules - 1:
reset_multi_scale_output = False
else:
reset_multi_scale_output = True
modules.append(
HighResolutionModule(
num_branches,
block,
num_blocks,
num_inchannels,
num_channels,
fuse_method,
reset_multi_scale_output,
)
)
num_inchannels = modules[-1].get_num_inchannels()
return nn.Sequential(*modules), num_inchannels
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.bn2(x)
x = self.relu(x)
x = self.layer1(x)
x_list = []
for i in range(self.stage2_cfg["NUM_BRANCHES"]):
if self.transition1[i] is not None:
x_list.append(self.transition1[i](x))
else:
x_list.append(x)
y_list = self.stage2(x_list)
x_list = []
for i in range(self.stage3_cfg["NUM_BRANCHES"]):
if self.transition2[i] is not None:
x_list.append(self.transition2[i](y_list[-1]))
else:
x_list.append(y_list[i])
y_list = self.stage3(x_list)
x_list = []
for i in range(self.stage4_cfg["NUM_BRANCHES"]):
if self.transition3[i] is not None:
x_list.append(self.transition3[i](y_list[-1]))
else:
x_list.append(y_list[i])
x = self.stage4(x_list)
# Upsampling
x0_h, x0_w = x[0].size(2), x[0].size(3)
x1 = F.upsample(x[1], size=(x0_h, x0_w), mode="bilinear")
x2 = F.upsample(x[2], size=(x0_h, x0_w), mode="bilinear")
x3 = F.upsample(x[3], size=(x0_h, x0_w), mode="bilinear")
x = torch.cat([x[0], x1, x2, x3], 1)
x = self.last_layer(x)
return x
def init_weights(
self, pretrained="",
):
logger.info("=> init weights from normal distribution")
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.001)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
if os.path.isfile(pretrained):
pretrained_dict = torch.load(pretrained)
logger.info("=> loading pretrained model {}".format(pretrained))
model_dict = self.state_dict()
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict.keys()}
# for k, _ in pretrained_dict.items():
# logger.info(
# '=> loading {} pretrained model {}'.format(k, pretrained))
model_dict.update(pretrained_dict)
self.load_state_dict(model_dict)
def get_seg_model(cfg, **kwargs):
model = HighResolutionNet(cfg, **kwargs)
model.init_weights(cfg.MODEL.PRETRAINED)
return model
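# Illustrative note, not part of the original file: HighResolutionNet reads only the keys below
# from cfg.MODEL.EXTRA; the channel widths shown are typical HRNetV2-W48 values and are given
# purely to show the expected shape, not as this project's actual configuration.
#   FINAL_CONV_KERNEL: 1
#   STAGE2: {NUM_MODULES: 1, NUM_BRANCHES: 2, BLOCK: BASIC, NUM_BLOCKS: [4, 4],
#            NUM_CHANNELS: [48, 96], FUSE_METHOD: SUM}
#   STAGE3: {NUM_MODULES: 4, NUM_BRANCHES: 3, BLOCK: BASIC, NUM_BLOCKS: [4, 4, 4],
#            NUM_CHANNELS: [48, 96, 192], FUSE_METHOD: SUM}
#   STAGE4: {NUM_MODULES: 3, NUM_BRANCHES: 4, BLOCK: BASIC, NUM_BLOCKS: [4, 4, 4, 4],
#            NUM_CHANNELS: [48, 96, 192, 384], FUSE_METHOD: SUM}
# cfg.MODEL.IN_CHANNELS, cfg.MODEL.PRETRAINED and cfg.DATASET.NUM_CLASSES are also required.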


@ -0,0 +1,116 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
""" Taken from https://github.com/milesial/Pytorch-UNet
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
class double_conv(nn.Module):
"""(conv => BN => ReLU) * 2"""
def __init__(self, in_ch, out_ch):
super(double_conv, self).__init__()
self.conv = nn.Sequential(
nn.Conv2d(in_ch, out_ch, 3, padding=1),
nn.BatchNorm2d(out_ch),
nn.ReLU(inplace=True),
nn.Conv2d(out_ch, out_ch, 3, padding=1),
nn.BatchNorm2d(out_ch),
nn.ReLU(inplace=True),
)
def forward(self, x):
x = self.conv(x)
return x
class inconv(nn.Module):
def __init__(self, in_ch, out_ch):
super(inconv, self).__init__()
self.conv = double_conv(in_ch, out_ch)
def forward(self, x):
x = self.conv(x)
return x
class down(nn.Module):
def __init__(self, in_ch, out_ch):
super(down, self).__init__()
self.mpconv = nn.Sequential(nn.MaxPool2d(2), double_conv(in_ch, out_ch))
def forward(self, x):
x = self.mpconv(x)
return x
class up(nn.Module):
def __init__(self, in_ch, out_ch, bilinear=True):
super(up, self).__init__()
if bilinear:
self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
else:
self.up = nn.ConvTranspose2d(in_ch // 2, in_ch // 2, 2, stride=2)
self.conv = double_conv(in_ch, out_ch)
def forward(self, x1, x2):
x1 = self.up(x1)
# input is CHW
diffY = x2.size()[2] - x1.size()[2]
diffX = x2.size()[3] - x1.size()[3]
x1 = F.pad(x1, (diffX // 2, diffX - diffX // 2, diffY // 2, diffY - diffY // 2))
x = torch.cat([x2, x1], dim=1)
x = self.conv(x)
return x
class outconv(nn.Module):
def __init__(self, in_ch, out_ch):
super(outconv, self).__init__()
self.conv = nn.Conv2d(in_ch, out_ch, 1)
def forward(self, x):
x = self.conv(x)
return x
class UNet(nn.Module):
def __init__(self, n_channels, n_classes):
super(UNet, self).__init__()
self.inc = inconv(n_channels, 64)
self.down1 = down(64, 128)
self.down2 = down(128, 256)
self.down3 = down(256, 512)
self.down4 = down(512, 512)
self.up1 = up(1024, 256)
self.up2 = up(512, 128)
self.up3 = up(256, 64)
self.up4 = up(128, 64)
self.outc = outconv(64, n_classes)
def forward(self, x):
x1 = self.inc(x)
x2 = self.down1(x1)
x3 = self.down2(x2)
x4 = self.down3(x3)
x5 = self.down4(x4)
x = self.up1(x5, x4)
x = self.up2(x, x3)
x = self.up3(x, x2)
x = self.up4(x, x1)
x = self.outc(x)
return x
def get_seg_model(cfg, **kwargs):
model = UNet(cfg.MODEL.IN_CHANNELS, cfg.DATASET.NUM_CLASSES)
return model
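# Illustrative sketch, not part of the original file; channel and class counts are examples only.
import torch
model = UNet(n_channels=1, n_classes=6)
out = model(torch.rand(1, 1, 128, 128))  # -> (1, 6, 128, 128); up() pads, so odd sizes also work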


@ -0,0 +1,103 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch.nn as nn
class conv2DBatchNorm(nn.Module):
def __init__(self, in_channels, n_filters, k_size, stride, padding, bias=True, dilation=1):
super(conv2DBatchNorm, self).__init__()
if dilation > 1:
conv_mod = nn.Conv2d(
int(in_channels),
int(n_filters),
kernel_size=k_size,
padding=padding,
stride=stride,
bias=bias,
dilation=dilation,
)
else:
conv_mod = nn.Conv2d(
int(in_channels),
int(n_filters),
kernel_size=k_size,
padding=padding,
stride=stride,
bias=bias,
dilation=1,
)
self.cb_unit = nn.Sequential(conv_mod, nn.BatchNorm2d(int(n_filters)),)
def forward(self, inputs):
outputs = self.cb_unit(inputs)
return outputs
class deconv2DBatchNorm(nn.Module):
def __init__(self, in_channels, n_filters, k_size, stride, padding, bias=True):
super(deconv2DBatchNorm, self).__init__()
self.dcb_unit = nn.Sequential(
nn.ConvTranspose2d(
int(in_channels), int(n_filters), kernel_size=k_size, padding=padding, stride=stride, bias=bias,
),
nn.BatchNorm2d(int(n_filters)),
)
def forward(self, inputs):
outputs = self.dcb_unit(inputs)
return outputs
class conv2DBatchNormRelu(nn.Module):
def __init__(self, in_channels, n_filters, k_size, stride, padding, bias=True, dilation=1):
super(conv2DBatchNormRelu, self).__init__()
if dilation > 1:
conv_mod = nn.Conv2d(
int(in_channels),
int(n_filters),
kernel_size=k_size,
padding=padding,
stride=stride,
bias=bias,
dilation=dilation,
)
else:
conv_mod = nn.Conv2d(
int(in_channels),
int(n_filters),
kernel_size=k_size,
padding=padding,
stride=stride,
bias=bias,
dilation=1,
)
self.cbr_unit = nn.Sequential(conv_mod, nn.BatchNorm2d(int(n_filters)), nn.ReLU(inplace=True),)
def forward(self, inputs):
outputs = self.cbr_unit(inputs)
return outputs
class deconv2DBatchNormRelu(nn.Module):
def __init__(self, in_channels, n_filters, k_size, stride, padding, bias=True):
super(deconv2DBatchNormRelu, self).__init__()
self.dcbr_unit = nn.Sequential(
nn.ConvTranspose2d(
int(in_channels), int(n_filters), kernel_size=k_size, padding=padding, stride=stride, bias=bias,
),
nn.BatchNorm2d(int(n_filters)),
nn.ReLU(inplace=True),
)
def forward(self, inputs):
outputs = self.dcbr_unit(inputs)
return outputs
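# Illustrative sketch, not part of the original file: these helpers bundle Conv/Deconv,
# BatchNorm and (optionally) ReLU into a single module; the sizes below are arbitrary.
import torch
block = conv2DBatchNormRelu(in_channels=1, n_filters=64, k_size=3, stride=1, padding=1)
features = block(torch.rand(4, 1, 96, 96))  # -> (4, 64, 96, 96)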


@ -0,0 +1,119 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch
from ignite.engine.engine import Engine
from toolz import curry
from torch.nn import functional as F
def _upscale_model_output(y_pred, y):
ph, pw = y_pred.size(2), y_pred.size(3)
h, w = y.size(2), y.size(3)
if ph != h or pw != w:
y_pred = F.upsample(input=y_pred, size=(h, w), mode="bilinear")
return y_pred
def create_supervised_trainer(
model,
optimizer,
loss_fn,
prepare_batch,
device=None,
non_blocking=False,
output_transform=lambda x, y, y_pred, loss: {"loss": loss.item()},
):
"""Factory function for creating a trainer for supervised segmentation models.
Args:
model (`torch.nn.Module`): the model to train.
optimizer (`torch.optim.Optimizer`): the optimizer to use.
loss_fn (torch.nn loss function): the loss function to use.
prepare_batch (callable): function that receives `batch`, `device`, `non_blocking` and outputs
tuple of tensors `(batch_x, batch_y, patch_id, patch_locations)`.
device (str, optional): device type specification (default: None).
Applies to both model and batches.
non_blocking (bool, optional): if True and this copy is between CPU and GPU, the copy may occur asynchronously
with respect to the host. For other cases, this argument has no effect.
output_transform (callable, optional): function that receives 'x', 'y', 'y_pred', 'loss' and returns value
to be assigned to engine's state.output after each iteration. Default returns `{"loss": loss.item()}`.
Note: `engine.state.output` for this engine is defined by the `output_transform` parameter and by default
holds the loss of the processed batch.
Returns:
Engine: a trainer engine with supervised update function.
"""
if device:
model.to(device)
def _update(engine, batch):
model.train()
optimizer.zero_grad()
x, y, ids, patch_locations = prepare_batch(batch, device=device, non_blocking=non_blocking)
y_pred = model(x)
y_pred = _upscale_model_output(y_pred, y)
loss = loss_fn(y_pred.squeeze(1), y.squeeze(1))
loss.backward()
optimizer.step()
return output_transform(x, y, y_pred, loss)
return Engine(_update)
@curry
def val_transform(x, y, y_pred, ids, patch_locations):
return {
"image": x,
"y_pred": y_pred.detach(),
"mask": y.detach(),
"ids": ids,
"patch_locations": patch_locations,
}
def create_supervised_evaluator(
model, prepare_batch, metrics=None, device=None, non_blocking=False, output_transform=val_transform,
):
"""Factory function for creating an evaluator for supervised segmentation models.
Args:
model (`torch.nn.Module`): the model to train.
prepare_batch (callable): function that receives `batch`, `device`, `non_blocking` and outputs
tuple of tensors `(batch_x, batch_y, patch_id, patch_locations)`.
metrics (dict of str - :class:`~ignite.metrics.Metric`): a map of metric names to Metrics.
device (str, optional): device type specification (default: None).
Applies to both model and batches.
non_blocking (bool, optional): if True and this copy is between CPU and GPU, the copy may occur asynchronously
with respect to the host. For other cases, this argument has no effect.
output_transform (callable, optional): function that receives 'x', 'y', 'y_pred', 'ids', 'patch_locations'
and returns the value to be assigned to engine's state.output after each iteration. Default is `val_transform`,
which returns a dict with the image, prediction, mask, ids and patch locations. If you change it
you should use `output_transform` in metrics.
Note: `engine.state.output` for this engine is defined by the `output_transform` parameter and is
that dict by default.
Returns:
Engine: an evaluator engine with supervised inference function.
"""
metrics = metrics or {}
if device:
model.to(device)
def _inference(engine, batch):
model.eval()
with torch.no_grad():
x, y, ids, patch_locations = prepare_batch(batch, device=device, non_blocking=non_blocking)
y_pred = model(x)
y_pred = _upscale_model_output(y_pred, x)
return output_transform(x, y, y_pred, ids, patch_locations)
engine = Engine(_inference)
for name, metric in metrics.items():
metric.attach(engine, name)
return engine
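# Illustrative sketch, not part of the original file: `model`, `optimizer`, `criterion` and
# `train_loader` are assumed to exist, and this prepare_batch is a hypothetical example of the
# (batch_x, batch_y, patch_id, patch_locations) contract described in the docstrings above.
def _prepare_batch(batch, device=None, non_blocking=False):
    x, y, ids, patch_locations = batch
    return (
        x.to(device, non_blocking=non_blocking),
        y.to(device, non_blocking=non_blocking),
        ids,
        patch_locations,
    )
# trainer = create_supervised_trainer(model, optimizer, criterion, _prepare_batch, device="cuda")
# trainer.run(train_loader, max_epochs=1)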


@ -0,0 +1,39 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import numpy as np
from deepseismic_interpretation.dutchf3.data import decode_segmap
from os import path
from PIL import Image
from toolz import pipe
def _chw_to_hwc(image_array_numpy):
return np.moveaxis(image_array_numpy, 0, -1)
def save_images(pred_dict, output_dir, num_classes, colours, extra_identifier=""):
    for id in pred_dict:
        save_image(
            pred_dict[id].unsqueeze(0).cpu().numpy(),
            output_dir,
            num_classes,
            colours,
            extra_identifier=extra_identifier,
            image_id=str(id),
        )
def save_image(image_numpy_array, output_dir, num_classes, colours, extra_identifier="", image_id=""):
    """Save segmentation map as image
    Args:
        image_numpy_array (numpy.Array): numpy array that represents an image
        output_dir (str): directory in which to save the image
        num_classes (int): number of segmentation classes
        colours (list): label colours passed to decode_segmap
        extra_identifier (str, optional): extra suffix for the output filename. Defaults to "".
        image_id (str, optional): identifier prepended to the output filename. Defaults to "".
    """
    im_array = decode_segmap(image_numpy_array, n_classes=num_classes, label_colours=colours,)
    im = pipe((im_array * 255).astype(np.uint8).squeeze(), _chw_to_hwc, Image.fromarray,)
    filename = path.join(output_dir, f"{image_id}_{extra_identifier}.png")
    im.save(filename)
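# Illustrative usage, not part of the original file; names are hypothetical. pred_dict maps an
# id to an (H, W) tensor of class indices, e.g. torch.argmax(logits, dim=1) per test section.
# save_images(pred_dict, "outputs", num_classes=6, colours=label_colours, extra_identifier="hrnet")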

cv_lib/cv_lib/utils.py

@ -0,0 +1,19 @@
import os
import logging
import logging.config
def load_log_configuration(log_config_file):
"""
Loads logging configuration from the given configuration file.
"""
if not os.path.exists(log_config_file) or not os.path.isfile(log_config_file):
msg = "%s configuration file does not exist!", log_config_file
logging.getLogger(__name__).error(msg)
raise ValueError(msg)
try:
logging.config.fileConfig(log_config_file, disable_existing_loggers=False)
logging.getLogger(__name__).info("%s configuration file was loaded.", log_config_file)
except Exception as e:
logging.getLogger(__name__).error("Failed to load configuration from %s!", log_config_file)
logging.getLogger(__name__).debug(str(e), exc_info=True)
raise e
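# Illustrative usage, not part of the original file; the path is hypothetical and must point to
# a standard fileConfig-style logging configuration.
# load_log_configuration("configs/logging.conf")
# logging.getLogger(__name__).info("logging configured")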

cv_lib/requirements.txt

@ -0,0 +1,9 @@
numpy>=1.16.4
toolz>=0.9.0
pandas>=0.24.2
ignite>=1.1.0
scikit_learn>=0.21.3
tensorboardX>=1.8
torch>=1.2.0
torchvision>=0.4.0
tqdm>=4.33.0

cv_lib/setup.py

@ -0,0 +1,54 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# /* spell-checker: disable */
import os
try:
from setuptools import setup, find_packages
except ImportError:
from distutils.core import setup, find_packages
# Package meta-data.
NAME = "cv_lib"
DESCRIPTION = "A library for computer vision"
URL = ""
EMAIL = "msalvaris@users.noreply.github.com"
AUTHOR = "AUTHORS.md"
LICENSE = ""
LONG_DESCRIPTION = DESCRIPTION
with open("requirements.txt") as f:
requirements = f.read().splitlines()
here = os.path.abspath(os.path.dirname(__file__))
# Load the package's __version__.py module as a dictionary.
about = {}
with open(os.path.join(here, NAME, "__version__.py")) as f:
exec(f.read(), about)
setup(
name=NAME,
version=about["__version__"],
url=URL,
license=LICENSE,
author=AUTHOR,
author_email=EMAIL,
description=DESCRIPTION,
long_description=LONG_DESCRIPTION,
scripts=[],
packages=find_packages(),
include_package_data=True,
install_requires=requirements,
classifiers=[
"Development Status :: 1 - Alpha",
"Intended Audience :: Data Scientists & Developers",
"Operating System :: POSIX",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python :: 3.6",
],
)


@ -0,0 +1,126 @@
import torch
import numpy as np
from pytest import approx
from ignite.metrics import ConfusionMatrix, MetricsLambda
from cv_lib.segmentation.metrics import class_accuracy, mean_class_accuracy
# source repo:
# https://github.com/pytorch/ignite/blob/master/tests/ignite/metrics/test_confusion_matrix.py
def _get_y_true_y_pred():
# Generate an image with labels 0 (background), 1, 2
# 3 classes:
y_true = np.zeros((30, 30), dtype=int)
y_true[1:11, 1:11] = 1
y_true[15:25, 15:25] = 2
y_pred = np.zeros((30, 30), dtype=int)
y_pred[20:30, 1:11] = 1
y_pred[20:30, 20:30] = 2
return y_true, y_pred
# source repo:
# https://github.com/pytorch/ignite/blob/master/tests/ignite/metrics/test_confusion_matrix.py
def _compute_th_y_true_y_logits(y_true, y_pred):
# Create torch.tensor from numpy
th_y_true = torch.from_numpy(y_true).unsqueeze(0)
# Create logits torch.tensor:
num_classes = max(np.max(y_true), np.max(y_pred)) + 1
y_probas = np.ones((num_classes,) + y_true.shape) * -10
for i in range(num_classes):
y_probas[i, (y_pred == i)] = 720
th_y_logits = torch.from_numpy(y_probas).unsqueeze(0)
return th_y_true, th_y_logits
# Dependency metrics do not get updated automatically, so need to retrieve and
# update confusion matrix manually
def _get_cm(metriclambda):
metrics = list(metriclambda.args)
while metrics:
metric = metrics[0]
if isinstance(metric, ConfusionMatrix):
return metric
elif isinstance(metric, MetricsLambda):
metrics.extend(metric.args)
del metrics[0]
def test_class_accuracy():
y_true, y_pred = _get_y_true_y_pred()
## Perfect prediction
th_y_true, th_y_logits = _compute_th_y_true_y_logits(y_true, y_true)
# Update metric
output = (th_y_logits, th_y_true)
acc_metric = class_accuracy(num_classes=3)
acc_metric.update(output)
# Retrieve and update confusion matrix
metric_cm = _get_cm(acc_metric)
# assert confusion matrix exists and is all zeroes
assert metric_cm is not None
assert torch.min(metric_cm.confusion_matrix) == 0.0 and torch.max(metric_cm.confusion_matrix) == 0.0
metric_cm.update(output)
# Expected result
true_res = [1.0, 1.0, 1.0]
res = acc_metric.compute().numpy()
assert np.all(res == true_res), "Result {} vs. expected values {}".format(res, true_res)
## Imperfect prediction
th_y_true, th_y_logits = _compute_th_y_true_y_logits(y_true, y_pred)
# Update metric
output = (th_y_logits, th_y_true)
acc_metric = class_accuracy(num_classes=3)
acc_metric.update(output)
# Retrieve and update confusion matrix
metric_cm = _get_cm(acc_metric)
assert metric_cm is not None
assert torch.min(metric_cm.confusion_matrix) == 0.0 and torch.max(metric_cm.confusion_matrix) == 0.0
metric_cm.update(output)
# Expected result
true_res = [0.75, 0.0, 0.25]
res = acc_metric.compute().numpy()
assert np.all(res == true_res), "Result {} vs. expected values {}".format(res, true_res)
def test_mean_class_accuracy():
y_true, y_pred = _get_y_true_y_pred()
## Perfect prediction
th_y_true, th_y_logits = _compute_th_y_true_y_logits(y_true, y_true)
# Update metric
output = (th_y_logits, th_y_true)
acc_metric = mean_class_accuracy(num_classes=3)
acc_metric.update(output)
# Retrieve and update confusion matrix
metric_cm = _get_cm(acc_metric)
metric_cm.update(output)
# Expected result
true_res = 1.0
res = acc_metric.compute().numpy()
assert res == approx(true_res), "Result {} vs. expected value {}".format(res, true_res)
## Imperfect prediction
th_y_true, th_y_logits = _compute_th_y_true_y_logits(y_true, y_pred)
# Update metric
output = (th_y_logits, th_y_true)
acc_metric = mean_class_accuracy(num_classes=3)
acc_metric.update(output)
# Retrieve and update confusion matrix
metric_cm = _get_cm(acc_metric)
metric_cm.update(output)
# Expected result
true_res = 1 / 3
res = acc_metric.compute().numpy()
assert res == approx(true_res), "Result {} vs. expected value {}".format(res, true_res)
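
As a cross-check of the expected values in these tests, here is a small sketch that derives the same per-class numbers directly from ignite's ConfusionMatrix. The diagonal-over-row-sums formula (per-class recall) is inferred from the expected values above, not taken from cv_lib's implementation:

import numpy as np
import torch
from ignite.metrics import ConfusionMatrix

# Same synthetic labels as _get_y_true_y_pred above.
y_true = np.zeros((30, 30), dtype=np.int64)
y_true[1:11, 1:11] = 1
y_true[15:25, 15:25] = 2
y_pred = np.zeros((30, 30), dtype=np.int64)
y_pred[20:30, 1:11] = 1
y_pred[20:30, 20:30] = 2

# Build logits with a large score on the predicted class, as in _compute_th_y_true_y_logits.
logits = np.full((3,) + y_true.shape, -10.0)
for c in range(3):
    logits[c, y_pred == c] = 720
th_y_true = torch.from_numpy(y_true).unsqueeze(0)
th_y_logits = torch.from_numpy(logits).unsqueeze(0)

cm = ConfusionMatrix(num_classes=3)
cm.update((th_y_logits, th_y_true))
mat = cm.confusion_matrix.double()  # rows: ground truth, columns: predictions
per_class_recall = mat.diag() / mat.sum(dim=1)
print(per_class_recall)  # expected: [0.75, 0.0, 0.25], mean 1/3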


@ -1,3 +0,0 @@
from . import cli, forward, velocity
__all__ = ["cli", "forward", "velocity"]


@ -1,21 +0,0 @@
from functools import partial
import click
from . import forward, velocity
click.option = partial(click.option, show_default=True)
@click.group()
@click.pass_context
def cli(ctx):
ctx.ensure_object(dict)
cli.add_command(forward.fwd)
cli.add_command(velocity.vp)
def main():
cli(obj={})


@ -1,123 +0,0 @@
from functools import partial
import click
import h5py
import numpy as np
from ..forward import Receiver, RickerSource, TimeAxis, VelocityModel
click.option = partial(click.option, show_default=True)
@click.group()
@click.argument("input", type=click.Path())
@click.argument("output", type=click.Path())
@click.option(
"-d",
"--duration",
default=1000.0,
type=float,
help="Simulation duration (in ms)",
)
@click.option("-dt", default=2.0, type=float, help="Time increment (in ms)")
@click.option(
"--n-pml", default=10, type=int, help="PML size (in grid points)"
)
@click.option(
"--n-receivers",
default=11,
type=int,
help="Number of receivers per horizontal dimension",
)
@click.option("--space-order", default=2, type=int, help="Space order")
@click.option(
"--spacing", default=10.0, type=float, help="Spacing between grid points"
)
@click.pass_context
def fwd(
ctx,
dt: float,
duration: float,
input: str,
n_pml: int,
n_receivers: int,
output: str,
space_order: int,
spacing: float,
):
"""Forward modelling"""
if dt:
ctx.obj["dt"] = dt
ctx.obj["duration"] = duration
ctx.obj["input_file"] = h5py.File(input, mode="r")
ctx.obj["n_pml"] = n_pml
ctx.obj["n_receivers"] = n_receivers
ctx.obj["output_file"] = h5py.File(output, mode="w")
ctx.obj["space_order"] = space_order
ctx.obj["spacing"] = spacing
@fwd.command()
@click.option(
"-f0", default=0.01, type=float, help="Source peak frequency (in kHz)"
)
@click.pass_context
def ricker(ctx, f0: float):
"""Ricker source"""
input_file = ctx.obj["input_file"]
output_file = ctx.obj["output_file"]
n = sum(len(x.values()) for x in input_file.values())
with click.progressbar(length=n) as bar:
for input_group_name, input_group in input_file.items():
for dataset in input_group.values():
first_dataset = dataset
break
model = VelocityModel(
shape=first_dataset.shape,
origin=tuple(0.0 for _ in first_dataset.shape),
spacing=tuple(ctx.obj["spacing"] for _ in first_dataset.shape),
vp=first_dataset[()],
space_order=ctx.obj["space_order"],
n_pml=ctx.obj["n_pml"],
)
time_range = TimeAxis(
start=0.0, stop=ctx.obj["duration"], step=ctx.obj["dt"]
)
source = RickerSource(
name="source",
grid=model.grid,
f0=f0,
npoint=1,
time_range=time_range,
)
source.coordinates.data[0, :] = np.array(model.domain_size) * 0.5
source.coordinates.data[0, -1] = 0.0
n_receivers = ctx.obj["n_receivers"]
total_receivers = n_receivers ** (len(model.shape) - 1)
receivers = Receiver(
name="receivers",
grid=model.grid,
npoint=total_receivers,
time_range=time_range,
)
receivers_coords = np.meshgrid(
*(
np.linspace(start=0, stop=s, num=n_receivers + 2)[1:-1]
for s in model.domain_size[:-1]
)
)
for d in range(len(receivers_coords)):
receivers.coordinates.data[:, d] = receivers_coords[
d
].flatten()
receivers.coordinates.data[:, -1] = 0.0
output_group = output_file.create_group(input_group_name)
for input_dataset_name, vp in input_group.items():
model.vp = vp[()]
seismograms = model.solve(
source=source, receivers=receivers, time_range=time_range
)
output_group.create_dataset(
input_dataset_name, data=seismograms
)
bar.update(1)


@ -1,96 +0,0 @@
from functools import partial
from itertools import islice
from typing import Tuple
import click
import h5py
from ..velocity import RoethTarantolaGenerator
click.option = partial(click.option, show_default=True)
@click.group()
@click.argument("output", type=click.Path())
@click.option(
"--append/--no-append",
default=False,
help="Whether to append to output file",
)
@click.option("-n", default=1, type=int, help="Number of simulations")
@click.option(
"-nx",
default=100,
type=int,
help="Number of grid points along the first dimension",
)
@click.option(
"-ny",
default=100,
type=int,
help="Number of grid points along the second dimension",
)
@click.option(
"-nz", type=int, help="Number of grid points along the third dimension"
)
@click.option("-s", "--seed", default=42, type=int, help="Random seed")
@click.pass_context
def vp(
ctx,
append: bool,
n: int,
nx: int,
ny: int,
nz: int,
output: str,
seed: int,
):
"""Vp simulation"""
shape = (nx, ny)
if nz is not None:
shape += (nz,)
output_file = h5py.File(output, mode=("a" if append else "w"))
output_group = output_file.create_group(
str(max((int(x) for x in output_file.keys()), default=-1) + 1)
)
ctx.obj["n"] = n
ctx.obj["output_file"] = output_file
ctx.obj["output_group"] = output_group
ctx.obj["seed"] = seed
ctx.obj["shape"] = shape
@vp.command()
@click.option("--n-layers", default=8, type=int, help="Number of layers")
@click.option(
"--initial-vp",
default=(1350.0, 1650.0),
type=(float, float),
help="Initial Vp (in km/s)",
)
@click.option(
"--vp-perturbation",
default=(-190.0, 570.0),
type=(float, float),
help="Per-layer Vp perturbation (in km/s)",
)
@click.pass_context
def rt(
ctx,
initial_vp: Tuple[float, float],
n_layers: int,
vp_perturbation: Tuple[float, float],
):
"""Röth-Tarantola model"""
model = RoethTarantolaGenerator(
shape=ctx.obj["shape"],
seed=ctx.obj["seed"],
n_layers=n_layers,
initial_vp=initial_vp,
vp_perturbation=vp_perturbation,
)
group = ctx.obj["output_group"]
with click.progressbar(length=ctx.obj["n"]) as bar:
for i, data in enumerate(islice(model.generate_many(), ctx.obj["n"])):
group.create_dataset(str(i), data=data, compression="gzip")
bar.update(1)


@ -1,14 +0,0 @@
from .models import Model, VelocityModel
from .sources import Receiver, RickerSource, WaveletSource
from .time import TimeAxis
from .types import Kernel
__all__ = [
"Kernel",
"Model",
"Receiver",
"RickerSource",
"TimeAxis",
"VelocityModel",
"WaveletSource",
]


@ -1,162 +0,0 @@
from typing import Optional, Tuple, Union
import numpy as np
from devito import (
Constant,
Eq,
Function,
Grid,
Operator,
SubDomain,
TimeFunction,
logger,
solve,
)
from .sources import PointSource
from .subdomains import PhysicalDomain
from .time import TimeAxis
from .types import Kernel
logger.set_log_level("WARNING")
class Model(object):
def __init__(
self,
shape: Tuple[int, ...],
origin: Tuple[float, ...],
spacing: Tuple[float, ...],
n_pml: Optional[int] = 0,
dtype: Optional[type] = np.float32,
subdomains: Optional[Tuple[SubDomain]] = (),
):
shape = tuple(int(x) for x in shape)
origin = tuple(dtype(x) for x in origin)
n_pml = int(n_pml)
subdomains = tuple(subdomains) + (PhysicalDomain(n_pml),)
shape_pml = tuple(x + 2 * n_pml for x in shape)
extent_pml = tuple(s * (d - 1) for s, d in zip(spacing, shape_pml))
origin_pml = tuple(
dtype(o - s * n_pml) for o, s in zip(origin, spacing)
)
self.grid = Grid(
shape=shape_pml,
extent=extent_pml,
origin=origin_pml,
dtype=dtype,
subdomains=subdomains,
)
self.n_pml = n_pml
self.pml = Function(name="pml", grid=self.grid)
pml_data = np.pad(
np.zeros(shape, dtype=dtype),
[(n_pml,) * 2 for _ in range(self.pml.ndim)],
mode="edge",
)
pml_coef = 1.5 * np.log(1000.0) / 40.0
for d in range(self.pml.ndim):
for i in range(n_pml):
pos = np.abs((n_pml - i + 1) / n_pml)
val = pml_coef * (pos - np.sin(2 * np.pi * pos) / (2 * np.pi))
idx = [slice(0, x) for x in pml_data.shape]
idx[d] = slice(i, i + 1)
pml_data[tuple(idx)] += val / self.grid.spacing[d]
idx[d] = slice(
pml_data.shape[d] - i, pml_data.shape[d] - i + 1
)
pml_data[tuple(idx)] += val / self.grid.spacing[d]
pml_data = np.pad(
pml_data,
[(i.left, i.right) for i in self.pml._size_halo],
mode="edge",
)
self.pml.data_with_halo[:] = pml_data
self.shape = shape
@property
def domain_size(self) -> Tuple[float, ...]:
return tuple((d - 1) * s for d, s in zip(self.shape, self.spacing))
@property
def dtype(self) -> type:
return self.grid.dtype
@property
def spacing(self):
return self.grid.spacing
@property
def spacing_map(self):
return self.grid.spacing_map
@property
def time_spacing(self):
return self.grid.stepping_dim.spacing
class VelocityModel(Model):
def __init__(
self,
shape: Tuple[int, ...],
origin: Tuple[float, ...],
spacing: Tuple[float, ...],
vp: Union[float, np.ndarray],
space_order: Optional[int] = None,
n_pml: Optional[int] = 0,
dtype: Optional[type] = np.float32,
subdomains: Optional[Tuple[SubDomain]] = (),
):
super().__init__(shape, origin, spacing, n_pml, dtype, subdomains)
if isinstance(vp, np.ndarray):
assert space_order is not None
self.m = Function(
name="m", grid=self.grid, space_order=int(space_order)
)
else:
self.m = Constant(name="m", value=1.0 / float(vp) ** 2.0)
self.vp = vp
@property
def vp(self) -> Union[float, np.ndarray]:
return self._vp
@vp.setter
def vp(self, vp: Union[float, np.ndarray]) -> None:
self._vp = vp
if isinstance(vp, np.ndarray):
pad_widths = [
(self.n_pml + i.left, self.n_pml + i.right)
for i in self.m._size_halo
]
self.m.data_with_halo[:] = np.pad(
1.0 / self.vp ** 2.0, pad_widths, mode="edge"
)
else:
self.m.data = 1.0 / float(vp) ** 2.0
def solve(
self,
source: PointSource,
receivers: PointSource,
time_range: TimeAxis,
space_order: Optional[int] = 4,
kernel: Optional[Kernel] = Kernel.OT2,
) -> np.ndarray:
assert isinstance(kernel, Kernel)
u = TimeFunction(
name="u", grid=self.grid, time_order=2, space_order=space_order
)
H = u.laplace
if kernel is Kernel.OT4:
H += self.time_spacing ** 2 / 12 * u.laplace2(1 / self.m)
eq = Eq(
u.forward, solve(self.m * u.dt2 - H + self.pml * u.dt, u.forward)
)
src_term = source.inject(
field=u.forward, expr=source * self.time_spacing ** 2 / self.m
)
rec_term = receivers.interpolate(expr=u)
op = Operator([eq] + src_term + rec_term, subs=self.spacing_map)
op(time=time_range.num - 1, dt=time_range.step)
return receivers.data


@ -1,132 +0,0 @@
from typing import Optional
import numpy as np
import sympy
from devito.types import Dimension, SparseTimeFunction
from devito.types.basic import _SymbolCache
from scipy import interpolate
from .time import TimeAxis
class PointSource(SparseTimeFunction):
def __new__(cls, *args, **kwargs):
if cls in _SymbolCache:
options = kwargs.get("options", {})
obj = sympy.Function.__new__(cls, *args, **options)
obj._cached_init()
return obj
name = kwargs.pop("name")
grid = kwargs.pop("grid")
time_range = kwargs.pop("time_range")
time_order = kwargs.pop("time_order", 2)
p_dim = kwargs.pop("dimension", Dimension(name="p_%s" % name))
npoint = kwargs.pop("npoint", None)
coordinates = kwargs.pop(
"coordinates", kwargs.pop("coordinates_data", None)
)
if npoint is None:
assert (
coordinates is not None
), "Either `npoint` or `coordinates` must be provided"
npoint = coordinates.shape[0]
obj = SparseTimeFunction.__new__(
cls,
name=name,
grid=grid,
dimensions=(grid.time_dim, p_dim),
npoint=npoint,
nt=time_range.num,
time_order=time_order,
coordinates=coordinates,
**kwargs
)
obj._time_range = time_range
data = kwargs.get("data")
if data is not None:
obj.data[:] = data
return obj
@property
def time_range(self) -> TimeAxis:
return self._time_range
@property
def time_values(self) -> np.ndarray:
return self._time_range.time_values
def resample(
self,
dt: Optional[float] = None,
num: Optional[int] = None,
rtol: Optional[float] = 1.0e-5,
order: Optional[int] = 3,
):
assert (dt is not None) ^ (
num is not None
), "Exactly one of `dt` or `num` must be provided"
start = self._time_range.start
stop = self._time_range.stop
dt0 = self._time_range.step
if dt is not None:
new_time_range = TimeAxis(start=start, stop=stop, step=dt)
else:
new_time_range = TimeAxis(start=start, stop=stop, num=num)
dt = new_time_range.step
if np.isclose(dt0, dt, rtol=rtol):
return self
n_traces = self.data.shape[1]
new_traces = np.zeros(
(new_time_range.num, n_traces), dtype=self.data.dtype
)
for j in range(n_traces):
tck = interpolate.splrep(
self._time_range.time_values, self.data[:, j], k=order
)
new_traces[:, j] = interpolate.splev(
new_time_range.time_values, tck
)
return PointSource(
name=self.name,
grid=self.grid,
time_range=new_time_range,
coordinates=self.coordinates.data,
data=new_traces,
)
_pickle_kwargs = SparseTimeFunction._pickle_kwargs + ["time_range"]
_pickle_kwargs.remove("nt") # Inferred from time_range
class Receiver(PointSource):
pass
class WaveletSource(PointSource):
def __new__(cls, *args, **kwargs):
if cls in _SymbolCache:
options = kwargs.get("options", {})
obj = sympy.Function.__new__(cls, *args, **options)
obj._cached_init()
return obj
npoint = kwargs.pop("npoint", 1)
obj = PointSource.__new__(cls, npoint=npoint, **kwargs)
obj.f0 = kwargs.get("f0")
for p in range(npoint):
obj.data[:, p] = obj.wavelet(obj.f0, obj.time_values)
return obj
def __init__(self, *args, **kwargs):
if not self._cached():
super(WaveletSource, self).__init__(*args, **kwargs)
def wavelet(self, f0: float, t: np.ndarray) -> np.ndarray:
raise NotImplementedError
_pickle_kwargs = PointSource._pickle_kwargs + ["f0"]
class RickerSource(WaveletSource):
def wavelet(self, f0: float, t: np.ndarray) -> np.ndarray:
r = np.pi * f0 * (t - 1.0 / f0)
return (1.0 - 2.0 * r ** 2.0) * np.exp(-r ** 2.0)


@ -1,16 +0,0 @@
from typing import Dict, Iterable, Tuple
from devito import Dimension, SubDomain
class PhysicalDomain(SubDomain):
name = "physical_domain"
def __init__(self, n_pml: int):
super().__init__()
self.n_pml = n_pml
def define(
self, dimensions: Iterable[Dimension]
) -> Dict[Dimension, Tuple[str, int, int]]:
return {d: ("middle", self.n_pml, self.n_pml) for d in dimensions}


@ -1,34 +0,0 @@
from typing import Optional
import numpy as np
class TimeAxis(object):
def __init__(
self,
start: Optional[float] = None,
stop: Optional[float] = None,
num: Optional[int] = None,
step: Optional[float] = None,
dtype: Optional[type] = np.float32,
):
if start is None:
start = step * (1 - num) + stop
elif stop is None:
stop = step * (num - 1) + start
elif num is None:
num = int(np.ceil((stop - start + step) / step))
stop = step * (num - 1) + start
elif step is None:
step = (stop - start) / (num - 1)
else:
raise ValueError
self.start = start
self.stop = stop
self.num = num
self.step = step
self.dtype = dtype
@property
def time_values(self) -> np.ndarray:
return np.linspace(self.start, self.stop, self.num, dtype=self.dtype)
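
A quick sketch of how TimeAxis fills in whichever parameter is omitted (the top-level package name in the import is an assumption; adjust it to the actual layout):

from deepseismic.forward import TimeAxis  # package name assumed for illustration

# With start, stop and step given, num is inferred as ceil((stop - start + step) / step)
# and stop is then recomputed so it falls exactly on the time grid.
axis = TimeAxis(start=0.0, stop=1000.0, step=2.0)
print(axis.num)              # 501
print(axis.time_values[-1])  # 1000.0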


@ -1,6 +0,0 @@
from enum import Enum, auto
class Kernel(Enum):
OT2 = auto()
OT4 = auto()


@ -1,4 +0,0 @@
from .generator import Generator
from .roeth_tarantola import RoethTarantolaGenerator
__all__ = ["Generator", "RoethTarantolaGenerator"]


@ -1,22 +0,0 @@
from typing import Optional, Tuple
import numpy as np
class Generator(object):
def __init__(
self,
shape: Tuple[int, ...],
dtype: Optional[type] = np.float32,
seed: Optional[int] = None,
):
self.shape = shape
self.dtype = dtype
self._prng = np.random.RandomState(seed)
def generate(self) -> np.ndarray:
raise NotImplementedError
def generate_many(self) -> np.ndarray:
while True:
yield self.generate()


@ -1,41 +0,0 @@
from typing import Optional, Tuple
import numpy as np
from .generator import Generator
class RoethTarantolaGenerator(Generator):
def __init__(
self,
shape: Tuple[int, ...],
dtype: Optional[type] = np.float32,
seed: Optional[int] = None,
depth_dim: Optional[int] = -1,
n_layers: Optional[int] = 8,
initial_vp: Optional[Tuple[float, float]] = (1.35, 1.65),
vp_perturbation: Optional[Tuple[float, float]] = (-0.19, 0.57),
):
super().__init__(shape, dtype, seed)
self.depth_dim = depth_dim
self.n_layers = n_layers
self.initial_vp = initial_vp
self.vp_perturbation = vp_perturbation
def generate(self) -> np.ndarray:
vp = np.zeros(self.shape, dtype=self.dtype)
dim = self.depth_dim
layer_idx = np.round(
np.linspace(0, self.shape[dim], self.n_layers + 1)
).astype(np.int)
vp_idx = [slice(0, x) for x in vp.shape]
layer_vp = None
for i in range(self.n_layers):
vp_idx[dim] = slice(layer_idx[i], layer_idx[i + 1])
layer_vp = (
self._prng.uniform(*self.initial_vp)
if layer_vp is None
else layer_vp + self._prng.uniform(*self.vp_perturbation)
)
vp[tuple(vp_idx)] = layer_vp
return vp
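
Taken together, the velocity generators and the forward-modelling code above support the workflow that the vp and fwd CLIs wire up. Below is a condensed sketch of that workflow, mirroring the ricker command and using the CLI defaults (1000 ms duration, 2 ms time step, 0.01 kHz peak frequency, 10-point PML, 11 receivers); the top-level package name "deepseismic" is an assumption:

import numpy as np

from deepseismic.forward import Receiver, RickerSource, TimeAxis, VelocityModel
from deepseismic.velocity import RoethTarantolaGenerator

# Generate a small layered Vp model (velocities in km/s, per the generator defaults).
generator = RoethTarantolaGenerator(shape=(100, 100), seed=42, n_layers=8)
vp = generator.generate()

model = VelocityModel(
    shape=vp.shape,
    origin=(0.0, 0.0),
    spacing=(10.0, 10.0),
    vp=vp,
    space_order=2,
    n_pml=10,
)
time_range = TimeAxis(start=0.0, stop=1000.0, step=2.0)  # times in ms

# Ricker source in the middle of the model, at the surface.
source = RickerSource(name="source", grid=model.grid, f0=0.01, npoint=1, time_range=time_range)
source.coordinates.data[0, :] = np.array(model.domain_size) * 0.5
source.coordinates.data[0, -1] = 0.0

# Line of receivers along the first (horizontal) dimension, at the surface.
n_receivers = 11
receivers = Receiver(name="receivers", grid=model.grid, npoint=n_receivers, time_range=time_range)
receivers.coordinates.data[:, 0] = np.linspace(0, model.domain_size[0], num=n_receivers + 2)[1:-1]
receivers.coordinates.data[:, -1] = 0.0

seismograms = model.solve(source=source, receivers=receivers, time_range=time_range)
print(seismograms.shape)  # (number of time samples, number of receivers)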

6
docs/README.md Normal file

@ -0,0 +1,6 @@
# Documentation
To set up the documentation, first install the dependencies of the full environment by following [SETUP.md](../SETUP.md).
TODO: add more text


@ -0,0 +1,38 @@
name: seismic-interpretation
channels:
- conda-forge
- pytorch
dependencies:
- python=3.6.7
- pip
- pytorch==1.3.1
- cudatoolkit==10.1.243
- jupyter
- ipykernel
- torchvision==0.4.2
- pandas==0.25.3
- opencv==4.1.2
- scikit-learn==0.21.3
- tensorflow==2.0
- opt-einsum>=2.3.2
- tqdm==4.39.0
- itkwidgets==0.23.1
- pytest
- papermill>=1.0.1
- pip:
- segyio==1.8.8
- pytorch-ignite==0.3.0.dev20191105 # pre-release until stable available
- fire==0.2.1
- toolz==0.10.0
- tabulate==0.8.2
- Jinja2==2.10.3
- gitpython==3.0.5
- tensorboard==2.0.1
- tensorboardx==1.9
- invoke==1.3.0
- yacs==0.1.6
- albumentations==0.4.3
- black
- pylint
- scipy==1.1.0
- jupytext==1.3.0


@ -0,0 +1,51 @@
define PROJECT_HELP_MSG
Makefile to control project aml_dist
Usage:
help show this message
build build docker image to use as control plane
bash run bash inside running docker container
stop stop running docker container
endef
export PROJECT_HELP_MSG
PWD:=$(shell pwd)
PORT:=9999
TBOARD_PORT:=6006
IMAGE_NAME:=ignite_image
NAME:=ignite_container # Name of running container
DATA:=/mnt
BASEDIR:=$(shell dirname $(shell dirname ${PWD}))
local_code_volume:=-v $(BASEDIR):/workspace
volumes:=-v $(DATA):/data \
-v ${HOME}/.bash_history:/root/.bash_history
help:
echo "$$PROJECT_HELP_MSG" | less
build:
docker build -t $(IMAGE_NAME) -f dockerfile .
run:
# Start docker running as daemon
docker run $(local_code_volume) $(volumes) $(setup_environment_file) \
--shm-size="4g" \
--runtime=nvidia \
--name $(NAME) \
-d \
-v /var/run/docker.sock:/var/run/docker.sock \
-e HIST_FILE=/root/.bash_history \
-it $(IMAGE_NAME)
docker exec -it $(NAME) bash
bash:
docker exec -it $(NAME) bash
stop:
docker stop $(NAME)
docker rm $(NAME)
.PHONY: help build run bash stop


@ -0,0 +1,16 @@
FROM pytorch/pytorch:nightly-devel-cuda10.0-cudnn7
RUN apt-get update && apt-get install -y --no-install-recommends \
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender-dev
RUN git clone https://github.com/NVIDIA/apex && \
cd apex && \
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
RUN pip install toolz pytorch-ignite torchvision pandas opencv-python fire tensorboardx scikit-learn yacs
WORKDIR /workspace
CMD /bin/bash


@ -0,0 +1,56 @@
define PROJECT_HELP_MSG
Makefile to control project aml_dist
Usage:
help show this message
build build docker image to use as control plane
bash run bash inside running docker container
stop stop running docker container
endef
export PROJECT_HELP_MSG
PWD:=$(shell pwd)
PORT:=9999
TBOARD_PORT:=6006
IMAGE_NAME:=horovod_image
NAME:=horovod_container # Name of running container
DATA:=/mnt
BASEDIR:=$(shell dirname $(shell dirname $(shell dirname ${PWD})))
REPODIR:=$(shell dirname ${BASEDIR})
local_code_volume:=-v $(BASEDIR):/workspace
volumes:=-v $(DATA):/data \
-v ${HOME}/.bash_history:/root/.bash_history
help:
echo "$$PROJECT_HELP_MSG" | less
build:
docker build -t $(IMAGE_NAME) -f dockerfile ${REPODIR}
run:
@echo ${BASEDIR}
# Start docker running as daemon
docker run $(local_code_volume) $(volumes) $(setup_environment_file) \
--privileged \
--shm-size="4g" \
--runtime=nvidia \
--name $(NAME) \
-d \
-v /var/run/docker.sock:/var/run/docker.sock \
-e HIST_FILE=/root/.bash_history \
-it $(IMAGE_NAME)
docker exec -it $(NAME) bash
run-horovod:
docker exec -it $(NAME) mpirun -np 2 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml ob1 -mca btl ^openib python train_horovod.py
bash:
docker exec -it $(NAME) bash
stop:
docker stop $(NAME)
docker rm $(NAME)
.PHONY: help build run bash stop


@ -0,0 +1,130 @@
FROM nvidia/cuda:10.0-devel-ubuntu18.04
# Based on default horovod image
ENV PYTORCH_VERSION=1.1.0
ENV TORCHVISION_VERSION=0.3.0
ENV CUDNN_VERSION=7.6.0.64-1+cuda10.0
ENV NCCL_VERSION=2.4.7-1+cuda10.0
# Python 2.7 or 3.6 is supported by Ubuntu Bionic out of the box
ARG python=3.6
ENV PYTHON_VERSION=${python}
# Set default shell to /bin/bash
SHELL ["/bin/bash", "-cu"]
# We need gcc-4.9 to build plugins for TensorFlow & PyTorch, which is only available in Ubuntu Xenial
RUN echo deb http://archive.ubuntu.com/ubuntu xenial main universe | tee -a /etc/apt/sources.list
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends --allow-change-held-packages --allow-downgrades \
build-essential \
cmake \
gcc-4.9 \
g++-4.9 \
gcc-4.9-base \
software-properties-common \
git \
curl \
wget \
ca-certificates \
libcudnn7=${CUDNN_VERSION} \
libnccl2=${NCCL_VERSION} \
libnccl-dev=${NCCL_VERSION} \
libjpeg-dev \
libpng-dev \
python${PYTHON_VERSION} \
python${PYTHON_VERSION}-dev \
librdmacm1 \
libibverbs1 \
ibverbs-utils\
ibutils \
net-tools \
ibverbs-providers \
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender-dev
RUN if [[ "${PYTHON_VERSION}" == "3.6" ]]; then \
apt-get install -y python${PYTHON_VERSION}-distutils; \
fi
RUN ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
# Install PyTorch
RUN pip install future typing
RUN pip install numpy
RUN pip install https://download.pytorch.org/whl/cu100/torch-${PYTORCH_VERSION}-$(python -c "import wheel.pep425tags as w; print('-'.join(w.get_supported()[0]))").whl \
https://download.pytorch.org/whl/cu100/torchvision-${TORCHVISION_VERSION}-$(python -c "import wheel.pep425tags as w; print('-'.join(w.get_supported()[0]))").whl
RUN pip install --no-cache-dir torchvision h5py toolz pytorch-ignite pandas opencv-python fire tensorboardx scikit-learn tqdm yacs albumentations gitpython
COPY ComputerVision_fork/contrib /contrib
RUN pip install -e /contrib
COPY DeepSeismic /DeepSeismic
RUN pip install -e DeepSeismic/interpretation
# Install Open MPI
RUN mkdir /tmp/openmpi && \
cd /tmp/openmpi && \
wget https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-4.0.0.tar.gz && \
tar zxf openmpi-4.0.0.tar.gz && \
cd openmpi-4.0.0 && \
./configure --enable-orterun-prefix-by-default && \
make -j $(nproc) all && \
make install && \
ldconfig && \
rm -rf /tmp/openmpi
# Pin GCC to 4.9 (priority 200) to compile correctly against TensorFlow, PyTorch, and MXNet.
# Backup existing GCC installation as priority 100, so that it can be recovered later.
RUN update-alternatives --install /usr/bin/gcc gcc $(readlink -f $(which gcc)) 100 && \
update-alternatives --install /usr/bin/x86_64-linux-gnu-gcc x86_64-linux-gnu-gcc $(readlink -f $(which gcc)) 100 && \
update-alternatives --install /usr/bin/g++ g++ $(readlink -f $(which g++)) 100 && \
update-alternatives --install /usr/bin/x86_64-linux-gnu-g++ x86_64-linux-gnu-g++ $(readlink -f $(which g++)) 100
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 200 && \
update-alternatives --install /usr/bin/x86_64-linux-gnu-gcc x86_64-linux-gnu-gcc /usr/bin/gcc-4.9 200 && \
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 200 && \
update-alternatives --install /usr/bin/x86_64-linux-gnu-g++ x86_64-linux-gnu-g++ /usr/bin/g++-4.9 200
# Install Horovod, temporarily using CUDA stubs
RUN ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod && \
ldconfig
# Remove GCC pinning
RUN update-alternatives --remove gcc /usr/bin/gcc-4.9 && \
update-alternatives --remove x86_64-linux-gnu-gcc /usr/bin/gcc-4.9 && \
update-alternatives --remove g++ /usr/bin/g++-4.9 && \
update-alternatives --remove x86_64-linux-gnu-g++ /usr/bin/g++-4.9
# Create a wrapper for OpenMPI to allow running as root by default
RUN mv /usr/local/bin/mpirun /usr/local/bin/mpirun.real && \
echo '#!/bin/bash' > /usr/local/bin/mpirun && \
echo 'mpirun.real --allow-run-as-root "$@"' >> /usr/local/bin/mpirun && \
chmod a+x /usr/local/bin/mpirun
# Configure OpenMPI to run good defaults:
# --bind-to none --map-by slot --mca btl_tcp_if_exclude lo,docker0
RUN echo "hwloc_base_binding_policy = none" >> /usr/local/etc/openmpi-mca-params.conf && \
echo "rmaps_base_mapping_policy = slot" >> /usr/local/etc/openmpi-mca-params.conf
# echo "btl_tcp_if_exclude = lo,docker0" >> /usr/local/etc/openmpi-mca-params.conf
# Set default NCCL parameters
RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf && \
echo NCCL_SOCKET_IFNAME=^docker0 >> /etc/nccl.conf
# Install OpenSSH for MPI to communicate between containers
RUN apt-get install -y --no-install-recommends openssh-client openssh-server && \
mkdir -p /var/run/sshd
# Allow OpenSSH to talk to containers without asking for confirmation
RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \
echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \
mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config
WORKDIR /workspace
CMD /bin/bash


@ -0,0 +1 @@
Description of examples

File diff hidden because one or more lines are too long.


@ -0,0 +1,654 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HRNet training and validation on numpy dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we demonstrate how to train an HRNet model for facies prediction using [Penobscot](https://zenodo.org/record/1341774#.XepaaUB2vOg) dataset. The Penobscot 3D seismic dataset was acquired in the Scotian shelf, offshore Nova Scotia, Canada. Please refer to the top-level [README.md](../../../README.md) file to download and prepare this dataset for the experiments. \n",
"\n",
"The data expected in this notebook needs to be in the form of two 3D numpy arrays. One array will contain the seismic information, the other the mask. The network will be trained to take a 2D patch of data from the seismic block and learn to predict the 2D mask patch associated with it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment setup\n",
"\n",
"To set up the conda environment, please follow the instructions in the top-level [README.md](../../../README.md) file.\n",
"\n",
"__Note__: To register the conda environment in Jupyter, run:\n",
"`python -m ipykernel install --user --name envname`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Library imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import logging.config\n",
"from os import path\n",
"\n",
"import cv2\n",
"import numpy as np\n",
"import yacs.config\n",
"import torch\n",
"from albumentations import Compose, HorizontalFlip, Normalize, PadIfNeeded, Resize\n",
"from cv_lib.utils import load_log_configuration\n",
"from cv_lib.event_handlers import (\n",
" SnapshotHandler,\n",
" logging_handlers,\n",
" tensorboard_handlers,\n",
")\n",
"from cv_lib.event_handlers.logging_handlers import Evaluator\n",
"from cv_lib.event_handlers.tensorboard_handlers import (\n",
" create_image_writer,\n",
" create_summary_writer,\n",
")\n",
"from cv_lib.segmentation import models, extract_metric_from\n",
"from cv_lib.segmentation.metrics import (\n",
" pixelwise_accuracy,\n",
" class_accuracy,\n",
" mean_class_accuracy,\n",
" class_iou,\n",
" mean_iou,\n",
")\n",
"from cv_lib.segmentation.dutchf3.utils import (\n",
" current_datetime,\n",
" generate_path,\n",
" np_to_tb,\n",
")\n",
"from cv_lib.segmentation.penobscot.engine import (\n",
" create_supervised_evaluator,\n",
" create_supervised_trainer,\n",
")\n",
"from deepseismic_interpretation.penobscot.data import PenobscotInlinePatchDataset\n",
"from deepseismic_interpretation.dutchf3.data import decode_segmap\n",
"from ignite.contrib.handlers import CosineAnnealingScheduler\n",
"from ignite.engine import Events\n",
"from ignite.metrics import Loss\n",
"from ignite.utils import convert_tensor\n",
"from toolz import compose\n",
"from torch.utils import data\n",
"from itkwidgets import view\n",
"from utilities import plot_aline\n",
"from toolz import take\n",
"\n",
"\n",
"mask_value = 255\n",
"_SEG_COLOURS = np.asarray(\n",
" [[241, 238, 246], [208, 209, 230], [166, 189, 219], [116, 169, 207], [54, 144, 192], [5, 112, 176], [3, 78, 123]]\n",
")\n",
"\n",
"# experiment configuration file\n",
"CONFIG_FILE = \"./configs/hrnet.yaml\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def _prepare_batch(batch, device=None, non_blocking=False):\n",
" x, y, ids, patch_locations = batch\n",
" return (\n",
" convert_tensor(x, device=device, non_blocking=non_blocking),\n",
" convert_tensor(y, device=device, non_blocking=non_blocking),\n",
" ids,\n",
" patch_locations,\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiment configuration file\n",
"We use configuration files to specify experiment configuration, such as hyperparameters used in training and evaluation, as well as other experiment settings. We provide several configuration files for this notebook, under `./configs`, mainly differing in the DNN architecture used for defining the model.\n",
"\n",
"Modify the `CONFIG_FILE` variable above if you would like to run the experiment using a different configuration file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(CONFIG_FILE, \"rt\") as f_read:\n",
" config = yacs.config.load_cfg(f_read)\n",
"\n",
"print(f'Configuration loaded. Please check that the DATASET.ROOT:{config.DATASET.ROOT} points to your data location.')\n",
"print(f'To modify any of the options, please edit the configuration file {CONFIG_FILE} and reload. \\n')\n",
"print(config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"# The number of datapoints you want to run in training or validation per batch \n",
"# Setting to None will run whole dataset\n",
"# useful for integration tests with a setting of something like 3\n",
"# Use only if you want to check things are running and don't want to run\n",
"# through whole dataset\n",
"max_iterations = None \n",
"# The number of epochs to run in training\n",
"max_epochs = config.TRAIN.END_EPOCH \n",
"max_snapshots = config.TRAIN.SNAPSHOTS\n",
"dataset_root = config.DATASET.ROOT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from toolz import pipe\n",
"import glob\n",
"from PIL import Image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_dir = os.path.join(dataset_root, \"inlines\")\n",
"mask_dir = os.path.join(dataset_root, \"masks\")\n",
"\n",
"image_iter = pipe(os.path.join(image_dir, \"*.tiff\"), glob.iglob,)\n",
"\n",
"_open_to_array = compose(np.array, Image.open)\n",
"\n",
"\n",
"def open_image_mask(image_path):\n",
" return pipe(image_path, _open_to_array)\n",
"\n",
"\n",
"def _mask_filename(imagepath):\n",
" file_part = os.path.splitext(os.path.split(imagepath)[-1].strip())[0]\n",
" return os.path.join(mask_dir, file_part + \"_mask.png\")\n",
"\n",
"\n",
"image_list = sorted(list(image_iter))\n",
"image_list_array = [_open_to_array(i) for i in image_list]\n",
"mask_list_array = [pipe(i, _mask_filename, _open_to_array) for i in image_list]\n",
"mask = np.stack(mask_list_array, axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"view(mask, slicing_planes=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's view slices of the data along inline and crossline directions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"idx = 100\n",
"x_in = image_list_array[idx]\n",
"x_inl = mask_list_array[idx]\n",
"\n",
"plot_aline(x_in, x_inl, xlabel=\"inline\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Model training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setup logging\n",
"load_log_configuration(config.LOG_CONFIG)\n",
"logger = logging.getLogger(__name__)\n",
"logger.debug(config.WORKERS)\n",
"scheduler_step = max_epochs // max_snapshots\n",
"torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK\n",
"\n",
"torch.manual_seed(config.SEED)\n",
"if torch.cuda.is_available():\n",
" torch.cuda.manual_seed_all(config.SEED)\n",
"np.random.seed(seed=config.SEED)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up data augmentation\n",
"\n",
"Let's define our data augmentation pipeline, which includes basic transformations, such as _data normalization, resizing, and padding_ if necessary.\n",
"The padding is carried out twice becuase if we split the inline or crossline slice into multiple patches then some of these patches will be at the edge of the slice and may not contain a full patch worth of data. To compensate to this and have same size patches in the batch (a requirement) we need to pad them.\n",
"So our basic augmentation is:\n",
"- Normalize\n",
"- Pad if needed to initial size\n",
"- Resize to a larger size\n",
"- Pad further if necessary"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setup Augmentations\n",
"basic_aug = Compose(\n",
" [\n",
" Normalize(mean=(config.TRAIN.MEAN,), std=(config.TRAIN.STD,), max_pixel_value=config.TRAIN.MAX,),\n",
" PadIfNeeded(\n",
" min_height=config.TRAIN.PATCH_SIZE,\n",
" min_width=config.TRAIN.PATCH_SIZE,\n",
" border_mode=cv2.BORDER_CONSTANT,\n",
" always_apply=True,\n",
" mask_value=mask_value,\n",
" value=0,\n",
" ),\n",
" Resize(config.TRAIN.AUGMENTATIONS.RESIZE.HEIGHT, config.TRAIN.AUGMENTATIONS.RESIZE.WIDTH, always_apply=True,),\n",
" PadIfNeeded(\n",
" min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT,\n",
" min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH,\n",
" border_mode=cv2.BORDER_CONSTANT,\n",
" always_apply=True,\n",
" mask_value=mask_value,\n",
" value=0,\n",
" ),\n",
" ]\n",
")\n",
"if config.TRAIN.AUGMENTATION:\n",
" train_aug = Compose([basic_aug, HorizontalFlip(p=0.5)])\n",
" val_aug = basic_aug\n",
"else:\n",
" train_aug = val_aug = basic_aug"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For training the model, we will use a patch-based approach. Rather than using entire sections (crosslines or inlines) of the data, we extract a large number of small patches from the sections, and use the patches as our data. This allows us to generate larger set of images for training, but is also a more feasible approach for large seismic volumes.\n",
"\n",
"We are using a custom patch data loader from our __`deepseismic_interpretation`__ library for generating and loading patches from seismic section data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"train_set = PenobscotInlinePatchDataset(\n",
" dataset_root,\n",
" config.TRAIN.PATCH_SIZE,\n",
" config.TRAIN.STRIDE,\n",
" split=\"train\",\n",
" transforms=train_aug,\n",
" n_channels=config.MODEL.IN_CHANNELS,\n",
" complete_patches_only=config.TRAIN.COMPLETE_PATCHES_ONLY,\n",
")\n",
"\n",
"val_set = PenobscotInlinePatchDataset(\n",
" dataset_root,\n",
" config.TRAIN.PATCH_SIZE,\n",
" config.TRAIN.STRIDE,\n",
" split=\"val\",\n",
" transforms=val_aug,\n",
" n_channels=config.MODEL.IN_CHANNELS,\n",
" complete_patches_only=config.VALIDATION.COMPLETE_PATCHES_ONLY,\n",
")\n",
"\n",
"logger.info(train_set)\n",
"logger.info(val_set)\n",
"\n",
"n_classes = train_set.n_classes\n",
"train_loader = data.DataLoader(\n",
" train_set, batch_size=config.TRAIN.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS, shuffle=True,\n",
")\n",
"\n",
"val_loader = data.DataLoader(val_set, batch_size=config.VALIDATION.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS,)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up model training\n",
"Next, let's define a model to train, an optimization algorithm, and a loss function.\n",
"\n",
"Note that the model is loaded from our __`cv_lib`__ library, using the name of the model as specified in the configuration file. To load a different model, either change the `MODEL.NAME` field in the configuration file, or create a new one corresponding to the model you wish to train."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = getattr(models, config.MODEL.NAME).get_seg_model(config)\n",
"\n",
"device = \"cpu\"\n",
"if torch.cuda.is_available():\n",
" device = \"cuda\"\n",
"model = model.to(device) # Send to GPU\n",
"\n",
"optimizer = torch.optim.SGD(\n",
" model.parameters(), lr=config.TRAIN.MAX_LR, momentum=config.TRAIN.MOMENTUM, weight_decay=config.TRAIN.WEIGHT_DECAY,\n",
")\n",
"\n",
"output_dir = generate_path(config.OUTPUT_DIR, config.MODEL.NAME, current_datetime(),)\n",
"summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR))\n",
"snapshot_duration = scheduler_step * len(train_loader)\n",
"scheduler = CosineAnnealingScheduler(optimizer, \"lr\", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, snapshot_duration)\n",
"\n",
"criterion = torch.nn.CrossEntropyLoss(ignore_index=mask_value, reduction=\"mean\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training the model\n",
"We use [ignite](https://pytorch.org/ignite/index.html) framework to create training and validation loops in our codebase. Ignite provides an easy way to create compact training/validation loops without too much boilerplate code.\n",
"\n",
"In this notebook, we demonstrate the use of ignite on the training loop only. We create a training engine `trainer` that loops multiple times over the training dataset and updates model parameters. In addition, we add various events to the trainer, using an event system, that allows us to interact with the engine on each step of the run, such as, when the trainer is started/completed, when the epoch is started/completed and so on.\n",
"\n",
"In the cell below, we use event handlers to add the following events to the training loop:\n",
"- log training output\n",
"- log and schedule learning rate and\n",
"- periodically save model to disk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"trainer = create_supervised_trainer(model, optimizer, criterion, _prepare_batch, device=device)\n",
"\n",
"trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)\n",
"\n",
"trainer.add_event_handler(\n",
" Events.ITERATION_COMPLETED, logging_handlers.log_training_output(log_interval=config.PRINT_FREQ),\n",
")\n",
"trainer.add_event_handler(Events.EPOCH_STARTED, logging_handlers.log_lr(optimizer))\n",
"trainer.add_event_handler(\n",
" Events.EPOCH_STARTED, tensorboard_handlers.log_lr(summary_writer, optimizer, \"epoch\"),\n",
")\n",
"trainer.add_event_handler(\n",
" Events.ITERATION_COMPLETED, tensorboard_handlers.log_training_output(summary_writer),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def _select_pred_and_mask(model_out_dict):\n",
" return (model_out_dict[\"y_pred\"].squeeze(), model_out_dict[\"mask\"].squeeze())\n",
"\n",
"\n",
"evaluator = create_supervised_evaluator(\n",
" model,\n",
" _prepare_batch,\n",
" metrics={\n",
" \"pixacc\": pixelwise_accuracy(n_classes, output_transform=_select_pred_and_mask),\n",
" \"nll\": Loss(criterion, output_transform=_select_pred_and_mask),\n",
" \"cacc\": class_accuracy(n_classes, output_transform=_select_pred_and_mask),\n",
" \"mca\": mean_class_accuracy(n_classes, output_transform=_select_pred_and_mask),\n",
" \"ciou\": class_iou(n_classes, output_transform=_select_pred_and_mask),\n",
" \"mIoU\": mean_iou(n_classes, output_transform=_select_pred_and_mask),\n",
" },\n",
" device=device,\n",
")\n",
"\n",
"if max_iterations is not None:\n",
" val_loader = take(max_iterations, val_loader)\n",
"\n",
"# Set the validation run to start on the epoch completion of the training run\n",
"trainer.add_event_handler(Events.EPOCH_COMPLETED, Evaluator(evaluator, val_loader))\n",
"\n",
"evaluator.add_event_handler(\n",
" Events.EPOCH_COMPLETED,\n",
" logging_handlers.log_metrics(\n",
" \"Validation results\",\n",
" metrics_dict={\n",
" \"nll\": \"Avg loss :\",\n",
" \"pixacc\": \"Pixelwise Accuracy :\",\n",
" \"mca\": \"Avg Class Accuracy :\",\n",
" \"mIoU\": \"Avg Class IoU :\",\n",
" },\n",
" ),\n",
")\n",
"evaluator.add_event_handler(\n",
" Events.EPOCH_COMPLETED,\n",
" tensorboard_handlers.log_metrics(\n",
" summary_writer,\n",
" trainer,\n",
" \"epoch\",\n",
" metrics_dict={\n",
" \"mIoU\": \"Validation/mIoU\",\n",
" \"nll\": \"Validation/Loss\",\n",
" \"mca\": \"Validation/MCA\",\n",
" \"pixacc\": \"Validation/Pixel_Acc\",\n",
" },\n",
" ),\n",
")\n",
"\n",
"\n",
"def _select_max(pred_tensor):\n",
" return pred_tensor.max(1)[1]\n",
"\n",
"\n",
"def _tensor_to_numpy(pred_tensor):\n",
" return pred_tensor.squeeze().cpu().numpy()\n",
"\n",
"\n",
"transform_func = compose(np_to_tb, decode_segmap(n_classes=n_classes, label_colours=_SEG_COLOURS), _tensor_to_numpy,)\n",
"\n",
"transform_pred = compose(transform_func, _select_max)\n",
"\n",
"evaluator.add_event_handler(\n",
" Events.EPOCH_COMPLETED, create_image_writer(summary_writer, \"Validation/Image\", \"image\"),\n",
")\n",
"evaluator.add_event_handler(\n",
" Events.EPOCH_COMPLETED,\n",
" create_image_writer(summary_writer, \"Validation/Mask\", \"mask\", transform_func=transform_func),\n",
")\n",
"evaluator.add_event_handler(\n",
" Events.EPOCH_COMPLETED,\n",
" create_image_writer(summary_writer, \"Validation/Pred\", \"y_pred\", transform_func=transform_pred),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checkpointing\n",
"Below we define the function that will save the best performing models based on mean IoU."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def snapshot_function():\n",
" return (trainer.state.iteration % snapshot_duration) == 0\n",
"\n",
"\n",
"checkpoint_handler = SnapshotHandler(\n",
" path.join(output_dir, config.TRAIN.MODEL_DIR), config.MODEL.NAME, extract_metric_from(\"mIoU\"), snapshot_function,\n",
")\n",
"evaluator.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler, {\"model\": model})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start the training engine run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if max_iterations is not None:\n",
" train_loader = take(max_iterations, train_loader)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"logger.info(\"Starting training\")\n",
"trainer.run(train_loader, max_epochs=max_epochs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tensorboard\n",
"Using tensorboard for monitoring runs can be quite enlightening. Just ensure that the appropriate port is open on the VM so you can access it. Below we have the command for running tensorboard in your notebook. You can as easily view it in a seperate browser window by pointing the browser to the appropriate location and port."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if max_epochs>1:\n",
" %load_ext tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if max_epochs>1:\n",
" %tensorboard --logdir outputs --port 6007 --host 0.0.0.0"
]
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "seismic-interpretation",
"language": "python",
"name": "seismic-interpretation"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
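
The notebook above trains on 2D patches cut from seismic sections. As a rough standalone illustration of that idea (not the actual PenobscotInlinePatchDataset implementation), extracting complete fixed-size patches from one section with a given stride might look like this; the patch size and stride match TRAIN.PATCH_SIZE (128) and TRAIN.STRIDE (64) in the seg_hrnet config that follows:

import numpy as np

def extract_patches(section, patch_size, stride):
    """Yield (row, col, patch) tuples of complete patch_size x patch_size patches."""
    h, w = section.shape
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            yield r, c, section[r : r + patch_size, c : c + patch_size]

# Toy example: a random stand-in for a 1501 x 481 inline section.
section = np.random.rand(1501, 481).astype(np.float32)
patches = list(extract_patches(section, patch_size=128, stride=64))
print(len(patches), patches[0][2].shape)  # number of patches, (128, 128)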


@ -0,0 +1,109 @@
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
GPUS: (0,)
OUTPUT_DIR: 'outputs'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 50
LOG_CONFIG: logging.conf
SEED: 2019
DATASET:
NUM_CLASSES: 7
ROOT: /mnt/penobscot
CLASS_WEIGHTS: [0.02630481, 0.05448931, 0.0811898, 0.01866496, 0.15868563, 0.0875993, 0.5730662]
INLINE_HEIGHT: 1501
INLINE_WIDTH: 481
MODEL:
NAME: seg_hrnet
IN_CHANNELS: 3
PRETRAINED: '/data/hrnet_pretrained/image_classification/hrnetv2_w48_imagenet_pretrained.pth'
EXTRA:
FINAL_CONV_KERNEL: 1
STAGE2:
NUM_MODULES: 1
NUM_BRANCHES: 2
BLOCK: BASIC
NUM_BLOCKS:
- 4
- 4
NUM_CHANNELS:
- 48
- 96
FUSE_METHOD: SUM
STAGE3:
NUM_MODULES: 4
NUM_BRANCHES: 3
BLOCK: BASIC
NUM_BLOCKS:
- 4
- 4
- 4
NUM_CHANNELS:
- 48
- 96
- 192
FUSE_METHOD: SUM
STAGE4:
NUM_MODULES: 3
NUM_BRANCHES: 4
BLOCK: BASIC
NUM_BLOCKS:
- 4
- 4
- 4
- 4
NUM_CHANNELS:
- 48
- 96
- 192
- 384
FUSE_METHOD: SUM
TRAIN:
COMPLETE_PATCHES_ONLY: True
BATCH_SIZE_PER_GPU: 32
BEGIN_EPOCH: 0
END_EPOCH: 300
MIN_LR: 0.0001
MAX_LR: 0.02
MOMENTUM: 0.9
WEIGHT_DECAY: 0.0001
SNAPSHOTS: 5
AUGMENTATION: True
DEPTH: "none" #"patch" # Options are none, patch and section
STRIDE: 64
PATCH_SIZE: 128
AUGMENTATIONS:
RESIZE:
HEIGHT: 256
WIDTH: 256
PAD:
HEIGHT: 256
WIDTH: 256
MEAN: [-0.0001777, 0.49, -0.0000688] # First value is for images, second for depth and then combination of both
STD: [0.14076, 0.2717, 0.06286]
MAX: 1
MODEL_DIR: "models"
VALIDATION:
BATCH_SIZE_PER_GPU: 128
COMPLETE_PATCHES_ONLY: True
TEST:
COMPLETE_PATCHES_ONLY: False
MODEL_PATH: "/data/home/mat/repos/DeepSeismic/experiments/segmentation/penobscot/local/output/penobscot/437970c875226e7e39c8109c0de8d21c5e5d6e3b/seg_hrnet/Sep25_144942/models/seg_hrnet_running_model_28.pth"
AUGMENTATIONS:
RESIZE:
HEIGHT: 256
WIDTH: 256
PAD:
HEIGHT: 256
WIDTH: 256
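
These experiment YAMLs are consumed with yacs, as the notebook above shows; a minimal sketch (the configs/hrnet.yaml path assumes you run from the experiment directory):

import yacs.config

with open("configs/hrnet.yaml", "rt") as f_read:
    config = yacs.config.load_cfg(f_read)

print(config.MODEL.NAME)        # seg_hrnet
print(config.TRAIN.PATCH_SIZE)  # 128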


@ -0,0 +1,59 @@
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
GPUS: (0,)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 50
LOG_CONFIG: logging.conf
SEED: 2019
DATASET:
NUM_CLASSES: 6
ROOT: /data/dutchf3
CLASS_WEIGHTS: [0.7151, 0.8811, 0.5156, 0.9346, 0.9683, 0.9852]
MODEL:
NAME: patch_deconvnet_skip
IN_CHANNELS: 1
TRAIN:
BATCH_SIZE_PER_GPU: 64
BEGIN_EPOCH: 0
END_EPOCH: 100
MIN_LR: 0.001
MAX_LR: 0.02
MOMENTUM: 0.9
WEIGHT_DECAY: 0.0001
SNAPSHOTS: 5
AUGMENTATION: True
DEPTH: "none" #"patch" # Options are No, Patch and Section
STRIDE: 50
PATCH_SIZE: 99
AUGMENTATIONS:
RESIZE:
HEIGHT: 99
WIDTH: 99
PAD:
HEIGHT: 99
WIDTH: 99
MEAN: 0.0009997 # 0.0009996710808862074
STD: 0.20977 # 0.20976548783479299
MODEL_DIR: "models"
VALIDATION:
BATCH_SIZE_PER_GPU: 512
TEST:
MODEL_PATH: '/data/home/mat/repos/DeepSeismic/examples/interpretation/notebooks/output/models/model_patch_deconvnet_skip_2.pth'
TEST_STRIDE: 10
SPLIT: 'test1' # Can be both, test1, test2
INLINE: True
CROSSLINE: True
POST_PROCESSING:
SIZE: 99 #
CROP_PIXELS: 0 # Number of pixels to crop top, bottom, left and right


@ -0,0 +1,59 @@
# UNet configuration
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
GPUS: (0,)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 50
LOG_CONFIG: logging.conf
SEED: 2019
DATASET:
NUM_CLASSES: 6
ROOT: '/data/dutchf3'
CLASS_WEIGHTS: [0.7151, 0.8811, 0.5156, 0.9346, 0.9683, 0.9852]
MODEL:
NAME: resnet_unet
IN_CHANNELS: 3
TRAIN:
BATCH_SIZE_PER_GPU: 16
BEGIN_EPOCH: 0
END_EPOCH: 10
MIN_LR: 0.001
MAX_LR: 0.02
MOMENTUM: 0.9
WEIGHT_DECAY: 0.0001
SNAPSHOTS: 5
AUGMENTATION: True
DEPTH: "section" # Options are No, Patch and Section
STRIDE: 50
PATCH_SIZE: 100
AUGMENTATIONS:
RESIZE:
HEIGHT: 200
WIDTH: 200
PAD:
HEIGHT: 256
WIDTH: 256
MEAN: 0.0009997 # 0.0009996710808862074
STD: 0.20977 # 0.20976548783479299
MODEL_DIR: "models"
TEST:
MODEL_PATH: ""
TEST_STRIDE: 10
SPLIT: 'Both' # Can be Both, Test1, Test2
INLINE: True
CROSSLINE: True
POST_PROCESSING:
SIZE: 128
CROP_PIXELS: 14 # Number of pixels to crop top, bottom, left and right

Some files were not shown because too many files were changed.