
11 Commits

Author SHA1 Message Date
Anton Schwaighofer ccb53d01ad
Improve recovery of preempted jobs (#633)
Autosaving checkpoints by default every epoch to a fixed file name. Retiring the "top k" recovery checkpoint notion because it was tied to specific models that needed more than 1 checkpoint.
2022-01-17 12:05:39 +00:00
Anton Schwaighofer 015e9e4829
Moving nii.gz from git lfs to git to simplify the HelloWorld test (#632)
2022-01-11 16:19:17 +00:00
Anton Schwaighofer e477c9dd83
Upgrade to Pytorch Lightning 1.5.5 (#591)
2021-12-15 10:48:35 +00:00
Daniel Coelho de Castro f5b7298c57
Add `--pl_deterministic` to build training jobs (#605)
2021-12-06 19:28:23 +00:00
Anton Schwaighofer 64646c5106
Update to fix issues in daily build (#545)
Daily builds of segmentation models fail with missing files for the validation set.
2021-07-19 17:45:45 +01:00
Anton Schwaighofer 53999877d0
Enable a disabled test (#536)
Enabling the test required changing several stored result files because of the PL upgrade that happened in between.
2021-07-15 20:54:31 +01:00
melanibe 30d515b5b2
Update PL to 1.3.8 (#531)
* update pl
* fix one test
* our fix not needed anymore
* fix yet another test
* add new torchmetrics
* fix checkpoints
* fix some test
* fix one test more
* attempt to fix test
* Update byol code to match new pl bolts
* needed to update
* back to how it was
* update
* update
* changelog
* update regression metrics
* skip test on wind
* flake8
* forgot to update this
* mypy
* remove comment
* try to see if problem comes from sync dist flag
* few fixes
* Update expected number of subjects
* correct more
* flake8

Co-authored-by: Anton Schwaighofer <antonsc@microsoft.com>
2021-07-13 10:24:20 +01:00
Jonathan Tripp 4425a4d7d4
Use best epoch for model comparison (#495)
2021-06-22 16:38:19 +01:00
Anton Schwaighofer e6f83c3779
Remove 2 files that cause PR build to fail non-reproducibly (#500)
2021-06-22 10:53:24 +00:00
Anton Schwaighofer be36e39206
Bug fix for regression test (#496)
- The regression test code in #492 missed files in AzureML because the wrong function was called in the main workflow.
- Cleaner folder structure for regression test results
2021-06-21 14:39:09 +01:00
Anton Schwaighofer 01c31ed0e5
Regression test coverage for AzureML runs (#492)
- Enable regression tests on text and binary files that are either produced by the job or uploaded to the run context
- Adding a large set of these regression test files to all models in PR builds
2021-06-17 20:37:57 +00:00