Jeff Omhover
f4c575359b
reduce models loaded into memory during unit tests ( #151 )
...
* reduce models loaded into memory
* remove 1 model from train unit test
* doc linting
2022-07-21 15:34:29 -07:00
edebar01
1083aceef4
add sample for sklearn 1.1 ( #132 )
...
* add sample
* improvements
* improvements
* fix
* add cpu/gpu
* fix syntax
* add newline at end of file
* update test name
* add azureml core to requirements
* fix typo
* fix path
* fix
* change environment to azureaiml version
* change credential
* fix import
* change back to default credential
* add powershell login to test workflow
* change to azurecli creds
* fix import
* alter src path
* try to add version to env name
* attempt syntax fix
* add description
* add experiment name
* fix
* get rid of comments and dummy test
* rename test name
* resolve tags before testing
* print env config
* print extra config
* extra config as object
* only resolve tags for env asset types
* assert job status
* fix syntax
* remove status assert
* remove compute env variables from this PR
* improve log
* remove null check
2022-07-21 14:57:23 -04:00
edebar01
bc38982761
add tensorflow 2.8/2.9 examples ( #133 )
...
* add tensorflow 2.8 example
* fix code health
* rename test
* add tensorflow 2.9 sample
2022-07-21 14:57:13 -04:00
Louie Larson
c061feebcd
Enforce docstrings (PEP 257) on new commits ( #148 )
...
* Remove pep257 exclusions and check only changed scripts
2022-07-21 14:04:53 -04:00
Jeff Omhover
71c1e0e0cc
use same datastore everywhere ( #145 )
2022-07-20 16:54:17 -07:00
Jeff Omhover
59722a3745
add job to create pets dataset ( #144 )
...
* add job to create pets dataset
* align path in jobs
2022-07-20 16:54:08 -07:00
Louie Larson
c97a99255b
Add more code health checks ( #146 )
...
* Start checking code documentation
* Allow pydocstyle validation, enforce some line lengths
* Remove deprecated scripts
2022-07-20 14:45:32 -04:00
Jeff Omhover
a1a36f636e
fix path in jobs and specs ( #143 )
...
* fix path in jobs and specs
* add PYTHONPATH hack to both runnable files
* ignore E402
2022-07-18 22:20:05 -07:00
wayliums
49e49ae21a
Create readme.md ( #141 )
2022-07-18 17:54:05 -07:00
Jeff Omhover
6bdc0e4428
align epoch train/valid metrics ( #142 )
...
* align epoch train/valid metrics
2022-07-18 17:24:07 -07:00
Jeff Omhover
29712e7f26
Add pynvml to retrieve GPU/CUDA parameters to log ( #128 )
...
* add nvml method to get gpu params
* upgrade env in components
* upgrade env in jobs
2022-07-18 10:03:55 -07:00
Jeff Omhover
7774d582d0
Align structure of TF component with PT (common folder) ( #138 )
...
* rename tf folder
* passing unit test
2022-07-15 21:54:06 -07:00
Jeff Omhover
4c8063e675
Create common folder to share code between pt/tf, organize pt code to be extensible ( #131 )
...
* rename components into scripts
* restructure pt folder
* align paths
* move profiler to training sequence
* fix import path
2022-07-15 15:08:20 -07:00
edebar01
65b7235bef
add computes to env variables ( #136 )
2022-07-15 15:21:22 -04:00
Komnus丶Q
6dea29e9b1
E2E Test Framework for Asset Publishing ( #114 )
...
Edited GitHub Actions workflow.
Added Scripts for main->release process and E2E Tests.
Added example test job and tests.yml for training team as a prototype.
Added E2E Test Framework and Publishing workflow for EV2
Co-authored-by: Louie Larson <lolarson@microsoft.com>
2022-07-15 10:19:25 -07:00
Louie Larson
e3ef655284
Remove remaining env script ( #135 )
2022-07-15 09:18:03 -04:00
Louie Larson
7d8dd49c88
Remove env script ( #134 )
2022-07-14 15:29:30 -04:00
wayliums
ce6204e66e
Create CODEOWNERS ( #130 )
...
* Create CODEOWNERS
* Update CODEOWNERS
* Update CODEOWNERS
2022-07-08 16:06:46 -07:00
edebar01
c9fc3fdaee
add azure login to assets-test ( #125 )
...
* add azure login to assets-test
* add environment variables
* test environment variables
2022-07-08 13:01:50 -04:00
Louie Larson
259b180f9c
Improve validation of Python scripts ( #129 )
...
* Add copyright validation
* Add flake8 tests
2022-07-08 12:28:41 -04:00
Jeff Omhover
12241fe6cd
Use cliv2 job for Tensorflow training benchmark ( #113 )
2022-07-07 13:36:08 -07:00
Jeff Omhover
a7d47bdc85
Implement 2 sampling strategies in pytorch distributed ( #126 )
...
* implement 2 sampling strategies
* fix subset
2022-07-07 13:02:22 -07:00
Jeff Omhover
19b750de7f
Add JSON export to pytorch profiler outputs ( #115 )
...
* extend profiling with json export
* restrict profiling to first gpu
2022-07-07 13:00:04 -07:00
kicha0
29bbc78c32
RAI: Initial rai assets commit. ( #127 )
2022-07-07 10:23:26 -07:00
Louie Larson
718546e19c
Allow image tag regex ( #123 )
2022-07-05 13:26:36 -04:00
Jeff Omhover
29c0347fe8
Implement further pt optimizations for benchmark script ( #121 )
...
* implement further pt optimizations
* rename parameter
2022-07-01 11:57:10 -07:00
Jeff Omhover
9b2bd80013
Upgrade nvidia environment to 22.06 ( #117 )
...
* upgrade nvidia environment
* Update spec.yaml
2022-07-01 10:28:14 -07:00
Jeff Omhover
2dc7b52482
add wall time ( #120 )
2022-07-01 10:27:53 -07:00
edebar01
43ae7f1779
add tensorflow 2.9 ( #119 )
2022-07-01 10:26:38 -04:00
edebar01
def16d0faf
add sklearn 1.1 curated environment ( #118 )
...
* add sklearn 1.1 curated env
* add conda spec
* add new line to end of conda spec
2022-07-01 10:26:30 -04:00
Jeff Omhover
253b948ddb
Add image segmentation benchmark based on tensorflow+unet ( #107 )
2022-06-29 10:01:30 -07:00
Jeff Omhover
e10401e081
add params/metrics to align with Tensorflow code ( #112 )
2022-06-29 09:59:33 -07:00
Louie Larson
63ae6650cc
Rename environments to existing naming standard ( #110 )
...
* Rename environments to existing naming standard
* Use image names that match the existing convention
2022-06-23 14:54:05 -04:00
daholste
4e65759ad8
Add classification and instance segmentation components ( #108 )
...
* Add classification and instance segmentation components
2022-06-23 10:16:26 -07:00
Louie Larson
8bc3805451
Fix changed asset detection ( #111 )
...
* Fix reference to release dir
* Handle manually-versioned assets better
* Skip build step if no envs
2022-06-23 12:41:15 -04:00
Jeff Omhover
72e490c4db
refactor structure of pt component ( #109 )
2022-06-22 15:29:39 -07:00
Jeff Omhover
835633f30e
Add nvidia/tensorflow to benchmark environments ( #105 )
...
* Add tf environment
* modify readme
* Update env.yml
2022-06-21 13:33:37 -07:00
Louie Larson
f9b6853fa1
Fix tags on pytorch-1.11 ( #106 )
2022-06-21 16:16:26 -04:00
wayliums
2adc511601
Update README.md
2022-06-20 21:27:01 -07:00
Jeff Omhover
cfd3250a8f
Add export script
2022-06-20 21:22:48 -07:00
wayliums
9aa7ce7c11
Update README.md ( #104 )
2022-06-20 17:31:37 -07:00
Louie Larson
1b01823624
Reduce curated environments to just new ones ( #103 )
...
* Reduce curated environments to just new ones
* Update to CUDA 11.3
2022-06-20 12:05:02 -04:00
Louie Larson
feef7b79b2
Improve payload handling ( #102 )
2022-06-16 10:14:44 -04:00
Louie Larson
8a9e2be928
Add DisableDockerDetector for nvcr.io, add Horovod ( #101 )
...
* Add DisableDockerDetector for nvcr.io, add Horovod
* Add protobuf dependency
2022-06-16 07:49:07 -04:00
Louie Larson
5b994cac30
Fix TensorFlow environments ( #100 )
2022-06-15 15:36:21 -04:00
Louie Larson
c541285fe2
Fix deployment config ( #99 )
...
* Fix full image name references in config files
* Fix ACR push semantics
2022-06-14 19:19:49 -04:00
Jeff Omhover
029b5d62ec
Rename vision/ folder to benchmark/ and remove assets (to not publish) ( #95 )
...
* rename everything
* rename build
* remove assets
* force downgrade of protobuf for tests
Co-authored-by: amah <ma.mahmoudzadeh@gmail.com>
2022-06-14 11:36:07 -07:00
amah
cfa6f377e6
load model with retry ( #79 )
...
Co-authored-by: amah <ma.mahmoudzadeh@gmail.com>
2022-06-14 11:05:47 -07:00
Louie Larson
3618dfbfa1
Allow spec.image to be None
2022-06-14 12:17:55 -04:00
Louie Larson
1290f7b69b
Fix output dirs ( #98 )
2022-06-14 10:43:26 -04:00