Commit Graph

3454 Commits

Author SHA1 Message Date
Jeff Omhover f4c575359b reduce models loaded into memory during unit tests (#151)
* reduce models loaded into memory
* remove 1 model from train unit test
* doc linting
2022-07-21 15:34:29 -07:00
edebar01 1083aceef4 add sample for sklearn 1.1 (#132)
* add sample

* improvements

* improvements

* fix

* add cpu/gpu

* fix syntax

* add newline at end of file

* update test name

* add azureml core to requirements

* fix typo

* fix path

* fix

* change environment to azureaiml version

* change credential

* fix import

* change back to default credential

* add powershell login to test workflow

* change to azurecli creds

* fix import

* alter src path

* try to add version to env name

* attempt syntax fix

* add description

* add experiment name

* fix

* get rid of comments and dummy test

* rename test name

* resolve tags before testing

* print env config

* print extra config

* extra config as object

* only resolve tags for env asset types

* assert job status

* fix syntax

* remove status assert

* remove compute env variables from this PR

* improve log

* remove null check
2022-07-21 14:57:23 -04:00
edebar01 bc38982761 add tensorflow 2.8/2.9 examples (#133)
* add tensorflow 2.8 example

* fix code health

* rename test

* add tensorflow 2.9 sample
2022-07-21 14:57:13 -04:00
Louie Larson c061feebcd Enforce docstrings (PEP 257) on new commits (#148)
* Remove pep257 exclusions and check only changed scripts
2022-07-21 14:04:53 -04:00
Jeff Omhover 71c1e0e0cc use same datastore everywhere (#145) 2022-07-20 16:54:17 -07:00
Jeff Omhover 59722a3745 add job to create pets dataset (#144)
* add job to create pets dataset
* align path in jobs
2022-07-20 16:54:08 -07:00
Louie Larson c97a99255b Add more code health checks (#146)
* Start checking code documentation
* Allow pydocstyle validation, enforce some line lengths
* Remove deprecated scripts
2022-07-20 14:45:32 -04:00
Jeff Omhover a1a36f636e fix path in jobs and specs (#143)
* fix path in jobs and specs
* add PYTHONPATH hack to both runnable files
* ignore E402
2022-07-18 22:20:05 -07:00
wayliums 49e49ae21a Create readme.md (#141) 2022-07-18 17:54:05 -07:00
Jeff Omhover 6bdc0e4428 align epoch train/valid metrics (#142)
* align epoch train/valid metrics
2022-07-18 17:24:07 -07:00
Jeff Omhover 29712e7f26 Add pynvml to retrieve GPU/CUDA parameters to log (#128)
* add nvml method to get gpu params
* upgrade env in components
* upgrade env in jobs
2022-07-18 10:03:55 -07:00
Jeff Omhover 7774d582d0 Align structure of TF component with PT (common folder) (#138)
* rename tf folder
* passing unit test
2022-07-15 21:54:06 -07:00
Jeff Omhover 4c8063e675 Create common folder to share code between pt/pf, organize pt code to be extensible (#131)
* rename components into scripts
* restructure pt folder
* align paths
* move profiler to training sequence
* fix import path
2022-07-15 15:08:20 -07:00
edebar01 65b7235bef add computes to env variables (#136) 2022-07-15 15:21:22 -04:00
Komnus丶Q 6dea29e9b1 E2E Test Framework for Asset Publishing (#114)
Edited GitHub Actions workflow. 
Added Scripts for main->release process and E2E Tests.
Added example test job and tests.yml for training team as a prototype.
Added E2E Test Framework and Publishing workflow for EV2

Co-authored-by: Louie Larson <lolarson@microsoft.com>
2022-07-15 10:19:25 -07:00
Louie Larson e3ef655284 Remove remaining env script (#135) 2022-07-15 09:18:03 -04:00
Louie Larson 7d8dd49c88 Remove env script (#134) 2022-07-14 15:29:30 -04:00
wayliums ce6204e66e Create CODEOWNERS (#130)
* Create CODEOWNERS

* Update CODEOWNERS

* Update CODEOWNERS
2022-07-08 16:06:46 -07:00
edebar01 c9fc3fdaee add azure login to assets-test (#125)
* add azure login to assets-test

* add environment variables

* test environment variables
2022-07-08 13:01:50 -04:00
Louie Larson 259b180f9c Improve validation of Python scripts (#129)
* Add copyright validation
* Add flake8 tests
2022-07-08 12:28:41 -04:00
Jeff Omhover 12241fe6cd Use cliv2 job for Tensorflow training benchmark (#113) 2022-07-07 13:36:08 -07:00
Jeff Omhover a7d47bdc85 Implement 2 sampling strategies in pytorch distributed (#126)
* implement 2 sampling strategies
* fix subset
2022-07-07 13:02:22 -07:00
Jeff Omhover 19b750de7f Add JSON export to pytorch profiler outputs (#115)
* extend profiling with json export
* restrict profiling to first gpu
2022-07-07 13:00:04 -07:00
kicha0 29bbc78c32 RAI: Initial rai assets commit. (#127) 2022-07-07 10:23:26 -07:00
Louie Larson 718546e19c Allow image tag regex (#123) 2022-07-05 13:26:36 -04:00
Jeff Omhover 29c0347fe8 Implement further pt optimizations for benchmark script (#121)
* implement further pt optimizations
* rename parameter
2022-07-01 11:57:10 -07:00
Jeff Omhover 9b2bd80013 Upgrade nvidia environment to 22.06 (#117)
* upgrade nvidia environment

* Update spec.yaml
2022-07-01 10:28:14 -07:00
Jeff Omhover 2dc7b52482 add wall time (#120) 2022-07-01 10:27:53 -07:00
edebar01 43ae7f1779 add tensorflow 2.9 (#119) 2022-07-01 10:26:38 -04:00
edebar01 def16d0faf add sklearn 1.1 curated environment (#118)
* add sklearn 1.1 curated env

* add conda spec

* add new line to end of conda spec
2022-07-01 10:26:30 -04:00
Jeff Omhover 253b948ddb Add image segmentation benchmark based on tensorflow+unet (#107) 2022-06-29 10:01:30 -07:00
Jeff Omhover e10401e081 add params/metrics to align with Tensorflow code (#112) 2022-06-29 09:59:33 -07:00
Louie Larson 63ae6650cc Rename environments to existing naming standard (#110)
* Rename environments to existing naming standard
* Use image names that match the existing convention
2022-06-23 14:54:05 -04:00
daholste 4e65759ad8 Add classification and instance segmentation components (#108)
* Add classification and instance segmentation components
2022-06-23 10:16:26 -07:00
Louie Larson 8bc3805451 Fix changed asset detection (#111)
* Fix reference to release dir
* Handle manually-versioned assets better
* Skip build step if no envs
2022-06-23 12:41:15 -04:00
Jeff Omhover 72e490c4db refactor structure of pt component (#109) 2022-06-22 15:29:39 -07:00
Jeff Omhover 835633f30e Add nvidia/tensorflow to benchmark environments (#105)
* Add tf environment
* modify readme
* Update env.yml
2022-06-21 13:33:37 -07:00
Louie Larson f9b6853fa1 Fix tags on pytorch-1.11 (#106) 2022-06-21 16:16:26 -04:00
wayliums 2adc511601 Update README.md 2022-06-20 21:27:01 -07:00
Jeff Omhover cfd3250a8f Add export script 2022-06-20 21:22:48 -07:00
wayliums 9aa7ce7c11 Update README.md (#104) 2022-06-20 17:31:37 -07:00
Louie Larson 1b01823624 Reduce curated environments to just new ones (#103)
* Reduce curated environments to just new ones
* Update to CUDA 11.3
2022-06-20 12:05:02 -04:00
Louie Larson feef7b79b2 Improve payload handling (#102) 2022-06-16 10:14:44 -04:00
Louie Larson 8a9e2be928 Add DisableDockerDetector for nvcr.io, add Horovod (#101)
* Add DisableDockerDetector for nvcr.io, add Horovod
* Add protobuf dependency
2022-06-16 07:49:07 -04:00
Louie Larson 5b994cac30 Fix TensorFlow environments (#100) 2022-06-15 15:36:21 -04:00
Louie Larson c541285fe2 Fix deployment config (#99)
* Fix full image name references in config files
* Fix ACR push semantics
2022-06-14 19:19:49 -04:00
Jeff Omhover 029b5d62ec Rename vision/ folder to benchmark/ and remove assets (to not publish) (#95)
* rename everything
* rename build
* remove assets
* force downgrade of protobuf for tests

Co-authored-by: amah <ma.mahmoudzadeh@gmail.com>
2022-06-14 11:36:07 -07:00
amah cfa6f377e6 load model with retry (#79)
Co-authored-by: amah <ma.mahmoudzadeh@gmail.com>
2022-06-14 11:05:47 -07:00
Louie Larson 3618dfbfa1 Allow spec.image to be None 2022-06-14 12:17:55 -04:00
Louie Larson 1290f7b69b Fix output dirs (#98) 2022-06-14 10:43:26 -04:00