Commit graph

2616 commits

Author SHA1 Message Date
Wei-ge Chen 0c2ebec20b Fixed more flake8 warnings 2023-04-28 11:22:15 -07:00
Wei-ge Chen b5d00d3098 Autoformatted with black 2023-04-28 10:44:19 -07:00
wchen-github fb9ce81b49
Merge pull request #235 from microsoft/task_facial_landmark_detection
Task_facial_landmark_detection
2023-04-28 10:15:20 -07:00
Gustavo de Rosa bd8684979b
fix(tasks): Fixes issue #227. 2023-04-28 09:01:45 -03:00
Wei-ge Chen f9eeb0140c Missed in previous commit 2023-04-27 14:08:28 -07:00
Chris Lovett 93b8ab75d7
Finalize the AML test run (#236)
* Add a --test option that runs only the data prep step to test the environment is working.

* force train.py to grab the lock on the row (removing rare failure case).

* Fix snpe kubernetes scaling using the anti-node affinity pattern.
Publish new docker image.
Add mlflow integration to train.py.

* Add script that does full training pipeline for final pareto models.

* add iteration 7

* add iteration 9

* switch to bokeh so I can get nice tooltips on each dot in the scatter plot.

* add axis titles.

* Add device F1 scoring to train_pareto
Add more to readmes.

* add image

* Add helper script to do final F1 scoring on Qualcomm devices.

* fix lint errors.

* fix bugs
2023-04-27 13:03:08 -07:00
Wei-ge Chen efc4b5b01f Resolve comments for merge PR 2023-04-27 12:35:03 -07:00
Chris Lovett 5376cdbf04 Merge branch 'main' into task_facial_landmark_detection 2023-04-26 14:56:54 -07:00
Wei-ge Chen 2673902b15 Further clean up 2023-04-26 11:29:00 -07:00
Wei-ge Chen 7b334a2a4f Further clean up 2023-04-26 11:28:23 -07:00
Wei-ge Chen 8e89402d40 Remove hard coded paths. Clean up more. 2023-04-25 16:00:15 -07:00
Wei-ge Chen 11145b86f3 Missed this file 2023-04-25 15:49:29 -07:00
Wei-ge Chen 2694b1e82c More clean up + full training results 2023-04-25 15:48:13 -07:00
Wei-ge Chen c3af4e4f1f Fixed arg name for archid 2023-04-25 10:50:19 -07:00
Wei-ge Chen c3b9c61ec3 Patch up the previous commit 2023-04-25 10:12:33 -07:00
Wei-ge Chen d01ff3a1fc Now have the 1st round of full training result 2023-04-25 10:12:06 -07:00
Wei-ge Chen de2c1cf3c7 Update the CSV file 2023-04-24 22:11:22 -07:00
Chris Lovett 4a3cf62a77
robustify the error case a bit more so search jobs can continue when a small number of training jobs fail. (#234)
* remove old notebook

* store onnx latency
allow aml partial training with no snapdragon mode

* fix docker file now that aml branch is merged.

* fix bug in reset
add notebook

* add link to notebook.

* Add an on_start_iteration callback so that user can track which models came from which iterations.

* fix conda file.

* new version

* robustify the error case a bit more so search jobs can continue when a small number of training jobs fail.

* re-use onnx latency numbers.
2023-04-24 17:53:25 -07:00
Wei-ge Chen 212a1d654f Fixed typo 2023-04-24 16:04:20 -07:00
Wei-ge Chen c4529b4a00 Fixed typo. 2023-04-24 16:01:23 -07:00
Wei-ge Chen f9e9287c5a Remove unused code 2023-04-24 15:56:21 -07:00
Wei-ge Chen 3c289d01ec Add script to train all pareto models 2023-04-24 15:48:29 -07:00
Wei-ge Chen a9db06abc6 Enable full training on candidate models 2023-04-24 12:28:34 -07:00
Wei-ge Chen 89a913981f Full search is finished 2023-04-24 09:07:32 -07:00
Chris Lovett 5410e8bdd1
fix conda file (#233)
* remove old notebook

* store onnx latency
allow aml partial training with no snapdragon mode

* fix docker file now that aml branch is merged.

* fix bug in reset
add notebook

* add link to notebook.

* Add an on_start_iteration callback so that user can track which models came from which iterations.

* fix conda file.
2023-04-23 13:45:40 -07:00
Chris Lovett 0165b708b7
Add an on_start_iteration callback on Searcher so that user can track which models came from which iterations (#232)
* remove old notebook

* store onnx latency
allow aml partial training with no snapdragon mode

* fix docker file now that aml branch is merged.

* fix bug in reset
add notebook

* add link to notebook.

* Add an on_start_iteration callback so that user can track which models came from which iterations.
2023-04-23 00:33:05 -07:00
dependabot[bot] 02b03d8c42
Bump protobuf in /tasks/face_segmentation/aml/docker/quantizer (#228)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.20 to 3.20.2.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.20.0...v3.20.2)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-23 00:32:44 -07:00
Gustavo de Rosa ec72afd749
Merge pull request #212 from sgunasekar/patch-1
Ignore errors in loading tokenizer from_cache
2023-04-22 12:58:16 -03:00
Chris Lovett fc031eb447
remove old notebook (#231)
* remove old notebook

* store onnx latency
allow aml partial training with no snapdragon mode

* fix docker file now that aml branch is merged.

* fix bug in reset
add notebook

* add link to notebook.
2023-04-22 03:23:38 -07:00
Suriya Gunasekar 808756cb96
Update fast_hf_dataset_provider.py 2023-04-21 19:45:48 -07:00
Wei-ge Chen 8352dd39f0 Further clean up. Add copyright statements. 2023-04-21 12:24:43 -07:00
Chris Lovett aae38db1a5
Add Azure ML running to the face segmentation task. (#217)
* add code owners

* initial commit, beginnings of AML version of face synthetics search pipeline.

* Add download_and_extract_zip
Add download capability to FaceSyntheticsDataset
Fix face segmentation data prep script.

* fix bugs

* cleanup launch.json

* cleanup launch.json
add download capability to FaceSyntheticsDataset
add download_and_extract_zip helper

* fix file count test

* work in progress

* work in progress

* unify snpe status table and aml training table.

* fix experiment referencing

* fix experiment referencing

* work in progress

* fix complete status

* fix bugs

* fix bug

* fix metric key, we have 2, one for remote snpe, and another for aml training pipelines.

* pass seed through to the search.py script.

* fix use of AzureMLOnBehalfOfCredential

* fix bugs

* fix bugs

* publish new image

* fix bugs

* fix bugs

* fix bug

* maerge

* revert

* new version

* fix bugs

* rename the top level folder from 'snpe' to 'aml' and move all AML code into this folder except the top level entry point 'aml.py'
make the keys returned from the JobCompletionMonitor wait method configurable
Rename AmlPartialTrainingEvaluator and make it restartable.
Turn off save_pareto_model_weights
Remove redundant copy of JobCompletionMonitor

* rev the versions.

* updates to readme information.

* only inference testing targets are 'cpu' and 'snp', trigger the aml partial training by a different key in the config file.

* add iteration info

* new version.

* fix ordering of results from AmlPartialTrainingEvaluator

* change AML batch size default to 64 for faster training
don't store MODEL_STORAGE_CONNECTION_STRING

* Fix bug in merge_status_entity, add more unit test coverage

* new version

* Store training time in status table.

* improve diagram.

* save iteration in status table.

* pick up new version of archai to fix randomness bug in the EvolutionParetoSearch so that these search jobs are restartable.
2023-04-21 10:47:58 -07:00
Wei-ge Chen 424d37e791 Finished clean up this file 2023-04-21 09:54:27 -07:00
Chris Lovett 4fc5a6d068
Fix some randomness in evolutionary pareto search not coming from given seed. (#225)
Add unit tests to cover this.
2023-04-20 22:37:39 -07:00
Wei-ge Chen 96446f6251 Improve transform code 2023-04-20 16:43:51 -07:00
Wei-ge Chen d552502c6d simplying code 2023-04-20 13:36:14 -07:00
Chris Lovett e225885676
Fix bug in merge_status_entity, add more unit test coverage (#224) 2023-04-19 18:48:13 -07:00
Wei-ge Chen dab2b68a37 Code simplification 2023-04-19 16:28:37 -07:00
Chris Lovett ebf69f432d
add retry logic to store to fix issues with tokens timing out after a very long time in long running jobs. (#223) 2023-04-19 16:27:13 -07:00
Wei-ge Chen 0d7f486666 Remove unused code 2023-04-19 15:41:57 -07:00
Wei-ge Chen e1e2156611 Remove unused functions 2023-04-19 15:17:34 -07:00
Wei-ge Chen 41b4b2aa1f Moving code around 2023-04-19 15:06:00 -07:00
Wei-ge Chen e2aee26da7 Continued clean up 2023-04-19 13:50:28 -07:00
Chris Lovett aa1ab058ab
fix bug in check for complete status. (#222) 2023-04-19 04:25:09 -07:00
Wei-ge Chen e5d7271e5f Adding missed import 2023-04-18 18:02:04 -07:00
Wei-ge Chen 92cfab6f23 Continued clean up: simply arg names 2023-04-18 17:58:45 -07:00
Wei-ge Chen f07b3787c0 Change file names for clarity 2023-04-18 15:11:35 -07:00
Wei-ge Chen 5e7cb01718 Continued clean up 2023-04-18 12:29:01 -07:00
Wei-ge Chen ab87615d4f Continued clean up 2023-04-18 12:19:55 -07:00
Chris Lovett 20dde75800
add retry logic to store to work around an error that can happen on azure VM's that are still booting up their network connections: azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x14952e8a71c0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution. (#221) 2023-04-18 12:08:30 -07:00