Commit graph

45 commits

Author SHA1 Message Date
Zeliang Tian b2f58696de
Zetia/ensure k8s compute (#3265)
* add ensure_amlarc_compute function

* ensure k8s compute

* ensure k8s compute

---------

Co-authored-by: Ubuntu <zetia@DevBox-zetia.1jltvvkrfgyuhl3llhmbyldkog.bx.internal.cloudapp.net>
2024-07-06 16:51:02 +08:00
Zeliang Tian 735144d3fa
detach compute before attaching, as k8s compute doesn't support update (#3030)
* detach compute before attaching

* fix typo

* refine log

---------

Co-authored-by: Ubuntu <zetia@DevBox-zetia.1jltvvkrfgyuhl3llhmbyldkog.bx.internal.cloudapp.net>
2024-03-26 11:05:53 +08:00
Diondra d43ed2adc3
Fix readme validation failures (#3003)
* Add readme template to contributing.md

* Update exclusion list

* Add more paths to exclusion file

* Add debugging statements

* Wrap in try/except

* Wrap in try/except

* Run black to format

* Add sample to exclusions list
2024-02-14 20:17:50 -08:00
kdestin 50495339ce
ci: Fix 'file does not exist' when running check-readme (#3002)
* ci: Fix computation of repo_root in check-readme.py

    __file__ is a relative path for Python 3.5-3.8, but absolute
    in Python>=3.9

    This distinction causes us to calculate the incorrect path for the
    root of the repo:

      Path.parent will return '.' once you hit the "root" of a relative
      path (i.e. Path('./README.md').parent.parent == Path('.'))

* refactor: Rename working_directory to repo_root
2024-02-13 16:39:32 -05:00
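
The path behaviour described in #3002 can be reproduced in isolation. The sketch below is not the code from check-readme.py: the helper names and the a/b/script.py layout are hypothetical, chosen only to show why climbing parents from a relative path stalls at '.' and how resolving the path first avoids that.

```python
from pathlib import Path


def naive_ancestor(path: Path, levels: int) -> Path:
    """Climb `levels` parents without resolving the path first."""
    for _ in range(levels):
        path = path.parent
    return path


def resolved_ancestor(path: Path, levels: int) -> Path:
    """Make the path absolute first, then climb `levels` parents."""
    return naive_ancestor(path.resolve(), levels)


# Hypothetical relative script location, as __file__ could look for a
# script launched as `python a/b/script.py` on Python 3.5-3.8.
script = Path("a/b/script.py")

# Three parents up from a relative path bottoms out at '.'; any further
# .parent calls stay at '.', so a deeper climb silently gives the wrong root.
stalled = naive_ancestor(script, 5)

# Resolving first yields an absolute path, so every .parent call really
# moves one directory up.
root = resolved_ancestor(script, 3)
```

Resolving before walking up makes the result independent of whether the interpreter hands the script a relative or an absolute `__file__`.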
XI JIN aed87ffbfc
Readme exclusion (#2988)
Co-authored-by: Xi Jin <jinxi@microsoft.com>
2024-02-06 12:49:06 -08:00
Rehaan Bhimani e9e8589b1b
add readme exclusions (#2984) 2024-02-05 19:01:43 -07:00
Rehaan Bhimani 7b222c7a4c
add readme exclusions (#2979) 2024-02-02 15:37:08 -08:00
Brynn Yin 124d806311
[Pipeline] Update deprecated image & add missing notebook to check readme ignore files (#2978)
* Update deprecated image

Signed-off-by: Brynn Yin <biyi@microsoft.com>

* Fix more images

Signed-off-by: Brynn Yin <biyi@microsoft.com>

* Use latest version of sklearn

Signed-off-by: Brynn Yin <biyi@microsoft.com>

* Trigger component register

Signed-off-by: Brynn Yin <biyi@microsoft.com>

* Update image

Signed-off-by: Brynn Yin <biyi@microsoft.com>

* Use fixed sklearn

Signed-off-by: Brynn Yin <biyi@microsoft.com>

---------

Signed-off-by: Brynn Yin <biyi@microsoft.com>
2024-02-02 20:41:29 +08:00
Diondra cec9cbb5ee
Add readme check (#2955)
* Add script to check README.md and README template

* Refactor check-readme.sh and add messages

* Add step to cli-assets-component-pipeline workflow to test

* Update workflow

* Update workflow

* Fix syntax error

* Update action

* Fix typo

* Add exclusion logic

* Try adding readme validation to readme.py

* Update workflows with readme.py

* Update workflows with readme.py

* Update workflows with readme.py

* Update workflows with readme.py

* Fix github workspace variable

* Fix github workspace variables for exclusion check

* Fix exclusion logic

* Update readme exclusions

* Revert accidental change

* Update readme_exclusions.txt

* Revert accidental change to working directory

* Update check-readme.sh

* Update exclusion logic and file name

* Add debug message

* Update exclusion logic and file name

* Fix working directory file names

* Update exclusion logic and file name

* Add debugging statements

* Remove debugging statements

* Update readme.py and regenerate cli workflows

* Update working-directory

* Update readme to add validate readme to sdk workflows

* Move templates folder inside infra

* Add validate readme check to tutorials and sdk/python files

* Fix readme.py and revert unintended changes

* Revert unnecessary changes and update readme_validation_exclusions.txt

* Update exclusions list

* Update template

* Update readme_validation_exclusions.txt

* Update exclusions

* Replace check-readme.sh with check-readme.py

* Update readme template

* Update check-readme.py with Kevin's suggestions

* Remove 2nd CLI argument

* Add debugging messages

* Add debugging messages

* Add debugging messages

* Update debugging messages

* Update debugging messages

* Strip whitespace

* Convert sample path to string

* Fix exclusions

* Remove debugging messages

* re-format with black

* Manually update one sample with logging logic to test

* Regenerate resolved notebooks

* Remove appinsights logging step

* Exclude /home/runner/work/azureml-examples/azureml-examples/cli/jobs/automl-standalone-jobs/cli-automl-forecasting-task-github-dau

* Revert changes to notebooks

* Add whitespace back
2024-01-30 11:45:35 -08:00
Diondra 1e212cda6a
Revert "Add readme validation step" (#2951) 2024-01-12 15:13:46 -08:00
Diondra a8431e1b9c
Add readme validation step (#2748)
* Add script to check README.md and README template

* Refactor check-readme.sh and add messages

* Add step to cli-assets-component-pipeline workflow to test

* Update workflow

* Update workflow

* Fix syntax error

* Update action

* Fix typo

* Add exclusion logic

* Try adding readme validation to readme.py

* Update workflows with readme.py

* Update workflows with readme.py

* Update workflows with readme.py

* Update workflows with readme.py

* Fix github workspace variable

* Fix github workspace variables for exclusion check

* Fix exclusion logic

* Update readme exclusions

* Revert accidental change

* Update readme_exclusions.txt

* Revert accidental change to working directory

* Update check-readme.sh

* Update exclusion logic and file name

* Add debug message

* Update exclusion logic and file name

* Fix working directory file names

* Update exclusion logic and file name

* Add debugging statements

* Remove debugging statements

* Update readme.py and regenerate cli workflows

* Update working-directory

* Update readme to add validate readme to sdk workflows

* Move templates folder inside infra

* Add validate readme check to tutorials and sdk/python files

* Fix readme.py and revert unintended changes

* Revert unnecessary changes and update readme_validation_exclusions.txt

* Update exclusions list

* Update template

* Update readme_validation_exclusions.txt

* Update exclusions

* Replace check-readme.sh with check-readme.py

* Update readme template

* Update check-readme.py with Kevin's suggestions

* Remove 2nd CLI argument

* Add debugging messages

* Add debugging messages

* Add debugging messages

* Update debugging messages

* Update debugging messages

* Strip whitespace

* Convert sample path to string

* Fix exclusions

* Remove debugging messages

* re-format with black

* Manually update one sample with logging logic to test

* Regenerate resolved notebooks

* Remove appinsights logging step

* Exclude /home/runner/work/azureml-examples/azureml-examples/cli/jobs/automl-standalone-jobs/cli-automl-forecasting-task-github-dau
2024-01-12 15:36:15 -05:00
jeff-shepherd 6947079b84
Added 2 hour delay before cleanup (#2937)
* Added 2 hour delay before cleanup to prevent resources from being deleted while they are still in use

* Removed duplicate cleanup script
2024-01-04 17:50:29 -08:00
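
The log does not show how the delay in #2937 is implemented. One way to get the same effect is an age check before deletion, sketched below. The function name, the `created_at` timestamp, and the idea of comparing against resource age are all assumptions for illustration, not the contents of the cleanup script.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Grace period taken from the commit title; everything else is hypothetical.
CLEANUP_DELAY = timedelta(hours=2)


def is_old_enough_to_delete(
    created_at: datetime,
    now: Optional[datetime] = None,
    delay: timedelta = CLEANUP_DELAY,
) -> bool:
    """Return True only if the resource has existed for at least `delay`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= delay
```

A cleanup loop would skip any resource for which this returns False, leaving jobs that started recently untouched.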
kdestin 6eb55ad0ce
fix: Temporarily allow bootstrapping to proceed past acr permission granting error (#2912) 2023-12-12 20:17:05 -05:00
Aswin Nagarajan e0032242d5
Enable create new ws for OAI v2 and changed chat dataset format (#2854)
* enable create new ws for oai

* set env var

* export env var

* setting location

* setting location

* removed acr access

* removed all unnecessary verifications

* changed dataset format
2023-11-27 01:42:40 -08:00
kdestin b8d304913f
docs: Add a template for jupyter notebook examples (#2753) 2023-10-24 18:43:38 -04:00
Aswin Nagarajan d1ed139b9b
Changes to workflow for oai-v2 examples (#2726) 2023-10-12 22:47:30 -07:00
Aswin Nagarajan a796d3a70a
OAI v2 examples (#2710)
* oaiv2 sdk example

* modified workflows and notebooks

* removed all cell outputs

* removed old files

* changed cron param

* reverted cron param

* black format

* Added cli examples for new finetune pipeline component (#2711)

* Added cli examples for new finetune pipeline component

* Added screenshots for cli examples

* added and renamed cli workflows

* directory fixes

* Vvatsalya/fix cli oai v2 workflow (#2717)

* setting location as ncus

* set in setup-cli step

* new init and setup script for oai v2

* correcting syntax for init sh

* fix init oai v2 script

* fix

* fix 1

* fix 2

* change training dataset name in cli oai v2 example

* sdk workflow region change

* use oai as suffix in ws name

* added oai v1 and v2 in readme for cli and sdk

* adding install reqs for sdk

---------

Co-authored-by: Vishal Vatsalya <98515131+vvatsalya@users.noreply.github.com>
Co-authored-by: Vishal Vatsalya <vvatsalya@microsoft.com>
Co-authored-by: Ayush Mishra <61145377+novaturient95@users.noreply.github.com>
2023-10-10 11:28:03 -07:00
MaggieHust 516cef4430
update test (#2658)
Co-authored-by: Maggie Ma <fama@microsoft.com>
2023-09-21 22:02:23 -07:00
kdestin 35041db7f7
chore: Update service principal name in init_environment.sh (#2621) 2023-09-05 20:00:24 -04:00
jeff-shepherd df7837c1e0
Switched to new GPU SKU because NC6 is deprecated (#2462)
* Switched to new GPU SKU because NC6 is deprecated

* Updated credentials for remaining V1 notebooks

* Updated gpu-cluster in bootstrap.sh
2023-07-18 16:51:47 -07:00
jeff-shepherd cfd459666c
Add upload to datastore for sampledata (#2384)
* Add upload to datastore for sampledata

* Updated SKIP_AUTO_DELETE_TILL format to 4 digit year
2023-06-20 08:21:13 -07:00
kdestin 57ff370d0f
ci: refactor bootstrapping to avoid needing to invoke `apt get` (#2336)
* refactor: Stop invoking jq in infra scripts

* refactor: Remove some commented out code from infra

* refactor: Don't install jq

    Ubuntu runners come with it pre-installed:
        4fe7f6bc86/images/linux/Ubuntu2204-Readme.md

* refactor: Replace `az command | jq '.QUERY'` with `az command --query QUERY`

* refactor: Collapse `jq | jq | jq` into a single jq invocation

* refactor: Do not install xmlstarlet

    Seems to be entirely unused

* refactor: Don't install uuid-runtime and remove install_packages function
2023-05-30 16:13:54 -04:00
kdestin f87d6591d5
refactor: Reorganize infra folder (#2328)
* refactor: Move infra scripts to subdir

* refactor: Rewrite paths in sdk_helpers.sh

* refactor: Update ROOT_DIR

* refactor: Update paths in workflow generators

* refactor: Update path in doc comment

* refactor: Update workflows

* fix: Fix incorrect script path
2023-05-24 15:56:58 -04:00
Gaurav Rajguru be1a7a8517
NebulaML PyTorch notebook failure due to K80 GPU (#2278)
Co-authored-by: grajguru <grajguru@microsoft.com>
2023-05-18 13:10:51 +05:30
jeff-shepherd 29cfe9088b Clean old resource group as well (#2180)
* Clean old resource group as well

* Added escape in regular expression for "AzureML Metrics Writer (preview)" so that more Role Assignments are cleaned
2023-04-03 15:46:02 -07:00
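
The escaping fix in #2180 matters because parentheses are grouping operators in a regular expression. The sketch below shows the effect with Python's `re` module. The cleanup script itself is presumably shell, so this is an illustration of the pitfall rather than the actual change, and `matches_role` is a hypothetical helper.

```python
import re

ROLE_NAME = "AzureML Metrics Writer (preview)"


def matches_role(pattern: str, text: str) -> bool:
    """Return True if `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None


# Used as a raw pattern, "(preview)" is a capture group, so the pattern
# matches the text without parentheses and misses the real role name.
unescaped_hits_real_name = matches_role(ROLE_NAME, ROLE_NAME)

# re.escape turns the parentheses into literals, so the real name matches.
escaped_hits_real_name = matches_role(re.escape(ROLE_NAME), ROLE_NAME)
```

With the unescaped pattern, role assignments carrying the literal "(preview)" suffix would be skipped, which is consistent with the commit's note that more assignments are cleaned after the fix.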
Mathieu St-Louis 7532c4dfe2 Remove batch-cluster from infra cleanup script (#2157)
* update

* update
2023-03-29 16:05:14 -07:00
Komnus丶Q 98774b8093 delete registry after creation (#2107)
* delete registry after creation

* always run deletion

* init env

* set working dir

* add sdk_helper

* add sub_id

* edit for debugging

* try debugging

* try deletion inside the notebook

* delete in the same step

* print registry name for debugging

* remove deletion for testing.

* change everything back for final pr

* modify format

* make deletion inside the ipnb

* try sdk deletion

* try sdk

* use verified api.

* change format

* Update registry-create.ipynb
2023-03-20 13:50:12 -07:00
Harneet Virk b95e94508f Add role assignment cleanup and workaround for the exit code 100 error. (#2049)
* Add role assignment cleanup

* Add verbose logging for troubleshooting the exit code 100 issue

* bypass the error for grub-efi-amd64-signed

* Adding the redirections back
2023-02-23 10:09:12 -08:00
cassieesvelt 53ca2137a1 Add deepspeed autotuning and training examples (#2028)
* add deepspeed example

* fix mlflow logging in train.py

* move examples to cli folder

* move to jobs, add job.yaml files

* generate workflow file

* add comment in train.py

* add latest tag to env

* try different way of starting run

* set up job as pipeline

* move to pipelines folder

* add temp yaml file to pass validation

* modify readme.py to support deepspeed

* move generate-yml script

* recreate workflow files.

* move generated key location

* change environment

* change env for both examples

* fix env

* try v100 computes

* create nd40 cluster

* add compute create line

* change max number of nodes

* change data directory

* change command to block

* move data files into src

* remove unused data thing

* change data file type

* fix mlflow get run fail

* change command style in job

* change location of job

* change generate key path

* change code: path

* rename results folder

* change back key path

* change max number of nd40 nodes

* try moving autotuning folder into src

* change output file name

* try overwriting output

* add dockerfile for custom env

* change env in yaml

* update README and move autotune example

* move dockerfile to parent dir
2023-01-26 09:10:33 -08:00
Jun c7adadc045 delay registry component workflows by 59 min (#2033)
* delay registry component workflows by 59 min

* fix cleanup.sh

Co-authored-by: Ubuntu <fvm@azureml.hdto2glhnacenmzoxwwt5zmpza.phxx.internal.cloudapp.net>
Co-authored-by: Jun Qi <junqi@microsoft.com>
2023-01-23 10:47:08 -08:00
Jun 967d7b799c Create workflows to run pipeline job using registry components (#1900)
Co-authored-by: Ubuntu <fvm@azureml.hdto2glhnacenmzoxwwt5zmpza.phxx.internal.cloudapp.net>
2023-01-11 12:57:50 -08:00
Han Wang c676703096 fix (#2017) 2023-01-08 21:43:34 -08:00
Harneet Virk b6e30f43b9 Refactor the script for adding identity with compute (#2014)
* Refactor the script for adding identity with compute
2023-01-06 15:07:11 -08:00
jeff-shepherd 00be8419d4 Increase maximum nodes on clusters (#1992) 2022-12-20 11:23:57 -08:00
sharma-riti 5234c88bdd Update sdk helper to include cli (#1939)
update max_trials for cli as well
2022-12-14 12:20:33 -08:00
nick863 5172b8b44d Remove cluster after use even if it is in the failed state. (#1872)
* Remove cluster after use even if it is in the failed state.

* Fix readme.py and add the config

* Rollback minor changes

* Re generate config with the new script

* Add documentation

* Reformat readme.py

* Fix typo

* Fix workflow files
2022-11-17 23:42:23 -08:00
Harneet Virk b914914644 Update the resource names for November to recreate resources (#1887)
* Adding checks for k8 extension and check for attaching the compute to workspace

* Adding a note on the top of all the workflows about regenerating using python script

* Update header in the workflows

* Adding the template for account and container

* Update the resource names for November to recreate resources
2022-11-15 21:00:06 -08:00
jeff-shepherd eba4f37bf5 Changed version for data (#1868)
* Changed version for data

* Gave better error message for unexpected data folder

* Updated cifar-10-example dataset creation
2022-11-14 18:24:54 -08:00
Bala P V 586b5b1942 Improving cleanup, installing latest version of CLI (#1867)
* Improving cleanup, installing latest version of CLI

* Update cleanup.sh
2022-11-10 11:40:40 -08:00
Harneet Virk 65384049d9 Adding templates for storage account and container in yaml (#1845)
* Adding checks for k8 extension and check for attaching the compute to workspace

* Adding a note on the top of all the workflows about regenerating using python script

* Update header in the workflows

* Adding the template for account and container
2022-11-03 16:06:34 -07:00
Jun 626dcfe2b8 Create and cleanup registry (#1840)
* Create and cleanup registry

* Create and cleanup registry

* Create and cleanup registry

* Create and cleanup registry

* Create and cleanup registry

* nyc_taxi_data_regression-run-sample

* Add rg in az resource delete
2022-11-03 09:36:44 -07:00
Harneet Virk 65cf226959 Users/harnvir/tempresources (#1841)
* Adding checks for k8 extension and check for attaching the compute to workspace

* Adding a note on the top of all the workflows about regenerating using python script

* Update header in the workflows
2022-11-02 16:26:05 -07:00
Harneet Virk 10ce2c13d4 Updating the workflow to copy the data in the storage account 2022-10-31 17:37:33 -07:00
Harneet Virk 3fd67b496d Change the resource group name 2022-10-31 12:08:00 -07:00
Harneet Virk 2531497d78 Adding reusable bootstrapping script to configure infra resources (#1602)
* Adding reusable bootstrapping script to configure infra resources

* Fixing tput error

* Disable color coding for the output

* Removing the tput call

* Adjusting the directory structure of infra scripts

* Adjusting the path of the bootstrapping script

* Rerunning 2 workflows only

* Addressing the indentation issue

* Fixing the path of the init script

* logging the path

* Adding workspace path

* adding helper script call

* Removing typo from the file name

* Adding source call

* Refactoring the Compute creation script

* Updating single-step workflows

* Updating CLI script to use bootstrapping script

* Updating all of the cli workflows

* Updating all of the CLI and SDK workflows

* Allowing endpoint workflows to continue to use old subscription

* Updating all of the CLI and SDK workflows for cli configuration

* Refactoring logging

* Refactoring some of the existing scripts

* Updating the dataset call

* Updating pipeline workflows for extracting the correct config

* Updating the azure config

* Adding helper scripts for resource provisioning

* Refactoring samples to use the new subscription

* fix permissions for running sh files

* fix permissions for running sh files

* Ensuring the tools are installed

* Add copy data script

* Updating the config generation in SDK workflows

* Updating the config generation directory

* Updating the config generation directory one subdirectory level

* Extracting the folder name from the path for creating config

* Adding logging

* Adding logging for pwd

* Adding config.json

* Adding generic scripts for granting permissions on rg

* Updating the workflows by adding the cluster info

* Adding the new dataset

* Adding concurrency and updating the schedules of the jobs

* Updating the workflows to use concurrency

* cancel a currently running workflow from the same PR, branch or tag

* Updating workflows for concurrency

* Logging the information about workflow names

* Changing the group name to variable

* Printing the notebooks content

* Print the content of the notebooks

* Merging changes from main

* Adding pre-commit-hook

* Updating the task names

* Reverting the set-up repo changes

* Addressing the formatting of the files

* Addressing the formatting of the files

* Updating the moe scripts to fetch the registry name

* Updating the new workflows

* Adding new workflows and fixing the formatting of the files

* Updating the replacements

* Updating the ACR details

* Updating bootstrapping script

* Updating the ACR

* Fixing the typo in notebook

* Adding the script to create vnet/subnet

* Updating the workflows

* Updating the identity creation with appropriate roles

* Archive the model while running endpoint workflows and create vnets

* Adding bootstrapping workflow

* Removing managed online endpoints VS workflow

* Continue on error for bootstrap workflow

* Adding the path in the workflow

* Debug the bootstrapping workflow

* Adding call to bootstrap mltable

* Adding call az ml commands

* Adding call az ml commands by removing data calls

* Adding the replacements for InteractiveBrowserCredential

* Reformatting the notebook

* Updating the bootstrap workflow

* Updating the bootstrap workflow with less logging

* Updating the bootstrap workflow with less logging

* Updating the bootstrap workflows

* Adding more bootstrapping steps

* Removing newly added workflows

* Install tools as part of bootstrap step

* Updating the script to grant permissions on identity

* Updating the ACR Details

* Importing data for automl jobs

* Reformatting the files and regenerating all files

* Updating the SDK package version

* Updating bootstrapping script name

* fix pipeline sample 4b

* fix pipeline sample 4b

* Reformating and updating the folder path in bootstrapping workflow

* Update sdk path in bootstrapping workflow

* Regenerate the workflows

* Merging changes from main and regenerating the workflows

* Adding owner for the tags

* update image cli to run prepare_data.py first (#1699)

Co-authored-by: Riti Sharma <risha@microsoft.com>

* Regenerating the workflows after adding data prep step

* add automl in the check for prepare_data

* update ymls

* Updating the script to reinstall ML extension

* Regenerating the workflows for automl to address identity installation issue

* Regenerating the workflows for automl to address identity installation issue and reformatting

* Installing SDK required for running prepare dataset

* Updating the SDK call for automl workflows

* Fix failed CLI pipeline(add restriction on azureml-mlflow) (#1703)

* add restriction

* fix

* update

* directly use expected version

* update

* update

* remove deleted wf

* Updating ARC script

* Updating the readme file

* Regenerate the workflows

* Updating arc scripts

* Adding macro for account name and container

* Calling update dataset script

* Regenerating workflows and addressing formatting

* Merging changes from main after sdk release

* Address the vulnerabilities in the VMSS agents pool

* Merging changes from main for new workflows

* Update SDK v2 in setup file

* update cli cifar distributed

* init (#1786)

* Updating registry env variable

* Automated resource provisioning - registries (#1812)

* init

* enable registry check

* init REGISTRY_NAME

* changed env var names and import

* try to fix checks

* ran readme.py; changed registry yml location; modified helper

* fixing syntax

* [mldesigner] Modify sdk/1b to adopt newest mldesigner 0.1.0b7 (#1794)

* update

* update

* update

* update

* update

* update

* change registry name for testing

* fixing syntax

* fixing syntax

* change registry name for testing

* change registry name for testing

* Don't run samples for any checkin (#1799)

* Don't run samples for any checkin

* Reformat with black

* change registry name for testing

* change registry name back

* replace version

* Batch Endpoints scenarios (#1777)

* batch scenarios

remove model

* black

* comments

* typo

* typos

* formatting

* build

* batch.sh fix

* modified model deployment

* fixing workflow

* Fix model path in Local Debug in VSCode (#1679)

* Resolve model path issue and add to readme ignore

* Update readme

* Revert readme

* Update readme

* Fix environment

* Remove stray line

* Change to default azure credential

* remove workflow

* [mldesigner] Fix mldesigner version in sample (#1797)

* update

* update

* update

* update

* update

* update

* ran readme.py, edited readme.py, edited workflow

* added log for debugging

* fixing env var

* added environment to deploy.yml

* revert deploy.yml

* resolved conflicts

* revert changes in v1

Co-authored-by: Korin <0mza987@gmail.com>
Co-authored-by: Bala P V <33712765+balapv@users.noreply.github.com>
Co-authored-by: Facundo Santiago <fasantia@microsoft.com>
Co-authored-by: Alex Wallace <80542152+xanwal@users.noreply.github.com>

* Updating the bootstrapping script

* Update sdk_helpers.sh

fixing the registry creating bug

* modified ensure_registry

* Updating the resource names to use November month resources

Co-authored-by: lochen <cloga0216@gmail.com>
Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com>
Co-authored-by: Riti Sharma <risha@microsoft.com>
Co-authored-by: Han Wang <phoenix.seek@gmail.com>
Co-authored-by: Ayush Mishra <61145377+novaturient95@users.noreply.github.com>
Co-authored-by: Douglas Xiao <xiake@microsoft.com>
Co-authored-by: Komnus丶Q <40655746+quchuyuan@users.noreply.github.com>
Co-authored-by: Korin <0mza987@gmail.com>
Co-authored-by: Bala P V <33712765+balapv@users.noreply.github.com>
Co-authored-by: Facundo Santiago <fasantia@microsoft.com>
Co-authored-by: Alex Wallace <80542152+xanwal@users.noreply.github.com>
Co-authored-by: Chuyuan Qu <chuyuanqu@microsoft.com>
2022-10-28 19:23:32 -07:00