azureml-examples

Commit graph

Author	SHA1	Message	Date
savitamittal1	07d87f095d	Added more recommendations (#3244)	2024-06-17 11:20:32 -07:00
cassieesvelt	93bb9a8d8d	fix when the job fails and doesn't kick the node (#3186) * fix when the job fails and doesn't kick the node * reformat	2024-05-13 13:28:16 -07:00
cassieesvelt	fa2caa094f	fix readme with nhc descriptions (#3162)	2024-05-06 15:16:57 -07:00
cassieesvelt	a3a1c5946f	Add nhc command job (#3016) * add nhc checks * fix process_count_per_instance * Add job description * update readme * Add wrapper script * update readme with changes * remove testing files * remove uneeded env file * add node kick command * reformat * add node id print * add fallocate command * update fallocate * update readme and reformat * add 3T * use new mcr image * Add kick_bad_node flag * format code * add entry to Training readme * add setup script	2024-04-30 09:13:40 -07:00
cassieesvelt	4a14687dac	Add elastic training benchmark (#2555) * Add elastic benchmark results * add graphs + link * reformat	2023-10-16 11:36:18 -07:00
kdestin	577a8a0522	ci: Refactor Python/Jupyter formatting CI (#2337) * Add code quality checks for python/jupyter * refactor: Remove .github/workflows/smoke.yml Superceded by .github/workflows/code-quality-python.yml * refactor: Remove smoke.yml badge from README files * chore: Trigger .github/workflows/code-quality.yml on pushes to main	2023-10-11 15:28:30 -04:00
rdondera-microsoft	d9acebaec5	Fix internal links. (#2393)	2023-06-22 10:02:07 -07:00
rdondera-microsoft	7c695d99df	Initial set of guidelines for large scale training for Computer Vision (#2381) * ViT-Pretrain folder. * Update to the README file under Training. * Move launcher.py and conda.yml to src folder. * Merge descriptions of model pretraining into a single paragraph. * Note about Infiniband addressing multi-node case only. * Copyright header for image classification script.	2023-06-21 10:47:16 -07:00
Samuel Kemp	88465236d1	Samuel100/loadingupdate (#2363) * updated data loading * data loading update * addressed feedback	2023-06-13 10:04:39 +01:00
Neehar Duvvuri	6eb684f054	Rename job_service_type to type (#2253)	2023-05-05 12:47:36 -04:00
Li, Xiaoran	d18f65698e	Change torch_nebula to nebulaml (#2219) * Change torch_nebula to nebulaml * Renaming the doc as README file * Rename the package name from torch_nebula to nebulaml --------- Co-authored-by: xiaoranli <xiaoranli@microsoft.com> Co-authored-by: Ziqi Wang <zikeiwong@outlook.com>	2023-04-27 11:30:35 +08:00
savitamittal1	a6a8465afa	Update nebula.md for support of memory buffer size (#2153)	2023-03-27 16:23:12 -07:00
ccozianu	c720ee1648	Update README.md (#2147) fix typo	2023-03-24 09:59:26 +05:30
savitamittal1	c7dbfb0014	Update README.md (#2151) * Update README.md * resolved comments * removed space * Changed Monitoring and optimization to Bold as well.	2023-03-23 16:07:23 -07:00
savitamittal1	31b4358caa	Table of content fix and added smoke yaml (#2148) * Table of content fix and added smoke yaml * added sample page_type * changed description * updated description	2023-03-23 12:34:43 -07:00
Razvan Tanase	ca3685b405	Fixing broken links in the BestPractices folder. (#2146) Fixing broken links under BestPractices folder, used relative paths.	2023-03-21 15:15:16 -07:00
Razvan Tanase	2cbb042412	Adding best practices for large scale deep learning (#2144) Adding best-practices for large-scale deep learning workloads.	2023-03-21 13:22:09 -07:00

17 commits