DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Updated 2024-09-13 00:12:52 +03:00
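For context, a minimal sketch of how a training script is typically wired up with DeepSpeed; the ZeRO/fp16 config values below are illustrative assumptions, not settings taken from this repository:

```python
import torch
import deepspeed

# Illustrative ZeRO stage-2 / fp16 config; values are assumptions, not
# defaults shipped with the library.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

model = torch.nn.Linear(512, 10)

# initialize() returns an engine that owns data parallelism, optimizer
# sharding, and loss scaling; train with engine.backward() / engine.step().
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```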
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
Updated 2024-07-03 13:54:08 +03:00
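As an illustration of the toolkit's trial API, a hedged sketch of an NNI hyperparameter-tuning trial; the search-space key "lr" and the stub training function are assumptions made for the example:

```python
import nni

def train_and_evaluate(lr: float) -> float:
    # Stand-in for a real training loop; returns a mock score.
    return 1.0 - abs(lr - 0.05)

# The tuner proposes hyperparameters for this trial; "lr" is an assumed
# search-space key.
params = nni.get_next_parameter()
score = train_and_evaluate(params.get("lr", 0.01))

# Report the result so the tuner can pick the next candidate.
nni.report_final_result(score)
```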
Distributed AI/HPC Monitoring Framework
Updated 2024-06-07 19:24:53 +03:00
Federated Learning Utilities and Tools for Experimentation
Updated 2024-01-11 22:20:09 +03:00
Azure Monitor Distro for OpenTelemetry Python
Updated 2023-12-28 22:02:27 +03:00
An IR for efficiently simulating distributed ML computation.
Updated 2022-01-21 01:50:37 +03:00
Machine learning metrics for distributed, scalable PyTorch applications.
Updated 2021-09-14 14:47:59 +03:00
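A small sketch of the metric pattern this library provides, assuming a recent torchmetrics release (the task argument is required from 0.11 onward); the tensors here are illustrative:

```python
import torch
import torchmetrics

# Metric state lives on the object and is synchronized across processes
# automatically when running under torch.distributed.
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)

preds = torch.tensor([0, 2, 1, 2])
target = torch.tensor([0, 1, 1, 2])

metric.update(preds, target)  # accumulate batch statistics
print(metric.compute())       # aggregate over all batches (and ranks)
```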
👩‍🔬 Train and Serve TensorFlow Models at Scale with Kubernetes and Kubeflow on Azure
Updated 2020-11-13 20:49:23 +03:00
Distributed Deep Learning using AzureML
Updated 2019-11-19 05:28:26 +03:00
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch
Updated 2019-09-10 03:57:02 +03:00
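A sketch of the classic apex AMP pattern this description refers to (apex's amp module has since been superseded by torch.cuda.amp); the model, opt_level, and data are illustrative:

```python
import torch
from apex import amp

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# opt_level="O1" patches common ops to run in FP16 while keeping
# FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(4, 10).cuda()).sum()

# Scale the loss so FP16 gradients do not underflow, then step as usual.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```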
Sample code showing how to perform distributed training of a Fizyr Keras-RetinaNet model on the COCO dataset using Horovod on Batch AI
Updated 2018-08-10 20:56:45 +03:00
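A hedged sketch of the Horovod wiring such a script typically uses, shown with a toy tf.keras model rather than the actual RetinaNet code; launch one process per GPU with horovodrun:

```python
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU; launch with `horovodrun -np N ...`

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Scale the learning rate by world size and wrap the optimizer so
# gradients are averaged across workers via ring-allreduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(optimizer=opt, loss="mse")

model.fit(
    np.random.rand(64, 4), np.random.rand(64, 1),
    # Broadcast initial weights from rank 0 so all workers start identically.
    callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
)
```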
Distributed Task Queue (development branch)
Updated 2016-07-12 04:06:18 +03:00
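A minimal Celery sketch showing the task-queue pattern; the Redis broker URL and the add task are assumptions for illustration:

```python
from celery import Celery

# The Redis broker URL is an assumption; any supported broker works.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def add(x, y):
    # Executed on a worker started with: celery -A tasks worker
    return x + y

# .delay() enqueues the task and returns an AsyncResult immediately.
result = add.delay(2, 3)
```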