A validation and profiling tool for AI infrastructure
Yuting Jiang 7af75df392
Bug Fix: Data Diagnosis - Fix bug of failure test and warning of pandas in data diagnosis (#638)
**Description**
Fix bug of failure test and warning of pandas in data diagnosis.

**Major Revision**
- fix pandas warnings in replace and fillna caused by type downcasting
- fix bug where the failure check function checked only one matched metric rather than all matched metrics
- fix bug when converting a metric regex into a str when there is more than one match group
2024-08-16 09:04:24 +08:00
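
The failure-check item in the commit description above (checking every matched metric rather than only one) can be illustrated with a minimal sketch. This is not SuperBench's actual implementation: the function names `matched_metrics` and `failed_metrics`, the metric names, and the convention that a missing value is represented by `None` are all assumptions made for illustration.

```python
import re


def matched_metrics(pattern, metric_names):
    """Return every metric name that fully matches the regex pattern."""
    regex = re.compile(pattern)
    return [name for name in metric_names if regex.fullmatch(name)]


def failed_metrics(pattern, results):
    """Return all matched metrics whose value is missing (None).

    Every match is inspected; stopping at the first match would hide
    failures in the remaining metrics.
    """
    return [name for name in matched_metrics(pattern, results) if results[name] is None]


# Hypothetical results for one node; None marks a metric that was not produced.
results = {
    "nccl-bw/allreduce_1024_busbw": 12.5,
    "nccl-bw/allreduce_2048_busbw": None,
    "nccl-bw/allgather_1024_busbw": None,
    "kernel-launch/event_time": 0.005,
}

# A pattern with more than one match group, as mentioned in the commit description.
pattern = r"nccl-bw/(allreduce|allgather)_(\d+)_busbw"

print(failed_metrics(pattern, results))
```

In this sketch the first matched metric has a valid value, so a check that looked only at that one would report no failure, while checking all matches surfaces both missing metrics.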
.azure-pipelines Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576) 2023-11-22 19:48:10 +08:00
.devcontainer Update Python setup for require packages (#387) 2022-08-17 11:33:57 +08:00
.github CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633) 2024-07-28 23:43:06 +00:00
.vscode Docs - Add config and docs for development experience (#155) 2021-08-16 13:49:19 +08:00
dockerfile Bug Fix - Update Docker Exec Command for Persistent HPCX Environment (#635) 2024-08-13 16:35:01 +00:00
docs Docs - fix typos (#628) 2024-07-25 03:49:19 +00:00
examples/benchmarks Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588) 2023-12-08 23:22:38 +08:00
superbench Bug Fix: Data Diagnosis - Fix bug of failure test and warning of pandas in data diagnosis (#638) 2024-08-16 09:04:24 +08:00
tests Update omegaconf version to 2.3.0 (#631) 2024-07-23 14:46:28 -07:00
third_party CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633) 2024-07-28 23:43:06 +00:00
website Bump express from 4.18.2 to 4.19.2 in /website (#618) 2024-07-26 09:12:11 +08:00
.clang-format Benchmarks: Add Benchmark - Add the source code of cuda kernel launch overhead benchmark. (#71) 2021-05-18 14:07:27 +08:00
.codecov.yml Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542) 2023-07-05 16:56:21 +08:00
.dockerignore CLI - Update version to include revision hash and date (#427) 2022-10-31 10:44:41 +08:00
.editorconfig Docs - Add config and docs for development experience (#155) 2021-08-16 13:49:19 +08:00
.flake8 Setup: Revision - Update lint rules (#7) 2021-02-01 14:12:02 +08:00
.gitattributes Setup: Init - Initialize setup.py and basic configs (#4) 2021-01-28 21:01:28 +08:00
.gitignore Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560) 2023-08-22 10:56:33 +00:00
.gitmodules Release - SuperBench v0.10.0 (#607) 2024-01-08 05:40:52 +00:00
.mypy.ini Config - Add inference config for NC A100 and NV A10 series (#329) 2022-03-21 14:24:37 +08:00
.pre-commit-config.yaml Setup: Init - Initialize setup.py and basic configs (#4) 2021-01-28 21:01:28 +08:00
.style.yapf Setup: Revision - Update lint rules (#7) 2021-02-01 14:12:02 +08:00
CITATION.bib Docs - Add BibTeX in README and repo (#632) 2024-07-23 18:31:21 -07:00
CODE_OF_CONDUCT.md Initial CODE_OF_CONDUCT.md commit 2020-12-16 18:22:21 -08:00
LICENSE Initial LICENSE commit 2020-12-16 18:22:25 -08:00
MANIFEST.in Bug bash - Fix bugs in multi GPU benchmarks (#98) 2021-06-23 18:16:43 +08:00
Makefile Dockerfile - Add SuperBench Windows Dockerfile (#534) 2023-06-28 05:35:11 +00:00
README.md Docs - Add BibTeX in README and repo (#632) 2024-07-23 18:31:21 -07:00
SECURITY.md Setup: Init - Initialize setup.py and basic configs (#4) 2021-01-28 21:01:28 +08:00
SUPPORT.md Docs - Initialize README (#6) 2021-02-01 20:21:12 +08:00
setup.py Use `types-setuptools` as `types-pkg_resources` is Yanked (#637) 2024-08-08 22:30:37 +08:00

README.md

SuperBench

SuperBench is a validation and profiling tool for AI infrastructure.

📢 v0.10.0 has been released!

Check aka.ms/superbench for more details.

Citations

To cite SuperBench in your publications:

@inproceedings {superbench,
	author = {Yifan Xiong and Yuting Jiang and Ziyue Yang and Lei Qu and Guoshuai Zhao and Shuguang Liu and Dong Zhong and Boris Pinzur and Jie Zhang and Yang Wang and Jithin Jose and Hossein Pourreza and Jeff Baxter and Kushal Datta and Prabhat Ram and Luke Melton and Joe Chau and Peng Cheng and Yongqiang Xiong and Lidong Zhou},
	title = {{SuperBench}: Improving Cloud {AI} Infrastructure Reliability with Proactive Validation},
	booktitle = {2024 USENIX Annual Technical Conference (USENIX ATC 24)},
	year = {2024},
	isbn = {978-1-939133-41-0},
	address = {Santa Clara, CA},
	pages = {835--850},
	url = {https://www.usenix.org/conference/atc24/presentation/xiong},
	publisher = {USENIX Association},
	month = jul
}

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.