Commit graph

22 commits

Author SHA1 Message Date
Yifan Xiong 1362732c79
Docs - Add BibTeX in README and repo (#632)
Add BibTeX for citation in README and repo.
2024-07-23 18:31:21 -07:00
Yifan Xiong 2c88db907f
Release - SuperBench v0.10.0 (#607)
**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606

Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>
2024-01-08 05:40:52 +00:00
Yuting Jiang e1df877bfe
Release - SuperBench v0.9.0 (#558)
**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)
2023-07-27 10:42:31 +08:00
Yifan Xiong 664c59a14d
Docs - Update version in README (#529)
Update version in README.
2023-04-28 03:36:11 +00:00
Yifan Xiong b07fda155e
Release - SuperBench v0.7.0 (#468)
**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)

Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-01-28 11:07:06 +08:00
Yifan Xiong 63e9b2d1bc
Release - SuperBench v0.6.0 (#409)
**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)

Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2022-09-06 18:06:05 +08:00
Yifan Xiong 6681c72043
Release - SuperBench v0.5.0 (#350)
**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
2022-04-29 16:22:55 +08:00
Yifan Xiong ff563b66af
Release - SuperBench v0.4.0 (#278)
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
2021-12-30 16:24:00 +08:00
Yifan Xiong dfbd70b129
Release - SuperBench v0.3.0 (#212)
**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug: fix erroneous parameter `operations` to `operation` in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
2021-09-26 09:30:31 +08:00
Yifan Xiong 69b2c631fc
Release - SuperBench v0.2.1 (#142)
__Description__
Cherry-pick bug fixes from v0.2.1 to main.

__Major Revisions__
* Fix bug where VGG models failed on A100 GPU with batch_size=128.
* Fix Ansible connection issue when running in localhost.
* Update version in packages and docs.
2021-07-29 17:52:28 +08:00
Yifan Xiong 43620c3f46
Docs - Update README and version for v0.2.0 release (#111)
Update README and version for v0.2 release.
2021-07-03 00:18:30 +08:00
Yifan Xiong 832e392f91
Docs - Update SuperBench documents (#101)
Update SuperBench documents.
2021-06-25 13:40:14 +08:00
TobeyQin 1652524aa0
Docs - Update README file on main page (#79)
* Update Readme file on main page
2021-05-25 15:06:54 +08:00
Yifan Xiong c05e173b3d
Runner - Implement ansible client and runner (#69)
Implement ansible client and runner:
* add ansible client
* add deploy and check_env playbooks
2021-05-23 23:53:37 +08:00
Yifan Xiong 74c4d1b2a4
Setup: Code Revision - Rename dev branch to main in config and readme (#55)
* Rename dev branch to main and set it as default.
2021-04-14 17:40:34 +08:00
Yifan Xiong cb33c99ccb
Docs - Update README for public view (#52)
* update README for public view
2021-04-13 17:41:33 +08:00
Yifan Xiong d32b96eb98
Setup: Add Test - Add Codecov (#9)
Add code coverage configuration.
2021-02-04 10:43:43 +08:00
TobeyQin b1a42c38af
Setup: Add Feature - Add contribute rules (#8)
* Setup: Add contribute rules
2021-02-03 11:05:51 +08:00
Yifan Xiong 3f19685fd9
Docs - Initialize README (#6)
Initialize README.md and update SUPPORT.md; update:
* project description
* installation
* usage
* developer guide
* add dependencies version requirement
2021-02-01 20:21:12 +08:00
Yifan Xiong 5be32481b1
Setup: Init - Initialize setup.py and basic configs (#4)
Initialize setup.py and basic configurations for this project.

Major revisions:

- initialize setup.py for Python package
- add gitignore and dockerignore
- add editorconfig for editors
- configure yapf for auto formatting
- configure mypy for type hint
- configure flake8 for lint, including quotes and docstrings
- add pre-commit check for `git commit`
- add spelling check in GitHub Actions
- format existing files according to configured rules

Example usage:

    # install dependencies
    $ python3 -m pip install -e .[dev,test]
    $ pre-commit install

    # format code automatically
    $ python3 setup.py format

    # lint code
    $ python3 setup.py lint

    # test code
    $ python3 setup.py test
2021-01-28 21:01:28 +08:00
Microsoft Open Source a4dcec74ba
Updating README.md to template content
2020-12-16 18:22:27 -08:00
TobeyQin 8bd6afdb8d
Initial commit
2020-12-17 10:02:57 +08:00