Commit Graph

39 Commits

Author SHA1 Message Date
pdr 479491279e
Dockerfile - Add support for arm64 build (#660)
Add support for arm64 build:

- Updated dockerfile for arm64 build
- extend cpu stream compilation for neoverse 
- handle onnxruntime-gpu installation
- third party builds filtering based on arch
- disable cuda decode perf build for non x86
2024-11-06 23:16:12 +00:00
Yifan Xiong 61770b8908
CI/CD - Update Image Build Pipeline (#659)
**Description**

Update image build.

**Major Revision**

* Remove ROCm 6.0 image due to outdated packages
* Remove build tag for ROCm
* Preserve build cache for 30 days
2024-11-02 04:50:50 +00:00
Yuting Jiang 949f9cb406
Release - SuperBench v0.11.0 (#654)
**Description**
Cherry-pick bug fixes from v0.11.0 to main

**Major Revision**
* #645 
* #648 
* #646 
* #647 
* #651 
* #652 
* #650

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
2024-10-10 09:59:47 +08:00
Yuting Jiang 2101e933cc
CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633)
**Description**
Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM
issue.
2024-07-28 23:43:06 +00:00
Yuting Jiang 7435f10a22
Dockerfile - Add CUDA 12.4 dockerfile (#619)
**Description**
Add CUDA 12.4 dockerfile.

**Major Revision**
- upgrade nvidia docker to 23.04


**Minor Revision**
- upgrade hpcx to 2.18
2024-04-22 06:36:19 +00:00
Yifan Xiong 2c88db907f
Release - SuperBench v0.10.0 (#607)
**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606

Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>
2024-01-08 05:40:52 +00:00
Yuting Jiang 1f5031bd74
Dockerfile - Upgrade to rocm5.7 dockerfile (#587)
**Description**
upgrade to rocm5.7 dockerfile.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-12-09 17:41:12 +00:00
Ziyue Yang 6ef3a0110f
Benchmarks: Add MSCCL Support for Nvidia GPU (#584)
**Description**
Add MSCCL support for Nvidia GPU
2023-12-07 19:57:28 +08:00
Yifan Xiong 1ad1c21c38
Dockerfile - Upgrade Docker image to CUDA 12.2 (#577)
Upgrade Docker image to CUDA 12.2 for H100:
* upgrade base image to 23.10
* fix onnxruntime version in python3.10
* fix compilation errors
2023-11-22 13:48:18 +00:00
Yuting Jiang 27a10811af
Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560)
**Description**
source code for evaluating NVDEC decoding performance.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-08-22 10:56:33 +00:00
Yuting Jiang 6c0205cece
Benchmarks: micro benchmarks - add source code for DirectXRenderPerf (#549)
**Description**
add source code for DirectXRenderPerf.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-08-18 05:17:04 +00:00
Yuting Jiang e1df877bfe
Release - SuperBench v0.9.0 (#558)
**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)
2023-07-27 10:42:31 +08:00
Yuting Jiang af4cfd5bbf
Benchmarks: micro benchmarks - add python code for DirectXGPUMemBw (#547)
**Description**
add python code for DirectXGPUMemBw.
2023-07-05 22:07:13 +08:00
Yuting Jiang f1d608aef7
Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542)
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
2023-07-05 16:56:21 +08:00
Yuting Jiang 3704a432b9
CI/CD - Support DirectX test pipeline (#545)
**Description**
Support DirectX test pipeline.
2023-07-05 11:33:40 +08:00
Yuting Jiang 44ef531465
Dockerfile - Add SuperBench Windows Dockerfile (#534)
**Description**
Add dockerfile for win10 and building script for directx_benchmarks.

**Major Revision**
- Add docker file for win10 and required scripts to install the
dependencies
- Add building script to build all directx vs benchmarks
- Add call of building script in Makefile

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
2023-06-28 05:35:11 +00:00
Yifan Xiong 51761b3af1
Release - SuperBench v0.8.0 (#517)
**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-04-14 12:57:55 +00:00
Yifan Xiong bbb86c4a83
CI/CD - Free disk space in GitHub Action VHD (#481)
Free more disk space in GitHub Action VHD.
2023-02-23 17:30:39 +08:00
Yifan Xiong a3c65b2a57
Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs (#449)
Add Docker image for arch90 NVIDIA GPUs:

* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline
2022-12-29 12:19:38 +00:00
Yuting Jiang 3367c4f6cc
Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414)
**Description**
Add support to allow list of custom config string in cudnn-functions and cublas-functions.
2022-10-18 09:59:51 +08:00
Yifan Xiong 9f03d5687a
Update dependencies and Dockerfile (#371)
Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to current latest version to match
  NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and minor fix some flags
2022-07-06 10:31:41 +00:00
Yifan Xiong 483bf782e1
Update ROCm Dockerfile (#361)
**Description**

Update ROCm Dockerfile.

**Major Revisions**
- Add dockerfile for ROCm 5.1.3
- Merge 5.1.x and 5.0.x dockerfile
- Remove 4.2 and 4.0 legacy
- Update build pipeline accordingly
2022-06-19 17:26:39 +08:00
Yuting Jiang 81a4146bc1
Dockerfile - Add dockerfile for rocm5.1.1 (#353)
**Description**
Add dockerfile for rocm5.1.1.
2022-05-25 20:28:11 +08:00
Yuting Jiang 425b9ff865
Dockerfile - Add dockerfile for rocm5.0.1 (#319)
**Description**
Add dockerfile for rocm5.0.1.
2022-02-28 19:30:43 +08:00
Yuting Jiang a4950a707e
Dockerfile - Add rocm5.0 dockerfile (#307)
**Description**
Add rocm5.0 dockerfile.
2022-02-26 07:12:45 +08:00
Ziyue Yang 433785fd0c
Benchmarks: Add Feature - Add GDR-only nccl-tests for Nvidia machines (#299)
This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.
2022-02-08 17:59:48 +08:00
Yifan Xiong 5283bdebe8
CI/CD - Disable version update, allow security update only (#224)
Disable dependabot version update, allow security update only.
Reference:
https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates#open-pull-requests-limit.
2021-10-12 16:08:33 +08:00
Yifan Xiong 849b6caca2
CI/CD - Add code security scanning (#206)
Add code security scanning.

__Major Revisions__
* enable dependabot auto updates
* scan code with CodeQL
2021-10-11 14:38:58 +08:00
Yifan Xiong dfbd70b129
Release - SuperBench v0.3.0 (#212)
**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug: fix wrong param name 'operations' (should be 'operation') in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
2021-09-26 09:30:31 +08:00
guoshzhao 9c984c7eb0
Bug bash - Merge fix from release/0.2 to main (#124)
* Bug Fix - Fix race condition issue for multi ranks (#117)

Fix race condition issue when multiple ranks rotate the same directory.

* Update pipeline for release branch (#122)

* Bug Fix - Fix bug when converting bool config to store_true argument. (#120)

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
2021-07-09 16:54:42 +08:00
Yifan Xiong e7b6af35c0
Website - Initialize SuperBench website (#102)
* Initialize SuperBench website.
* Add GitHub Actions to build and publish automatically.
2021-06-25 15:59:13 +08:00
Yifan Xiong 25ec3a7c1c
Dockerfile - Update CUDA 11.1.1 Dockerfile (#96)
Update packages and add build cache for CUDA 11.1.1 Dockerfile:

* Remove duplicate cmake and ompi, which are already in base image
* Add hpcx and sharp lib
* Add cache for gitmodules build
* Sort apt-get packages
2021-06-16 16:47:52 +08:00
guoshzhao 40d7905e6b
Benchmarks: Build Pipeline - Add cutlass as a submodule and add build logic. (#85)
* add cutlass as submodule.
* add build script for cutlass.
* only support compute capability 7.0 (V100) and 8.0 (A100)
2021-06-01 11:17:44 +08:00
Yifan Xiong af6eb004df
CI/CD - Add GitHub Action to build and push image (#70)
* add GitHub Action to build and push image
* update Dockerfile to copy from context
2021-05-17 21:34:15 +08:00
Yifan Xiong 0b2952d459
Setup - Add lint for cpp sources (#73)
__Major Revisions__

* add clang-format to lint cpp sources
* add cpp lint in GitHub Actions
2021-05-17 11:36:41 +08:00
Yifan Xiong 74c4d1b2a4
Setup: Code Revision - Rename dev branch to main in config and readme (#55)
* Rename dev branch to main and set it as default.
2021-04-14 17:40:34 +08:00
TobeyQin 8ae0138004
Setup: template - Add template for PR, bug report and enhancement request (#5)
Add template for PR, bug report and enhancement request
2021-01-29 18:29:26 +08:00
Yifan Xiong 5be32481b1
Setup: Init - Initialize setup.py and basic configs (#4)
Initialize setup.py and basic configurations for this project.

Major revisions:

- initialize setup.py for Python package
- add gitignore and dockerignore
- add editorconfig for editors
- configure yapf for auto formatting
- configure mypy for type hint
- configure flake8 for lint, including quotes and docstrings
- add pre-commit check for `git commit`
- add spelling check in GitHub Actions
- format existing files according to configured rules

Example usage:

    # install dependencies
    $ python3 -m pip install -e .[dev,test]
    $ pre-commit install

    # format code automatically
    $ python3 setup.py format

    # lint code
    $ python3 setup.py lint

    # test code
    $ python3 setup.py test
2021-01-28 21:01:28 +08:00
TobeyQin a0a145a855
Setup: Reviewers - Add default reviewers of the repo (#3)
* Setup: Reviewers - Add default reviewers of the repo

**Description**
Add default reviewers of the repo

**Major Revision**
- Add default reviewers of the development
- Add default reviewers for the docs
2021-01-28 10:11:23 +08:00