Commit Graph

38 Commits

Author SHA1 Message Date
pdr 479491279e
Dockerfile - Add support for arm64 build (#660)
Add support for arm64 build:

- Updated dockerfile for arm64 build
- extend cpu stream compilation for neoverse 
- handle onnxruntime-gpu installation
- third party builds filtering based on arch
- disable cuda decode perf build for non x86
2024-11-06 23:16:12 +00:00
Yuting Jiang 2101e933cc
CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633)
**Description**
Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM
issue.
2024-07-28 23:43:06 +00:00
Yuting Jiang e304cf1572
Benchmarks: Micro benchmarks - add support for NVIDIA L4/L40/L40s GPUs in gemm-flops (#634)
**Description**
Add support for GPU arch 8.9 (NVIDIA L4/L40/L40s GPUs) in gemm-flops.
2024-07-26 02:42:17 +00:00
Yifan Xiong 2c88db907f
Release - SuperBench v0.10.0 (#607)
**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606

Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>
2024-01-08 05:40:52 +00:00
Yuting Jiang 1f5031bd74
Dockerfile - Upgrade to rocm5.7 dockerfile (#587)
**Description**
upgrade to rocm5.7 dockerfile.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-12-09 17:41:12 +00:00
Ziyue Yang 6ef3a0110f
Benchmarks: Add MSCCL Support for Nvidia GPU (#584)
**Description**
Add MSCCL support for Nvidia GPU
2023-12-07 19:57:28 +08:00
Yuting Jiang dd5a6329ed
Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
2023-12-07 09:37:09 +08:00
Yuting Jiang 79089b6517
Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576)
**Description**
hipblaslt function benchmark and rebase cublaslt function benchmark.
2023-11-22 19:48:10 +08:00
Yuting Jiang 27a10811af
Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560)
**Description**
source code for evaluating NVDEC decoding performance.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-08-22 10:56:33 +00:00
Yuting Jiang e1df877bfe
Release - SuperBench v0.9.0 (#558)
**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)
2023-07-27 10:42:31 +08:00
Yuting Jiang 865472177f
Benchmarks: Build Pipeline - add AMF in third party and build AMF encoding latency test (#543)
**Description**
add AMF in third party and build AMF encoding latency test.
2023-07-03 14:43:21 +00:00
rafsalas19 655bd0aa59
Adding HPL benchmark (#482)
**Description**

- Adding HPL benchmark

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-21 16:44:08 +00:00
Yuting Jiang 0292366075
Benchmarks: Build Pipeline - Add support for cpu-only perftest in makefile (#480)
**Description**
Add support to install cpu-only perftest in makefile.

Co-authored-by: Yuting Jiang <yuting.jiang@microsoft.com>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-02-24 11:19:46 +08:00
rafsalas19 32896ca477
Adding Stream Benchmark (#473)
**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
2023-02-13 15:34:37 -05:00
Yifan Xiong a3c65b2a57
Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs (#449)
Add Docker image for arch90 NVIDIA GPUs:

* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline
2022-12-29 12:19:38 +00:00
Yuting Jiang e335556d7a
Benchmarks: Build Pipeline - Degrade perftest submodule to fix stability issue (#386)
**Description**
Degrade perftest submodule to v4.4-0.37 to fix stability issue.

Issue: rdma-loopback is not stable on public version(v0.5/v0.6-rc1)
Docker Version: v0.6-rc1-cuda11.1
Testbed: 8 A100 40GB GPUs (1 NDv4 node)
Result: the newer perftest version introduces more variance; (max - min) / mean is 2% for v4.4-0.37 and 8% for v4.5-0.2.
2022-08-16 11:49:10 +08:00
Yifan Xiong 9f03d5687a
Update dependencies and Dockerfile (#371)
Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to current latest version to match
  NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and minor fix some flags
2022-07-06 10:31:41 +00:00
Yifan Xiong 483bf782e1
Update ROCm Dockerfile (#361)
**Description**

Update ROCm Dockerfile.

**Major Revisions**
- Add dockerfile for ROCm 5.1.3
- Merge 5.1.x and 5.0.x dockerfile
- Remove 4.2 and 4.0 legacy
- Update build pipeline accordingly
2022-06-19 17:26:39 +08:00
Yifan Xiong 60a3c74306
Fix cmake and build issues (#360)
**Description**

Fix cmake and build issues.

**Major Revision**

* Remove unnecessary boost build
* Remove user-agent for mlc
* Remove -j for third party to build each project in sequence
* Fix ansible collections installation path
2022-06-15 13:07:57 +08:00
rafsalas19 ff51a3cee9
Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324)
**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn
2022-03-16 16:20:11 +08:00
Yuting Jiang 4f5027dbda
Benchmarks: Build Pipeline - Make gpcnet only for cuda (#316)
**Description**
Make gpcnet only for cuda.
2022-02-24 18:18:49 +08:00
Yuting Jiang 4abda6f5d4
Benchmarks: Build Pipeline - Update rccl-tests submodule to fix divide by zero error (#306)
**Description**
Update rccl-tests submodule to fix divide by zero error.
2022-02-09 14:46:29 +00:00
Yifan Xiong 3419447c11
Benchmarks - Support T4 and A10 in GEMM benchmark (#294)
Support T4 and A10 in GEMM benchmark.
2022-01-29 13:26:00 +00:00
Yifan Xiong ff563b66af
Release - SuperBench v0.4.0 (#278)
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
2021-12-30 16:24:00 +08:00
Ziyue Yang b0e759f599
Benchmarks: Build Pipeline - Upgrade FIO benchmark tool (#251)
**Description**
Upgrade FIO benchmark tool from 3.27 to 3.28.
2021-12-01 20:33:09 +08:00
Yuting Jiang b592a7c793
Benchmarks: Build Pipeline - Add gpcnet as git submodule and building logic (#228)
**Description**
Add gpcnet as git submodule and building logic.

**Major Revision**
- add gpcnet as a submodule
- add build logic in third_party/Makefile
2021-10-21 11:28:51 +00:00
Yifan Xiong dfbd70b129
Release - SuperBench v0.3.0 (#212)
**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
2021-09-26 09:30:31 +08:00
Yuting Jiang b90b47f3c1
Benchmarks: Build Pipeline - Support rocblas building in rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 docker (#172)
**Description**
Revise rocblas building logic in third_party/makefile to support rocblas building in rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 docker.

**Major Revision**
- add extra build logic, including an environment setting for the pthread limit and a restricted build command, to reduce the amount of resources used

**Minor Revision**
- make rocm_version modifiable
2021-09-01 06:21:02 +08:00
Yuting Jiang a1e5c90d43
Benchmarks: Build Pipeline - Add build logic of hipBusBandwidth in third_party (#151)
**Description**
Add build logic of hipBusBandwidth in third_party.

**Major Revision**
- Add build logic of hipBusBandwidth in third_party
2021-08-20 14:32:14 +08:00
Yuting Jiang 86c390a912
Benchmarks: Build Pipeline - Add rocBLAS building logic in third_party (#144)
**Description**
Add rocBLAS building logic in third_party.

**Major Revision**
- Add rocm_rocblas target in third_party/Makefile.
- Add rocblas building logic
2021-08-02 17:30:02 +08:00
Yuting Jiang a532eee414
Benchmarks: Build Pipeline - add rccl-tests as a submodule with building logic (#139)
**Description**
Support rocm in third_party/makefile and add rccl-tests as a submodule with building logic.

**Major Revision**
- Support rocm in third_party/makefile
- Add rccl-tests as a submodule 
- Add build logic in third_party/Makefile for rccl-tests
2021-07-30 07:27:08 +08:00
Yuting Jiang c88ce05611
Benchmarks: Build Pipeline - Support rocm in third_party/makefile (#140)
**Description**
Support rocm in third_party/makefile.

**Major Revision**
- Split rocm and cuda target in makefile
- Add target in dockerfile
2021-07-29 14:18:36 +08:00
Ziyue Yang 4bbd7f513d
Benchmarks: Build Pipeline - Add FIO benchmark tool (#127)
**Description**
Add FIO benchmark tool into third-party dependency.

**Major Revision**
- Add FIO submodule into third-party directory and modify Makefile to enable it.
2021-07-19 12:59:28 +08:00
Yuting Jiang 419dea265a
Benchmarks: Build Pipeline - Add perftest as a submodule and add build logic (#129)
Add perftest as a submodule and add build logic
2021-07-16 16:11:07 +08:00
Yuting Jiang 8c8beb4b04
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic. (#128)
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic.
2021-07-16 14:51:56 +08:00
Yuting Jiang 9547ccc19a
Benchmarks: Fix bug - fix bug of third_party/cuda-samples git checkout issue when building docker (#126)
* fix bug in docker build of third_party/cuda-samples
2021-07-15 16:41:57 +08:00
Yuting Jiang f9550bd693
Benchmarks: Add Benchmark - Add memory bandwidth benchmark for cuda. (#114)
Add microbenchmark, example, test, and config for cuda memory performance; add cuda-samples (tagged with the cuda version) as a git submodule and update the related makefile.
2021-07-13 17:30:19 +08:00
guoshzhao 40d7905e6b
Benchmarks: Build Pipeline - Add cutlass as a submodule and add build logic. (#85)
* add cutlass as submodule.
* add build script for cutlass.
* only support compute capability 7.0(V100) and 8.0(A100)
2021-06-01 11:17:44 +08:00