Add support for arm64 build:
- update Dockerfile for arm64 build
- extend cpu stream compilation for neoverse
- handle onnxruntime-gpu installation
- filter third party builds based on arch
- disable cuda decode perf build for non-x86
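
As a rough illustration of the arch-based filtering described above, the sketch below selects which third-party targets to build from the machine architecture. The target names and the `X86_ONLY` set are hypothetical placeholders; the actual filtering lives in the Dockerfile and third-party build files and may be structured differently.

```python
import platform

# Hypothetical target names, used only for illustration.
ALL_TARGETS = ['cpu_stream', 'cuda_decode_perf', 'fio']
# Assumption: targets in this set are only built on x86.
X86_ONLY = {'cuda_decode_perf'}


def select_targets(arch, targets=ALL_TARGETS):
    """Return the targets to build for the given architecture string."""
    is_x86 = arch.lower() in ('x86_64', 'amd64')
    # Keep every target on x86; drop x86-only targets elsewhere.
    return [t for t in targets if is_x86 or t not in X86_ONLY]


if __name__ == '__main__':
    # platform.machine() reports e.g. 'x86_64' or 'aarch64'.
    print(select_targets(platform.machine()))
```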
**Description**
Cherry-pick bug fixes from v0.9.0 to main.
**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)
**Description**
Downgrade perftest submodule to v4.4-0.37 to fix a stability issue.
Issue: rdma-loopback is not stable on the public versions (v0.5/v0.6-rc1)
Docker Version: v0.6-rc1-cuda11.1
Testbed: 8 A100 40GB GPUs (1 NDv4 node)
Result:
The newer perftest version introduces more variance: (max - min) / mean is 2% for v4.4-0.37 and 8% for v4.5-0.2.
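
The stability metric quoted above, (max - min) / mean, can be computed as in the following sketch. The function name `relative_spread` and the sample readings are hypothetical and only illustrate the formula; they are not measurements from the testbed.

```python
def relative_spread(samples):
    """Return (max - min) / mean for a non-empty sequence of measurements."""
    if not samples:
        raise ValueError('samples must not be empty')
    mean = sum(samples) / len(samples)
    if mean == 0:
        raise ValueError('mean of samples must be non-zero')
    return (max(samples) - min(samples)) / mean


if __name__ == '__main__':
    # Hypothetical readings from repeated runs, not real results.
    runs = [99.0, 100.0, 101.0]
    print(f'{relative_spread(runs):.1%}')  # prints 2.0%
```

A smaller value means the repeated runs agree more closely, which is why this ratio works as a simple stability indicator when comparing two versions.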
Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to current latest version to match
NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and fix some flags
**Description**
Fix cmake and build issues.
**Major Revision**
* Remove unnecessary boost build
* Remove user-agent for mlc
* Remove -j for third party to build each project in sequence
* Fix ansible collections installation path
**Description**
Add GPU-Burn to SuperBench.
- add third party submodule
- modify Makefile to build gpu-burn binary
- add/modify microbenchmarks to add gpu-burn python scripts
- modify default and azure_ndv4 configs to add gpu-burn
__Description__
Cherry-pick bug fixes from v0.4.0 to main.
__Major Revisions__
* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
**Description**
Add gpcnet as a git submodule, together with its building logic.
**Major Revision**
- add gpcnet as a submodule
- add build logic in third_party/Makefile
**Description**
Revise rocBLAS building logic in third_party/Makefile to support building rocBLAS in the rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 docker image.
**Major Revision**
- add extra building logic, including an env setting for the pthread limit and a restricted build command, to reduce the amount of resources used
**Minor Revision**
- make rocm_version configurable
**Description**
Add rocBLAS building logic in third_party.
**Major Revision**
- Add rocm_rocblas target in third_party/Makefile.
- Add rocblas building logic
**Description**
Support rocm in third_party/Makefile and add rccl-tests as a submodule with building logic.
**Major Revision**
- Support rocm in third_party/Makefile
- Add rccl-tests as a submodule
- Add build logic in third_party/Makefile for rccl-tests
**Description**
Add FIO benchmark tool into third-party dependency.
**Major Revision**
- Add FIO submodule into third-party directory and modify Makefile to enable it.
Add microbenchmark, example, test, and config for cuda memory performance. Add cuda-samples (tagged with the cuda version) as a git submodule and update the related Makefile.