Commit graph

36 commits

Author SHA1 Message Date
Yuting Jiang dc3846cbd4
Dockerfile - Upgrade mlc to v3.11 (#620)
**Description**
Upgrade mlc to v3.11.
2024-04-18 10:59:36 +08:00
Yuting Jiang dd5a6329ed
Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
2023-12-07 09:37:09 +08:00
guoshzhao 9f4880cb8e
Analyzer - Generate baseline given results from multiple nodes. (#575)
**Description**
Generate baseline given results from multiple nodes. 

**Major Revision**
- Add sub command `sb result generate-baseline`
- Add UT and docs

---------

Co-authored-by: 454314380 <454314380@qq.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-11-22 14:42:32 +08:00
Yuting Jiang d246bab430
Dockerfile - Update mlc version to 3.10 for cuda and rocm dockerfiles (#562)
**Description**
Update mlc version to 3.10 for cuda and rocm dockerfiles to be
consistent with cuda12 dockerfile

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-10-23 11:21:17 +08:00
Yuting Jiang 27a10811af
Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560)
**Description**
source code for evaluating NVDEC decoding performance.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-08-22 10:56:33 +00:00
rafsalas19 655bd0aa59
Adding HPL benchmark (#482)
**Description**

- Adding HPL benchmark

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-21 16:44:08 +00:00
rafsalas19 32896ca477
Adding Stream Benchmark (#473)
**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
2023-02-13 15:34:37 -05:00
Yifan Xiong a3c65b2a57
Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs (#449)
Add Docker image for arch90 NVIDIA GPUs:

* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline
2022-12-29 12:19:38 +00:00
Yifan Xiong d7bb8303fb
CLI - Update version to include revision hash and date (#427)
Update version to include revision hash and date in "{last tag}+g{git
hash}.d{date}" format, for example:
* exact tag: 0.6.0
* commit after tag: 0.6.0+gcbb1b34
* commit after tag with local changes: 0.6.0+gcbb1b34.d20221028
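
The three cases listed above can be sketched in a few lines of Python. This is only an illustration of the string format, not the code from the commit: the function name `format_version` and its parameters are hypothetical, and the sketch assumes the date suffix is only ever appended after a hash suffix, as in the examples.

```python
from typing import Optional


def format_version(last_tag: str,
                   git_hash: Optional[str] = None,
                   date: Optional[str] = None) -> str:
    """Build a "{last tag}+g{git hash}.d{date}" version string.

    Hypothetical helper for illustration only.
    - git_hash: short hash, given when HEAD is past the last tag.
    - date: YYYYMMDD string, given when there are local changes.
    """
    version = last_tag
    if git_hash:
        # Commit after the tag: append the hash with a "+g" prefix.
        version += f"+g{git_hash}"
        if date:
            # Local changes on top of that commit: append the date.
            version += f".d{date}"
    return version


if __name__ == "__main__":
    print(format_version("0.6.0"))
    print(format_version("0.6.0", "cbb1b34"))
    print(format_version("0.6.0", "cbb1b34", "20221028"))
```

How an exact tag with local changes should be rendered is not stated in the commit message, so the sketch leaves that case as the plain tag.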
2022-10-31 10:44:41 +08:00
Yifan Xiong 63e9b2d1bc
Release - SuperBench v0.6.0 (#409)
**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)

Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2022-09-06 18:06:05 +08:00
Yifan Xiong 626ac0a463
Update Python setup for required packages (#387)
__Description__

Update Python setup for required packages.

__Major Revisions__
* downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6
* add extra entry in extras_require for nested packages
* update `pip install` contents accordingly
2022-08-17 11:33:57 +08:00
Yang Wang faeee0a7cc
Auto generate ibstat file for topo aware traffic pattern (#381)
An enhancement for topo-aware IB performance validation #373.
This PR auto-generates the required ibstat file `ib_traffic_topo_aware_ibstat.txt`, which is used as input to build a graph.
2022-08-13 18:20:42 +08:00
Yifan Xiong 16b6385dee
Add dependencies (#374)
Add dependencies

* include ndv4-topo.xml in cuda docker images
* require requests version to avoid RequestsDependencyWarning
2022-07-13 08:42:53 +00:00
Yifan Xiong 9f03d5687a
Update dependencies and Dockerfile (#371)
Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to current latest version to match
  NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and make minor fixes to some flags
2022-07-06 10:31:41 +00:00
Yifan Xiong 325a7338bf
Fix incorrect ulimit config in Dockerfile (#364)
Fix incorrect ulimit nofile config in Dockerfile.

The Dockerfile uses sh rather than bash by default. In sh, `echo` does not accept any parameters, so `-e` is written literally into /etc/security/limits.conf.
2022-06-24 14:14:00 +00:00
Yifan Xiong bfaa1c837b
Support multiple IB/GPU in ib validation (#363)
**Description**

Support multiple IB/GPU devices run simultaneously in ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.
2022-06-24 08:35:20 +00:00
Yifan Xiong 60a3c74306
Fix cmake and build issues (#360)
**Description**

Fix cmake and build issues.

**Major Revision**

* Remove unnecessary boost build
* Remove user-agent for mlc
* Remove -j for third party to build each project in sequence
* Fix ansible collections installation path
2022-06-15 13:07:57 +08:00
Yuting Jiang 3f135e4669
Dockerfile - Add support to run sb command inside docker image (#356)
**Description**
Add support to run sb command inside docker image - install missing dependency.
2022-06-01 01:11:28 +08:00
Ziyue Yang 433785fd0c
Benchmarks: Add Feature - Add GDR-only nccl-tests for Nvidia machines (#299)
This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.
2022-02-08 17:59:48 +08:00
Hossein Pourreza b590409e0f
Benchmarks: Add Benchmark - Add mlc benchmark to superbench (#216)
**Description**
Add mlc memory bandwidth and latency micro benchmark to Superbench.

**Major Revision**
- Add mlc benchmark with test and example files
2021-12-13 13:47:42 +08:00
guoshzhao 4d85630abb
Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245)
**Description**
Add ONNXRuntime inference benchmark based on ORT python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.
2021-12-10 13:53:11 +00:00
Ziyue Yang 008e0fe1d8
Benchmarks: Add Feature - Add CPU-initiated copy and dtod support to gpu-sm-copy benchmark (#230)
**Description**
This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Support scanning NUMA nodes and GPUs inside the benchmark program;
4) Change the name of gpu-sm-copy to gpu-copy.
2021-10-30 11:19:09 +08:00
Yifan Xiong 4e431f11d2
Dockerfile - Fix ulimit nofile in Docker images (#183)
__Description__

Resolve "too many open files" issue when running NCCL/RCCL on multiple nodes using Docker images by increasing the nofile number in limits.conf.
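
As a rough way to check the limit this fix targets, the Python sketch below reads the current process's open-file limits and formats nofile entries in the limits.conf layout (`<domain> <type> <item> <value>`). The helper names and the value 1048576 are assumptions for illustration; the commit message does not state the number it sets. The `resource` module is Unix-only.

```python
import resource
from typing import List, Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limits_conf_lines(limit: int, domain: str = "*") -> List[str]:
    """Format soft and hard nofile entries for /etc/security/limits.conf.

    Hypothetical helper: it only builds the text, it does not write the file.
    """
    return [
        f"{domain} soft nofile {limit}",
        f"{domain} hard nofile {limit}",
    ]


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"current nofile limits: soft={soft} hard={hard}")
    # 1048576 is an assumed example value, not the one from the commit.
    for line in limits_conf_lines(1048576):
        print(line)
```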
2021-09-02 21:38:59 +08:00
guoshzhao 115cd2e6ae
Dockerfile: Add Package - Install openmpi for ROCm images (#181)
**Description**
Install openmpi-4.0.0 for ROCm images.
2021-09-01 18:40:43 +08:00
guoshzhao 7d947757ea
Benchmarks: Docker Benchmarks - Setup Docker-in-Docker environment (#180)
**Description**
Setup docker environment in docker container.

**Major Revision**
- Install docker client for cuda and rocm images.
- Mount /var/run/docker.sock from host
2021-09-01 16:35:00 +08:00
Yuting Jiang c88ce05611
Benchmarks: Build Pipeline - Support rocm in third_party/makefile (#140)
**Description**
Support rocm in third_party/makefile.

**Major Revision**
- Split rocm and cuda target in makefile
- Add target in dockerfile
2021-07-29 14:18:36 +08:00
Yuting Jiang 419dea265a
Benchmarks: Build Pipeline - Add perftest as a submodule and add build logic (#129)
Add perftest as a submodule and add build logic
2021-07-16 16:11:07 +08:00
Yuting Jiang 8c8beb4b04
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic. (#128)
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic.
2021-07-16 14:51:56 +08:00
Yifan Xiong 25ec3a7c1c
Dockerfile - Update CUDA 11.1.1 Dockerfile (#96)
Update packages and add build cache for CUDA 11.1.1 Dockerfile:

* Remove duplicate cmake and ompi, which are already in base image
* Add hpcx and sharp lib
* Add cache for gitmodules build
* Sort apt-get packages
2021-06-16 16:47:52 +08:00
guoshzhao 331c740a15
Benchmarks: Add Feature - Add nvml package to provide python interfaces of nvidia. (#91)
2021-06-01 23:31:07 +08:00
guoshzhao 40d7905e6b
Benchmarks: Build Pipeline - Add cutlass as a submodule and add build logic. (#85)
* add cutlass as submodule.
* add build script for cutlass.
* only support compute capability 7.0(V100) and 8.0(A100)
2021-06-01 11:17:44 +08:00
guoshzhao 94d3765b49
Benchmarks: Build Pipeline - Call build script when setup environment. (#76)
* call build script in Makefile.
* add cppbuild command for testing and docker env.
2021-05-18 11:49:23 +08:00
Yifan Xiong af6eb004df
CI/CD - Add GitHub Action to build and push image (#70)
* add GitHub Action to build and push image
* update Dockerfile to copy from context
2021-05-17 21:34:15 +08:00
Yifan Xiong 74c4d1b2a4
Setup: Code Revision - Rename dev branch to main in config and readme (#55)
* Rename dev branch to main and set it as default.
2021-04-14 17:40:34 +08:00
Yifan Xiong 8c5273083a
Executor - Fix issues when executing benchmarks (#51)
* fix missing package in dockerfile
* update benchmark list and parameters
* catch runtime errors
* refine logging info
2021-04-13 14:38:11 +08:00
Yifan Xiong 67053d9a1f
Add CUDA dockerfile for superbench (#43)
* add cuda11.1.1 dockerfile
2021-04-12 14:17:10 +08:00