Commit Graph

397 Commits

Author SHA1 Message Date
yutingjiang ad5546e449 add and revise gemm test for mi300 fp8 and hipblas test 2023-11-05 13:19:25 +00:00
yutingjiang 2e68db1f0a add and revise gemm test for mi300 fp8 and hipblas test 2023-11-05 13:18:41 +00:00
Yuting Jiang 1537a274e8
Docs - Upgrade version and release note (#557)
**Description**
Upgrade version and release note.


**Major Revision**
- Upgrade package versions
- Add release note for v0.9.0
2023-07-26 13:23:24 +08:00
Yuting Jiang ed56d4db21
CI/CD - Add pushing of Windows Docker image on release branch in pipeline (#552)
**Description**
Add a pipeline step that pushes the Windows Docker image on the release branch.
2023-07-25 09:50:07 +00:00
Yuting Jiang 1aa5db2596
Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency (#554)
**Description**
Use absolute path for the input file in DirectXEncodingLatency.
2023-07-24 18:34:27 +08:00
Yuting Jiang 296cd091bc
CI/CD: pipeline - clean more disk space to fix rocm building image pipeline (#555)
**Description**
clean more disk space to fix rocm building image pipeline.
2023-07-24 16:53:41 +08:00
Yuting Jiang e8ac0b1e28
Benchmarks: micro benchmarks - add python code for DirectXGPUEncodingLatency (#548)
**Description**
add python code for DirectXGPUEncodingLatency.
2023-07-06 15:31:28 +08:00
Yuting Jiang c8c079c2af
Benchmarks: micro benchmarks - add python code for DirectXGPUCopy (#546)
**Description**
add python code for DirectXGPUCopy.
2023-07-06 00:15:32 +08:00
Yuting Jiang af4cfd5bbf
Benchmarks: micro benchmarks - add python code for DirectXGPUMemBw (#547)
**Description**
add python code for DirectXGPUMemBw.
2023-07-05 22:07:13 +08:00
Yuting Jiang f1d608aef7
Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542)
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
2023-07-05 16:56:21 +08:00
Yuting Jiang 3704a432b9
CI/CD - Support DirectX test pipeline (#545)
**Description**
Support DirectX test pipeline.
2023-07-05 11:33:40 +08:00
Yuting Jiang 865472177f
Benchmarks: Build Pipeline - add AMF in third party and build AMF encoding latency test (#543)
**Description**
add AMF in third party and build AMF encoding latency test.
2023-07-03 14:43:21 +00:00
Yuting Jiang 97f7b1df86
Benchmarks: microbenchmark - add auto selecting algorithm support for cudnn functions (#540)
**Description**
add auto selecting algorithm support for cudnn functions.

**Major Revision**
- add auto selecting algorithm support for cudnn functions in source
code
- add 'auto_algo' option in benchmark
- add related test
2023-06-30 12:58:41 +00:00
Lei Qu c7d0beaf9e
Doc - Update outdated references in micro-benchmarks.md (#544)
Modify link for Nvidia bandwidth test tool

**Description**
previous link is 404

**Minor Revision**
update the link value to
https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest
2023-06-30 19:17:41 +08:00
Yifan Xiong 7184bdd1ed
Benchmarks - Update result parsing in tensorrt inference (#541)
* Update result parsing for newer tensorrt versions
* Update arguments when loading torchvision models
2023-06-30 11:22:46 +08:00
Yuting Jiang f259913707
Benchmarks: Add benchmark - Add source code of DirectxGPUCopy microbenchmark (#486)
**Description**
Add source code of DirectxGPUCopy microbenchmark.
2023-06-29 19:38:01 +08:00
Yuting Jiang af4d18dedf
Benchmarks: Add benchmark - Add source code of DirectxGPUMemBw microbenchmark (#487)
**Description**
Add source code of DirectxGPUMemBw microbenchmark.

---------

Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
2023-06-29 07:03:40 +00:00
Yuting Jiang ed027e4c8e
Tools - Add runner for sys info and update docs (#532)
**Description**
Add runner for sys info to automatically collect on multiple nodes and
update related docs.

**Major Revision**
- add runner for sys info which will check docker status and run `sb
node info` on all nodes' docker and fetch results from all nodes

**Minor Revision**
- update cli and system-info doc
- update `sb node info` to save output into output-dir/sys-info.json
2023-06-29 06:09:44 +00:00
Yuting Jiang 3a6622f7d3
Benchmarks: Add benchmark - Add source code of DirectXGPUCoreFLOPs microbenchmark (#488)
**Description**
Add source code of DirectXGPUCoreFLOPs microbenchmark.

---------

Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
2023-06-29 10:06:14 +08:00
Yuting Jiang 44ef531465
Dockerfile - Add SuperBench Windows Dockerfile (#534)
**Description**
Add dockerfile for win10 and building script for directx_benchmarks.

**Major Revision**
- Add docker file for win10 and required scripts to install the
dependency
- Add building script to build all directx vs benchmarks
- Add call of building script in Makefile

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
2023-06-28 05:35:11 +00:00
Yuting Jiang bbb0e24342
Benchmarks - Add support for DirectX GPU platform (#536)
**Description**
Add support for DirectX GPU platform.

**Major Revision**
- Add DirectX platform for benchmark registry
- Add gpu_vendor identification for AMD and NVIDIA with the Windows driver
2023-06-21 01:58:13 +00:00
guoshzhao e909ddd0ca
Benchmarks - Update outdated references (#539)
**Description**
Update outdated reference links that return 404.
2023-06-16 17:50:09 +08:00
Yifan Xiong f4dab9f7ba
Update error message in setup (#538)
Update error message in setup, require wheel for pip>=23.1.
2023-06-14 10:51:45 +08:00
Yifan Xiong a1cd3c9475
Runner - Add signal handler in runner (#530)
Add signal handler in runner to gracefully exit when receiving SIGINT
(<kbd>Ctrl</kbd>+<kbd>C</kbd>) or SIGTERM during benchmark execution.
2023-05-23 17:25:35 +08:00
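A minimal sketch of the graceful-exit pattern described in #530, assuming a flag-based design. `StopFlag` and `run_benchmarks` are hypothetical names for illustration, not SuperBench's actual runner code. The handler only records the signal, and the loop checks the flag between benchmarks so that work in progress is not cut off mid-step.

```python
import signal


class StopFlag:
    """Records that a termination signal arrived (hypothetical helper)."""

    def __init__(self):
        self.stopped = False
        self.signum = None

    def handle(self, signum, frame):
        # Keep the handler tiny: remember the signal and return.
        self.stopped = True
        self.signum = signum

    def install(self):
        # Register the same handler for Ctrl+C (SIGINT) and SIGTERM.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.handle)


def run_benchmarks(names, flag):
    """Run benchmarks in order, stopping before the next one once flagged."""
    finished = []
    for name in names:
        if flag.stopped:
            break
        finished.append(name)  # placeholder for the real benchmark run
    return finished
```

In a real runner, `flag.install()` would be called once from the main thread before the benchmark loop starts.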
F̷N̷ 4c0d96e5d8
Docs - Fix typo on kernel_parameters and kernel_modules in system-config (#528)
**Description**
The commands and examples for kernel_parameters and kernel_modules are swapped.
2023-05-04 00:55:42 +00:00
guoshzhao f38a9829d0
ModelBenchmarks - Fix early stop logic due to num_steps. (#522)
**Description**
Model benchmarks can stop due to the `num_steps` or `duration` config, each of
which takes effect only when its value is set greater than 0.
If both are set greater than 0, whichever condition is reached first stops
the benchmark.
2023-04-28 13:15:47 +08:00
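The stop rule from #522 can be sketched as a single predicate. This is an illustration under stated assumptions, not the project's code: `should_stop` and its argument names other than `num_steps` and `duration` are hypothetical, and `duration` is assumed to be in seconds.

```python
def should_stop(step, elapsed_seconds, num_steps=0, duration=0):
    """Return True once any enabled limit has been reached.

    A limit is enabled only when its value is greater than 0.
    If both are enabled, the first one reached stops the run.
    """
    if num_steps > 0 and step >= num_steps:
        return True
    if duration > 0 and elapsed_seconds >= duration:
        return True
    return False
```

With both limits left at 0 the predicate never fires, so a caller would need some other way to end the loop.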
Yifan Xiong 664c59a14d
Docs - Update version in README (#529)
Update version in README.
2023-04-28 03:36:11 +00:00
Ziyue Yang 4cb431cab4
Benchmarks - Revise step time collection in distributed inference benchmark (#524)
**Description**
This commit revises distributed inference benchmark to give a unified
step time result by taking maximum step times of different GPUs.
2023-04-24 10:17:49 +08:00
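A small sketch of the "maximum step time across GPUs" idea from #524. Plain Python lists stand in for the per-GPU measurements here; a distributed implementation would more likely use a collective reduction, and `unified_step_times` is a hypothetical name.

```python
def unified_step_times(per_gpu_step_times):
    """Combine per-GPU step times into one series by taking the max per step.

    per_gpu_step_times: one list of step times per GPU, all the same length.
    """
    if not per_gpu_step_times:
        return []
    lengths = {len(times) for times in per_gpu_step_times}
    if len(lengths) != 1:
        raise ValueError("every GPU must report the same number of steps")
    # zip(*...) groups the i-th step of every GPU together.
    return [max(step) for step in zip(*per_gpu_step_times)]
```

Taking the maximum reflects that a distributed step is only finished when the slowest GPU has finished it.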
Yifan Xiong 51761b3af1
Release - SuperBench v0.8.0 (#517)
**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-04-14 12:57:55 +00:00
Yifan Xiong 97c9a41f14
Benchmark - Update TE FP8 model conversion (#499)
__Description__

Update TE FP8 model conversion.

__Major Revisions__
* Add 16-byte alignment comment.
* Fix TE layer parameters type.
2023-03-28 15:01:41 +00:00
Yifan Xiong c88c970943
Benchmarks - Support TE FP8 in BERT/GPT2 models (#496)
Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by
converting linear/layernorm to TE layers.
2023-03-25 19:28:27 +08:00
Ziyue Yang 8daef211dd
Benchmarks - Add distributed inference benchmark (#493)
**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------

Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-24 17:15:17 +08:00
guoshzhao a9b45a072e
Monitor - Support cgroup V2 when read system metrics. (#491)
**Description**
Ubuntu 22.04 uses cgroup v2, which changes the file structure.
Modify the monitor to support both cgroup v1 and v2.
2023-03-22 08:33:18 +00:00
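One common way to tell the two cgroup layouts apart, shown as a hedged sketch for #491: a cgroup v2 mount exposes a `cgroup.controllers` file at its root, while v1 keeps per-controller directories. The function names and the choice of the memory usage file are illustrative assumptions, not the monitor's actual implementation.

```python
import os


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if the root looks like a unified (v2) hierarchy, else 1."""
    return 2 if os.path.isfile(os.path.join(root, "cgroup.controllers")) else 1


def memory_usage_path(root="/sys/fs/cgroup"):
    """Pick the memory usage file that matches the detected layout."""
    if cgroup_version(root) == 2:
        return os.path.join(root, "memory.current")
    return os.path.join(root, "memory", "memory.usage_in_bytes")
```

Passing `root` explicitly makes the helpers easy to exercise against a temporary directory instead of the live system.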
Yifan Xiong dbeba8056b
Benchmark - Support batch/shape range in cublaslt gemm (#494)
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
2023-03-22 13:22:36 +08:00
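To illustrate what a range with a multiplication factor expands to (#494), here is a minimal sketch. `expand_range` is a hypothetical helper, and the benchmark's exact option syntax is not shown in this log, so only the expansion itself is modeled.

```python
def expand_range(start, stop=None, factor=2):
    """List start, start*factor, start*factor^2, ... up to and including stop."""
    if start <= 0:
        raise ValueError("start must be positive")
    if stop is None:
        return [start]  # a single value, no range requested
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values
```

For example, `expand_range(16, 128, 2)` covers the sizes 16, 32, 64, and 128, so one option can sweep several batch sizes or matrix dimensions.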
rafsalas19 655bd0aa59
Adding HPL benchmark (#482)
**Description**

- Adding HPL benchmark

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-21 16:44:08 +00:00
Yifan Xiong 644b5395df
Benchmark - Fix torch.dist init issue with multiple models (#495)
Fix potential barrier timeout in init_process_group due to race
condition of using the same port. Change to different ports when running
multiple models sequentially in one process.
For example, running vgg11/13/16/19 uses ports 29501 to 29504
respectively.
2023-03-21 12:35:03 +00:00
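The port assignment in #495 can be sketched as a simple counter. The base value 29500 is an assumption inferred from the example in the commit message, and `assign_ports` is a hypothetical helper rather than the benchmark's actual code.

```python
def assign_ports(models, base_port=29500):
    """Give each sequentially run model its own port: base_port + 1, + 2, ..."""
    return {
        model: base_port + index
        for index, model in enumerate(models, start=1)
    }
```

Because every model gets a fresh port, a later `init_process_group` call does not have to wait for the previous model's port to be released.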
Yuting Jiang 5a88db1601
Benchmarks: Support error tolerance in micro-benchmark for CuDNN function (#490)
**Description**
Support error tolerance in micro-benchmark for CuDNN function


**Major Revision**
- revise micro_base to keep running the remaining commands when one
command fails in the microbenchmark
- set error tolerance to true in cudnn functions
2023-03-20 21:20:21 +08:00
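A hedged sketch of the error-tolerance behavior from #490: with tolerance on, a failing command does not prevent the remaining commands from running. `run_commands` and its `tolerant` parameter are hypothetical names, not the interface of `micro_base`.

```python
import subprocess
import sys


def run_commands(commands, tolerant=False):
    """Run each command in order and collect the return codes.

    When tolerant is False, stop at the first non-zero return code.
    When tolerant is True, record the failure and keep going.
    """
    return_codes = []
    for command in commands:
        code = subprocess.run(command).returncode
        return_codes.append(code)
        if code != 0 and not tolerant:
            break
    return return_codes
```

A caller could then report the failed commands individually while still parsing results from the ones that succeeded.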
Yifan Xiong b808135c27
Benchmarks - Support tensor core precisions in cublaslt gemm (#492)
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
2023-03-20 10:59:40 +08:00
dependabot[bot] 139d4df55f
Bump webpack from 5.39.1 to 5.76.1 in /website (#489)
Bumps [webpack](https://github.com/webpack/webpack) from 5.39.1 to 5.76.1.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](webpack/webpack@v5.39.1...v5.76.1)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-03-17 20:18:35 +08:00
Yifan Xiong 35f5390512
Pin setuptools version to v65.7.0 (#483)
Pin setuptools version to
[v65.7.0](https://setuptools.pypa.io/en/latest/history.html#v65-7-0) to
avoid breaking changes since v66.0.0.
2023-03-06 11:43:44 +00:00
Yifan Xiong 2cc4cd03e2
Limit ansible_runner version for Python3.6 (#485)
Limit ansible_runner version to less than 2.3.2 for Python3.6.
2023-03-06 18:54:45 +08:00
Yuting Jiang eba298f5f0
Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479)
**Description**
revise cublas-benchmark for flexible warmup and fill data with fixed
number for perf test to improve the running efficiency.

**Major Revision**
- remove num_in_steps for warmup to support more flexible warmup setting
for users
- Add support to generate input with fixed number for perf test
2023-02-28 06:35:18 +08:00
Yuting Jiang 0292366075
Benchmarks: Build Pipeline - Add support for cpu-only perftest in makefile (#480)
**Description**
Add support to install cpu-only perftest in makefile.

Co-authored-by: Yuting Jiang <yuting.jiang@microsoft.com>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-02-24 11:19:46 +08:00
Yifan Xiong bbb86c4a83
CI/CD - Free disk space in GitHub Action VHD (#481)
Free more disk space in GitHub Action VHD.
2023-02-23 17:30:39 +08:00
Yuting Jiang ec7f502c93
CI/CD - Upgrade networkx version to fix installation compatibility issue (#478)
**Description**
Upgrade networkx version to fix installation compatibility issue.
2023-02-17 05:36:21 +00:00
dependabot[bot] f041b6eacc
Bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#477)
Bumps [@sideway/formula](https://github.com/sideway/formula) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/sideway/formula/releases)
- [Commits](hapijs/formula@v3.0.0...v3.0.1)

---
updated-dependencies:
- dependency-name: "@sideway/formula"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-17 02:33:08 +00:00
dependabot[bot] e1a489496c
Bump http-cache-semantics from 4.1.0 to 4.1.1 in /website (#474)
Bumps [http-cache-semantics](https://github.com/kornelski/http-cache-semantics) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/kornelski/http-cache-semantics/releases)
- [Commits](kornelski/http-cache-semantics@v4.1.0...v4.1.1)

---
updated-dependencies:
- dependency-name: http-cache-semantics
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-16 14:14:47 +00:00
rafsalas19 32896ca477
Adding Stream Benchmark (#473)
**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
2023-02-13 15:34:37 -05:00
Yuting Jiang 62a2913497
Executor - Support SuperBench Executor running on Windows (#475)
**Description**
Support SuperBench Executor running on Windows.

**Major Revision**
- Lazy import ansible related module
2023-02-13 08:20:07 +00:00
pnunna93 f21bfef2f3
Dockerfile: Remove fixed rccl version in rocm5.1.x docker file (#476)
**Description**
The commit (e08b6d3a1c) installs an rccl
version that causes "undefined symbol: ncclGetLastError" when
importing torch. Revert it to avoid the error.
2023-02-07 15:24:26 +08:00