Yifan Xiong
63cf2f1db4
Update
...
Update.
2023-04-10 09:26:36 +08:00
Yifan Xiong
0d6e723c63
Fix num_workers to 0 in data loader
...
Fix num_workers to 0 in data loader.
2023-04-09 17:03:21 +08:00
guoshzhao
103807090f
Monitor - Collect realtime GPU power when benchmarking. ( #507 )
...
**Description**
Collect realtime GPU power when benchmarking.
2023-04-07 20:26:55 +08:00
Yuting Jiang
9f18dea342
Bug - Fix bug to get metric from cmd when error happens ( #506 )
...
**Description**
Fix a bug in getting the metric from the command when an error happens (cudnn-function/_time:4).
2023-04-06 11:13:29 +00:00
Yuting Jiang
14a4a44b01
Analyzer: Fix bug in python3.8 due to pandas api change ( #504 )
...
**Description**
Analyzer: Fix bug in python3.8 due to pandas api change.
**Major Revision**
- force check numeric only in dataframe for analysis
- dataframe.append -> pd.concat
- pd.ExcelWriter.save() -> pd.ExcelWriter.close()
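The API changes listed above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the helper names `append_rows` and `numeric_mean` and the sample columns are hypothetical, and `numeric_only=True` is assumed to be how the numeric-only check is forced.

```python
import pandas as pd


def append_rows(df, rows):
    """Add rows (a list of dicts) with pd.concat instead of DataFrame.append."""
    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.
    return pd.concat([df, pd.DataFrame(rows)], ignore_index=True)


def numeric_mean(df):
    """Average numeric columns only, skipping non-numeric ones explicitly."""
    return df.mean(numeric_only=True)


# Hypothetical sample data for illustration.
df = pd.DataFrame([{"node": "n0", "bw": 10.0}])
df = append_rows(df, [{"node": "n1", "bw": 14.0}])
means = numeric_mean(df)

# For Excel output, ExcelWriter.save() is replaced by close(); using
# `with pd.ExcelWriter(path) as writer:` closes the writer automatically.
```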
2023-04-06 07:13:48 +00:00
Ziyue Yang
b97ddcf723
Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark ( #505 )
...
**Description**
This commit fixes wrong `torch.empty_like` usage and missing dtype and
device arguments in communication wrappers.
2023-04-06 06:16:57 +00:00
Yifan Xiong
9d250cdd0c
Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM ( #503 )
...
Fix matrix size overflow issue when casting from int to size_t implicitly.
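The commit message does not show the offending code. One common form of this bug, assumed here, is a product of `int` dimensions that is evaluated in 32-bit arithmetic and only then converted to `size_t`. The sketch below simulates that with `ctypes`; the function names and the dimensions are hypothetical.

```python
import ctypes


def int32_product(m, n):
    """Multiply two dimensions the way 32-bit int arithmetic typically wraps."""
    # ctypes integer types do no overflow checking, so the value wraps around.
    return ctypes.c_int32(m * n).value


def size_t_product(m, n):
    """Widen one operand first so the product is computed at size_t width."""
    return ctypes.c_size_t(m).value * n


m, n = 65536, 32768
wrapped = int32_product(m, n)   # overflowed before any conversion to size_t
correct = size_t_product(m, n)  # the element count the allocation needs
```

In C++ the equivalent fix would be to cast one operand to `size_t` before multiplying, so the whole expression is evaluated at the wider width.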
2023-04-06 13:16:08 +08:00
guoshzhao
26373edb78
Monitor - Fix the cgroup version checking logic. ( #502 )
...
**Description**
`grep cgroup /proc/filesystems` does not work on NDv4: its cgroup
version is v1, but the command reports v2. Check for file existence
instead to determine the cgroup version.
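A file-existence check of this kind might look like the sketch below. The commit message does not say which file the monitor tests; this assumes `cgroup.controllers`, which the kernel exposes at the root of a cgroup v2 hierarchy. The function name and the `root` parameter are hypothetical.

```python
import os


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if the cgroup v2 marker file exists under root, else 1."""
    # Assumption: cgroup.controllers is present only on a v2 unified hierarchy.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    return 1
```

Passing `root` explicitly keeps the check testable against a temporary directory.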
2023-04-03 09:22:43 +08:00
Yifan Xiong
97c9a41f14
Benchmark - Update TE FP8 model conversion ( #499 )
...
__Description__
Update TE FP8 model conversion.
__Major Revisions__
* Add 16-byte alignment comment.
* Fix TE layer parameters type.
2023-03-28 15:01:41 +00:00
Yifan Xiong
c88c970943
Benchmarks - Support TE FP8 in BERT/GPT2 models ( #496 )
...
Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by
converting linear/layernorm to TE layers.
2023-03-25 19:28:27 +08:00
Ziyue Yang
8daef211dd
Benchmarks - Add distributed inference benchmark ( #493 )
...
**Description**
This PR adds a micro-benchmark of distributed model inference workloads.
**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.
---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-24 17:15:17 +08:00
guoshzhao
a9b45a072e
Monitor - Support cgroup V2 when read system metrics. ( #491 )
...
**Description**
Ubuntu 22.04 uses cgroup v2, which changes the file structure.
Modify the monitor to support both cgroup v1 and v2.
2023-03-22 08:33:18 +00:00
Yifan Xiong
dbeba8056b
Benchmark - Support batch/shape range in cublaslt gemm ( #494 )
...
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
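A range with a multiplication factor can be expanded as in the sketch below. The `start[:stop[:factor]]` string syntax, the default factor of 2, and the function name are assumptions for illustration; the benchmark's actual argument format is not shown in this commit message.

```python
def expand_range(spec):
    """Expand 'start[:stop[:factor]]' into a list of geometrically spaced sizes."""
    parts = [int(p) for p in spec.split(":")]
    start = parts[0]
    stop = parts[1] if len(parts) > 1 else start
    factor = parts[2] if len(parts) > 2 else 2  # assumed default
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor must be >= 2")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values


batch_sizes = expand_range("16:128:2")
```

Here `batch_sizes` is `[16, 32, 64, 128]`, and a plain value such as `"64"` expands to a single-element list.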
2023-03-22 13:22:36 +08:00
rafsalas19
655bd0aa59
Adding HPL benchmark ( #482 )
...
**Description**
- Adding HPL benchmark
---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-21 16:44:08 +00:00
Yifan Xiong
644b5395df
Benchmark - Fix torch.dist init issue with multiple models ( #495 )
...
Fix potential barrier timeout in init_process_group due to race
condition of using the same port. Change to different ports when running
multiple models sequentially in one process.
For example, running vgg11/13/16/19 uses ports 29501 to 29504
respectively.
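The port assignment in the example can be sketched as a base port plus the run index. The base value 29500 and the helper name are assumptions chosen so that the result matches the ports quoted above; the actual implementation is not shown here.

```python
BASE_PORT = 29500  # assumption: chosen so the first model gets 29501


def port_for_run(index):
    """Return a distinct rendezvous port for the index-th model in one process."""
    return BASE_PORT + 1 + index


models = ["vgg11", "vgg13", "vgg16", "vgg19"]
ports = {model: port_for_run(i) for i, model in enumerate(models)}
```

Each port would then be supplied to the process group initialization for that model, so consecutive runs never contend for the same port.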
2023-03-21 12:35:03 +00:00
Yuting Jiang
5a88db1601
Benchmarks: Support error tolerance in micro-benchmark for CuDNN function ( #490 )
...
**Description**
Support error tolerance in micro-benchmark for CuDNN function
**Major Revision**
- revise micro_base to keep running the remaining commands when
one command fails in the micro-benchmark
- set error tolerance to true in cudnn functions
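The behaviour described above can be sketched as a command loop that either stops at the first failure or records it and continues. This is an illustration only; `run_commands`, its `tolerant` flag, and the sample commands are hypothetical and do not reproduce micro_base.

```python
import subprocess
import sys


def run_commands(commands, tolerant=False):
    """Run commands in order; stop at the first failure unless tolerant."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((proc.returncode, proc.stdout.strip()))
        if proc.returncode != 0 and not tolerant:
            break  # strict mode: remaining commands are skipped
    return results


# Three stand-in commands; the second one fails.
commands = [
    [sys.executable, "-c", "print('a')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('c')"],
]
strict = run_commands(commands)
lenient = run_commands(commands, tolerant=True)
```

With tolerance enabled, the failed command is still recorded, so its metrics can be reported as missing while the later ones are kept.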
2023-03-20 21:20:21 +08:00
Yifan Xiong
b808135c27
Benchmarks - Support tensor core precisions in cublaslt gemm ( #492 )
...
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
2023-03-20 10:59:40 +08:00
dependabot[bot]
139d4df55f
Bump webpack from 5.39.1 to 5.76.1 in /website ( #489 )
...
Bumps [webpack](https://github.com/webpack/webpack ) from 5.39.1 to 5.76.1.
- [Release notes](https://github.com/webpack/webpack/releases )
- [Commits](webpack/webpack@v5.39.1...v5.76.1)
---
updated-dependencies:
- dependency-name: webpack
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-03-17 20:18:35 +08:00
Yifan Xiong
35f5390512
Pin setuptools version to v65.7.0 ( #483 )
...
Pin setuptools version to
[v65.7.0](https://setuptools.pypa.io/en/latest/history.html#v65-7-0 ) to
avoid breaking changes since v66.0.0.
2023-03-06 11:43:44 +00:00
Yifan Xiong
2cc4cd03e2
Limit ansible_runner version for Python3.6 ( #485 )
...
Limit ansible_runner version to less than 2.3.2 for Python3.6.
2023-03-06 18:54:45 +08:00
Yuting Jiang
eba298f5f0
Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark ( #479 )
...
**Description**
Revise cublas-benchmark to support flexible warmup and to fill data with a
fixed number in perf tests, improving running efficiency.
**Major Revision**
- remove num_in_steps for warmup to support more flexible warmup setting
for users
- Add support to generate input with fixed number for perf test
2023-02-28 06:35:18 +08:00
Yuting Jiang
0292366075
Benchmarks: Build Pipeline - Add support for cpu-only perftest in makefile ( #480 )
...
**Description**
Add support to install cpu-only perftest in makefile.
Co-authored-by: Yuting Jiang <yuting.jiang@microsoft.com>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-02-24 11:19:46 +08:00
Yifan Xiong
bbb86c4a83
CI/CD - Free disk space in GitHub Action VHD ( #481 )
...
Free more disk space in GitHub Action VHD.
2023-02-23 17:30:39 +08:00
Yuting Jiang
ec7f502c93
CI/CD - Upgrade networkx version to fix installation compatibility issue ( #478 )
...
**Description**
Upgrade networkx version to fix installation compatibility issue.
2023-02-17 05:36:21 +00:00
dependabot[bot]
f041b6eacc
Bump @sideway/formula from 3.0.0 to 3.0.1 in /website ( #477 )
...
Bumps [@sideway/formula](https://github.com/sideway/formula ) from 3.0.0 to 3.0.1.
- [Release notes](https://github.com/sideway/formula/releases )
- [Commits](hapijs/formula@v3.0.0...v3.0.1)
---
updated-dependencies:
- dependency-name: "@sideway/formula"
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-02-17 02:33:08 +00:00
dependabot[bot]
e1a489496c
Bump http-cache-semantics from 4.1.0 to 4.1.1 in /website ( #474 )
...
Bumps [http-cache-semantics](https://github.com/kornelski/http-cache-semantics ) from 4.1.0 to 4.1.1.
- [Release notes](https://github.com/kornelski/http-cache-semantics/releases )
- [Commits](kornelski/http-cache-semantics@v4.1.0...v4.1.1)
---
updated-dependencies:
- dependency-name: http-cache-semantics
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-02-16 14:14:47 +00:00
rafsalas19
32896ca477
Adding Stream Benchmark ( #473 )
...
**Description**
- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream
---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
2023-02-13 15:34:37 -05:00
Yuting Jiang
62a2913497
Executor - Support SuperBench Executor running on Windows ( #475 )
...
**Description**
Support SuperBench Executor running on Windows.
**Major Revision**
- Lazy import ansible related module
2023-02-13 08:20:07 +00:00
pnunna93
f21bfef2f3
Dockerfile: Remove fixed rccl version in rocm5.1.x docker file ( #476 )
...
**Description**
The commit (e08b6d3a1c) installs an rccl version which causes
"undefined symbol: ncclGetLastError" when importing torch.
Revert it to avoid the error.
2023-02-07 15:24:26 +08:00
dependabot[bot]
121a5ddc5e
Bump ua-parser-js from 0.7.28 to 0.7.33 in /website ( #469 )
...
Bumps [ua-parser-js](https://github.com/faisalman/ua-parser-js ) from 0.7.28 to 0.7.33.
- [Release notes](https://github.com/faisalman/ua-parser-js/releases )
- [Changelog](https://github.com/faisalman/ua-parser-js/blob/master/changelog.md )
- [Commits](faisalman/ua-parser-js@0.7.28...0.7.33)
---
updated-dependencies:
- dependency-name: ua-parser-js
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-01-30 10:58:24 +08:00
Yifan Xiong
b07fda155e
Release - SuperBench v0.7.0 ( #468 )
...
**Description**
Cherry-pick bug fixes from v0.7.0 to main.
**Major Revisions**
* Benchmarks - Fix missing include in FP8 benchmark (#460 )
* Fix bug in TE BERT model (#461 )
* Doc - Update benchmark doc (#465 )
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464 )
* Support `sb deploy` without pulling image (#466 )
* Docs - Upgrade version and release note (#467 )
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-01-28 11:07:06 +08:00
Yuting Jiang
f380bc5eff
Bug: Fix bug for incorrect datatype judgement in cublas-function source code ( #462 )
...
**Description**
Fix bug for incorrect datatype judgement in cublas-function source code.
2023-01-17 10:51:57 +08:00
dependabot[bot]
65bae28c0d
Bump json5 from 1.0.1 to 1.0.2 in /website ( #459 )
...
Bumps [json5](https://github.com/json5/json5 ) from 1.0.1 to 1.0.2.
- [Release notes](https://github.com/json5/json5/releases )
- [Changelog](https://github.com/json5/json5/blob/main/CHANGELOG.md )
- [Commits](json5/json5@v1.0.1...v1.0.2)
---
updated-dependencies:
- dependency-name: json5
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-09 17:10:40 +08:00
Yang Wang
ccccd988df
Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark ( #454 )
...
Support traffic patterns under the different devices in NCCL/RCCL test
* change the metrics format if the pattern is specified
2023-01-04 12:30:32 +00:00
Yang Wang
8e748d5649
Runner - Generate host groups file in mpi mode ( #458 )
...
**Major Revision**
- Add a pattern option to generate the mpi_pattern.txt file if a path
is specified.
- In mpi pattern, serial_index and parallel_index are added to each
benchmark as environment variables.
**Minor Revision**
- Fix typo
2023-01-04 19:49:14 +08:00
Yifan Xiong
5197cdf5cb
Benchmarks - Support FP8 in BERT models ( #446 )
...
Support FP8 in PyTorch BERT models:
* add fp8 hybrid/e4m3/e5m2 in precision arguments
* build BERT encoders with `te.TransformerLayer` to replace
`transformers.BertModel`
* wrap forward steps with fp8 autocast
2023-01-04 11:12:05 +08:00
Yang Wang
65e433c0c6
Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode ( #437 )
...
**Description**
Support the following patterns in `mpi` mode:
* `k-batch`
* `topo-aware`
2023-01-03 10:28:35 +00:00
Yifan Xiong
fc661f7db3
Support GEMM benchmark on Hopper GPUs ( #456 )
...
Support GEMM benchmark on Hopper GPUs.
2023-01-03 09:45:27 +00:00
Yifan Xiong
616e7a5a5a
Benchmarks - Integrate cublaslt micro-benchmark ( #455 )
...
Integrate cublaslt-gemm micro-benchmark #451 .
2023-01-03 08:54:40 +00:00
Yuting Jiang
75573f59da
Benchmarks: Micro benchmarks - Add correctness check in cublas-function benchmark ( #452 )
...
**Description**
Add correctness check in cublas-function benchmark.
**Major Revision**
- add python code of correctness check in cublas-function benchmark and test
2023-01-03 14:59:30 +08:00
Yifan Xiong
0591da5f49
Benchmarks - Add cuBLASLt FP16 and FP8 GEMM micro-benchmark ( #451 )
...
Add micro-benchmark for cublaslt fp8 gemm.
2023-01-03 05:28:56 +00:00
Yuting Jiang
678b1251f1
Benchmarks: Micro benchmarks - add source code of correctness check for cublas functions ( #450 )
...
**Description**
Add c source code of correctness check for cublas functions.
**Major Revision**
- add correctness check for all supported cublas functions
- add --correctness option into binary
**Minor Revision**
- fix a bug, and template fill_data and prepare_tensor so that the output matrix has the right memory alignment for each datatype
2023-01-03 04:20:10 +00:00
Yuting Jiang
9dfefce350
Executor - Add stdout logging util module and enable real-time logging flushing in executor ( #445 )
...
**Description**
Add stdout logging util module and enable real-time logging flushing in executor
**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`
**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks
- update related docs
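Writing output to both stdout and a log file, flushed as it is produced, can be sketched with a small tee object. The `Tee` class and the `stdout_to_file` helper are hypothetical names for illustration and are not the util module added in this commit.

```python
import contextlib
import sys


class Tee:
    """Write every chunk to both the original stream and a log file."""

    def __init__(self, stream, log_file):
        self._stream = stream
        self._log_file = log_file

    def write(self, text):
        self._stream.write(text)
        self._log_file.write(text)
        self.flush()  # flush on every write so the log is real-time
        return len(text)

    def flush(self):
        self._stream.flush()
        self._log_file.flush()


@contextlib.contextmanager
def stdout_to_file(path):
    """Duplicate stdout into the file at path for the duration of the block."""
    with open(path, "a") as log_file:
        with contextlib.redirect_stdout(Tee(sys.stdout, log_file)):
            yield
```

Wrapping a benchmark run in `with stdout_to_file("sb-bench.log"):` would then send each `print` to the console and the log at the same time.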
2022-12-30 09:40:28 +00:00
Yang Wang
f2634d8608
Benchmarks - Support `pair-wise` pattern in IB validation benchmark ( #453 )
...
**Description**
* Reuse `gen_pair_wise_config` in micro-benchmark
2022-12-30 13:02:52 +08:00
Yifan Xiong
a3c65b2a57
Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs ( #449 )
...
Add Docker image for arch90 NVIDIA GPUs:
* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline
2022-12-29 12:19:38 +00:00
Yang Wang
7838b6b154
Runner - Support `pair-wise` pattern in `mpi` mode ( #447 )
...
* Extract pair-wise pattern from ib_validation
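A pair-wise pattern can be generated with the round-robin circle method, in which every pair of nodes meets exactly once and each node appears at most once per round. The sketch below assumes that scheduling scheme; the function name is hypothetical and the code is not taken from ib_validation.

```python
def pair_wise_rounds(n):
    """Return rounds of (a, b) pairs covering every pair of nodes 0..n-1 once."""
    nodes = list(range(n))
    if n % 2:
        nodes.append(None)  # odd count: one node sits out each round
    size = len(nodes)
    rounds = []
    for _ in range(size - 1):
        pairs = []
        for i in range(size // 2):
            a, b = nodes[i], nodes[size - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first node fixed and rotate the others by one position.
        nodes = [nodes[0], nodes[-1]] + nodes[1:-1]
    return rounds


rounds = pair_wise_rounds(4)
```

For four nodes this yields three rounds of two pairs each, so the pairs within a round can run in parallel while the rounds run serially.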
2022-12-29 08:23:36 +00:00
dependabot[bot]
6186146d59
Bump qs and express in /website ( #440 )
...
Bumps [qs](https://github.com/ljharb/qs ) and [express](https://github.com/expressjs/express ). These dependencies needed to be updated together.
Updates `qs` from 6.7.0 to 6.11.0
- [Release notes](https://github.com/ljharb/qs/releases )
- [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md )
- [Commits](https://github.com/ljharb/qs/compare/v6.7.0...v6.11.0 )
Updates `express` from 4.17.1 to 4.18.2
- [Release notes](https://github.com/expressjs/express/releases )
- [Changelog](https://github.com/expressjs/express/blob/master/History.md )
- [Commits](https://github.com/expressjs/express/compare/4.17.1...4.18.2 )
---
updated-dependencies:
- dependency-name: qs
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-28 13:47:06 +08:00
dependabot[bot]
de6deb0e2d
Bump decode-uri-component from 0.2.0 to 0.2.2 in /website ( #439 )
...
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component ) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases )
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2 )
---
updated-dependencies:
- dependency-name: decode-uri-component
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-27 13:40:49 +08:00
Yuting Jiang
6583ba2e40
Benchmark: Revision - Add wait time option to resolve mem-bw unstable issue ( #438 )
...
**Description**
Add wait time option to resolve mem-bw unstable issue.
2022-12-14 17:21:02 +08:00
Yuting Jiang
1deb2eaa29
downgrade transformers version to fix tensorrt ( #441 )
...
**Description**
Downgrade transformers version to fix tensorrt test failure.
2022-12-14 14:19:32 +08:00