Граф коммитов

86 Коммитов

Автор SHA1 Сообщение Дата
omahs a4c87da0ac
Docs - fix typos (#628)
Docs - fix typos
2024-07-25 03:49:19 +00:00
Yifan Xiong 2c88db907f
Release - SuperBench v0.10.0 (#607)
**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606

Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>
2024-01-08 05:40:52 +00:00
Ziyue Yang 719a427fe7
Benchmarks: Microbenchmark - Add distributed inference benchmark cpp implementation (#586)
**Description**
Add distributed inference benchmark cpp implementation.
2023-12-11 06:53:51 +08:00
Yuting Jiang 1f5031bd74
Dockerfile - Upgrade to rocm5.7 dockerfile (#587)
**Description**
upgrade to rocm5.7 dockerfile.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
2023-12-09 17:41:12 +00:00
Ziyue Yang 4fa60be7cd
Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
**Description**
Add one-to-all, all-to-one, all-to-all support to
gpu_copy_bw_performance, and fix performance bug in gpu_copy
2023-12-08 23:22:38 +08:00
Yuting Jiang dd5a6329ed
Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
2023-12-07 09:37:09 +08:00
Yuting Jiang 9ae8c67093
Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation (#581)
**Description**
Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in
ib-validation

**Major Revision**
- Support cpu-gpu and gpu-cpu in ib-validation


**Minor Revision**
- support multi msg size, multi direction, multi ib commands in
ib-validation
2023-12-04 22:20:46 +08:00
guoshzhao 9f4880cb8e
Analyzer - Generate baseline given results from multiple nodes. (#575)
**Description**
Generate baseline given results from multiple nodes. 

**Major Revision**
- Add sub command `sb result generate-baseline`
- Add UT and docs

---------

Co-authored-by: 454314380 <454314380@qq.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-11-22 14:42:32 +08:00
Yuting Jiang e1df877bfe
Release - SuperBench v0.9.0 (#558)
**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline(#555 )
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency(#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note(#557)
2023-07-27 10:42:31 +08:00
Lei Qu c7d0beaf9e
Doc - Update outdate references in micro-benchmarks.md (#544)
Modify link for Nvidia bandwidth test tool

**Description**
previous link is 404

**Minor Revision**
update the link value to
https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest
2023-06-30 19:17:41 +08:00
Yuting Jiang ed027e4c8e
Tools - Add runner for sys info and update docs (#532)
**Description**
Add runner for sys info to automatically collect on multiple nodes and
update related docs.

**Major Revision**
- add runner for sys info which will check docker status and run `sb
node info` on all nodes' docker and fetch results from all nodes

**Minor Revision**
- update cli and system-info doc
- update sb node info to save output info output-dir/sys-info.json
2023-06-29 06:09:44 +00:00
F̷N̷ 4c0d96e5d8
Docs - Fix typo on kernel_parameters and kernel_modules in system-config (#528)
**Description**
Kernel_parameters and kernel_modules command and examples are exchanged.
2023-05-04 00:55:42 +00:00
guoshzhao f38a9829d0
ModelBenchmarks - Fix early stop logic due to num_steps. (#522)
**Description**
Model benchmarks can stop due to `num_steps` or `duration` config which
will take effect when the value is set greater than 0.
If both are set greater than 0, the earliest condition reached will
work.
2023-04-28 13:15:47 +08:00
Yifan Xiong 51761b3af1
Release - SuperBench v0.8.0 (#517)
**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-04-14 12:57:55 +00:00
Ziyue Yang 8daef211dd
Benchmarks - Add distributed inference benchmark (#493)
**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------

Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-24 17:15:17 +08:00
Yifan Xiong dbeba8056b
Benchmark - Support batch/shape range in cublaslt gemm (#494)
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
2023-03-22 13:22:36 +08:00
rafsalas19 655bd0aa59
Adding HPL benchmark (#482)
**Description**

- Adding HPL benchmark

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2023-03-21 16:44:08 +00:00
rafsalas19 32896ca477
Adding Stream Benchmark (#473)
**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
2023-02-13 15:34:37 -05:00
Yifan Xiong b07fda155e
Release - SuperBench v0.7.0 (#468)
**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)

Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2023-01-28 11:07:06 +08:00
Yang Wang ccccd988df
Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454)
Support traffic patterns under the different devices in NCCL/RCCL test
* change the metrics format if specified the pattern
2023-01-04 12:30:32 +00:00
Yang Wang 8e748d5649
Runner - Generate host groups file in mpi mode (#458)
**Major Revision**

- Add an option for pattern to generate mpi_pattern.txt file if
specified the path.
- In mpi pattern, serial_index and parallel_index will add in each
benchmark as environment variables.

**Minor Revision**
- Fix typo
2023-01-04 19:49:14 +08:00
Yang Wang 65e433c0c6
Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437)
**Description**
Support the following patterns  in `mpi` mode:
* `k-batch`
* `topo-aware`
2023-01-03 10:28:35 +00:00
Yifan Xiong 616e7a5a5a
Benchmarks - Integrate cublaslt micro-benchmark (#455)
Integrate cublaslt-gemm micro-benchmark #451.
2023-01-03 08:54:40 +00:00
Yuting Jiang 9dfefce350
Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445)
**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`

**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks 
- udpate related docs
2022-12-30 09:40:28 +00:00
Yang Wang 7838b6b154
Runner - Support `pair-wise` pattern in `mpi` mode (#447)
* Extract pair-wise pattern from ib_validation
2022-12-29 08:23:36 +00:00
Yang Wang e4eeda0afd
Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430)
* add mpi-parallels mode

* update according to comments

* fix and update doc

* update

* merge into 'mpi' mode

* udpate according to comments

* fix testcases

* fix ansible

* regard pattern as field

* udpate

* fix flake8 version

* add flake8 range

* remove map-by from host config

* udpate comments
2022-11-29 12:30:10 +08:00
Yifan Xiong 63e9b2d1bc
Release - SuperBench v0.6.0 (#409)
**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)

Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
2022-09-06 18:06:05 +08:00
Yuting Jiang 10a79c4ea8
Analyzer - Add support for both jsonl and json format in data diagnosis (#388)
**Description**
Add support for both jsonl and json format in data diagnosis.

**Major Revision**
- Add support for both jsonl and json format in data diagnosis


**Minor Revision**
- change related doc
- add jsonl support in cli
2022-08-22 10:57:00 +08:00
Yifan Xiong 626ac0a463
Update Python setup for require packages (#387)
__Description__

Update Python setup for require packages.

__Major Revisions__
* downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6
* add extra entry in extras_require for nested packages
* update `pip install` contents accordingly
2022-08-17 11:33:57 +08:00
Jie Zhang ef4d65745b
Support topo-aware IB performance validation (#373)
* Support topo-aware IB performance validation

Add a new pattern `topo-aware`, so the user can run IB performance
test based on VM's topology information. This way, the user can
validate the IB performance across VM pairs with different distance
as a quick test instead of pair-wise test.

To run with topo-aware pattern, user needs to specify three required
(and two optional) parameters in YAML config file:
--pattern	topo-aware
--ibstat	path to ibstat output
--ibnetdiscover	path to ibnetdiscover output
--min_dist	minimum distance of VM pairs (optional, default 2)
--max_dist	maximum distance of VM pairs (optional, default 6)

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (# hops).

The specified IB test will then be running across these
generated VM pairs.

Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add description about topology aware ib traffic tests

Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

This commit adds unit test to verify the generated topology aware
config file is correct. To do so, four new data files are added in
order to invoke gen_topo_aware_config function to generate topology
aware config file, then compares it with the expected config file.

Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline

Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
2022-07-26 16:56:19 -07:00
Yifan Xiong e00a8180f6
Support node_num=1 in mpi mode (#372)
Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks in
both 1 node and all nodes in one config by changing `node_num`.
Update docs and add test case accordingly.
2022-07-08 09:24:17 +08:00
Yifan Xiong 620192a242
Fix issues in ib loopback benchmark (#369)
Fix several issues in ib loopback benchmark:
* use `--report_gbits` and divide by 8 to get GB/s, previous results are
  MiB/s / 1000
* use the ib_write_bw binary built in third_party instead of system path
* update the metrics name so that different hca indices have same metric
2022-06-29 17:53:02 +00:00
Yifan Xiong bfaa1c837b
Support multiple IB/GPU in ib validation (#363)
**Description**

Support multiple IB/GPU devices run simultaneously in ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.
2022-06-24 08:35:20 +00:00
Yifan Xiong a4937e95c6
Support `sb run` on host directly without Docker (#358)
**Description**

Support `sb run` on host directly without Docker

**Major Revisions**
- Add `--no-docker` argument for `sb run`.
- Run on host directly if `--no-docker` if specified.
- Update docs and tests correspondingly.
2022-06-14 10:57:01 +08:00
Yifan Xiong 6681c72043
Release - SuperBench v0.5.0 (#350)
**Description**

Cherry-pick  bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
2022-04-29 16:22:55 +08:00
Yuting Jiang 712eafc373
Docs - Update links using relative file paths with extensions (#346)
**Description**
Update links of referencing other docs using relative file paths with extensions.
2022-04-21 07:28:19 +08:00
Jared Bowden cb26691173
Docs - Update link to cli.md (#341)
**Description**
Fixes relative link in documentation: point to `../cli.md`.
2022-04-15 22:11:14 +08:00
Yuting Jiang 8dc19ca4af
CLI - Integrate output all nodes diagnosis results (#339)
**Description**
Integrate output all nodes diagnosis results.
2022-04-11 13:42:04 +08:00
Yuting Jiang 56c9a711a8
Docs - Add usage for result summary (#337)
**Description**
Add usage for result summary.
2022-04-08 20:44:25 +00:00
Yuting Jiang f15da60b2b
CLI - Integrage result summary and update output format of data diagnosis (#335)
**Description**
Integrage result summary and update output format of data diagnosis.

**Major Revision**
- integrage result summary 
- add md and html format for data diagnosis
2022-04-08 18:48:43 +08:00
guoshzhao 6d895da83c
Benchmarks: Add Feature - Provide option to save raw data into file. (#333)
**Description**
Use config `log_raw_data` to control whether log the raw data into file or not. The default value is `no`. We can set it as `yes` for some particular benchmarks to save the raw data into file, such as NCCL/RCCL test.
2022-04-01 16:26:09 +08:00
rafsalas19 ff51a3cee9
Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324)
**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn
2022-03-16 16:20:11 +08:00
Yuting Jiang 97ed12f97f
Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289)
**Description**
Add multi-rules feature for data diagnosis to support multiple rules' combined check.

**Major Revision**
- revise rule design to support multiple rules combination check
- update related codes and tests
2022-02-20 16:59:38 +08:00
Ziyue Yang 6cdf759543
Benchmarks: Revise Code - Eliminate NUMA binding for device-to-device tests in gpu_copy (#302)
**Description**
This commit remove NUMA binding for device-to-device tests because NUMA doesn't affect performance, and revise benchmark metrics accordingly.
2022-02-09 20:30:42 +08:00
Yuting Jiang 28195be6db
Bug - Fix typo in document (#297)
Fix typo in document.
2022-01-30 13:38:00 +08:00
Yifan Xiong 3524975cfc
Config - Support customized env for all modes (#295)
Support customized env for all modes in configuration.
2022-01-29 08:19:48 +00:00
guoshzhao d877ca2322
Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288)
**Description**
Add timeout feature for each benchmark.

**Major Revision**
- Add `timeout` config for each benchmark. In current config files, only set the timeout for kernel-launch as example. Other benchmarks can be set in the future.
- Set the timeout config for `ansible_runner.run()`. Runner will get the return code 254:
   [ansible.py:80][WARNING] Run failed, return code 254.
- Using `timeout` command to terminate the client process.
2022-01-28 08:16:32 +00:00
Yifan Xiong 7d7cd3dc63
Config - Update benchmark naming to support annotations (#284)
__Description__

Update benchmark naming to support annotations.

__Major Revisions__
- Update name for `create_benchmark_context` in executor.
- Backward compatibility for model benchmarks using "_models" suffix.
- Update documents.
2022-01-25 09:54:58 +00:00
Ziyue Yang 74421ffee0
Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285)
**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.
2022-01-21 13:45:37 +08:00
guoshzhao fd2bc9e048
Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283)
**Description**
Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
2022-01-19 10:49:56 +08:00