**Description**
Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in
ib-validation
**Major Revision**
- Support cpu-gpu and gpu-cpu in ib-validation
**Minor Revision**
- support multi msg size, multi direction, multi ib commands in
ib-validation
**Description**
Generate baseline given results from multiple nodes.
**Major Revision**
- Add sub command `sb result generate-baseline`
- Add UT and docs
---------
Co-authored-by: 454314380 <454314380@qq.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
**Description**
Cherry-pick bug fixes from v0.9.0 to main.
**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline(#555 )
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency(#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note(#557)
**Description**
Add runner for sys info to automatically collect on multiple nodes and
update related docs.
**Major Revision**
- add runner for sys info which will check docker status and run `sb
node info` on all nodes' docker and fetch results from all nodes
**Minor Revision**
- update cli and system-info doc
- update sb node info to save output info output-dir/sys-info.json
**Description**
Model benchmarks can stop due to `num_steps` or `duration` config which
will take effect when the value is set greater than 0.
If both are set greater than 0, the earliest condition reached will
work.
**Description**
Cherry-pick bug fixes from v0.8.0 to main.
**Major Revisions**
* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
**Description**
This PR adds a micro-benchmark of distributed model inference workloads.
**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.
---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
**Description**
Cherry-pick bug fixes from v0.7.0 to main.
**Major Revisions**
* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
**Major Revision**
- Add an option for pattern to generate mpi_pattern.txt file if
specified the path.
- In mpi pattern, serial_index and parallel_index will add in each
benchmark as environment variables.
**Minor Revision**
- Fix typo
**Description**
Add stdout logging util module and enable real-time logging flushing in executor
**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`
**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks
- udpate related docs
**Description**
Cherry-pick bug fixes from v0.6.0 to main.
**Major Revisions**
* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
**Description**
Add support for both jsonl and json format in data diagnosis.
**Major Revision**
- Add support for both jsonl and json format in data diagnosis
**Minor Revision**
- change related doc
- add jsonl support in cli
__Description__
Update Python setup for require packages.
__Major Revisions__
* downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6
* add extra entry in extras_require for nested packages
* update `pip install` contents accordingly
* Support topo-aware IB performance validation
Add a new pattern `topo-aware`, so the user can run IB performance
test based on VM's topology information. This way, the user can
validate the IB performance across VM pairs with different distance
as a quick test instead of pair-wise test.
To run with topo-aware pattern, user needs to specify three required
(and two optional) parameters in YAML config file:
--pattern topo-aware
--ibstat path to ibstat output
--ibnetdiscover path to ibnetdiscover output
--min_dist minimum distance of VM pairs (optional, default 2)
--max_dist maximum distance of VM pairs (optional, default 6)
The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (# hops).
The specified IB test will then be running across these
generated VM pairs.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
* Add description about topology aware ib traffic tests
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
* Add unit test to verify generated topology aware config file
This commit adds unit test to verify the generated topology aware
config file is correct. To do so, four new data files are added in
order to invoke gen_topo_aware_config function to generate topology
aware config file, then compares it with the expected config file.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
* Fix lint issue on Azure pipeline
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks in
both 1 node and all nodes in one config by changing `node_num`.
Update docs and add test case accordingly.
Fix several issues in ib loopback benchmark:
* use `--report_gbits` and divide by 8 to get GB/s, previous results are
MiB/s / 1000
* use the ib_write_bw binary built in third_party instead of system path
* update the metrics name so that different hca indices have same metric
**Description**
Support multiple IB/GPU devices run simultaneously in ib validation benchmark.
**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.
Closes#326.
**Description**
Support `sb run` on host directly without Docker
**Major Revisions**
- Add `--no-docker` argument for `sb run`.
- Run on host directly if `--no-docker` if specified.
- Update docs and tests correspondingly.
**Description**
Cherry-pick bug fixes from v0.5.0 to main.
**Major Revisions**
* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
**Description**
Integrage result summary and update output format of data diagnosis.
**Major Revision**
- integrage result summary
- add md and html format for data diagnosis
**Description**
Use config `log_raw_data` to control whether log the raw data into file or not. The default value is `no`. We can set it as `yes` for some particular benchmarks to save the raw data into file, such as NCCL/RCCL test.
**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn
**Description**
Add multi-rules feature for data diagnosis to support multiple rules' combined check.
**Major Revision**
- revise rule design to support multiple rules combination check
- update related codes and tests
**Description**
This commit remove NUMA binding for device-to-device tests because NUMA doesn't affect performance, and revise benchmark metrics accordingly.
**Description**
Add timeout feature for each benchmark.
**Major Revision**
- Add `timeout` config for each benchmark. In current config files, only set the timeout for kernel-launch as example. Other benchmarks can be set in the future.
- Set the timeout config for `ansible_runner.run()`. Runner will get the return code 254:
[ansible.py:80][WARNING] Run failed, return code 254.
- Using `timeout` command to terminate the client process.
__Description__
Update benchmark naming to support annotations.
__Major Revisions__
- Update name for `create_benchmark_context` in executor.
- Backward compatibility for model benchmarks using "_models" suffix.
- Update documents.
**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.