**Description**
Add multi-rules feature for data diagnosis to support multiple rules' combined check.
**Major Revision**
- revise rule design to support multiple rules combination check
- update related codes and tests
**Description**
Add support for pytorch>=1.9.0 of init_process_group.
**Major Revision**
- Use PrefixStore(TCPStore) to init_process_group manully for each model run
**Description**
This commit remove NUMA binding for device-to-device tests because NUMA doesn't affect performance, and revise benchmark metrics accordingly.
**Description**
This commit does the following to optimize result variance in gpu_copy benchmark:
1) Add warmup phase for gpu_copy benchmark to avoid timing instability caused by first-time CUDA kernel launch overhead;
2) Use CUDA events for timing instead of CPU timestamps;
3) Make data checking an option that is not preferred to be enabled in performance test;
4) Enlarge message size in performance benchmark.
**Description**
Please write a brief description and link the related issue if have.
**Major Revision**
- Sync (do allreduce max) the E2E training results among all workers.
- Avoid using ':0' in metric name if there has only one rank having output.
**Description**
Add timeout feature for each benchmark.
**Major Revision**
- Add `timeout` config for each benchmark. In current config files, only set the timeout for kernel-launch as example. Other benchmarks can be set in the future.
- Set the timeout config for `ansible_runner.run()`. Runner will get the return code 254:
[ansible.py:80][WARNING] Run failed, return code 254.
- Using `timeout` command to terminate the client process.
__Description__
Update benchmark naming to support annotations.
__Major Revisions__
- Update name for `create_benchmark_context` in executor.
- Backward compatibility for model benchmarks using "_models" suffix.
- Update documents.
**Description**
Fix insecure issue of Multiplication result converted to larger type.
**Major Revision**
- Use a cast to ensure that the multiplication is done using the long long to avoid overflow.
**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.
__Description__
Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks.
<details>
<summary>Examples</summary>
<pre>
$ sb benchmark list -n [a-z]+-bw -o table
Result
--------
mem-bw
nccl-bw
rccl-bw
</pre>
<pre>
$ sb benchmark list-parameters -n mem-bw
=== mem-bw ===
optional arguments:
--bin_dir str Specify the directory of the benchmark binary.
--duration int The elapsed time of benchmark in seconds.
--mem_type str [str ...]
Memory types to benchmark. E.g. htod dtoh dtod.
--memory str Memory argument for bandwidthtest. E.g. pinned unpinned.
--run_count int The run count of benchmark.
--shmoo_mode Enable shmoo mode for bandwidthtest.
default values:
{'bin_dir': None,
'duration': 0,
'mem_type': ['htod', 'dtoh'],
'memory': 'pinned',
'run_count': 1}
</pre>
</details>
__Major Revisions__
* Add `sb benchmark list` to list benchmarks matching given name.
* Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name.
__Minor Revisions__
* Sort format help text for argparse.
__Description__
Cherry-pick bug fixes from v0.4.0 to main.
__Major Revisions__
* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
**Description**
Add ONNXRuntime inference benchmark based on ORT python API.
**Major Revision**
- Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.
**Description**
Integrate monitor into Superbench.
**Major Revision**
- Initialize, start and stop monitor in SB executor.
- Parse the monitor data in SB runner and merge into benchmark results.
- Specify ReduceType for monitor metrics, such as MAX, MIN and LAST.
- Add monitor configs into config file.
**Description**
Add data diagnosis module.
**Major Revision**
- Add DataDiagnosis class to support rule-based data diagnosis for result summary jsonl file of multi nodes
- Add RuleOp class to define rule operators
**Description**
If `ignore_invalid` is True, and 'required' arguments are not set when register the benchmark, the arguments should be provided by user in config and skip the arguments checking.