**Description**
This PR adds a micro-benchmark of distributed model inference workloads.
**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.
---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
**Description**
Add GPU-Burn to SuperBench.
- Add gpu-burn as a third-party submodule.
- Update the Makefile to build the gpu-burn binary.
- Add and update microbenchmarks to include the gpu-burn Python scripts.
- Add gpu-burn to the default and azure_ndv4 configs.
**Description**
This commit adds bidirectional tests to the gpu_copy benchmark for both device-host and device-device transfers, and revises the related tests.
**Description**
Add an ONNX Runtime inference benchmark based on the ORT Python API.
**Major Revision**
- Add `ORTInferenceBenchmark` class to export a PyTorch model to an ONNX model and run inference.
- Add tests and an example for the `ort-inference` benchmark.
- Update the introduction docs.
**Description**
This commit does the following:
1) Adds a CPU-initiated copy benchmark;
2) Adds a device-to-device (dtod) benchmark;
3) Supports scanning NUMA nodes and GPUs inside the benchmark program;
4) Renames gpu-sm-copy to gpu-copy.
**Description**
Add gpcnet microbenchmark
**Major Revision**
- Add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test.
- Add related test and example files.
**Description**
Add a TCP connectivity validation microbenchmark, which validates TCP connectivity between the current node and several nodes in the hostfile.
**Major Revision**
- Add the TCP connectivity validation microbenchmark and its related test and example.
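
For illustration only, the kind of check described above can be sketched with the Python standard library. This is not the implementation added in this PR: the helper names `read_hostfile` and `check_tcp_connectivity`, the one-host-per-line hostfile format, and the port and timeout parameters are all assumptions made for the sketch.

```python
import socket


def read_hostfile(text):
    """Parse hostfile text into a list of host names.

    Assumed format (hypothetical): one host per line, optionally followed
    by extra fields such as "slots=8"; blank lines and '#' comments are skipped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        hosts.append(line.split()[0])
    return hosts


def check_tcp_connectivity(hosts, port, timeout=1.0):
    """Try to open a TCP connection to each host and report success per host."""
    results = {}
    for host in hosts:
        try:
            # create_connection performs the TCP handshake; the socket is
            # closed again as soon as the 'with' block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # Covers refused connections, timeouts, and name resolution errors.
            results[host] = False
    return results
```

A caller could pass the parsed hosts and a known open port (for example the SSH port) and treat any `False` entry as a connectivity failure for that node.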
**Description**
Add GEMM FLOPS microbenchmark for AMD.
**Major Revision**
- Add GEMM FLOPS microbenchmark for AMD.
- Add related example and test files.
**Description**
Add memory bus bandwidth performance microbenchmark for AMD.
**Major Revision**
- Add memory bus bandwidth performance microbenchmark for AMD.
- Add related example and test files.
**Description**
Add disk performance microbenchmark.
**Major Revision**
- Add microbenchmark, example, test, config for disk performance.
**Minor Revision**
- Fix bugs in executor unit test related to default enabled tests.
Add microbenchmark, example, test, and config for CUDA memory performance. Add cuda-samples (tagged with the CUDA version) as a git submodule and update the related Makefile.
* add benchmark for cublas test
* format
* revise error handling and test
* add interface to read json file, revise json file path and include .json in packaging
* add random_seed in arguments
* revise preprocess of cublas benchmark
* fix lint error and note error in source code
* update according to comments
* revise input arguments from json file to custom str and convert json file to built-in dict list
* restore package config
* fix lint issue
* update platform and comments
* rename files to match source code dir and fix comment errors
Co-authored-by: root <root@sb-validation-000001.51z1chmys5fuzfqyo4niepozre.bx.internal.cloudapp.net>
* Benchmarks: Add Benchmark - add computation and communication overlap micro benchmark
* Benchmarks: Add benchmark - fix some format issues and typo
* Benchmarks: Add Benchmark - update according to comments and add test
* revise tests
* skip multi-GPU test when multiple GPUs are not available
Co-authored-by: v-yujiang <v-yujiang@microsoft.com>
* add bert-large as the model benchmark example
* add more arguments.
* address comments.
* delete duplicated file.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>