Commit graph

305 Commits

Author SHA1 Message Date
Yuting Jiang 666e3a9471
Benchmarks: Add Benchmark - Add memory bus bandwidth performance microbenchmark for AMD (#153)
**Description**
Add memory bus bandwidth performance microbenchmark for AMD.

**Major Revision**
- Add memory bus bandwidth performance microbenchmark for AMD.
- Add related example and test file.
2021-08-27 21:17:39 +08:00
Ziyue Yang 2880f71ef0
Benchmarks: Add Benchmark - Add GPU SM copy benchmark (#162)
**Description**
This commit adds a benchmark program for GPU-initiated data transfer.
2021-08-27 17:10:34 +08:00
Yuting Jiang 958ebc0e33
Benchmarks: Fix Bug - fix bug of microbenchmark building cublas and cudnn for AMD in build pipeline (#166)
**Description**
Fix bug of microbenchmark building cublas and cudnn for AMD.

**Major Revision**
- Remove CUDA from LANGUAGES in project()
- Check for CUDAToolkit quietly and build only if it is found
2021-08-27 08:07:19 +08:00
Yuting Jiang 34cd2e8ca1
Benchmarks: Code Revision - Rename computation_communication_overlap microbenchmark metric (#167)
**Description**
Rename computation_communication_overlap microbenchmark metric.

**Major Revision**
- Remove rank info from metric.
- Simplify and rename metric.
2021-08-26 17:05:08 +08:00
Yuting Jiang e5e84a2ece
Benchmarks: Code Revision - Extract base class for memory bandwidth microbenchmark (#159)
**Description**
Extract base class for memory bandwidth microbenchmark.

**Major Revision**
- Revise and optimize cuda_memory_bandwidth_performance
- Extract base class for memory bandwidth microbenchmark
- Add test for base class
2021-08-26 07:48:07 +08:00
Yuting Jiang 0583862d2d
Benchmarks: Code Revision - fix typo in test of nccl microbenchmark. (#163)
**Description**
Fix typo in test_nccl_bw_performance.py.

**Major Revision**
- Fix typo in test_nccl_bw_performance.py.
2021-08-23 13:53:47 +08:00
Ziyue Yang 6774d7b702
Benchmarks: Revise Benchmark - Add readwrite I/O pattern (#161)
**Description**
This commit adds a readwrite I/O pattern to the FIO benchmark. The read/write ratio is fixed at 4:1.
2021-08-22 22:38:25 +08:00
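For illustration only: assuming the readwrite pattern maps onto fio's own `--rw=readwrite` mode, a fixed 4:1 read/write ratio corresponds to `--rwmixread=80` (80% of I/O is reads). The helper `fio_readwrite_args` and the job name are hypothetical, not taken from the commit.

```python
from typing import List


def fio_readwrite_args(filename: str, read_ratio: int = 4, write_ratio: int = 1) -> List[str]:
    """Hypothetical helper: build a fio command line for a mixed read/write job."""
    # fio's --rwmixread is the percentage of I/O that should be reads,
    # so a 4:1 read/write ratio becomes 4 * 100 // (4 + 1) = 80.
    rwmixread = read_ratio * 100 // (read_ratio + write_ratio)
    return [
        "fio",
        "--name=readwrite-sketch",  # hypothetical job name
        f"--filename={filename}",
        "--rw=readwrite",
        f"--rwmixread={rwmixread}",
    ]
```

Other fio options (block size, I/O depth, runtime) would be appended the same way; they are omitted here because the commit does not state them.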
guoshzhao 7595d79434
Runner: Add Feature - Generate summarized output files. (#157)
**Description**
Generate the summarized output files from all nodes. For each metric, apply the reduce operation specified by its `reduce_op`.

**Major Revision**
- Generate a summarized JSON file per node:
For microbenchmarks, the key format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`
For model benchmarks, the key format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`
`[]` means optional.
```
{
  "kernel-launch/overhead_event:0": 0.00583,
  "kernel-launch/overhead_event:1": 0.00545,
  "kernel-launch/overhead_event:2": 0.00581,
  "kernel-launch/overhead_event:3": 0.00572,
  "kernel-launch/overhead_event:4": 0.00559,
  "kernel-launch/overhead_event:5": 0.00591,
  "kernel-launch/overhead_event:6": 0.00562,
  "kernel-launch/overhead_event:7": 0.00586,
  "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
  "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
  "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
  "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
  "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
  "pytorch-sharding-matmul/1/allgather": 10.088025093078613
}
```
- Generate a summarized JSONL file for all nodes; each line is one node's result in JSON format.
2021-08-20 16:48:40 +08:00
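A minimal Python sketch of the key format and per-metric reduction described in #157. The reduce-op names (`avg`, `max`, `min`, `sum`) and the helpers `metric_key`, `summarize`, and `to_jsonl` are assumptions for illustration, not SuperBench's actual runner code.

```python
import json
import statistics
from typing import Dict, List, Optional

# Assumed reduce-op names mapped to Python reductions.
REDUCE_FUNCS = {"avg": statistics.mean, "max": max, "min": min, "sum": sum}


def metric_key(benchmark: str, metric: str, run_count: Optional[int] = None,
               sub_benchmark: Optional[str] = None, rank: Optional[int] = None) -> str:
    """Build `{benchmark}/[{sub_benchmark}/][{run_count}/]{metric}[:rank]`."""
    parts = [benchmark]
    if sub_benchmark is not None:  # model benchmarks carry a sub-benchmark name
        parts.append(sub_benchmark)
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric)
    key = "/".join(parts)
    return key if rank is None else f"{key}:{rank}"


def summarize(benchmark: str, metric: str, per_rank: List[float],
              reduce_op: Optional[str]) -> Dict[str, float]:
    """Reduce per-rank values when reduce_op is set; otherwise keep one key per rank."""
    if reduce_op is None:
        return {metric_key(benchmark, metric, rank=r): v for r, v in enumerate(per_rank)}
    return {metric_key(benchmark, metric): REDUCE_FUNCS[reduce_op](per_rank)}


def to_jsonl(node_summaries: List[Dict[str, float]]) -> str:
    """Serialize one JSON object per line, one line per node."""
    return "".join(json.dumps(summary) + "\n" for summary in node_summaries)
```

With `reduce_op=None` the `:rank` suffix survives, matching the `kernel-launch/overhead_event:N` keys in the example output; with a reduce op set, ranks collapse into a single key.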
Yuting Jiang a1e5c90d43
Benchmarks: Build Pipeline - Add build logic of hipBusBandwidth in third_party (#151)
**Description**
Add build logic of hipBusBandwidth in third_party.

**Major Revision**
- Add build logic of hipBusBandwidth in third_party
2021-08-20 14:32:14 +08:00
Yifan Xiong 98b6c0e3ca
Runner - Support mpi mode (#146)
Support MPI mode in the runner:
* Concatenate the mpirun command
* Support MCA and env config
* Prepare the hostfile and update the Ansible host pattern

Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
2021-08-19 15:59:17 +08:00
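The steps listed for #146 can be sketched as follows, assuming Open MPI's `mpirun` options `-np`, `--hostfile`, `-mca KEY VALUE`, and `-x NAME=VALUE`. `build_mpirun_command` is a hypothetical helper, not the runner's actual implementation.

```python
import shlex
from typing import Dict, List


def build_mpirun_command(command: str, hostfile: str, num_procs: int,
                         mca: Dict[str, str], env: Dict[str, str]) -> str:
    """Hypothetical: concatenate an Open MPI mpirun command line."""
    parts: List[str] = ["mpirun", "-np", str(num_procs), "--hostfile", shlex.quote(hostfile)]
    for key, value in mca.items():
        # Each MCA parameter becomes "-mca <key> <value>".
        parts += ["-mca", key, shlex.quote(str(value))]
    for key, value in env.items():
        # "-x NAME=VALUE" exports an environment variable to all ranks.
        parts += ["-x", shlex.quote(f"{key}={value}")]
    # The wrapped command is appended as-is; the caller is responsible for quoting it.
    parts.append(command)
    return " ".join(parts)
```

Quoting MCA values matters because common values such as `^openib` contain shell metacharacters.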
Yifan Xiong 96fc4d09dd
Docs - Add config and docs for development experience (#155)
Add config and docs for development experience.

__Major Revision__
- Add settings and extensions config for VSCode.
- Add devcontainer config for Codespaces.
- Update document accordingly.
2021-08-16 13:49:19 +08:00
guoshzhao 7293e783f1
Benchmarks: Code Revision - change 'reduce' to 'reduce_op' (#156)
**Description**
Change the field name `reduce` to `reduce_op`.
2021-08-16 11:33:39 +08:00
Yifan Xiong 783c91258d
Docs - Add docs for Docker container and image (#154)
Add docs on:
* Docker image tag list
* Build image and run container instructions
2021-08-12 16:09:52 +08:00
guoshzhao d23ad898b0
Benchmarks: Doc Revision - Add ReduceType into benchmarks doc. (#150)
Add ReduceType description into benchmarks doc.
2021-08-09 16:52:09 +08:00
guoshzhao acf365a856
Benchmarks: Add Feature - Set reduce type for current benchmarks' metrics. (#149)
**Description**
Set reduce type for current benchmarks' metrics, including model benchmarks and ShardingMatmul.
2021-08-06 17:23:14 +08:00
guoshzhao bc1a61b91a
Benchmarks: Code Revision - Calculate average value by using statistics module. (#148)
**Description**
Replace `sum(results) / len(results)` with `statistics.mean(results)`
2021-08-06 13:37:18 +08:00
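Both expressions in #148 compute the arithmetic mean; one practical difference is the error raised on empty input, shown here with a hypothetical `average` helper.

```python
import statistics
from typing import List


def average(results: List[float]) -> float:
    """Hypothetical helper wrapping statistics.mean."""
    # statistics.mean raises statistics.StatisticsError on an empty list,
    # whereas sum(results) / len(results) would raise ZeroDivisionError.
    return statistics.mean(results)
```

For example, `average([10.0, 12.0, 14.0])` returns `12.0`, the same value as `sum(...) / len(...)`.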
guoshzhao e41b1f6225
Benchmarks: Add Feature - Add reduce function support for output summary. (#147)
**Description**
Add reduce function support for output summary.

**Major Revision**
- Add reducer class to maintain all reduce functions.
- Save reduce type of each metric into `BenchmarkResult`
- Fix UT.
2021-08-05 16:52:49 +08:00
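One possible shape for the reducer class added in #147, sketched as a decorator-based registry. The `ReduceType` members and the `Reducer` method names are assumptions for illustration, not taken from the commit.

```python
import statistics
from enum import Enum
from typing import Callable, Dict, List, Optional

ReduceFunc = Callable[[List[float]], float]


class ReduceType(Enum):
    """Assumed reduce types; the real enum may differ."""
    AVG = "avg"
    MAX = "max"
    MIN = "min"
    SUM = "sum"
    LAST = "last"


class Reducer:
    """Registry mapping each ReduceType to a function over a list of values."""

    functions: Dict[ReduceType, ReduceFunc] = {}

    @classmethod
    def add_reduce_func(cls, reduce_type: ReduceType) -> Callable[[ReduceFunc], ReduceFunc]:
        """Decorator that registers a function under reduce_type."""
        def decorator(func: ReduceFunc) -> ReduceFunc:
            cls.functions[reduce_type] = func
            return func
        return decorator

    @classmethod
    def get_reduce_func(cls, reduce_type: ReduceType) -> Optional[ReduceFunc]:
        """Return the registered function, or None if nothing is registered."""
        return cls.functions.get(reduce_type)


# Register built-in reductions by calling the decorator directly.
Reducer.add_reduce_func(ReduceType.AVG)(statistics.mean)
Reducer.add_reduce_func(ReduceType.MAX)(max)
Reducer.add_reduce_func(ReduceType.MIN)(min)
Reducer.add_reduce_func(ReduceType.SUM)(sum)


@Reducer.add_reduce_func(ReduceType.LAST)
def last(values: List[float]) -> float:
    """Keep the final value, e.g. the last rank's result."""
    return values[-1]
```

Storing only the `ReduceType` per metric (as #147 does with `BenchmarkResult`) keeps results serializable, while the registry resolves it to a function at summary time.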
Yuting Jiang 86c390a912
Benchmarks: Build Pipeline - Add rocBLAS building logic in third_party (#144)
**Description**
Add rocBLAS building logic in third_party.

**Major Revision**
- Add rocm_rocblas target in third_party/Makefile.
- Add rocblas building logic
2021-08-02 17:30:02 +08:00
TobeyQin 03d1fcacbe
Docs - Add design document (#125)
**Description**
Add Executor and Benchmarks design doc

**Major Revision**
- Add Executor design doc
- Add Benchmarks design doc
2021-08-02 15:25:25 +08:00
Yuting Jiang 157b4e2dd1
Benchmarks: Add Benchmark - Revise and add rccl microbenchmark for rocm (#143)
**Description**
Add rccl bandwidth microbenchmark for rocm.

**Major Revision**
- Register rccl-bw benchmark.
2021-07-30 15:45:32 +08:00
Yuting Jiang a532eee414
Benchmarks: Build Pipeline - add rccl-tests as a submodule with building logic (#139)
**Description**
Support rocm in third_party/makefile and add rccl-tests as a submodule with building logic.

**Major Revision**
- Support rocm in third_party/makefile
- Add rccl-tests as a submodule 
- Add build logic in third_party/Makefile for rccl-tests
2021-07-30 07:27:08 +08:00
Yifan Xiong 69b2c631fc
Release - SuperBench v0.2.1 (#142)
__Description__
Cherry-pick bug fixes from v0.2.1 to main.

__Major Revisions__
* Fix VGG models failing on A100 GPU with batch_size=128.
* Fix Ansible connection issue when running in localhost.
* Update version in packages and docs.
2021-07-29 17:52:28 +08:00
Yuting Jiang c88ce05611
Benchmarks: Build Pipeline - Support rocm in third_party/makefile (#140)
**Description**
Support rocm in third_party/makefile.

**Major Revision**
- Split ROCm and CUDA targets in the Makefile
- Add target in the Dockerfile
2021-07-29 14:18:36 +08:00
Yuting Jiang 1ee8f7dcf5
Benchmarks: Add Benchmark - Add the source code of rocm kernel launch overhead benchmark. (#136)
**Description**
Add the source code of the ROCm kernel launch overhead benchmark.

**Major Revision**
- Revise CMake build logic to support both CUDA and ROCm
2021-07-27 22:22:31 +08:00
Yuting Jiang fdc33f406c
Benchmarks: Build Pipeline - Support rocm cmake build (#137)
**Description**
Support ROCm CMake build.

**Major Revision**
- Add some envs in rocm_common.cmake to support the ROCm CMake build.
2021-07-27 15:00:52 +08:00
Yuting Jiang e083a598cf
Benchmarks: Add Benchmark - Add NCCL performance benchmark (#113)
**Description**
Add NCCL performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for NCCL
2021-07-26 10:54:47 +08:00
Yuting Jiang b0c5addcac
Benchmarks: Add Benchmark - Add IB Loopback performance benchmark. (#112)
**Description**
Add RDMA Loopback performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for RDMA Loopback
2021-07-24 03:40:24 +08:00
Ziyue Yang db297fb4ed
Benchmarks: Add Benchmark - Add disk performance benchmark (#132)
**Description**
Add disk performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for disk performance.

**Minor Revision**
- Fix bugs in executor unit test related to default enabled tests.
2021-07-23 14:49:05 +08:00
TobeyQin 702fb1eb37
Docs - Add result contributing rules (#131)
**Description**
Add result contributing rules
2021-07-21 16:11:33 +08:00
TobeyQin 4dbbe85cf1
Docs - Add release process doc (#130)
**Description**
Add release process document.
2021-07-21 06:25:00 +08:00
Ziyue Yang 477fbb0ad2
Benchmarks: Fix bug - fix bug in test_executor.py to test default enabled tests only (#133)
**Description**
Fix bug of tests/executor/test_executor.py.

**Major Revision**
- Test default enabled benchmarks only instead of all benchmarks.
2021-07-20 20:11:08 +08:00
Ziyue Yang 4bbd7f513d
Benchmarks: Build Pipeline - Add FIO benchmark tool (#127)
**Description**
Add FIO benchmark tool as a third-party dependency.

**Major Revision**
- Add FIO submodule into third-party directory and modify Makefile to enable it.
2021-07-19 12:59:28 +08:00
Yuting Jiang 419dea265a
Benchmarks: Build Pipeline - Add perftest as a submodule and add build logic (#129)
Add perftest as a submodule and add build logic
2021-07-16 16:11:07 +08:00
Yuting Jiang 8c8beb4b04
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic. (#128)
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic.
2021-07-16 14:51:56 +08:00
Yuting Jiang 9547ccc19a
Benchmarks: Fix bug - fix bug of third_party/cuda-samples git checkout issue when building docker (#126)
* fix bug in docker build of third_party/cuda-samples
2021-07-15 16:41:57 +08:00
Yuting Jiang f9550bd693
Benchmarks: Add Benchmark - Add memory bandwidth benchmark for cuda. (#114)
Add microbenchmark, example, test, and config for CUDA memory performance; add cuda-samples (tagged with the CUDA version) as a git submodule and update the related Makefile.
2021-07-13 17:30:19 +08:00
Yuting Jiang 71c1617b2e
Utils: Code Revision - Update network common utils (#118)
Update network common utils: add get_ib_devices to the network common utils and move get_free_port from the test utils to the network common utils.
2021-07-13 16:05:01 +08:00
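A common way to implement a helper like `get_free_port` is to bind a socket to port 0 and let the OS pick an unused port; this is a sketch of that technique, not necessarily the code moved in #118.

```python
import socket


def get_free_port() -> int:
    """Ask the OS for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 makes the kernel assign a free ephemeral port.
        s.bind(("", 0))
        return s.getsockname()[1]
```

The port is released when the socket closes, so another process could take it before the caller binds it; callers that need a guarantee should keep the socket open or retry on failure.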
guoshzhao 9c984c7eb0
Bug bash - Merge fix from release/0.2 to main (#124)
* Bug Fix - Fix race condition issue for multi ranks (#117)

Fix race condition issue when multiple ranks rotate the same directory.

* Update pipeline for release branch (#122)

* Bug Fix - Fix bug when converting bool config to store_true argument. (#120)

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
2021-07-09 16:54:42 +08:00
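The fix in #120 concerns a classic pitfall: an `argparse` flag declared with `action="store_true"` takes no value, so a boolean config entry must become a bare `--flag` (or nothing) rather than `--flag True`. A minimal sketch of that conversion; `config_to_argv` and the parameter names `batch_size`, `no_cuda`, and `precision` are hypothetical.

```python
import argparse
from typing import Any, Dict, List


def config_to_argv(parameters: Dict[str, Any]) -> List[str]:
    """Hypothetical: turn a config mapping into argparse-style arguments."""
    argv: List[str] = []
    for name, value in parameters.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # store_true flags take no value: emit the bare flag only when True.
            if value:
                argv.append(flag)
        elif isinstance(value, (list, tuple)):
            # nargs="+" style options: flag followed by each element.
            argv.append(flag)
            argv.extend(str(v) for v in value)
        else:
            argv.extend([flag, str(value)])
    return argv


parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--no_cuda", action="store_true")
parser.add_argument("--precision", nargs="+", default=["float32"])

args = parser.parse_args(
    config_to_argv({"batch_size": 32, "no_cuda": False, "precision": ["float32", "float16"]})
)
```

Emitting `--no_cuda False` instead would make `argparse` reject `False` as an unrecognized argument.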
guoshzhao 05d70537c4
Benchmarks: Add Configuration - Add validation config file for AMD MI100 (#121)
* add validation config file for AMD MI100
2021-07-09 10:09:59 +08:00
Yifan Xiong 7458f83a9b
Runner & Executor - Support AMD GPU (#119)
Support both NVIDIA and AMD GPU and check GPU vendor during deployment and execution.

* Add GPU environment check in sb deploy.
* Check GPU vendor in executor.
2021-07-09 00:42:49 +08:00
Yifan Xiong 43620c3f46
Docs - Update README and version for v0.2.0 release (#111)
Update README and version for v0.2 release.
2021-07-03 00:18:30 +08:00
Yifan Xiong fb7d4a7396
Runner - Fetch benchmarks results on all nodes (#116)
Fetch benchmark results from all nodes; results are synced with rsync after each benchmark.
The results directory structure on the control node is as follows:

```
outputs/
└── datetime
    ├── nodes
    │   └── node-0
    │       ├── benchmarks
    │       │   ├── benchmark-0
    │       │   │   ├── rank-0
    │       │   │   │   └── results.json
    │       └── sb-exec.log
    ├── sb-run.log
    └── sb.config.yaml
```
2021-07-02 21:45:56 +08:00
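To make the layout from #116 concrete, here is a small `pathlib` sketch that builds one rank's result path and globs all of them under a run directory. The helper names and the example datetime string are hypothetical; only the directory names come from the tree.

```python
from pathlib import Path
from typing import List


def rank_result_path(output_dir: str, run_datetime: str, node: str,
                     benchmark: str, rank: int) -> Path:
    """Hypothetical: path of one rank's results.json in the layout above."""
    return (Path(output_dir) / run_datetime / "nodes" / node
            / "benchmarks" / benchmark / f"rank-{rank}" / "results.json")


def collect_result_files(run_dir: str) -> List[Path]:
    """Hypothetical: find every rank's results.json under one run directory."""
    return sorted(Path(run_dir).glob("nodes/*/benchmarks/*/rank-*/results.json"))
```

For example, `rank_result_path("outputs", "2021-07-02_21-45-56", "node-0", "kernel-launch", 0)` points to `outputs/2021-07-02_21-45-56/nodes/node-0/benchmarks/kernel-launch/rank-0/results.json`.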
Yifan Xiong 60ba63bb11
CLI - Support host-list for deploy and run commands (#108)
Support `--host-list` for deploy and run commands.

Before this change, an inventory file was needed to use `sb deploy/run`.
Now, `--host-list localhost` or `-l localhost` is sufficient for a quick try.
2021-07-01 21:48:36 +08:00
Yifan Xiong 7b0b0e9add
CLI - Support custom output directory (#110)
* Support custom output directory.
* Update document.
2021-07-01 21:10:12 +08:00
TobeyQin 2710fad5de
Docs - Release Note and Introduction (#107)
* Add introduction and release documents.
* Fix some typos in documents.
2021-06-30 11:17:31 +08:00
guoshzhao 1e96c27e39
Benchmarks: Fix Bug - Fix typo in gemm-flops benchmark. (#109)
2021-06-30 10:26:25 +08:00
guoshzhao 8ffaddfaef
Benchmarks: Fix Bug - Fix gemm kernel bug for nvidia v100. (#105)
* Fix bug for NVIDIA V100.
* Hard-code the supported dict for different architectures.
2021-06-29 18:46:44 +08:00
guoshzhao f22bb3f219
Benchmarks: Add Configuration - Add validation config file for azure NDv4. (#103)
* add config file for ndv4.
2021-06-28 16:39:58 +08:00
guoshzhao 9c7485276b
Benchmarks: Code Revision - Replace torch.optim.AdamW with transformers.AdamW. (#106)
* replace torch.optim.AdamW with transformers.AdamW.
2021-06-28 15:24:39 +08:00
guoshzhao 05e449a3fe
Bug bash - Fix ambiguous type check in executor. (#104)
2021-06-28 11:07:18 +08:00