474 строки
13 KiB
Plaintext
474 строки
13 KiB
Plaintext
---
|
|
id: superbench-config
|
|
---
|
|
|
|
import Tabs from '@theme/Tabs';
|
|
import TabItem from '@theme/TabItem';
|
|
|
|
|
|
# SuperBench Config File
|
|
|
|
[YAML](https://yaml.org/spec/1.2/spec.html) format configuration file is an efficient method to take full advantage of SuperBench.
|
|
You can put it in any place and specify the path to config file through `-c /path/to/config.yaml` in `sb` CLI.
|
|
|
|
This document covers schema of SuperBench configuration YAML file.
|
|
You can learn YAML basics from [Learn YAML in Y minutes](https://learnxinyminutes.com/docs/yaml/).
|
|
SuperBench configuration supports most of the YAML features, including anchors and aliases, merge key, etc.
|
|
|
|
## Conventions
|
|
|
|
Here lists syntax conventions used in this document:
|
|
|
|
* The schema and example are in YAML format.
|
|
|
|
* In YAML mappings which use a colon `:` to mark `key: value` pair.
|
|
|
|
The left side of colon is a literal keyword defined in configuration,
|
|
if it is surrounded by `${}`, like `${name}`, then the key is a string that can be defined by user.
|
|
|
|
The right side of colon is a data type, which may be Python built-in types (like `string`, `dict`),
|
|
or a rich structure defined in this document (first character capitalized).
|
|
|
|
* The notation `[ datatype ]` indicates a YAML sequence of the mentioned data type.
|
|
For example, `[ string ]` is a list of strings.
|
|
|
|
* The notation `|` indicates there are multiple optional data types.
|
|
For example, `string | [ string ]` means either a string or a list of strings is allowed.
|
|
|
|
## Configuration Schema
|
|
|
|
The configuration file describes all benchmarks running by SuperBench.
|
|
There will be one or more benchmarks, each benchmark has its own settings and parameters.
|
|
One benchmark may have one or more modes, which indicates how to run benchmarks in all given machines.
|
|
|
|
Here is an overview of SuperBench configuration structure:
|
|
|
|
<Tabs
|
|
defaultValue='schema'
|
|
values={[
|
|
{label: 'Schema', value: 'schema'},
|
|
{label: 'Example', value: 'example'},
|
|
]
|
|
}>
|
|
<TabItem value='schema'>
|
|
|
|
```yaml
|
|
version: string
|
|
superbench:
|
|
enable: string | [ string ]
|
|
monitor:
|
|
enable: bool
|
|
sample_duration: int
|
|
sample_interval: int
|
|
var:
|
|
${var_name}: dict
|
|
benchmarks:
|
|
${benchmark_name}: Benchmark
|
|
```
|
|
|
|
</TabItem>
|
|
<TabItem value='example'>
|
|
|
|
```yaml
|
|
version: v0.10
|
|
superbench:
|
|
enable: benchmark_1
|
|
monitor:
|
|
enable: false
|
|
sample_duration: 10
|
|
sample_interval: 1
|
|
var:
|
|
var_1: value
|
|
benchmarks:
|
|
benchmark_1:
|
|
enable: true
|
|
modes:
|
|
- name: local
|
|
```
|
|
|
|
</TabItem>
|
|
</Tabs>
|
|
|
|
### `version`
|
|
|
|
Version of the configuration file.
|
|
Lower version `sb` CLI may not understand higher version config.
|
|
|
|
### `superbench`
|
|
|
|
SuperBench configuration for all benchmarks.
|
|
|
|
### `superbench.enable`
|
|
|
|
Enable which benchmark to run, could be one or multiple benchmarks' name.
|
|
If not specified, will use [`${benchmark_name}.enable`](#enable) in each benchmark as default.
|
|
|
|
* value from: benchmark names defined in `superbench.benchmarks`
|
|
* default value: `null`
|
|
|
|
### `superbench.monitor`
|
|
|
|
Enable monitor to collect system metrics periodically, currently only support CUDA platform. There are three settings:
|
|
|
|
#### `enable`
|
|
|
|
Whether enable the monitor module or not.
|
|
|
|
#### `sample_duration`
|
|
|
|
Calculate the average metrics during sample_duration seconds, such as CPU usage and NIC bandwidth.
|
|
|
|
#### `sample_interval`
|
|
|
|
Do sampling every sample_interval seconds.
|
|
|
|
### `superbench.var`
|
|
|
|
User-defined variables to be used in the configuration.
|
|
Leveraging YAML [anchors and aliases](https://yaml.org/spec/1.2/spec.html#id2765878),
|
|
common settings can be defined here to avoid config duplication.
|
|
|
|
Here is a usage example:
|
|
|
|
```yaml {3-6,11,15}
|
|
superbench:
|
|
var:
|
|
common_param: ¶m
|
|
num_warmup: 16
|
|
num_steps: 128
|
|
batch_size: 128
|
|
benchmarks:
|
|
model-benchmarks:foo:
|
|
models:
|
|
- resnet50
|
|
parameters: *param
|
|
model-benchmarks:bar:
|
|
models:
|
|
- vgg19
|
|
parameters: *param
|
|
```
|
|
|
|
The above configuration equals to the following:
|
|
```yaml {6-9,13-16}
|
|
superbench:
|
|
benchmarks:
|
|
model-benchmarks:foo:
|
|
models:
|
|
- resnet50
|
|
parameters:
|
|
num_warmup: 16
|
|
num_steps: 128
|
|
batch_size: 128
|
|
model-benchmarks:bar:
|
|
models:
|
|
- vgg19
|
|
parameters:
|
|
num_warmup: 16
|
|
num_steps: 128
|
|
batch_size: 128
|
|
```
|
|
|
|
### `superbench.benchmarks`
|
|
|
|
Mappings of `${benchmark_name}: Benchmark`.
|
|
|
|
There are three types of benchmarks,
|
|
[micro-benchmark](./user-tutorial/benchmarks/micro-benchmarks.md),
|
|
[model-benchmark](./user-tutorial/benchmarks/model-benchmarks.md),
|
|
and [docker-benchmark](./user-tutorial/benchmarks/docker-benchmarks.md).
|
|
Each benchmark has its own unique name listed in docs.
|
|
|
|
`${benchmark_name}` can be one of the followings:
|
|
* `${benchmark_unique_name}`, it can be the exact same as benchmark's own unique name;
|
|
* `${benchmark_unique_name}:${annotation}`, or if there's a need to run one benchmark with different settings,
|
|
an annotation separated by `:` can be appended after benchmark's unique name.
|
|
|
|
See [`Benchmark` Schema](#benchmark-schema) for benchmark definition.
|
|
|
|
## `Benchmark` Schema
|
|
|
|
Definition for each benchmark, here is an overview of `Benchmark` configuration structure:
|
|
|
|
<Tabs
|
|
defaultValue='schema'
|
|
values={[
|
|
{label: 'Schema', value: 'schema'},
|
|
{label: 'Example', value: 'example'},
|
|
]
|
|
}>
|
|
<TabItem value='schema'>
|
|
|
|
#### Micro-Benchmark
|
|
|
|
```yaml
|
|
${benchmark_name}:
|
|
enable: bool
|
|
timeout: int
|
|
modes: [ Mode ]
|
|
frameworks: [ enum ]
|
|
parameters:
|
|
run_count: int
|
|
duration: int
|
|
log_raw_data: bool
|
|
${argument}: bool | str | int | float | list
|
|
```
|
|
|
|
#### Model-Benchmark
|
|
|
|
```yaml
|
|
model-benchmarks:${annotation}:
|
|
enable: bool
|
|
timeout: int
|
|
modes: [ Mode ]
|
|
frameworks: [ enum ]
|
|
models: [ enum ]
|
|
parameters:
|
|
run_count: int
|
|
duration: int
|
|
log_raw_data: bool
|
|
num_warmup: int
|
|
num_steps: int
|
|
sample_count: int
|
|
batch_size: int
|
|
precision: [ enum ]
|
|
model_action: [ enum ]
|
|
pin_memory: bool
|
|
${argument}: bool | str | int | float | list
|
|
```
|
|
|
|
</TabItem>
|
|
<TabItem value='example'>
|
|
|
|
#### Micro-Benchmark
|
|
|
|
```yaml
|
|
kernel-launch:
|
|
enable: true
|
|
timeout: 120
|
|
modes:
|
|
- name: local
|
|
proc_num: 8
|
|
prefix: CUDA_VISIBLE_DEVICES={proc_rank}
|
|
parallel: yes
|
|
parameters:
|
|
num_warmup: 100
|
|
num_steps: 2000000
|
|
interval: 2000
|
|
```
|
|
|
|
#### Model-Benchmark
|
|
|
|
```yaml
|
|
model-benchmarks:resnet:
|
|
enable: true
|
|
timeout: 1800
|
|
modes:
|
|
- name: torch.distributed
|
|
proc_num: 8
|
|
node_num: 1
|
|
frameworks:
|
|
- pytorch
|
|
models:
|
|
- resnet50
|
|
- resnet101
|
|
- resnet152
|
|
parameters:
|
|
duration: 0
|
|
num_warmup: 16
|
|
num_steps: 128
|
|
batch_size: 128
|
|
precision:
|
|
- float32
|
|
- float16
|
|
model_action:
|
|
- train
|
|
```
|
|
|
|
</TabItem>
|
|
</Tabs>
|
|
|
|
### `enable`
|
|
|
|
Enable current benchmark or not, can be overwritten by [`superbench.enable`](#superbenchenable).
|
|
|
|
* default value: `true`
|
|
|
|
### `timeout`
|
|
|
|
Set the timeout value in seconds, the benchmarking will stop early if timeout is triggered.
|
|
|
|
* default value: none
|
|
|
|
### `modes`
|
|
|
|
A list of modes in which the benchmark runs.
|
|
Currently only one mode is supported for each benchmark.
|
|
|
|
See [`Mode` Schema](#mode-schema) for mode definition.
|
|
|
|
### `frameworks`
|
|
|
|
A list of frameworks in which the benchmark runs.
|
|
Some benchmarks can support multiple frameworks while others only support one.
|
|
|
|
* accepted values: `[ onnxruntime | pytorch | tf1 | tf2 | none ]`
|
|
* default value: `[ none ]`
|
|
|
|
### `models`
|
|
|
|
A list of models to run, only supported in model-benchmark.
|
|
|
|
* accepted values:
|
|
```yaml
|
|
# pytorch framework
|
|
[ alexnet | densenet121 | densenet169 | densenet201 | densenet161 | googlenet | inception_v3 |
|
|
mnasnet0_5 | mnasnet0_75 | mnasnet1_0 | mnasnet1_3 | mobilenet_v2 |
|
|
resnet18 | resnet34 | resnet50 | resnet101 | resnet152 |
|
|
resnext50_32x4d | resnext101_32x8d | wide_resnet50_2 | wide_resnet101_2 |
|
|
shufflenet_v2_x0_5 | shufflenet_v2_x1_0 | shufflenet_v2_x1_5 | shufflenet_v2_x2_0 |
|
|
squeezenet1_0 | squeezenet1_1 |
|
|
vgg11 | vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn | vgg19_bn | vgg19 |
|
|
bert-base | bert-large | gpt2-small | gpt2-medium | gpt2-large | gpt2-xl ]
|
|
```
|
|
* default value: `[ ]`
|
|
|
|
### `parameters`
|
|
|
|
Parameters for benchmark to use, varying for different benchmarks.
|
|
|
|
There are four common parameters for all benchmarks:
|
|
* run_count: how many times does user want to run this benchmark, default value is 1.
|
|
* duration: the elapsed time of benchmark in seconds. It can work for all model-benchmark. But for micro-benchmark, benchmark authors should consume it by themselves.
|
|
* log_raw_data: log raw data into file instead of saving it into result object, default value is `False`. Benchmarks who have large raw output may want to set it as `True`, such as `nccl-bw`/`rccl-bw`.
|
|
* log_flushing: real-time log flushing, default value is `False`.
|
|
|
|
For Model-Benchmark, there are some parameters that can control the elapsed time.
|
|
* duration: the elapsed time of benchmark in seconds.
|
|
* num_warmup: the number of warmup steps, should be positive integer.
|
|
* num_steps: the number of test steps.
|
|
|
|
If `duration > 0` and `num_steps > 0`, then benchmark will take the least as the elapsed time. Otherwise only one of them will take effect.
|
|
|
|
## `Mode` Schema
|
|
|
|
Definition for each benchmark mode, here is an overview of `Mode` configuration structure:
|
|
|
|
<Tabs
|
|
defaultValue='schema'
|
|
values={[
|
|
{label: 'Schema', value: 'schema'},
|
|
{label: 'Example', value: 'example'},
|
|
]
|
|
}>
|
|
<TabItem value='schema'>
|
|
|
|
```yaml
|
|
name: enum
|
|
proc_num: int
|
|
node_num: int
|
|
env: dict
|
|
mca: dict
|
|
prefix: str
|
|
parallel: bool
|
|
```
|
|
|
|
</TabItem>
|
|
<TabItem value='example'>
|
|
|
|
```yaml
|
|
name: local
|
|
proc_num: 8
|
|
prefix: CUDA_VISIBLE_DEVICES={proc_rank}
|
|
parallel: yes
|
|
```
|
|
|
|
</TabItem>
|
|
</Tabs>
|
|
|
|
### `name`
|
|
|
|
Mode name to use. Here lists available modes:
|
|
+ `local`: run benchmark as local process.
|
|
+ `torch.distributed`: launch benchmark through [PyTorch DDP](https://pytorch.org/docs/stable/distributed.html#launch-utility), each process will run on one GPU.
|
|
+ `mpi`: launch benchmark through MPI, the benchmark implementation could leverage MPI interface.
|
|
|
|
Some attributes may only be suitable for specific mode.
|
|
|
|
| | `local` | `torch.distributed` | `mpi` |
|
|
| ---------: | :-----: | :-----------------: | :---: |
|
|
| `proc_num` | ✓ | ✓ | ✓ |
|
|
| `node_num` | ✘ | ✓ | ✓ |
|
|
| `prefix` | ✓ | ✘ | ✘ |
|
|
| `env` | ✓ | ✓ | ✓ |
|
|
| `mca` | ✘ | ✘ | ✓ |
|
|
| `parallel` | ✓ | ✘ | ✘ |
|
|
| `pattern` | ✘ | ✘ | ✓ |
|
|
|
|
* accepted values: `local | torch.distributed | mpi`
|
|
* default value: `local`
|
|
|
|
### `proc_num`
|
|
|
|
Process number to run per node.
|
|
Each process will run an individual benchmark, how processes communicate depends on the mode.
|
|
|
|
* default value: `1`
|
|
|
|
### `node_num`
|
|
|
|
Node number to run in the mode. Defaults to all nodes provided by host file in the run.
|
|
Will be ignored in `local` mode.
|
|
|
|
For example, assuming you are running model benchmark on 4 nodes,
|
|
`proc_num: 8, node_num: 1` will run 8-GPU distributed training on each node,
|
|
while `proc_num: 8, node_num: null` will run 32-GPU distributed training on all nodes.
|
|
|
|
* default value: `null`
|
|
|
|
### `prefix`
|
|
|
|
Command prefix to use in the mode, in Python formatted string.
|
|
|
|
Available variables in formatted string include:
|
|
+ `proc_rank`
|
|
+ `proc_num`
|
|
|
|
So `prefix: CUDA_VISIBLE_DEVICES={proc_rank}` will be expressed as `CUDA_VISIBLE_DEVICES=0`, `CUDA_VISIBLE_DEVICES=1`, etc.
|
|
|
|
### `env`
|
|
|
|
Environment variables to use in the mode, in a flatten key-value dictionary.
|
|
The value needs to be string, any integer or boolean values need to be enclosed in quotes.
|
|
|
|
Formatted string is also supported in value, available variables include:
|
|
+ `proc_rank`
|
|
+ `proc_num`
|
|
|
|
### `mca`
|
|
|
|
MCA (Modular Component Architecture) frameworks, components, or modules to use in MPI,
|
|
in a flatten key-value dictionary.
|
|
Only available for `mpi` mode.
|
|
|
|
### `parallel`
|
|
|
|
Whether run benchmarks in parallel (all ranks at the same time) or in sequence (one rank at a time).
|
|
Only available for `local` mode.
|
|
|
|
* default value: `yes`
|
|
|
|
### `pattern`
|
|
|
|
Pattern variables to run benchmarks with nodes in specified traffic pattern combination, in a flatten key-value dictionary.
|
|
Only available for `mpi` mode.
|
|
|
|
Available variables in formatted string includes:
|
|
+ `type(str)`: the traffic pattern type, required.
|
|
* accepted values: `all-nodes`, `pair-wise`, `k-batch`, `topo-aware`
|
|
+ `mpi_pattern(bool)`: generate pattern config file in `./output/mpi_pattern.txt` for diagnosis, required.
|
|
+ `batch(int)`: the scale of batch, required in `k-batch` pattern.
|
|
+ `ibstat(str)`: the path of ibstat output, wil be auto-generated in `./output/ibstat_file.txt` if not specified, optional in `topo-aware` pattern
|
|
+ `ibnetdiscover(str)`: the path of ibnetdiscover output `ibnetdiscover_file.txt`, required in `topo-aware` pattern.
|
|
+ `min_dist(int)`: minimum distance of VM pair, required in `topo-aware` pattern.
|
|
+ `max_dist(int)`: maximum distance of VM pair, required in `topo-aware` pattern.
|