Docs - Add usage for result summary (#337)
**Description** Add usage for result summary.
Parent: f15da60b2b
Commit: 56c9a711a8
@@ -0,0 +1,124 @@
---
id: result-summary
---

# Result Summary

## Introduction

This tool generates a readable summary report from the raw benchmark results of one or more machines.

## Usage

1. [Install SuperBench](../getting-started/installation) on the local machine.

2. Prepare the raw data and rule file on the local machine.

3. Generate the result summary automatically using the `sb result summary` command. See [SuperBench CLI](../cli) for the full command reference.

    ```bash
    sb result summary --data-file ./results-summary.jsonl --rule-file ./rule.yaml --output-file-format md --output-dir ${output_dir}
    ```

4. Find the output file named `results_summary.md` under `${output_dir}`.

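When the summary has to be produced from a script rather than typed by hand, the command from step 3 can be assembled programmatically. The following is a minimal sketch, not part of SuperBench: the helper name `build_summary_command` and the `./outputs` directory are assumptions made for illustration, and only the flags shown in the command above are used.

```python
import shlex


def build_summary_command(data_file, rule_file, output_dir, output_format="md"):
    """Return the argument list for `sb result summary` (hypothetical helper)."""
    return [
        "sb", "result", "summary",
        "--data-file", str(data_file),
        "--rule-file", str(rule_file),
        "--output-file-format", output_format,
        "--output-dir", str(output_dir),
    ]


# './outputs' is a placeholder for ${output_dir}.
cmd = build_summary_command("./results-summary.jsonl", "./rule.yaml", "./outputs")
print(shlex.join(cmd))

# To actually run it (requires SuperBench to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.
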
## Input

The input consists of two files:

- **Raw Data**: a JSON Lines (`.jsonl`) file containing the results of multiple nodes, generated automatically by the SuperBench runner.

:::tip Tips
The raw data file can be found at `${output_dir}/results-summary.jsonl` after each successful run.
:::

- **Rule File**: a YAML file that defines how to generate the result summary, including how to classify the metrics and which statistical methods (P50, mean, etc.) to apply.

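To make the raw data format concrete, here is a minimal sketch of reading a JSON Lines file, where each non-empty line is one JSON object. The field names in the sample (`node` and the metric key) are assumptions for illustration only; the real keys are whatever the SuperBench runner writes.

```python
import io
import json


def read_jsonl(stream):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical two-node sample; real files come from the SuperBench runner.
sample = io.StringIO(
    '{"node": "node0", "kernel-launch/event_overhead:0": 0.0060}\n'
    '{"node": "node1", "kernel-launch/event_overhead:0": 0.0062}\n'
)
records = read_jsonl(sample)
print(len(records), sorted(records[0]))
```

For a file on disk, pass `open(path, encoding="utf-8")` instead of the in-memory sample.
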
### Rule File

This section describes how to write rules in the **rule file**.

The convention is the same as for the [SuperBench Config File](../superbench-config); please read it first.

Here is an overview of the rule file structure:

```yaml title="Scheme"
version: string
superbench:
  rules:
    ${rule_name}:
      statistics:
        - ${statistic_name}
      categories: string
      aggregate: (optional)[bool|string]
      metrics:
        - ${benchmark_name}/regex
        - ${benchmark_name}/regex
```

```yaml title="Example"
# SuperBench rules
version: v0.4
superbench:
  rules:
    kernel_launch:
      statistics:
        - mean
        - p90
        - min
        - max
      aggregate: True
      categories: KernelLaunch
      metrics:
        - kernel-launch/event_overhead
        - kernel-launch/wall_overhead
    nccl:
      statistics: mean
      categories: NCCL
      metrics:
        - nccl-bw/allreduce_8388608_busbw
    ib-loopback:
      statistics: mean
      categories: RDMA
      metrics:
        - ib-loopback/IB_write_8388608_Avg_\d+
      aggregate: ib-loopback/IB_write_.*_Avg_(\d+)
```

This rule file describes the rules used for the result summary.

They are organized by rule name, and each rule consists of the following elements:

#### `metrics`

The list of metrics for this rule. Each metric is written as `${benchmark_name}/regex`. A regular expression may be used after the first `/`, but note that the benchmark name itself cannot be a regex.

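The split between a literal benchmark name and a regex remainder can be sketched as follows. This is an illustration under stated assumptions, not SuperBench's implementation: the function name `select_metrics` is hypothetical, and anchoring the regex with `re.fullmatch` is an assumption about how strictly the pattern is applied.

```python
import re


def select_metrics(rule_metric, available):
    """Return the names in `available` matched by one `${benchmark_name}/regex` entry."""
    benchmark, _, pattern = rule_metric.partition("/")
    selected = []
    for name in available:
        name_benchmark, _, rest = name.partition("/")
        # The benchmark name is compared literally; only the remainder is a regex.
        # Assumption: the regex must match the whole remainder.
        if name_benchmark == benchmark and re.fullmatch(pattern, rest):
            selected.append(name)
    return selected


available = [
    "ib-loopback/IB_write_8388608_Avg_0",
    "ib-loopback/IB_write_8388608_Avg_1",
    "ib-loopback/IB_write_4096_Avg_0",
    "nccl-bw/allreduce_8388608_busbw",
]
matched = select_metrics(r"ib-loopback/IB_write_8388608_Avg_\d+", available)
print(matched)
```

Splitting at the first `/` with `str.partition` keeps any later `/` characters inside the regex part.
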
#### `categories`

A user-defined category name (a string) for the rule, used to classify and organize the metrics.

#### `aggregate`

Determines whether to aggregate the benchmark results from multiple devices and treat them as one collection.
For example, the kernel-launch overhead results from 8 GPU devices can be aggregated into one collection.

The value should be a bool or a regex pattern string:

- bool:
  - `False` (default): no aggregation.
  - `True`: aggregate the results of multiple ranks. Specifically, metric names in `metrics` of the form `metric:\d+` are aggregated under `metric`; this applies to most microbenchmark metrics.
- pattern string with regex: aggregate the results using the pattern, which is matched against the metric names in `metrics`. Specifically, the part of the metric name that matches the contents of `()` in the pattern is replaced with `*`, while the rest of the name is unchanged.

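The renaming described for the two kinds of `aggregate` value can be sketched as below. This is an illustrative reading of the description above, not SuperBench's code: the function name `aggregate_name` is hypothetical, and the sketch assumes the capture groups in the pattern are not nested.

```python
import re


def aggregate_name(metric, aggregate):
    """Return the collection name a metric falls into under an `aggregate` setting."""
    if aggregate is False:
        return metric
    if aggregate is True:
        # Drop a trailing rank suffix such as ':0' or ':7'.
        return re.sub(r":\d+$", "", metric)
    match = re.search(aggregate, metric)
    if match is None:
        return metric
    # Replace each captured group with '*', right to left so offsets stay valid.
    # Assumption: groups are not nested.
    result = metric
    for index in range(match.re.groups, 0, -1):
        start, end = match.span(index)
        if start != -1:
            result = result[:start] + "*" + result[end:]
    return result


print(aggregate_name("kernel-launch/event_overhead:3", True))
print(aggregate_name(
    "ib-loopback/IB_write_8388608_Avg_5",
    r"ib-loopback/IB_write_.*_Avg_(\d+)",
))
```

With the pattern from the example rule file, all `ib-loopback/IB_write_8388608_Avg_<n>` metrics end up under the same `*` name, so their values can be treated as one collection.
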
#### `statistics`

A list of statistical functions applied by this rule to summarize the results from multiple nodes/ranks.

The following statistical functions are available:

- `count`
- `max`
- `mean`
- `min`
- `p${value}`: the percentile, where `${value}` is an integer from 1 to 99, for example `p50` or `p90`.
- `std`
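
As a rough illustration of what these names compute, the sketch below evaluates each statistic on a small list using only the Python standard library. It is not SuperBench's implementation: the helper name `apply_statistic` is hypothetical, and both the sample (rather than population) standard deviation and the linear-interpolation percentile method are assumptions.

```python
import statistics


def apply_statistic(name, values):
    """Evaluate one statistic name from the list above on a list of numbers."""
    simple = {
        "count": len,
        "max": max,
        "mean": statistics.mean,
        "min": min,
        "std": statistics.stdev,  # assumption: sample standard deviation
    }
    if name in simple:
        return simple[name](values)
    if name.startswith("p") and name[1:].isdigit() and 1 <= int(name[1:]) <= 99:
        # quantiles(n=100) returns the 99 cut points p1..p99.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        return cuts[int(name[1:]) - 1]
    raise ValueError(f"unknown statistic: {name}")


values = [1.0, 2.0, 3.0, 4.0, 5.0]
for name in ("count", "max", "mean", "min", "p50", "std"):
    print(name, apply_statistic(name, values))
```

Note that `statistics.stdev` and `statistics.quantiles` need at least two data points, so a single-node result would have to be handled separately.
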
@@ -32,6 +32,7 @@ module.exports = {
    },
    'user-tutorial/system-config',
    'user-tutorial/data-diagnosis',
    'user-tutorial/result-summary',
    'user-tutorial/monitor',
    'user-tutorial/container-images',
  ],