diff --git a/docs/user-tutorial/result-summary.md b/docs/user-tutorial/result-summary.md
new file mode 100644
index 00000000..83d84cd9
--- /dev/null
+++ b/docs/user-tutorial/result-summary.md
@@ -0,0 +1,124 @@
+---
+id: result-summary
+---
+
+# Result Summary
+
+## Introduction
+
+This tool generates a readable summary report from the raw benchmark results of one or more machines.
+
+## Usage
+
+1. [Install SuperBench](../getting-started/installation) on the local machine.
+
+2. Prepare the raw data and rule file on the local machine.
+
+3. Generate the result summary automatically using the `sb result summary` command. The full command reference can be found in [SuperBench CLI](../cli).
+
+    ```bash
+    sb result summary --data-file ./results-summary.jsonl --rule-file ./rule.yaml --output-file-format md --output-dir ${output_dir}
+    ```
+
+4. Find the output file named 'results_summary.md' under `${output_dir}`.
+
+## Input
+
+The input includes 2 files:
+
+
+
+- **Raw Data**: A JSONL file containing the results of multiple nodes, generated automatically by the SuperBench runner.
+
+:::tip Tips
+The raw data file can be found at `${output_dir}/results-summary.jsonl` after each successful run.
+:::
+
+- **Rule File**: A YAML file that defines how the result summary is generated, including how metrics are classified and which statistical methods (P50, mean, etc.) are applied.
+
+### Rule File
+
+This section describes how to write rules in the **rule file**.
+
+The conventions are the same as in the [SuperBench Config File](../superbench-config); please read it first.
+
+Here is an overview of the rule file structure:
+
+```yaml title="Scheme"
+version: string
+superbench:
+  rules:
+    ${rule_name}:
+      statistics:
+        - ${statistic_name}
+      categories: string
+      aggregate: (optional)[bool|string]
+      metrics:
+        - ${benchmark_name}/regex
+        - ${benchmark_name}/regex
+```
+
+```yaml title="Example"
+# SuperBench rules
+version: v0.4
+superbench:
+  rules:
+    kernel_launch:
+      statistics:
+        - mean
+        - p90
+        - min
+        - max
+      aggregate: True
+      categories: KernelLaunch
+      metrics:
+        - kernel-launch/event_overhead
+        - kernel-launch/wall_overhead
+    nccl:
+      statistics: mean
+      categories: NCCL
+      metrics:
+        - nccl-bw/allreduce_8388608_busbw
+    ib-loopback:
+      statistics: mean
+      categories: RDMA
+      metrics:
+        - ib-loopback/IB_write_8388608_Avg_\d+
+      aggregate: ib-loopback/IB_write_.*_Avg_(\d+)
+```
+
+This rule file describes the rules used for the result summary.
+
+Rules are organized by rule name, and each rule consists mainly of the following elements:
+
+#### `metrics`
+
+The list of metrics for this rule. Each metric has the format `${benchmark_name}/regex`. A regex may be used after the first '/', but note that the benchmark name itself cannot be a regex.
+
+#### `categories`
+
+A user-defined category name (string) for the rule, used to classify and organize the metrics.
+
+#### `aggregate`
+
+Determines whether benchmark results from multiple devices are aggregated and treated as one collection.
+For example, the kernel-launch overhead results from 8 GPU devices can be aggregated into one collection.
+
+The value of this item should be a bool or a regex pattern string:
+
+- bool:
+  - `False` (default): no aggregation.
+  - `True`: aggregate the results of multiple ranks. Specifically, metric names in `metrics` of the form 'metric:\\d+' are aggregated and renamed to 'metric', which covers most microbenchmark metrics.
+- pattern string with regex: aggregate the results using the pattern string, which is matched against the metric names in `metrics`. Specifically, the part of the metric name matched by the `()` group in the pattern is replaced with `*`, and the rest of the metric name remains unchanged.
+
+#### `statistics`
+
+A list of statistical functions that this rule applies to compute statistics over the results from multiple nodes/ranks.
+
+The following statistical functions are supported:
+- `count`
+- `max`
+- `mean`
+- `min`
+- `p${value}`: `${value}` can be 1-99. For example, p50, p90, etc.
+- `std`
diff --git a/website/sidebars.js b/website/sidebars.js
index 3501d450..391b150a 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -32,6 +32,7 @@ module.exports = {
       },
       'user-tutorial/system-config',
       'user-tutorial/data-diagnosis',
+      'user-tutorial/result-summary',
       'user-tutorial/monitor',
       'user-tutorial/container-images',
     ],
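
As a rough illustration of the `aggregate` semantics documented in result-summary.md above, the following Python sketch shows one way metric names could be normalized before statistics are computed. The helper `aggregate_name` is hypothetical and is not SuperBench's implementation. It assumes the pattern is matched against the full metric name with `re.fullmatch`, that `True` strips a trailing `:<rank>` suffix, and that capture groups in the pattern are not nested.

```python
import re


def aggregate_name(metric, aggregate):
    """Hypothetical helper: map a raw metric name to its aggregated name."""
    if aggregate is True:
        # Assumed behavior for `True`: drop a trailing ':<rank>' suffix,
        # e.g. 'metric:3' -> 'metric'.
        return re.sub(r':\d+$', '', metric)
    if isinstance(aggregate, str):
        match = re.fullmatch(aggregate, metric)
        if match is None:
            # The pattern does not apply to this metric; leave it unchanged.
            return metric
        name = metric
        # Replace each captured span with '*', working right to left so the
        # offsets of earlier groups stay valid (assumes non-nested groups).
        for index in range(match.re.groups, 0, -1):
            start, end = match.span(index)
            if start != -1:
                name = name[:start] + '*' + name[end:]
        return name
    # `False` (the default): no aggregation.
    return metric


print(aggregate_name('kernel-launch/event_overhead:3', True))
# kernel-launch/event_overhead
print(aggregate_name('ib-loopback/IB_write_8388608_Avg_2',
                     r'ib-loopback/IB_write_.*_Avg_(\d+)'))
# ib-loopback/IB_write_8388608_Avg_*
```

Under these assumptions, results that end up with the same aggregated name would be pooled into one collection, and the functions listed under `statistics` would then be computed over that pool.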