Docs - Update SuperBench documents (#101)

Update SuperBench documents.
Yifan Xiong 2021-06-25 13:40:14 +08:00 committed by GitHub
Parent c0c43b8f81
Commit 832e392f91
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
13 changed files: 780 additions and 298 deletions

View file

@@ -11,6 +11,9 @@ insert_final_newline = true
[*.py]
max_line_length = 120
[*.{js,jsx,ts,tsx,md,mdx,css}]
indent_size = 2
[*.{yml,yaml}]
indent_size = 2

README.md (301 changed lines)
View file

@@ -1,4 +1,4 @@
-# SuperBenchmark
+# SuperBench
[![MIT licensed](https://img.shields.io/badge/license-MIT-brightgreen.svg)](LICENSE)
[![Lint](https://github.com/microsoft/superbenchmark/workflows/Lint/badge.svg)](https://github.com/microsoft/superbenchmark/actions?query=workflow%3ALint)
@@ -9,304 +9,9 @@
| cpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/microsoft.superbenchmark?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main) |
| gpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
-__SuperBench__ is a validation and profiling tool for AI infrastructure.
+**SuperBench** is a validation and profiling tool for AI infrastructure, which supports:
* AI infrastructure validation and diagnosis
  * Distributed validation tools to validate hundreds or thousands of servers automatically
  * Consider both raw hardware and E2E model performance with ML workload patterns
  * Build a contract to identify hardware issues
  * Provide infrastructure-oriented criteria as Performance/Quality Gates for hardware and system release
  * Provide detailed performance reports and advanced analysis tools
* AI workload benchmarking and profiling
  * Provide comprehensive performance comparison between different existing hardware
  * Provide insights for hardware and software co-design
It includes micro-benchmarks for primitive computation and communication benchmarking,
and model-benchmarks to measure domain-aware end-to-end deep learning workloads.
> 🔴 __Note__:
SuperBench is in the early pre-alpha stage for open source, and not ready for the general public yet.
If you want to jump in early, you can try building the latest code yourself.
## SuperBench capabilities, workflow and benchmarking metrics
The following graphic shows the capabilities provided by the SuperBench core framework and its extensions.
<img src="imgs/superbench_structure.png">
Benchmarking metrics provided by SuperBench are listed below.
<table>
<tbody>
<tr align="center" valign="bottom">
<td>
</td>
<td>
<b>Micro Benchmark</b>
<img src="imgs/bar.png"/>
</td>
<td>
<b>Model Benchmark</b>
<img src="imgs/bar.png"/>
</td>
</tr>
<tr valign="top">
<td align="center" valign="middle">
<b>Metrics</b>
</td>
<td>
<ul><li><b>Computation Benchmark</b></li>
<ul><li><b>Kernel Performance</b></li>
<ul>
<li>GFLOPS</li>
<li>TensorCore</li>
<li>cuBLAS</li>
<li>cuDNN</li>
</ul>
</ul>
<ul><li><b>Kernel Launch Time</b></li>
<ul>
<li>Kernel_Launch_Event_Time</li>
<li>Kernel_Launch_Wall_Time</li>
</ul>
</ul>
<ul><li><b>Operator Performance</b></li>
<ul><li>MatMul</li><li>Sharding_MatMul</li></ul>
</ul>
<ul><li><b>Memory</b></li>
<ul><li>H2D_Mem_BW_&lt;GPU ID&gt;</li>
<li>D2H_Mem_BW_&lt;GPU ID&gt;</li></ul>
</ul>
</ul>
<ul><li><b>Communication Benchmark</b></li>
<ul><li><b>Device P2P Bandwidth</b></li>
<ul><li>P2P_BW_Max</li><li>P2P_BW_Min</li><li>P2P_BW_Avg</li></ul>
</ul>
<ul><li><b>RDMA</b></li>
<ul><li>RDMA_Peak</li><li>RDMA_Avg</li></ul>
</ul>
<ul><li><b>NCCL</b></li>
<ul><li>NCCL_AllReduce</li></ul>
<ul><li>NCCL_AllGather</li></ul>
<ul><li>NCCL_broadcast</li></ul>
<ul><li>NCCL_reduce</li></ul>
<ul><li>NCCL_reduce_scatter</li></ul>
</ul>
</ul>
<ul><li><b>Computation-Communication Benchmark</b></li>
<ul><li><b>Mul_During_NCCL</b></li><li><b>MatMul_During_NCCL</b></li></ul>
</ul>
<ul><li><b>Storage Benchmark</b></li>
<ul><li><b>Disk</b></li>
<ul>
<li>Read/Write</li><li>Rand_Read/Rand_Write</li>
<li>R/W_Read</li><li>R/W_Write</li><li>Rand_R/W_Read</li><li>Rand_R/W_Write</li>
</ul>
</ul>
</ul>
</td>
<td>
<ul><li><b>CNN models</b></li>
<ul>
<li><b>ResNet</b></li>
<ul><li>ResNet-50</li><li>ResNet-101</li><li>ResNet-152</li></ul>
</ul>
<ul>
<li><b>DenseNet</b></li>
<ul><li>DenseNet-169</li><li>DenseNet-201</li></ul>
</ul>
<ul>
<li><b>VGG</b></li>
<ul><li>VGG-11</li><li>VGG-13</li><li>VGG-16</li><li>VGG-19</li></ul>
</ul>
<ul><li><b>Other CNN models</b></li><ul><li>...</li></ul></ul>
</ul>
<ul><li><b>BERT models</b></li>
<ul><li><b>BERT</b></li><li><b>BERT_LARGE</b></li></ul>
</ul>
<ul><li><b>LSTM</b></li></ul>
<ul><li><b>GPT-2</b></li></ul>
</td>
</tr>
</tbody>
</table>
## Installation
### Using Docker (_Preferred_)
__System Requirements__
* Platform: Ubuntu 18.04 or later (64-bit)
* Docker: Docker CE 19.03 or later
__Install SuperBench__
* Using Pre-Built Images
```sh
docker pull superbench/superbench:dev-cuda11.1.1
docker run -it --rm \
--privileged --net=host --ipc=host --gpus=all \
superbench/superbench:dev-cuda11.1.1 bash
```
* Building the Image
```sh
docker build -f dockerfile/cuda11.1.1.dockerfile -t superbench/superbench:dev .
```
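Then you can start a container from the locally built image in the same way as the pre-built one (the `superbench/superbench:dev` tag matches the build command above):
```sh
docker run -it --rm \
    --privileged --net=host --ipc=host --gpus=all \
    superbench/superbench:dev bash
```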
### Using Python
__System Requirements__
* Platform: Ubuntu 18.04 or later (64-bit); Windows 10 (64-bit) with WSL2
* Python: Python 3.6 or later, pip 18.0 or later
Check whether the Python environment is already configured:
```sh
# check Python version
python3 --version
# check pip version
python3 -m pip --version
```
If not, install the following:
* [Python](https://www.python.org/)
* [pip](https://pip.pypa.io/en/stable/installing/)
* [venv](https://docs.python.org/3/library/venv.html)
It's recommended to use a virtual environment (optional):
```sh
# create a new virtual environment
python3 -m venv --system-site-packages ./venv
# activate the virtual environment
source ./venv/bin/activate
# exit the virtual environment later
# after you finish running superbench
deactivate
```
__Install SuperBench__
* PyPI Binary
```sh
# not available yet
```
* From Source
```sh
# get source code
git clone https://github.com/microsoft/superbenchmark
cd superbenchmark
# install superbench
python3 -m pip install .
make postinstall
```
## Usage
### Run SuperBench
```sh
# run benchmarks in default settings
sb exec
# use a custom config
sb exec --config-file ./superbench/config/default.yaml
```
### Benchmark Gallery
Please find more benchmark examples [here](examples/benchmarks/).
## Developer Guide
If you want to develop a new feature, please follow the steps below to set up the development environment.
### Check Environment
Follow __[System Requirements](#using-python)__.
### Set Up
```sh
# get latest code
git clone https://github.com/microsoft/superbenchmark
cd superbenchmark
# install superbench
python3 -m pip install -e .[dev,test]
```
### Lint and Test
```sh
# format code using yapf
python3 setup.py format
# check code style with mypy and flake8
python3 setup.py lint
# run all unit tests
python3 setup.py test
```
### Submit a Pull Request
Please install `pre-commit` before `git commit` to run all pre-checks.
```sh
pre-commit install
```
Open a pull request to the main branch on GitHub.
## Contributing
### Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
### Contributing principles
SuperBenchmark is an open-source project. Your participation and contribution are highly appreciated. There are several important things you need to know before contributing to this project:
#### What content can be added to SuperBenchmark
1. Bug fixes for existing features.
2. New features for the benchmark module (micro-benchmark, model-benchmark, etc.).
If you would like to contribute a new feature to SuperBenchmark, please submit your proposal first. In [GitHub Issues](https://github.com/microsoft/superbenchmark/issues), choose `Enhancement Request` to submit it. If the proposal is accepted, you can submit pull requests to the origin main branch.
#### Contribution steps
If you would like to contribute to the project, please follow the steps below for joint development on GitHub.
1. `Fork` the repo first to your personal GitHub account.
2. Check out from the main branch for feature development.
3. When you finish the feature, please fetch the latest code from the origin repo, merge it into your branch, and resolve any conflicts.
4. Submit a pull request to the origin main branch.
5. Please note that reviewers may leave comments or questions; please help address them by updating the pull request.
_Check the SuperBench website for more details._
## Trademarks


View file

@@ -0,0 +1,113 @@
---
id: micro-benchmarks
---
# Micro Benchmarks
## Benchmarking list
### Computation benchmark
### Communication benchmark
### Computation-communication benchmark
### Storage benchmark
## Benchmarking metrics
<table>
<tbody>
<tr align="center" valign="bottom">
<td>
</td>
<td>
<b>Micro Benchmark</b>
<img src={require('../assets/bar.png').default}/>
</td>
<td>
<b>Model Benchmark</b>
<img src={require('../assets/bar.png').default}/>
</td>
</tr>
<tr valign="top">
<td align="center" valign="middle">
<b>Metrics</b>
</td>
<td>
<ul><li><b>Computation Benchmark</b></li>
<ul><li><b>Kernel Performance</b></li>
<ul>
<li>GFLOPS</li>
<li>TensorCore</li>
<li>cuBLAS</li>
<li>cuDNN</li>
</ul>
</ul>
<ul><li><b>Kernel Launch Time</b></li>
<ul>
<li>Kernel_Launch_Event_Time</li>
<li>Kernel_Launch_Wall_Time</li>
</ul>
</ul>
<ul><li><b>Operator Performance</b></li>
<ul><li>MatMul</li><li>Sharding_MatMul</li></ul>
</ul>
<ul><li><b>Memory</b></li>
<ul><li>H2D_Mem_BW_&lt;GPU ID&gt;</li>
<li>D2H_Mem_BW_&lt;GPU ID&gt;</li></ul>
</ul>
</ul>
<ul><li><b>Communication Benchmark</b></li>
<ul><li><b>Device P2P Bandwidth</b></li>
<ul><li>P2P_BW_Max</li><li>P2P_BW_Min</li><li>P2P_BW_Avg</li></ul>
</ul>
<ul><li><b>RDMA</b></li>
<ul><li>RDMA_Peak</li><li>RDMA_Avg</li></ul>
</ul>
<ul><li><b>NCCL</b></li>
<ul><li>NCCL_AllReduce</li></ul>
<ul><li>NCCL_AllGather</li></ul>
<ul><li>NCCL_broadcast</li></ul>
<ul><li>NCCL_reduce</li></ul>
<ul><li>NCCL_reduce_scatter</li></ul>
</ul>
</ul>
<ul><li><b>Computation-Communication Benchmark</b></li>
<ul><li><b>Mul_During_NCCL</b></li><li><b>MatMul_During_NCCL</b></li></ul>
</ul>
<ul><li><b>Storage Benchmark</b></li>
<ul><li><b>Disk</b></li>
<ul>
<li>Read/Write</li><li>Rand_Read/Rand_Write</li>
<li>R/W_Read</li><li>R/W_Write</li><li>Rand_R/W_Read</li><li>Rand_R/W_Write</li>
</ul>
</ul>
</ul>
</td>
<td>
<ul><li><b>CNN models</b></li>
<ul>
<li><b>ResNet</b></li>
<ul><li>ResNet-50</li><li>ResNet-101</li><li>ResNet-152</li></ul>
</ul>
<ul>
<li><b>DenseNet</b></li>
<ul><li>DenseNet-169</li><li>DenseNet-201</li></ul>
</ul>
<ul>
<li><b>VGG</b></li>
<ul><li>VGG-11</li><li>VGG-13</li><li>VGG-16</li><li>VGG-19</li></ul>
</ul>
<ul><li><b>Other CNN models</b></li><ul><li>...</li></ul></ul>
</ul>
<ul><li><b>BERT models</b></li>
<ul><li><b>BERT</b></li><li><b>BERT_LARGE</b></li></ul>
</ul>
<ul><li><b>LSTM</b></li></ul>
<ul><li><b>GPT-2</b></li></ul>
</td>
</tr>
</tbody>
</table>

View file

@@ -0,0 +1,113 @@
---
id: model-benchmarks
---
# Model Benchmarks
## Benchmarking list
### GPT-2 models
### BERT models
### LSTM models
### CNN models
## Benchmarking metrics
<table>
<tbody>
<tr align="center" valign="bottom">
<td>
</td>
<td>
<b>Micro Benchmark</b>
<img src={require('../assets/bar.png').default}/>
</td>
<td>
<b>Model Benchmark</b>
<img src={require('../assets/bar.png').default}/>
</td>
</tr>
<tr valign="top">
<td align="center" valign="middle">
<b>Metrics</b>
</td>
<td>
<ul><li><b>Computation Benchmark</b></li>
<ul><li><b>Kernel Performance</b></li>
<ul>
<li>GFLOPS</li>
<li>TensorCore</li>
<li>cuBLAS</li>
<li>cuDNN</li>
</ul>
</ul>
<ul><li><b>Kernel Launch Time</b></li>
<ul>
<li>Kernel_Launch_Event_Time</li>
<li>Kernel_Launch_Wall_Time</li>
</ul>
</ul>
<ul><li><b>Operator Performance</b></li>
<ul><li>MatMul</li><li>Sharding_MatMul</li></ul>
</ul>
<ul><li><b>Memory</b></li>
<ul><li>H2D_Mem_BW_&lt;GPU ID&gt;</li>
<li>D2H_Mem_BW_&lt;GPU ID&gt;</li></ul>
</ul>
</ul>
<ul><li><b>Communication Benchmark</b></li>
<ul><li><b>Device P2P Bandwidth</b></li>
<ul><li>P2P_BW_Max</li><li>P2P_BW_Min</li><li>P2P_BW_Avg</li></ul>
</ul>
<ul><li><b>RDMA</b></li>
<ul><li>RDMA_Peak</li><li>RDMA_Avg</li></ul>
</ul>
<ul><li><b>NCCL</b></li>
<ul><li>NCCL_AllReduce</li></ul>
<ul><li>NCCL_AllGather</li></ul>
<ul><li>NCCL_broadcast</li></ul>
<ul><li>NCCL_reduce</li></ul>
<ul><li>NCCL_reduce_scatter</li></ul>
</ul>
</ul>
<ul><li><b>Computation-Communication Benchmark</b></li>
<ul><li><b>Mul_During_NCCL</b></li><li><b>MatMul_During_NCCL</b></li></ul>
</ul>
<ul><li><b>Storage Benchmark</b></li>
<ul><li><b>Disk</b></li>
<ul>
<li>Read/Write</li><li>Rand_Read/Rand_Write</li>
<li>R/W_Read</li><li>R/W_Write</li><li>Rand_R/W_Read</li><li>Rand_R/W_Write</li>
</ul>
</ul>
</ul>
</td>
<td>
<ul><li><b>CNN models</b></li>
<ul>
<li><b>ResNet</b></li>
<ul><li>ResNet-50</li><li>ResNet-101</li><li>ResNet-152</li></ul>
</ul>
<ul>
<li><b>DenseNet</b></li>
<ul><li>DenseNet-169</li><li>DenseNet-201</li></ul>
</ul>
<ul>
<li><b>VGG</b></li>
<ul><li>VGG-11</li><li>VGG-13</li><li>VGG-16</li><li>VGG-19</li></ul>
</ul>
<ul><li><b>Other CNN models</b></li><ul><li>...</li></ul></ul>
</ul>
<ul><li><b>BERT models</b></li>
<ul><li><b>BERT</b></li><li><b>BERT_LARGE</b></li></ul>
</ul>
<ul><li><b>LSTM</b></li></ul>
<ul><li><b>GPT-2</b></li></ul>
</td>
</tr>
</tbody>
</table>

docs/cli.md (new file, 158 lines)
View file

@@ -0,0 +1,158 @@
---
id: cli
---
# CLI
SuperBench provides a command line interface to help you use, deploy and run benchmarks.
```
$ sb
_____ ____ _
/ ____| | _ \ | |
| (___ _ _ _ __ ___ _ __| |_) | ___ _ __ ___| |__
\___ \| | | | '_ \ / _ \ '__| _ < / _ \ '_ \ / __| '_ \
____) | |_| | |_) | __/ | | |_) | __/ | | | (__| | | |
|_____/ \__,_| .__/ \___|_| |____/ \___|_| |_|\___|_| |_|
| |
|_|
Welcome to the SB CLI!
```
## SuperBench CLI commands
The following lists the usage and examples of `sb` commands:
### `sb deploy`
Deploy the SuperBench environments to all managed nodes.
```bash title="SB CLI"
sb deploy [--docker-image]
[--docker-password]
[--docker-username]
[--host-file]
[--host-list]
[--host-password]
[--host-username]
[--private-key]
```
#### Optional arguments
| Name | Default | Description |
| --- | --- | --- |
| `--docker-image` `-i` | `superbench/superbench` | Docker image URI. |
| `--docker-password` | `None` | Docker registry password if authentication is needed. |
| `--docker-username` | `None` | Docker registry username if authentication is needed. |
| `--host-file` `-f` | `None` | Path to Ansible inventory host file. |
| `--host-list` `-l` | `None` | Comma-separated host list. |
| `--host-password` | `None` | Host password or key passphrase if needed. |
| `--host-username` | `None` | Host username if needed. |
| `--private-key` | `None` | Path to private key if needed. |
#### Global arguments
| Name | Default | Description |
| --- | --- | --- |
| `--help` `-h` | N/A | Show help message. |
#### Examples
Deploy image `superbench/cuda:11.1` to all nodes in `./host.yaml`:
```bash title="SB CLI"
sb deploy --docker-image superbench/cuda:11.1 --host-file ./host.yaml
```
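If you don't maintain an inventory file, you can instead pass hosts on the command line and authenticate with a password; this is a sketch using only the flags documented above, with placeholder host addresses:
```bash title="SB CLI"
sb deploy --host-list 10.0.0.100,10.0.0.101 --host-username username --host-password [password]
```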
### `sb exec`
Execute the SuperBench benchmarks locally.
```bash title="SB CLI"
sb exec [--config-file]
[--config-override]
```
#### Optional arguments
| Name | Default | Description |
| --- | --- | --- |
| `--config-file` `-c` | `None` | Path to SuperBench config file. |
| `--config-override` `-C` | `None` | Extra arguments to override config_file. |
#### Global arguments
| Name | Default | Description |
| --- | --- | --- |
| `--help` `-h` | N/A | Show help message. |
#### Examples
Execute the GPT-2 model benchmark with the default configuration:
```bash title="SB CLI"
sb exec --config-override superbench.enable="['gpt2_models']"
```
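You can also combine a config file with inline overrides; for instance, the sketch below runs only the `kernel-launch` benchmark defined in the default config, using the flags documented above:
```bash title="SB CLI"
sb exec --config-file ./superbench/config/default.yaml --config-override superbench.enable="['kernel-launch']"
```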
### `sb run`
Run the SuperBench benchmarks in distributed mode on all managed nodes.
```bash title="SB CLI"
sb run [--config-file]
[--config-override]
[--docker-image]
[--docker-password]
[--docker-username]
[--host-file]
[--host-list]
[--host-password]
[--host-username]
[--private-key]
```
#### Optional arguments
| Name | Default | Description |
| --- | --- | --- |
| `--config-file` `-c` | `None` | Path to SuperBench config file. |
| `--config-override` `-C` | `None` | Extra arguments to override config_file. |
| `--docker-image` `-i` | `superbench/superbench` | Docker image URI. |
| `--docker-password` | `None` | Docker registry password if authentication is needed. |
| `--docker-username` | `None` | Docker registry username if authentication is needed. |
| `--host-file` `-f` | `None` | Path to Ansible inventory host file. |
| `--host-list` `-l` | `None` | Comma-separated host list. |
| `--host-password` | `None` | Host password or key passphrase if needed. |
| `--host-username` | `None` | Host username if needed. |
| `--private-key` | `None` | Path to private key if needed. |
#### Global arguments
| Name | Default | Description |
| --- | --- | --- |
| `--help` `-h` | N/A | Show help message. |
#### Examples
Run all benchmarks on all managed nodes in `./host.yaml` using image `superbench/cuda:11.1`
and the default benchmarking configuration:
```bash title="SB CLI"
sb run --docker-image superbench/cuda:11.1 --host-file ./host.yaml
```
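To run only a subset of benchmarks on the same nodes, add inline overrides; in this sketch, `gpt2_models` is one of the benchmarks defined in the default config:
```bash title="SB CLI"
sb run --docker-image superbench/cuda:11.1 --host-file ./host.yaml --config-override superbench.enable="['gpt2_models']"
```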
### `sb version`
Print the current SuperBench CLI version.
```bash title="SB CLI"
sb version
```
#### Global arguments
| Name | Default | Description |
| --- | --- | --- |
| `--help` `-h` | N/A | Show help message. |
#### Examples
Print version:
```bash title="SB CLI"
sb version
```

View file

@@ -0,0 +1,40 @@
---
id: contributing
---
# Contributing
## Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## Contributing principles
SuperBenchmark is an open-source project. Your participation and contribution are highly appreciated. There are several important things you need to know before contributing to this project:
### What content can be added to SuperBenchmark
1. Bug fixes for existing features.
2. New features for the benchmark module (micro-benchmark, model-benchmark, etc.).
If you would like to contribute a new feature to SuperBenchmark, please submit your proposal first. In [GitHub Issues](https://github.com/microsoft/superbenchmark/issues), choose `Enhancement Request` to submit it. If the proposal is accepted, you can submit pull requests to the origin main branch.
### Contribution steps
If you would like to contribute to the project, please follow the steps below for joint development on GitHub.
1. `Fork` the repo first to your personal GitHub account.
2. Check out from the main branch for feature development.
3. When you finish the feature, please fetch the latest code from the origin repo, merge it into your branch, and resolve any conflicts.
4. Submit a pull request to the origin main branch.
5. Please note that reviewers may leave comments or questions; please help address them by updating the pull request.

View file

@@ -0,0 +1,47 @@
---
id: development
---
# Development
If you want to develop a new feature, please follow the steps below to set up the development environment.
## Check Environment
Follow [System Requirements](../getting-started/installation.md).
## Set Up
```bash
git clone https://github.com/microsoft/superbenchmark
cd superbenchmark
python3 -m pip install -e .[dev,test]
```
## Lint and Test
Format code using yapf.
```bash
python3 setup.py format
```
Check code style with mypy and flake8.
```bash
python3 setup.py lint
```
Run unit tests.
```bash
python3 setup.py test
```
## Submit a Pull Request
Please install `pre-commit` before `git commit` to run all pre-checks.
```bash
pre-commit install
```
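You can also run all checks manually against the whole repo at any time; this is standard `pre-commit` usage, not specific to SuperBench:
```bash
pre-commit run --all-files
```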
Open a pull request to the main branch on GitHub.

View file

@@ -0,0 +1,154 @@
---
id: configuration
---
# Configuration
## SuperBench config
SuperBench uses a [YAML](https://yaml.org/spec/1.2/spec.html) config file to configure the details of benchmarking,
including which benchmarks to run, which distribution mode to choose, which parameters to use, etc.
Here's what the default config file looks like.
```yaml title="superbench/config/default.yaml"
# SuperBench Config
superbench:
enable: null
var:
default_local_mode: &default_local_mode
enable: true
modes:
- name: local
proc_num: 8
prefix: CUDA_VISIBLE_DEVICES={proc_rank}
parallel: yes
default_pytorch_mode: &default_pytorch_mode
enable: true
modes:
- name: torch.distributed
proc_num: 8
node_num: 1
frameworks:
- pytorch
common_model_config: &common_model_config
duration: 0
num_warmup: 16
num_steps: 128
precision:
- float32
- float16
model_action:
- train
benchmarks:
kernel-launch:
<<: *default_local_mode
gemm-flops:
<<: *default_local_mode
cudnn-function:
<<: *default_local_mode
cublas-function:
<<: *default_local_mode
matmul:
<<: *default_local_mode
frameworks:
- pytorch
sharding-matmul:
<<: *default_pytorch_mode
computation-communication-overlap:
<<: *default_pytorch_mode
gpt_models:
<<: *default_pytorch_mode
models:
- gpt2-small
- gpt2-large
parameters:
<<: *common_model_config
batch_size: 4
bert_models:
<<: *default_pytorch_mode
models:
- bert-base
- bert-large
parameters:
<<: *common_model_config
batch_size: 8
lstm_models:
<<: *default_pytorch_mode
models:
- lstm
parameters:
<<: *common_model_config
batch_size: 128
cnn_models:
<<: *default_pytorch_mode
models:
- resnet50
- resnet101
- resnet152
- densenet169
- densenet201
- vgg11
- vgg13
- vgg16
- vgg19
parameters:
<<: *common_model_config
batch_size: 128
```
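The config relies on standard YAML anchors (`&name`), aliases (`*name`), and merge keys (`<<:`) to avoid repetition. As a minimal illustration (not part of the real config), the `kernel-launch` entry in the two documents below ends up identical:
```yaml
# with an anchor and a merge key
var:
  base: &base
    enable: true
benchmarks:
  kernel-launch:
    <<: *base
---
# expanded form after merging
benchmarks:
  kernel-launch:
    enable: true
```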
By default, all benchmarks in the default configuration will run if you don't specify a customized configuration.
If you want a quick try, you can modify this config a little bit. For example, to run only ResNet models:
1. Copy the default config to a file named `resnet.yaml` in the current path.
```bash
cp superbench/config/default.yaml resnet.yaml
```
2. Enable only `cnn_models` in the config and remove all models except ResNet under `benchmarks.cnn_models.models`.
```yaml {3,10-13} title="resnet.yaml"
# SuperBench Config
superbench:
enable: ['cnn_models']
var:
# ...
# omit the middle part
# ...
cnn_models:
<<: *default_pytorch_mode
models:
- resnet50
- resnet101
- resnet152
parameters:
<<: *common_model_config
batch_size: 128
```
## Ansible Inventory
SuperBench leverages [Ansible](https://docs.ansible.com/ansible/latest/) to run benchmarking workloads on managed nodes.
You need to provide an [inventory](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html) file
to configure the host list for managed nodes.
Here are some basic examples as a starting point.
* One managed node, which is the same node as the control node.
```ini title="local.ini"
[all]
localhost ansible_connection=local
```
* Two managed nodes: one is the control node and the other is accessed remotely.
```ini title="mix.ini"
[all]
localhost ansible_connection=local
10.0.0.100 ansible_user=username ansible_ssh_private_key_file=id_rsa
```
* Eight managed nodes, all accessed remotely.
```ini title="remote.ini"
[all]
10.0.0.[100:103]
10.0.0.[200:203]
[all:vars]
ansible_user=username
ansible_ssh_private_key_file=id_rsa
```
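Before running SuperBench, you can sanity-check that the control node reaches every managed node with Ansible's built-in `ping` module (plain Ansible usage, assuming the `ansible` CLI is available on the control node):
```bash
ansible -i remote.ini all -m ping
```
Every host should report `SUCCESS`.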

View file

@@ -0,0 +1,83 @@
---
id: installation
---
# Installation
SuperBench is used to run validations for AI infrastructure,
so you need to prepare one __control node__, which is used to run SuperBench commands,
and one or more __managed nodes__, which are going to be validated.
Usually the __control node__ can be a CPU node, while __managed nodes__ are GPU nodes with high-speed interconnects.
:::tip Tips
It is fine if you have only one GPU node and want to try SuperBench on it.
The control node and managed node can co-locate on the same machine.
:::
## Control node
Here are the system requirements for the control node.
### Requirements
* A recent version of Linux; you're highly encouraged to use Ubuntu 18.04 or later.
* [Python](https://www.python.org/) version 3.6 or later (which can be checked by running `python3 --version`).
* [Pip](https://pip.pypa.io/en/stable/installing/) version 18.0 or later (which can be checked by running `python3 -m pip --version`).
:::note
Windows is not supported due to the lack of Ansible support, but you can still use WSL2.
:::
Besides, the control node should be able to access all managed nodes through SSH.
If you are going to use a password instead of a private key for SSH, you also need to install `sshpass`.
```bash
sudo apt-get install sshpass
```
It is also recommended to use [venv](https://docs.python.org/3/library/venv.html) for virtual environments,
but it is not strictly necessary.
```bash
# create a new virtual environment
python3 -m venv --system-site-packages ./venv
# activate the virtual environment
source ./venv/bin/activate
# exit the virtual environment later
# after you finish running superbench
deactivate
```
### Build
You can clone the source from GitHub and build it.
```bash
git clone https://github.com/microsoft/superbenchmark
cd superbenchmark
python3 -m pip install .
make postinstall
```
After installation, you should be able to run the SB CLI.
```bash
sb
```
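You can also print the installed version as a quick check (see the `sb version` command in the CLI docs):
```bash
sb version
```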
## Managed nodes
Here are the system requirements for all managed GPU nodes.
### Requirements
* A recent version of Linux; you're highly encouraged to use Ubuntu 18.04 or later.
* Compatible GPU drivers should be installed correctly.
* For NVIDIA GPUs, driver version can be checked by running `nvidia-smi`.
* [Docker CE](https://docs.docker.com/engine/install/) version 19.03 or later (which can be checked by running `docker --version`).
* GPU support in Docker.
* For NVIDIA GPUs, install
[nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit).
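After setting up the toolkit, you can verify that containers see the GPUs; the CUDA image tag below is only an example, any CUDA base image works:
```bash
# driver visible on the host
nvidia-smi
# driver visible inside a container
docker run --rm --gpus all nvidia/cuda:11.1.1-base-ubuntu18.04 nvidia-smi
```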

View file

@@ -0,0 +1,33 @@
---
id: run-superbench
---
# Run SuperBench
Having prepared the benchmark configuration and inventory files,
you can start to run SuperBench over all managed nodes.
## Deploy
Leveraging the `sb deploy` command, we can easily deploy the SuperBench environment to all managed nodes.
After running the following command, SuperBench will automatically access all nodes, pull the container image, and prepare the container.
```bash
sb deploy -f local.ini
```
Alternatively, to run on remote nodes, use the corresponding inventory file instead.
If you are using a password for SSH and cannot specify a private key in the inventory,
or your private key requires a passphrase before use, you can run
```bash
sb deploy -f remote.ini --host-password [password]
```
## Run
After deployment, you can start to run the SuperBench benchmarks on all managed nodes using the `sb run` command.
```bash
sb run -f local.ini -c resnet.yaml
```

docs/introduction.md (new file, 33 lines)
View file

@@ -0,0 +1,33 @@
---
id: introduction
---
# Introduction
## Features
__SuperBench__ is a validation and profiling tool for AI infrastructure, which supports:
* AI infrastructure validation and diagnosis
  * Distributed validation tools to validate hundreds or thousands of servers automatically
  * Consider both raw hardware and E2E model performance with ML workload patterns
  * Build a contract to identify hardware issues
  * Provide infrastructure-oriented criteria as Performance/Quality Gates for hardware and system release
  * Provide detailed performance reports and advanced analysis tools
* AI workload benchmarking and profiling
  * Provide comprehensive performance comparison between different existing hardware
  * Provide insights for hardware and software co-design
It includes micro-benchmarks for primitive computation and communication benchmarking,
and model-benchmarks to measure domain-aware end-to-end deep learning workloads.
:::note
SuperBench is in the early pre-alpha stage for open source, and not ready for the general public yet.
If you want to jump in early, you can try building the latest code yourself.
:::
## Overview
The following figure shows the capabilities provided by the SuperBench core framework and its extensions.
![SuperBench Structure](./assets/superbench_structure.png)