Release - SuperBench v0.4.0 (#278)
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
Parent
682ed06aee
Commit
ff563b66af
|
@ -8,14 +8,14 @@
|
||||||
[![License](https://img.shields.io/github/license/microsoft/superbenchmark.svg)](LICENSE)
|
[![License](https://img.shields.io/github/license/microsoft/superbenchmark.svg)](LICENSE)
|
||||||
|
|
||||||
| Azure Pipelines | Build Status |
|
| Azure Pipelines | Build Status |
|
||||||
| :---: | :---: |
|
|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
| cpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cpu-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main) |
|
| cpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cpu-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=77&branchName=main) |
|
||||||
| cuda-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
|
| cuda-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
|
||||||
| ansible-integration-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/ansible-integration-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=82&branchName=main) |
|
| ansible-integration-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/ansible-integration-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=82&branchName=main) |
|
||||||
|
|
||||||
__SuperBench__ is a validation and profiling tool for AI infrastructure.
|
__SuperBench__ is a validation and profiling tool for AI infrastructure.
|
||||||
|
|
||||||
📢 [v0.3.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.3.0) has been released!
|
📢 [v0.4.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.4.0) has been released!
|
||||||
|
|
||||||
## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
|
## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
|
||||||
|
|
||||||
|
|
|
@ -63,18 +63,6 @@ RUN mkdir -p /root/.ssh && \
|
||||||
echo -e "* soft nofile 1048576\n* hard nofile 1048576" >> /etc/security/limits.conf && \
|
echo -e "* soft nofile 1048576\n* hard nofile 1048576" >> /etc/security/limits.conf && \
|
||||||
echo -e "root soft nofile 1048576\nroot hard nofile 1048576" >> /etc/security/limits.conf
|
echo -e "root soft nofile 1048576\nroot hard nofile 1048576" >> /etc/security/limits.conf
|
||||||
|
|
||||||
# Install OpenMPI
|
|
||||||
ENV OPENMPI_VERSION=4.0.5
|
|
||||||
RUN cd /tmp && \
|
|
||||||
wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
|
|
||||||
tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
|
|
||||||
cd openmpi-${OPENMPI_VERSION} && \
|
|
||||||
./configure --enable-orterun-prefix-by-default && \
|
|
||||||
make -j $(nproc) all && \
|
|
||||||
make install && \
|
|
||||||
ldconfig && \
|
|
||||||
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
|
|
||||||
|
|
||||||
# Install OFED
|
# Install OFED
|
||||||
ENV OFED_VERSION=5.2-2.2.3.0
|
ENV OFED_VERSION=5.2-2.2.3.0
|
||||||
RUN cd /tmp && \
|
RUN cd /tmp && \
|
||||||
|
@ -83,6 +71,18 @@ RUN cd /tmp && \
|
||||||
PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
|
PATH=/usr/bin:${PATH} MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64/mlnxofedinstall --user-space-only --without-fw-update --force --all && \
|
||||||
rm -rf MLNX_OFED_LINUX-${OFED_VERSION}*
|
rm -rf MLNX_OFED_LINUX-${OFED_VERSION}*
|
||||||
|
|
||||||
|
# Install OpenMPI
|
||||||
|
ENV OPENMPI_VERSION=4.0.5
|
||||||
|
RUN cd /tmp && \
|
||||||
|
wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
|
||||||
|
tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
|
||||||
|
cd openmpi-${OPENMPI_VERSION} && \
|
||||||
|
./configure --enable-orterun-prefix-by-default --with-ucx=/usr --enable-mca-no-build=btl-uct && \
|
||||||
|
make -j $(nproc) all && \
|
||||||
|
make install && \
|
||||||
|
ldconfig && \
|
||||||
|
rm -rf /tmp/openmpi-${OPENMPI_VERSION}*
|
||||||
|
|
||||||
# Install HPC-X
|
# Install HPC-X
|
||||||
RUN cd /opt && \
|
RUN cd /opt && \
|
||||||
wget -q https://azhpcstor.blob.core.windows.net/azhpc-images-store/hpcx-v2.8.3-gcc-MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tbz && \
|
wget -q https://azhpcstor.blob.core.windows.net/azhpc-images-store/hpcx-v2.8.3-gcc-MLNX_OFED_LINUX-${OFED_VERSION}-ubuntu18.04-x86_64.tbz && \
|
||||||
|
|
|
@ -69,7 +69,7 @@ RUN cd /tmp && \
|
||||||
wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
|
wget -q https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz && \
|
||||||
tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
|
tar xzf openmpi-${OPENMPI_VERSION}.tar.gz && \
|
||||||
cd openmpi-${OPENMPI_VERSION} && \
|
cd openmpi-${OPENMPI_VERSION} && \
|
||||||
./configure --enable-orterun-prefix-by-default && \
|
./configure --enable-orterun-prefix-by-default --with-ucx=/opt/ucx --enable-mca-no-build=btl-uct && \
|
||||||
make -j $(nproc) all && \
|
make -j $(nproc) all && \
|
||||||
make install && \
|
make install && \
|
||||||
ldconfig && \
|
ldconfig && \
|
||||||
|
|
|
@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
|
||||||
:::note Note
|
:::note Note
|
||||||
You should checkout corresponding tag to use release version, for example,
|
You should checkout corresponding tag to use release version, for example,
|
||||||
|
|
||||||
`git clone -b v0.3.0 https://github.com/microsoft/superbenchmark`
|
`git clone -b v0.4.0 https://github.com/microsoft/superbenchmark`
|
||||||
:::
|
:::
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
|
@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
|
||||||
:::note Note
|
:::note Note
|
||||||
You should deploy corresponding Docker image to use release version, for example,
|
You should deploy corresponding Docker image to use release version, for example,
|
||||||
|
|
||||||
`sb deploy -f local.ini -i superbench/superbench:v0.3.0-cuda11.1.1`
|
`sb deploy -f local.ini -i superbench/superbench:v0.4.0-cuda11.1.1`
|
||||||
|
|
||||||
You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone.
|
You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone.
|
||||||
|
|
||||||
|
|
|
@ -70,7 +70,7 @@ superbench:
|
||||||
<TabItem value='example'>
|
<TabItem value='example'>
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
enable: benchmark_1
|
enable: benchmark_1
|
||||||
monitor:
|
monitor:
|
||||||
|
|
|
@ -60,11 +60,40 @@ Large scale matmul operation using `torch.matmul` with one GPU.
|
||||||
|
|
||||||
### `cublas-function`
|
### `cublas-function`
|
||||||
|
|
||||||
TODO
|
#### Introduction
|
||||||
|
|
||||||
|
Measure the performance of most common Nvidia cuBLAS functions with parameters in models training including ResNet, VGG, DenseNet, LSTM, BERT, and GPT-2.
|
||||||
|
|
||||||
|
The supported functions for cuBLAS are as follows:
|
||||||
|
- cublasSgemm
|
||||||
|
- cublasSgemmStridedBatched
|
||||||
|
- cublasGemmStridedBatchedEx
|
||||||
|
- cublasGemmEx
|
||||||
|
- cublasCgemm3mStridedBatched
|
||||||
|
- cublasCgemm
|
||||||
|
|
||||||
|
#### Metrics
|
||||||
|
|
||||||
|
| Name | Unit | Description |
|
||||||
|
|----------------------------------------------------------|-----------|-------------------------------------------------------------------|
|
||||||
|
| cublas-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cublas function with the parameters. |
|
||||||
|
|
||||||
### `cudnn-function`
|
### `cudnn-function`
|
||||||
|
|
||||||
TODO
|
#### Introduction
|
||||||
|
|
||||||
|
Measure the performance of most common Nvidia cuDNN functions with parameters in models training including ResNet, VGG, DenseNet, LSTM, BERT, and GPT-2.
|
||||||
|
|
||||||
|
The supported functions for cuDNN are as follows:
|
||||||
|
- cudnnConvolutionBackwardFilter
|
||||||
|
- cudnnConvolutionBackwardData
|
||||||
|
- cudnnConvolutionForward
|
||||||
|
|
||||||
|
#### Metrics
|
||||||
|
|
||||||
|
| Name | Unit | Description |
|
||||||
|
|---------------------------------------------------------|-----------|------------------------------------------------------------------|
|
||||||
|
| cudnn-function/name_${function_name}_${parameters}_time | time (us) | The mean time to execute the cudnn function with the parameters. |
|
||||||
|
|
||||||
### `tensorrt-inference`
|
### `tensorrt-inference`
|
||||||
|
|
||||||
|
|
|
@ -29,6 +29,7 @@ available tags are listed below for all stable versions.
|
||||||
|
|
||||||
| Tag | Description |
|
| Tag | Description |
|
||||||
| ----------------- | ---------------------------------- |
|
| ----------------- | ---------------------------------- |
|
||||||
|
| v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 |
|
||||||
| v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
|
| v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
|
||||||
| v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
|
| v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
|
||||||
| v0.2.0-cuda11.1.1 | SuperBench v0.2.0 with CUDA 11.1.1 |
|
| v0.2.0-cuda11.1.1 | SuperBench v0.2.0 with CUDA 11.1.1 |
|
||||||
|
@ -38,6 +39,8 @@ available tags are listed below for all stable versions.
|
||||||
|
|
||||||
| Tag | Description |
|
| Tag | Description |
|
||||||
| --------------------------- | ---------------------------------------------- |
|
| --------------------------- | ---------------------------------------------- |
|
||||||
|
| v0.4.0-rocm4.2-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.2, PyTorch 1.7.0 |
|
||||||
|
| v0.4.0-rocm4.0-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.0, PyTorch 1.7.0 |
|
||||||
| v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 |
|
| v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 |
|
||||||
| v0.3.0-rocm4.0-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.0, PyTorch 1.7.0 |
|
| v0.3.0-rocm4.0-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.0, PyTorch 1.7.0 |
|
||||||
|
|
||||||
|
|
|
@ -64,7 +64,7 @@ superbench:
|
||||||
example:
|
example:
|
||||||
```yaml
|
```yaml
|
||||||
# SuperBench rules
|
# SuperBench rules
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
rules:
|
rules:
|
||||||
failure-rule:
|
failure-rule:
|
||||||
|
|
2
setup.py
|
@ -165,7 +165,7 @@ setup(
|
||||||
'pytest>=6.2.2',
|
'pytest>=6.2.2',
|
||||||
'types-pyyaml',
|
'types-pyyaml',
|
||||||
'vcrpy>=4.1.1',
|
'vcrpy>=4.1.1',
|
||||||
'yapf>=0.30.0',
|
'yapf==0.31.0',
|
||||||
],
|
],
|
||||||
'nvidia': ['py3nvml>=0.2.6'],
|
'nvidia': ['py3nvml>=0.2.6'],
|
||||||
'ort': [
|
'ort': [
|
||||||
|
|
|
@ -6,5 +6,5 @@
|
||||||
Provide hardware and software benchmarks for AI systems.
|
Provide hardware and software benchmarks for AI systems.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
__version__ = '0.3.0'
|
__version__ = '0.4.0'
|
||||||
__author__ = 'Microsoft'
|
__author__ = 'Microsoft'
|
||||||
|
|
|
@ -5,12 +5,13 @@
|
||||||
|
|
||||||
import re
|
import re
|
||||||
from typing import Callable
|
from typing import Callable
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
|
|
||||||
from superbench.common.utils import logger
|
from superbench.common.utils import logger
|
||||||
from superbench.analyzer.diagnosis_rule_op import RuleOp, DiagnosisRuleType
|
from superbench.analyzer.diagnosis_rule_op import RuleOp, DiagnosisRuleType
|
||||||
import superbench.analyzer.file_handler as file_handler
|
from superbench.analyzer import file_handler
|
||||||
|
|
||||||
|
|
||||||
class DataDiagnosis():
|
class DataDiagnosis():
|
||||||
|
@ -31,6 +32,11 @@ class DataDiagnosis():
|
||||||
"""
|
"""
|
||||||
benchmarks_metrics = {}
|
benchmarks_metrics = {}
|
||||||
for metric in metrics_list:
|
for metric in metrics_list:
|
||||||
|
if '/' not in metric:
|
||||||
|
logger.warning(
|
||||||
|
'DataDiagnosis: get_metrics_by_benchmarks - {} does not have benchmark_name'.format(metric)
|
||||||
|
)
|
||||||
|
else:
|
||||||
benchmark = metric.split('/')[0]
|
benchmark = metric.split('/')[0]
|
||||||
if benchmark not in benchmarks_metrics:
|
if benchmark not in benchmarks_metrics:
|
||||||
benchmarks_metrics[benchmark] = set()
|
benchmarks_metrics[benchmark] = set()
|
||||||
|
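Reviewer note: the new guard in the hunk above skips metric names that have no `/` separator, so a column such as a hostname is no longer filed under a bogus benchmark. The following is a minimal standalone sketch of that behavior, not the project's code. The function name `group_metrics_by_benchmark`, the sample metric names, and the final `add` into the set (the hunk is truncated before that step) are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def group_metrics_by_benchmark(metrics_list):
    """Group metric names by the benchmark prefix before the first '/'."""
    benchmarks_metrics = {}
    for metric in metrics_list:
        if '/' not in metric:
            # No benchmark prefix: warn and skip rather than misfile the column.
            logger.warning('%s does not have benchmark_name', metric)
            continue
        benchmark = metric.split('/')[0]
        # Assumed step: collect the full metric name under its benchmark.
        benchmarks_metrics.setdefault(benchmark, set()).add(metric)
    return benchmarks_metrics


# Hypothetical column names; 'hostname' has no benchmark prefix and is skipped.
grouped = group_metrics_by_benchmark(['kernel-launch/event_time', 'hostname', 'mem-bw/h2d_bw'])
```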
@ -133,6 +139,7 @@ class DataDiagnosis():
|
||||||
if re.search(metric_regex, metric):
|
if re.search(metric_regex, metric):
|
||||||
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
|
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
|
||||||
self._enable_metrics.append(metric)
|
self._enable_metrics.append(metric)
|
||||||
|
self._enable_metrics.sort()
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
|
logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
|
||||||
return False
|
return False
|
||||||
|
@ -171,8 +178,8 @@ class DataDiagnosis():
|
||||||
issue_label = True
|
issue_label = True
|
||||||
if issue_label:
|
if issue_label:
|
||||||
# Add category information
|
# Add category information
|
||||||
general_cat_str = ','.join(categories)
|
general_cat_str = ','.join(sorted(list(categories)))
|
||||||
details_cat_str = ','.join(details)
|
details_cat_str = ','.join(sorted((details)))
|
||||||
details_row = [general_cat_str, details_cat_str]
|
details_row = [general_cat_str, details_cat_str]
|
||||||
return details_row, summary_data_row
|
return details_row, summary_data_row
|
||||||
|
|
||||||
|
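Reviewer note: wrapping the joins in `sorted(...)` makes the category and detail strings deterministic, which matters if `categories` is a set (the `list(categories)` wrapper suggests so, but that is an assumption), since set iteration order is not guaranteed to be stable across runs. A small sketch with made-up category and metric names:

```python
# Hypothetical inputs: a set of rule categories and a list of violated metrics.
categories = {'KernelLaunch', 'Mem', 'IB'}
details = ['mem-bw/h2d_bw', 'kernel-launch/event_time']

# sorted() accepts any iterable and returns a list, so no extra list() is needed.
general_cat_str = ','.join(sorted(categories))
details_cat_str = ','.join(sorted(details))
```

With sorted output, two runs over the same data produce identical summary rows, which also makes the result easy to assert on in unit tests.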
@ -236,15 +243,15 @@ class DataDiagnosis():
|
||||||
try:
|
try:
|
||||||
self._raw_data_df = file_handler.read_raw_data(raw_data_file)
|
self._raw_data_df = file_handler.read_raw_data(raw_data_file)
|
||||||
self._metrics = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
|
self._metrics = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
|
||||||
logger.info('DataDiagnosis: Begin to processe {} nodes'.format(len(self._raw_data_df)))
|
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
|
||||||
data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
|
data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
|
||||||
logger.info('DataDiagnosis: Processed finished')
|
logger.info('DataDiagnosis: Processed finished')
|
||||||
outpout_path = ''
|
output_path = ''
|
||||||
if output_format == 'excel':
|
if output_format == 'excel':
|
||||||
output_path = output_dir + '/diagnosis_summary.xlsx'
|
output_path = str(Path(output_dir) / 'diagnosis_summary.xlsx')
|
||||||
file_handler.output_excel(self._raw_data_df, data_not_accept_df, outpout_path, self._sb_rules)
|
file_handler.output_excel(self._raw_data_df, data_not_accept_df, output_path, self._sb_rules)
|
||||||
elif output_format == 'json':
|
elif output_format == 'json':
|
||||||
output_path = output_dir + '/diagnosis_summary.jsonl'
|
output_path = str(Path(output_dir) / 'diagnosis_summary.jsonl')
|
||||||
file_handler.output_json_data_not_accept(data_not_accept_df, output_path)
|
file_handler.output_json_data_not_accept(data_not_accept_df, output_path)
|
||||||
else:
|
else:
|
||||||
logger.error('DataDiagnosis: output failed - unsupported output format')
|
logger.error('DataDiagnosis: output failed - unsupported output format')
|
||||||
|
|
|
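Reviewer note: besides fixing the `outpout_path` typo, the hunk above replaces string concatenation with `pathlib.Path` joining, which tolerates a trailing separator in `output_dir` and uses the platform's separator. A minimal sketch of the same idea follows; the helper name `summary_path` is hypothetical, and returning an empty string for an unsupported format is an assumption based on the `output_path = ''` default in the diff.

```python
from pathlib import Path


def summary_path(output_dir, output_format):
    """Build the diagnosis summary path for a supported output format."""
    file_names = {
        'excel': 'diagnosis_summary.xlsx',
        'json': 'diagnosis_summary.jsonl',
    }
    if output_format not in file_names:
        # Assumed fallback, mirroring the empty default in the diff.
        return ''
    # Path joining normalizes 'outputs/' and 'outputs' to the same result.
    return str(Path(output_dir) / file_names[output_format])


excel_path = summary_path('outputs/', 'excel')
json_path = summary_path('outputs', 'json')
```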
@ -129,10 +129,11 @@ class torch2onnxExporter():
|
||||||
if not self.check_torchvision_model(model_name):
|
if not self.check_torchvision_model(model_name):
|
||||||
return ''
|
return ''
|
||||||
file_name = str(self._onnx_model_path / (model_name + '.onnx'))
|
file_name = str(self._onnx_model_path / (model_name + '.onnx'))
|
||||||
input_shape = (batch_size, 3, 224, 224)
|
model = getattr(torchvision.models, model_name)(pretrained=False).eval().cuda()
|
||||||
|
dummy_input = torch.randn((batch_size, 3, 224, 224), device='cuda')
|
||||||
torch.onnx.export(
|
torch.onnx.export(
|
||||||
getattr(torchvision.models, model_name)(pretrained=False).eval().cuda(),
|
model,
|
||||||
torch.randn(input_shape, device='cuda'),
|
dummy_input,
|
||||||
file_name,
|
file_name,
|
||||||
opset_version=10,
|
opset_version=10,
|
||||||
operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
|
operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
|
||||||
|
@ -147,6 +148,10 @@ class torch2onnxExporter():
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
|
|
||||||
|
del model
|
||||||
|
del dummy_input
|
||||||
|
torch.cuda.empty_cache()
|
||||||
return file_name
|
return file_name
|
||||||
|
|
||||||
def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
|
def export_benchmark_model(self, model_name, batch_size=1, seq_length=512):
|
||||||
|
@ -163,13 +168,13 @@ class torch2onnxExporter():
|
||||||
if not self.check_benchmark_model(model_name):
|
if not self.check_benchmark_model(model_name):
|
||||||
return
|
return
|
||||||
file_name = str(self._onnx_model_path / (model_name + '.onnx'))
|
file_name = str(self._onnx_model_path / (model_name + '.onnx'))
|
||||||
input_shape, dtype = (batch_size, seq_length), torch.int64
|
model = self.benchmark_models[model_name]().eval().cuda()
|
||||||
|
dummy_input = torch.ones((batch_size, seq_length), dtype=torch.int64, device='cuda')
|
||||||
if model_name == 'lstm':
|
if model_name == 'lstm':
|
||||||
input_shape += (self.lstm_input_size, )
|
dummy_input = torch.ones((batch_size, seq_length, self.lstm_input_size), device='cuda')
|
||||||
dtype = None
|
|
||||||
torch.onnx.export(
|
torch.onnx.export(
|
||||||
self.benchmark_models[model_name]().eval().cuda(),
|
model,
|
||||||
torch.ones(input_shape, dtype=dtype, device='cuda'),
|
dummy_input,
|
||||||
file_name,
|
file_name,
|
||||||
opset_version=10,
|
opset_version=10,
|
||||||
do_constant_folding=True,
|
do_constant_folding=True,
|
||||||
|
@ -185,4 +190,8 @@ class torch2onnxExporter():
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
|
|
||||||
|
del model
|
||||||
|
del dummy_input
|
||||||
|
torch.cuda.empty_cache()
|
||||||
return file_name
|
return file_name
|
||||||
|
|
|
@ -291,8 +291,8 @@ class CublasBenchmark(MicroBenchmarkWithInvoke):
|
||||||
raw_data = raw_data.split(',')
|
raw_data = raw_data.split(',')
|
||||||
raw_data.pop()
|
raw_data.pop()
|
||||||
raw_data = [float(item) for item in raw_data]
|
raw_data = [float(item) for item in raw_data]
|
||||||
self._result.add_result(metric, statistics.mean(raw_data))
|
self._result.add_result(metric.lower() + '_time', statistics.mean(raw_data))
|
||||||
self._result.add_raw_data(metric, raw_data)
|
self._result.add_raw_data(metric.lower() + '_time', raw_data)
|
||||||
if 'Error' in line:
|
if 'Error' in line:
|
||||||
error = True
|
error = True
|
||||||
except BaseException as e:
|
except BaseException as e:
|
||||||
|
|
|
@ -6,6 +6,7 @@
|
||||||
import os
|
import os
|
||||||
import json
|
import json
|
||||||
import yaml
|
import yaml
|
||||||
|
import statistics
|
||||||
|
|
||||||
from superbench.common.utils import logger
|
from superbench.common.utils import logger
|
||||||
from superbench.benchmarks import Platform, BenchmarkRegistry, ReturnCode
|
from superbench.benchmarks import Platform, BenchmarkRegistry, ReturnCode
|
||||||
|
@ -424,8 +425,8 @@ class CudnnBenchmark(MicroBenchmarkWithInvoke):
|
||||||
raw_data = raw_data.split(',')
|
raw_data = raw_data.split(',')
|
||||||
raw_data.pop()
|
raw_data.pop()
|
||||||
raw_data = [float(item) for item in raw_data]
|
raw_data = [float(item) for item in raw_data]
|
||||||
self._result.add_result(metric, sum(raw_data) / len(raw_data))
|
self._result.add_result(metric.lower() + '_time', statistics.mean(raw_data) * 1000)
|
||||||
self._result.add_raw_data(metric, raw_data)
|
self._result.add_raw_data(metric.lower() + '_time', raw_data)
|
||||||
if 'Error' in line:
|
if 'Error' in line:
|
||||||
error = True
|
error = True
|
||||||
except BaseException as e:
|
except BaseException as e:
|
||||||
|
|
|
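Reviewer note: both the cuBLAS and cuDNN hunks now emit metric names in lower case with a `_time` suffix, matching the `..._time` names in the docs table. The sketch below reproduces the parsing pattern under stated assumptions: the raw line is comma-separated with a trailing comma (which the `pop()` in the diff suggests), the helper name `summarize_timings` and the sample metric name are hypothetical, and the `scale` parameter stands in for the `* 1000` in the cuDNN hunk, which presumably converts milliseconds to microseconds.

```python
import statistics


def summarize_timings(metric, raw_line, scale=1.0):
    """Return (metric name, scaled mean, raw samples) for one result line."""
    values = raw_line.split(',')
    values.pop()  # drop the empty string left by the trailing comma
    values = [float(item) for item in values]
    name = metric.lower() + '_time'
    return name, statistics.mean(values) * scale, values


# Hypothetical metric name and samples.
name, mean_time, raw = summarize_timings('cublas-function/name_cublasSgemm_m_512', '10.0,20.0,30.0,')
_, scaled_mean, _ = summarize_timings('cudnn-function/name_x', '10.0,20.0,30.0,', scale=1000)
```

Note that in the cuDNN hunk only the summarized result is scaled; the raw data is stored unscaled, so the two are in different units.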
@ -249,7 +249,7 @@ class IBBenchmark(MicroBenchmarkWithInvoke):
|
||||||
msg_size = '-s ' + str(self._args.msg_size)
|
msg_size = '-s ' + str(self._args.msg_size)
|
||||||
# Add GPUDirect for ib command
|
# Add GPUDirect for ib command
|
||||||
gpu_enable = ''
|
gpu_enable = ''
|
||||||
if self._args.gpu_index:
|
if self._args.gpu_index is not None:
|
||||||
gpu = GPU()
|
gpu = GPU()
|
||||||
if gpu.vendor == 'nvidia':
|
if gpu.vendor == 'nvidia':
|
||||||
gpu_enable = ' --use_cuda={gpu_index}'.format(gpu_index=str(self._args.gpu_index))
|
gpu_enable = ' --use_cuda={gpu_index}'.format(gpu_index=str(self._args.gpu_index))
|
||||||
|
|
|
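Reviewer note: the `gpu_index` fix above matters because `0` is a valid GPU index but is falsy in Python, so the old truthiness check silently dropped GPUDirect for the first GPU. A minimal sketch of the difference, with hypothetical helper names and the vendor check from the original omitted:

```python
def gpu_flag_truthy(gpu_index):
    """Old behavior: treats index 0 the same as 'not set'."""
    return ' --use_cuda={}'.format(gpu_index) if gpu_index else ''


def gpu_flag_explicit(gpu_index):
    """New behavior: only None means 'not set'."""
    return ' --use_cuda={}'.format(gpu_index) if gpu_index is not None else ''


old_flag = gpu_flag_truthy(0)
new_flag = gpu_flag_explicit(0)
```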
@ -3,7 +3,7 @@
|
||||||
# Server:
|
# Server:
|
||||||
# - Product: HPE Apollo 6500
|
# - Product: HPE Apollo 6500
|
||||||
|
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
enable: null
|
enable: null
|
||||||
var:
|
var:
|
||||||
|
@ -99,9 +99,31 @@ superbench:
|
||||||
copy_type:
|
copy_type:
|
||||||
- sm
|
- sm
|
||||||
- dma
|
- dma
|
||||||
ort-inference:
|
ib-traffic:
|
||||||
<<: *default_local_mode
|
|
||||||
enable: false
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
btl: tcp,self
|
||||||
|
pml: ob1
|
||||||
|
btl_tcp_if_include: ens17f0
|
||||||
|
gpcnet-network-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: ens17f0
|
||||||
|
tcp-connectivity:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: local
|
||||||
|
parallel: no
|
||||||
|
parameters:
|
||||||
|
port: 22
|
||||||
ort-models:
|
ort-models:
|
||||||
enable: false
|
enable: false
|
||||||
modes:
|
modes:
|
||||||
|
|
|
@ -4,7 +4,7 @@
|
||||||
# - Product: G482-Z53
|
# - Product: G482-Z53
|
||||||
# - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html
|
# - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html
|
||||||
|
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
enable: null
|
enable: null
|
||||||
var:
|
var:
|
||||||
|
|
|
@ -3,9 +3,13 @@
|
||||||
# Azure NDm A100 v4
|
# Azure NDm A100 v4
|
||||||
# reference: https://docs.microsoft.com/en-us/azure/virtual-machines/ndm-a100-v4-series
|
# reference: https://docs.microsoft.com/en-us/azure/virtual-machines/ndm-a100-v4-series
|
||||||
|
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
enable: null
|
enable: null
|
||||||
|
monitor:
|
||||||
|
enable: true
|
||||||
|
sample_duration: 1
|
||||||
|
sample_interval: 10
|
||||||
var:
|
var:
|
||||||
default_local_mode: &default_local_mode
|
default_local_mode: &default_local_mode
|
||||||
enable: true
|
enable: true
|
||||||
|
@ -123,6 +127,52 @@ superbench:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
computation-communication-overlap:
|
computation-communication-overlap:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
|
ib-traffic:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
gpcnet-network-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: eth0
|
||||||
|
gpcnet-network-load-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: eth0
|
||||||
|
tcp-connectivity:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: local
|
||||||
|
parallel: no
|
||||||
|
parameters:
|
||||||
|
port: 22
|
||||||
|
ort-inference:
|
||||||
|
<<: *default_local_mode
|
||||||
|
tensorrt-inference:
|
||||||
|
<<: *default_local_mode
|
||||||
|
parameters:
|
||||||
|
pytorch_models:
|
||||||
|
- resnet50
|
||||||
|
- resnet101
|
||||||
|
- resnet152
|
||||||
|
- densenet169
|
||||||
|
- densenet201
|
||||||
|
- bert-base
|
||||||
|
- bert-large
|
||||||
|
seq_length: 224
|
||||||
|
batch_size: 32
|
||||||
|
precision: int8
|
||||||
gpt_models:
|
gpt_models:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
models:
|
models:
|
||||||
|
|
|
@ -1,9 +1,9 @@
|
||||||
# SuperBench Config
|
# SuperBench Config
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
enable: null
|
enable: null
|
||||||
monitor:
|
monitor:
|
||||||
enable: false
|
enable: true
|
||||||
sample_duration: 1
|
sample_duration: 1
|
||||||
sample_interval: 10
|
sample_interval: 10
|
||||||
var:
|
var:
|
||||||
|
@ -109,9 +109,52 @@ superbench:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
computation-communication-overlap:
|
computation-communication-overlap:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
|
ib-traffic:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
gpcnet-network-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: eth0
|
||||||
|
gpcnet-network-load-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: eth0
|
||||||
|
tcp-connectivity:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: local
|
||||||
|
parallel: no
|
||||||
|
parameters:
|
||||||
|
port: 22
|
||||||
ort-inference:
|
ort-inference:
|
||||||
<<: *default_local_mode
|
<<: *default_local_mode
|
||||||
enable: false
|
tensorrt-inference:
|
||||||
|
<<: *default_local_mode
|
||||||
|
parameters:
|
||||||
|
pytorch_models:
|
||||||
|
- resnet50
|
||||||
|
- resnet101
|
||||||
|
- resnet152
|
||||||
|
- densenet169
|
||||||
|
- densenet201
|
||||||
|
- bert-base
|
||||||
|
- bert-large
|
||||||
|
seq_length: 224
|
||||||
|
batch_size: 32
|
||||||
|
precision: int8
|
||||||
gpt_models:
|
gpt_models:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
models:
|
models:
|
||||||
|
|
|
@ -1,9 +1,9 @@
|
||||||
# SuperBench Config
|
# SuperBench Config
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
enable: null
|
enable: null
|
||||||
monitor:
|
monitor:
|
||||||
enable: false
|
enable: true
|
||||||
sample_duration: 1
|
sample_duration: 1
|
||||||
sample_interval: 10
|
sample_interval: 10
|
||||||
var:
|
var:
|
||||||
|
@ -107,9 +107,56 @@ superbench:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
computation-communication-overlap:
|
computation-communication-overlap:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
|
ib-traffic:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
gpcnet-network-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: eth0
|
||||||
|
env:
|
||||||
|
UCX_NET_DEVICES: mlx5_0:1
|
||||||
|
gpcnet-network-load-test:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: mpi
|
||||||
|
proc_num: 1
|
||||||
|
mca:
|
||||||
|
pml: ucx
|
||||||
|
btl: ^uct
|
||||||
|
btl_tcp_if_include: eth0
|
||||||
|
env:
|
||||||
|
UCX_NET_DEVICES: mlx5_0:1
|
||||||
|
tcp-connectivity:
|
||||||
|
enable: false
|
||||||
|
modes:
|
||||||
|
- name: local
|
||||||
|
parallel: no
|
||||||
|
parameters:
|
||||||
|
port: 22
|
||||||
ort-inference:
|
ort-inference:
|
||||||
<<: *default_local_mode
|
<<: *default_local_mode
|
||||||
enable: false
|
tensorrt-inference:
|
||||||
|
<<: *default_local_mode
|
||||||
|
parameters:
|
||||||
|
pytorch_models:
|
||||||
|
- resnet50
|
||||||
|
- resnet101
|
||||||
|
- resnet152
|
||||||
|
- densenet169
|
||||||
|
- densenet201
|
||||||
|
- bert-base
|
||||||
|
- bert-large
|
||||||
|
seq_length: 224
|
||||||
|
batch_size: 32
|
||||||
|
precision: int8
|
||||||
gpt_models:
|
gpt_models:
|
||||||
<<: *default_pytorch_mode
|
<<: *default_pytorch_mode
|
||||||
models:
|
models:
|
||||||
|
|
|
@ -3,6 +3,7 @@
|
||||||
|
|
||||||
"""SuperBench Ansible Client."""
|
"""SuperBench Ansible Client."""
|
||||||
|
|
||||||
|
import tempfile
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
import ansible_runner
|
import ansible_runner
|
||||||
|
@ -22,10 +23,10 @@ class AnsibleClient():
|
||||||
"""
|
"""
|
||||||
self._playbook_path = Path(__file__).parent / 'playbooks'
|
self._playbook_path = Path(__file__).parent / 'playbooks'
|
||||||
self._config = {
|
self._config = {
|
||||||
'private_data_dir': None,
|
|
||||||
'host_pattern': 'localhost',
|
'host_pattern': 'localhost',
|
||||||
'cmdline': '--forks 128',
|
'cmdline': '--forks 128',
|
||||||
}
|
}
|
||||||
|
self._head_host = None
|
||||||
if config:
|
if config:
|
||||||
inventory_file = getattr(config, 'host_file', None)
|
inventory_file = getattr(config, 'host_file', None)
|
||||||
inventory_list = getattr(config, 'host_list', None)
|
inventory_list = getattr(config, 'host_list', None)
|
||||||
|
@ -34,9 +35,10 @@ class AnsibleClient():
|
||||||
if inventory_file or inventory_list:
|
if inventory_file or inventory_list:
|
||||||
self._config['host_pattern'] = 'all'
|
self._config['host_pattern'] = 'all'
|
||||||
inventory = InventoryManager(loader=DataLoader(), sources=inventory_file or f'{inventory_list},')
|
inventory = InventoryManager(loader=DataLoader(), sources=inventory_file or f'{inventory_list},')
|
||||||
host_list = inventory.get_groups_dict()['all']
|
host_list = inventory.get_hosts(pattern='all', order='sorted')
|
||||||
if len(host_list) > 0:
|
if len(host_list) > 0:
|
||||||
self._config['cmdline'] = '--forks {}'.format(len(host_list))
|
self._config['cmdline'] = '--forks {}'.format(len(host_list))
|
||||||
|
self._head_host = host_list[0].get_name()
|
||||||
if inventory_list in ['localhost', '127.0.0.1']:
|
if inventory_list in ['localhost', '127.0.0.1']:
|
||||||
self._config['cmdline'] += ' --connection local'
|
self._config['cmdline'] += ' --connection local'
|
||||||
self._config['cmdline'] += ' --inventory {}'.format(inventory_file or f'{inventory_list},')
|
self._config['cmdline'] += ' --inventory {}'.format(inventory_file or f'{inventory_list},')
|
||||||
|
@@ -69,12 +71,13 @@ class AnsibleClient():
|
||||||
if sudo:
|
if sudo:
|
||||||
logger.info('Run as sudo ...')
|
logger.info('Run as sudo ...')
|
||||||
ansible_config['cmdline'] += ' --become'
|
ansible_config['cmdline'] += ' --become'
|
||||||
r = ansible_runner.run(**ansible_config)
|
with tempfile.TemporaryDirectory(prefix='ansible') as tmpdir:
|
||||||
|
r = ansible_runner.run(private_data_dir=tmpdir, **ansible_config)
|
||||||
|
logger.debug(r.stats)
|
||||||
if r.rc == 0:
|
if r.rc == 0:
|
||||||
logger.info('Run succeed, return code {}.'.format(r.rc))
|
logger.info('Run succeed, return code {}.'.format(r.rc))
|
||||||
else:
|
else:
|
||||||
logger.warning('Run failed, return code {}.'.format(r.rc))
|
logger.warning('Run failed, return code {}.'.format(r.rc))
|
||||||
logger.debug(r.stats)
|
|
||||||
return r.rc
|
return r.rc
|
||||||
|
|
||||||
def update_mpi_config(self, ansible_config):
|
def update_mpi_config(self, ansible_config):
|
||||||
|
@@ -86,7 +89,10 @@ class AnsibleClient():
|
||||||
Returns:
|
Returns:
|
||||||
dict: Updated Ansible config dict.
|
dict: Updated Ansible config dict.
|
||||||
"""
|
"""
|
||||||
|
if not self._head_host:
|
||||||
ansible_config['host_pattern'] += '[0]'
|
ansible_config['host_pattern'] += '[0]'
|
||||||
|
else:
|
||||||
|
ansible_config['host_pattern'] = self._head_host
|
||||||
return ansible_config
|
return ansible_config
|
||||||
|
|
||||||
def get_shell_config(self, cmd):
|
def get_shell_config(self, cmd):
|
||||||
|
|
|
@@ -1,11 +1,13 @@
|
||||||
- name: Fetch Results
|
- name: Fetch Results
|
||||||
hosts: all
|
hosts: all
|
||||||
gather_facts: true
|
gather_facts: true
|
||||||
|
vars:
|
||||||
|
workspace: '{{ ansible_user_dir }}/sb-workspace'
|
||||||
tasks:
|
tasks:
|
||||||
- name: Synchronize Output Directory
|
- name: Synchronize Output Directory
|
||||||
ansible.posix.synchronize:
|
ansible.posix.synchronize:
|
||||||
mode: pull
|
mode: pull
|
||||||
src: '{{ sb_output_dir }}/'
|
src: '{{ sb_output_dir if sb_output_dir.startswith("/") else workspace + "/" + sb_output_dir }}/'
|
||||||
dest: '{{ absolute_output_dir }}/nodes/{{ ansible_hostname }}'
|
dest: '{{ absolute_output_dir }}/nodes/{{ ansible_hostname }}'
|
||||||
rsync_opts:
|
rsync_opts:
|
||||||
- --exclude=nodes
|
- --exclude=nodes
|
||||||
|
|
|
@@ -39,7 +39,7 @@ class SuperBenchRunner():
|
||||||
self._ansible_client = AnsibleClient(ansible_config)
|
self._ansible_client = AnsibleClient(ansible_config)
|
||||||
|
|
||||||
self.__set_logger('sb-run.log')
|
self.__set_logger('sb-run.log')
|
||||||
logger.info('Runner uses config: %s.', pformat(self._sb_config))
|
logger.info('Runner uses config: %s.', pformat(OmegaConf.to_container(self._sb_config, resolve=True)))
|
||||||
logger.info('Runner writes to: %s.', str(self._output_path))
|
logger.info('Runner writes to: %s.', str(self._output_path))
|
||||||
|
|
||||||
self._sb_benchmarks = self._sb_config.superbench.benchmarks
|
self._sb_benchmarks = self._sb_config.superbench.benchmarks
|
||||||
|
@@ -336,7 +336,8 @@ class SuperBenchRunner():
|
||||||
for pattern, reduce_type in MonitorRecord.reduce_ops.items():
|
for pattern, reduce_type in MonitorRecord.reduce_ops.items():
|
||||||
if pattern in metric:
|
if pattern in metric:
|
||||||
reduce_func = Reducer.get_reduce_func(reduce_type)
|
reduce_func = Reducer.get_reduce_func(reduce_type)
|
||||||
metrics_summary[metric] = reduce_func(values)
|
metric_name = 'monitor/{}'.format(metric)
|
||||||
|
metrics_summary[metric_name] = reduce_func(values)
|
||||||
continue
|
continue
|
||||||
|
|
||||||
return metrics_summary
|
return metrics_summary
|
||||||
|
|
|
@@ -18,9 +18,10 @@ class TestDataDiagnosis(unittest.TestCase):
|
||||||
"""Test for DataDiagnosis class."""
|
"""Test for DataDiagnosis class."""
|
||||||
def setUp(self):
|
def setUp(self):
|
||||||
"""Method called to prepare the test fixture."""
|
"""Method called to prepare the test fixture."""
|
||||||
self.output_excel_file = str(Path(__file__).parent.resolve()) + '/diagnosis_summary.xlsx'
|
self.parent_path = Path(__file__).parent
|
||||||
self.test_rule_file_fake = str(Path(__file__).parent.resolve()) + '/test_rules_fake.yaml'
|
self.output_excel_file = str(self.parent_path / 'diagnosis_summary.xlsx')
|
||||||
self.output_json_file = str(Path(__file__).parent.resolve()) + '/diagnosis_summary.jsonl'
|
self.test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
|
||||||
|
self.output_json_file = str(self.parent_path / 'diagnosis_summary.jsonl')
|
||||||
|
|
||||||
def tearDown(self):
|
def tearDown(self):
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
"""Method called after the test method has been called and the result recorded."""
|
||||||
|
@@ -33,21 +34,31 @@ class TestDataDiagnosis(unittest.TestCase):
|
||||||
"""Test for rule-based data diagnosis."""
|
"""Test for rule-based data diagnosis."""
|
||||||
# Test - read_raw_data and get_metrics_from_raw_data
|
# Test - read_raw_data and get_metrics_from_raw_data
|
||||||
# Positive case
|
# Positive case
|
||||||
test_raw_data = str(Path(__file__).parent.resolve()) + '/test_results.jsonl'
|
test_raw_data = str(self.parent_path / 'test_results.jsonl')
|
||||||
test_rule_file = str(Path(__file__).parent.resolve()) + '/test_rules.yaml'
|
test_rule_file = str(self.parent_path / 'test_rules.yaml')
|
||||||
test_baseline_file = str(Path(__file__).parent.resolve()) + '/test_baseline.json'
|
test_baseline_file = str(self.parent_path / 'test_baseline.json')
|
||||||
diag1 = DataDiagnosis()
|
diag1 = DataDiagnosis()
|
||||||
diag1._raw_data_df = file_handler.read_raw_data(test_raw_data)
|
diag1._raw_data_df = file_handler.read_raw_data(test_raw_data)
|
||||||
diag1._metrics = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df))
|
diag1._metrics = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df))
|
||||||
assert (len(diag1._raw_data_df) == 3)
|
assert (len(diag1._raw_data_df) == 3)
|
||||||
# Negative case
|
# Negative case
|
||||||
test_raw_data_fake = str(Path(__file__).parent.resolve()) + '/test_results_fake.jsonl'
|
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
|
||||||
test_rule_file_fake = str(Path(__file__).parent.resolve()) + '/test_rules_fake.yaml'
|
test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
|
||||||
diag2 = DataDiagnosis()
|
diag2 = DataDiagnosis()
|
||||||
diag2._raw_data_df = file_handler.read_raw_data(test_raw_data_fake)
|
diag2._raw_data_df = file_handler.read_raw_data(test_raw_data_fake)
|
||||||
diag2._metrics = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
|
diag2._metrics = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
|
||||||
assert (len(diag2._raw_data_df) == 0)
|
assert (len(diag2._raw_data_df) == 0)
|
||||||
assert (len(diag2._metrics) == 0)
|
assert (len(diag2._metrics) == 0)
|
||||||
|
metric_list = [
|
||||||
|
'gpu_temperature', 'gpu_power_limit', 'gemm-flops/FP64',
|
||||||
|
'bert_models/pytorch-bert-base/steptime_train_float32'
|
||||||
|
]
|
||||||
|
self.assertDictEqual(
|
||||||
|
diag2._get_metrics_by_benchmarks(metric_list), {
|
||||||
|
'gemm-flops': {'gemm-flops/FP64'},
|
||||||
|
'bert_models': {'bert_models/pytorch-bert-base/steptime_train_float32'}
|
||||||
|
}
|
||||||
|
)
|
||||||
# Test - read rules
|
# Test - read rules
|
||||||
rules = file_handler.read_rules(test_rule_file_fake)
|
rules = file_handler.read_rules(test_rule_file_fake)
|
||||||
assert (not rules)
|
assert (not rules)
|
||||||
|
@@ -176,3 +187,27 @@ class TestDataDiagnosis(unittest.TestCase):
|
||||||
assert ('Category' in line)
|
assert ('Category' in line)
|
||||||
assert ('Defective Details' in line)
|
assert ('Defective Details' in line)
|
||||||
assert ('Index' in line)
|
assert ('Index' in line)
|
||||||
|
|
||||||
|
def test_data_diagnosis_run(self):
|
||||||
|
"""Test for the run process of rule-based data diagnosis."""
|
||||||
|
test_raw_data = str(self.parent_path / 'test_results.jsonl')
|
||||||
|
test_rule_file = str(self.parent_path / 'test_rules.yaml')
|
||||||
|
test_baseline_file = str(self.parent_path / 'test_baseline.json')
|
||||||
|
|
||||||
|
# Test - output in excel
|
||||||
|
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'excel')
|
||||||
|
excel_file = pd.ExcelFile(self.output_excel_file, engine='openpyxl')
|
||||||
|
data_sheet_name = 'Not Accept'
|
||||||
|
data_not_accept_read_from_excel = excel_file.parse(data_sheet_name)
|
||||||
|
expect_result_file = pd.ExcelFile(str(self.parent_path / '../data/diagnosis_summary.xlsx'), engine='openpyxl')
|
||||||
|
expect_result = expect_result_file.parse(data_sheet_name)
|
||||||
|
pd.util.testing.assert_frame_equal(data_not_accept_read_from_excel, expect_result)
|
||||||
|
# Test - output in json
|
||||||
|
DataDiagnosis().run(test_raw_data, test_rule_file, test_baseline_file, str(self.parent_path), 'json')
|
||||||
|
assert (Path(self.output_json_file).is_file())
|
||||||
|
with Path(self.output_json_file).open() as f:
|
||||||
|
data_not_accept_read_from_json = f.read()
|
||||||
|
expect_result_file = self.parent_path / '../data/diagnosis_summary.jsonl'
|
||||||
|
with Path(expect_result_file).open() as f:
|
||||||
|
expect_result = f.read()
|
||||||
|
assert (data_not_accept_read_from_json == expect_result)
|
||||||
|
|
|
@@ -1,5 +1,5 @@
|
||||||
# SuperBench rules
|
# SuperBench rules
|
||||||
version: v0.3
|
version: v0.4
|
||||||
superbench:
|
superbench:
|
||||||
rules:
|
rules:
|
||||||
rule0:
|
rule0:
|
||||||
|
|
|
@@ -14,4 +14,5 @@
|
||||||
vars:
|
vars:
|
||||||
ssh_port: 12345
|
ssh_port: 12345
|
||||||
output_dir: /tmp/test_ansible
|
output_dir: /tmp/test_ansible
|
||||||
docker_image: superbench/superbench
|
# use a mock superbench image (requires `sb` binary inside)
|
||||||
|
docker_image: superbench/superbench:v0.3.0-cuda11.1.1
|
||||||
|
|
|
@@ -3,29 +3,20 @@
|
||||||
|
|
||||||
"""Tests for cpu-memory-bw-latency benchmark."""
|
"""Tests for cpu-memory-bw-latency benchmark."""
|
||||||
|
|
||||||
from pathlib import Path
|
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
|
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
||||||
|
|
||||||
|
|
||||||
class CpuMemBwLatencyBenchmarkTest(unittest.TestCase):
|
class CpuMemBwLatencyBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Test class for cpu-memory-bw-latency benchmark."""
|
"""Test class for cpu-memory-bw-latency benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
self.__curr_micro_path = os.environ.get('SB_MICRO_PATH', '')
|
super().setUpClass()
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
cls.createMockEnvs(cls)
|
||||||
binary_path = Path(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockFiles(cls, ['bin/mlc'])
|
||||||
binary_path.mkdir(parents=True, exist_ok=True)
|
|
||||||
self.__binary_file = binary_path / 'mlc'
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
|
||||||
self.__binary_file.unlink()
|
|
||||||
os.environ['SB_MICRO_PATH'] = self.__curr_micro_path
|
|
||||||
|
|
||||||
def test_cpu_mem_bw_latency_benchmark_empty_param(self):
|
def test_cpu_mem_bw_latency_benchmark_empty_param(self):
|
||||||
"""Test cpu-memory-bw-latency benchmark command generation with empty parameter."""
|
"""Test cpu-memory-bw-latency benchmark command generation with empty parameter."""
|
||||||
|
|
|
@@ -3,29 +3,22 @@
|
||||||
|
|
||||||
"""Tests for gemm-flops benchmark."""
|
"""Tests for gemm-flops benchmark."""
|
||||||
|
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from tests.helper import decorator
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.common.utils import device_manager as dm
|
from superbench.common.utils import device_manager as dm
|
||||||
from superbench.benchmarks import BenchmarkRegistry, ReturnCode, Platform, BenchmarkType
|
from superbench.benchmarks import BenchmarkRegistry, ReturnCode, Platform, BenchmarkType
|
||||||
|
|
||||||
|
|
||||||
class CudaGemmFlopsBenchmarkTest(unittest.TestCase):
|
class CudaGemmFlopsBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Tests for CudaGemmFlopsBenchmark benchmark."""
|
"""Tests for CudaGemmFlopsBenchmark benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
super().setUpClass()
|
||||||
binary_path = os.path.join(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
Path(binary_path).mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/cutlass_profiler'])
|
||||||
self.__binary_file = Path(os.path.join(binary_path, 'cutlass_profiler'))
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
|
||||||
self.__binary_file.unlink()
|
|
||||||
|
|
||||||
@decorator.cuda_test
|
@decorator.cuda_test
|
||||||
def test_flops_performance_cuda(self):
|
def test_flops_performance_cuda(self):
|
||||||
|
|
|
@@ -4,29 +4,26 @@
|
||||||
"""Tests for mem-bw benchmark."""
|
"""Tests for mem-bw benchmark."""
|
||||||
|
|
||||||
import numbers
|
import numbers
|
||||||
from pathlib import Path
|
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
|
|
||||||
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
||||||
|
|
||||||
|
|
||||||
class CudaMemBwTest(unittest.TestCase):
|
class CudaMemBwTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Test class for cuda mem-bw benchmark."""
|
"""Test class for cuda mem-bw benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
super().setUpClass()
|
||||||
binary_path = os.path.join(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
Path(os.getenv('SB_MICRO_PATH'), 'bin').mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/bandwidthTest'])
|
||||||
self.__binary_file = Path(binary_path, 'bandwidthTest')
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
@decorator.load_data('tests/data/cuda_memory_h2d_bw.log')
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
@decorator.load_data('tests/data/cuda_memory_d2h_bw.log')
|
||||||
self.__binary_file.unlink()
|
@decorator.load_data('tests/data/cuda_memory_d2d_bw.log')
|
||||||
|
def test_cuda_memory_bw_performance(self, raw_output_h2d, raw_output_d2h, raw_output_d2d):
|
||||||
def test_cuda_memory_bw_performance(self):
|
|
||||||
"""Test cuda mem-bw benchmark."""
|
"""Test cuda mem-bw benchmark."""
|
||||||
benchmark_name = 'mem-bw'
|
benchmark_name = 'mem-bw'
|
||||||
(benchmark_class,
|
(benchmark_class,
|
||||||
|
@@ -54,280 +51,7 @@ class CudaMemBwTest(unittest.TestCase):
|
||||||
assert (command == expected_command[i])
|
assert (command == expected_command[i])
|
||||||
|
|
||||||
# Check results and metrics.
|
# Check results and metrics.
|
||||||
raw_output = {}
|
raw_output = [raw_output_h2d, raw_output_d2h, raw_output_d2d]
|
||||||
raw_output[0] = """
|
|
||||||
[CUDA Bandwidth Test] - Starting...
|
|
||||||
Running on...
|
|
||||||
|
|
||||||
Device 0: Tesla V100-PCIE-32GB
|
|
||||||
Shmoo Mode
|
|
||||||
|
|
||||||
.................................................................................
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 0.7 GB/s, Time = 0.00000 s, Size = 2000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 1.0 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 1.4 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 1.7 GB/s, Time = 0.00000 s, Size = 5000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 2.0 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 2.3 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 2.5 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 2.7 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 2.9 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 3.2 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 3.4 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 3.5 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 3.5 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 3.8 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 4.0 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 4.1 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 4.3 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 4.4 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 4.6 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 4.8 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 5.0 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 5.2 GB/s, Time = 0.00000 s, Size = 26000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 5.4 GB/s, Time = 0.00001 s, Size = 28000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 5.7 GB/s, Time = 0.00001 s, Size = 30000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 5.9 GB/s, Time = 0.00001 s, Size = 32000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 6.1 GB/s, Time = 0.00001 s, Size = 34000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 6.3 GB/s, Time = 0.00001 s, Size = 36000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 6.4 GB/s, Time = 0.00001 s, Size = 38000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 6.6 GB/s, Time = 0.00001 s, Size = 40000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 6.7 GB/s, Time = 0.00001 s, Size = 42000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 6.9 GB/s, Time = 0.00001 s, Size = 44000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 7.0 GB/s, Time = 0.00001 s, Size = 46000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 7.1 GB/s, Time = 0.00001 s, Size = 48000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 7.3 GB/s, Time = 0.00001 s, Size = 50000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 7.8 GB/s, Time = 0.00001 s, Size = 60000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 8.2 GB/s, Time = 0.00001 s, Size = 70000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 8.6 GB/s, Time = 0.00001 s, Size = 80000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 8.9 GB/s, Time = 0.00001 s, Size = 90000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 9.2 GB/s, Time = 0.00001 s, Size = 100000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 10.5 GB/s, Time = 0.00002 s, Size = 200000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.1 GB/s, Time = 0.00003 s, Size = 300000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.4 GB/s, Time = 0.00004 s, Size = 400000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.6 GB/s, Time = 0.00004 s, Size = 500000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.7 GB/s, Time = 0.00005 s, Size = 600000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.8 GB/s, Time = 0.00006 s, Size = 700000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.9 GB/s, Time = 0.00007 s, Size = 800000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.9 GB/s, Time = 0.00008 s, Size = 900000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.7 GB/s, Time = 0.00009 s, Size = 1000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.1 GB/s, Time = 0.00016 s, Size = 2000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00024 s, Size = 3000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00033 s, Size = 4000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.5 GB/s, Time = 0.00043 s, Size = 5000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00049 s, Size = 6000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00057 s, Size = 7000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00065 s, Size = 8000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00073 s, Size = 9000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00081 s, Size = 10000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00089 s, Size = 11000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00097 s, Size = 12000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00105 s, Size = 13000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00113 s, Size = 14000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00121 s, Size = 15000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00129 s, Size = 16000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00145 s, Size = 18000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00162 s, Size = 20000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00178 s, Size = 22000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00194 s, Size = 24000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00210 s, Size = 26000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00226 s, Size = 28000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00242 s, Size = 30000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 10.5 GB/s, Time = 0.00304 s, Size = 32000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.2 GB/s, Time = 0.00295 s, Size = 36000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 10.8 GB/s, Time = 0.00369 s, Size = 40000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00355 s, Size = 44000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00387 s, Size = 48000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.1 GB/s, Time = 0.00431 s, Size = 52000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 11.7 GB/s, Time = 0.00480 s, Size = 56000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00484 s, Size = 60000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.1 GB/s, Time = 0.00528 s, Size = 64000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00549 s, Size = 68000000 bytes, NumDevsUsed = 1
|
|
||||||
Result = PASS
|
|
||||||
"""
|
|
||||||
raw_output[1] = """
|
|
||||||
[CUDA Bandwidth Test] - Starting...
|
|
||||||
Running on...
|
|
||||||
|
|
||||||
Device 0: Tesla V100-PCIE-32GB
|
|
||||||
Shmoo Mode
|
|
||||||
|
|
||||||
.................................................................................
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 0.5 GB/s, Time = 0.00000 s, Size = 2000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 0.9 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 1.1 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 1.4 GB/s, Time = 0.00000 s, Size = 5000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 1.9 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 2.6 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 2.9 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 3.3 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 3.7 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 4.0 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 4.5 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 4.9 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 5.6 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 5.7 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 6.0 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 6.2 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 6.3 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 6.5 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 6.9 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 7.1 GB/s, Time = 0.00000 s, Size = 26000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 7.4 GB/s, Time = 0.00000 s, Size = 28000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 7.6 GB/s, Time = 0.00000 s, Size = 30000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 7.9 GB/s, Time = 0.00000 s, Size = 32000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 8.0 GB/s, Time = 0.00000 s, Size = 34000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 8.3 GB/s, Time = 0.00000 s, Size = 36000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 8.5 GB/s, Time = 0.00000 s, Size = 38000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 8.6 GB/s, Time = 0.00000 s, Size = 40000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 8.7 GB/s, Time = 0.00000 s, Size = 42000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 9.3 GB/s, Time = 0.00000 s, Size = 44000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 9.4 GB/s, Time = 0.00000 s, Size = 46000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 9.5 GB/s, Time = 0.00001 s, Size = 48000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 9.5 GB/s, Time = 0.00001 s, Size = 50000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 10.1 GB/s, Time = 0.00001 s, Size = 60000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 10.4 GB/s, Time = 0.00001 s, Size = 70000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 10.6 GB/s, Time = 0.00001 s, Size = 80000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 10.9 GB/s, Time = 0.00001 s, Size = 90000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 11.1 GB/s, Time = 0.00001 s, Size = 100000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.0 GB/s, Time = 0.00002 s, Size = 200000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00002 s, Size = 300000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.6 GB/s, Time = 0.00003 s, Size = 400000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.6 GB/s, Time = 0.00004 s, Size = 500000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.7 GB/s, Time = 0.00005 s, Size = 600000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.7 GB/s, Time = 0.00006 s, Size = 700000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.8 GB/s, Time = 0.00006 s, Size = 800000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.9 GB/s, Time = 0.00007 s, Size = 900000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.8 GB/s, Time = 0.00008 s, Size = 1000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.0 GB/s, Time = 0.00015 s, Size = 2000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.0 GB/s, Time = 0.00023 s, Size = 3000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00031 s, Size = 4000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00038 s, Size = 5000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00046 s, Size = 6000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00053 s, Size = 7000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00061 s, Size = 8000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.5 GB/s, Time = 0.00072 s, Size = 9000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00076 s, Size = 10000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00084 s, Size = 11000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00091 s, Size = 12000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00099 s, Size = 13000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00106 s, Size = 14000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00114 s, Size = 15000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00122 s, Size = 16000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00137 s, Size = 18000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00152 s, Size = 20000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00167 s, Size = 22000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00183 s, Size = 24000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 12.9 GB/s, Time = 0.00202 s, Size = 26000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00213 s, Size = 28000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00228 s, Size = 30000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00243 s, Size = 32000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00273 s, Size = 36000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00304 s, Size = 40000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00334 s, Size = 44000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00364 s, Size = 48000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00395 s, Size = 52000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00425 s, Size = 56000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00455 s, Size = 60000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00487 s, Size = 64000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00520 s, Size = 68000000 bytes, NumDevsUsed = 1
|
|
||||||
Result = PASS
|
|
||||||
"""
|
|
||||||
raw_output[2] = """
|
|
||||||
[CUDA Bandwidth Test] - Starting...
|
|
||||||
Running on...
|
|
||||||
|
|
||||||
Device 0: Tesla V100-PCIE-32GB
|
|
||||||
Shmoo Mode
|
|
||||||
|
|
||||||
.................................................................................
|
|
||||||
bandwidthTest-D2D, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 0.1 GB/s, Time = 0.00004 s, Size = 2000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 0.8 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 1.2 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 0.4 GB/s, Time = 0.00001 s, Size = 5000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 1.7 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 7.0 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 8.0 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 9.0 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 10.0 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 6.1 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 12.0 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 13.1 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 8.0 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 8.9 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 9.5 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 9.8 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 19.0 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 22.0 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 6.3 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 0.7 GB/s, Time = 0.00004 s, Size = 26000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 28.1 GB/s, Time = 0.00000 s, Size = 28000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 30.1 GB/s, Time = 0.00000 s, Size = 30000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 32.0 GB/s, Time = 0.00000 s, Size = 32000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 14.6 GB/s, Time = 0.00000 s, Size = 34000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 20.9 GB/s, Time = 0.00000 s, Size = 36000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 22.7 GB/s, Time = 0.00000 s, Size = 38000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 23.5 GB/s, Time = 0.00000 s, Size = 40000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 24.8 GB/s, Time = 0.00000 s, Size = 42000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 44.1 GB/s, Time = 0.00000 s, Size = 44000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 27.2 GB/s, Time = 0.00000 s, Size = 46000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 48.0 GB/s, Time = 0.00000 s, Size = 48000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 28.5 GB/s, Time = 0.00000 s, Size = 50000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 60.2 GB/s, Time = 0.00000 s, Size = 60000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 42.7 GB/s, Time = 0.00000 s, Size = 70000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 8.4 GB/s, Time = 0.00001 s, Size = 80000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 55.6 GB/s, Time = 0.00000 s, Size = 90000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 59.6 GB/s, Time = 0.00000 s, Size = 100000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 127.9 GB/s, Time = 0.00000 s, Size = 200000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 183.1 GB/s, Time = 0.00000 s, Size = 300000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 270.2 GB/s, Time = 0.00000 s, Size = 400000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 15.5 GB/s, Time = 0.00003 s, Size = 500000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 399.2 GB/s, Time = 0.00000 s, Size = 600000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 172.1 GB/s, Time = 0.00000 s, Size = 700000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 27.5 GB/s, Time = 0.00003 s, Size = 800000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 71.3 GB/s, Time = 0.00001 s, Size = 900000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 502.2 GB/s, Time = 0.00000 s, Size = 1000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 59.4 GB/s, Time = 0.00003 s, Size = 2000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 348.7 GB/s, Time = 0.00001 s, Size = 3000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 519.4 GB/s, Time = 0.00001 s, Size = 4000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 422.3 GB/s, Time = 0.00001 s, Size = 5000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 447.9 GB/s, Time = 0.00001 s, Size = 6000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 225.3 GB/s, Time = 0.00003 s, Size = 7000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 146.0 GB/s, Time = 0.00005 s, Size = 8000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 190.9 GB/s, Time = 0.00005 s, Size = 9000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 301.1 GB/s, Time = 0.00003 s, Size = 10000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 192.8 GB/s, Time = 0.00006 s, Size = 11000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 243.9 GB/s, Time = 0.00005 s, Size = 12000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 328.7 GB/s, Time = 0.00004 s, Size = 13000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 621.2 GB/s, Time = 0.00002 s, Size = 14000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 682.5 GB/s, Time = 0.00002 s, Size = 15000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 686.3 GB/s, Time = 0.00002 s, Size = 16000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 693.1 GB/s, Time = 0.00003 s, Size = 18000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 707.0 GB/s, Time = 0.00003 s, Size = 20000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 714.4 GB/s, Time = 0.00003 s, Size = 22000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 719.4 GB/s, Time = 0.00003 s, Size = 24000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 723.2 GB/s, Time = 0.00004 s, Size = 26000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 726.7 GB/s, Time = 0.00004 s, Size = 28000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 728.8 GB/s, Time = 0.00004 s, Size = 30000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 724.2 GB/s, Time = 0.00004 s, Size = 32000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 735.3 GB/s, Time = 0.00005 s, Size = 36000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 741.1 GB/s, Time = 0.00005 s, Size = 40000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 748.9 GB/s, Time = 0.00006 s, Size = 44000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 748.9 GB/s, Time = 0.00006 s, Size = 48000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 754.1 GB/s, Time = 0.00007 s, Size = 52000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 757.4 GB/s, Time = 0.00007 s, Size = 56000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 758.5 GB/s, Time = 0.00008 s, Size = 60000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 772.0 GB/s, Time = 0.00008 s, Size = 64000000 bytes, NumDevsUsed = 1
|
|
||||||
bandwidthTest-D2D, Bandwidth = 762.8 GB/s, Time = 0.00009 s, Size = 68000000 bytes, NumDevsUsed = 1
|
|
||||||
Result = PASS
|
|
||||||
"""
|
|
||||||
         for i, metric in enumerate(['h2d_bw', 'd2h_bw', 'd2d_bw']):
             assert (benchmark._process_raw_result(i, raw_output[i]))
             assert (metric in benchmark.result)
||||||
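The fixture strings in this test are `bandwidthTest` shmoo-mode logs that get passed to `benchmark._process_raw_result`. The parser itself is not part of this diff. As a rough illustration of what reading such a log involves, here is a minimal sketch; `parse_bandwidth`, `peak_bandwidth`, and the regular expression are assumptions made for this example and are not SuperBench's implementation.

```python
import re

# Matches lines shaped like the fixture rows, for example:
# bandwidthTest-D2D, Bandwidth = 762.8 GB/s, Time = 0.00009 s, Size = 68000000 bytes, NumDevsUsed = 1
LINE_RE = re.compile(
    r'^bandwidthTest-(?P<mode>[\w-]+), Bandwidth = (?P<bw>[\d.]+) GB/s, '
    r'Time = (?P<time>[\d.]+) s, Size = (?P<size>\d+) bytes'
)


def parse_bandwidth(raw_output):
    """Return (mode, size_bytes, bandwidth_gbps) tuples for every result row (hypothetical helper)."""
    rows = []
    for line in raw_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match['mode'], int(match['size']), float(match['bw'])))
    return rows


def peak_bandwidth(raw_output):
    """Return the largest bandwidth in the log, or None when no row matches (hypothetical helper)."""
    rows = parse_bandwidth(raw_output)
    return max(bw for _, _, bw in rows) if rows else None
```

Lines such as `Running on...` or `Result = PASS` do not match the pattern and are skipped, so an empty or malformed log yields `None` rather than raising.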
|
|
|
@@ -3,36 +3,41 @@
 
 """Tests for nccl-bw benchmark."""
 
-import os
 import numbers
 import unittest
-from pathlib import Path
 
+from tests.helper import decorator
+from tests.helper.testcase import BenchmarkTestCase
 from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
 
 
-class CudaNcclBwBenchmarkTest(unittest.TestCase):
+class CudaNcclBwBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
     """Tests for CudaNcclBwBenchmark benchmark."""
-    def setUp(self):
-        """Method called to prepare the test fixture."""
-        # Create fake binary file just for testing.
-        os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
-        binary_path = os.path.join(os.getenv('SB_MICRO_PATH'), 'bin')
-        Path(binary_path).mkdir(parents=True, exist_ok=True)
-        self.__binary_files = []
-        for bin_name in [
-            'all_reduce_perf', 'all_gather_perf', 'broadcast_perf', 'reduce_perf', 'reduce_scatter_perf',
-            'alltoall_perf'
-        ]:
-            self.__binary_files.append(Path(binary_path, bin_name))
-            Path(binary_path, bin_name).touch(mode=0o755, exist_ok=True)
+    @classmethod
+    def setUpClass(cls):
+        """Hook method for setting up class fixture before running tests in the class."""
+        super().setUpClass()
+        cls.createMockEnvs(cls)
+        cls.createMockFiles(
+            cls, [
+                f'bin/{name}' for name in [
+                    'all_reduce_perf',
+                    'all_gather_perf',
+                    'broadcast_perf',
+                    'reduce_perf',
+                    'reduce_scatter_perf',
+                    'alltoall_perf',
+                ]
+            ]
+        )
 
-    def tearDown(self):
-        """Method called after the test method has been called and the result recorded."""
-        for binary_file in self.__binary_files:
-            binary_file.unlink()
-
-    def test_nccl_bw_performance(self):
+    @decorator.load_data('tests/data/nccl_allgather.log')
+    @decorator.load_data('tests/data/nccl_allreduce.log')
+    @decorator.load_data('tests/data/nccl_reduce.log')
+    @decorator.load_data('tests/data/nccl_broadcast.log')
+    @decorator.load_data('tests/data/nccl_reducescatter.log')
+    @decorator.load_data('tests/data/nccl_alltoall.log')
+    def test_nccl_bw_performance(self, allgather, allreduce, reduce, broadcast, reducescatter, alltoall):
         """Test nccl-bw benchmark."""
         benchmark_name = 'nccl-bw'
         (benchmark_class,
|
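The stacked `@decorator.load_data(...)` lines move the inline NCCL logs into files under `tests/data/` and hand their contents to the test as extra parameters. The helper in `tests/helper/decorator.py` is not shown in this diff. The sketch below is a local stand-in with the same calling shape, assuming the helper reads the file and appends its text as one more positional argument; the name `load_data` and its body here are illustrative only.

```python
import functools
from pathlib import Path


def load_data(filepath):
    """Hypothetical stand-in for tests.helper.decorator.load_data (assumed behavior)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Read the fixture at call time and append it after the existing positional arguments.
            data = Path(filepath).read_text()
            return func(*args, data, **kwargs)

        return wrapper

    return decorator
```

Under this append assumption the topmost decorator supplies the first extra parameter, which is consistent with the order `allgather, allreduce, reduce, broadcast, reducescatter, alltoall` in the new test signature.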
@@ -75,336 +80,14 @@ class CudaNcclBwBenchmarkTest(unittest.TestCase):
|
||||||
assert (benchmark._process_raw_result(0, '') is False)
|
assert (benchmark._process_raw_result(0, '') is False)
|
||||||
|
|
||||||
# Case with valid raw_output
|
# Case with valid raw_output
|
||||||
raw_output = {}
|
raw_output = {
|
||||||
raw_output['allgather'] = """
|
'allgather': allgather,
|
||||||
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
'allreduce': allreduce,
|
||||||
#
|
'reduce': reduce,
|
||||||
# Using devices
|
'broadcast': broadcast,
|
||||||
# Rank 0 Pid 112372 on localhost device 0 [0x00] A100-SXM4-40GB
|
'reducescatter': reducescatter,
|
||||||
# Rank 1 Pid 112372 on localhost device 1 [0x00] A100-SXM4-40GB
|
'alltoall': alltoall,
|
||||||
# Rank 2 Pid 112372 on localhost device 2 [0x00] A100-SXM4-40GB
|
}
|
||||||
# Rank 3 Pid 112372 on localhost device 3 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 4 Pid 112372 on localhost device 4 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 5 Pid 112372 on localhost device 5 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 6 Pid 112372 on localhost device 6 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 7 Pid 112372 on localhost device 7 [0x00] A100-SXM4-40GB
|
|
||||||
#
|
|
||||||
# out-of-place in-place
|
|
||||||
# size count type time algbw busbw error time algbw busbw error
|
|
||||||
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
|
||||||
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
|
||||||
0 0 float 34.27 0.00 0.00 N/A 33.57 0.00 0.00 N/A
|
|
||||||
0 0 float 33.41 0.00 0.00 N/A 33.62 0.00 0.00 N/A
|
|
||||||
0 0 float 33.94 0.00 0.00 N/A 33.48 0.00 0.00 N/A
|
|
||||||
0 0 float 33.83 0.00 0.00 N/A 33.62 0.00 0.00 N/A
|
|
||||||
0 0 float 33.82 0.00 0.00 N/A 33.57 0.00 0.00 N/A
|
|
||||||
32 1 float 35.03 0.00 0.00 N/A 34.15 0.00 0.00 N/A
|
|
||||||
64 2 float 34.36 0.00 0.00 N/A 33.83 0.00 0.00 N/A
|
|
||||||
128 4 float 33.94 0.00 0.00 N/A 35.22 0.00 0.00 N/A
|
|
||||||
256 8 float 34.44 0.01 0.01 N/A 34.82 0.01 0.01 N/A
|
|
||||||
512 16 float 34.84 0.01 0.01 N/A 34.76 0.01 0.01 N/A
|
|
||||||
1024 32 float 35.38 0.03 0.03 N/A 34.53 0.03 0.03 N/A
|
|
||||||
2048 64 float 34.67 0.06 0.05 N/A 34.91 0.06 0.05 N/A
|
|
||||||
4096 128 float 34.62 0.12 0.10 N/A 34.81 0.12 0.10 N/A
|
|
||||||
8192 256 float 34.76 0.24 0.21 N/A 35.03 0.23 0.20 N/A
|
|
||||||
16384 512 float 34.80 0.47 0.41 N/A 34.90 0.47 0.41 N/A
|
|
||||||
32768 1024 float 34.54 0.95 0.83 N/A 35.23 0.93 0.81 N/A
|
|
||||||
65536 2048 float 36.34 1.80 1.58 N/A 36.01 1.82 1.59 N/A
|
|
||||||
131072 4096 float 40.18 3.26 2.85 N/A 39.43 3.32 2.91 N/A
|
|
||||||
262144 8192 float 46.45 5.64 4.94 N/A 46.27 5.67 4.96 N/A
|
|
||||||
524288 16384 float 58.48 8.96 7.84 N/A 60.40 8.68 7.60 N/A
|
|
||||||
1048576 32768 float 72.95 14.37 12.58 N/A 73.07 14.35 12.56 N/A
|
|
||||||
2097152 65536 float 77.28 27.14 23.75 N/A 75.84 27.65 24.20 N/A
|
|
||||||
4194304 131072 float 100.7 41.64 36.43 N/A 99.56 42.13 36.86 N/A
|
|
||||||
8388608 262144 float 123.5 67.94 59.44 N/A 120.7 69.51 60.82 N/A
|
|
||||||
16777216 524288 float 167.7 100.03 87.52 N/A 164.6 101.94 89.20 N/A
|
|
||||||
33554432 1048576 float 265.8 126.24 110.46 N/A 257.5 130.33 114.04 N/A
|
|
||||||
67108864 2097152 float 379.7 176.74 154.65 N/A 367.6 182.57 159.75 N/A
|
|
||||||
134217728 4194304 float 698.6 192.13 168.12 N/A 657.3 204.20 178.67 N/A
|
|
||||||
268435456 8388608 float 1192.2 225.16 197.01 N/A 1136.0 236.29 206.76 N/A
|
|
||||||
536870912 16777216 float 2304.1 233.01 203.88 N/A 2227.9 240.98 210.85 N/A
|
|
||||||
1073741824 33554432 float 4413.4 243.29 212.88 N/A 4258.8 252.12 220.61 N/A
|
|
||||||
2147483648 67108864 float 8658.8 248.01 217.01 N/A 8389.4 255.98 223.98 N/A
|
|
||||||
4294967296 134217728 float 17016 252.40 220.85 N/A 16474 260.71 228.12 N/A
|
|
||||||
8589934592 268435456 float 33646 255.31 223.39 N/A 32669 262.94 230.07 N/A
|
|
||||||
# Out of bounds values : 0 OK
|
|
||||||
# Avg bus bandwidth : 58.2651
|
|
||||||
#
|
|
||||||
"""
|
|
||||||
raw_output['allreduce'] = """
|
|
||||||
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
|
||||||
#
|
|
||||||
# Using devices
|
|
||||||
# Rank 0 Pid 112424 on localhost device 0 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 1 Pid 112424 on localhost device 1 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 2 Pid 112424 on localhost device 2 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 3 Pid 112424 on localhost device 3 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 4 Pid 112424 on localhost device 4 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 5 Pid 112424 on localhost device 5 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 6 Pid 112424 on localhost device 6 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 7 Pid 112424 on localhost device 7 [0x00] A100-SXM4-40GB
|
|
||||||
#
|
|
||||||
# out-of-place in-place
|
|
||||||
# size count type redop time algbw busbw error time algbw busbw error
|
|
||||||
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
|
||||||
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
|
||||||
0 0 float sum 35.20 0.00 0.00 N/A 34.05 0.00 0.00 N/A
|
|
||||||
0 0 float sum 34.18 0.00 0.00 N/A 33.50 0.00 0.00 N/A
|
|
||||||
4 1 float sum 34.73 0.00 0.00 N/A 35.30 0.00 0.00 N/A
|
|
||||||
8 2 float sum 34.66 0.00 0.00 N/A 34.84 0.00 0.00 N/A
|
|
||||||
16 4 float sum 35.00 0.00 0.00 N/A 35.61 0.00 0.00 N/A
|
|
||||||
32 8 float sum 35.60 0.00 0.00 N/A 35.27 0.00 0.00 N/A
|
|
||||||
64 16 float sum 34.83 0.00 0.00 N/A 34.61 0.00 0.00 N/A
|
|
||||||
128 32 float sum 34.53 0.00 0.01 N/A 43.78 0.00 0.01 N/A
|
|
||||||
256 64 float sum 34.56 0.01 0.01 N/A 34.95 0.01 0.01 N/A
|
|
||||||
512 128 float sum 34.94 0.01 0.03 N/A 35.20 0.01 0.03 N/A
|
|
||||||
1024 256 float sum 36.07 0.03 0.05 N/A 35.77 0.03 0.05 N/A
|
|
||||||
2048 512 float sum 35.42 0.06 0.10 N/A 35.89 0.06 0.10 N/A
|
|
||||||
4096 1024 float sum 35.92 0.11 0.20 N/A 36.11 0.11 0.20 N/A
|
|
||||||
8192 2048 float sum 35.91 0.23 0.40 N/A 36.07 0.23 0.40 N/A
|
|
||||||
16384 4096 float sum 36.18 0.45 0.79 N/A 35.87 0.46 0.80 N/A
|
|
||||||
32768 8192 float sum 36.65 0.89 1.56 N/A 35.73 0.92 1.60 N/A
|
|
||||||
65536 16384 float sum 37.82 1.73 3.03 N/A 37.25 1.76 3.08 N/A
|
|
||||||
131072 32768 float sum 41.19 3.18 5.57 N/A 41.11 3.19 5.58 N/A
|
|
||||||
262144 65536 float sum 47.53 5.52 9.65 N/A 47.94 5.47 9.57 N/A
|
|
||||||
524288 131072 float sum 60.32 8.69 15.21 N/A 60.52 8.66 15.16 N/A
|
|
||||||
1048576 262144 float sum 74.78 14.02 24.54 N/A 76.17 13.77 24.09 N/A
|
|
||||||
2097152 524288 float sum 93.48 22.43 39.26 N/A 96.10 21.82 38.19 N/A
|
|
||||||
4194304 1048576 float sum 112.0 37.44 65.52 N/A 110.2 38.06 66.60 N/A
|
|
||||||
8388608 2097152 float sum 162.0 51.79 90.63 N/A 160.0 52.44 91.77 N/A
|
|
||||||
16777216 4194304 float sum 226.0 74.23 129.90 N/A 225.0 74.57 130.49 N/A
|
|
||||||
33554432 8388608 float sum 374.3 89.65 156.89 N/A 372.8 90.00 157.50 N/A
|
|
||||||
67108864 16777216 float sum 584.5 114.81 200.91 N/A 581.9 115.33 201.82 N/A
|
|
||||||
134217728 33554432 float sum 1162.2 115.49 202.11 N/A 1162.5 115.46 202.05 N/A
|
|
||||||
268435456 67108864 float sum 2112.2 127.09 222.40 N/A 2111.8 127.11 222.45 N/A
|
|
||||||
536870912 134217728 float sum 4200.3 127.82 223.68 N/A 4184.0 128.32 224.55 N/A
|
|
||||||
1073741824 268435456 float sum 8159.5 131.59 230.29 N/A 8176.5 131.32 229.81 N/A
|
|
||||||
2147483648 536870912 float sum 16215 132.44 231.76 N/A 16203 132.53 231.93 N/A
|
|
||||||
4294967296 1073741824 float sum 32070 133.92 234.37 N/A 32052 134.00 234.50 N/A
|
|
||||||
8589934592 2147483648 float sum 63896 134.44 235.26 N/A 63959 134.30 235.03 N/A
|
|
||||||
# Out of bounds values : 0 OK
|
|
||||||
# Avg bus bandwidth : 68.4048
|
|
||||||
#
|
|
||||||
"""
|
|
||||||
raw_output['reduce'] = """
|
|
||||||
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
|
||||||
#
|
|
||||||
# Using devices
|
|
||||||
# Rank 0 Pid 112476 on localhost device 0 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 1 Pid 112476 on localhost device 1 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 2 Pid 112476 on localhost device 2 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 3 Pid 112476 on localhost device 3 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 4 Pid 112476 on localhost device 4 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 5 Pid 112476 on localhost device 5 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 6 Pid 112476 on localhost device 6 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 7 Pid 112476 on localhost device 7 [0x00] A100-SXM4-40GB
|
|
||||||
#
|
|
||||||
# out-of-place in-place
|
|
||||||
# size count type redop root time algbw busbw error time algbw busbw error
|
|
||||||
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
|
||||||
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
|
||||||
0 0 float sum 0 36.90 0.00 0.00 N/A 36.47 0.00 0.00 N/A
|
|
||||||
0 0 float sum 0 34.18 0.00 0.00 N/A 35.70 0.00 0.00 N/A
|
|
||||||
4 1 float sum 0 35.40 0.00 0.00 N/A 35.59 0.00 0.00 N/A
|
|
||||||
8 2 float sum 0 36.35 0.00 0.00 N/A 35.74 0.00 0.00 N/A
|
|
||||||
16 4 float sum 0 35.47 0.00 0.00 N/A 34.27 0.00 0.00 N/A
|
|
||||||
32 8 float sum 0 36.16 0.00 0.00 N/A 36.19 0.00 0.00 N/A
|
|
||||||
64 16 float sum 0 35.61 0.00 0.00 N/A 35.45 0.00 0.00 N/A
|
|
||||||
128 32 float sum 0 34.78 0.00 0.00 N/A 35.80 0.00 0.00 N/A
|
|
||||||
256 64 float sum 0 35.37 0.01 0.01 N/A 35.89 0.01 0.01 N/A
|
|
||||||
512 128 float sum 0 35.49 0.01 0.01 N/A 35.53 0.01 0.01 N/A
|
|
||||||
1024 256 float sum 0 35.38 0.03 0.03 N/A 35.52 0.03 0.03 N/A
|
|
||||||
2048 512 float sum 0 35.97 0.06 0.06 N/A 35.13 0.06 0.06 N/A
|
|
||||||
4096 1024 float sum 0 36.03 0.11 0.11 N/A 35.82 0.11 0.11 N/A
|
|
||||||
8192 2048 float sum 0 36.80 0.22 0.22 N/A 36.71 0.22 0.22 N/A
|
|
||||||
16384 4096 float sum 0 35.37 0.46 0.46 N/A 36.79 0.45 0.45 N/A
|
|
||||||
32768 8192 float sum 0 35.16 0.93 0.93 N/A 35.72 0.92 0.92 N/A
|
|
||||||
65536 16384 float sum 0 38.08 1.72 1.72 N/A 37.74 1.74 1.74 N/A
|
|
||||||
131072 32768 float sum 0 43.07 3.04 3.04 N/A 41.59 3.15 3.15 N/A
|
|
||||||
262144 65536 float sum 0 52.16 5.03 5.03 N/A 50.49 5.19 5.19 N/A
|
|
||||||
524288 131072 float sum 0 67.58 7.76 7.76 N/A 66.57 7.88 7.88 N/A
|
|
||||||
1048576 262144 float sum 0 76.74 13.66 13.66 N/A 80.47 13.03 13.03 N/A
|
|
||||||
2097152 524288 float sum 0 78.51 26.71 26.71 N/A 78.76 26.63 26.63 N/A
|
|
||||||
4194304 1048576 float sum 0 81.47 51.48 51.48 N/A 80.30 52.23 52.23 N/A
|
|
||||||
8388608 2097152 float sum 0 94.72 88.57 88.57 N/A 94.06 89.19 89.19 N/A
|
|
||||||
16777216 4194304 float sum 0 137.7 121.83 121.83 N/A 139.6 120.17 120.17 N/A
|
|
||||||
33554432 8388608 float sum 0 218.3 153.70 153.70 N/A 218.1 153.83 153.83 N/A
|
|
||||||
67108864 16777216 float sum 0 370.8 180.96 180.96 N/A 369.8 181.49 181.49 N/A
|
|
||||||
134217728 33554432 float sum 0 661.0 203.06 203.06 N/A 659.9 203.39 203.39 N/A
|
|
||||||
268435456 67108864 float sum 0 1251.4 214.52 214.52 N/A 1268.1 211.68 211.68 N/A
|
|
||||||
536870912 134217728 float sum 0 2421.6 221.70 221.70 N/A 2413.4 222.45 222.45 N/A
|
|
||||||
1073741824 268435456 float sum 0 4736.0 226.72 226.72 N/A 4757.9 225.68 225.68 N/A
|
|
||||||
2147483648 536870912 float sum 0 9323.5 230.33 230.33 N/A 9354.0 229.58 229.58 N/A
|
|
||||||
4294967296 1073741824 float sum 0 18594 230.99 230.99 N/A 18570 231.28 231.28 N/A
|
|
||||||
8589934592 2147483648 float sum 0 37613 228.38 228.38 N/A 37539 228.83 228.83 N/A
|
|
||||||
# Out of bounds values : 0 OK
|
|
||||||
# Avg bus bandwidth : 65.018
|
|
||||||
#
|
|
||||||
"""
|
|
||||||
raw_output['broadcast'] = """
|
|
||||||
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
|
||||||
#
|
|
||||||
# Using devices
|
|
||||||
# Rank 0 Pid 112528 on localhost device 0 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 1 Pid 112528 on localhost device 1 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 2 Pid 112528 on localhost device 2 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 3 Pid 112528 on localhost device 3 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 4 Pid 112528 on localhost device 4 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 5 Pid 112528 on localhost device 5 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 6 Pid 112528 on localhost device 6 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 7 Pid 112528 on localhost device 7 [0x00] A100-SXM4-40GB
|
|
||||||
#
|
|
||||||
# out-of-place in-place
|
|
||||||
# size count type root time algbw busbw error time algbw busbw error
|
|
||||||
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
|
||||||
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
|
||||||
0 0 float 0 34.61 0.00 0.00 N/A 34.33 0.00 0.00 N/A
|
|
||||||
0 0 float 0 34.43 0.00 0.00 N/A 35.06 0.00 0.00 N/A
|
|
||||||
4 1 float 0 33.96 0.00 0.00 N/A 33.80 0.00 0.00 N/A
|
|
||||||
8 2 float 0 34.16 0.00 0.00 N/A 34.32 0.00 0.00 N/A
|
|
||||||
16 4 float 0 34.47 0.00 0.00 N/A 34.85 0.00 0.00 N/A
|
|
||||||
32 8 float 0 35.24 0.00 0.00 N/A 34.75 0.00 0.00 N/A
|
|
||||||
64 16 float 0 35.12 0.00 0.00 N/A 34.89 0.00 0.00 N/A
|
|
||||||
128 32 float 0 34.67 0.00 0.00 N/A 34.36 0.00 0.00 N/A
|
|
||||||
256 64 float 0 34.23 0.01 0.01 N/A 34.42 0.01 0.01 N/A
|
|
||||||
512 128 float 0 34.26 0.01 0.01 N/A 35.20 0.01 0.01 N/A
|
|
||||||
1024 256 float 0 34.87 0.03 0.03 N/A 34.80 0.03 0.03 N/A
|
|
||||||
2048 512 float 0 34.90 0.06 0.06 N/A 35.27 0.06 0.06 N/A
|
|
||||||
4096 1024 float 0 35.37 0.12 0.12 N/A 34.59 0.12 0.12 N/A
|
|
||||||
8192 2048 float 0 34.95 0.23 0.23 N/A 34.79 0.24 0.24 N/A
|
|
||||||
16384 4096 float 0 34.94 0.47 0.47 N/A 34.94 0.47 0.47 N/A
|
|
||||||
32768 8192 float 0 35.03 0.94 0.94 N/A 34.71 0.94 0.94 N/A
|
|
||||||
65536 16384 float 0 36.04 1.82 1.82 N/A 36.48 1.80 1.80 N/A
|
|
||||||
131072 32768 float 0 40.09 3.27 3.27 N/A 39.92 3.28 3.28 N/A
|
|
||||||
262144 65536 float 0 46.58 5.63 5.63 N/A 45.89 5.71 5.71 N/A
|
|
||||||
524288 131072 float 0 58.37 8.98 8.98 N/A 59.67 8.79 8.79 N/A
|
|
||||||
1048576 262144 float 0 76.02 13.79 13.79 N/A 78.43 13.37 13.37 N/A
|
|
||||||
2097152 524288 float 0 78.12 26.85 26.85 N/A 78.84 26.60 26.60 N/A
|
|
||||||
4194304 1048576 float 0 81.06 51.74 51.74 N/A 80.39 52.17 52.17 N/A
|
|
||||||
8388608 2097152 float 0 97.20 86.30 86.30 N/A 96.09 87.30 87.30 N/A
|
|
||||||
16777216 4194304 float 0 143.1 117.22 117.22 N/A 142.1 118.06 118.06 N/A
|
|
||||||
33554432 8388608 float 0 223.4 150.21 150.21 N/A 221.3 151.61 151.61 N/A
|
|
||||||
67108864 16777216 float 0 374.8 179.05 179.05 N/A 374.4 179.23 179.23 N/A
|
|
||||||
134217728 33554432 float 0 672.2 199.67 199.67 N/A 670.0 200.34 200.34 N/A
|
|
||||||
268435456 67108864 float 0 1271.5 211.11 211.11 N/A 1264.5 212.28 212.28 N/A
|
|
||||||
536870912 134217728 float 0 2436.3 220.37 220.37 N/A 2434.5 220.53 220.53 N/A
|
|
||||||
1073741824 268435456 float 0 4769.2 225.14 225.14 N/A 4697.5 228.58 228.58 N/A
|
|
||||||
2147483648 536870912 float 0 9314.2 230.56 230.56 N/A 9248.3 232.20 232.20 N/A
|
|
||||||
4294967296 1073741824 float 0 18487 232.33 232.33 N/A 18381 233.66 233.66 N/A
|
|
||||||
8589934592 2147483648 float 0 36896 232.81 232.81 N/A 36599 234.70 234.70 N/A
|
|
||||||
# Out of bounds values : 0 OK
|
|
||||||
# Avg bus bandwidth : 64.8653
|
|
||||||
#
|
|
||||||
"""
|
|
||||||
raw_output['reducescatter'] = """
|
|
||||||
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
|
||||||
#
|
|
||||||
# Using devices
|
|
||||||
# Rank 0 Pid 112580 on localhost device 0 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 1 Pid 112580 on localhost device 1 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 2 Pid 112580 on localhost device 2 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 3 Pid 112580 on localhost device 3 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 4 Pid 112580 on localhost device 4 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 5 Pid 112580 on localhost device 5 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 6 Pid 112580 on localhost device 6 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 7 Pid 112580 on localhost device 7 [0x00] A100-SXM4-40GB
|
|
||||||
#
|
|
||||||
# out-of-place in-place
|
|
||||||
# size count type redop time algbw busbw error time algbw busbw error
|
|
||||||
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
|
||||||
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
|
||||||
0 0 float sum 34.88 0.00 0.00 N/A 33.65 0.00 0.00 N/A
|
|
||||||
0 0 float sum 33.54 0.00 0.00 N/A 33.72 0.00 0.00 N/A
|
|
||||||
0 0 float sum 33.45 0.00 0.00 N/A 33.44 0.00 0.00 N/A
|
|
||||||
0 0 float sum 34.07 0.00 0.00 N/A 33.44 0.00 0.00 N/A
|
|
||||||
0 0 float sum 33.55 0.00 0.00 N/A 33.43 0.00 0.00 N/A
|
|
||||||
32 1 float sum 35.06 0.00 0.00 N/A 35.14 0.00 0.00 N/A
|
|
||||||
64 2 float sum 34.82 0.00 0.00 N/A 34.76 0.00 0.00 N/A
|
|
||||||
128 4 float sum 34.38 0.00 0.00 N/A 34.52 0.00 0.00 N/A
|
|
||||||
256 8 float sum 34.75 0.01 0.01 N/A 34.32 0.01 0.01 N/A
|
|
||||||
512 16 float sum 34.71 0.01 0.01 N/A 35.43 0.01 0.01 N/A
|
|
||||||
1024 32 float sum 35.16 0.03 0.03 N/A 34.75 0.03 0.03 N/A
|
|
||||||
2048 64 float sum 35.43 0.06 0.05 N/A 35.29 0.06 0.05 N/A
|
|
||||||
4096 128 float sum 35.49 0.12 0.10 N/A 35.17 0.12 0.10 N/A
|
|
||||||
8192 256 float sum 35.18 0.23 0.20 N/A 35.77 0.23 0.20 N/A
|
|
||||||
16384 512 float sum 35.27 0.46 0.41 N/A 35.49 0.46 0.40 N/A
|
|
||||||
32768 1024 float sum 35.00 0.94 0.82 N/A 35.09 0.93 0.82 N/A
|
|
||||||
65536 2048 float sum 36.78 1.78 1.56 N/A 36.92 1.77 1.55 N/A
|
|
||||||
131072 4096 float sum 40.71 3.22 2.82 N/A 39.78 3.29 2.88 N/A
|
|
||||||
262144 8192 float sum 48.12 5.45 4.77 N/A 46.65 5.62 4.92 N/A
|
|
||||||
524288 16384 float sum 59.81 8.77 7.67 N/A 58.88 8.90 7.79 N/A
|
|
||||||
1048576 32768 float sum 72.37 14.49 12.68 N/A 74.95 13.99 12.24 N/A
|
|
||||||
2097152 65536 float sum 80.64 26.01 22.76 N/A 79.62 26.34 23.05 N/A
|
|
||||||
4194304 131072 float sum 108.9 38.53 33.72 N/A 109.3 38.37 33.57 N/A
|
|
||||||
8388608 262144 float sum 147.3 56.96 49.84 N/A 166.8 50.28 44.00 N/A
|
|
||||||
16777216 524288 float sum 152.4 110.11 96.34 N/A 152.8 109.82 96.09 N/A
|
|
||||||
33554432 1048576 float sum 240.5 139.50 122.06 N/A 240.8 139.33 121.91 N/A
|
|
||||||
67108864 2097152 float sum 356.1 188.45 164.89 N/A 352.1 190.57 166.75 N/A
|
|
||||||
134217728 4194304 float sum 618.1 217.15 190.01 N/A 615.2 218.18 190.90 N/A
|
|
||||||
268435456 8388608 float sum 1108.7 242.11 211.84 N/A 1112.6 241.27 211.11 N/A
|
|
||||||
536870912 16777216 float sum 2169.0 247.52 216.58 N/A 2181.8 246.07 215.31 N/A
|
|
||||||
1073741824 33554432 float sum 4203.0 255.47 223.54 N/A 4206.3 255.27 223.36 N/A
|
|
||||||
2147483648 67108864 float sum 8356.9 256.97 224.85 N/A 8323.5 258.00 225.75 N/A
|
|
||||||
4294967296 134217728 float sum 16400 261.89 229.15 N/A 16402 261.86 229.13 N/A
|
|
||||||
8589934592 268435456 float sum 32464 264.60 231.52 N/A 32502 264.29 231.25 N/A
|
|
||||||
# Out of bounds values : 0 OK
|
|
||||||
# Avg bus bandwidth : 60.168
|
|
||||||
#
|
|
||||||
"""
|
|
||||||
raw_output['alltoall'] = """
|
|
||||||
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
|
||||||
#
|
|
||||||
# Using devices
|
|
||||||
# Rank 0 Pid 167261 on localhost device 0 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 1 Pid 167261 on localhost device 1 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 2 Pid 167261 on localhost device 2 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 3 Pid 167261 on localhost device 3 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 4 Pid 167261 on localhost device 4 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 5 Pid 167261 on localhost device 5 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 6 Pid 167261 on localhost device 6 [0x00] A100-SXM4-40GB
|
|
||||||
# Rank 7 Pid 167261 on localhost device 7 [0x00] A100-SXM4-40GB
|
|
||||||
#
|
|
||||||
# out-of-place in-place
|
|
||||||
# size count type redop time algbw busbw error time algbw busbw error
|
|
||||||
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
|
||||||
0 0 float 1.63 0.00 0.00 N/A 1.38 0.00 0.00 N/A
|
|
||||||
0 0 float 1.35 0.00 0.00 N/A 1.34 0.00 0.00 N/A
|
|
||||||
0 0 float 1.35 0.00 0.00 N/A 1.77 0.00 0.00 N/A
|
|
||||||
0 0 float 1.37 0.00 0.00 N/A 1.39 0.00 0.00 N/A
|
|
||||||
0 0 float 1.34 0.00 0.00 N/A 1.33 0.00 0.00 N/A
|
|
||||||
32 1 float 89.00 0.00 0.00 N/A 85.13 0.00 0.00 N/A
|
|
||||||
64 2 float 86.83 0.00 0.00 N/A 85.77 0.00 0.00 N/A
|
|
||||||
128 4 float 86.02 0.00 0.00 N/A 85.30 0.00 0.00 N/A
|
|
||||||
256 8 float 87.20 0.00 0.00 N/A 86.21 0.00 0.00 N/A
|
|
||||||
512 16 float 87.33 0.01 0.01 N/A 88.47 0.01 0.01 N/A
|
|
||||||
1024 32 float 88.17 0.01 0.01 N/A 88.98 0.01 0.01 N/A
|
|
||||||
2048 64 float 86.44 0.02 0.02 N/A 86.65 0.02 0.02 N/A
|
|
||||||
4096 128 float 86.75 0.05 0.04 N/A 86.68 0.05 0.04 N/A
|
|
||||||
8192 256 float 88.78 0.09 0.08 N/A 87.05 0.09 0.08 N/A
|
|
||||||
16384 512 float 87.71 0.19 0.16 N/A 86.76 0.19 0.17 N/A
|
|
||||||
32768 1024 float 86.26 0.38 0.33 N/A 88.92 0.37 0.32 N/A
|
|
||||||
65536 2048 float 87.67 0.75 0.65 N/A 89.16 0.74 0.64 N/A
|
|
||||||
131072 4096 float 87.35 1.50 1.31 N/A 86.76 1.51 1.32 N/A
|
|
||||||
262144 8192 float 87.02 3.01 2.64 N/A 87.98 2.98 2.61 N/A
|
|
||||||
524288 16384 float 86.58 6.06 5.30 N/A 89.33 5.87 5.14 N/A
|
|
||||||
1048576 32768 float 87.42 11.99 10.50 N/A 88.90 11.79 10.32 N/A
|
|
||||||
2097152 65536 float 89.61 23.40 20.48 N/A 90.10 23.27 20.37 N/A
|
|
||||||
4194304 131072 float 96.44 43.49 38.05 N/A 99.62 42.10 36.84 N/A
|
|
||||||
8388608 262144 float 121.1 69.28 60.62 N/A 120.6 69.56 60.87 N/A
|
|
||||||
16777216 524288 float 160.4 104.62 91.55 N/A 158.8 105.64 92.43 N/A
|
|
||||||
33554432 1048576 float 237.5 141.30 123.64 N/A 234.5 143.11 125.22 N/A
|
|
||||||
67108864 2097152 float 396.8 169.13 147.99 N/A 387.0 173.41 151.73 N/A
|
|
||||||
134217728 4194304 float 633.6 211.83 185.35 N/A 620.9 216.17 189.15 N/A
|
|
||||||
268435456 8388608 float 1189.1 225.75 197.53 N/A 1167.8 229.86 201.13 N/A
|
|
||||||
536870912 16777216 float 2236.6 240.04 210.03 N/A 2197.4 244.32 213.78 N/A
|
|
||||||
1073741824 33554432 float 4335.5 247.66 216.71 N/A 4274.2 251.22 219.81 N/A
|
|
||||||
2147483648 67108864 float 8510.4 252.34 220.79 N/A 8405.3 255.49 223.56 N/A
|
|
||||||
4294967296 134217728 float 16860 254.74 222.90 N/A 16678 257.53 225.34 N/A
|
|
||||||
8589934592 268435456 float 33508 256.36 224.31 N/A 33234 258.47 226.16 N/A
|
|
||||||
# Out of bounds values : 0 OK
|
|
||||||
# Avg bus bandwidth : 58.6481
|
|
||||||
|
|
||||||
"""
|
|
||||||
|
|
||||||
for op in raw_output.keys():
|
for op in raw_output.keys():
|
||||||
benchmark._args.operation = op
|
benchmark._args.operation = op
|
||||||
|
|
|
@@ -3,28 +3,22 @@
|
||||||
|
|
||||||
"""Tests for disk-performance benchmark."""
|
"""Tests for disk-performance benchmark."""
|
||||||
|
|
||||||
from pathlib import Path
|
|
||||||
from unittest import mock
|
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
|
from unittest import mock
|
||||||
|
|
||||||
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
||||||
|
|
||||||
|
|
||||||
class DiskBenchmarkTest(unittest.TestCase):
|
class DiskBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Test class for disk-performance benchmark."""
|
"""Test class for disk-performance benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
super().setUpClass()
|
||||||
binary_path = Path(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
binary_path.mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/fio'])
|
||||||
self.__binary_file = binary_path / 'fio'
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
|
||||||
self.__binary_file.unlink()
|
|
||||||
|
|
||||||
def test_disk_performance_empty_param(self):
|
def test_disk_performance_empty_param(self):
|
||||||
"""Test disk-performance benchmark command generation with empty parameter."""
|
"""Test disk-performance benchmark command generation with empty parameter."""
|
||||||
|
@@ -178,7 +172,8 @@ class DiskBenchmarkTest(unittest.TestCase):
|
||||||
assert ('--rwmixread=%d' % default_rwmixread in benchmark._commands[command_idx])
|
assert ('--rwmixread=%d' % default_rwmixread in benchmark._commands[command_idx])
|
||||||
command_idx += 1
|
command_idx += 1
|
||||||
|
|
||||||
def test_disk_performance_result_parsing(self):
|
@decorator.load_data('tests/data/disk_performance.log')
|
||||||
|
def test_disk_performance_result_parsing(self, test_raw_output):
|
||||||
"""Test disk-performance benchmark result parsing."""
|
"""Test disk-performance benchmark result parsing."""
|
||||||
benchmark_name = 'disk-benchmark'
|
benchmark_name = 'disk-benchmark'
|
||||||
(benchmark_class,
|
(benchmark_class,
|
||||||
|
@@ -193,317 +188,6 @@ class DiskBenchmarkTest(unittest.TestCase):
|
||||||
assert (benchmark.type == BenchmarkType.MICRO)
|
assert (benchmark.type == BenchmarkType.MICRO)
|
||||||
|
|
||||||
# Positive case - valid raw output.
|
# Positive case - valid raw output.
|
||||||
test_raw_output = """
|
|
||||||
{
|
|
||||||
"fio version" : "fio-3.16",
|
|
||||||
"timestamp" : 1626763278,
|
|
||||||
"timestamp_ms" : 1626763278577,
|
|
||||||
"time" : "Tue Jul 20 06:41:18 2021",
|
|
||||||
"global options" : {
|
|
||||||
"filename" : "/dev/nvme0n1",
|
|
||||||
"ramp_time" : "10s",
|
|
||||||
"runtime" : "30s",
|
|
||||||
"iodepth" : "64",
|
|
||||||
"numjobs" : "4",
|
|
||||||
"randrepeat" : "1",
|
|
||||||
"thread" : "1",
|
|
||||||
"ioengine" : "libaio",
|
|
||||||
"direct" : "1",
|
|
||||||
"norandommap" : "1",
|
|
||||||
"lat_percentiles" : "1",
|
|
||||||
"group_reporting" : "1"
|
|
||||||
},
|
|
||||||
"jobs" : [
|
|
||||||
{
|
|
||||||
"jobname" : "rand_read_write",
|
|
||||||
"groupid" : 0,
|
|
||||||
"error" : 0,
|
|
||||||
"eta" : 0,
|
|
||||||
"elapsed" : 41,
|
|
||||||
"job options" : {
|
|
||||||
"name" : "rand_read",
|
|
||||||
"rw" : "randrw",
|
|
||||||
"bs" : "4096",
|
|
||||||
"time_based" : "1"
|
|
||||||
},
|
|
||||||
"read" : {
|
|
||||||
"io_bytes" : 10463010816,
|
|
||||||
"io_kbytes" : 10217784,
|
|
||||||
"bw_bytes" : 348743777,
|
|
||||||
"bw" : 340570,
|
|
||||||
"iops" : 85138.890741,
|
|
||||||
"runtime" : 30002,
|
|
||||||
"total_ios" : 2554337,
|
|
||||||
"short_ios" : 0,
|
|
||||||
"drop_ios" : 0,
|
|
||||||
"slat_ns" : {
|
|
||||||
"min" : 1332,
|
|
||||||
"max" : 48691,
|
|
||||||
"mean" : 2032.588341,
|
|
||||||
"stddev" : 864.921965
|
|
||||||
},
|
|
||||||
"clat_ns" : {
|
|
||||||
"min" : 278533,
|
|
||||||
"max" : 10175655,
|
|
||||||
"mean" : 1444476.063469,
|
|
||||||
"stddev" : 300748.583131
|
|
||||||
},
|
|
||||||
"lat_ns" : {
|
|
||||||
"min" : 280646,
|
|
||||||
"max" : 10177629,
|
|
||||||
"mean" : 1446562.147113,
|
|
||||||
"stddev" : 300723.879349,
|
|
||||||
"percentile" : {
|
|
||||||
"1.000000" : 872448,
|
|
||||||
"5.000000" : 1036288,
|
|
||||||
"10.000000" : 1122304,
|
|
||||||
"20.000000" : 1220608,
|
|
||||||
"30.000000" : 1286144,
|
|
||||||
"40.000000" : 1351680,
|
|
||||||
"50.000000" : 1417216,
|
|
||||||
"60.000000" : 1482752,
|
|
||||||
"70.000000" : 1564672,
|
|
||||||
"80.000000" : 1662976,
|
|
||||||
"90.000000" : 1810432,
|
|
||||||
"95.000000" : 1941504,
|
|
||||||
"99.000000" : 2244608,
|
|
||||||
"99.500000" : 2408448,
|
|
||||||
"99.900000" : 3620864,
|
|
||||||
"99.950000" : 4358144,
|
|
||||||
"99.990000" : 6062080
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"bw_min" : 291288,
|
|
||||||
"bw_max" : 380288,
|
|
||||||
"bw_agg" : 99.999134,
|
|
||||||
"bw_mean" : 340567.050000,
|
|
||||||
"bw_dev" : 6222.338382,
|
|
||||||
"bw_samples" : 240,
|
|
||||||
"iops_min" : 72822,
|
|
||||||
"iops_max" : 95072,
|
|
||||||
"iops_mean" : 85141.733333,
|
|
||||||
"iops_stddev" : 1555.582888,
|
|
||||||
"iops_samples" : 240
|
|
||||||
},
|
|
||||||
"write" : {
|
|
||||||
"io_bytes" : 10454208512,
|
|
||||||
"io_kbytes" : 10209188,
|
|
||||||
"bw_bytes" : 348450387,
|
|
||||||
"bw" : 340283,
|
|
||||||
"iops" : 85066.128925,
|
|
||||||
"runtime" : 30002,
|
|
||||||
"total_ios" : 2552154,
|
|
||||||
"short_ios" : 0,
|
|
||||||
"drop_ios" : 0,
|
|
||||||
"slat_ns" : {
|
|
||||||
"min" : 1383,
|
|
||||||
"max" : 315361,
|
|
||||||
"mean" : 2182.824623,
|
|
||||||
"stddev" : 919.625590
|
|
||||||
},
|
|
||||||
"clat_ns" : {
|
|
||||||
"min" : 433904,
|
|
||||||
"max" : 6300941,
|
|
||||||
"mean" : 1558511.433458,
|
|
||||||
"stddev" : 207734.850159
|
|
||||||
},
|
|
||||||
"lat_ns" : {
|
|
||||||
"min" : 441909,
|
|
||||||
"max" : 6302845,
|
|
||||||
"mean" : 1560749.444938,
|
|
||||||
"stddev" : 207695.144244,
|
|
||||||
"percentile" : {
|
|
||||||
"1.000000" : 1155072,
|
|
||||||
"5.000000" : 1269760,
|
|
||||||
"10.000000" : 1318912,
|
|
||||||
"20.000000" : 1384448,
|
|
||||||
"30.000000" : 1449984,
|
|
||||||
"40.000000" : 1499136,
|
|
||||||
"50.000000" : 1531904,
|
|
||||||
"60.000000" : 1597440,
|
|
||||||
"70.000000" : 1646592,
|
|
||||||
"80.000000" : 1728512,
|
|
||||||
"90.000000" : 1826816,
|
|
||||||
"95.000000" : 1908736,
|
|
||||||
"99.000000" : 2072576,
|
|
||||||
"99.500000" : 2179072,
|
|
||||||
"99.900000" : 2605056,
|
|
||||||
"99.950000" : 3031040,
|
|
||||||
"99.990000" : 4358144
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"bw_min" : 288464,
|
|
||||||
"bw_max" : 380080,
|
|
||||||
"bw_agg" : 99.998134,
|
|
||||||
"bw_mean" : 340276.650000,
|
|
||||||
"bw_dev" : 6293.894521,
|
|
||||||
"bw_samples" : 240,
|
|
||||||
"iops_min" : 72116,
|
|
||||||
"iops_max" : 95020,
|
|
||||||
"iops_mean" : 85069.133333,
|
|
||||||
"iops_stddev" : 1573.475038,
|
|
||||||
"iops_samples" : 240
|
|
||||||
},
|
|
||||||
"trim" : {
|
|
||||||
"io_bytes" : 0,
|
|
||||||
"io_kbytes" : 0,
|
|
||||||
"bw_bytes" : 0,
|
|
||||||
"bw" : 0,
|
|
||||||
"iops" : 0.000000,
|
|
||||||
"runtime" : 0,
|
|
||||||
"total_ios" : 0,
|
|
||||||
"short_ios" : 0,
|
|
||||||
"drop_ios" : 0,
|
|
||||||
"slat_ns" : {
|
|
||||||
"min" : 0,
|
|
||||||
"max" : 0,
|
|
||||||
"mean" : 0.000000,
|
|
||||||
"stddev" : 0.000000
|
|
||||||
},
|
|
||||||
"clat_ns" : {
|
|
||||||
"min" : 0,
|
|
||||||
"max" : 0,
|
|
||||||
"mean" : 0.000000,
|
|
||||||
"stddev" : 0.000000
|
|
||||||
},
|
|
||||||
"lat_ns" : {
|
|
||||||
"min" : 0,
|
|
||||||
"max" : 0,
|
|
||||||
"mean" : 0.000000,
|
|
||||||
"stddev" : 0.000000,
|
|
||||||
"percentile" : {
|
|
||||||
"1.000000" : 0,
|
|
||||||
"5.000000" : 0,
|
|
||||||
"10.000000" : 0,
|
|
||||||
"20.000000" : 0,
|
|
||||||
"30.000000" : 0,
|
|
||||||
"40.000000" : 0,
|
|
||||||
"50.000000" : 0,
|
|
||||||
"60.000000" : 0,
|
|
||||||
"70.000000" : 0,
|
|
||||||
"80.000000" : 0,
|
|
||||||
"90.000000" : 0,
|
|
||||||
"95.000000" : 0,
|
|
||||||
"99.000000" : 0,
|
|
||||||
"99.500000" : 0,
|
|
||||||
"99.900000" : 0,
|
|
||||||
"99.950000" : 0,
|
|
||||||
"99.990000" : 0
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"bw_min" : 0,
|
|
||||||
"bw_max" : 0,
|
|
||||||
"bw_agg" : 0.000000,
|
|
||||||
"bw_mean" : 0.000000,
|
|
||||||
"bw_dev" : 0.000000,
|
|
||||||
"bw_samples" : 0,
|
|
||||||
"iops_min" : 0,
|
|
||||||
"iops_max" : 0,
|
|
||||||
"iops_mean" : 0.000000,
|
|
||||||
"iops_stddev" : 0.000000,
|
|
||||||
"iops_samples" : 0
|
|
||||||
},
|
|
||||||
"sync" : {
|
|
||||||
"lat_ns" : {
|
|
||||||
"min" : 0,
|
|
||||||
"max" : 0,
|
|
||||||
"mean" : 0.000000,
|
|
||||||
"stddev" : 0.000000
|
|
||||||
},
|
|
||||||
"total_ios" : 0
|
|
||||||
},
|
|
||||||
"job_runtime" : 120004,
|
|
||||||
"usr_cpu" : 4.833172,
|
|
||||||
"sys_cpu" : 20.800973,
|
|
||||||
"ctx" : 3542118,
|
|
||||||
"majf" : 0,
|
|
||||||
"minf" : 1263,
|
|
||||||
"iodepth_level" : {
|
|
||||||
"1" : 0.000000,
|
|
||||||
"2" : 0.000000,
|
|
||||||
"4" : 0.000000,
|
|
||||||
"8" : 0.000000,
|
|
||||||
"16" : 0.000000,
|
|
||||||
"32" : 0.000000,
|
|
||||||
">=64" : 100.000000
|
|
||||||
},
|
|
||||||
"iodepth_submit" : {
|
|
||||||
"0" : 0.000000,
|
|
||||||
"4" : 100.000000,
|
|
||||||
"8" : 0.000000,
|
|
||||||
"16" : 0.000000,
|
|
||||||
"32" : 0.000000,
|
|
||||||
"64" : 0.000000,
|
|
||||||
">=64" : 0.000000
|
|
||||||
},
|
|
||||||
"iodepth_complete" : {
|
|
||||||
"0" : 0.000000,
|
|
||||||
"4" : 99.999922,
|
|
||||||
"8" : 0.000000,
|
|
||||||
"16" : 0.000000,
|
|
||||||
"32" : 0.000000,
|
|
||||||
"64" : 0.100000,
|
|
||||||
">=64" : 0.000000
|
|
||||||
},
|
|
||||||
"latency_ns" : {
|
|
||||||
"2" : 0.000000,
|
|
||||||
"4" : 0.000000,
|
|
||||||
"10" : 0.000000,
|
|
||||||
"20" : 0.000000,
|
|
||||||
"50" : 0.000000,
|
|
||||||
"100" : 0.000000,
|
|
||||||
"250" : 0.000000,
|
|
||||||
"500" : 0.000000,
|
|
||||||
"750" : 0.000000,
|
|
||||||
"1000" : 0.000000
|
|
||||||
},
|
|
||||||
"latency_us" : {
|
|
||||||
"2" : 0.000000,
|
|
||||||
"4" : 0.000000,
|
|
||||||
"10" : 0.000000,
|
|
||||||
"20" : 0.000000,
|
|
||||||
"50" : 0.000000,
|
|
||||||
"100" : 0.000000,
|
|
||||||
"250" : 0.000000,
|
|
||||||
"500" : 0.010000,
|
|
||||||
"750" : 0.070126,
|
|
||||||
"1000" : 1.756079
|
|
||||||
},
|
|
||||||
"latency_ms" : {
|
|
||||||
"2" : 95.414131,
|
|
||||||
"4" : 2.722457,
|
|
||||||
"10" : 0.040830,
|
|
||||||
"20" : 0.010000,
|
|
||||||
"50" : 0.000000,
|
|
||||||
"100" : 0.000000,
|
|
||||||
"250" : 0.000000,
|
|
||||||
"500" : 0.000000,
|
|
||||||
"750" : 0.000000,
|
|
||||||
"1000" : 0.000000,
|
|
||||||
"2000" : 0.000000,
|
|
||||||
">=2000" : 0.000000
|
|
||||||
},
|
|
||||||
"latency_depth" : 64,
|
|
||||||
"latency_target" : 0,
|
|
||||||
"latency_percentile" : 100.000000,
|
|
||||||
"latency_window" : 0
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"disk_util" : [
|
|
||||||
{
|
|
||||||
"name" : "nvme0n1",
|
|
||||||
"read_ios" : 3004914,
|
|
||||||
"write_ios" : 3003760,
|
|
||||||
"read_merges" : 0,
|
|
||||||
"write_merges" : 0,
|
|
||||||
"read_ticks" : 4269143,
|
|
||||||
"write_ticks" : 4598453,
|
|
||||||
"in_queue" : 11104,
|
|
||||||
"util" : 99.840351
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
"""
|
|
||||||
jobname_prefix = 'nvme0n1_rand_read_write'
|
jobname_prefix = 'nvme0n1_rand_read_write'
|
||||||
assert (benchmark._process_raw_result(0, test_raw_output))
|
assert (benchmark._process_raw_result(0, test_raw_output))
|
||||||
assert (benchmark.return_code == ReturnCode.SUCCESS)
|
assert (benchmark.return_code == ReturnCode.SUCCESS)
|
||||||
|
|
|
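The hunks above replace the inline fio JSON string with `@decorator.load_data('tests/data/disk_performance.log')`, which hands the file content to the test as an extra argument. The implementation of `tests.helper.decorator` is not shown in this diff, so the following is only a minimal sketch of how such a decorator could behave. It assumes the file content is appended as the last positional argument; `load_data` here is a hypothetical stand-in, and `demo` and the temporary file names are illustrative only.

```python
import functools
import tempfile
from pathlib import Path


def load_data(filepath):
    """Hypothetical stand-in for tests.helper.decorator.load_data (assumed behavior)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Read the fixture at call time and append it as the last positional argument.
            data = Path(filepath).read_text()
            return func(*args, data, **kwargs)
        return wrapper
    return decorator


def demo():
    """Stack two decorators and return the arguments the decorated function receives."""
    with tempfile.TemporaryDirectory() as tmp:
        ok_log = Path(tmp, 'ok.log')
        err_log = Path(tmp, 'error.log')
        ok_log.write_text('valid output')
        err_log.write_text('error output')

        @load_data(ok_log)
        @load_data(err_log)
        def fake_test(self, raw_output, raw_output_no_execution):
            return raw_output, raw_output_no_execution

        # The outermost decorator supplies the first data argument.
        return fake_test(None)
```

Under this assumption, stacked decorators fill the data parameters top to bottom, which is consistent with the two-fixture tests elsewhere in this commit that take `raw_output` and `raw_output_no_execution`.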
@@ -3,66 +3,27 @@
|
||||||
|
|
||||||
"""Tests for GPCNet benchmark."""
|
"""Tests for GPCNet benchmark."""
|
||||||
|
|
||||||
import os
|
|
||||||
import numbers
|
import numbers
|
||||||
import unittest
|
import unittest
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, Platform, BenchmarkType
|
from superbench.benchmarks import BenchmarkRegistry, Platform, BenchmarkType
|
||||||
|
|
||||||
|
|
||||||
class GPCNetBenchmarkTest(unittest.TestCase): # noqa: E501
|
class GPCNetBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Tests for GPCNetBenchmark benchmark."""
|
"""Tests for GPCNetBenchmark benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench'
|
super().setUpClass()
|
||||||
binary_path = os.path.join(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
Path(binary_path).mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/network_test', 'bin/network_load_test'])
|
||||||
self.__binary_files = []
|
|
||||||
for bin_name in ['network_test', 'network_load_test']:
|
|
||||||
self.__binary_files.append(Path(binary_path, bin_name))
|
|
||||||
Path(binary_path, bin_name).touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
@decorator.load_data('tests/data/gpcnet_network_test.log')
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
@decorator.load_data('tests/data/gpcnet_network_test_error.log')
|
||||||
for bin_file in self.__binary_files:
|
def test_gpcnet_network_test(self, raw_output, raw_output_no_execution):
|
||||||
bin_file.unlink()
|
|
||||||
|
|
||||||
def test_gpcnet_network_test(self):
|
|
||||||
"""Test gpcnet-network-test benchmark."""
|
"""Test gpcnet-network-test benchmark."""
|
||||||
raw_output = """# noqa: E501
|
|
||||||
Network Tests v1.3
|
|
||||||
Test with 2 MPI ranks (2 nodes)
|
|
||||||
|
|
||||||
Legend
|
|
||||||
RR = random ring communication pattern
|
|
||||||
Nat = natural ring communication pattern
|
|
||||||
Lat = latency
|
|
||||||
BW = bandwidth
|
|
||||||
BW+Sync = bandwidth with barrier
|
|
||||||
+------------------------------------------------------------------------------+
|
|
||||||
| Isolated Network Tests |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| Name | Avg | 99% | Units |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided Lat (8 B) | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| RR Get Lat (8 B) | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided BW (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| RR Put BW (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided BW+Sync (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| Nat Two-sided BW (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| Multiple Allreduce (8 B) | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
| Multiple Alltoall (4096 B) | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+
|
|
||||||
"""
|
|
||||||
# Check registry.
|
# Check registry.
|
||||||
benchmark_name = 'gpcnet-network-test'
|
benchmark_name = 'gpcnet-network-test'
|
||||||
(benchmark_class,
|
(benchmark_class,
|
||||||
|
@@ -78,20 +39,6 @@ Network Tests v1.3
|
||||||
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
|
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
|
||||||
assert (command == expect_command)
|
assert (command == expect_command)
|
||||||
|
|
||||||
raw_output_no_execution = """
|
|
||||||
ERROR: this application must be run on at least 2 nodes
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
Primary job terminated normally, but 1 process returned
|
|
||||||
a non-zero exit code. Per user-direction, the job has been aborted.
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
mpirun detected that one or more processes exited with non-zero status, thus causing
|
|
||||||
the job to be terminated. The first process to do so was:
|
|
||||||
|
|
||||||
Process name: [[63697,1],0]
|
|
||||||
Exit code: 1
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
"""
|
|
||||||
assert (benchmark._process_raw_result(0, raw_output_no_execution))
|
assert (benchmark._process_raw_result(0, raw_output_no_execution))
|
||||||
assert (len(benchmark.result) == benchmark.default_metric_count)
|
assert (len(benchmark.result) == benchmark.default_metric_count)
|
||||||
|
|
||||||
|
@@ -123,107 +70,10 @@ the job to be terminated. The first process to do so was:
|
||||||
assert (benchmark.type == BenchmarkType.MICRO)
|
assert (benchmark.type == BenchmarkType.MICRO)
|
||||||
assert (benchmark._bin_name == 'network_test')
|
assert (benchmark._bin_name == 'network_test')
|
||||||
|
|
||||||
def test_gpcnet_network_load(self): # noqa: C901
|
@decorator.load_data('tests/data/gpcnet_network_load.log')
|
||||||
|
@decorator.load_data('tests/data/gpcnet_network_load_error.log')
|
||||||
|
def test_gpcnet_network_load(self, raw_output, raw_output_no_execution):
|
||||||
"""Test gpcnet-network-load-test benchmark."""
|
"""Test gpcnet-network-load-test benchmark."""
|
||||||
raw_output = """# noqa: E501
|
|
||||||
NetworkLoad Tests v1.3
|
|
||||||
Test with 10 MPI ranks (10 nodes)
|
|
||||||
2 nodes running Network Tests
|
|
||||||
8 nodes running Congestion Tests (min 100 nodes per congestor)
|
|
||||||
|
|
||||||
Legend
|
|
||||||
RR = random ring communication pattern
|
|
||||||
Lat = latency
|
|
||||||
BW = bandwidth
|
|
||||||
BW+Sync = bandwidth with barrier
|
|
||||||
+------------------------------------------------------------------------------------------------------------------------------------------+
|
|
||||||
| Isolated Network Tests |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided Lat (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided BW+Sync (131072 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Multiple Allreduce (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
|
|
||||||
+------------------------------------------------------------------------------------------------------------------------------------------+
|
|
||||||
| Isolated Congestion Tests |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
|
|
||||||
+------------------------------------------------------------------------------------------------------------------------------------------+
|
|
||||||
| Network Tests running with Congestion Tests ( RR Two-sided Lat Network Test) |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided Lat (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
|
|
||||||
+------------------------------------------------------------------------------------------------------------------------------------------+
|
|
||||||
| Network Tests running with Congestion Tests (RR Two-sided BW+Sync Network Test) |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| RR Two-sided BW+Sync (131072 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
|
|
||||||
+------------------------------------------------------------------------------------------------------------------------------------------+
|
|
||||||
| Network Tests running with Congestion Tests ( Multiple Allreduce Network Test) |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Multiple Allreduce (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
|
|
||||||
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|
|
||||||
|
|
||||||
+------------------------------------------------------------------------------+
|
|
||||||
| Network Tests running with Congestion Tests - Key Results |
|
|
||||||
+---------------------------------+--------------------------------------------+
|
|
||||||
| Name | Congestion Impact Factor |
|
|
||||||
+---------------------------------+----------------------+---------------------+
|
|
||||||
| | Avg | 99% |
|
|
||||||
+---------------------------------+----------------------+---------------------+
|
|
||||||
| RR Two-sided Lat (8 B) | 0.0X | 0.0X |
|
|
||||||
+---------------------------------+----------------------+---------------------+
|
|
||||||
| RR Two-sided BW+Sync (131072 B) | 0.0X | 0.0X |
|
|
||||||
+---------------------------------+----------------------+---------------------+
|
|
||||||
| Multiple Allreduce (8 B) | 0.0X | 0.0X |
|
|
||||||
+---------------------------------+----------------------+---------------------+
|
|
||||||
"""
|
|
||||||
# Check registry.
|
# Check registry.
|
||||||
benchmark_name = 'gpcnet-network-load-test'
|
benchmark_name = 'gpcnet-network-load-test'
|
||||||
(benchmark_class,
|
(benchmark_class,
|
||||||
|
@@ -240,20 +90,6 @@ NetworkLoad Tests v1.3
|
||||||
assert (command == expect_command)
|
assert (command == expect_command)
|
||||||
|
|
||||||
# Check function process_raw_data.
|
# Check function process_raw_data.
|
||||||
raw_output_no_execution = """
|
|
||||||
ERROR: this application must be run on at least 10 nodes
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
Primary job terminated normally, but 1 process returned
|
|
||||||
a non-zero exit code. Per user-direction, the job has been aborted.
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
mpirun detected that one or more processes exited with non-zero status, thus causing
|
|
||||||
the job to be terminated. The first process to do so was:
|
|
||||||
|
|
||||||
Process name: [[63697,1],0]
|
|
||||||
Exit code: 1
|
|
||||||
--------------------------------------------------------------------------
|
|
||||||
"""
|
|
||||||
assert (benchmark._process_raw_result(0, raw_output_no_execution))
|
assert (benchmark._process_raw_result(0, raw_output_no_execution))
|
||||||
assert (len(benchmark.result) == benchmark.default_metric_count)
|
assert (len(benchmark.result) == benchmark.default_metric_count)
|
||||||
# Positive case - valid raw output.
|
# Positive case - valid raw output.
|
||||||
|
|
|
@@ -3,29 +3,22 @@
|
||||||
|
|
||||||
"""Tests for gpu-copy-bw benchmark."""
|
"""Tests for gpu-copy-bw benchmark."""
|
||||||
|
|
||||||
from pathlib import Path
|
|
||||||
import numbers
|
import numbers
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
|
|
||||||
from tests.helper import decorator
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
||||||
|
|
||||||
|
|
||||||
class GpuCopyBwBenchmarkTest(unittest.TestCase):
|
class GpuCopyBwBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Test class for gpu-copy-bw benchmark."""
|
"""Test class for gpu-copy-bw benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
super().setUpClass()
|
||||||
binary_path = Path(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
binary_path.mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/gpu_copy'])
|
||||||
self.__binary_file = binary_path / 'gpu_copy'
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
|
||||||
self.__binary_file.unlink()
|
|
||||||
|
|
||||||
def _test_gpu_copy_bw_performance_command_generation(self, platform):
|
def _test_gpu_copy_bw_performance_command_generation(self, platform):
|
||||||
"""Test gpu-copy benchmark command generation."""
|
"""Test gpu-copy benchmark command generation."""
|
||||||
|
|
|
@@ -6,113 +6,42 @@
|
||||||
import os
|
import os
|
||||||
import numbers
|
import numbers
|
||||||
import unittest
|
import unittest
|
||||||
from pathlib import Path
|
|
||||||
from unittest import mock
|
from unittest import mock
|
||||||
|
|
||||||
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, Platform, BenchmarkType, ReturnCode
|
from superbench.benchmarks import BenchmarkRegistry, Platform, BenchmarkType, ReturnCode
|
||||||
from superbench.common.utils import network
|
from superbench.common.utils import network
|
||||||
from superbench.benchmarks.micro_benchmarks import ib_loopback_performance
|
from superbench.benchmarks.micro_benchmarks import ib_loopback_performance
|
||||||
|
|
||||||
|
|
||||||
class IBLoopbackBenchmarkTest(unittest.TestCase):
|
class IBLoopbackBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Tests for IBLoopbackBenchmark benchmark."""
|
"""Tests for IBLoopbackBenchmark benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
if (len(network.get_ib_devices()) < 1):
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
# Create fake binary file just for testing.
|
super().setUpClass()
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench'
|
cls.createMockEnvs(cls)
|
||||||
binary_path = Path(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockFiles(cls, ['bin/run_perftest_loopback'])
|
||||||
binary_path.mkdir(parents=True, exist_ok=True)
|
|
||||||
self.__binary_file = Path(binary_path, 'run_perftest_loopback')
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
|
||||||
if (len(network.get_ib_devices()) < 1):
|
|
||||||
self.__binary_file.unlink()
|
|
||||||
|
|
||||||
def test_ib_loopback_util(self):
|
def test_ib_loopback_util(self):
|
||||||
"""Test util functions 'get_numa_cores' and 'get_free_port' used in ib-loopback benchmark."""
|
"""Test util functions 'get_numa_cores' and 'get_free_port' used in ib-loopback benchmark."""
|
||||||
port = network.get_free_port()
|
port = network.get_free_port()
|
||||||
assert (isinstance(port, numbers.Number))
|
assert (isinstance(port, numbers.Number))
|
||||||
numa_cores = ib_loopback_performance.get_numa_cores(0)
|
numa_cores = ib_loopback_performance.get_numa_cores(0)
|
||||||
|
if numa_cores is None:
|
||||||
|
# in case no NUMA support available on test system
|
||||||
|
return
|
||||||
assert (len(numa_cores) >= 2)
|
assert (len(numa_cores) >= 2)
|
||||||
for i in range(len(numa_cores)):
|
for i in range(len(numa_cores)):
|
||||||
assert (isinstance(numa_cores[i], numbers.Number))
|
assert (isinstance(numa_cores[i], numbers.Number))
|
||||||
|
|
||||||
|
@decorator.load_data('tests/data/ib_loopback_all_sizes.log')
|
||||||
@mock.patch('superbench.common.utils.network.get_free_port')
|
@mock.patch('superbench.common.utils.network.get_free_port')
|
||||||
@mock.patch('superbench.benchmarks.micro_benchmarks.ib_loopback_performance.get_numa_cores')
|
@mock.patch('superbench.benchmarks.micro_benchmarks.ib_loopback_performance.get_numa_cores')
|
||||||
@mock.patch('superbench.common.utils.network.get_ib_devices')
|
@mock.patch('superbench.common.utils.network.get_ib_devices')
|
||||||
def test_ib_loopback_all_sizes(self, mock_ib_devices, mock_numa_cores, mock_port):
|
def test_ib_loopback_all_sizes(self, raw_output, mock_ib_devices, mock_numa_cores, mock_port):
|
||||||
"""Test ib-loopback benchmark for all sizes."""
|
"""Test ib-loopback benchmark for all sizes."""
|
||||||
raw_output = """
|
|
||||||
************************************
|
|
||||||
* Waiting for client to connect... *
|
|
||||||
************************************
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
RDMA_Write BW Test
|
|
||||||
Dual-port : OFF Device : ibP257p0s0
|
|
||||||
Number of qps : 1 Transport type : IB
|
|
||||||
Connection type : RC Using SRQ : OFF
|
|
||||||
PCIe relax order: ON
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
RDMA_Write BW Test
|
|
||||||
Dual-port : OFF Device : ibP257p0s0
|
|
||||||
Number of qps : 1 Transport type : IB
|
|
||||||
Connection type : RC Using SRQ : OFF
|
|
||||||
PCIe relax order: ON
|
|
||||||
ibv_wr* API : ON
|
|
||||||
TX depth : 128
|
|
||||||
CQ Moderation : 100
|
|
||||||
Mtu : 4096[B]
|
|
||||||
Link type : IB
|
|
||||||
Max inline data : 0[B]
|
|
||||||
rdma_cm QPs : OFF
|
|
||||||
Data ex. method : Ethernet
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
ibv_wr* API : ON
|
|
||||||
CQ Moderation : 100
|
|
||||||
Mtu : 4096[B]
|
|
||||||
Link type : IB
|
|
||||||
Max inline data : 0[B]
|
|
||||||
rdma_cm QPs : OFF
|
|
||||||
Data ex. method : Ethernet
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
local address: LID 0xd06 QPN 0x092f PSN 0x3ff1bc RKey 0x080329 VAddr 0x007fc97ff50000
|
|
||||||
local address: LID 0xd06 QPN 0x092e PSN 0x3eb82d RKey 0x080228 VAddr 0x007f19adcbf000
|
|
||||||
remote address: LID 0xd06 QPN 0x092e PSN 0x3eb82d RKey 0x080228 VAddr 0x007f19adcbf000
|
|
||||||
remote address: LID 0xd06 QPN 0x092f PSN 0x3ff1bc RKey 0x080329 VAddr 0x007fc97ff50000
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
|
|
||||||
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
|
|
||||||
2 2000 5.32 5.30 2.778732
|
|
||||||
4 2000 10.65 10.64 2.788833
|
|
||||||
8 2000 21.30 21.27 2.787609
|
|
||||||
16 2000 42.60 42.55 2.788268
|
|
||||||
32 2000 84.90 82.82 2.713896
|
|
||||||
64 2000 173.55 171.66 2.812504
|
|
||||||
128 2000 362.27 353.83 2.898535
|
|
||||||
256 2000 687.82 679.37 2.782698
|
|
||||||
512 2000 1337.12 1311.59 2.686135
|
|
||||||
1024 2000 2674.25 2649.39 2.712980
|
|
||||||
2048 2000 5248.56 5118.18 2.620509
|
|
||||||
4096 2000 10034.02 9948.41 2.546793
|
|
||||||
8192 2000 18620.51 12782.56 1.636168
|
|
||||||
16384 2000 23115.27 16782.50 1.074080
|
|
||||||
32768 2000 22927.94 18586.03 0.594753
|
|
||||||
65536 2000 23330.56 21167.79 0.338685
|
|
||||||
131072 2000 22750.35 21443.14 0.171545
|
|
||||||
262144 2000 22673.63 22411.35 0.089645
|
|
||||||
524288 2000 22679.02 22678.86 0.045358
|
|
||||||
1048576 2000 22817.06 22816.86 0.022817
|
|
||||||
2097152 2000 22919.37 22919.27 0.011460
|
|
||||||
4194304 2000 23277.93 23277.91 0.005819
|
|
||||||
8388608 2000 23240.68 23240.68 0.002905
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
8388608 2000 23240.68 23240.68 0.002905
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
"""
|
|
||||||
# Test without ib devices
|
# Test without ib devices
|
||||||
# Check registry.
|
# Check registry.
|
||||||
benchmark_name = 'ib-loopback'
|
benchmark_name = 'ib-loopback'
|
||||||
|
@@ -179,56 +108,12 @@ remote address: LID 0xd06 QPN 0x092f PSN 0x3ff1bc RKey 0x080329 VAddr 0x007fc97f
|
||||||
assert (benchmark._args.iters == 2000)
|
assert (benchmark._args.iters == 2000)
|
||||||
assert (benchmark._args.commands == ['write'])
|
assert (benchmark._args.commands == ['write'])
|
||||||
|
|
||||||
|
@decorator.load_data('tests/data/ib_loopback_8M_size.log')
|
||||||
@mock.patch('superbench.common.utils.network.get_free_port')
|
@mock.patch('superbench.common.utils.network.get_free_port')
|
||||||
@mock.patch('superbench.benchmarks.micro_benchmarks.ib_loopback_performance.get_numa_cores')
|
@mock.patch('superbench.benchmarks.micro_benchmarks.ib_loopback_performance.get_numa_cores')
|
||||||
@mock.patch('superbench.common.utils.network.get_ib_devices')
|
@mock.patch('superbench.common.utils.network.get_ib_devices')
|
||||||
def test_ib_loopback_8M_size(self, mock_ib_devices, mock_numa_cores, mock_port):
|
def test_ib_loopback_8M_size(self, raw_output, mock_ib_devices, mock_numa_cores, mock_port):
|
||||||
"""Test ib-loopback benchmark for 8M size."""
|
"""Test ib-loopback benchmark for 8M size."""
|
||||||
raw_output = """
|
|
||||||
RDMA_Write BW Test
|
|
||||||
Dual-port : OFF Device : ibP257p0s0
|
|
||||||
Number of qps : 1 Transport type : IB
|
|
||||||
Connection type : RC Using SRQ : OFF
|
|
||||||
PCIe relax order: ON
|
|
||||||
TX depth : 128
|
|
||||||
CQ Moderation : 1
|
|
||||||
Mtu : 4096[B]
|
|
||||||
Link type : IB
|
|
||||||
Max inline data : 0[B]
|
|
||||||
rdma_cm QPs : OFF
|
|
||||||
Data ex. method : Ethernet
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
local address: LID 0xd06 QPN 0x095f PSN 0x3c9e82 RKey 0x080359 VAddr 0x007f9fc479c000
|
|
||||||
remote address: LID 0xd06 QPN 0x095e PSN 0xbd024b RKey 0x080258 VAddr 0x007fe62504b000
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
|
|
||||||
8388608 20000 24056.74 24056.72 0.003007
|
|
||||||
************************************
|
|
||||||
* Waiting for client to connect... *
|
|
||||||
************************************
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
RDMA_Write BW Test
|
|
||||||
Dual-port : OFF Device : ibP257p0s0
|
|
||||||
Number of qps : 1 Transport type : IB
|
|
||||||
Connection type : RC Using SRQ : OFF
|
|
||||||
PCIe relax order: ON
|
|
||||||
CQ Moderation : 1
|
|
||||||
Mtu : 4096[B]
|
|
||||||
Link type : IB
|
|
||||||
Max inline data : 0[B]
|
|
||||||
rdma_cm QPs : OFF
|
|
||||||
Data ex. method : Ethernet
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
local address: LID 0xd06 QPN 0x095e PSN 0xbd024b RKey 0x080258 VAddr 0x007fe62504b000
|
|
||||||
remote address: LID 0xd06 QPN 0x095f PSN 0x3c9e82 RKey 0x080359 VAddr 0x007f9fc479c000
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
|
|
||||||
8388608 20000 24056.74 24056.72 0.003007
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
"""
|
|
||||||
# Test without ib devices
|
# Test without ib devices
|
||||||
# Check registry.
|
# Check registry.
|
||||||
benchmark_name = 'ib-loopback'
|
benchmark_name = 'ib-loopback'
|
||||||
|
|
|
@@ -10,26 +10,26 @@ from pathlib import Path
|
||||||
from unittest import mock
|
from unittest import mock
|
||||||
from collections import defaultdict
|
from collections import defaultdict
|
||||||
|
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, Platform, BenchmarkType, ReturnCode
|
from superbench.benchmarks import BenchmarkRegistry, Platform, BenchmarkType, ReturnCode
|
||||||
|
|
||||||
|
|
||||||
class IBBenchmarkTest(unittest.TestCase):
|
class IBBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Tests for IBBenchmark benchmark."""
|
"""Tests for IBBenchmark benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench'
|
super().setUpClass()
|
||||||
binary_path = Path(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
binary_path.mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/ib_validation'])
|
||||||
self.__binary_file = Path(binary_path, 'ib_validation')
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
@classmethod
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
def tearDownClass(cls):
|
||||||
self.__binary_file.unlink()
|
"""Hook method for deconstructing the class fixture after running all tests in the class."""
|
||||||
p = Path('hostfile')
|
p = Path('hostfile')
|
||||||
if p.is_file():
|
if p.is_file():
|
||||||
p.unlink()
|
p.unlink()
|
||||||
|
super().tearDownClass()
|
||||||
|
|
||||||
def test_generate_config(self): # noqa: C901
|
def test_generate_config(self): # noqa: C901
|
||||||
"""Test util functions ."""
|
"""Test util functions ."""
|
||||||
|
@@ -117,8 +117,9 @@ class IBBenchmarkTest(unittest.TestCase):
|
||||||
|
|
||||||
Path(test_config_file).unlink()
|
Path(test_config_file).unlink()
|
||||||
|
|
||||||
|
@mock.patch('superbench.common.devices.GPU.vendor', new_callable=mock.PropertyMock)
|
||||||
@mock.patch('superbench.common.utils.network.get_ib_devices')
|
@mock.patch('superbench.common.utils.network.get_ib_devices')
|
||||||
def test_ib_traffic_performance(self, mock_ib_devices):
|
def test_ib_traffic_performance(self, mock_ib_devices, mock_gpu):
|
||||||
"""Test ib-traffic benchmark."""
|
"""Test ib-traffic benchmark."""
|
||||||
# Test without ib devices
|
# Test without ib devices
|
||||||
# Check registry.
|
# Check registry.
|
||||||
|
@@ -168,6 +169,22 @@ class IBBenchmarkTest(unittest.TestCase):
|
||||||
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
|
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
|
||||||
assert (command == expect_command)
|
assert (command == expect_command)
|
||||||
|
|
||||||
|
parameters = '--ib_index 0 --iters 2000 --pattern one-to-one --hostfile hostfile --gpu_index 0'
|
||||||
|
mock_gpu.return_value = 'nvidia'
|
||||||
|
benchmark = benchmark_class(benchmark_name, parameters=parameters)
|
||||||
|
ret = benchmark._preprocess()
|
||||||
|
expect_command = 'ib_validation --hostfile hostfile --cmd_prefix "ib_write_bw -F ' + \
|
||||||
|
'--iters=2000 -d mlx5_0 -a --use_cuda=0" --input_config ' + os.getcwd() + '/config.txt'
|
||||||
|
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
|
||||||
|
assert (command == expect_command)
|
||||||
|
mock_gpu.return_value = 'amd'
|
||||||
|
benchmark = benchmark_class(benchmark_name, parameters=parameters)
|
||||||
|
ret = benchmark._preprocess()
|
||||||
|
expect_command = 'ib_validation --hostfile hostfile --cmd_prefix "ib_write_bw -F ' + \
|
||||||
|
'--iters=2000 -d mlx5_0 -a --use_rocm=0" --input_config ' + os.getcwd() + '/config.txt'
|
||||||
|
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
|
||||||
|
assert (command == expect_command)
|
||||||
|
|
||||||
# Custom config
|
# Custom config
|
||||||
config = ['0,1', '1,0;0,1', '0,1;1,0', '1,0;0,1']
|
config = ['0,1', '1,0;0,1', '0,1;1,0', '1,0;0,1']
|
||||||
with open('test_config.txt', 'w') as f:
|
with open('test_config.txt', 'w') as f:
|
||||||
|
|
|
@@ -3,27 +3,20 @@
|
||||||
|
|
||||||
"""Tests for gemm-flops benchmark."""
|
"""Tests for gemm-flops benchmark."""
|
||||||
|
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, ReturnCode, Platform, BenchmarkType
|
from superbench.benchmarks import BenchmarkRegistry, ReturnCode, Platform, BenchmarkType
|
||||||
|
|
||||||
|
|
||||||
class RocmGemmFlopsTest(unittest.TestCase):
|
class RocmGemmFlopsTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Tests for RocmGemmFlops benchmark."""
|
"""Tests for RocmGemmFlops benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
super().setUpClass()
|
||||||
binary_path = os.path.join(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
Path(binary_path).mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/rocblas-bench'])
|
||||||
self.__binary_file = Path(os.path.join(binary_path, 'rocblas-bench'))
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
|
||||||
self.__binary_file.unlink()
|
|
||||||
|
|
||||||
def test_rocm_flops_performance(self):
|
def test_rocm_flops_performance(self):
|
||||||
"""Test gemm-flops benchmark."""
|
"""Test gemm-flops benchmark."""
|
||||||
|
|
|
@@ -4,29 +4,25 @@
|
||||||
"""Tests for mem-bw benchmark."""
|
"""Tests for mem-bw benchmark."""
|
||||||
|
|
||||||
import numbers
|
import numbers
|
||||||
from pathlib import Path
|
|
||||||
import os
|
|
||||||
import unittest
|
import unittest
|
||||||
|
|
||||||
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
||||||
|
|
||||||
|
|
||||||
class RocmMemBwTest(unittest.TestCase):
|
class RocmMemBwTest(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Test class for rocm mem-bw benchmark."""
|
"""Test class for rocm mem-bw benchmark."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Method called to prepare the test fixture."""
|
def setUpClass(cls):
|
||||||
# Create fake binary file just for testing.
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
os.environ['SB_MICRO_PATH'] = '/tmp/superbench/'
|
super().setUpClass()
|
||||||
binary_path = os.path.join(os.getenv('SB_MICRO_PATH'), 'bin')
|
cls.createMockEnvs(cls)
|
||||||
Path(os.getenv('SB_MICRO_PATH'), 'bin').mkdir(parents=True, exist_ok=True)
|
cls.createMockFiles(cls, ['bin/hipBusBandwidth'])
|
||||||
self.__binary_file = Path(binary_path, 'hipBusBandwidth')
|
|
||||||
self.__binary_file.touch(mode=0o755, exist_ok=True)
|
|
||||||
|
|
||||||
def tearDown(self):
|
@decorator.load_data('tests/data/rocm_memory_h2d_bw.log')
|
||||||
"""Method called after the test method has been called and the result recorded."""
|
@decorator.load_data('tests/data/rocm_memory_d2h_bw.log')
|
||||||
self.__binary_file.unlink()
|
def test_rocm_memory_bw_performance(self, raw_output_h2d, raw_output_d2h):
|
||||||
|
|
||||||
def test_rocm_memory_bw_performance(self):
|
|
||||||
"""Test rocm mem-bw benchmark."""
|
"""Test rocm mem-bw benchmark."""
|
||||||
benchmark_name = 'mem-bw'
|
benchmark_name = 'mem-bw'
|
||||||
(benchmark_class,
|
(benchmark_class,
|
||||||
|
@@ -51,114 +47,7 @@ class RocmMemBwTest(unittest.TestCase):
|
||||||
assert (commnad == expected_command[i])
|
assert (commnad == expected_command[i])
|
||||||
|
|
||||||
# Check results and metrics.
|
# Check results and metrics.
|
||||||
raw_output = {}
|
raw_output = [raw_output_h2d, raw_output_d2h]
|
||||||
raw_output[0] = """
|
|
||||||
Device:Device 738c Mem=32.0GB #CUs=120 Freq=1502Mhz MallocMode=pinned
|
|
||||||
test atts units median mean stddev min max
|
|
||||||
H2D_Bandwidth_pinned +064By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
|
||||||
H2D_Bandwidth_pinned +256By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
|
||||||
H2D_Bandwidth_pinned +512By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
|
||||||
H2D_Bandwidth_pinned 1kB GB/sec 0.0414 0.0411 0.0017 0.0189 0.0434
|
|
||||||
H2D_Bandwidth_pinned 2kB GB/sec 0.0828 0.0824 0.0018 0.0683 0.0862
|
|
||||||
H2D_Bandwidth_pinned 4kB GB/sec 0.1656 0.1652 0.0032 0.1374 0.1724
|
|
||||||
H2D_Bandwidth_pinned 8kB GB/sec 0.3268 0.3251 0.0117 0.1880 0.3425
|
|
||||||
H2D_Bandwidth_pinned 16kB GB/sec 0.6410 0.6365 0.0259 0.3597 0.6757
|
|
||||||
H2D_Bandwidth_pinned 32kB GB/sec 1.2422 1.2432 0.0278 0.9346 1.2987
|
|
||||||
H2D_Bandwidth_pinned 64kB GB/sec 2.3968 2.4161 0.1486 0.7242 2.6042
|
|
||||||
H2D_Bandwidth_pinned 128kB GB/sec 4.6786 4.6339 0.1310 4.1143 4.8162
|
|
||||||
H2D_Bandwidth_pinned 256kB GB/sec 7.8349 7.8369 0.1150 6.9093 8.0270
|
|
||||||
H2D_Bandwidth_pinned 512kB GB/sec 11.9963 11.9828 0.1287 11.2158 12.2201
|
|
||||||
H2D_Bandwidth_pinned 1024kB GB/sec 16.3342 16.3315 0.0956 16.0147 16.5823
|
|
||||||
H2D_Bandwidth_pinned 2048kB GB/sec 19.9790 19.9770 0.0853 19.7681 20.1635
|
|
||||||
H2D_Bandwidth_pinned 4096kB GB/sec 22.2706 22.2642 0.0552 22.0644 22.3847
|
|
||||||
H2D_Bandwidth_pinned 8192kB GB/sec 22.8232 22.7881 0.1669 21.3196 22.8930
|
|
||||||
H2D_Bandwidth_pinned 16384kB GB/sec 24.1521 24.1411 0.0429 24.0165 24.2162
|
|
||||||
H2D_Bandwidth_pinned 32768kB GB/sec 24.8695 24.7086 0.7491 20.6288 24.9035
|
|
||||||
H2D_Bandwidth_pinned 65536kB GB/sec 24.4840 24.0101 2.5769 6.1754 24.5292
|
|
||||||
H2D_Bandwidth_pinned 131072kB GB/sec 25.0487 24.9593 0.2601 24.1286 25.0711
|
|
||||||
H2D_Bandwidth_pinned 262144kB GB/sec 25.3280 25.2351 0.1788 24.8746 25.3498
|
|
||||||
H2D_Bandwidth_pinned 524288kB GB/sec 24.7523 24.6708 0.1586 24.3154 24.7880
|
|
||||||
H2D_Timepinned +064By ms 0.0245 0.0253 0.0240 0.0232 0.7821
|
|
||||||
H2D_Timepinned +256By ms 0.0243 0.0244 0.0013 0.0232 0.0546
|
|
||||||
H2D_Timepinned +512By ms 0.0243 0.0244 0.0014 0.0230 0.0566
|
|
||||||
H2D_Timepinned 1kB ms 0.0242 0.0244 0.0016 0.0230 0.0530
|
|
||||||
H2D_Timepinned 2kB ms 0.0242 0.0243 0.0005 0.0232 0.0293
|
|
||||||
H2D_Timepinned 4kB ms 0.0242 0.0242 0.0005 0.0232 0.0291
|
|
||||||
H2D_Timepinned 8kB ms 0.0245 0.0247 0.0013 0.0234 0.0426
|
|
||||||
H2D_Timepinned 16kB ms 0.0250 0.0252 0.0015 0.0237 0.0445
|
|
||||||
H2D_Timepinned 32kB ms 0.0258 0.0258 0.0006 0.0246 0.0342
|
|
||||||
H2D_Timepinned 64kB ms 0.0271 0.0272 0.0045 0.0250 0.0898
|
|
||||||
H2D_Timepinned 128kB ms 0.0280 0.0283 0.0008 0.0272 0.0318
|
|
||||||
H2D_Timepinned 256kB ms 0.0334 0.0334 0.0005 0.0326 0.0379
|
|
||||||
H2D_Timepinned 512kB ms 0.0437 0.0437 0.0005 0.0429 0.0467
|
|
||||||
H2D_Timepinned 1024kB ms 0.0642 0.0642 0.0004 0.0632 0.0654
|
|
||||||
H2D_Timepinned 2048kB ms 0.1050 0.1050 0.0004 0.1040 0.1061
|
|
||||||
H2D_Timepinned 4096kB ms 0.1883 0.1884 0.0005 0.1874 0.1901
|
|
||||||
H2D_Timepinned 8192kB ms 0.3675 0.3681 0.0028 0.3664 0.3934
|
|
||||||
H2D_Timepinned 16384kB ms 0.6946 0.6950 0.0012 0.6928 0.6986
|
|
||||||
H2D_Timepinned 32768kB ms 1.3492 1.3595 0.0482 1.3474 1.6266
|
|
||||||
H2D_Timepinned 65536kB ms 2.7409 2.9163 1.1368 2.7358 10.8670
|
|
||||||
H2D_Timepinned 131072kB ms 5.3582 5.3780 0.0576 5.3534 5.5626
|
|
||||||
H2D_Timepinned 262144kB ms 10.5983 10.6379 0.0761 10.5892 10.7915
|
|
||||||
H2D_Timepinned 524288kB ms 21.6897 21.7622 0.1411 21.6585 22.0794
|
|
||||||
|
|
||||||
Note: results marked with (*) had missing values such as
|
|
||||||
might occur with a mixture of architectural capabilities.
|
|
||||||
"""
|
|
||||||
raw_output[1] = """
|
|
||||||
Device:Device 738c Mem=32.0GB #CUs=120 Freq=1502Mhz MallocMode=pinned
|
|
||||||
test atts units median mean stddev min max
|
|
||||||
D2H_Bandwidth_pinned +064By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
|
||||||
D2H_Bandwidth_pinned +256By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
|
||||||
D2H_Bandwidth_pinned +512By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
|
||||||
D2H_Bandwidth_pinned 1kB GB/sec 0.0428 0.0426 0.0019 0.0114 0.0446
|
|
||||||
D2H_Bandwidth_pinned 2kB GB/sec 0.0850 0.0844 0.0034 0.0415 0.0893
|
|
||||||
D2H_Bandwidth_pinned 4kB GB/sec 0.1701 0.1687 0.0084 0.0504 0.1773
|
|
||||||
D2H_Bandwidth_pinned 8kB GB/sec 0.3378 0.3348 0.0168 0.1085 0.3546
|
|
||||||
D2H_Bandwidth_pinned 16kB GB/sec 0.6667 0.6606 0.0218 0.5618 0.6897
|
|
||||||
D2H_Bandwidth_pinned 32kB GB/sec 1.3072 1.2954 0.0663 0.5682 1.3605
|
|
||||||
D2H_Bandwidth_pinned 64kB GB/sec 2.5550 2.5339 0.0955 2.1382 2.6904
|
|
||||||
D2H_Bandwidth_pinned 128kB GB/sec 4.8162 4.7807 0.2331 2.0940 4.9621
|
|
||||||
D2H_Bandwidth_pinned 256kB GB/sec 8.2286 8.2192 0.1671 7.2456 8.5286
|
|
||||||
D2H_Bandwidth_pinned 512kB GB/sec 12.7930 12.7062 0.4407 7.1196 13.0478
|
|
||||||
D2H_Bandwidth_pinned 1024kB GB/sec 17.5603 17.4938 0.3921 12.7184 17.7989
|
|
||||||
D2H_Bandwidth_pinned 2048kB GB/sec 21.6275 21.5591 0.2233 20.6073 21.8076
|
|
||||||
D2H_Bandwidth_pinned 4096kB GB/sec 24.2708 24.2556 0.0942 23.5724 24.4292
|
|
||||||
D2H_Bandwidth_pinned 8192kB GB/sec 24.9287 24.9093 0.0733 24.7171 25.0359
|
|
||||||
D2H_Bandwidth_pinned 16384kB GB/sec 26.4588 26.1976 2.4387 1.9387 26.5191
|
|
||||||
D2H_Bandwidth_pinned 32768kB GB/sec 27.2939 27.1202 0.7941 23.2086 27.3277
|
|
||||||
D2H_Bandwidth_pinned 65536kB GB/sec 26.8278 26.7238 0.3894 24.7946 26.9000
|
|
||||||
D2H_Bandwidth_pinned 131072kB GB/sec 27.4751 27.3457 0.3968 25.4168 27.5098
|
|
||||||
D2H_Bandwidth_pinned 262144kB GB/sec 27.8236 27.7173 0.3072 26.7977 27.8525
|
|
||||||
D2H_Bandwidth_pinned 524288kB GB/sec 28.0193 27.9348 0.1912 27.4707 28.0314
|
|
||||||
D2H_Time_pinned +064By ms 0.0229 0.0246 0.0457 0.0216 1.4690
|
|
||||||
D2H_Time_pinned +256By ms 0.0232 0.0234 0.0013 0.0221 0.0378
|
|
||||||
D2H_Time_pinned +512By ms 0.0234 0.0238 0.0063 0.0224 0.2091
|
|
||||||
D2H_Time_pinned 1kB ms 0.0234 0.0236 0.0028 0.0224 0.0875
|
|
||||||
D2H_Time_pinned 2kB ms 0.0235 0.0237 0.0014 0.0224 0.0482
|
|
||||||
D2H_Time_pinned 4kB ms 0.0235 0.0239 0.0031 0.0226 0.0794
|
|
||||||
D2H_Time_pinned 8kB ms 0.0237 0.0240 0.0027 0.0226 0.0738
|
|
||||||
D2H_Time_pinned 16kB ms 0.0240 0.0242 0.0009 0.0232 0.0285
|
|
||||||
D2H_Time_pinned 32kB ms 0.0245 0.0248 0.0021 0.0235 0.0563
|
|
||||||
D2H_Time_pinned 64kB ms 0.0254 0.0257 0.0011 0.0242 0.0304
|
|
||||||
D2H_Time_pinned 128kB ms 0.0272 0.0275 0.0026 0.0264 0.0626
|
|
||||||
D2H_Time_pinned 256kB ms 0.0318 0.0319 0.0007 0.0307 0.0362
|
|
||||||
D2H_Time_pinned 512kB ms 0.0410 0.0413 0.0024 0.0402 0.0736
|
|
||||||
D2H_Time_pinned 1024kB ms 0.0597 0.0599 0.0017 0.0589 0.0824
|
|
||||||
D2H_Time_pinned 2048kB ms 0.0970 0.0973 0.0010 0.0962 0.1018
|
|
||||||
D2H_Time_pinned 4096kB ms 0.1728 0.1729 0.0007 0.1717 0.1779
|
|
||||||
D2H_Time_pinned 8192kB ms 0.3365 0.3367 0.0010 0.3350 0.3394
|
|
||||||
D2H_Time_pinned 16384kB ms 0.6341 0.7147 0.7979 0.6326 8.6538
|
|
||||||
D2H_Time_pinned 32768kB ms 1.2294 1.2385 0.0420 1.2278 1.4458
|
|
||||||
D2H_Time_pinned 65536kB ms 2.5014 2.5117 0.0391 2.4947 2.7066
|
|
||||||
D2H_Time_pinned 131072kB ms 4.8850 4.9092 0.0748 4.8789 5.2806
|
|
||||||
D2H_Time_pinned 262144kB ms 9.6478 9.6860 0.1106 9.6377 10.0171
|
|
||||||
D2H_Time_pinned 524288kB ms 19.1607 19.2196 0.1333 19.1525 19.5434
|
|
||||||
|
|
||||||
Note: results marked with (*) had missing values such as
|
|
||||||
might occur with a mixture of architectural capabilities.
|
|
||||||
"""
|
|
||||||
|
|
||||||
for i, metric in enumerate(['h2d_bw', 'd2h_bw']):
|
for i, metric in enumerate(['h2d_bw', 'd2h_bw']):
|
||||||
assert (benchmark._process_raw_result(i, raw_output[i]))
|
assert (benchmark._process_raw_result(i, raw_output[i]))
|
||||||
assert (metric in benchmark.result)
|
assert (metric in benchmark.result)
|
||||||
|
|
|
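The two `raw_output` fixtures above share one table layout: a test name, a size label, a unit, and five statistics per row. As a minimal sketch of how such rows could be turned into numbers, the snippet below parses that layout with a regular expression. It is not SuperBench's `_process_raw_result`; the names `ROW`, `size_in_bytes`, `parse_rows`, and `peak_mean_bandwidth` are hypothetical, and treating `kB` as 1024 bytes is an assumption.

```python
import re

# Hypothetical pattern for one data row, e.g.
# "D2H_Bandwidth_pinned 524288kB GB/sec 28.0193 27.9348 0.1912 27.4707 28.0314"
ROW = re.compile(
    r'^(?P<test>\S+)\s+(?P<size>\+?\d+(?:By|kB))\s+(?P<units>\S+)\s+'
    r'(?P<median>[\d.]+)\s+(?P<mean>[\d.]+)\s+(?P<stddev>[\d.]+)\s+'
    r'(?P<min>[\d.]+)\s+(?P<max>[\d.]+)\s*$'
)


def size_in_bytes(size):
    """Convert a size label such as '+512By' or '64kB' to bytes.

    Assumption: 'kB' means 1024 bytes in this output.
    """
    if size.endswith('By'):
        return int(size.lstrip('+')[:-2])
    return int(size[:-2]) * 1024


def parse_rows(raw_output):
    """Return one dict per data row; header, device, and note lines are skipped."""
    rows = []
    for line in raw_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        row = match.groupdict()
        for key in ('median', 'mean', 'stddev', 'min', 'max'):
            row[key] = float(row[key])
        row['bytes'] = size_in_bytes(row['size'])
        rows.append(row)
    return rows


def peak_mean_bandwidth(raw_output, test_name):
    """Return the largest 'mean' value reported for test_name, or None."""
    values = [r['mean'] for r in parse_rows(raw_output) if r['test'] == test_name]
    return max(values) if values else None
```

Calling `peak_mean_bandwidth(raw_output[1], 'D2H_Bandwidth_pinned')` on the second fixture would reduce the table to a single figure; whether the real benchmark reports the peak mean or some other statistic is not shown in this diff.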
@ -3,36 +3,29 @@
|
||||||
|
|
||||||
"""Tests for tensorrt-inference benchmark."""
|
"""Tests for tensorrt-inference benchmark."""
|
||||||
|
|
||||||
import os
|
|
||||||
import shutil
|
|
||||||
import tempfile
|
|
||||||
import unittest
|
import unittest
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from types import SimpleNamespace
|
from types import SimpleNamespace
|
||||||
|
|
||||||
from tests.helper import decorator
|
from tests.helper import decorator
|
||||||
|
from tests.helper.testcase import BenchmarkTestCase
|
||||||
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
from superbench.benchmarks import BenchmarkRegistry, BenchmarkType, ReturnCode, Platform
|
||||||
from superbench.benchmarks.result import BenchmarkResult
|
from superbench.benchmarks.result import BenchmarkResult
|
||||||
|
|
||||||
|
|
||||||
class TensorRTInferenceBenchmarkTestCase(unittest.TestCase):
|
class TensorRTInferenceBenchmarkTestCase(BenchmarkTestCase, unittest.TestCase):
|
||||||
"""Class for tensorrt-inference benchmark test cases."""
|
"""Class for tensorrt-inference benchmark test cases."""
|
||||||
def setUp(self):
|
@classmethod
|
||||||
"""Hook method for setting up the test fixture before exercising it."""
|
def setUpClass(cls):
|
||||||
self.benchmark_name = 'tensorrt-inference'
|
"""Hook method for setting up class fixture before running tests in the class."""
|
||||||
self.__tmp_dir = tempfile.mkdtemp()
|
super().setUpClass()
|
||||||
self.__model_path = Path(self.__tmp_dir) / 'hub' / 'onnx'
|
cls.benchmark_name = 'tensorrt-inference'
|
||||||
self.__curr_micro_path = os.environ.get('SB_MICRO_PATH', '')
|
cls._model_path = Path(cls._tmp_dir) / 'hub' / 'onnx'
|
||||||
os.environ['TORCH_HOME'] = self.__tmp_dir
|
cls.createMockEnvs(cls, {
|
||||||
os.environ['SB_MICRO_PATH'] = self.__tmp_dir
|
'TORCH_HOME': cls._tmp_dir,
|
||||||
(Path(self.__tmp_dir) / 'bin').mkdir(parents=True, exist_ok=True)
|
'SB_MICRO_PATH': cls._tmp_dir,
|
||||||
(Path(self.__tmp_dir) / 'bin' / 'trtexec').touch(mode=0o755, exist_ok=True)
|
})
|
||||||
|
cls.createMockFiles(cls, ['bin/trtexec'])
|
||||||
def tearDown(self):
|
|
||||||
"""Hook method for deconstructing the test fixture after testing it."""
|
|
||||||
shutil.rmtree(self.__tmp_dir)
|
|
||||||
os.environ['SB_MICRO_PATH'] = self.__curr_micro_path
|
|
||||||
del os.environ['TORCH_HOME']
|
|
||||||
|
|
||||||
def test_tensorrt_inference_cls(self):
|
def test_tensorrt_inference_cls(self):
|
||||||
"""Test tensorrt-inference benchmark class."""
|
"""Test tensorrt-inference benchmark class."""
|
||||||
|
@ -116,7 +109,7 @@ class TensorRTInferenceBenchmarkTestCase(unittest.TestCase):
|
||||||
|
|
||||||
# Check models
|
# Check models
|
||||||
for model in benchmark._args.pytorch_models:
|
for model in benchmark._args.pytorch_models:
|
||||||
self.assertTrue((self.__model_path / f'{model}.onnx').is_file())
|
self.assertTrue((self._model_path / f'{model}.onnx').is_file())
|
||||||
|
|
||||||
# Command list length should equal the default model number
|
# Command list length should equal the default model number
|
||||||
self.assertEqual(
|
self.assertEqual(
|
||||||
|
|
|
@ -0,0 +1,89 @@
|
||||||
|
[CUDA Bandwidth Test] - Starting...
|
||||||
|
Running on...
|
||||||
|
|
||||||
|
Device 0: Tesla V100-PCIE-32GB
|
||||||
|
Shmoo Mode
|
||||||
|
|
||||||
|
.................................................................................
|
||||||
|
bandwidthTest-D2D, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 0.1 GB/s, Time = 0.00004 s, Size = 2000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 0.8 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 1.2 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 0.4 GB/s, Time = 0.00001 s, Size = 5000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 1.7 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 7.0 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 8.0 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 9.0 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 10.0 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 6.1 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 12.0 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 13.1 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 8.0 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 8.9 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 9.5 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 9.8 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 19.0 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 22.0 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 6.3 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 0.7 GB/s, Time = 0.00004 s, Size = 26000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 28.1 GB/s, Time = 0.00000 s, Size = 28000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 30.1 GB/s, Time = 0.00000 s, Size = 30000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 32.0 GB/s, Time = 0.00000 s, Size = 32000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 14.6 GB/s, Time = 0.00000 s, Size = 34000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 20.9 GB/s, Time = 0.00000 s, Size = 36000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 22.7 GB/s, Time = 0.00000 s, Size = 38000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 23.5 GB/s, Time = 0.00000 s, Size = 40000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 24.8 GB/s, Time = 0.00000 s, Size = 42000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 44.1 GB/s, Time = 0.00000 s, Size = 44000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 27.2 GB/s, Time = 0.00000 s, Size = 46000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 48.0 GB/s, Time = 0.00000 s, Size = 48000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 28.5 GB/s, Time = 0.00000 s, Size = 50000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 60.2 GB/s, Time = 0.00000 s, Size = 60000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 42.7 GB/s, Time = 0.00000 s, Size = 70000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 8.4 GB/s, Time = 0.00001 s, Size = 80000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 55.6 GB/s, Time = 0.00000 s, Size = 90000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 59.6 GB/s, Time = 0.00000 s, Size = 100000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 127.9 GB/s, Time = 0.00000 s, Size = 200000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 183.1 GB/s, Time = 0.00000 s, Size = 300000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 270.2 GB/s, Time = 0.00000 s, Size = 400000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 15.5 GB/s, Time = 0.00003 s, Size = 500000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 399.2 GB/s, Time = 0.00000 s, Size = 600000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 172.1 GB/s, Time = 0.00000 s, Size = 700000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 27.5 GB/s, Time = 0.00003 s, Size = 800000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 71.3 GB/s, Time = 0.00001 s, Size = 900000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 502.2 GB/s, Time = 0.00000 s, Size = 1000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 59.4 GB/s, Time = 0.00003 s, Size = 2000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 348.7 GB/s, Time = 0.00001 s, Size = 3000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 519.4 GB/s, Time = 0.00001 s, Size = 4000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 422.3 GB/s, Time = 0.00001 s, Size = 5000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 447.9 GB/s, Time = 0.00001 s, Size = 6000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 225.3 GB/s, Time = 0.00003 s, Size = 7000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 146.0 GB/s, Time = 0.00005 s, Size = 8000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 190.9 GB/s, Time = 0.00005 s, Size = 9000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 301.1 GB/s, Time = 0.00003 s, Size = 10000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 192.8 GB/s, Time = 0.00006 s, Size = 11000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 243.9 GB/s, Time = 0.00005 s, Size = 12000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 328.7 GB/s, Time = 0.00004 s, Size = 13000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 621.2 GB/s, Time = 0.00002 s, Size = 14000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 682.5 GB/s, Time = 0.00002 s, Size = 15000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 686.3 GB/s, Time = 0.00002 s, Size = 16000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 693.1 GB/s, Time = 0.00003 s, Size = 18000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 707.0 GB/s, Time = 0.00003 s, Size = 20000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 714.4 GB/s, Time = 0.00003 s, Size = 22000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 719.4 GB/s, Time = 0.00003 s, Size = 24000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 723.2 GB/s, Time = 0.00004 s, Size = 26000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 726.7 GB/s, Time = 0.00004 s, Size = 28000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 728.8 GB/s, Time = 0.00004 s, Size = 30000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 724.2 GB/s, Time = 0.00004 s, Size = 32000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 735.3 GB/s, Time = 0.00005 s, Size = 36000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 741.1 GB/s, Time = 0.00005 s, Size = 40000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 748.9 GB/s, Time = 0.00006 s, Size = 44000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 748.9 GB/s, Time = 0.00006 s, Size = 48000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 754.1 GB/s, Time = 0.00007 s, Size = 52000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 757.4 GB/s, Time = 0.00007 s, Size = 56000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 758.5 GB/s, Time = 0.00008 s, Size = 60000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 772.0 GB/s, Time = 0.00008 s, Size = 64000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2D, Bandwidth = 762.8 GB/s, Time = 0.00009 s, Size = 68000000 bytes, NumDevsUsed = 1
|
||||||
|
Result = PASS
|
|
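The fixture files added in this commit hold `bandwidthTest` shmoo-mode output, where each measurement is one comma-separated line and the run ends with `Result = PASS`. As a minimal sketch of reading that format, the snippet below extracts the transfer kind, size, and bandwidth from each line. It is an illustration only, not the parser SuperBench uses; the names `LINE` and `parse_shmoo` are hypothetical.

```python
import re

# Hypothetical pattern for one measurement line, e.g.
# "bandwidthTest-D2D, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1"
LINE = re.compile(
    r'^bandwidthTest-(?P<kind>[\w-]+), Bandwidth = (?P<bw>[\d.]+) GB/s, '
    r'Time = (?P<time>[\d.]+) s, Size = (?P<size>\d+) bytes, '
    r'NumDevsUsed = (?P<devs>\d+)$'
)


def parse_shmoo(raw_output):
    """Return ([(kind, size_bytes, bandwidth_gbps), ...], passed)."""
    results = []
    passed = False
    for line in raw_output.splitlines():
        line = line.strip()
        if line == 'Result = PASS':
            passed = True
            continue
        match = LINE.match(line)
        if match:
            results.append((match['kind'], int(match['size']), float(match['bw'])))
    return results, passed
```

Keeping the size next to each bandwidth value makes it easy to look up a specific transfer size or take the maximum over the sweep, depending on what a caller wants to report.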
@ -0,0 +1,89 @@
|
||||||
|
[CUDA Bandwidth Test] - Starting...
|
||||||
|
Running on...
|
||||||
|
|
||||||
|
Device 0: Tesla V100-PCIE-32GB
|
||||||
|
Shmoo Mode
|
||||||
|
|
||||||
|
.................................................................................
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 0.5 GB/s, Time = 0.00000 s, Size = 2000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 0.9 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 1.1 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 1.4 GB/s, Time = 0.00000 s, Size = 5000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 1.9 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 2.6 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 2.9 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 3.3 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 3.7 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 4.0 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 4.5 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 4.9 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 5.6 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 5.7 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 6.0 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 6.2 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 6.3 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 6.5 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 6.9 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 7.1 GB/s, Time = 0.00000 s, Size = 26000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 7.4 GB/s, Time = 0.00000 s, Size = 28000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 7.6 GB/s, Time = 0.00000 s, Size = 30000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 7.9 GB/s, Time = 0.00000 s, Size = 32000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 8.0 GB/s, Time = 0.00000 s, Size = 34000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 8.3 GB/s, Time = 0.00000 s, Size = 36000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 8.5 GB/s, Time = 0.00000 s, Size = 38000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 8.6 GB/s, Time = 0.00000 s, Size = 40000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 8.7 GB/s, Time = 0.00000 s, Size = 42000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 9.3 GB/s, Time = 0.00000 s, Size = 44000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 9.4 GB/s, Time = 0.00000 s, Size = 46000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 9.5 GB/s, Time = 0.00001 s, Size = 48000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 9.5 GB/s, Time = 0.00001 s, Size = 50000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 10.1 GB/s, Time = 0.00001 s, Size = 60000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 10.4 GB/s, Time = 0.00001 s, Size = 70000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 10.6 GB/s, Time = 0.00001 s, Size = 80000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 10.9 GB/s, Time = 0.00001 s, Size = 90000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 11.1 GB/s, Time = 0.00001 s, Size = 100000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.0 GB/s, Time = 0.00002 s, Size = 200000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00002 s, Size = 300000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.6 GB/s, Time = 0.00003 s, Size = 400000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.6 GB/s, Time = 0.00004 s, Size = 500000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.7 GB/s, Time = 0.00005 s, Size = 600000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.7 GB/s, Time = 0.00006 s, Size = 700000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.8 GB/s, Time = 0.00006 s, Size = 800000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.9 GB/s, Time = 0.00007 s, Size = 900000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.8 GB/s, Time = 0.00008 s, Size = 1000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.0 GB/s, Time = 0.00015 s, Size = 2000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.0 GB/s, Time = 0.00023 s, Size = 3000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00031 s, Size = 4000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00038 s, Size = 5000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00046 s, Size = 6000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00053 s, Size = 7000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00061 s, Size = 8000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.5 GB/s, Time = 0.00072 s, Size = 9000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00076 s, Size = 10000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00084 s, Size = 11000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00091 s, Size = 12000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00099 s, Size = 13000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00106 s, Size = 14000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00114 s, Size = 15000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00122 s, Size = 16000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00137 s, Size = 18000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00152 s, Size = 20000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00167 s, Size = 22000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00183 s, Size = 24000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 12.9 GB/s, Time = 0.00202 s, Size = 26000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00213 s, Size = 28000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00228 s, Size = 30000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00243 s, Size = 32000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00273 s, Size = 36000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00304 s, Size = 40000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00334 s, Size = 44000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00364 s, Size = 48000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00395 s, Size = 52000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00425 s, Size = 56000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.2 GB/s, Time = 0.00455 s, Size = 60000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00487 s, Size = 64000000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-D2H-Pinned, Bandwidth = 13.1 GB/s, Time = 0.00520 s, Size = 68000000 bytes, NumDevsUsed = 1
|
||||||
|
Result = PASS
|
|
@ -0,0 +1,89 @@
|
||||||
|
[CUDA Bandwidth Test] - Starting...
|
||||||
|
Running on...
|
||||||
|
|
||||||
|
Device 0: Tesla V100-PCIE-32GB
|
||||||
|
Shmoo Mode
|
||||||
|
|
||||||
|
.................................................................................
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 0.4 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 0.7 GB/s, Time = 0.00000 s, Size = 2000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 1.0 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 1.4 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 1.7 GB/s, Time = 0.00000 s, Size = 5000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 2.0 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 2.3 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 2.5 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 2.7 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 2.9 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 3.2 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 3.4 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 3.5 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 3.5 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 3.8 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 4.0 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 4.1 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 4.3 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 4.4 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 4.6 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 4.8 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 5.0 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 5.2 GB/s, Time = 0.00000 s, Size = 26000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 5.4 GB/s, Time = 0.00001 s, Size = 28000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 5.7 GB/s, Time = 0.00001 s, Size = 30000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 5.9 GB/s, Time = 0.00001 s, Size = 32000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 6.1 GB/s, Time = 0.00001 s, Size = 34000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 6.3 GB/s, Time = 0.00001 s, Size = 36000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 6.4 GB/s, Time = 0.00001 s, Size = 38000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 6.6 GB/s, Time = 0.00001 s, Size = 40000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 6.7 GB/s, Time = 0.00001 s, Size = 42000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 6.9 GB/s, Time = 0.00001 s, Size = 44000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 7.0 GB/s, Time = 0.00001 s, Size = 46000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 7.1 GB/s, Time = 0.00001 s, Size = 48000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 7.3 GB/s, Time = 0.00001 s, Size = 50000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 7.8 GB/s, Time = 0.00001 s, Size = 60000 bytes, NumDevsUsed = 1
|
||||||
|
bandwidthTest-H2D-Pinned, Bandwidth = 8.2 GB/s, Time = 0.00001 s, Size = 70000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 8.6 GB/s, Time = 0.00001 s, Size = 80000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 8.9 GB/s, Time = 0.00001 s, Size = 90000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 9.2 GB/s, Time = 0.00001 s, Size = 100000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 10.5 GB/s, Time = 0.00002 s, Size = 200000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.1 GB/s, Time = 0.00003 s, Size = 300000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.4 GB/s, Time = 0.00004 s, Size = 400000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.6 GB/s, Time = 0.00004 s, Size = 500000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.7 GB/s, Time = 0.00005 s, Size = 600000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.8 GB/s, Time = 0.00006 s, Size = 700000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.9 GB/s, Time = 0.00007 s, Size = 800000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.9 GB/s, Time = 0.00008 s, Size = 900000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.7 GB/s, Time = 0.00009 s, Size = 1000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.1 GB/s, Time = 0.00016 s, Size = 2000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00024 s, Size = 3000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00033 s, Size = 4000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.5 GB/s, Time = 0.00043 s, Size = 5000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00049 s, Size = 6000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00057 s, Size = 7000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00065 s, Size = 8000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.3 GB/s, Time = 0.00073 s, Size = 9000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00081 s, Size = 10000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00089 s, Size = 11000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00097 s, Size = 12000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00105 s, Size = 13000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00113 s, Size = 14000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00121 s, Size = 15000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00129 s, Size = 16000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00145 s, Size = 18000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00162 s, Size = 20000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00178 s, Size = 22000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00194 s, Size = 24000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00210 s, Size = 26000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00226 s, Size = 28000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00242 s, Size = 30000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 10.5 GB/s, Time = 0.00304 s, Size = 32000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.2 GB/s, Time = 0.00295 s, Size = 36000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 10.8 GB/s, Time = 0.00369 s, Size = 40000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00355 s, Size = 44000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00387 s, Size = 48000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.1 GB/s, Time = 0.00431 s, Size = 52000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 11.7 GB/s, Time = 0.00480 s, Size = 56000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00484 s, Size = 60000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.1 GB/s, Time = 0.00528 s, Size = 64000000 bytes, NumDevsUsed = 1
bandwidthTest-H2D-Pinned, Bandwidth = 12.4 GB/s, Time = 0.00549 s, Size = 68000000 bytes, NumDevsUsed = 1
Result = PASS
@@ -0,0 +1,2 @@
{"Category": "KernelLaunch", "Defective Details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)", "kernel-launch/event_overhead:0": 15.7785234899, "kernel-launch/event_overhead:1": -0.0016778523, "kernel-launch/event_overhead:2": -0.0654362416, "kernel-launch/event_overhead:3": -0.0771812081, "kernel-launch/event_overhead:4": -0.0067114094, "kernel-launch/event_overhead:5": -0.0117449664, "kernel-launch/event_overhead:6": -0.0402684564, "kernel-launch/event_overhead:7": -0.0100671141, "kernel-launch/return_code": 0.0, "kernel-launch/wall_overhead:0": 0.0, "kernel-launch/wall_overhead:1": 0.0, "kernel-launch/wall_overhead:2": 0.0194931774, "kernel-launch/wall_overhead:3": 0.022417154, "kernel-launch/wall_overhead:4": 0.0360623782, "kernel-launch/wall_overhead:5": -0.0194931774, "kernel-launch/wall_overhead:6": 0.0185185185, "kernel-launch/wall_overhead:7": 0.0438596491, "mem-bw/D2H_Mem_BW:0": 0.0, "mem-bw/D2H_Mem_BW:1": 0.012345679, "mem-bw/D2H_Mem_BW:2": 0.0082304527, "mem-bw/D2H_Mem_BW:3": 0.012345679, "mem-bw/D2H_Mem_BW:4": 0.0, "mem-bw/D2H_Mem_BW:5": 0.0, "mem-bw/D2H_Mem_BW:6": -0.0164609053, "mem-bw/D2H_Mem_BW:7": 0.012345679, "mem-bw/H2D_Mem_BW:0": 0.0, "mem-bw/H2D_Mem_BW:1": 0.0078125, "mem-bw/H2D_Mem_BW:2": 0.015625, "mem-bw/H2D_Mem_BW:3": 0.01953125, "mem-bw/H2D_Mem_BW:4": 0.0234375, "mem-bw/H2D_Mem_BW:5": 0.0078125, "mem-bw/H2D_Mem_BW:6": -0.01171875, "mem-bw/H2D_Mem_BW:7": 0.01953125, "mem-bw/return_code": 0.0, "Index": "sb-validation-01"}
{"Category": "FailedTest,Mem", "Defective Details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)", "kernel-launch/event_overhead:0": 0.0, "kernel-launch/event_overhead:1": -0.0016778523, "kernel-launch/event_overhead:2": -0.0654362416, "kernel-launch/event_overhead:3": -0.0771812081, "kernel-launch/event_overhead:4": -0.0067114094, "kernel-launch/event_overhead:5": -0.0117449664, "kernel-launch/event_overhead:6": -0.0402684564, "kernel-launch/event_overhead:7": -0.0100671141, "kernel-launch/return_code": 0.0, "kernel-launch/wall_overhead:0": 0.0, "kernel-launch/wall_overhead:1": 0.0, "kernel-launch/wall_overhead:2": 0.0194931774, "kernel-launch/wall_overhead:3": 0.022417154, "kernel-launch/wall_overhead:4": 0.0360623782, "kernel-launch/wall_overhead:5": -0.0194931774, "kernel-launch/wall_overhead:6": 0.0185185185, "kernel-launch/wall_overhead:7": 0.0438596491, "mem-bw/D2H_Mem_BW:0": null, "mem-bw/D2H_Mem_BW:1": null, "mem-bw/D2H_Mem_BW:2": null, "mem-bw/D2H_Mem_BW:3": null, "mem-bw/D2H_Mem_BW:4": null, "mem-bw/D2H_Mem_BW:5": null, "mem-bw/D2H_Mem_BW:6": null, "mem-bw/D2H_Mem_BW:7": null, "mem-bw/H2D_Mem_BW:0": null, "mem-bw/H2D_Mem_BW:1": null, "mem-bw/H2D_Mem_BW:2": null, "mem-bw/H2D_Mem_BW:3": null, "mem-bw/H2D_Mem_BW:4": null, "mem-bw/H2D_Mem_BW:5": null, "mem-bw/H2D_Mem_BW:6": null, "mem-bw/H2D_Mem_BW:7": null, "mem-bw/return_code": 1.0, "Index": "sb-validation-03"}
Binary file not shown.
@@ -0,0 +1,309 @@
{
  "fio version" : "fio-3.16",
  "timestamp" : 1626763278,
  "timestamp_ms" : 1626763278577,
  "time" : "Tue Jul 20 06:41:18 2021",
  "global options" : {
    "filename" : "/dev/nvme0n1",
    "ramp_time" : "10s",
    "runtime" : "30s",
    "iodepth" : "64",
    "numjobs" : "4",
    "randrepeat" : "1",
    "thread" : "1",
    "ioengine" : "libaio",
    "direct" : "1",
    "norandommap" : "1",
    "lat_percentiles" : "1",
    "group_reporting" : "1"
  },
  "jobs" : [
    {
      "jobname" : "rand_read_write",
      "groupid" : 0,
      "error" : 0,
      "eta" : 0,
      "elapsed" : 41,
      "job options" : {
        "name" : "rand_read",
        "rw" : "randrw",
        "bs" : "4096",
        "time_based" : "1"
      },
      "read" : {
        "io_bytes" : 10463010816,
        "io_kbytes" : 10217784,
        "bw_bytes" : 348743777,
        "bw" : 340570,
        "iops" : 85138.890741,
        "runtime" : 30002,
        "total_ios" : 2554337,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 1332,
          "max" : 48691,
          "mean" : 2032.588341,
          "stddev" : 864.921965
        },
        "clat_ns" : {
          "min" : 278533,
          "max" : 10175655,
          "mean" : 1444476.063469,
          "stddev" : 300748.583131
        },
        "lat_ns" : {
          "min" : 280646,
          "max" : 10177629,
          "mean" : 1446562.147113,
          "stddev" : 300723.879349,
          "percentile" : {
            "1.000000" : 872448,
            "5.000000" : 1036288,
            "10.000000" : 1122304,
            "20.000000" : 1220608,
            "30.000000" : 1286144,
            "40.000000" : 1351680,
            "50.000000" : 1417216,
            "60.000000" : 1482752,
            "70.000000" : 1564672,
            "80.000000" : 1662976,
            "90.000000" : 1810432,
            "95.000000" : 1941504,
            "99.000000" : 2244608,
            "99.500000" : 2408448,
            "99.900000" : 3620864,
            "99.950000" : 4358144,
            "99.990000" : 6062080
          }
        },
        "bw_min" : 291288,
        "bw_max" : 380288,
        "bw_agg" : 99.999134,
        "bw_mean" : 340567.050000,
        "bw_dev" : 6222.338382,
        "bw_samples" : 240,
        "iops_min" : 72822,
        "iops_max" : 95072,
        "iops_mean" : 85141.733333,
        "iops_stddev" : 1555.582888,
        "iops_samples" : 240
      },
      "write" : {
        "io_bytes" : 10454208512,
        "io_kbytes" : 10209188,
        "bw_bytes" : 348450387,
        "bw" : 340283,
        "iops" : 85066.128925,
        "runtime" : 30002,
        "total_ios" : 2552154,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 1383,
          "max" : 315361,
          "mean" : 2182.824623,
          "stddev" : 919.625590
        },
        "clat_ns" : {
          "min" : 433904,
          "max" : 6300941,
          "mean" : 1558511.433458,
          "stddev" : 207734.850159
        },
        "lat_ns" : {
          "min" : 441909,
          "max" : 6302845,
          "mean" : 1560749.444938,
          "stddev" : 207695.144244,
          "percentile" : {
            "1.000000" : 1155072,
            "5.000000" : 1269760,
            "10.000000" : 1318912,
            "20.000000" : 1384448,
            "30.000000" : 1449984,
            "40.000000" : 1499136,
            "50.000000" : 1531904,
            "60.000000" : 1597440,
            "70.000000" : 1646592,
            "80.000000" : 1728512,
            "90.000000" : 1826816,
            "95.000000" : 1908736,
            "99.000000" : 2072576,
            "99.500000" : 2179072,
            "99.900000" : 2605056,
            "99.950000" : 3031040,
            "99.990000" : 4358144
          }
        },
        "bw_min" : 288464,
        "bw_max" : 380080,
        "bw_agg" : 99.998134,
        "bw_mean" : 340276.650000,
        "bw_dev" : 6293.894521,
        "bw_samples" : 240,
        "iops_min" : 72116,
        "iops_max" : 95020,
        "iops_mean" : 85069.133333,
        "iops_stddev" : 1573.475038,
        "iops_samples" : 240
      },
      "trim" : {
        "io_bytes" : 0,
        "io_kbytes" : 0,
        "bw_bytes" : 0,
        "bw" : 0,
        "iops" : 0.000000,
        "runtime" : 0,
        "total_ios" : 0,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000
        },
        "clat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000
        },
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "percentile" : {
            "1.000000" : 0,
            "5.000000" : 0,
            "10.000000" : 0,
            "20.000000" : 0,
            "30.000000" : 0,
            "40.000000" : 0,
            "50.000000" : 0,
            "60.000000" : 0,
            "70.000000" : 0,
            "80.000000" : 0,
            "90.000000" : 0,
            "95.000000" : 0,
            "99.000000" : 0,
            "99.500000" : 0,
            "99.900000" : 0,
            "99.950000" : 0,
            "99.990000" : 0
          }
        },
        "bw_min" : 0,
        "bw_max" : 0,
        "bw_agg" : 0.000000,
        "bw_mean" : 0.000000,
        "bw_dev" : 0.000000,
        "bw_samples" : 0,
        "iops_min" : 0,
        "iops_max" : 0,
        "iops_mean" : 0.000000,
        "iops_stddev" : 0.000000,
        "iops_samples" : 0
      },
      "sync" : {
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000
        },
        "total_ios" : 0
      },
      "job_runtime" : 120004,
      "usr_cpu" : 4.833172,
      "sys_cpu" : 20.800973,
      "ctx" : 3542118,
      "majf" : 0,
      "minf" : 1263,
      "iodepth_level" : {
        "1" : 0.000000,
        "2" : 0.000000,
        "4" : 0.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        ">=64" : 100.000000
      },
      "iodepth_submit" : {
        "0" : 0.000000,
        "4" : 100.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "iodepth_complete" : {
        "0" : 0.000000,
        "4" : 99.999922,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.100000,
        ">=64" : 0.000000
      },
      "latency_ns" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000
      },
      "latency_us" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.010000,
        "750" : 0.070126,
        "1000" : 1.756079
      },
      "latency_ms" : {
        "2" : 95.414131,
        "4" : 2.722457,
        "10" : 0.040830,
        "20" : 0.010000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000,
        "2000" : 0.000000,
        ">=2000" : 0.000000
      },
      "latency_depth" : 64,
      "latency_target" : 0,
      "latency_percentile" : 100.000000,
      "latency_window" : 0
    }
  ],
  "disk_util" : [
    {
      "name" : "nvme0n1",
      "read_ios" : 3004914,
      "write_ios" : 3003760,
      "read_merges" : 0,
      "write_merges" : 0,
      "read_ticks" : 4269143,
      "write_ticks" : 4598453,
      "in_queue" : 11104,
      "util" : 99.840351
    }
  ]
}
@@ -0,0 +1,97 @@
NetworkLoad Tests v1.3
Test with 10 MPI ranks (10 nodes)
2 nodes running Network Tests
8 nodes running Congestion Tests (min 100 nodes per congestor)

Legend
RR = random ring communication pattern
Lat = latency
BW = bandwidth
BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------------------------------------------------------------------+
| Isolated Network Tests |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| RR Two-sided Lat (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Multiple Allreduce (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| Isolated Congestion Tests |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| Network Tests running with Congestion Tests ( RR Two-sided Lat Network Test) |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| RR Two-sided Lat (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| Network Tests running with Congestion Tests (RR Two-sided BW+Sync Network Test) |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| Network Tests running with Congestion Tests ( Multiple Allreduce Network Test) |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Name | Min | Max | Avg | Avg(Worst) | 99% | 99.9% | Units |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Multiple Allreduce (8 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Alltoall (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Two-sided Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Put Incast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Get Bcast (4096 B) | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+

+------------------------------------------------------------------------------+
| Network Tests running with Congestion Tests - Key Results |
+---------------------------------+--------------------------------------------+
| Name | Congestion Impact Factor |
+---------------------------------+----------------------+---------------------+
| | Avg | 99% |
+---------------------------------+----------------------+---------------------+
| RR Two-sided Lat (8 B) | 0.0X | 0.0X |
+---------------------------------+----------------------+---------------------+
| RR Two-sided BW+Sync (131072 B) | 0.0X | 0.0X |
+---------------------------------+----------------------+---------------------+
| Multiple Allreduce (8 B) | 0.0X | 0.0X |
+---------------------------------+----------------------+---------------------+
@@ -0,0 +1,12 @@
ERROR: this application must be run on at least 10 nodes
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[63697,1],0]
Exit code: 1
--------------------------------------------------------------------------
@@ -0,0 +1,30 @@
Network Tests v1.3
Test with 2 MPI ranks (2 nodes)

Legend
RR = random ring communication pattern
Nat = natural ring communication pattern
Lat = latency
BW = bandwidth
BW+Sync = bandwidth with barrier
+------------------------------------------------------------------------------+
| Isolated Network Tests |
+---------------------------------+--------------+--------------+--------------+
| Name | Avg | 99% | Units |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided Lat (8 B) | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+
| RR Get Lat (8 B) | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
| RR Put BW (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
| RR Two-sided BW+Sync (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
| Nat Two-sided BW (131072 B) | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
| Multiple Allreduce (8 B) | 10000.0 | 10000.0 | usec |
+---------------------------------+--------------+--------------+--------------+
| Multiple Alltoall (4096 B) | 10000.0 | 10000.0 | MiB/s/rank |
+---------------------------------+--------------+--------------+--------------+
@@ -0,0 +1,12 @@
ERROR: this application must be run on at least 2 nodes
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[63697,1],0]
Exit code: 1
--------------------------------------------------------------------------
@@ -0,0 +1,43 @@
RDMA_Write BW Test
Dual-port : OFF Device : ib0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0xd06 QPN 0x095f PSN 0x3c9e82 RKey 0x080359 VAddr 0x007f9fc479c000
remote address: LID 0xd06 QPN 0x095e PSN 0xbd024b RKey 0x080258 VAddr 0x007fe62504b000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
8388608 20000 24056.74 24056.72 0.003007
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : ib0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0xd06 QPN 0x095e PSN 0xbd024b RKey 0x080258 VAddr 0x007fe62504b000
remote address: LID 0xd06 QPN 0x095f PSN 0x3c9e82 RKey 0x080359 VAddr 0x007f9fc479c000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
8388608 20000 24056.74 24056.72 0.003007
---------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------
@@ -0,0 +1,66 @@
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : ib0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : ib0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
ibv_wr* API : ON
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0xd06 QPN 0x092f PSN 0x3ff1bc RKey 0x080329 VAddr 0x007fc97ff50000
local address: LID 0xd06 QPN 0x092e PSN 0x3eb82d RKey 0x080228 VAddr 0x007f19adcbf000
remote address: LID 0xd06 QPN 0x092e PSN 0x3eb82d RKey 0x080228 VAddr 0x007f19adcbf000
remote address: LID 0xd06 QPN 0x092f PSN 0x3ff1bc RKey 0x080329 VAddr 0x007fc97ff50000
---------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
2 2000 5.32 5.30 2.778732
4 2000 10.65 10.64 2.788833
8 2000 21.30 21.27 2.787609
16 2000 42.60 42.55 2.788268
32 2000 84.90 82.82 2.713896
64 2000 173.55 171.66 2.812504
128 2000 362.27 353.83 2.898535
256 2000 687.82 679.37 2.782698
|
||||||
|
512 2000 1337.12 1311.59 2.686135
|
||||||
|
1024 2000 2674.25 2649.39 2.712980
|
||||||
|
2048 2000 5248.56 5118.18 2.620509
|
||||||
|
4096 2000 10034.02 9948.41 2.546793
|
||||||
|
8192 2000 18620.51 12782.56 1.636168
|
||||||
|
16384 2000 23115.27 16782.50 1.074080
|
||||||
|
32768 2000 22927.94 18586.03 0.594753
|
||||||
|
65536 2000 23330.56 21167.79 0.338685
|
||||||
|
131072 2000 22750.35 21443.14 0.171545
|
||||||
|
262144 2000 22673.63 22411.35 0.089645
|
||||||
|
524288 2000 22679.02 22678.86 0.045358
|
||||||
|
1048576 2000 22817.06 22816.86 0.022817
|
||||||
|
2097152 2000 22919.37 22919.27 0.011460
|
||||||
|
4194304 2000 23277.93 23277.91 0.005819
|
||||||
|
8388608 2000 23240.68 23240.68 0.002905
|
||||||
|
---------------------------------------------------------------------------------------
|
||||||
|
8388608 2000 23240.68 23240.68 0.002905
|
||||||
|
---------------------------------------------------------------------------------------
|
|
@ -0,0 +1,53 @@
|
||||||
|
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
||||||
|
#
|
||||||
|
# Using devices
|
||||||
|
# Rank 0 Pid 112372 on localhost device 0 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 1 Pid 112372 on localhost device 1 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 2 Pid 112372 on localhost device 2 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 3 Pid 112372 on localhost device 3 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 4 Pid 112372 on localhost device 4 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 5 Pid 112372 on localhost device 5 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 6 Pid 112372 on localhost device 6 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 7 Pid 112372 on localhost device 7 [0x00] A100-SXM4-40GB
|
||||||
|
#
|
||||||
|
# out-of-place in-place
|
||||||
|
# size count type time algbw busbw error time algbw busbw error
|
||||||
|
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
||||||
|
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
||||||
|
0 0 float 34.27 0.00 0.00 N/A 33.57 0.00 0.00 N/A
|
||||||
|
0 0 float 33.41 0.00 0.00 N/A 33.62 0.00 0.00 N/A
|
||||||
|
0 0 float 33.94 0.00 0.00 N/A 33.48 0.00 0.00 N/A
|
||||||
|
0 0 float 33.83 0.00 0.00 N/A 33.62 0.00 0.00 N/A
|
||||||
|
0 0 float 33.82 0.00 0.00 N/A 33.57 0.00 0.00 N/A
|
||||||
|
32 1 float 35.03 0.00 0.00 N/A 34.15 0.00 0.00 N/A
|
||||||
|
64 2 float 34.36 0.00 0.00 N/A 33.83 0.00 0.00 N/A
|
||||||
|
128 4 float 33.94 0.00 0.00 N/A 35.22 0.00 0.00 N/A
|
||||||
|
256 8 float 34.44 0.01 0.01 N/A 34.82 0.01 0.01 N/A
|
||||||
|
512 16 float 34.84 0.01 0.01 N/A 34.76 0.01 0.01 N/A
|
||||||
|
1024 32 float 35.38 0.03 0.03 N/A 34.53 0.03 0.03 N/A
|
||||||
|
2048 64 float 34.67 0.06 0.05 N/A 34.91 0.06 0.05 N/A
|
||||||
|
4096 128 float 34.62 0.12 0.10 N/A 34.81 0.12 0.10 N/A
|
||||||
|
8192 256 float 34.76 0.24 0.21 N/A 35.03 0.23 0.20 N/A
|
||||||
|
16384 512 float 34.80 0.47 0.41 N/A 34.90 0.47 0.41 N/A
|
||||||
|
32768 1024 float 34.54 0.95 0.83 N/A 35.23 0.93 0.81 N/A
|
||||||
|
65536 2048 float 36.34 1.80 1.58 N/A 36.01 1.82 1.59 N/A
|
||||||
|
131072 4096 float 40.18 3.26 2.85 N/A 39.43 3.32 2.91 N/A
|
||||||
|
262144 8192 float 46.45 5.64 4.94 N/A 46.27 5.67 4.96 N/A
|
||||||
|
524288 16384 float 58.48 8.96 7.84 N/A 60.40 8.68 7.60 N/A
|
||||||
|
1048576 32768 float 72.95 14.37 12.58 N/A 73.07 14.35 12.56 N/A
|
||||||
|
2097152 65536 float 77.28 27.14 23.75 N/A 75.84 27.65 24.20 N/A
|
||||||
|
4194304 131072 float 100.7 41.64 36.43 N/A 99.56 42.13 36.86 N/A
|
||||||
|
8388608 262144 float 123.5 67.94 59.44 N/A 120.7 69.51 60.82 N/A
|
||||||
|
16777216 524288 float 167.7 100.03 87.52 N/A 164.6 101.94 89.20 N/A
|
||||||
|
33554432 1048576 float 265.8 126.24 110.46 N/A 257.5 130.33 114.04 N/A
|
||||||
|
67108864 2097152 float 379.7 176.74 154.65 N/A 367.6 182.57 159.75 N/A
|
||||||
|
134217728 4194304 float 698.6 192.13 168.12 N/A 657.3 204.20 178.67 N/A
|
||||||
|
268435456 8388608 float 1192.2 225.16 197.01 N/A 1136.0 236.29 206.76 N/A
|
||||||
|
536870912 16777216 float 2304.1 233.01 203.88 N/A 2227.9 240.98 210.85 N/A
|
||||||
|
1073741824 33554432 float 4413.4 243.29 212.88 N/A 4258.8 252.12 220.61 N/A
|
||||||
|
2147483648 67108864 float 8658.8 248.01 217.01 N/A 8389.4 255.98 223.98 N/A
|
||||||
|
4294967296 134217728 float 17016 252.40 220.85 N/A 16474 260.71 228.12 N/A
|
||||||
|
8589934592 268435456 float 33646 255.31 223.39 N/A 32669 262.94 230.07 N/A
|
||||||
|
# Out of bounds values : 0 OK
|
||||||
|
# Avg bus bandwidth : 58.2651
|
||||||
|
#
|
|
@ -0,0 +1,53 @@
|
||||||
|
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
||||||
|
#
|
||||||
|
# Using devices
|
||||||
|
# Rank 0 Pid 112424 on localhost device 0 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 1 Pid 112424 on localhost device 1 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 2 Pid 112424 on localhost device 2 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 3 Pid 112424 on localhost device 3 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 4 Pid 112424 on localhost device 4 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 5 Pid 112424 on localhost device 5 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 6 Pid 112424 on localhost device 6 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 7 Pid 112424 on localhost device 7 [0x00] A100-SXM4-40GB
|
||||||
|
#
|
||||||
|
# out-of-place in-place
|
||||||
|
# size count type redop time algbw busbw error time algbw busbw error
|
||||||
|
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
||||||
|
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
||||||
|
0 0 float sum 35.20 0.00 0.00 N/A 34.05 0.00 0.00 N/A
|
||||||
|
0 0 float sum 34.18 0.00 0.00 N/A 33.50 0.00 0.00 N/A
|
||||||
|
4 1 float sum 34.73 0.00 0.00 N/A 35.30 0.00 0.00 N/A
|
||||||
|
8 2 float sum 34.66 0.00 0.00 N/A 34.84 0.00 0.00 N/A
|
||||||
|
16 4 float sum 35.00 0.00 0.00 N/A 35.61 0.00 0.00 N/A
|
||||||
|
32 8 float sum 35.60 0.00 0.00 N/A 35.27 0.00 0.00 N/A
|
||||||
|
64 16 float sum 34.83 0.00 0.00 N/A 34.61 0.00 0.00 N/A
|
||||||
|
128 32 float sum 34.53 0.00 0.01 N/A 43.78 0.00 0.01 N/A
|
||||||
|
256 64 float sum 34.56 0.01 0.01 N/A 34.95 0.01 0.01 N/A
|
||||||
|
512 128 float sum 34.94 0.01 0.03 N/A 35.20 0.01 0.03 N/A
|
||||||
|
1024 256 float sum 36.07 0.03 0.05 N/A 35.77 0.03 0.05 N/A
|
||||||
|
2048 512 float sum 35.42 0.06 0.10 N/A 35.89 0.06 0.10 N/A
|
||||||
|
4096 1024 float sum 35.92 0.11 0.20 N/A 36.11 0.11 0.20 N/A
|
||||||
|
8192 2048 float sum 35.91 0.23 0.40 N/A 36.07 0.23 0.40 N/A
|
||||||
|
16384 4096 float sum 36.18 0.45 0.79 N/A 35.87 0.46 0.80 N/A
|
||||||
|
32768 8192 float sum 36.65 0.89 1.56 N/A 35.73 0.92 1.60 N/A
|
||||||
|
65536 16384 float sum 37.82 1.73 3.03 N/A 37.25 1.76 3.08 N/A
|
||||||
|
131072 32768 float sum 41.19 3.18 5.57 N/A 41.11 3.19 5.58 N/A
|
||||||
|
262144 65536 float sum 47.53 5.52 9.65 N/A 47.94 5.47 9.57 N/A
|
||||||
|
524288 131072 float sum 60.32 8.69 15.21 N/A 60.52 8.66 15.16 N/A
|
||||||
|
1048576 262144 float sum 74.78 14.02 24.54 N/A 76.17 13.77 24.09 N/A
|
||||||
|
2097152 524288 float sum 93.48 22.43 39.26 N/A 96.10 21.82 38.19 N/A
|
||||||
|
4194304 1048576 float sum 112.0 37.44 65.52 N/A 110.2 38.06 66.60 N/A
|
||||||
|
8388608 2097152 float sum 162.0 51.79 90.63 N/A 160.0 52.44 91.77 N/A
|
||||||
|
16777216 4194304 float sum 226.0 74.23 129.90 N/A 225.0 74.57 130.49 N/A
|
||||||
|
33554432 8388608 float sum 374.3 89.65 156.89 N/A 372.8 90.00 157.50 N/A
|
||||||
|
67108864 16777216 float sum 584.5 114.81 200.91 N/A 581.9 115.33 201.82 N/A
|
||||||
|
134217728 33554432 float sum 1162.2 115.49 202.11 N/A 1162.5 115.46 202.05 N/A
|
||||||
|
268435456 67108864 float sum 2112.2 127.09 222.40 N/A 2111.8 127.11 222.45 N/A
|
||||||
|
536870912 134217728 float sum 4200.3 127.82 223.68 N/A 4184.0 128.32 224.55 N/A
|
||||||
|
1073741824 268435456 float sum 8159.5 131.59 230.29 N/A 8176.5 131.32 229.81 N/A
|
||||||
|
2147483648 536870912 float sum 16215 132.44 231.76 N/A 16203 132.53 231.93 N/A
|
||||||
|
4294967296 1073741824 float sum 32070 133.92 234.37 N/A 32052 134.00 234.50 N/A
|
||||||
|
8589934592 2147483648 float sum 63896 134.44 235.26 N/A 63959 134.30 235.03 N/A
|
||||||
|
# Out of bounds values : 0 OK
|
||||||
|
# Avg bus bandwidth : 68.4048
|
||||||
|
#
|
|
@ -0,0 +1,52 @@
|
||||||
|
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
||||||
|
#
|
||||||
|
# Using devices
|
||||||
|
# Rank 0 Pid 167261 on localhost device 0 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 1 Pid 167261 on localhost device 1 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 2 Pid 167261 on localhost device 2 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 3 Pid 167261 on localhost device 3 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 4 Pid 167261 on localhost device 4 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 5 Pid 167261 on localhost device 5 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 6 Pid 167261 on localhost device 6 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 7 Pid 167261 on localhost device 7 [0x00] A100-SXM4-40GB
|
||||||
|
#
|
||||||
|
# out-of-place in-place
|
||||||
|
# size count type redop time algbw busbw error time algbw busbw error
|
||||||
|
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
||||||
|
0 0 float 1.63 0.00 0.00 N/A 1.38 0.00 0.00 N/A
|
||||||
|
0 0 float 1.35 0.00 0.00 N/A 1.34 0.00 0.00 N/A
|
||||||
|
0 0 float 1.35 0.00 0.00 N/A 1.77 0.00 0.00 N/A
|
||||||
|
0 0 float 1.37 0.00 0.00 N/A 1.39 0.00 0.00 N/A
|
||||||
|
0 0 float 1.34 0.00 0.00 N/A 1.33 0.00 0.00 N/A
|
||||||
|
32 1 float 89.00 0.00 0.00 N/A 85.13 0.00 0.00 N/A
|
||||||
|
64 2 float 86.83 0.00 0.00 N/A 85.77 0.00 0.00 N/A
|
||||||
|
128 4 float 86.02 0.00 0.00 N/A 85.30 0.00 0.00 N/A
|
||||||
|
256 8 float 87.20 0.00 0.00 N/A 86.21 0.00 0.00 N/A
|
||||||
|
512 16 float 87.33 0.01 0.01 N/A 88.47 0.01 0.01 N/A
|
||||||
|
1024 32 float 88.17 0.01 0.01 N/A 88.98 0.01 0.01 N/A
|
||||||
|
2048 64 float 86.44 0.02 0.02 N/A 86.65 0.02 0.02 N/A
|
||||||
|
4096 128 float 86.75 0.05 0.04 N/A 86.68 0.05 0.04 N/A
|
||||||
|
8192 256 float 88.78 0.09 0.08 N/A 87.05 0.09 0.08 N/A
|
||||||
|
16384 512 float 87.71 0.19 0.16 N/A 86.76 0.19 0.17 N/A
|
||||||
|
32768 1024 float 86.26 0.38 0.33 N/A 88.92 0.37 0.32 N/A
|
||||||
|
65536 2048 float 87.67 0.75 0.65 N/A 89.16 0.74 0.64 N/A
|
||||||
|
131072 4096 float 87.35 1.50 1.31 N/A 86.76 1.51 1.32 N/A
|
||||||
|
262144 8192 float 87.02 3.01 2.64 N/A 87.98 2.98 2.61 N/A
|
||||||
|
524288 16384 float 86.58 6.06 5.30 N/A 89.33 5.87 5.14 N/A
|
||||||
|
1048576 32768 float 87.42 11.99 10.50 N/A 88.90 11.79 10.32 N/A
|
||||||
|
2097152 65536 float 89.61 23.40 20.48 N/A 90.10 23.27 20.37 N/A
|
||||||
|
4194304 131072 float 96.44 43.49 38.05 N/A 99.62 42.10 36.84 N/A
|
||||||
|
8388608 262144 float 121.1 69.28 60.62 N/A 120.6 69.56 60.87 N/A
|
||||||
|
16777216 524288 float 160.4 104.62 91.55 N/A 158.8 105.64 92.43 N/A
|
||||||
|
33554432 1048576 float 237.5 141.30 123.64 N/A 234.5 143.11 125.22 N/A
|
||||||
|
67108864 2097152 float 396.8 169.13 147.99 N/A 387.0 173.41 151.73 N/A
|
||||||
|
134217728 4194304 float 633.6 211.83 185.35 N/A 620.9 216.17 189.15 N/A
|
||||||
|
268435456 8388608 float 1189.1 225.75 197.53 N/A 1167.8 229.86 201.13 N/A
|
||||||
|
536870912 16777216 float 2236.6 240.04 210.03 N/A 2197.4 244.32 213.78 N/A
|
||||||
|
1073741824 33554432 float 4335.5 247.66 216.71 N/A 4274.2 251.22 219.81 N/A
|
||||||
|
2147483648 67108864 float 8510.4 252.34 220.79 N/A 8405.3 255.49 223.56 N/A
|
||||||
|
4294967296 134217728 float 16860 254.74 222.90 N/A 16678 257.53 225.34 N/A
|
||||||
|
8589934592 268435456 float 33508 256.36 224.31 N/A 33234 258.47 226.16 N/A
|
||||||
|
# Out of bounds values : 0 OK
|
||||||
|
# Avg bus bandwidth : 58.6481
|
||||||
|
#
|
|
@ -0,0 +1,53 @@
|
||||||
|
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
||||||
|
#
|
||||||
|
# Using devices
|
||||||
|
# Rank 0 Pid 112528 on localhost device 0 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 1 Pid 112528 on localhost device 1 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 2 Pid 112528 on localhost device 2 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 3 Pid 112528 on localhost device 3 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 4 Pid 112528 on localhost device 4 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 5 Pid 112528 on localhost device 5 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 6 Pid 112528 on localhost device 6 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 7 Pid 112528 on localhost device 7 [0x00] A100-SXM4-40GB
|
||||||
|
#
|
||||||
|
# out-of-place in-place
|
||||||
|
# size count type root time algbw busbw error time algbw busbw error
|
||||||
|
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
||||||
|
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
||||||
|
0 0 float 0 34.61 0.00 0.00 N/A 34.33 0.00 0.00 N/A
|
||||||
|
0 0 float 0 34.43 0.00 0.00 N/A 35.06 0.00 0.00 N/A
|
||||||
|
4 1 float 0 33.96 0.00 0.00 N/A 33.80 0.00 0.00 N/A
|
||||||
|
8 2 float 0 34.16 0.00 0.00 N/A 34.32 0.00 0.00 N/A
|
||||||
|
16 4 float 0 34.47 0.00 0.00 N/A 34.85 0.00 0.00 N/A
|
||||||
|
32 8 float 0 35.24 0.00 0.00 N/A 34.75 0.00 0.00 N/A
|
||||||
|
64 16 float 0 35.12 0.00 0.00 N/A 34.89 0.00 0.00 N/A
|
||||||
|
128 32 float 0 34.67 0.00 0.00 N/A 34.36 0.00 0.00 N/A
|
||||||
|
256 64 float 0 34.23 0.01 0.01 N/A 34.42 0.01 0.01 N/A
|
||||||
|
512 128 float 0 34.26 0.01 0.01 N/A 35.20 0.01 0.01 N/A
|
||||||
|
1024 256 float 0 34.87 0.03 0.03 N/A 34.80 0.03 0.03 N/A
|
||||||
|
2048 512 float 0 34.90 0.06 0.06 N/A 35.27 0.06 0.06 N/A
|
||||||
|
4096 1024 float 0 35.37 0.12 0.12 N/A 34.59 0.12 0.12 N/A
|
||||||
|
8192 2048 float 0 34.95 0.23 0.23 N/A 34.79 0.24 0.24 N/A
|
||||||
|
16384 4096 float 0 34.94 0.47 0.47 N/A 34.94 0.47 0.47 N/A
|
||||||
|
32768 8192 float 0 35.03 0.94 0.94 N/A 34.71 0.94 0.94 N/A
|
||||||
|
65536 16384 float 0 36.04 1.82 1.82 N/A 36.48 1.80 1.80 N/A
|
||||||
|
131072 32768 float 0 40.09 3.27 3.27 N/A 39.92 3.28 3.28 N/A
|
||||||
|
262144 65536 float 0 46.58 5.63 5.63 N/A 45.89 5.71 5.71 N/A
|
||||||
|
524288 131072 float 0 58.37 8.98 8.98 N/A 59.67 8.79 8.79 N/A
|
||||||
|
1048576 262144 float 0 76.02 13.79 13.79 N/A 78.43 13.37 13.37 N/A
|
||||||
|
2097152 524288 float 0 78.12 26.85 26.85 N/A 78.84 26.60 26.60 N/A
|
||||||
|
4194304 1048576 float 0 81.06 51.74 51.74 N/A 80.39 52.17 52.17 N/A
|
||||||
|
8388608 2097152 float 0 97.20 86.30 86.30 N/A 96.09 87.30 87.30 N/A
|
||||||
|
16777216 4194304 float 0 143.1 117.22 117.22 N/A 142.1 118.06 118.06 N/A
|
||||||
|
33554432 8388608 float 0 223.4 150.21 150.21 N/A 221.3 151.61 151.61 N/A
|
||||||
|
67108864 16777216 float 0 374.8 179.05 179.05 N/A 374.4 179.23 179.23 N/A
|
||||||
|
134217728 33554432 float 0 672.2 199.67 199.67 N/A 670.0 200.34 200.34 N/A
|
||||||
|
268435456 67108864 float 0 1271.5 211.11 211.11 N/A 1264.5 212.28 212.28 N/A
|
||||||
|
536870912 134217728 float 0 2436.3 220.37 220.37 N/A 2434.5 220.53 220.53 N/A
|
||||||
|
1073741824 268435456 float 0 4769.2 225.14 225.14 N/A 4697.5 228.58 228.58 N/A
|
||||||
|
2147483648 536870912 float 0 9314.2 230.56 230.56 N/A 9248.3 232.20 232.20 N/A
|
||||||
|
4294967296 1073741824 float 0 18487 232.33 232.33 N/A 18381 233.66 233.66 N/A
|
||||||
|
8589934592 2147483648 float 0 36896 232.81 232.81 N/A 36599 234.70 234.70 N/A
|
||||||
|
# Out of bounds values : 0 OK
|
||||||
|
# Avg bus bandwidth : 64.8653
|
||||||
|
#
|
|
@ -0,0 +1,53 @@
|
||||||
|
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
||||||
|
#
|
||||||
|
# Using devices
|
||||||
|
# Rank 0 Pid 112476 on localhost device 0 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 1 Pid 112476 on localhost device 1 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 2 Pid 112476 on localhost device 2 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 3 Pid 112476 on localhost device 3 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 4 Pid 112476 on localhost device 4 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 5 Pid 112476 on localhost device 5 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 6 Pid 112476 on localhost device 6 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 7 Pid 112476 on localhost device 7 [0x00] A100-SXM4-40GB
|
||||||
|
#
|
||||||
|
# out-of-place in-place
|
||||||
|
# size count type redop root time algbw busbw error time algbw busbw error
|
||||||
|
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
||||||
|
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
||||||
|
0 0 float sum 0 36.90 0.00 0.00 N/A 36.47 0.00 0.00 N/A
|
||||||
|
0 0 float sum 0 34.18 0.00 0.00 N/A 35.70 0.00 0.00 N/A
|
||||||
|
4 1 float sum 0 35.40 0.00 0.00 N/A 35.59 0.00 0.00 N/A
|
||||||
|
8 2 float sum 0 36.35 0.00 0.00 N/A 35.74 0.00 0.00 N/A
|
||||||
|
16 4 float sum 0 35.47 0.00 0.00 N/A 34.27 0.00 0.00 N/A
|
||||||
|
32 8 float sum 0 36.16 0.00 0.00 N/A 36.19 0.00 0.00 N/A
|
||||||
|
64 16 float sum 0 35.61 0.00 0.00 N/A 35.45 0.00 0.00 N/A
|
||||||
|
128 32 float sum 0 34.78 0.00 0.00 N/A 35.80 0.00 0.00 N/A
|
||||||
|
256 64 float sum 0 35.37 0.01 0.01 N/A 35.89 0.01 0.01 N/A
|
||||||
|
512 128 float sum 0 35.49 0.01 0.01 N/A 35.53 0.01 0.01 N/A
|
||||||
|
1024 256 float sum 0 35.38 0.03 0.03 N/A 35.52 0.03 0.03 N/A
|
||||||
|
2048 512 float sum 0 35.97 0.06 0.06 N/A 35.13 0.06 0.06 N/A
|
||||||
|
4096 1024 float sum 0 36.03 0.11 0.11 N/A 35.82 0.11 0.11 N/A
|
||||||
|
8192 2048 float sum 0 36.80 0.22 0.22 N/A 36.71 0.22 0.22 N/A
|
||||||
|
16384 4096 float sum 0 35.37 0.46 0.46 N/A 36.79 0.45 0.45 N/A
|
||||||
|
32768 8192 float sum 0 35.16 0.93 0.93 N/A 35.72 0.92 0.92 N/A
|
||||||
|
65536 16384 float sum 0 38.08 1.72 1.72 N/A 37.74 1.74 1.74 N/A
|
||||||
|
131072 32768 float sum 0 43.07 3.04 3.04 N/A 41.59 3.15 3.15 N/A
|
||||||
|
262144 65536 float sum 0 52.16 5.03 5.03 N/A 50.49 5.19 5.19 N/A
|
||||||
|
524288 131072 float sum 0 67.58 7.76 7.76 N/A 66.57 7.88 7.88 N/A
|
||||||
|
1048576 262144 float sum 0 76.74 13.66 13.66 N/A 80.47 13.03 13.03 N/A
|
||||||
|
2097152 524288 float sum 0 78.51 26.71 26.71 N/A 78.76 26.63 26.63 N/A
|
||||||
|
4194304 1048576 float sum 0 81.47 51.48 51.48 N/A 80.30 52.23 52.23 N/A
|
||||||
|
8388608 2097152 float sum 0 94.72 88.57 88.57 N/A 94.06 89.19 89.19 N/A
|
||||||
|
16777216 4194304 float sum 0 137.7 121.83 121.83 N/A 139.6 120.17 120.17 N/A
|
||||||
|
33554432 8388608 float sum 0 218.3 153.70 153.70 N/A 218.1 153.83 153.83 N/A
|
||||||
|
67108864 16777216 float sum 0 370.8 180.96 180.96 N/A 369.8 181.49 181.49 N/A
|
||||||
|
134217728 33554432 float sum 0 661.0 203.06 203.06 N/A 659.9 203.39 203.39 N/A
|
||||||
|
268435456 67108864 float sum 0 1251.4 214.52 214.52 N/A 1268.1 211.68 211.68 N/A
|
||||||
|
536870912 134217728 float sum 0 2421.6 221.70 221.70 N/A 2413.4 222.45 222.45 N/A
|
||||||
|
1073741824 268435456 float sum 0 4736.0 226.72 226.72 N/A 4757.9 225.68 225.68 N/A
|
||||||
|
2147483648 536870912 float sum 0 9323.5 230.33 230.33 N/A 9354.0 229.58 229.58 N/A
|
||||||
|
4294967296 1073741824 float sum 0 18594 230.99 230.99 N/A 18570 231.28 231.28 N/A
|
||||||
|
8589934592 2147483648 float sum 0 37613 228.38 228.38 N/A 37539 228.83 228.83 N/A
|
||||||
|
# Out of bounds values : 0 OK
|
||||||
|
# Avg bus bandwidth : 65.018
|
||||||
|
#
|
|
@ -0,0 +1,53 @@
|
||||||
|
# nThread 1 nGpus 8 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 validation: 0
|
||||||
|
#
|
||||||
|
# Using devices
|
||||||
|
# Rank 0 Pid 112580 on localhost device 0 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 1 Pid 112580 on localhost device 1 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 2 Pid 112580 on localhost device 2 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 3 Pid 112580 on localhost device 3 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 4 Pid 112580 on localhost device 4 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 5 Pid 112580 on localhost device 5 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 6 Pid 112580 on localhost device 6 [0x00] A100-SXM4-40GB
|
||||||
|
# Rank 7 Pid 112580 on localhost device 7 [0x00] A100-SXM4-40GB
|
||||||
|
#
|
||||||
|
# out-of-place in-place
|
||||||
|
# size count type redop time algbw busbw error time algbw busbw error
|
||||||
|
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
|
||||||
|
hostname:3442:3442 [0] NCCL INFO Launch mode Parallel
|
||||||
|
0 0 float sum 34.88 0.00 0.00 N/A 33.65 0.00 0.00 N/A
|
||||||
|
0 0 float sum 33.54 0.00 0.00 N/A 33.72 0.00 0.00 N/A
|
||||||
|
0 0 float sum 33.45 0.00 0.00 N/A 33.44 0.00 0.00 N/A
|
||||||
|
0 0 float sum 34.07 0.00 0.00 N/A 33.44 0.00 0.00 N/A
|
||||||
|
0 0 float sum 33.55 0.00 0.00 N/A 33.43 0.00 0.00 N/A
|
||||||
|
32 1 float sum 35.06 0.00 0.00 N/A 35.14 0.00 0.00 N/A
|
||||||
|
64 2 float sum 34.82 0.00 0.00 N/A 34.76 0.00 0.00 N/A
|
||||||
|
128 4 float sum 34.38 0.00 0.00 N/A 34.52 0.00 0.00 N/A
|
||||||
|
256 8 float sum 34.75 0.01 0.01 N/A 34.32 0.01 0.01 N/A
|
||||||
|
512 16 float sum 34.71 0.01 0.01 N/A 35.43 0.01 0.01 N/A
|
||||||
|
1024 32 float sum 35.16 0.03 0.03 N/A 34.75 0.03 0.03 N/A
|
||||||
|
2048 64 float sum 35.43 0.06 0.05 N/A 35.29 0.06 0.05 N/A
|
||||||
|
4096 128 float sum 35.49 0.12 0.10 N/A 35.17 0.12 0.10 N/A
|
||||||
|
8192 256 float sum 35.18 0.23 0.20 N/A 35.77 0.23 0.20 N/A
|
||||||
|
16384 512 float sum 35.27 0.46 0.41 N/A 35.49 0.46 0.40 N/A
|
||||||
|
32768 1024 float sum 35.00 0.94 0.82 N/A 35.09 0.93 0.82 N/A
|
||||||
|
65536 2048 float sum 36.78 1.78 1.56 N/A 36.92 1.77 1.55 N/A
|
||||||
|
131072 4096 float sum 40.71 3.22 2.82 N/A 39.78 3.29 2.88 N/A
|
||||||
|
262144 8192 float sum 48.12 5.45 4.77 N/A 46.65 5.62 4.92 N/A
|
||||||
|
524288 16384 float sum 59.81 8.77 7.67 N/A 58.88 8.90 7.79 N/A
|
||||||
|
1048576 32768 float sum 72.37 14.49 12.68 N/A 74.95 13.99 12.24 N/A
|
||||||
|
2097152 65536 float sum 80.64 26.01 22.76 N/A 79.62 26.34 23.05 N/A
|
||||||
|
4194304 131072 float sum 108.9 38.53 33.72 N/A 109.3 38.37 33.57 N/A
|
||||||
|
8388608 262144 float sum 147.3 56.96 49.84 N/A 166.8 50.28 44.00 N/A
|
||||||
|
16777216 524288 float sum 152.4 110.11 96.34 N/A 152.8 109.82 96.09 N/A
|
||||||
|
33554432 1048576 float sum 240.5 139.50 122.06 N/A 240.8 139.33 121.91 N/A
|
||||||
|
67108864 2097152 float sum 356.1 188.45 164.89 N/A 352.1 190.57 166.75 N/A
|
||||||
|
134217728 4194304 float sum 618.1 217.15 190.01 N/A 615.2 218.18 190.90 N/A
|
||||||
|
268435456 8388608 float sum 1108.7 242.11 211.84 N/A 1112.6 241.27 211.11 N/A
|
||||||
|
536870912 16777216 float sum 2169.0 247.52 216.58 N/A 2181.8 246.07 215.31 N/A
|
||||||
|
1073741824 33554432 float sum 4203.0 255.47 223.54 N/A 4206.3 255.27 223.36 N/A
|
||||||
|
2147483648 67108864 float sum 8356.9 256.97 224.85 N/A 8323.5 258.00 225.75 N/A
|
||||||
|
4294967296 134217728 float sum 16400 261.89 229.15 N/A 16402 261.86 229.13 N/A
|
||||||
|
8589934592 268435456 float sum 32464 264.60 231.52 N/A 32502 264.29 231.25 N/A
|
||||||
|
# Out of bounds values : 0 OK
|
||||||
|
# Avg bus bandwidth : 60.168
|
||||||
|
#
|
|
@ -0,0 +1,51 @@
|
||||||
|
Device:Device 738c Mem=32.0GB #CUs=120 Freq=1502Mhz MallocMode=pinned
|
||||||
|
test atts units median mean stddev min max
|
||||||
|
D2H_Bandwidth_pinned +064By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
||||||
|
D2H_Bandwidth_pinned +256By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
||||||
|
D2H_Bandwidth_pinned +512By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
||||||
|
D2H_Bandwidth_pinned 1kB GB/sec 0.0428 0.0426 0.0019 0.0114 0.0446
|
||||||
|
D2H_Bandwidth_pinned 2kB GB/sec 0.0850 0.0844 0.0034 0.0415 0.0893
|
||||||
|
D2H_Bandwidth_pinned 4kB GB/sec 0.1701 0.1687 0.0084 0.0504 0.1773
|
||||||
|
D2H_Bandwidth_pinned 8kB GB/sec 0.3378 0.3348 0.0168 0.1085 0.3546
|
||||||
|
D2H_Bandwidth_pinned 16kB GB/sec 0.6667 0.6606 0.0218 0.5618 0.6897
|
||||||
|
D2H_Bandwidth_pinned 32kB GB/sec 1.3072 1.2954 0.0663 0.5682 1.3605
|
||||||
|
D2H_Bandwidth_pinned 64kB GB/sec 2.5550 2.5339 0.0955 2.1382 2.6904
|
||||||
|
D2H_Bandwidth_pinned 128kB GB/sec 4.8162 4.7807 0.2331 2.0940 4.9621
|
||||||
|
D2H_Bandwidth_pinned 256kB GB/sec 8.2286 8.2192 0.1671 7.2456 8.5286
|
||||||
|
D2H_Bandwidth_pinned 512kB GB/sec 12.7930 12.7062 0.4407 7.1196 13.0478
|
||||||
|
D2H_Bandwidth_pinned 1024kB GB/sec 17.5603 17.4938 0.3921 12.7184 17.7989
|
||||||
|
D2H_Bandwidth_pinned 2048kB GB/sec 21.6275 21.5591 0.2233 20.6073 21.8076
|
||||||
|
D2H_Bandwidth_pinned 4096kB GB/sec 24.2708 24.2556 0.0942 23.5724 24.4292
|
||||||
|
D2H_Bandwidth_pinned 8192kB GB/sec 24.9287 24.9093 0.0733 24.7171 25.0359
|
||||||
|
D2H_Bandwidth_pinned 16384kB GB/sec 26.4588 26.1976 2.4387 1.9387 26.5191
|
||||||
|
D2H_Bandwidth_pinned 32768kB GB/sec 27.2939 27.1202 0.7941 23.2086 27.3277
|
||||||
|
D2H_Bandwidth_pinned 65536kB GB/sec 26.8278 26.7238 0.3894 24.7946 26.9000
|
||||||
|
D2H_Bandwidth_pinned 131072kB GB/sec 27.4751 27.3457 0.3968 25.4168 27.5098
|
||||||
|
D2H_Bandwidth_pinned 262144kB GB/sec 27.8236 27.7173 0.3072 26.7977 27.8525
|
||||||
|
D2H_Bandwidth_pinned 524288kB GB/sec 28.0193 27.9348 0.1912 27.4707 28.0314
|
||||||
|
D2H_Time_pinned +064By ms 0.0229 0.0246 0.0457 0.0216 1.4690
|
||||||
|
D2H_Time_pinned +256By ms 0.0232 0.0234 0.0013 0.0221 0.0378
|
||||||
|
D2H_Time_pinned +512By ms 0.0234 0.0238 0.0063 0.0224 0.2091
|
||||||
|
D2H_Time_pinned 1kB ms 0.0234 0.0236 0.0028 0.0224 0.0875
|
||||||
|
D2H_Time_pinned 2kB ms 0.0235 0.0237 0.0014 0.0224 0.0482
|
||||||
|
D2H_Time_pinned 4kB ms 0.0235 0.0239 0.0031 0.0226 0.0794
|
||||||
|
D2H_Time_pinned 8kB ms 0.0237 0.0240 0.0027 0.0226 0.0738
|
||||||
|
D2H_Time_pinned 16kB ms 0.0240 0.0242 0.0009 0.0232 0.0285
|
||||||
|
D2H_Time_pinned 32kB ms 0.0245 0.0248 0.0021 0.0235 0.0563
|
||||||
|
D2H_Time_pinned 64kB ms 0.0254 0.0257 0.0011 0.0242 0.0304
|
||||||
|
D2H_Time_pinned 128kB ms 0.0272 0.0275 0.0026 0.0264 0.0626
|
||||||
|
D2H_Time_pinned 256kB ms 0.0318 0.0319 0.0007 0.0307 0.0362
|
||||||
|
D2H_Time_pinned 512kB ms 0.0410 0.0413 0.0024 0.0402 0.0736
|
||||||
|
D2H_Time_pinned 1024kB ms 0.0597 0.0599 0.0017 0.0589 0.0824
|
||||||
|
D2H_Time_pinned 2048kB ms 0.0970 0.0973 0.0010 0.0962 0.1018
|
||||||
|
D2H_Time_pinned 4096kB ms 0.1728 0.1729 0.0007 0.1717 0.1779
|
||||||
|
D2H_Time_pinned 8192kB ms 0.3365 0.3367 0.0010 0.3350 0.3394
|
||||||
|
D2H_Time_pinned 16384kB ms 0.6341 0.7147 0.7979 0.6326 8.6538
|
||||||
|
D2H_Time_pinned 32768kB ms 1.2294 1.2385 0.0420 1.2278 1.4458
|
||||||
|
D2H_Time_pinned 65536kB ms 2.5014 2.5117 0.0391 2.4947 2.7066
|
||||||
|
D2H_Time_pinned 131072kB ms 4.8850 4.9092 0.0748 4.8789 5.2806
|
||||||
|
D2H_Time_pinned 262144kB ms 9.6478 9.6860 0.1106 9.6377 10.0171
|
||||||
|
D2H_Time_pinned 524288kB ms 19.1607 19.2196 0.1333 19.1525 19.5434
|
||||||
|
|
||||||
|
Note: results marked with (*) had missing values such as
|
||||||
|
might occur with a mixture of architectural capabilities.
|
|
@ -0,0 +1,51 @@
|
||||||
|
Device:Device 738c Mem=32.0GB #CUs=120 Freq=1502Mhz MallocMode=pinned
|
||||||
|
test atts units median mean stddev min max
|
||||||
|
H2D_Bandwidth_pinned +064By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
||||||
|
H2D_Bandwidth_pinned +256By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
||||||
|
H2D_Bandwidth_pinned +512By GB/sec 0.0000 0.0000 0.0000 0.0000 0.0000
|
||||||
|
H2D_Bandwidth_pinned 1kB GB/sec 0.0414 0.0411 0.0017 0.0189 0.0434
|
||||||
|
H2D_Bandwidth_pinned 2kB GB/sec 0.0828 0.0824 0.0018 0.0683 0.0862
|
||||||
|
H2D_Bandwidth_pinned 4kB GB/sec 0.1656 0.1652 0.0032 0.1374 0.1724
|
||||||
|
H2D_Bandwidth_pinned 8kB GB/sec 0.3268 0.3251 0.0117 0.1880 0.3425
|
||||||
|
H2D_Bandwidth_pinned 16kB GB/sec 0.6410 0.6365 0.0259 0.3597 0.6757
|
||||||
|
H2D_Bandwidth_pinned 32kB GB/sec 1.2422 1.2432 0.0278 0.9346 1.2987
|
||||||
|
H2D_Bandwidth_pinned 64kB GB/sec 2.3968 2.4161 0.1486 0.7242 2.6042
|
||||||
|
H2D_Bandwidth_pinned 128kB GB/sec 4.6786 4.6339 0.1310 4.1143 4.8162
|
||||||
|
H2D_Bandwidth_pinned 256kB GB/sec 7.8349 7.8369 0.1150 6.9093 8.0270
|
||||||
|
H2D_Bandwidth_pinned 512kB GB/sec 11.9963 11.9828 0.1287 11.2158 12.2201
|
||||||
|
H2D_Bandwidth_pinned 1024kB GB/sec 16.3342 16.3315 0.0956 16.0147 16.5823
|
||||||
|
H2D_Bandwidth_pinned 2048kB GB/sec 19.9790 19.9770 0.0853 19.7681 20.1635
|
||||||
|
H2D_Bandwidth_pinned 4096kB GB/sec 22.2706 22.2642 0.0552 22.0644 22.3847
|
||||||
|
H2D_Bandwidth_pinned 8192kB GB/sec 22.8232 22.7881 0.1669 21.3196 22.8930
|
||||||
|
H2D_Bandwidth_pinned 16384kB GB/sec 24.1521 24.1411 0.0429 24.0165 24.2162
|
||||||
|
H2D_Bandwidth_pinned 32768kB GB/sec 24.8695 24.7086 0.7491 20.6288 24.9035
|
||||||
|
H2D_Bandwidth_pinned 65536kB GB/sec 24.4840 24.0101 2.5769 6.1754 24.5292
|
||||||
|
H2D_Bandwidth_pinned 131072kB GB/sec 25.0487 24.9593 0.2601 24.1286 25.0711
|
||||||
|
H2D_Bandwidth_pinned 262144kB GB/sec 25.3280 25.2351 0.1788 24.8746 25.3498
|
||||||
|
H2D_Bandwidth_pinned 524288kB GB/sec 24.7523 24.6708 0.1586 24.3154 24.7880
|
||||||
|
H2D_Timepinned +064By ms 0.0245 0.0253 0.0240 0.0232 0.7821
|
||||||
|
H2D_Timepinned +256By ms 0.0243 0.0244 0.0013 0.0232 0.0546
|
||||||
|
H2D_Timepinned +512By ms 0.0243 0.0244 0.0014 0.0230 0.0566
|
||||||
|
H2D_Timepinned 1kB ms 0.0242 0.0244 0.0016 0.0230 0.0530
|
||||||
|
H2D_Timepinned 2kB ms 0.0242 0.0243 0.0005 0.0232 0.0293
|
||||||
|
H2D_Timepinned 4kB ms 0.0242 0.0242 0.0005 0.0232 0.0291
|
||||||
|
H2D_Timepinned 8kB ms 0.0245 0.0247 0.0013 0.0234 0.0426
|
||||||
|
H2D_Timepinned 16kB ms 0.0250 0.0252 0.0015 0.0237 0.0445
|
||||||
|
H2D_Timepinned 32kB ms 0.0258 0.0258 0.0006 0.0246 0.0342
|
||||||
|
H2D_Timepinned 64kB ms 0.0271 0.0272 0.0045 0.0250 0.0898
|
||||||
|
H2D_Timepinned 128kB ms 0.0280 0.0283 0.0008 0.0272 0.0318
|
||||||
|
H2D_Timepinned 256kB ms 0.0334 0.0334 0.0005 0.0326 0.0379
|
||||||
|
H2D_Timepinned 512kB ms 0.0437 0.0437 0.0005 0.0429 0.0467
|
||||||
|
H2D_Timepinned 1024kB ms 0.0642 0.0642 0.0004 0.0632 0.0654
|
||||||
|
H2D_Timepinned 2048kB ms 0.1050 0.1050 0.0004 0.1040 0.1061
|
||||||
|
H2D_Timepinned 4096kB ms 0.1883 0.1884 0.0005 0.1874 0.1901
|
||||||
|
H2D_Timepinned 8192kB ms 0.3675 0.3681 0.0028 0.3664 0.3934
|
||||||
|
H2D_Timepinned 16384kB ms 0.6946 0.6950 0.0012 0.6928 0.6986
|
||||||
|
H2D_Timepinned 32768kB ms 1.3492 1.3595 0.0482 1.3474 1.6266
|
||||||
|
H2D_Timepinned 65536kB ms 2.7409 2.9163 1.1368 2.7358 10.8670
|
||||||
|
H2D_Timepinned 131072kB ms 5.3582 5.3780 0.0576 5.3534 5.5626
|
||||||
|
H2D_Timepinned 262144kB ms 10.5983 10.6379 0.0761 10.5892 10.7915
|
||||||
|
H2D_Timepinned 524288kB ms 21.6897 21.7622 0.1411 21.6585 22.0794
|
||||||
|
|
||||||
|
Note: results marked with (*) had missing values such as
|
||||||
|
might occur with a mixture of architectural capabilities.
|
|
@ -0,0 +1,83 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

"""Unittest TestCase helpers."""

import os
import shutil
import tempfile
from pathlib import Path


class BenchmarkTestCase(object):
    """Base class for benchmark test case.

    Examples:
        Inherit from both BenchmarkTestCase and unittest.TestCase.
        ```
        class FooBenchmarkTestCase(BenchmarkTestCase, unittest.TestCase):
            def setUp(self):
                super().setUp()
                ...
        ```
    """
    def setUp(self):
        """Hook method for setting up the test fixture before exercising it."""
        pass

    def tearDown(self):
        """Hook method for deconstructing the test fixture after testing it."""
        pass

    @classmethod
    def setUpClass(cls):
        """Hook method for setting up class fixture before running tests in the class.

        Will create a temp directory and mock envs for all tests.
        Run once for the whole class.
        """
        cls._tmp_dir = tempfile.mkdtemp(prefix='sbtest')
        cls._curr_mock_envs = {}

    @classmethod
    def tearDownClass(cls):
        """Hook method for deconstructing the class fixture after running all tests in the class.

        Will restore original envs and cleanup temp directory.
        Run once for the whole class.
        """
        cls.cleanupMockEnvs(cls)
        shutil.rmtree(cls._tmp_dir)

    def createMockEnvs(self, envs=None):
        """Create mock envs for tests.

        Args:
            envs (dict, optional): Environment variables to be mocked.
                Defaults to None and will mock SB_MICRO_PATH to temp directory.
        """
        if not envs:
            envs = {'SB_MICRO_PATH': self._tmp_dir}
        for name in envs:
            self._curr_mock_envs[name] = os.environ.get(name, None)
            os.environ[name] = envs[name]

    def cleanupMockEnvs(self):
        """Cleanup mock envs and restore original envs."""
        for name in self._curr_mock_envs:
            if self._curr_mock_envs[name] is None:
                del os.environ[name]
            else:
                os.environ[name] = self._curr_mock_envs[name]

    def createMockFiles(self, files, mode=0o755):
        """Create mock files for tests.

        Args:
            files (List[str]): List of file names, relative path will be created under temp directory.
            mode (int, optional): Octal integer for file mode. Defaults to 0o755.
        """
        for filename in files:
            filepath = Path(self._tmp_dir) / filename
            filepath.parent.mkdir(parents=True, exist_ok=True)
            filepath.touch(mode=mode, exist_ok=True)
|
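The helper above follows a save, mock, and restore pattern for environment variables, combined with a class-level scratch directory. The following is a minimal standalone sketch of that pattern, not the SuperBench helper itself: the class name `EnvMockingExample`, the attribute `_saved_envs`, and the file name `bin/fake_tool` are illustrative assumptions.

```python
import os
import shutil
import tempfile
import unittest
from pathlib import Path


class EnvMockingExample(unittest.TestCase):
    """Hypothetical test case; not part of the SuperBench test suite."""

    @classmethod
    def setUpClass(cls):
        # Create one scratch directory for the whole class.
        cls._tmp_dir = tempfile.mkdtemp(prefix='sbtest')
        # Remember original values so they can be restored later.
        cls._saved_envs = {}

    @classmethod
    def tearDownClass(cls):
        # Restore (or remove) every variable that was mocked.
        for name, value in cls._saved_envs.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        shutil.rmtree(cls._tmp_dir)

    def test_binary_is_found_under_mocked_path(self):
        name = 'SB_MICRO_PATH'
        self._saved_envs[name] = os.environ.get(name)
        os.environ[name] = self._tmp_dir

        # 'bin/fake_tool' is an illustrative file name.
        binary = Path(self._tmp_dir) / 'bin' / 'fake_tool'
        binary.parent.mkdir(parents=True, exist_ok=True)
        binary.touch(mode=0o755, exist_ok=True)

        self.assertTrue((Path(os.environ[name]) / 'bin' / 'fake_tool').is_file())
```

Restoring in `tearDownClass` rather than `tearDown` means the mocked variables stay in place for every test in the class, which matches the "run once for the whole class" wording in the docstrings above.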
|
@ -38,16 +38,17 @@ class AnsibleClientTestCase(unittest.TestCase):
|
||||||
'host_password': 'pass',
|
'host_password': 'pass',
|
||||||
})
|
})
|
||||||
)
|
)
|
||||||
|
_, self.test_mpi_host_file = tempfile.mkstemp()
|
||||||
|
|
||||||
def tearDown(self):
|
def tearDown(self):
|
||||||
"""Hook method for deconstructing the test fixture after testing it."""
|
"""Hook method for deconstructing the test fixture after testing it."""
|
||||||
Path(self.host_file).unlink()
|
Path(self.host_file).unlink()
|
||||||
|
Path(self.test_mpi_host_file).unlink()
|
||||||
|
|
||||||
def test_init_config(self):
|
def test_init_config(self):
|
||||||
"""Test initial config of client."""
|
"""Test initial config of client."""
|
||||||
self.assertDictEqual(
|
self.assertDictEqual(
|
||||||
self.ansible_client._config, {
|
self.ansible_client._config, {
|
||||||
'private_data_dir': None,
|
|
||||||
'host_pattern': 'all',
|
'host_pattern': 'all',
|
||||||
'cmdline': f'--forks 5 --inventory {self.host_file} --user user --ask-pass --ask-become-pass',
|
'cmdline': f'--forks 5 --inventory {self.host_file} --user user --ask-pass --ask-become-pass',
|
||||||
'passwords': {
|
'passwords': {
|
||||||
|
@ -62,6 +63,63 @@ class AnsibleClientTestCase(unittest.TestCase):
|
||||||
self.assertDictEqual(
|
self.assertDictEqual(
|
||||||
self.ansible_client.update_mpi_config(self.ansible_client._config), {
|
self.ansible_client.update_mpi_config(self.ansible_client._config), {
|
||||||
**self.ansible_client._config,
|
**self.ansible_client._config,
|
||||||
|
'host_pattern': '10.0.0.10',
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
def test_update_mpi_config_for_different_inventory(self):
|
||||||
|
"""Test update_mpi_config of client for different inventory."""
|
||||||
|
# Test for out-of-order
|
||||||
|
with open(self.test_mpi_host_file, 'w') as fd:
|
||||||
|
fd.write('all:\n  hosts:\n    10.0.0.12:\n    10.0.0.11:\n    10.0.0.10:\n    10.0.0.13:\n    10.0.0.14:\n')
|
||||||
|
mess_hosts = AnsibleClient(
|
||||||
|
OmegaConf.create(
|
||||||
|
{
|
||||||
|
'host_file': self.test_mpi_host_file,
|
||||||
|
'host_username': 'user',
|
||||||
|
'host_password': 'pass',
|
||||||
|
}
|
||||||
|
)
|
||||||
|
)
|
||||||
|
self.assertDictEqual(
|
||||||
|
mess_hosts.update_mpi_config(mess_hosts._config), {
|
||||||
|
**mess_hosts._config,
|
||||||
|
'host_pattern': '10.0.0.10',
|
||||||
|
}
|
||||||
|
)
|
||||||
|
# Test for localhost
|
||||||
|
with open(self.test_mpi_host_file, 'w') as fd:
|
||||||
|
fd.write('all:\n  hosts:\n    localhost:\n')
|
||||||
|
localhost = AnsibleClient(
|
||||||
|
OmegaConf.create(
|
||||||
|
{
|
||||||
|
'host_file': self.test_mpi_host_file,
|
||||||
|
'host_username': 'user',
|
||||||
|
'host_password': 'pass',
|
||||||
|
}
|
||||||
|
)
|
||||||
|
)
|
||||||
|
self.assertDictEqual(
|
||||||
|
localhost.update_mpi_config(localhost._config), {
|
||||||
|
**localhost._config,
|
||||||
|
'host_pattern': 'localhost',
|
||||||
|
}
|
||||||
|
)
|
||||||
|
# Test for no host
|
||||||
|
with open(self.test_mpi_host_file, 'w') as fd:
|
||||||
|
fd.write('all:\n  hosts:\n')
|
||||||
|
no_hosts = AnsibleClient(
|
||||||
|
OmegaConf.create(
|
||||||
|
{
|
||||||
|
'host_file': self.test_mpi_host_file,
|
||||||
|
'host_username': 'user',
|
||||||
|
'host_password': 'pass',
|
||||||
|
}
|
||||||
|
)
|
||||||
|
)
|
||||||
|
self.assertDictEqual(
|
||||||
|
no_hosts.update_mpi_config(no_hosts._config), {
|
||||||
|
**no_hosts._config,
|
||||||
'host_pattern': 'all[0]',
|
'host_pattern': 'all[0]',
|
||||||
}
|
}
|
||||||
)
|
)
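The three cases in this test (out-of-order hosts, a single `localhost`, and an empty inventory) pin down the expected `host_pattern` without showing how it is chosen. The sketch below is one selection rule that is consistent with those assertions. It is an assumption for illustration only: the function name `pick_mpi_host_pattern` is hypothetical, and the actual `AnsibleClient.update_mpi_config` may derive the head host differently.

```python
def pick_mpi_host_pattern(hosts, group='all'):
    """Return a host pattern for the MPI root (hypothetical helper).

    Assumption: the head host is the lexicographically smallest name,
    and an empty inventory falls back to the first host of the group.
    """
    if not hosts:
        return f'{group}[0]'
    # Plain string ordering; note this is not numeric IP ordering.
    return sorted(hosts)[0]


out_of_order = ['10.0.0.12', '10.0.0.11', '10.0.0.10', '10.0.0.13', '10.0.0.14']
head = pick_mpi_host_pattern(out_of_order)
```

With the inventory from the test, `head` is `'10.0.0.10'` even though that host is listed third in the file.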
|
||||||
|
@ -71,7 +129,6 @@ class AnsibleClientTestCase(unittest.TestCase):
|
||||||
cmd = 'ls -la'
|
cmd = 'ls -la'
|
||||||
self.assertDictEqual(
|
self.assertDictEqual(
|
||||||
self.ansible_client.get_shell_config(cmd), {
|
self.ansible_client.get_shell_config(cmd), {
|
||||||
'private_data_dir': None,
|
|
||||||
'host_pattern': 'all',
|
'host_pattern': 'all',
|
||||||
'cmdline': f'--forks 5 --inventory {self.host_file} --user user --ask-pass --ask-become-pass',
|
'cmdline': f'--forks 5 --inventory {self.host_file} --user user --ask-pass --ask-become-pass',
|
||||||
'passwords': {
|
'passwords': {
|
||||||
|
@ -87,7 +144,6 @@ class AnsibleClientTestCase(unittest.TestCase):
|
||||||
"""Test get_playbook_config of client."""
|
"""Test get_playbook_config of client."""
|
||||||
self.assertDictEqual(
|
self.assertDictEqual(
|
||||||
self.ansible_client.get_playbook_config('play', {'foo': 'bar'}), {
|
self.ansible_client.get_playbook_config('play', {'foo': 'bar'}), {
|
||||||
'private_data_dir': None,
|
|
||||||
'host_pattern': 'all',
|
'host_pattern': 'all',
|
||||||
'cmdline': f'--forks 5 --inventory {self.host_file} --user user --ask-pass --ask-become-pass',
|
'cmdline': f'--forks 5 --inventory {self.host_file} --user user --ask-pass --ask-become-pass',
|
||||||
'passwords': {
|
'passwords': {
|
||||||
|
|
|
@ -244,37 +244,37 @@ class RunnerTestCase(unittest.TestCase):
|
||||||
"""Test __merge_monitor_metrics."""
|
"""Test __merge_monitor_metrics."""
|
||||||
path = Path('tests/data/monitor/')
|
path = Path('tests/data/monitor/')
|
||||||
expected = {
|
expected = {
|
||||||
'gpu_temperature:0': 50,
|
'monitor/gpu_temperature:0': 50,
|
||||||
'gpu_temperature:1': 27,
|
'monitor/gpu_temperature:1': 27,
|
||||||
'gpu_temperature:2': 24,
|
'monitor/gpu_temperature:2': 24,
|
||||||
'gpu_temperature:3': 26,
|
'monitor/gpu_temperature:3': 26,
|
||||||
'gpu_temperature:4': 25,
|
'monitor/gpu_temperature:4': 25,
|
||||||
'gpu_temperature:5': 25,
|
'monitor/gpu_temperature:5': 25,
|
||||||
'gpu_temperature:6': 23,
|
'monitor/gpu_temperature:6': 23,
|
||||||
'gpu_temperature:7': 26,
|
'monitor/gpu_temperature:7': 26,
|
||||||
'gpu_power_limit:0': 250,
|
'monitor/gpu_power_limit:0': 250,
|
||||||
'gpu_power_limit:1': 200,
|
'monitor/gpu_power_limit:1': 200,
|
||||||
'gpu_power_limit:2': 250,
|
'monitor/gpu_power_limit:2': 250,
|
||||||
'gpu_power_limit:3': 250,
|
'monitor/gpu_power_limit:3': 250,
|
||||||
'gpu_power_limit:4': 250,
|
'monitor/gpu_power_limit:4': 250,
|
||||||
'gpu_power_limit:5': 250,
|
'monitor/gpu_power_limit:5': 250,
|
||||||
'gpu_power_limit:6': 250,
|
'monitor/gpu_power_limit:6': 250,
|
||||||
'gpu_power_limit:7': 250,
|
'monitor/gpu_power_limit:7': 250,
|
||||||
'gpu_corrected_ecc:0': 12,
|
'monitor/gpu_corrected_ecc:0': 12,
|
||||||
'gpu_corrected_ecc:1': 0,
|
'monitor/gpu_corrected_ecc:1': 0,
|
||||||
'gpu_corrected_ecc:2': 0,
|
'monitor/gpu_corrected_ecc:2': 0,
|
||||||
'gpu_corrected_ecc:3': 0,
|
'monitor/gpu_corrected_ecc:3': 0,
|
||||||
'gpu_corrected_ecc:4': 0,
|
'monitor/gpu_corrected_ecc:4': 0,
|
||||||
'gpu_corrected_ecc:5': 0,
|
'monitor/gpu_corrected_ecc:5': 0,
|
||||||
'gpu_corrected_ecc:6': 0,
|
'monitor/gpu_corrected_ecc:6': 0,
|
||||||
'gpu_corrected_ecc:7': 0,
|
'monitor/gpu_corrected_ecc:7': 0,
|
||||||
'gpu_uncorrected_ecc:0': 0,
|
'monitor/gpu_uncorrected_ecc:0': 0,
|
||||||
'gpu_uncorrected_ecc:1': 0,
|
'monitor/gpu_uncorrected_ecc:1': 0,
|
||||||
'gpu_uncorrected_ecc:2': 0,
|
'monitor/gpu_uncorrected_ecc:2': 0,
|
||||||
'gpu_uncorrected_ecc:3': 0,
|
'monitor/gpu_uncorrected_ecc:3': 0,
|
||||||
'gpu_uncorrected_ecc:4': 0,
|
'monitor/gpu_uncorrected_ecc:4': 0,
|
||||||
'gpu_uncorrected_ecc:5': 0,
|
'monitor/gpu_uncorrected_ecc:5': 0,
|
||||||
'gpu_uncorrected_ecc:6': 0,
|
'monitor/gpu_uncorrected_ecc:6': 0,
|
||||||
'gpu_uncorrected_ecc:7': 0
|
'monitor/gpu_uncorrected_ecc:7': 0
|
||||||
}
|
}
|
||||||
self.assertEqual(self.runner._SuperBenchRunner__merge_monitor_metrics(path), expected)
|
self.assertEqual(self.runner._SuperBenchRunner__merge_monitor_metrics(path), expected)
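The updated expectation differs from the old one only in the `monitor/` prefix on every key. A minimal sketch of that renaming step is shown here, assuming the merged metrics are a flat dict. The function name `add_metric_prefix` is hypothetical and is not taken from the runner.

```python
def add_metric_prefix(metrics, prefix='monitor/'):
    """Return a copy of metrics with prefix prepended to every key."""
    return {f'{prefix}{name}': value for name, value in metrics.items()}


raw = {'gpu_temperature:0': 50, 'gpu_power_limit:0': 250}
prefixed = add_metric_prefix(raw)
```

Namespacing the keys this way keeps monitor metrics from colliding with benchmark metrics that share the same result summary.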
|
||||||
|
|
|
@ -63,7 +63,7 @@ endif
|
||||||
# Build FIO from commit d83ac9 (fio-3.28 tag).
|
# Build FIO from commit d83ac9 (fio-3.28 tag).
|
||||||
fio:
|
fio:
|
||||||
ifneq (,$(wildcard fio/Makefile))
|
ifneq (,$(wildcard fio/Makefile))
|
||||||
cd ./fio && ./configure --prefix=$(SB_MICRO_PATH) && make -j && make install
|
cd ./fio && ./configure --prefix=$(SB_MICRO_PATH) --disable-native && make -j && make install
|
||||||
endif
|
endif
|
||||||
|
|
||||||
# Build rccl-tests from commit dc1ad48 of develop branch (default branch).
|
# Build rccl-tests from commit dc1ad48 of develop branch (default branch).
|
||||||
|
|
|
@ -0,0 +1,58 @@
---
slug: release-sb-v0.4
title: Releasing SuperBench v0.4
author: Peng Cheng
author_title: SuperBench Team
author_url: https://github.com/cp5555
author_image_url: https://github.com/cp5555.png
tags: [superbench, announcement, release]
---

We are very happy to announce that **SuperBench version 0.4.0** is officially released today!

You can install and try SuperBench by following the [Getting Started Tutorial](https://microsoft.github.io/superbenchmark/docs/getting-started/installation).

## SuperBench 0.4.0 Release Notes

### SuperBench Framework

#### Monitor

- Add monitor framework for NVIDIA GPU, CPU, memory and disk.

#### Data Diagnosis and Analysis

- Support baseline-based data diagnosis.
- Support basic analysis feature (boxplot figure, outlier detection, etc.).

### Single-node Validation

#### Micro Benchmarks

- CPU Memory Validation (tool: Intel Memory Latency Checker).
- GPU Copy Bandwidth (tool: built by MSRA).
- Add ORT Model on AMD GPU platform.
- Add inference backend TensorRT.
- Add inference backend ORT.

### Multi-node Validation

#### Micro Benchmarks

- IB Networking validation.
- TCP validation (tool: TCPing).
- GPCNet Validation (tool: GPCNet).

### Other Improvements

1. Enhancement
   - Add pipeline for AMD docker.
   - Integrate system config info script with SuperBench.
   - Support FP32 mode without TF32.
   - Refine unit test for microbenchmark.
   - Unify metric names for all benchmarks.

2. Documentation
   - Add benchmark list.
   - Add monitor document.
   - Add data diagnosis document.
|
|
@ -101,7 +101,7 @@ module.exports = {
|
||||||
announcementBar: {
|
announcementBar: {
|
||||||
id: 'supportus',
|
id: 'supportus',
|
||||||
content:
|
content:
|
||||||
'📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.3">v0.3.0</a> has been released! ' +
|
'📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.4">v0.4.0</a> has been released! ' +
|
||||||
'⭐️ If you like SuperBench, give it a star on <a target="_blank" rel="noopener noreferrer" href="https://github.com/microsoft/superbenchmark">GitHub</a>! ⭐️',
|
'⭐️ If you like SuperBench, give it a star on <a target="_blank" rel="noopener noreferrer" href="https://github.com/microsoft/superbenchmark">GitHub</a>! ⭐️',
|
||||||
},
|
},
|
||||||
algolia: {
|
algolia: {
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
{
|
{
|
||||||
"name": "superbench-website",
|
"name": "superbench-website",
|
||||||
"version": "0.3.0",
|
"version": "0.4.0",
|
||||||
"lockfileVersion": 1,
|
"lockfileVersion": 1,
|
||||||
"requires": true,
|
"requires": true,
|
||||||
"dependencies": {
|
"dependencies": {
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
{
|
{
|
||||||
"name": "superbench-website",
|
"name": "superbench-website",
|
||||||
"version": "0.3.0",
|
"version": "0.4.0",
|
||||||
"private": true,
|
"private": true,
|
||||||
"scripts": {
|
"scripts": {
|
||||||
"docusaurus": "docusaurus",
|
"docusaurus": "docusaurus",
|
||||||
|
|