---
slug: release-sb-v0.10
title: Releasing SuperBench v0.10
author: Peng Cheng
author_title: SuperBench Team
author_url: https://github.com/cp5555
author_image_url: https://github.com/cp5555.png
tags: [superbench, announcement, release]
---

We are very happy to announce that **SuperBench version 0.10.0** is officially released today!

You can install and try SuperBench by following the [Getting Started Tutorial](https://microsoft.github.io/superbenchmark/docs/getting-started/installation).

## SuperBench 0.10.0 Release Notes

### SuperBench Improvements

- Support monitoring for AMD GPUs.
- Support ROCm 5.7 and ROCm 6.0 dockerfile.
- Add MSCCL support for Nvidia GPU.
- Fix NUMA domains swap issue in NDv4 topology file.
- Add NDv5 topo file.
- Fix NCCL and NCCL-test to 2.18.3 for hang issue in CUDA 12.2.

### Micro-benchmark Improvements

- Add HPL random generator to gemm-flops with ROCm.
- Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
- Add HWDecoderFPS benchmark to measure the FPS of hardware decoder performance.
- Update Docker image for H100 support.
- Update MLC version to 3.10 for CUDA/ROCm dockerfile.
- Bug fix for GPU Burn test.
- Support INT8 in cublaslt function.
- Add hipBLASLt function benchmark.
- Support cpu-gpu and gpu-cpu in ib-validation.
- Support graph mode in NCCL/RCCL benchmarks for latency metrics.
- Support cpp implementation in distributed inference benchmark.
- Add O2 option for gpu copy ROCm build.
- Support different hipblasLt data types in dist inference.
- Support in-place in NCCL/RCCL benchmark.
- Support data type option in NCCL/RCCL benchmark.
- Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
- Update hipblaslt GEMM metric unit to tflops.
- Support FP8 for hipblaslt benchmark.

### Model Benchmark Improvements

- Change torch.distributed.launch to torchrun.
- Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark.
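
For custom scripts that still assume `torch.distributed.launch`, the main practical difference with `torchrun` is that per-process rank information arrives through environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`) rather than a `--local_rank` command-line argument. The following is a minimal sketch of reading those variables; the helper name `read_dist_env` is hypothetical and is not part of SuperBench.

```python
import os


def read_dist_env(env=None):
    """Return (rank, local_rank, world_size) from torchrun-style variables.

    `read_dist_env` is a hypothetical helper for illustration only.
    Falls back to single-process defaults when the variables are unset,
    for example when the script is run directly without a launcher.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = read_dist_env()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

When a script like this is started with `torchrun --nproc_per_node=2 script.py`, each worker process should see its own `LOCAL_RANK` value, so no launcher-specific argument parsing is needed.
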
### Result Analysis

- Support baseline generation from multiple nodes.