slug | title | author | author_title | author_url | author_image_url | tags
---|---|---|---|---|---|---
release-sb-v0.10 | Releasing SuperBench v0.10 | Peng Cheng | SuperBench Team | https://github.com/cp5555 | https://github.com/cp5555.png |
We are very happy to announce that SuperBench version 0.10.0 is officially released today!
You can install and try SuperBench by following the Getting Started Tutorial.
SuperBench 0.10.0 Release Notes
SuperBench Improvements
- Support monitoring for AMD GPUs.
- Support ROCm 5.7 and ROCm 6.0 dockerfiles.
- Add MSCCL support for NVIDIA GPUs.
- Fix NUMA domain swap issue in NDv4 topology file.
- Add NDv5 topology file.
- Pin NCCL and NCCL-test to 2.18.3 to avoid a hang issue in CUDA 12.2.
Micro-benchmark Improvements
- Add HPL random generator to gemm-flops with ROCm.
- Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
- Add HWDecoderFPS benchmark to measure the FPS of hardware decoder performance.
- Update Docker image for H100 support.
- Update MLC version to 3.10 in the CUDA/ROCm dockerfiles.
- Bug fix for GPU Burn test.
- Support INT8 in cublaslt function.
- Add hipBLASLt function benchmark.
- Support cpu-gpu and gpu-cpu in ib-validation.
- Support graph mode in NCCL/RCCL benchmarks for latency metrics.
- Support cpp implementation in distributed inference benchmark.
- Add O2 option for gpu copy ROCm build.
- Support different hipBLASLt data types in the distributed inference benchmark.
- Support in-place in NCCL/RCCL benchmark.
- Support data type option in NCCL/RCCL benchmark.
- Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
- Update hipblaslt GEMM metric unit to tflops.
- Support FP8 for hipblaslt benchmark.
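For readers interpreting GEMM results reported in TFLOPS, the conversion from a timed matrix multiplication can be sketched as follows. This is a minimal illustration using the conventional count of 2·m·n·k floating-point operations per GEMM; the function name and the sample sizes are assumptions for this example, not SuperBench code.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return throughput in TFLOPS for one (m x k) @ (k x n) GEMM.

    Assumes the conventional operation count of 2 * m * n * k
    (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2 * m * n * k
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical measurement: an 8192 x 8192 x 8192 GEMM taking 2 ms.
    print(f"{gemm_tflops(8192, 8192, 8192, 2e-3):.1f} TFLOPS")
```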
Model Benchmark Improvements
- Change torch.distributed.launch to torchrun.
- Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark.
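If you maintain your own training scripts alongside SuperBench, the move from `torch.distributed.launch` to `torchrun` mainly changes how the local rank reaches the script: `torchrun` exports it as the `LOCAL_RANK` environment variable, while the older launcher passed a `--local_rank` argument by default. The helper below is a small sketch that accepts both; `get_local_rank` is a hypothetical name for this example and is not part of SuperBench.

```python
import argparse
import os


def get_local_rank(argv=None) -> int:
    """Resolve the local rank under either launcher.

    torchrun sets the LOCAL_RANK environment variable; the older
    torch.distributed.launch passed --local_rank on the command line.
    Falls back to 0 when neither is present (single-process run).
    """
    env_rank = os.environ.get("LOCAL_RANK")
    if env_rank is not None:
        return int(env_rank)

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores any other arguments the script defines.
    args, _ = parser.parse_known_args(argv)
    return args.local_rank


if __name__ == "__main__":
    print(f"local rank: {get_local_rank()}")
```

A script written this way can be started with, for example, `torchrun --nproc_per_node=8 train.py`, where `train.py` stands in for your own entry point.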
Result Analysis
- Support baseline generation from multiple nodes.
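To give a rough idea of what building a baseline from several nodes involves, here is a minimal sketch that combines per-node metric dictionaries into one baseline by taking the median of each metric. The aggregation rule SuperBench actually uses is not described in these notes, so the median, the `build_baseline` name, and the metric names are all assumptions made for illustration.

```python
from statistics import median


def build_baseline(node_results):
    """Combine per-node results into a single baseline.

    node_results: list of {metric_name: value} dicts, one per node.
    Returns {metric_name: median value across the nodes reporting it}.
    The median is an assumed choice, used here because it is robust
    to a single outlier node.
    """
    values = {}
    for result in node_results:
        for metric, value in result.items():
            values.setdefault(metric, []).append(value)
    return {metric: median(vals) for metric, vals in values.items()}


if __name__ == "__main__":
    # Hypothetical metric names and values from three nodes.
    nodes = [
        {"gemm_tflops": 300.0, "allreduce_busbw": 180.0},
        {"gemm_tflops": 310.0, "allreduce_busbw": 175.0},
        {"gemm_tflops": 120.0, "allreduce_busbw": 182.0},  # outlier node
    ]
    print(build_baseline(nodes))
```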