superbenchmark

Commit Graph

Author	SHA1	Message	Date
pdr	479491279e	Dockerfile - Add support for arm64 build (#660 ) Add support for arm64 build: - Updated dockerfile for arm64 build - extend cpu stream compilation for neoverse - handle onnxruntime-gpu installation - third party builds filtering based on arch - disable cuda decode perf build for non x86	2024-11-06 23:16:12 +00:00
Yuting Jiang	949f9cb406	Release - SuperBench v0.11.0 (#654 ) Description Cherry pick bug fixes from v0.11.0 to main Major Revision * #645 * #648 * #646 * #647 * #651 * #652 * #650 --------- Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com> Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>	2024-10-10 09:59:47 +08:00
Yang Wang	46a5792915	Bug Fix - Update Docker Exec Command for Persistent HPCX Environment (#635 ) Add 10-hpcx.sh to /etc/profile.d Update the Docker exec command to ensure a persistent HPCX environment.	2024-08-13 16:35:01 +00:00
Yuting Jiang	2101e933cc	CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633 ) Description Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM issue.	2024-07-28 23:43:06 +00:00
Yuting Jiang	7435f10a22	Dockerfile - Add CUDA 12.4 dockerfile (#619 ) Description Add CUDA 12.4 dockerfile. Major Revision - upgrade nvidia docker into 23.04 Minor Revision - upgrade hpcx into 2.18	2024-04-22 06:36:19 +00:00
Yuting Jiang	dc3846cbd4	Dockerfile - Upgrade mlc to v3.11 (#620 ) Description Upgrade mlc to v3.11.	2024-04-18 10:59:36 +08:00
Yang Wang	eeaa9b1ac9	Bug Fix - Bug fix for cuda 12.2 dockerfile LD_LIBRARY_PATH issue (#614 ) Description Cuda 12.2 image will report undefined symbol error due to incomplete LD_LIBRARY_PATH: ![image](https://github.com/microsoft/superbenchmark/assets/25875482/1a7c48c7-cb6b-4e3a-abbe-dde23007a96b) ### How to reproduce: 1. Deploy sb with cuda12.2 image ``` sb deploy -f local.ini -i superbench/superbench:v0.10.0-cuda12.2 ``` 2. Enter to the container ``` sudo docker exec -it sb-workspace bash ``` 3. Execute `mpirun`: ``` root@sb-container:~# mpirun mpirun: symbol lookup error: mpirun: undefined symbol: opal_libevent2022_event_base_loop ``` ### Fix to fix * Append hpcx_load into /etc/bash.bashrc for updating env LD_LIBRARY_PATH in each time ---------	2024-03-21 15:05:55 +00:00
Yifan Xiong	2c88db907f	Release - SuperBench v0.10.0 (#607 ) Description Cherry-pick bug fixes from v0.10.0 to main. Major Revisions * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - Upgrade pyrsmi to amdsmi python library. #601 * Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605 * Dockerfile - Add rocm6.0 dockerfile #602 * Bug Fix - Bug fix for latest megatron-lm benchmark #600 * Docs - Upgrade version and release note #606 Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Yang Wang <yangwang1@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com> Co-authored-by: guoshzhao <guzhao@microsoft.com>	2024-01-08 05:40:52 +00:00
Yuting Jiang	1f5031bd74	Dockerfile - Upgrade to rocm5.7 dockerfile (#587 ) Description upgrade to rocm5.7 dockerfile. --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com>	2023-12-09 17:41:12 +00:00
Ziyue Yang	6ef3a0110f	Benchmarks: Add MSCCL Support for Nvidia GPU (#584 ) Description Add MSCCL support for Nvidia GPU	2023-12-07 19:57:28 +08:00
Yuting Jiang	dd5a6329ed	Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582 ) Description Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark	2023-12-07 09:37:09 +08:00
Yifan Xiong	1ad1c21c38	Dockerfile - Upgrade Docker image to CUDA 12.2 (#577 ) Upgrade Docker image to CUDA 12.2 for H100: * upgrade base image to 23.10 * fix onnxruntime version in python3.10 * fix compilation errors	2023-11-22 13:48:18 +00:00
Yuting Jiang	79089b6517	Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576 ) Description hipblaslt function benchmark and rebase cublaslt function benchmark.	2023-11-22 19:48:10 +08:00
guoshzhao	9f4880cb8e	Analyzer - Generate baseline given results from multiple nodes. (#575 ) Description Generate baseline given results from multiple nodes. Major Revision - Add sub command `sb result generate-baseline` - Add UT and docs --------- Co-authored-by: 454314380 <454314380@qq.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-11-22 14:42:32 +08:00
Yuting Jiang	d246bab430	Dockerfile - update mlc version into 3.10 for cuda and rocm dockerfiles (#562 ) Description Update mlc version into 3.10 for cuda and rocm dockerfiles to be consistent with cuda12 dockerfile Co-authored-by: yukirora <yuting.jiang@microsoft.com>	2023-10-23 11:21:17 +08:00
Yuting Jiang	27a10811af	Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560 ) Description source code for evaluating NVDEC decoding performance. --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com>	2023-08-22 10:56:33 +00:00
Yuting Jiang	e8ac0b1e28	Benchmarks: micro benchmarks - add python code for DirectXGPUEncodingLatency (#548 ) Description add python code for DirectXGPUEncodingLatency.	2023-07-06 15:31:28 +08:00
Yuting Jiang	865472177f	Benchmarks: Build Pipeline - add AMF in third party and build AMF encoding latency test (#543 ) Description add AMF in third party and build AMF encoding latency test.	2023-07-03 14:43:21 +00:00
Yuting Jiang	44ef531465	Dockerfile - Add SuperBench Windows Dockerfile (#534 ) Description Add dockerfile for win10 and building script for directx_benchmarks. Major Revision - Add docker file for win10 and required scripts to install the dependency - Add building script to build all directx vs benchmarks - Add call of building script in Makefile --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com> Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>	2023-06-28 05:35:11 +00:00
Yifan Xiong	51761b3af1	Release - SuperBench v0.8.0 (#517 ) Description Cherry-pick bug fixes from v0.8.0 to main. Major Revisions * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by: guoshzhao <guzhao@microsoft.com> Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-04-14 12:57:55 +00:00
rafsalas19	655bd0aa59	Adding HPL benchmark (#482 ) Description - Adding HPL benchmark --------- Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>	2023-03-21 16:44:08 +00:00
Yifan Xiong	35f5390512	Pin setuptools version to v65.7.0 (#483 ) Pin setuptools version to [v65.7.0](https://setuptools.pypa.io/en/latest/history.html#v65-7-0) to avoid breaking changes since v66.0.0.	2023-03-06 11:43:44 +00:00
rafsalas19	32896ca477	Adding Stream Benchmark (#473 ) Description - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by: Peng Cheng <chengpeng5555@outlook.com> Co-authored-by: Yifan Xiong <xiongyf@yandex.com>	2023-02-13 15:34:37 -05:00
pnunna93	f21bfef2f3	Dockerfile: Remove fixed rccl version in rocm5.1.x docker file (#476 ) Description The commit(`e08b6d3a1c`) installs a rccl version which is causing "undefined symbol: ncclGetLastError" while trying to import torch. Revert it to avoid the error.	2023-02-07 15:24:26 +08:00
Yifan Xiong	a3c65b2a57	Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs (#449 ) Add Docker image for arch90 NVIDIA GPUs: * add CUDA11.8 Dockerfile * update archs in Makefile and benchmarks accordingly * update image build pipeline	2022-12-29 12:19:38 +00:00
Yifan Xiong	d7bb8303fb	CLI - Update version to include revision hash and date (#427 ) Update version to include revision hash and date in "{last tag}+g{git hash}.d{date}" format, here're the examples: * exact tag: 0.6.0 * commit after tag: 0.6.0+gcbb1b34 * commit after tag with local changes: 0.6.0+gcbb1b34.d20221028	2022-10-31 10:44:41 +08:00
Yifan Xiong	63e9b2d1bc	Release - SuperBench v0.6.0 (#409 ) Description Cherry-pick bug fixes from v0.6.0 to main. Major Revisions * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by: Yang Wang <yangwang1@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2022-09-06 18:06:05 +08:00
Yifan Xiong	626ac0a463	Update Python setup for require packages (#387 ) __Description__ Update Python setup for require packages. __Major Revisions__ * downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6 * add extra entry in extras_require for nested packages * update `pip install` contents accordingly	2022-08-17 11:33:57 +08:00
Yang Wang	faeee0a7cc	Auto generate ibstat file for topo aware traffic pattern (#381 ) An enhancement for topo-aware IB performance validation #373. This PR will auto-generate a required ibstat file `ib_traffic_topo_aware_ibstat.txt` which is used as input to build a graph.	2022-08-13 18:20:42 +08:00
Yifan Xiong	16b6385dee	Add dependencies (#374 ) Add dependencies * include ndv4-topo.xml in cuda docker images * require requests version to avoid RequestsDependencyWarning	2022-07-13 08:42:53 +00:00
Yifan Xiong	9f03d5687a	Update dependencies and Dockerfile (#371 ) Update dependencies and Dockerfile: * upgrade nccl-tests and rccl-tests to current latest version to match NCCL/RCCL versions * unify image tag names on DockerHub * remove verbose output in Dockerfile and minor fix some flags	2022-07-06 10:31:41 +00:00
Yifan Xiong	325a7338bf	Fix incorrect ulimit config in Dockerfile (#364 ) Fix incorrect ulimit nofile config in Dockerfile. Instead of bash, sh is used by default where `echo` does not accept any parameters and `-e` is written into /etc/security/limits.conf.	2022-06-24 14:14:00 +00:00
Yifan Xiong	bfaa1c837b	Support multiple IB/GPU in ib validation (#363 ) Description Support multiple IB/GPU devices run simultaneously in ib validation benchmark. Major Revisions - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel. - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes. - Fix env issues in Dockerfile for end-to-end test. - Update ib-traffic configuration examples in config files. - Update unit tests and docs accordingly. Closes #326.	2022-06-24 08:35:20 +00:00
Yifan Xiong	483bf782e1	Update ROCm Dockerfile (#361 ) Description Update ROCm Dockerfile. Major Revisions - Add dockerfile for ROCm 5.1.3 - Merge 5.1.x and 5.0.x dockerfile - Remove 4.2 and 4.0 legacy - Update build pipeline accordingly	2022-06-19 17:26:39 +08:00
Yifan Xiong	60a3c74306	Fix cmake and build issues (#360 ) Description Fix cmake and build issues. Major Revision * Remove unnecessary boost build * Remove user-agent for mlc * Remove -j for third party to build each project in sequence * Fix ansible collections installation path	2022-06-15 13:07:57 +08:00
Yuting Jiang	3f135e4669	Dockerfile - Add support to run sb command inside docker image (#356 ) Description Add support to run sb command inside docker image - install missing dependency.	2022-06-01 01:11:28 +08:00
Yuting Jiang	e08b6d3a1c	Dockerfile: Update rccl version and fix issue in rocm5.1.1 dockerfile (#354 ) Description Update rccl version and fix issue in rocm5.1.1 dockerfile.	2022-05-27 10:46:40 +08:00
Yuting Jiang	81a4146bc1	Dockerfile - Add dockerfile for rocm5.1.1 (#353 ) Description Add dockerfile for rocm5.1.1.	2022-05-25 20:28:11 +08:00
Yuting Jiang	425b9ff865	Dockerfile - Add dockerfile for rocm5.0.1 (#319 ) Description Add dockerfile for rocm5.0.1.	2022-02-28 19:30:43 +08:00
Yuting Jiang	a4950a707e	Dockerfile - Add rocm5.0 dockerfile (#307 ) Description Add rocm5.0 dockerfile.	2022-02-26 07:12:45 +08:00
Ziyue Yang	433785fd0c	Benchmarks: Add Feature - Add GDR-only nccl-tests for Nvidia machines (#299 ) This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.	2022-02-08 17:59:48 +08:00
Yifan Xiong	ff563b66af	Release - SuperBench v0.4.0 (#278 ) __Description__ Cherry-pick bug fixes from v0.4.0 to main. __Major Revisions__ * Bug - Fix issues for Ansible and benchmarks (#267) * Tests - Refine test cases for microbenchmark (#268) * Bug - Build openmpi with ucx support in rocm dockerfiles (#269) * Benchmarks: Fix Bug - Fix fio build issue (#272) * Docs - Unify metric and add doc for cublas and cudnn functions (#271) * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274) * Bug - Fix bug of detecting if gpu_index is none (#275) * Bug - Fix bugs in data diagnosis (#273) * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270) * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276) * Docs - Upgrade version and release note (#277) Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>	2021-12-30 16:24:00 +08:00
Hossein Pourreza	b590409e0f	Benchmarks: Add Benchmark - Add mlc benchmark to superbench (#216 ) Description Add mlc memory bandwidth and latency micro benchmark to Superbench. Major Revision - Add mlc benchmark with test and example files	2021-12-13 13:47:42 +08:00
guoshzhao	4d85630abb	Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245 ) Description Add ONNXRuntime inference benchmark based on ORT python API. Major Revision - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.	2021-12-10 13:53:11 +00:00
Ziyue Yang	008e0fe1d8	Benchmarks: Add Feature - Add CPU-initiated copy and dtod support to gpu-sm-copy benchmark (#230 ) Description This commit does the following: 1) Adds CPU-initiated copy benchmark; 2) Adds dtod benchmark; 3) Support scanning NUMA nodes and GPUs inside the benchmark program; 4) Change the name of gpu-sm-copy to gpu-copy.	2021-10-30 11:19:09 +08:00
Ziyue Yang	6a068e2534	Dockerfiles: Fix Bug - Fix ROCm GPG file URL (#234 ) Description This commit fixes the URL of ROCm GPG file.	2021-10-30 07:26:52 +08:00
Yuting Jiang	b592a7c793	Benchmarks: Build Pipeline - Add gpcnet as git submodule and building logic (#228 ) Description Add gpcnet as git submodule and building logic. Major Revision - add gpcnet as a submodule - add build logic in third_party/Makefile	2021-10-21 11:28:51 +00:00
Yifan Xiong	dfbd70b129	Release - SuperBench v0.3.0 (#212 ) Description Cherry-pick bug fixes from v0.3.0 to main. Major Revisions * Docs - Upgrade version and release note (#209) * Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210) * Benchmarks: Update - Update benchmarks in configuration file (#208) * CI/CD - Update GitHub Action VM (#211) * Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203) * CI/CD - Fix bug in build image for push event (#205) * Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204) * Tool: Fix bug - Fix function naming issue in system info (#200) * CI/CD - Push images in GitHub Action (#202) * Bug - Fix torch.distributed command for single node (#201) * CLI - Integrate system info for node (#199) * Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196) * CI/CD - Add ROCm image build in GitHub Actions (#194) * Bug: Fix bug - fix bug of hipBusBandwidth build (#193) * Benchmarks: Build Pipeline - Restore rocblas build logic (#197) * Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198) * Bug - Revise 'docker run' in sb deploy (#195) * Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190) Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com> Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com> Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>	2021-09-26 09:30:31 +08:00
Yifan Xiong	4e431f11d2	Dockerfile - Fix ulimit nofile in Docker images (#183 ) __Description__ Resolve "too many open files" issue when running NCCL/RCCL on multiple nodes using Docker images, increase nofile number in limits.conf.	2021-09-02 21:38:59 +08:00
guoshzhao	115cd2e6ae	Dockerfile: Add Package - Install openmpi for ROCm images (#181 ) Description Install openmpi-4.0.0 for ROCm images.	2021-09-01 18:40:43 +08:00


63 Commits