superbenchmark

Commit Graph

Author	SHA1	Message	Date
omahs	a4c87da0ac	Docs - fix typos (#628 ) Docs - fix typos	2024-07-25 03:49:19 +00:00
Yifan Xiong	2c88db907f	Release - SuperBench v0.10.0 (#607 ) Description Cherry-pick bug fixes from v0.10.0 to main. Major Revisions * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - Upgrade pyrsmi to amdsmi python library. #601 * Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605 * Dockerfile - Add rocm6.0 dockerfile #602 * Bug Fix - Bug fix for latest megatron-lm benchmark #600 * Docs - Upgrade version and release note #606 Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Yang Wang <yangwang1@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com> Co-authored-by: guoshzhao <guzhao@microsoft.com>	2024-01-08 05:40:52 +00:00
Ziyue Yang	719a427fe7	Benchmarks: Microbenchmark - Add distributed inference benchmark cpp implementation (#586 ) Description Add distributed inference benchmark cpp implementation.	2023-12-11 06:53:51 +08:00
Yuting Jiang	1f5031bd74	Dockerfile - Upgrade to rocm5.7 dockerfile (#587 ) Description upgrade to rocm5.7 dockerfile. --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com>	2023-12-09 17:41:12 +00:00
Ziyue Yang	4fa60be7cd	Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588 ) Description Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance, and fix performance bug in gpu_copy	2023-12-08 23:22:38 +08:00
Yuting Jiang	dd5a6329ed	Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582 ) Description Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark	2023-12-07 09:37:09 +08:00
Yuting Jiang	9ae8c67093	Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation (#581 ) Description Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation Major Revision - Support cpu-gpu and gpu-cpu in ib-validation Minor Revision - support multi msg size, multi direction, multi ib commands in ib-validation	2023-12-04 22:20:46 +08:00
guoshzhao	9f4880cb8e	Analyzer - Generate baseline given results from multiple nodes. (#575 ) Description Generate baseline given results from multiple nodes. Major Revision - Add sub command `sb result generate-baseline` - Add UT and docs --------- Co-authored-by: 454314380 <454314380@qq.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-11-22 14:42:32 +08:00
Yuting Jiang	e1df877bfe	Release - SuperBench v0.9.0 (#558 ) Description Cherry-pick bug fixes from v0.9.0 to main. Major Revision - CI/CD: pipeline - clean more disk space to fix rocm building image pipeline(#555 ) - Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency(#554) - CI/CD - add push win docker image on release branch in pipeline (#552) - Docs - Upgrade version and release note(#557)	2023-07-27 10:42:31 +08:00
Lei Qu	c7d0beaf9e	Doc - Update outdate references in micro-benchmarks.md (#544 ) Modify link for Nvidia bandwidth test tool Description previous link is 404 Minor Revision update the link value to https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest	2023-06-30 19:17:41 +08:00
Yuting Jiang	ed027e4c8e	Tools - Add runner for sys info and update docs (#532 ) Description Add runner for sys info to automatically collect on multiple nodes and update related docs. Major Revision - add runner for sys info which will check docker status and run `sb node info` on all nodes' docker and fetch results from all nodes Minor Revision - update cli and system-info doc - update sb node info to save output info output-dir/sys-info.json	2023-06-29 06:09:44 +00:00
F̷N̷	4c0d96e5d8	Docs - Fix typo on kernel_parameters and kernel_modules in system-config (#528 ) Description Kernel_parameters and kernel_modules command and examples are exchanged.	2023-05-04 00:55:42 +00:00
guoshzhao	f38a9829d0	ModelBenchmarks - Fix early stop logic due to num_steps. (#522 ) Description Model benchmarks can stop due to `num_steps` or `duration` config which will take effect when the value is set greater than 0. If both are set greater than 0, the earliest condition reached will work.	2023-04-28 13:15:47 +08:00
Yifan Xiong	51761b3af1	Release - SuperBench v0.8.0 (#517 ) Description Cherry-pick bug fixes from v0.8.0 to main. Major Revisions * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by: guoshzhao <guzhao@microsoft.com> Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-04-14 12:57:55 +00:00
Ziyue Yang	8daef211dd	Benchmarks - Add distributed inference benchmark (#493 ) Description This PR adds a micro-benchmark of distributed model inference workloads. Major Revision - Add a new micro-benchmark dist-inference. - Add corresponding example and unit tests. - Update configuration files to include this new micro-benchmark. - Update micro-benchmark README. --------- Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>	2023-03-24 17:15:17 +08:00
Yifan Xiong	dbeba8056b	Benchmark - Support batch/shape range in cublaslt gemm (#494 ) Support batch and shape range with multiplication factors in cublaslt gemm benchmark.	2023-03-22 13:22:36 +08:00
rafsalas19	655bd0aa59	Adding HPL benchmark (#482 ) Description - Adding HPL benchmark --------- Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>	2023-03-21 16:44:08 +00:00
rafsalas19	32896ca477	Adding Stream Benchmark (#473 ) Description - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by: Peng Cheng <chengpeng5555@outlook.com> Co-authored-by: Yifan Xiong <xiongyf@yandex.com>	2023-02-13 15:34:37 -05:00
Yifan Xiong	b07fda155e	Release - SuperBench v0.7.0 (#468 ) Description Cherry-pick bug fixes from v0.7.0 to main. Major Revisions * Benchmarks - Fix missing include in FP8 benchmark (#460) * Fix bug in TE BERT model (#461) * Doc - Update benchmark doc (#465) * Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464) * Support `sb deploy` without pulling image (#466) * Docs - Upgrade version and release note (#467) Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-01-28 11:07:06 +08:00
Yang Wang	ccccd988df	Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454 ) Support traffic patterns under the different devices in NCCL/RCCL test * change the metrics format if specified the pattern	2023-01-04 12:30:32 +00:00
Yang Wang	8e748d5649	Runner - Generate host groups file in mpi mode (#458 ) Major Revision - Add an option for pattern to generate mpi_pattern.txt file if specified the path. - In mpi pattern, serial_index and parallel_index will add in each benchmark as environment variables. Minor Revision - Fix typo	2023-01-04 19:49:14 +08:00
Yang Wang	65e433c0c6	Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437 ) Description Support the following patterns in `mpi` mode: * `k-batch` * `topo-aware`	2023-01-03 10:28:35 +00:00
Yifan Xiong	616e7a5a5a	Benchmarks - Integrate cublaslt micro-benchmark (#455 ) Integrate cublaslt-gemm micro-benchmark #451.	2023-01-03 08:54:40 +00:00
Yuting Jiang	9dfefce350	Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445 ) Description Add stdout logging util module and enable real-time logging flushing in executor Major Revision - Add stdout logging util module to redirect stdout into file log - enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log` - enable real-time log flushing in run_command of microbenchmarks through config `log_flushing` Minor Revision - add log_n_step args to enable regular step time log in model benchmarks - udpate related docs	2022-12-30 09:40:28 +00:00
Yang Wang	7838b6b154	Runner - Support `pair-wise` pattern in `mpi` mode (#447 ) * Extract pair-wise pattern from ib_validation	2022-12-29 08:23:36 +00:00
Yang Wang	e4eeda0afd	Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430 ) * add mpi-parallels mode * update according to comments * fix and update doc * update * merge into 'mpi' mode * udpate according to comments * fix testcases * fix ansible * regard pattern as field * udpate * fix flake8 version * add flake8 range * remove map-by from host config * udpate comments	2022-11-29 12:30:10 +08:00
Yifan Xiong	63e9b2d1bc	Release - SuperBench v0.6.0 (#409 ) Description Cherry-pick bug fixes from v0.6.0 to main. Major Revisions * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by: Yang Wang <yangwang1@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2022-09-06 18:06:05 +08:00
Yuting Jiang	10a79c4ea8	Analyzer - Add support for both jsonl and json format in data diagnosis (#388 ) Description Add support for both jsonl and json format in data diagnosis. Major Revision - Add support for both jsonl and json format in data diagnosis Minor Revision - change related doc - add jsonl support in cli	2022-08-22 10:57:00 +08:00
Yifan Xiong	626ac0a463	Update Python setup for require packages (#387 ) __Description__ Update Python setup for require packages. __Major Revisions__ * downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6 * add extra entry in extras_require for nested packages * update `pip install` contents accordingly	2022-08-17 11:33:57 +08:00
Jie Zhang	ef4d65745b	Support topo-aware IB performance validation (#373 ) * Support topo-aware IB performance validation Add a new pattern `topo-aware`, so the user can run IB performance test based on VM's topology information. This way, the user can validate the IB performance across VM pairs with different distance as a quick test instead of pair-wise test. To run with topo-aware pattern, user needs to specify three required (and two optional) parameters in YAML config file: --pattern topo-aware --ibstat path to ibstat output --ibnetdiscover path to ibnetdiscover output --min_dist minimum distance of VM pairs (optional, default 2) --max_dist maximum distance of VM pairs (optional, default 6) The newly added topo_aware module then parses the topology information, builds a graph, and generates the VM pairs with the specified distance (# hops). The specified IB test will then be running across these generated VM pairs. Signed-off-by: Jie Zhang <jessezhang1010@gmail.com> * Add description about topology aware ib traffic tests Signed-off-by: Jie Zhang <jessezhang1010@gmail.com> * Add unit test to verify generated topology aware config file This commit adds unit test to verify the generated topology aware config file is correct. To do so, four new data files are added in order to invoke gen_topo_aware_config function to generate topology aware config file, then compares it with the expected config file. Signed-off-by: Jie Zhang <jessezhang1010@gmail.com> * Fix lint issue on Azure pipeline Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>	2022-07-26 16:56:19 -07:00
Yifan Xiong	e00a8180f6	Support node_num=1 in mpi mode (#372 ) Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks in both 1 node and all nodes in one config by changing `node_num`. Update docs and add test case accordingly.	2022-07-08 09:24:17 +08:00
Yifan Xiong	620192a242	Fix issues in ib loopback benchmark (#369 ) Fix several issues in ib loopback benchmark: * use `--report_gbits` and divide by 8 to get GB/s, previous results are MiB/s / 1000 * use the ib_write_bw binary built in third_party instead of system path * update the metrics name so that different hca indices have same metric	2022-06-29 17:53:02 +00:00
Yifan Xiong	bfaa1c837b	Support multiple IB/GPU in ib validation (#363 ) Description Support multiple IB/GPU devices run simultaneously in ib validation benchmark. Major Revisions - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel. - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes. - Fix env issues in Dockerfile for end-to-end test. - Update ib-traffic configuration examples in config files. - Update unit tests and docs accordingly. Closes #326.	2022-06-24 08:35:20 +00:00
Yifan Xiong	a4937e95c6	Support `sb run` on host directly without Docker (#358 ) Description Support `sb run` on host directly without Docker Major Revisions - Add `--no-docker` argument for `sb run`. - Run on host directly if `--no-docker` if specified. - Update docs and tests correspondingly.	2022-06-14 10:57:01 +08:00
Yifan Xiong	6681c72043	Release - SuperBench v0.5.0 (#350 ) Description Cherry-pick bug fixes from v0.5.0 to main. Major Revisions * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>	2022-04-29 16:22:55 +08:00
Yuting Jiang	712eafc373	Docs - Update links using relative file paths with extensions (#346 ) Description Update links of referencing other docs using relative file paths with extensions.	2022-04-21 07:28:19 +08:00
Jared Bowden	cb26691173	Docs - Update link to cli.md (#341 ) Description Fixes relative link in documentation: point to `../cli.md`.	2022-04-15 22:11:14 +08:00
Yuting Jiang	8dc19ca4af	CLI - Integrate output all nodes diagnosis results (#339 ) Description Integrate output all nodes diagnosis results.	2022-04-11 13:42:04 +08:00
Yuting Jiang	56c9a711a8	Docs - Add usage for result summary (#337 ) Description Add usage for result summary.	2022-04-08 20:44:25 +00:00
Yuting Jiang	f15da60b2b	CLI - Integrage result summary and update output format of data diagnosis (#335 ) Description Integrage result summary and update output format of data diagnosis. Major Revision - integrage result summary - add md and html format for data diagnosis	2022-04-08 18:48:43 +08:00
guoshzhao	6d895da83c	Benchmarks: Add Feature - Provide option to save raw data into file. (#333 ) Description Use config `log_raw_data` to control whether log the raw data into file or not. The default value is `no`. We can set it as `yes` for some particular benchmarks to save the raw data into file, such as NCCL/RCCL test.	2022-04-01 16:26:09 +08:00
rafsalas19	ff51a3cee9	Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324 ) Description Modifications adding GPU-Burn to SuperBench. - added third party submodule - modified Makefile to make gpu-burn binary - added/modified microbenchmarks to add gpu-burn python scripts - modified default and azure_ndv4 configs to add gpu-burn	2022-03-16 16:20:11 +08:00
Yuting Jiang	97ed12f97f	Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289 ) Description Add multi-rules feature for data diagnosis to support multiple rules' combined check. Major Revision - revise rule design to support multiple rules combination check - update related codes and tests	2022-02-20 16:59:38 +08:00
Ziyue Yang	6cdf759543	Benchmarks: Revise Code - Eliminate NUMA binding for device-to-device tests in gpu_copy (#302 ) Description This commit remove NUMA binding for device-to-device tests because NUMA doesn't affect performance, and revise benchmark metrics accordingly.	2022-02-09 20:30:42 +08:00
Yuting Jiang	28195be6db	Bug - Fix typo in document (#297 ) Fix typo in document.	2022-01-30 13:38:00 +08:00
Yifan Xiong	3524975cfc	Config - Support customized env for all modes (#295 ) Support customized env for all modes in configuration.	2022-01-29 08:19:48 +00:00
guoshzhao	d877ca2322	Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288 ) Description Add timeout feature for each benchmark. Major Revision - Add `timeout` config for each benchmark. In current config files, only set the timeout for kernel-launch as example. Other benchmarks can be set in the future. - Set the timeout config for `ansible_runner.run()`. Runner will get the return code 254: [ansible.py:80][WARNING] Run failed, return code 254. - Using `timeout` command to terminate the client process.	2022-01-28 08:16:32 +00:00
Yifan Xiong	7d7cd3dc63	Config - Update benchmark naming to support annotations (#284 ) __Description__ Update benchmark naming to support annotations. __Major Revisions__ - Update name for `create_benchmark_context` in executor. - Backward compatibility for model benchmarks using "_models" suffix. - Update documents.	2022-01-25 09:54:58 +00:00
Ziyue Yang	74421ffee0	Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285 ) Description This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.	2022-01-21 13:45:37 +08:00
guoshzhao	fd2bc9e048	Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283 ) Description Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.	2022-01-19 10:49:56 +08:00

86 commits