superbenchmark

Commit graph

Author	SHA1	Message	Date
Yang Wang	96cc4d9397	Bug: Executor - Fix executor for Benchmark Execution Without Explicit Framework Field (#636 ) Description Fix executor for Benchmark Execution Without Explicit Framework Field	2024-08-20 16:52:20 -07:00
Yang Wang	9a3ce39d5a	Update omegaconf version to 2.3.0 (#631 ) Update `omegaconf` version to [2.3.0](https://pypi.org/project/omegaconf/2.3.0/) as omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.1 will enforce this behaviour change. Discussion can be found at https://github.com/pypa/pip/issues/12063.	2024-07-23 14:46:28 -07:00
Ziyue Yang	cc89ee591c	Benchmarks: Revise Code - Add hipblasLt tuning to dist-inference cpp implementation (#616 ) Description Adds hipblasLt tuning to dist-inference cpp implementation.	2024-04-02 09:56:33 +08:00
Yifan Xiong	2c88db907f	Release - SuperBench v0.10.0 (#607 ) Description Cherry-pick bug fixes from v0.10.0 to main. Major Revisions * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - Upgrade pyrsmi to amdsmi python library. #601 * Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605 * Dockerfile - Add rocm6.0 dockerfile #602 * Bug Fix - Bug fix for latest megatron-lm benchmark #600 * Docs - Upgrade version and release note #606 Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Yang Wang <yangwang1@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com> Co-authored-by: guoshzhao <guzhao@microsoft.com>	2024-01-08 05:40:52 +00:00
Ziyue Yang	719a427fe7	Benchmarks: Microbenchmark - Add distributed inference benchmark cpp implementation (#586 ) Description Add distributed inference benchmark cpp implementation.	2023-12-11 06:53:51 +08:00
Ziyue Yang	4fa60be7cd	Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588 ) Description Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance, and fix performance bug in gpu_copy	2023-12-08 23:22:38 +08:00
Yuting Jiang	dd5a6329ed	Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582 ) Description Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark	2023-12-07 09:37:09 +08:00
Ziyue Yang	254ea7feba	Benchmarks: Micro benchmark - Add graph mode in NCCL/RCCL benchmarks for latency metrics (#583 ) Description Revise NCCL/RCCL benchmarks to graph mode add latency metrics.	2023-12-05 16:48:13 +08:00
Yuting Jiang	9ae8c67093	Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation (#581 ) Description Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation Major Revision - Support cpu-gpu and gpu-cpu in ib-validation Minor Revision - support multi msg size, multi direction, multi ib commands in ib-validation	2023-12-04 22:20:46 +08:00
Yuting Jiang	2235e084ab	Benchmarks: Micro benchmark - add initialization options for rocm gemm flops (#578 ) Description add initialization options for rocm gemm flops.	2023-11-22 12:52:22 +00:00
Yuting Jiang	79089b6517	Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576 ) Description hipblaslt function benchmark and rebase cublaslt function benchmark.	2023-11-22 19:48:10 +08:00
guoshzhao	9f4880cb8e	Analyzer - Generate baseline given results from multiple nodes. (#575 ) Description Generate baseline given results from multiple nodes. Major Revision - Add sub command `sb result generate-baseline` - Add UT and docs --------- Co-authored-by: 454314380 <454314380@qq.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-11-22 14:42:32 +08:00
Yuting Jiang	f53d941a22	Benchmarks: micro benchmarks - add int8 support for cublaslt function (#574 ) Description add int8 support for cublaslt function.	2023-11-20 11:21:20 +08:00
Yuting Jiang	c7800bb8e0	Bug Fix - remove cp ptx file command in gpu burn test (#567 ) Description remove cp ptx file in gpu burn test since the command is run inside self.args.bin_dir dir. `d246bab430/superbench/benchmarks/micro_benchmarks/micro_base.py (L183)`	2023-11-14 03:52:56 +00:00
pnunna93	67f2aa7237	Benchmarks: model benchmarks - change torch.distributed.launch to torchrun (#556 ) This PR has following changes - torch.distributed.launch changed to torchrun. torch.distributed.launch is deprecated in latest Pytorch and is recommended to move to torchrun - https://pytorch.org/docs/stable/elastic/run.html - Changes to AMD GPU detection logic. The AMD GPU detection logic throws warning when containers have only renderD in /dev/dri, this change would resolve those warnings --------- Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-08-08 13:03:32 +08:00
Yuting Jiang	e8ac0b1e28	Benchmarks: micro benchmarks - add python code for DirectXGPUEncodingLatency (#548 ) Description add python code for DirectXGPUEncodingLatency.	2023-07-06 15:31:28 +08:00
Yuting Jiang	c8c079c2af	Benchmarks: micro benchmarks - add python code for DirectXGPUCopy (#546 ) Description add python code for DirectXGPUCopy.	2023-07-06 00:15:32 +08:00
Yuting Jiang	af4cfd5bbf	Benchmarks: micro benchmarks - add python code for DirecXGPUMemBw (#547 ) Description add python code for DirecXGPUMemBw.	2023-07-05 22:07:13 +08:00
Yuting Jiang	f1d608aef7	Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542 ) Description add python code for DirectX core flops and init DirectX test pipeline. Major Revision - add python code for DirectX core flops - init DirectX test pipeline Minor Revision - add test for DirectX core flops	2023-07-05 16:56:21 +08:00
Yuting Jiang	3704a432b9	CI/CD - Support DirectX test pipeline (#545 ) Description Support DirectX test pipeline.	2023-07-05 11:33:40 +08:00
Yuting Jiang	97f7b1df86	Benchmarks: microbenchmark - add auto selecting algorithm support for cudnn functions (#540 ) Description add auto selecting algorithm support for cudnn functions. Major Revision - add auto selecting algorithm support for cudnn functions in source code - add 'auto_algo' option in benchmark - add related test	2023-06-30 12:58:41 +00:00
Yifan Xiong	7184bdd1ed	Benchmarks - Update result parsing in tensorrt inference (#541 ) * Update result parsing for newer tensorrt versions * Update arguments when load torchvision models	2023-06-30 11:22:46 +08:00
guoshzhao	f38a9829d0	ModelBenchmarks - Fix early stop logic due to num_steps. (#522 ) Description Model benchmarks can stop due to `num_steps` or `duration` config which will take effect when the value is set greater than 0. If both are set greater than 0, the earliest condition reached will work.	2023-04-28 13:15:47 +08:00
Yifan Xiong	51761b3af1	Release - SuperBench v0.8.0 (#517 ) Description Cherry-pick bug fixes from v0.8.0 to main. Major Revisions * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by: guoshzhao <guzhao@microsoft.com> Co-authored-by: Ziyue Yang <ziyyang@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-04-14 12:57:55 +00:00
Ziyue Yang	8daef211dd	Benchmarks - Add distributed inference benchmark (#493 ) Description This PR adds a micro-benchmark of distributed model inference workloads. Major Revision - Add a new micro-benchmark dist-inference. - Add corresponding example and unit tests. - Update configuration files to include this new micro-benchmark. - Update micro-benchmark README. --------- Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>	2023-03-24 17:15:17 +08:00
guoshzhao	a9b45a072e	Monitor - Support cgroup V2 when read system metrics. (#491 ) Description Since ubuntu 22.04 will use cgroup V2 and the file structure changed. Modify the monitor to adapt to cgroup v1 and v2.	2023-03-22 08:33:18 +00:00
Yifan Xiong	dbeba8056b	Benchmark - Support batch/shape range in cublaslt gemm (#494 ) Support batch and shape range with multiplication factors in cublaslt gemm benchmark.	2023-03-22 13:22:36 +08:00
rafsalas19	655bd0aa59	Adding HPL benchmark (#482 ) Description - Adding HPL benchmark --------- Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>	2023-03-21 16:44:08 +00:00
rafsalas19	32896ca477	Adding Stream Benchmark (#473 ) Description - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by: Peng Cheng <chengpeng5555@outlook.com> Co-authored-by: Yifan Xiong <xiongyf@yandex.com>	2023-02-13 15:34:37 -05:00
Yifan Xiong	b07fda155e	Release - SuperBench v0.7.0 (#468 ) Description Cherry-pick bug fixes from v0.7.0 to main. Major Revisions * Benchmarks - Fix missing include in FP8 benchmark (#460) * Fix bug in TE BERT model (#461) * Doc - Update benchmark doc (#465) * Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464) * Support `sb deploy` without pulling image (#466) * Docs - Upgrade version and release note (#467) Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2023-01-28 11:07:06 +08:00
Yang Wang	ccccd988df	Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454 ) Support traffic patterns under the different devices in NCCL/RCCL test * change the metrics format if specified the pattern	2023-01-04 12:30:32 +00:00
Yang Wang	8e748d5649	Runner - Generate host groups file in mpi mode (#458 ) Major Revision - Add an option for pattern to generate mpi_pattern.txt file if specified the path. - In mpi pattern, serial_index and parallel_index will add in each benchmark as environment variables. Minor Revision - Fix typo	2023-01-04 19:49:14 +08:00
Yifan Xiong	5197cdf5cb	Benchmarks - Support FP8 in BERT models (#446 ) Support FP8 in PyTorch BERT models: * add fp8 hybrid/e4m3/e5m2 in precision arguments * build BERT encoders with `te.TransformerLayer` to replace `transformers.BertModel` * wrap forward steps with fp8 autocast	2023-01-04 11:12:05 +08:00
Yang Wang	65e433c0c6	Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437 ) Description Support the following patterns in `mpi` mode: * `k-batch` * `topo-aware`	2023-01-03 10:28:35 +00:00
Yifan Xiong	616e7a5a5a	Benchmarks - Integrate cublaslt micro-benchmark (#455 ) Integrate cublaslt-gemm micro-benchmark #451.	2023-01-03 08:54:40 +00:00
Yuting Jiang	75573f59da	Benchmarks: Micro benchmarks - Add correctness check in cublas-function benchmark (#452 ) Description Add correctness check in cublas-function benchmark. Major Revision - add python code of correctness check in cublas-function benchmark and test	2023-01-03 14:59:30 +08:00
Yuting Jiang	9dfefce350	Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445 ) Description Add stdout logging util module and enable real-time logging flushing in executor Major Revision - Add stdout logging util module to redirect stdout into file log - enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log` - enable real-time log flushing in run_command of microbenchmarks through config `log_flushing` Minor Revision - add log_n_step args to enable regular step time log in model benchmarks - update related docs	2022-12-30 09:40:28 +00:00
Yang Wang	7838b6b154	Runner - Support `pair-wise` pattern in `mpi` mode (#447 ) * Extract pair-wise pattern from ib_validation	2022-12-29 08:23:36 +00:00
Yuting Jiang	6583ba2e40	Benchmark: Revision - Add wait time option to resolve mem-bw unstable issue (#438 ) Description Add wait time option to resolve mem-bw unstable issue.	2022-12-14 17:21:02 +08:00
Yang Wang	e4eeda0afd	Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430 ) * add mpi-parallels mode * update according to comments * fix and update doc * update * merge into 'mpi' mode * update according to comments * fix testcases * fix ansible * regard pattern as field * update * fix flake8 version * add flake8 range * remove map-by from host config * update comments	2022-11-29 12:30:10 +08:00
Yifan Xiong	1b86503d1e	CLI - Add non-zero return code for `sb [deploy,run]` (#425 ) Add non-zero return code for `sb deploy` and `sb run` command when there're Ansible failures in control plane. Return code is set to count of failure. For failures caused by benchmarks, return code is still set per benchmark in results json file.	2022-11-01 10:46:19 +08:00
Yifan Xiong	d7bb8303fb	CLI - Update version to include revision hash and date (#427 ) Update version to include revision hash and date in "{last tag}+g{git hash}.d{date}" format, here're the examples: * exact tag: 0.6.0 * commit after tag: 0.6.0+gcbb1b34 * commit after tag with local changes: 0.6.0+gcbb1b34.d20221028	2022-10-31 10:44:41 +08:00
Yuting Jiang	3367c4f6cc	Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414 ) Description Add support to allow list of custom config string in cudnn-functions and cublas-functions.	2022-10-18 09:59:51 +08:00
Yifan Xiong	63e9b2d1bc	Release - SuperBench v0.6.0 (#409 ) Description Cherry-pick bug fixes from v0.6.0 to main. Major Revisions * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by: Yang Wang <yangwang1@microsoft.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>	2022-09-06 18:06:05 +08:00
Yuting Jiang	733860d715	Analyzer - Add support to store values of metrics in data diagnosis (#392 ) Description Add support to store values of metrics in data diagnosis. Take the following rules as example: ``` nccl_store_rule: categories: NCCL_DIS store: True metrics: - nccl-bw:allreduce-run0/allreduce_1073741824_busbw - nccl-bw:allreduce-run1/allreduce_1073741824_busbw - nccl-bw:allreduce-run2/allreduce_1073741824_busbw - nccl-bw:allreduce-run3/allreduce_1073741824_busbw - nccl-bw:allreduce-run4/allreduce_1073741824_busbw nccl_rule: function: multi_rules criteria: 'lambda label:True if min(label["nccl_store_rule"].values())/max(label["nccl_store_rule"].values())<0.95 else False' categories: NCCL_DIS ``` nccl_store_rule will store the values of the metrics in dict and save them into `label["nccl_store_rule"]`, and then nccl_rule can use the values of metrics through `label["nccl_store_rule"].values()` in criteria	2022-08-23 03:25:32 +00:00
Yuting Jiang	10a79c4ea8	Analyzer - Add support for both jsonl and json format in data diagnosis (#388 ) Description Add support for both jsonl and json format in data diagnosis. Major Revision - Add support for both jsonl and json format in data diagnosis Minor Revision - change related doc - add jsonl support in cli	2022-08-22 10:57:00 +08:00
Yuting Jiang	b5c7c85d17	Analyzer: Rename fields in json of data diagnosis to be more readable (#382 ) Description Rename field in data diagnosis to be more readable. Major Revision - rename fields according to diagnosis/metric format Minor Revision - change type of diagnosis/issue_num to be int	2022-08-09 10:03:50 +08:00
Yifan Xiong	9b8df883ae	Gracefully exit when timeout (#383 ) * Gracefully exit when timeout, add corresponding log and return code. * Set minimum timeout to 1 minute and enlarge Ansible timeout.	2022-08-04 13:05:34 +08:00
Yuting Jiang	ec16d42564	Analyzer - Add failure check feature in data diagnosis (#378 ) Description Add failure check feature in data diagnosis. Major Revision - Add failure check rule op to support that if there exists metric_regex not been matched by any metric in result, label as failedtest - Split performance issue and failedtest in categories Minor Revision - replace DataFrame.append() with pd.concat since append() will be removed in later version of pandas	2022-08-01 12:35:35 +08:00
Jie Zhang	ef4d65745b	Support topo-aware IB performance validation (#373 ) * Support topo-aware IB performance validation Add a new pattern `topo-aware`, so the user can run IB performance test based on VM's topology information. This way, the user can validate the IB performance across VM pairs with different distance as a quick test instead of pair-wise test. To run with topo-aware pattern, user needs to specify three required (and two optional) parameters in YAML config file: --pattern topo-aware --ibstat path to ibstat output --ibnetdiscover path to ibnetdiscover output --min_dist minimum distance of VM pairs (optional, default 2) --max_dist maximum distance of VM pairs (optional, default 6) The newly added topo_aware module then parses the topology information, builds a graph, and generates the VM pairs with the specified distance (# hops). The specified IB test will then be running across these generated VM pairs. Signed-off-by: Jie Zhang <jessezhang1010@gmail.com> * Add description about topology aware ib traffic tests Signed-off-by: Jie Zhang <jessezhang1010@gmail.com> * Add unit test to verify generated topology aware config file This commit adds unit test to verify the generated topology aware config file is correct. To do so, four new data files are added in order to invoke gen_topo_aware_config function to generate topology aware config file, then compares it with the expected config file. Signed-off-by: Jie Zhang <jessezhang1010@gmail.com> * Fix lint issue on Azure pipeline Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>	2022-07-26 16:56:19 -07:00

184 commits