superbenchmark

Commit Graph

Author	SHA1	Message	Date
Yifan Xiong	ea2c10abc4	Config - Add T4 configurations for inference (#311 ) Add T4 configurations for inference.	2022-02-20 13:00:55 +00:00
Yuting Jiang	97ed12f97f	Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289 ) Description Add multi-rules feature for data diagnosis to support multiple rules' combined check. Major Revision - revise rule design to support multiple rules combination check - update related codes and tests	2022-02-20 16:59:38 +08:00
Yifan Xiong	1f48268bf5	Bug - Fix env file path (#310 ) Fix env file path for `docker run`.	2022-02-15 15:23:43 +08:00
dependabot[bot]	53fe0c4798	Bump follow-redirects from 1.14.7 to 1.14.8 in /website (#309 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.14.7 to 1.14.8. - [Release notes](https://github.com/follow-redirects/follow-redirects/releases) - [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.14.7...v1.14.8) --- updated-dependencies: - dependency-name: follow-redirects dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-15 13:04:37 +08:00
Yuting Jiang	e31b8c9e08	Benchmarks: Revise Code - Add support for pytorch>=1.9.0 of init_process_group (#305 ) Description Add support for pytorch>=1.9.0 of init_process_group. Major Revision - Use PrefixStore(TCPStore) to init_process_group manually for each model run	2022-02-10 22:44:01 +08:00
Yuting Jiang	4abda6f5d4	Benchmarks: Build Pipeline - Update rccl-tests submodule to fix divide by zero error (#306 ) Description Update rccl-tests submodule to fix divide by zero error.	2022-02-09 14:46:29 +00:00
Ziyue Yang	6cdf759543	Benchmarks: Revise Code - Eliminate NUMA binding for device-to-device tests in gpu_copy (#302 ) Description This commit removes NUMA binding for device-to-device tests because NUMA doesn't affect performance, and revises benchmark metrics accordingly.	2022-02-09 20:30:42 +08:00
Ziyue Yang	433785fd0c	Benchmarks: Add Feature - Add GDR-only nccl-tests for Nvidia machines (#299 ) This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.	2022-02-08 17:59:48 +08:00
Ziyue Yang	682b2c120d	Benchmarks: Revise Code - Make data checking in gpu_copy optional (#301 ) This commit makes data checking in gpu_copy optional, because it takes too long when the message size is large.	2022-02-08 10:59:27 +08:00
Ziyue Yang	853890559a	Benchmarks: Revise Code - Reduce result variance in gpu_copy benchmark (#298 ) Description This commit does the following to optimize result variance in gpu_copy benchmark: 1) Add warmup phase for gpu_copy benchmark to avoid timing instability caused by first-time CUDA kernel launch overhead; 2) Use CUDA events for timing instead of CPU timestamps; 3) Make data checking an option that is not preferred to be enabled in performance test; 4) Enlarge message size in performance benchmark.	2022-02-07 13:16:13 +08:00
Yuting Jiang	28195be6db	Bug - Fix typo in document (#297 ) Fix typo in document.	2022-01-30 13:38:00 +08:00
Yifan Xiong	3419447c11	Benchmarks - Support T4 and A10 in GEMM benchmark (#294 ) Support T4 and A10 in GEMM benchmark.	2022-01-29 13:26:00 +00:00
Yifan Xiong	3524975cfc	Config - Support customized env for all modes (#295 ) Support customized env for all modes in configuration.	2022-01-29 08:19:48 +00:00
Ziyue Yang	f3d05006d4	Benchmarks: Fix Bug - Fix GPU scan logic in gpu_copy (#296 ) Fix bug of GPU scan logic in bidirectional tests.	2022-01-29 14:04:03 +08:00
guoshzhao	d03d110f55	Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287 ) Description Please write a brief description and link the related issue if have. Major Revision - Sync (do allreduce max) the E2E training results among all workers. - Avoid using ':0' in metric name if only one rank has output.	2022-01-28 20:35:53 +08:00
guoshzhao	d877ca2322	Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288 ) Description Add timeout feature for each benchmark. Major Revision - Add `timeout` config for each benchmark. In current config files, only set the timeout for kernel-launch as example. Other benchmarks can be set in the future. - Set the timeout config for `ansible_runner.run()`. Runner will get the return code 254: [ansible.py:80][WARNING] Run failed, return code 254. - Using `timeout` command to terminate the client process.	2022-01-28 08:16:32 +00:00
Yuting Jiang	f283b53638	Config - Disable disk-benchmark in ndmv4.yaml and change batch size to 1 in default.yaml (#292 ) Description Disable disk-benchmark in ndmv4.yaml and change batch size to 1 in default.yaml	2022-01-28 06:15:19 +08:00
Yifan Xiong	7d7cd3dc63	Config - Update benchmark naming to support annotations (#284 ) __Description__ Update benchmark naming to support annotations. __Major Revisions__ - Update name for `create_benchmark_context` in executor. - Backward compatibility for model benchmarks using "_models" suffix. - Update documents.	2022-01-25 09:54:58 +00:00
Yuting Jiang	35fc06ebd1	Bug: Fix code insecure issue that binds a socket to all network interfaces (#291 ) Description Fix code insecure issue that binds a socket to all network interfaces.	2022-01-24 10:59:06 +00:00
Yuting Jiang	380ce4001c	Bug: Fix code insecure issue of integer overflow in cublas function (#290 ) Description Fix insecure issue of Multiplication result converted to larger type. Major Revision - Use a cast to ensure that the multiplication is done using the long long to avoid overflow.	2022-01-24 18:15:54 +08:00
dependabot[bot]	5f6ad0cd63	Bump nanoid from 3.1.23 to 3.2.0 in /website (#286 ) Bumps [nanoid](https://github.com/ai/nanoid) from 3.1.23 to 3.2.0. - [Release notes](https://github.com/ai/nanoid/releases) - [Changelog](https://github.com/ai/nanoid/blob/main/CHANGELOG.md) - [Commits](https://github.com/ai/nanoid/compare/3.1.23...3.2.0) --- updated-dependencies: - dependency-name: nanoid dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-01-23 21:05:11 +08:00
Ziyue Yang	74421ffee0	Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285 ) Description This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.	2022-01-21 13:45:37 +08:00
guoshzhao	fd2bc9e048	Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283 ) Description Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.	2022-01-19 10:49:56 +08:00
Yifan Xiong	f7ffc54522	CLI - Add command sb benchmark [list,list-parameters] (#279 ) __Description__ Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks. <details> <summary>Examples</summary> <pre> $ sb benchmark list -n [a-z]+-bw -o table Result -------- mem-bw nccl-bw rccl-bw </pre> <pre> $ sb benchmark list-parameters -n mem-bw === mem-bw === optional arguments: --bin_dir str Specify the directory of the benchmark binary. --duration int The elapsed time of benchmark in seconds. --mem_type str [str ...] Memory types to benchmark. E.g. htod dtoh dtod. --memory str Memory argument for bandwidthtest. E.g. pinned unpinned. --run_count int The run count of benchmark. --shmoo_mode Enable shmoo mode for bandwidthtest. default values: {'bin_dir': None, 'duration': 0, 'mem_type': ['htod', 'dtoh'], 'memory': 'pinned', 'run_count': 1} </pre> </details> __Major Revisions__ * Add `sb benchmark list` to list benchmarks matching given name. * Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name. __Minor Revisions__ * Sort format help text for argparse.	2022-01-18 08:40:03 +00:00
dependabot[bot]	9a909d2bed	Bump follow-redirects from 1.14.1 to 1.14.7 in /website (#282 ) Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.14.1 to 1.14.7. - [Release notes](https://github.com/follow-redirects/follow-redirects/releases) - [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.14.1...v1.14.7) --- updated-dependencies: - dependency-name: follow-redirects dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-01-17 12:56:30 +08:00
dependabot[bot]	2538a7eedd	Bump shelljs from 0.8.4 to 0.8.5 in /website (#281 ) Bumps [shelljs](https://github.com/shelljs/shelljs) from 0.8.4 to 0.8.5. - [Release notes](https://github.com/shelljs/shelljs/releases) - [Changelog](https://github.com/shelljs/shelljs/blob/master/CHANGELOG.md) - [Commits](https://github.com/shelljs/shelljs/compare/v0.8.4...v0.8.5) --- updated-dependencies: - dependency-name: shelljs dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-01-17 10:51:54 +08:00
Yifan Xiong	ff563b66af	Release - SuperBench v0.4.0 (#278 ) __Description__ Cherry-pick bug fixes from v0.4.0 to main. __Major Revisions__ * Bug - Fix issues for Ansible and benchmarks (#267) * Tests - Refine test cases for microbenchmark (#268) * Bug - Build openmpi with ucx support in rocm dockerfiles (#269) * Benchmarks: Fix Bug - Fix fio build issue (#272) * Docs - Unify metric and add doc for cublas and cudnn functions (#271) * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274) * Bug - Fix bug of detecting if gpu_index is none (#275) * Bug - Fix bugs in data diagnosis (#273) * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270) * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276) * Docs - Upgrade version and release note (#277) Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>	2021-12-30 16:24:00 +08:00
Yuting Jiang	682ed06aee	Docs - Add usage for data diagnosis (#266 ) Description Add usage for data diagnosis.	2021-12-14 03:10:29 +00:00
guoshzhao	2e10fb0dcd	Docs - Update docs for monitor. (#265 ) Description Update docs for monitor.	2021-12-13 14:07:28 +00:00
Yifan Xiong	cb8a3cfb15	Benchmarks - Add transformers for TensorRT inference (#254 ) Add transformers for TensorRT inference.	2021-12-13 13:21:32 +00:00
Ziyue Yang	10012a0a47	Docs - Add benchmark metrics for cpu-memory-bw-latency (#264 ) Description Add benchmark metrics for cpu-memory-bw-latency.	2021-12-13 19:08:19 +08:00
Ziyue Yang	b6781968f2	Benchmarks: Fix Comment - Correct benchmark name in test_gpu_copy_bw_performance.py #263 Description Benchmarks: Fix Comment - Correct benchmark name in test_gpu_copy_bw_performance.py.	2021-12-13 07:02:39 +00:00
Hossein Pourreza	b590409e0f	Benchmarks: Add Benchmark - Add mlc benchmark to superbench (#216 ) Description Add mlc memory bandwidth and latency micro benchmark to Superbench. Major Revision - Add mlc benchmark with test and example files	2021-12-13 13:47:42 +08:00
yangpanMS	c403b1ca76	Docs - Add a small note for using release container version (#262 ) Description Minor doc change to highlight sb CLI version is independent of the sb container version.	2021-12-13 03:48:11 +00:00
guoshzhao	4d85630abb	Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245 ) Description Add ONNXRuntime inference benchmark based on ORT python API. Major Revision - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.	2021-12-10 13:53:11 +00:00
Yuting Jiang	c2f942cb6f	Analyzer: Add Feature - Add basic analysis features (#248 ) Description Add basic analysis features. Major Revision - Add statistics, correlations of the raw data - Add numeric outlier detection(inter_quartile_range) - Add boxplot for selected metric	2021-12-10 11:01:59 +00:00
guoshzhao	6e357fb9d2	Monitor: Integration - Integrate monitor into Superbench (#259 ) Description Integrate monitor into Superbench. Major Revision - Initialize, start and stop monitor in SB executor. - Parse the monitor data in SB runner and merge into benchmark results. - Specify ReduceType for monitor metrics, such as MAX, MIN and LAST. - Add monitor configs into config file.	2021-12-10 09:33:13 +00:00
guoshzhao	afea9913ae	Benchmarks: Fix Bug - Set reduce_op type for metric return_code (#261 ) Description Set the `reduce_op` type for metric `return_code` as `None`.	2021-12-10 16:02:29 +08:00
Yuting Jiang	ed2f3c3c82	CLI - Integrate data diagnosis (#260 ) Description Add cli to integrate data diagnosis module.	2021-12-10 06:11:00 +00:00
Yuting Jiang	9f56b2198f	Benchmarks: Unify metric names of benchmarks (#252 ) Description Unify metric names of benchmarks.	2021-12-09 04:48:42 +00:00
Yuting Jiang	c13ed2a297	Analyzer: Initialization - Add baseline-based data diagnosis module (#242 ) Description Add data diagnosis module. Major Revision - Add DataDiagnosis class to support rule-based data diagnosis for result summary jsonl file of multi nodes - Add RuleOp class to define rule operators	2021-12-08 18:22:00 +08:00
Yifan Xiong	213ab14bea	Bug - Fix issues for distributed runs (#258 ) Fix issues for distributed runs: * fix config for memory bandwidth benchmarks * add throttling for high concurrency docker pull * update rsync path and exclude directories * handle exceptions when creating summary * tune for logging	2021-12-08 06:55:13 +00:00
guoshzhao	44f0270ec4	Benchmarks: Add Feature - Add return_code metric into result (#256 ) Description Add return_code metric into result and revise unit tests.	2021-12-07 07:32:37 +00:00
Yuting Jiang	655f238dbb	Docs - Add doc for data diagnosis (#249 ) Description Add doc for data diagnosis, including input, output and baseline file schema.	2021-12-06 02:49:38 +00:00
Yifan Xiong	bd8f105d2e	Benchmarks - Add config file for NDm A100 v4 (#255 ) Add config file for Azure NDm A100 v4 SKU.	2021-12-04 01:17:23 +08:00
guoshzhao	8042fa34cf	Benchmarks: Configuration - Add gpt-small into config files. (#253 ) Description Add gpt-small into config files.	2021-12-02 11:12:55 +00:00
guoshzhao	371fd61cea	Benchmarks: Add Feature - Add 'ignore_invalid' option when registering benchmarks. (#247 ) Description If `ignore_invalid` is True, and 'required' arguments are not set when registering the benchmark, the arguments should be provided by user in config and skip the arguments checking.	2021-12-02 10:26:56 +00:00
Yifan Xiong	b4ea97bfa4	Benchmark: Replace `-c` argument with `-N` for `numactl` in Configuration (#250 ) Description Replace `-c` argument with `-N` for `numactl` since the old `-c`/`--cpubind` argument is deprecated.	2021-12-02 09:27:03 +00:00
Ziyue Yang	b0e759f599	Benchmarks: Build Pipeline - Upgrade FIO benchmark tool (#251 ) Description Upgrade FIO benchmark tool from 3.27 to 3.28.	2021-12-01 20:33:09 +08:00
Yuting Jiang	978e88efdd	Docs: Update ib validation microbenchmark metrics (#246 ) Description Update ib validation microbenchmark metrics.	2021-11-30 12:58:34 +00:00


305 commits, all branches
