superbenchmark

Commit graph

Author	SHA1	Message	Date
dependabot[bot]	02941e6e09	Bump terser from 4.8.0 to 4.8.1 in /website (#376 ) Bumps [terser](https://github.com/terser/terser) from 4.8.0 to 4.8.1. - [Release notes](https://github.com/terser/terser/releases) - [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md) - [Commits](https://github.com/terser/terser/commits) --- updated-dependencies: - dependency-name: terser dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-07-22 11:52:43 +08:00
Yifan Xiong	352ae0c95f	Fix port conflict in ib loopback (#375 ) Fix potential port conflict due to race condition between time-to-check to time-to-use, by binding the port all through. Modify the function to resolve flake8 C901 while keeping the logic same.	2022-07-20 11:30:00 +08:00
Yifan Xiong	16b6385dee	Add dependencies (#374 ) Add dependencies * include ndv4-topo.xml in cuda docker images * require requests version to avoid RequestsDependencyWarning	2022-07-13 08:42:53 +00:00
Yifan Xiong	b2875179bf	Fix issues in ib validation benchmark (#370 ) Fix several issues in ib validation benchmark: * continue running when timeout in the middle, instead of aborting whole mpi process * make timeout parameter configurable, set default to 120 seconds * avoid mixture of stdio and iostream when print to stdout * set default message size to 8M which will saturate ib in most cases * fix hostfile path issue so that it can be auto found in different cases	2022-07-09 19:57:11 +08:00
Yifan Xiong	e00a8180f6	Support node_num=1 in mpi mode (#372 ) Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks in both 1 node and all nodes in one config by changing `node_num`. Update docs and add test case accordingly.	2022-07-08 09:24:17 +08:00
Yifan Xiong	9f03d5687a	Update dependencies and Dockerfile (#371 ) Update dependencies and Dockerfile: * upgrade nccl-tests and rccl-tests to current latest version to match NCCL/RCCL versions * unify image tag names on DockerHub * remove verbose output in Dockerfile and minor fix some flags	2022-07-06 10:31:41 +00:00
Yifan Xiong	a94ead34b0	CLI - Support SKU auto detect if running on Azure VM (#365 ) Support SKU auto detect and using corresponding benchmark config if running on Azure VM.	2022-07-05 10:52:39 +08:00
Yifan Xiong	620192a242	Fix issues in ib loopback benchmark (#369 ) Fix several issues in ib loopback benchmark: * use `--report_gbits` and divide by 8 to get GB/s, previous results are MiB/s / 1000 * use the ib_write_bw binary built in third_party instead of system path * update the metrics name so that different hca indices have same metric	2022-06-29 17:53:02 +00:00
Yifan Xiong	8ef7163a18	Deployment - Refine error message when GPU is not detected (#368 ) Refine error message when GPU is not detected. Possible solutions if hardware exists and drivers are already installed: * nvidia gpus: ```sh /sbin/modprobe nvidia-uvm D=`grep nvidia-uvm /proc/devices | awk '{print $1}'` mknod -m 666 /dev/nvidia-uvm c $D 0 ``` * amd gpus ```sh modprobe amdgpu ```	2022-06-30 01:12:25 +08:00
Yifan Xiong	325a7338bf	Fix incorrect ulimit config in Dockerfile (#364 ) Fix incorrect ulimit nofile config in Dockerfile. Instead of bash, sh is used by default where `echo` does not accept any parameters and `-e` is written into /etc/security/limits.conf.	2022-06-24 14:14:00 +00:00
Yifan Xiong	bfaa1c837b	Support multiple IB/GPU in ib validation (#363 ) Description Support multiple IB/GPU devices run simultaneously in ib validation benchmark. Major Revisions - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel. - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes. - Fix env issues in Dockerfile for end-to-end test. - Update ib-traffic configuration examples in config files. - Update unit tests and docs accordingly. Closes #326.	2022-06-24 08:35:20 +00:00
Yifan Xiong	0f7b057a2d	Runner - Fix sudo issue when running without Docker (#362 ) Fix sudo issue when running without Docker, user account could be arbitrary in such case.	2022-06-19 11:56:36 +00:00
Yifan Xiong	483bf782e1	Update ROCm Dockerfile (#361 ) Description Update ROCm Dockerfile. Major Revisions - Add dockerfile for ROCm 5.1.3 - Merge 5.1.x and 5.0.x dockerfile - Remove 4.2 and 4.0 legacy - Update build pipeline accordingly	2022-06-19 17:26:39 +08:00
Yifan Xiong	60a3c74306	Fix cmake and build issues (#360 ) Description Fix cmake and build issues. Major Revision * Remove unnecessary boost build * Remove user-agent for mlc * Remove -j for third party to build each project in sequence * Fix ansible collections installation path	2022-06-15 13:07:57 +08:00
Yifan Xiong	a4937e95c6	Support `sb run` on host directly without Docker (#358 ) Description Support `sb run` on host directly without Docker Major Revisions - Add `--no-docker` argument for `sb run`. - Run on host directly if `--no-docker` if specified. - Update docs and tests correspondingly.	2022-06-14 10:57:01 +08:00
dependabot[bot]	528d69bd13	Bump eventsource from 1.1.0 to 1.1.1 in /website (#357 ) Bumps [eventsource](https://github.com/EventSource/eventsource) from 1.1.0 to 1.1.1. - [Release notes](https://github.com/EventSource/eventsource/releases) - [Changelog](https://github.com/EventSource/eventsource/blob/master/HISTORY.md) - [Commits](https://github.com/EventSource/eventsource/compare/v1.1.0...v1.1.1) --- updated-dependencies: - dependency-name: eventsource dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-06 11:39:35 +08:00
dependabot[bot]	77f8048ad8	Bump cross-fetch from 3.1.4 to 3.1.5 in /website (#349 ) Bumps [cross-fetch](https://github.com/lquixada/cross-fetch) from 3.1.4 to 3.1.5. - [Release notes](https://github.com/lquixada/cross-fetch/releases) - [Commits](https://github.com/lquixada/cross-fetch/compare/v3.1.4...v3.1.5) --- updated-dependencies: - dependency-name: cross-fetch dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-02 19:06:54 +08:00
dependabot[bot]	cdd19e6f30	Bump async from 2.6.3 to 2.6.4 in /website (#351 ) Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4. - [Release notes](https://github.com/caolan/async/releases) - [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md) - [Commits](https://github.com/caolan/async/compare/v2.6.3...v2.6.4) --- updated-dependencies: - dependency-name: async dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-02 13:26:35 +08:00
Yuting Jiang	54da021b4d	Analyzer - Fix bugs in data diagnosis (#355 ) Description Fix bugs in data diagnosis. Major Revision - add support to get baseline of the metric which uses custom benchmark naming with ':' like 'nccl-bw:default/allreduce_8_bw:0' - save raw data of all metrics rather than metrics defined in diagnosis_rules.yaml when output_all is True - fix bug of using wrong column index when applying format(red color and percentile) in the excel	2022-06-01 17:12:38 +08:00
Yuting Jiang	3f135e4669	Dockerfile - Add support to run sb command inside docker image (#356 ) Description Add support to run sb command inside docker image - install missing dependency.	2022-06-01 01:11:28 +08:00
Yuting Jiang	e08b6d3a1c	Dockerfile: Update rccl version and fix issue in rocm5.1.1 dockerfile (#354 ) Description Update rccl version and fix issue in rocm5.1.1 dockerfile.	2022-05-27 10:46:40 +08:00
Yuting Jiang	81a4146bc1	Dockerfile - Add dockerfile for rocm5.1.1 (#353 ) Description Add dockerfile for rocm5.1.1.	2022-05-25 20:28:11 +08:00
Yifan Xiong	6681c72043	Release - SuperBench v0.5.0 (#350 ) Description Cherry-pick bug fixes from v0.5.0 to main. Major Revisions * Bug - Force to fix ort version as '1.10.0' (#343) * Bug - Support no matching rules and unify the output name in result_summary (#345) * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344) * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342) * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347) * Docs - Upgrade version and release note (#348) Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>	2022-04-29 16:22:55 +08:00
Yuting Jiang	712eafc373	Docs - Update links using relative file paths with extensions (#346 ) Description Update links of referencing other docs using relative file paths with extensions.	2022-04-21 07:28:19 +08:00
Jared Bowden	cb26691173	Docs - Update link to cli.md (#341 ) Description Fixes relative link in documentation: point to `../cli.md`.	2022-04-15 22:11:14 +08:00
guoshzhao	80dcc8aaec	Benchmarks: Add Benchmark - Add FAMBench based on docker benchmark (#338 ) Description Integrate FAMBench into superbench based on docker implementation: https://github.com/facebookresearch/FAMBench The script to run all benchmarks is: https://github.com/facebookresearch/FAMBench/blob/main/benchmarks/run_all.sh	2022-04-11 15:31:07 +08:00
Yuting Jiang	8dc19ca4af	CLI - Integrate output all nodes diagnosis results (#339 ) Description Integrate output all nodes diagnosis results.	2022-04-11 13:42:04 +08:00
Yuting Jiang	55b0f9d239	Analyzer: Add Feature - Output results of all nodes in data diagnosis (#336 ) Description Output results of all nodes in data diagnosis.	2022-04-10 18:57:15 +08:00
Yuting Jiang	56c9a711a8	Docs - Add usage for result summary (#337 ) Description Add usage for result summary.	2022-04-08 20:44:25 +00:00
Yuting Jiang	f15da60b2b	CLI - Integrate result summary and update output format of data diagnosis (#335 ) Description Integrate result summary and update output format of data diagnosis. Major Revision - integrate result summary - add md and html format for data diagnosis	2022-04-08 18:48:43 +08:00
guoshzhao	6d895da83c	Benchmarks: Add Feature - Provide option to save raw data into file. (#333 ) Description Use config `log_raw_data` to control whether log the raw data into file or not. The default value is `no`. We can set it as `yes` for some particular benchmarks to save the raw data into file, such as NCCL/RCCL test.	2022-04-01 16:26:09 +08:00
dependabot[bot]	d368d90e21	Bump minimist from 1.2.5 to 1.2.6 in /website (#334 ) Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6. - [Release notes](https://github.com/substack/minimist/releases) - [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6) --- updated-dependencies: - dependency-name: minimist dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-31 12:41:48 +08:00
Yuting Jiang	84fed1ce18	Analyzer: Add feature - Add result summary in excel,md,html format (#320 ) Description Add result summary in excel,md,html format. Major Revision - Add ResultSummary class to support result summary in excel,md,html format. - Abstract RuleBase class for common-used functions in DataDiagnosis and ResultSummary.	2022-03-24 15:32:01 +08:00
Yuting Jiang	c5aa4f4e38	Bug: Benchmarks - remove fp16 samples type converting time (#332 ) Description Remove fp16 samples type converting time for training cnn and lstm inference.	2022-03-22 12:51:52 +08:00
Yifan Xiong	a9634ef5a8	Config - Add inference config for NC A100 and NV A10 series (#329 ) Add inference config for preview SKUs, including: * [NC96ads_A100_v4](https://docs.microsoft.com/en-us/azure/virtual-machines/nc-a100-v4-series) * [NV18ads_A10_v5](https://docs.microsoft.com/en-us/azure/virtual-machines/nva10v5-series)	2022-03-21 14:24:37 +08:00
Yuting Jiang	6e74918044	Bug: Benchmarks - remove fp16 samples type converting time for cnn and lstm models (#330 ) Description Remove fp16 samples type converting time for cnn and lstm models.	2022-03-17 14:02:40 +08:00
rafsalas19	ff51a3cee9	Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324 ) Description Modifications adding GPU-Burn to SuperBench. - added third party submodule - modified Makefile to make gpu-burn binary - added/modified microbenchmarks to add gpu-burn python scripts - modified default and azure_ndv4 configs to add gpu-burn	2022-03-16 16:20:11 +08:00
Yuting Jiang	84359fd806	Bug: Executor - fix bug in result writing to files for mpi mode (#328 ) Description fix the bug in result writing to files for mpi mode.	2022-03-15 16:35:03 +00:00
Yuting Jiang	b3c95f1827	Analyzer - Add md and html output format for DataDiagnosis (#325 ) Description Add md and html output format for DataDiagnosis. Major Revision - add md and html support in file_handler - add interface in DataDiagnosis for md and HTML output Minor Revision - move excel and json output interface into DataDiagnosis	2022-03-15 18:04:11 +08:00
Yifan Xiong	f755c0b659	Bug - Fix env path to absolute path (#327 ) Fix env file path to absolute path in `docker exec`, in case there're mixed ssh and local connections or different users are used.	2022-03-09 17:16:43 +08:00
Yuting Jiang	1ec055e1c2	Analyzer: Revise - Abstract RuleBase from DataDiagnosis (#321 ) Description Abstract RuleBase from DataDiagnosis.	2022-03-07 17:25:07 +08:00
dependabot[bot]	9759527111	Bump url-parse from 1.5.8 to 1.5.10 in /website (#323 ) Bumps [url-parse](https://github.com/unshiftio/url-parse) from 1.5.8 to 1.5.10. - [Release notes](https://github.com/unshiftio/url-parse/releases) - [Commits](https://github.com/unshiftio/url-parse/compare/1.5.8...1.5.10) --- updated-dependencies: - dependency-name: url-parse dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-07 03:24:22 +00:00
Jeff Daily	a9ef0f99ab	Benchmarks - Keep BatchNorm as fp32 for pytorch cnn models cast to fp16 (#322 ) Description The BatchNorm operator is not numerically stable in fp16. PyTorch documentation recommends to keep the BN op in fp32 for fp16 AMP models. Refer to https://pytorch.org/docs/stable/amp.html#ops-that-can-autocast-to-float32. Preserving BN in fp32 for superbench more accurately reflects real workloads.	2022-03-06 13:22:43 +00:00
Yuting Jiang	425b9ff865	Dockerfile - Add dockerfile for rocm5.0.1 (#319 ) Description Add dockerfile for rocm5.0.1.	2022-02-28 19:30:43 +08:00
dependabot[bot]	74a3b1231a	Bump prismjs from 1.23.0 to 1.27.0 in /website (#318 ) Bumps [prismjs](https://github.com/PrismJS/prism) from 1.23.0 to 1.27.0. - [Release notes](https://github.com/PrismJS/prism/releases) - [Changelog](https://github.com/PrismJS/prism/blob/master/CHANGELOG.md) - [Commits](https://github.com/PrismJS/prism/compare/v1.23.0...v1.27.0) --- updated-dependencies: - dependency-name: prismjs dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-28 14:55:12 +08:00
Yuting Jiang	a4950a707e	Dockerfile - Add rocm5.0 dockerfile (#307 ) Description Add rocm5.0 dockerfile.	2022-02-26 07:12:45 +08:00
Ziyue Yang	01304706ed	Bug Fix - Fix P2P detection in gpu_copy (#317 ) Description Fix invalid reference of P2P detection result in gpu_copy.	2022-02-25 05:48:38 +08:00
Yuting Jiang	4f5027dbda	Benchmarks: Build Pipeline - Make gpcnet only for cuda (#316 ) Description Make gpcnet only for cuda.	2022-02-24 18:18:49 +08:00
Yuting Jiang	e0c491425d	Bug - Fix empty HIP_ARCHITECTURES issue in cmake>=3.21.0 (#315 ) Description Fix HIP_ARCHITECTURES is empty issue with cmake>=3.21.0. Refer to https://github.com/ROCm-Developer-Tools/HIP/pull/2364	2022-02-22 12:38:58 +00:00
dependabot[bot]	0740780bcc	Bump url-parse from 1.5.1 to 1.5.8 in /website (#313 ) Bumps [url-parse](https://github.com/unshiftio/url-parse) from 1.5.1 to 1.5.8. - [Release notes](https://github.com/unshiftio/url-parse/releases) - [Commits](https://github.com/unshiftio/url-parse/compare/1.5.1...1.5.8) --- updated-dependencies: - dependency-name: url-parse dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-21 13:03:27 +08:00
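
Commit 352ae0c95f describes fixing a port conflict caused by a race between time-of-check and time-of-use "by binding the port all through". The following is only a minimal Python sketch of that general idea, not the project's actual code: the helper names `reserve_port` and `is_port_free` are hypothetical, and the loopback address is an assumption for illustration. Instead of probing for a free port and releasing it before use, the socket that found the port stays bound, so no other process can claim the port in between.

```python
import socket


def reserve_port(host="127.0.0.1"):
    """Bind to an OS-assigned free port and return the open socket and port.

    Hypothetical helper: the caller keeps the socket open for as long as the
    port must stay reserved, which closes the check-then-use window.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS to pick a free port
    return sock, sock.getsockname()[1]


def is_port_free(port, host="127.0.0.1"):
    """Return True if a fresh socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


holder, port = reserve_port()
# While the holder socket stays bound, a second bind to the same port fails.
taken_while_held = not is_port_free(port)
holder.close()
print(f"port {port} was protected while held: {taken_while_held}")
```

A check-then-release variant would call something like `is_port_free` first and bind later, leaving a gap in which another process can take the port; keeping the socket bound removes that gap.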

305 commits