Commit Graph

69 Commits

Author SHA1 Message Date
Yifan Xiong c0c43b8f81
Bug bash - Fix bugs in multi GPU benchmarks (#98)
* Add `sb deploy` command content.
* Fix inline if-expression syntax in playbook.
* Fix quote escape issue in bash command.
* Add custom env in config.
* Update default config for multi GPU benchmarks.
* Update MANIFEST.in to include jinja2 template.
* Require jinja2 minimum version.
* Fix occasional duplicate output in Ansible runner.
* Fix mixed color from Ansible and Python colorlog.
* Update according to comments.
* Change superbench.env from list to dict in config file.
2021-06-23 18:16:43 +08:00
guoshzhao 216c5b5c71
Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments for micro benchmark. (#100) 2021-06-21 23:34:05 +08:00
Yuting Jiang 3d72c07807
Bug bash - Rename bin name and metric name of cublas and cudnn microbenchmark (#99)
rename bin name and result metric of cublas and cudnn microbenchmark
2021-06-20 15:42:37 +08:00
Yifan Xiong ddbc51a135
Bug bash - Fix bugs and refine log in single GPU benchmarks (#97)
Fix bugs and refine log in single GPU benchmarks:

* Fix none framework issue
* Fix empty parameter bug
* Remove missed mobilenet_v3 models
* Change benchmark registration log to debug level
* Add pid in logging
* Add missing benchmarks in default config
* Fix deprecated logging warn
2021-06-16 13:51:22 +08:00
guoshzhao 03b41be145
Benchmarks: Fix Bug - Fix OOM issue when run pytorch models sequentially. (#93)
* Clean up the cache.
2021-06-07 10:19:05 +08:00
guoshzhao 2d9be807a9
Benchmarks: Fix Bug - Fix return code overwrite issue (#94)
* fix return code reset issue
2021-06-04 18:02:12 +08:00
Yifan Xiong 6b0ca1cb05
Runner - Support local mode in runner (#88)
* Support local mode in runner.
2021-06-02 23:58:44 +08:00
guoshzhao 44c5103b5c
Benchmarks: Code Revision - Change default shape of sharding-matmul. (#92)
* Change default shape of sharding-matmul.
2021-06-02 10:50:09 +08:00
guoshzhao 6c6f526937
Benchmarks: Add Benchmark - Add FLOPs performance benchmark for cuda. (#87)
* add cuda flops performance benchmark.
2021-06-02 09:15:58 +08:00
guoshzhao 331c740a15
Benchmarks: Add Feature - Add nvml package to provide python interfaces of nvidia. (#91) 2021-06-01 23:31:07 +08:00
Yuting Jiang 83235433b2
Benchmarks: Add benchmark - add micro benchmark for cudnn test (#89)
* add python related cudnn microbenchmark
2021-06-01 22:24:35 +08:00
Yuting Jiang 0831748167
Benchmarks: Code Revision - add error return code for cublas microbenchmark (#90)
* add error return code for cublas micro benchmark
2021-06-01 16:12:15 +08:00
Yuting Jiang 61c258fe5b
Benchmarks: Add benchmark - add source code of cudnn function micro benchmark (#78)
* Benchmarks: Add benchmark - add source code of cudnn function micro benchmark
2021-06-01 10:33:23 +08:00
Yifan Xiong 5e9f948df2
Executor - Save benchmark results to file (#86)
* Save benchmark results to json file.
2021-05-31 13:05:12 +08:00
Yuting Jiang 18398fbaa2
Benchmarks: Add benchmark - add micro benchmark for cublas test (#80)
* add benchmark for cublas test

* format

* revise error handling and test

* add interface to read json file, revise json file path and include .json in packaging

* add random_seed in arguments

* revise preprocess of cublas benchmark

* fix lint error and note error in source code

* update according to comments

* revise input arguments from json file to custom str and convert json file to built-in dict list

* restore package config

* fix lint issue

* update platform and comments

* rename files to match source code dir and fix comments error

Co-authored-by: root <root@sb-validation-000001.51z1chmys5fuzfqyo4niepozre.bx.internal.cloudapp.net>
2021-05-31 10:31:53 +08:00
Yifan Xiong 8b4f613a76
Runner - Support torch.distributed mode in runner (#81)
* Support `torch.distributed` mode in runner.
* Support given `proc_num` and `node_num` in `torch.distributed` mode.
2021-05-28 12:29:39 +08:00
Yuting Jiang 87f6b371e8
Benchmarks: Add benchmark - add source code of cublas function micro benchmark (#77)
* Superbenchmark: Add benchmarks - add cublas function micro benchmark

* format

* add python benchmark for cublas functions, example and test file

* delete python related and rename some files

* revise cmd_helper and move json package to cmake

* resolve conflict

* revise error handling to try-catch and update some code style

* revise cmd_helper.h, cublas_helper.h, cublas_helper.cpp

* revise structure of the cublas function

* add some comments and move cuda_init and cuda_free

* add comments for class member

* add random seed, revise input from file to json string, simplify cmake

* delete json file in source code of cublas

* update according to comments

* limit batchcount=1 in initialization of cublas function which do not use batch count

* revise and fix some errors of annotations

* update according to comments and revise construction of CublasFunction

Co-authored-by: root <root@sb-validation-000001.51z1chmys5fuzfqyo4niepozre.bx.internal.cloudapp.net>
2021-05-27 16:44:18 +08:00
Yifan Xiong e7f6d8ba78
CI/CD - Add integration tests for Ansible playbooks (#82)
* Add integration tests for Ansible playbooks
* Add `gpu_vendor` var to bypass gpu mount
2021-05-26 20:04:49 +08:00
Yuting Jiang e996516259
Benchmarks: Build Pipeline - Revise path of installing cmake projects (#83)
* Unify SB_MICRO_PATH and SB_MICRO_LIB

* fix bug of lib path
2021-05-26 19:14:14 +08:00
Yifan Xiong c05e173b3d
Runner - Implement ansible client and runner (#69)
Implement ansible client and runner:
* add ansible client
* add deploy and check_env playbooks
2021-05-23 23:53:37 +08:00
guoshzhao e977bbc17f
Benchmarks: Add Benchmark - Add kernel launch overhead benchmark. (#74)
* add kernel launch overhead benchmark.
2021-05-19 17:06:55 +08:00
Yuting Jiang b7d0ee329f
expose interface of pin memory and modify cnn configuration (#75) 2021-05-19 10:52:45 +08:00
guoshzhao 7cfe7c16cf
Benchmarks: Add Benchmark - Add the source code of cuda kernel launch overhead benchmark. (#71)
* add cuda kernel launch overhead benchmark - source part.
* can customize the nvcc_archs_support.
* set SB_MICRO_PATH for azure pipeline tests.
2021-05-18 14:07:27 +08:00
Yifan Xiong 977b1a7355
CLI - Refine CLI handlers (#68)
* use absolute path of input file
* parse registry uri from image
* merge common parts for arguments processing
2021-05-18 11:34:15 +08:00
guoshzhao 2bc7ada149
Benchmarks: Add Feature - Add script to build all cmake benchmark projects. (#72)
* add script to build all native benchmarks with cmake.
2021-05-17 18:16:54 +08:00
guoshzhao 729e04ab94
Benchmarks: Code Revision - Revise MicroBenchmark class to be more flexible. (#66)
* Revise MicroBenchmark class to be more flexible.
* use command index not the command as the parameter.
* changes according to discussion.
2021-05-13 18:58:47 +08:00
Yifan Xiong 57ce473a02
Utils - Support lazy import (#67)
__Major Revision__

* Support lazy import.
* Not importing benchmarks when running `help`, `version`, `deploy` commands, etc.
2021-05-11 10:49:22 +08:00
guoshzhao a7184da3f3
Benchmarks: Fix Bug - Increase default sample count for benchmarking. (#64) 2021-04-26 20:47:12 +08:00
guoshzhao 0324117fd3
Benchmarks: Fix Bug - Fix dataset precision for CNN and LSTM benchmarks. 2021-04-26 20:37:06 +08:00
guoshzhao 2a7ab691f1
Benchmarks: Add Benchmark - Add LSTM model benchmarks. (#60)
* Benchmarks: Add Benchmark - Add LSTM model benchmarks.
2021-04-20 10:53:44 +08:00
guoshzhao 902ea211d1
Benchmarks: Add Benchmark - Add CNN model benchmarks. (#59)
* Benchmarks: Add Benchmark - Add CNN model benchmarks.
2021-04-20 10:43:02 +08:00
guoshzhao ce3ed24ab7
Benchmarks: Code Revision - Fix some issue for BERT benchmark. (#58)
2021-04-16 13:17:42 +08:00
guoshzhao af567cf650
Benchmarks: Add Benchmark - Add GPT2 model benchmark. (#57)
* Benchmarks: Add Benchmark - Add GPT2 model benchmark.
2021-04-16 11:39:57 +08:00
guoshzhao fb850af760
Benchmarks: Add Feature - Add interface to get all predefine parameters of all benchmarks. (#56)
* Benchmarks: Add Feature - Add interface to get all predefine parameters of all benchmarks.
2021-04-14 22:38:26 +08:00
Yuting Jiang 435b2d5eeb
Benchmarks: Add Benchmark - Add computation and communication overlap micro benchmark (#39)
* Benchmarks: Add Benchmark - add computation and communication overlap micro benchmark

* Benchmarks: Add benchmark - fix some format issues and typo

* Benchmarks: Add Benchmark - update according to comments and add test

* revise tests

* skip multi gpu test due to no multi gpu

Co-authored-by: v-yujiang <v-yujiang@microsoft.com>
2021-04-14 18:07:06 +08:00
Yifan Xiong 8c5273083a
Executor - Fix issues when executing benchmarks (#51)
* fix missing package in dockerfile
* update benchmark list and parameters
* catch runtime errors
* refine logging info
2021-04-13 14:38:11 +08:00
Yifan Xiong 5711429403
CLI - Integration with Executor and Runner (#26)
* CLI integration with Executor and Runner
2021-04-12 17:38:17 +08:00
guoshzhao 7f6deabb4c
add _post_process() implementation in sharding_matmul.py to clean up distributed resource. (#46)
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-12 14:32:12 +08:00
Yifan Xiong f73d1adec5
Runner: Init - Add superbench runner class (#38)
* init runner class with not implemented
2021-04-12 12:02:31 +08:00
guoshzhao 7bd416491c
rename metric name of sharding-matmul (#48)
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-12 11:02:49 +08:00
guoshzhao 020cefbd9a
remove unused code. (#47)
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-12 10:46:29 +08:00
guoshzhao 82e267b9cc
change the condition when execute self.__matmul_nosharding() (#49)
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-12 10:34:10 +08:00
guoshzhao 1f7260912f
add _post_process() implementation in pytorch_base.py to clean up distributed resource. (#45)
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-12 10:12:04 +08:00
guoshzhao 0172968f25
add _post_process() interface. (#40)
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-12 09:58:53 +08:00
Yifan Xiong c74f4879be
Executor: Init - Add superbench executor class (#34)
Add superbench executor class

* add executor class
* update default config to exec benchmarks
* add micro benchmarks and model benchmarks
2021-04-09 20:00:22 +08:00
guoshzhao 02eef9ca34
Benchmarks: Fix Bug - Fix bug when validate the raw data format. (#35)
* fix raw data validation bug.
* address comments.


Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-09 17:09:48 +08:00
guoshzhao f0f65a719b
Benchmarks: Add Benchmark - Add op-sharding microbenchmark, including matmul and sharding_matmul. (#36)
* add microbenchmark - sharding matmul.
* address comments.

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-09 15:02:12 +08:00
guoshzhao 923ce2773f
Benchmarks: Code Revision - Revise BenchmarkRegistry interfaces for integration with executor. (#33)
* revise BenchmarkRegistry interfaces.
* address comments

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-08 23:17:03 +08:00
guoshzhao 2871a68b62
Benchmarks: Code Revision - Revise result process interface and add result checking (#32)
* revise result process interface

* add more comments

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
2021-04-08 11:54:37 +08:00
Yifan Xiong 0e2b2b0829
Update logger (#28)
Update logger class.
* add file handler along with stream handler
* add colored formatter
2021-03-29 14:06:55 +08:00