Commit graph

171 Commits

Author SHA1 Message Date
Lev Kurilenko fd1449c766
Port Reza's INT8-quantization fix to container architecture (#2725)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-02-16 10:12:18 -08:00
Molly Smith 46784cb58e
Fix auto TP for duplicate modules with different gems (#2784)
* Fix auto TP for duplicate modules with different gems

* precommit and comments

* Comment

* Combine gem list of same named modules

* remove duplicates from gem_list before updating policy

* Add module attribute with name variation for ProphetNet

---------

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-02-15 12:50:32 -08:00
Lev Kurilenko 10f3c301a0
Add container load checkpoint error reporting + refactor (#2792)
This PR refactors the organization of meta tensor checkpoint loading as follows:

- Move get_param_names() abstract method definition from TransformerPolicy into MetaTensorContainer
- Model-specific get_param_names() definitions moved from policy into model-specific container
- selected_policy_g, megatron_v2_g, and transformer_config_g globals replaced with a single container_g global, since the container will contain all of the information those globals previously captured
- ckpt_load_enabled flag added to containers that's set to False by default in the base.py container and gets set to True when the MetaTensorContainer feature is inherited
- Assertion added to replace_transformer_layer before performing checkpoint loading to check that ckpt_load_enabled == True; otherwise an error message is printed saying the container does not support meta tensor checkpoint loading.

The aim of these changes is to more closely couple meta tensor checkpoint loading code to the MetaTensorContainer and to allow for better error reporting of load checkpoint use on model types that don't support this feature.
2023-02-07 23:18:30 +00:00
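The ckpt_load_enabled flag described in PR #2792 above can be sketched as follows. This is a minimal illustration, assuming simplified class names that mirror the PR's description (BaseContainer, MetaTensorContainer, load_checkpoint are stand-ins, not DeepSpeed's actual container code):

```python
class BaseContainer:
    # Default in the base container: meta tensor checkpoint loading is off
    ckpt_load_enabled = False

class MetaTensorContainer(BaseContainer):
    # Inheriting the MetaTensorContainer feature flips the flag on
    ckpt_load_enabled = True

def load_checkpoint(container):
    # Mirrors the assertion the PR adds before checkpoint loading in
    # replace_transformer_layer
    assert container.ckpt_load_enabled, (
        f"{type(container).__name__} does not support "
        "meta tensor checkpoint loading")
    return "loaded"
```

With this shape, a model type that never inherits MetaTensorContainer fails fast with a clear message instead of loading incorrectly.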
Lev Kurilenko 0a73e6e613
Container param cleanup + remove qkv_merging (#2780)
This PR cleans up some container items and removes an unused qkv_merging parameter:

- Remove qkv_merging=True from BERT containers
- Change containers config object to ds_model_config
- Remove qkv_merging param
2023-02-03 21:49:33 +00:00
Reza Yazdani 9f41ffe4a6
Reset KV-cache at the beginning of text-generation (#2669)
Co-authored-by: Martin Cai <martincai@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-02-03 12:07:44 -08:00
Reza Yazdani 2c6e819450
Fix Checkpoint-loading with Meta-tensor (#2781)
* Reset KV-cache at the beginning of text-generation

* Pass the ckpt-loading arguments to work with meta-tensor

* remove unrelated changes
2023-02-03 07:12:53 +00:00
Michael Wyatt ef6a958e70
Fix for diffusers v0.12.0 (#2753)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-31 15:54:46 -08:00
Ma, Guokai 98cc35b6a8
Abstract accelerator (step 3) (#2677)
* Integrate accelerator abstraction interface into deepspeed/

* Fix error message in fp16/fused_optimizer

* fix error message in fp16/unfused_optimizer.py

* assign get_accelerator().pin_memory() result to input Tensor name

* no need to check cuda and whether nvtx supported

* move try-except into inner most block

* call Event() and Stream() in get_accelerator() for data type

* Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed

* Apply op_builder backend api change from #2705 from @jeffra

* fix tests where Builder NAME is used

* keep original ...Builder.NAME interface instead of ...Builder().NAME interface

* fix builder closure for installation

* fix randomltd builder

* add comments to clarify create_op_builder and get_op_builder

* fix compatibility with pip install -e

Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2023-01-26 06:03:12 -08:00
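The accelerator abstraction in the commit above routes device-specific calls (pin_memory, Stream, Event, device naming) through one interface. A minimal sketch of the idea, assuming illustrative names (Accelerator, CudaAccelerator, device_name are hypothetical stand-ins; DeepSpeed's real interface lives under deepspeed.accelerator):

```python
import abc

class Accelerator(abc.ABC):
    """Abstract interface: deepspeed/ code calls this instead of torch.cuda."""

    @abc.abstractmethod
    def device_name(self, index=None):
        ...

    @abc.abstractmethod
    def pin_memory(self, tensor):
        ...

class CudaAccelerator(Accelerator):
    def device_name(self, index=None):
        # "cuda" for the default device, "cuda:N" for a specific one
        return "cuda" if index is None else f"cuda:{index}"

    def pin_memory(self, tensor):
        # Placeholder for tensor.pin_memory(); note the PR assigns the
        # result back to the input name rather than discarding it
        return tensor
```

A different backend then only needs to supply its own subclass; callers use `get_accelerator()` and never branch on CUDA directly.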
Molly Smith d59b572911
Automatic tensor parallelism v2 (#2670)
* loop through pipe.model

* tp_parser first draft

* client_module must be type object

* Simplify layernorm tracking. Add unittest.

* cleanup

* Add more models to unittest

* cleanup inference pytest for merging

* Add unittest

* cleanup

* pre-commit

* unittest id and pytest marker

* try marian for unittest

* precommit

* Move tp code to separate file

* Add new auto tp file

* pre-commit and type

* Update deepspeed/module_inject/auto_tp.py

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update deepspeed/module_inject/auto_tp.py

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update tests/unit/inference/test_inference.py

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* remove unused fillmask function

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2023-01-24 15:05:48 -08:00
Ammar Ahmad Awan 867da307d0
Inference Refactor (replace_with_policy, model_implementations) (#2554)
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-19 14:10:03 -08:00
Reza Yazdani 95d9a1b6c3
Fix Opt injection (#2541)
* fix Opt injection & add injection verification check at inference test

* fix several issues

* remove fixture

* remove check_injection when no kernel is injected

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2023-01-06 13:21:49 -08:00
Jeff Rasley d9b788d773
tweaks to ds-attn, distilbert policy, and mup (#2649) 2022-12-28 10:16:02 -08:00
Jeff Rasley e0aa84c5b5
Fix issue w. bloom when changing tp size (#2645) 2022-12-23 03:27:33 +00:00
Lev Kurilenko 503706ac44
Remove GatheredParameters context from replace_with_policy (#2591)
This PR removes the zero-inference GatheredParameters context from replace_with_policy, since zero-inference is no longer needed there after the introduction of meta tensor support for BLOOM.
2022-12-16 13:43:28 -08:00
Jeff Rasley 35eabb0a33
Fix issues w. python 3.6 + add py-version checks to CI (#2589) 2022-12-09 21:53:58 +00:00
Michael Wyatt ccb8eb81fb
Add checkpoint sharding unit tests (#2561)
* added checkpoint sharding tests
2022-12-08 14:35:43 -08:00
Lev Kurilenko 731965db33
Fix MegatronLayerPolicy to have megatron_v2=True (#2579)
This PR updates the MegatronLayerPolicy to set megatron_v2=True, which is required in order to properly transpose in the replace_with_policy() function.

After the change in this PR, in conjunction with PR #99 in the Megatron-DeepSpeed fork, the Megatron text-generation example works with DS inference.
2022-12-07 09:26:09 -08:00
Reza Yazdani 35b350b28c
Fix quantized-inference & Add generic support of checkpoint loading (#2547)
* fix checkpoint loading when it is a dictionary

* fix some issues with saving ckpt & int8 inference

* fix quantized-inference & add generic support of checkpoint loading

* remove int8 hard-coded flag

* fix mlp return tensors

* fix several issues to load checkpoints of GPT-J, GPT-NEOX, and OPT with different TP-size

* add more comments & description for checkpoint-loading module

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-12-06 13:49:29 -08:00
Ammar Ahmad Awan 90ae688442
Pass down the new DS inference config to replace_transformer_layer. (#2539)
* pass down the new DS inference config to replace_transformer_layer.

* remove quantize_settings and rename the ep_mp_group.

* Fix model_config passing. Fixes gptj issue with wrong output.

* fix small bug in gpt-neo.

Co-authored-by: Reza Yazdani and Michael Wyatt
2022-11-23 19:50:11 +00:00
Ammar Ahmad Awan b5d18a6ab3
DeepSpeed inference config. (#2459) (#2472)
Changes to the inference API to accept a config dict, and cleanup of the Inference Engine to utilize the newly added inference config.

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-11-15 00:45:43 +00:00
lokoppakmsft f2710bbe1d
Make data contiguous before the inplace reshape-copy_ function (#2489)
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-11-11 14:04:31 -08:00
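The fix in PR #2489 above makes data contiguous before an in-place reshape-copy_. A hedged sketch of the underlying idea using NumPy (the PR targets PyTorch, where view/copy_ require contiguous memory; `np.ascontiguousarray` plays the role of `tensor.contiguous()`, and `reshape_copy` is a hypothetical stand-in, not the PR's function):

```python
import numpy as np

def reshape_copy(dst, src):
    # A transposed view has strides that don't match a flat layout, so
    # materialize a contiguous copy before the reshape + in-place copy.
    # (NumPy silently copies on reshape; PyTorch's view() path errors on
    # non-contiguous input, which is why the PR adds the explicit call.)
    src = np.ascontiguousarray(src)
    dst[...] = src.reshape(dst.shape)

src = np.arange(6).reshape(2, 3).T   # transposed -> non-contiguous
assert not src.flags["C_CONTIGUOUS"]
dst = np.empty(6)
reshape_copy(dst, src)
```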
Connor Holmes e7e7595502
Stable Diffusion Enhancements (#2491)
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
2022-11-09 17:40:59 -08:00
Kevin Ko 6f77da1bae
Add `scale_attn_by_inverse_layer_idx` feature (#2486)
* Add scale_attn_by_inverse_layer_idx feature

* Fix layer_id bug

* Fix scaling value

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-11-09 15:29:10 -08:00
Reza Yazdani 9cfcf7431a
Add correct memory-allocation at DeepSpeed-Attention (#2474)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
2022-11-07 16:23:25 -08:00
Ammar Ahmad Awan 35458da0e0
Create a new folder structure to isolate model-specific code in DS (#2464) 2022-11-03 17:00:44 -07:00
Connor Holmes 10e9d04c23
Cache Allocation and Softmax Fixes (#2433)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-11-02 10:48:18 -07:00
Jeff Rasley ec13da6ba7
add SD injection policy (#2381)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-10-13 16:47:12 -07:00
Andrey Chernykh cd3a70953a
Fix GPT Neo-X multi-gpu inference (#2401)
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-10-13 10:18:03 -07:00
lekurile 46a886c068
Change type to tuple in replace_wo_policy isinstance check (#2387)
Update the isinstance check inside the `replace_wo_policy` function to `tuple` and `str` instead of `dict`, since the layers are provided as a `tuple` type.

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <mosm@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2022-10-07 15:32:10 -07:00
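The isinstance fix in PR #2387 above comes down to checking for the types the layers actually arrive as. A minimal sketch, assuming a hypothetical helper (`normalize_layers` is illustrative; the real check lives inside `replace_wo_policy`):

```python
def normalize_layers(layers):
    # Layers are provided as a tuple of names or a single string; the
    # previous `dict` check never matched, so tuples fell through.
    if isinstance(layers, str):
        return [layers]
    if isinstance(layers, tuple):
        return list(layers)
    raise TypeError(f"unexpected layer spec: {type(layers).__name__}")
```

Checking `(tuple, str)` instead of `dict` means both call shapes take the intended branch.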
Ammar Ahmad Awan 993264388d
Inference profiling updates/fixes (#2348) (#2349)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-09-23 14:38:09 -07:00
Stas Bekman b146aa3523
[ds-inference] fix progress bar (#2286)
When loading the non-sharded checkpoint, update the progress bar (fix by @RezaYazdaniAminabadi); tested and confirmed working.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-09-04 18:12:36 -04:00
Reza Yazdani afdc72879f
Ds-inference Int8 support through ZeroQuant technology (#2217)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-08-30 16:39:34 -07:00
Molly Smith a7ee688a6f
Update replace_module.py, test-gptj.py related fix (#2269)
Fix RuntimeError: Boolean value of Tensor with more than one value is ambiguous when running test-gptj.py
2022-08-26 23:25:27 -07:00
Reza Yazdani c35bfe89f6
fix ds-inference without policy (#2247)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-08-23 14:44:09 -07:00
Arash Bakhtiari fae896ef60
Make OPT policy backward compatible with pre-OPT transformers versions (#2254) 2022-08-23 14:38:48 -07:00
Jeff Rasley dce3acaac7
allow saving ckpt w/o ckpt json + bloom copy fix (#2237) 2022-08-19 15:01:15 -07:00
Arash Bakhtiari 8b2a63717a
Add support of OPT models (#2205)
* add opt replace policy

* simplify inf. api

* fix opt replace policy

* fix use-cache & add relu

* Add support of custom MLP act. function

* Revert "simplify inf. api"

This reverts commit 9e910fcbd5471dec9b3c92008426f5ba590bf0b6.

* fix the inference API (temp. solution)

* fix code formatting

* add unit tests for OPT models.

* refactor pre-attention layer norm configuration

* add support of opt-350m model

* refactor the HF model config initialization

* fix hf model config issue

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-08-15 07:31:51 -07:00
Reza Yazdani 8920308c66
Fix the tensor-slicing copy for qkv parameters (#2198)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-08-10 09:34:57 -07:00
Reza Yazdani e7d9959540
fixing model partitioning without injection (#2179) 2022-08-03 20:49:11 -07:00
Reza Yazdani 556f005152
Fix random token-generation issue + MP-checkpoint loading/saving (#2132)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-07-28 17:24:07 -07:00
Alex Hedges 316c4a43e0
Add flake8 to pre-commit checks (#2051) 2022-07-25 16:48:08 -07:00
Michael Wyatt ee7ea3b805
use HF NeoX (#2087)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-07-19 12:50:58 -07:00
Stas Bekman 16699d839f
[ds-inference] checkpoint loading => tqdm (#2107)
* [ds-inference] checkpoint loading => tqdm

Solves 2 issues:
- less noise, using a tqdm progress bar
- more informative: tells users how long to wait and how many shards remain to load

New way:

```
Loading 72 checkpoints:  12%|█▎        | 9/72 [01:12<08:39,  8.25s/it]
```

* write only from one process

* style
2022-07-19 09:21:19 -07:00
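The tqdm change in PR #2107 above, including the "write only from one process" follow-up, can be sketched as below. This is a simplified illustration under stated assumptions: `load_checkpoints` and `shard_paths` are hypothetical stand-ins, appending the path stands in for `torch.load(path)`, and rank 0 is the only process that shows the bar:

```python
def load_checkpoints(shard_paths, rank=0):
    try:
        from tqdm import tqdm
    except ImportError:
        # Degrade gracefully when tqdm is unavailable: iterate silently
        tqdm = lambda it, **kw: it
    loaded = []
    # disable=rank != 0 keeps the progress bar on a single process,
    # avoiding N interleaved bars in multi-GPU runs
    for path in tqdm(shard_paths,
                     desc=f"Loading {len(shard_paths)} checkpoints",
                     disable=rank != 0):
        loaded.append(path)  # placeholder for torch.load(path)
    return loaded
```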
Reza Yazdani aa88137b8d
Add Inference support for running the BigScience-BLOOM Architecture (#2083)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2022-07-18 16:27:12 -07:00
Alex Hedges 76ea0534c1
Fix missing import in replace_module.py (#2050)
* Fix missing import in replace_module.py

* Change import from torch.distributed to deepspeed.comm
2022-06-29 17:16:26 +00:00
Jeff Rasley b666d5cd73
[inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) (#1992)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-06-15 14:21:19 -07:00
Ammar Ahmad Awan 36ad3119d5
DeepSpeed comm backend v1 (#1985)
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-06-10 16:47:33 -07:00
Reza Yazdani 8164ea9e6d
Fixing several bugs in the inference-api and the kernels (#1951)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-05-24 13:27:50 -07:00
Jeff Rasley b4fcd98ff0
Inference PP changes for neox (#1899)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2022-04-26 11:50:38 -07:00
Samyam Rajbhandari c13457b756
Supporting multiple modules injection with a single policy when they have identical architectures (#1869)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-03-30 17:47:19 +00:00
Ammar Ahmad Awan c0af6d90f7
Refactor MoE and Groups API to simplify model creation and management (#1798)
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
2022-02-28 11:46:40 -08:00
Reza Yazdani 841f99d162
Load MoE checkpoint at deepspeed inference-engine (#1759)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-02-11 10:56:13 -08:00
Reza Yazdani fa0735760a
Fix the tensor-slicing with multi-GPU inference and kernel-injection (#1724)
* use the right tensor-copy function when adding tensor-slicing

* small fix in inference tutorial
2022-02-03 09:25:11 -08:00
Reza Yazdani 94de0229fb
Fix inference api & add more description on inference engine tutorial (#1711) 2022-01-19 15:27:51 -08:00
Jeff Rasley e46d808a1b
MoE inference + PR-MoE model support (#1705)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
2022-01-18 16:25:01 -08:00
Reza Yazdani 289c3f9ba4
GPT-J inference support (#1670)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-01-08 02:40:31 +00:00
Alex Hedges fc2f378ece
Improve pre-commit hooks (#1602)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-12-01 03:12:29 +00:00
Jeff Rasley a10e4811fe
force set lf instead of crlf (https://github.com/pre-commit/pre-commit-hooks#mixed-line-ending) (#1598) 2021-11-29 15:41:18 -08:00
Reza Yazdani 9ce00a2171
Tensor-Parallelism general support (#1512)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-11-11 22:22:57 -08:00
Reza Yazdani ee6a92c066
Fixing the transformer APIs to return tuple as the output (if needed) (#1491) 2021-10-29 23:19:48 +00:00
Kamal Raj c6d1418d85
Make the replace module more configurable (#1366)
* DeepSpeedInferenceConfig
get epsilon value from config

* epsilon -> layer_norm_eps
to keep var name same as in DeepSpeedTransformerConfig

* DeepSpeedTransformerConfig
get epsilon value from config

* configurable stochastic_mode
eg:
1. For LM pre-training True
2. For LM fine-tuning on task False

* Updated replace_module.py
check whether layer_norm_eps is an attribute of the config, defaulting to 1e-12

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2021-10-05 13:29:50 -07:00
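The config-driven epsilon lookup described in PR #1366 above reduces to an attribute check with the 1e-12 fallback. A minimal sketch, assuming illustrative names (`DummyConfig` and `get_layer_norm_eps` are stand-ins for an HF-style model config and the check added to replace_module.py):

```python
class DummyConfig:
    """Stand-in for a Hugging Face model config object."""
    pass

def get_layer_norm_eps(config):
    # layer_norm_eps may be absent on older configs; fall back to the
    # 1e-12 default the PR uses
    return getattr(config, "layer_norm_eps", 1e-12)

cfg = DummyConfig()
assert get_layer_norm_eps(cfg) == 1e-12   # attribute missing -> default
cfg.layer_norm_eps = 1e-5
assert get_layer_norm_eps(cfg) == 1e-5    # attribute present -> honored
```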
Alex Hedges be789b1665
Fix many typos (#1423)
* Fix typos in docs/

* Fix typos in code comments and output strings

* Fix typos in the code itself

* Fix typos in tests/

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-10-01 19:56:32 -07:00
Reza Yazdani 49b6a63251
Reducing the memory-overhead of creating model for multi-GPU run (#1244)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-08-26 11:05:43 -07:00
Reza Yazdani 6ba9628970
Fixing inference api for FP32 and non-masking GPT-based models (#1204)
* fixing inference api for FP32 and non-masking GPT-based models

* use a dummy tensor if input_mask is none

* fix input_mask

* minor fix

* send input_mask to compute_attn func for checking
2021-07-20 13:54:16 -07:00
Hyunwoong Ko 429cbc89af
Fix bugs about non-contiguous tensor broadcasting (#1168)
* Fix bugs about non-contiguous tensor broadcasting

* Fix typo

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
2021-06-23 20:52:54 -07:00
Reza Yazdani aca7fc549a
Add local attention for GPT-Neo model architecture (#1114)
* fix links for inference tutorial

* Fix automatic injection. Add the local-attention for GPT-Neo

* fix the inference for generation of large sequences (>1K & <32K)

* fix format

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2021-06-08 11:44:59 -07:00
Jeff Rasley 96eb5b12e3
delay imports for replace policies and fix missing req (#1100)
* delay imports for replace policies and fix missing req

* fix issue with _orig_layer_class always being None
2021-05-24 16:43:36 -07:00
Reza Yazdani ed3de0c21b
Quantization + inference release (#1091)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>
2021-05-24 01:10:39 -07:00
Reza Yazdani e2dfcadf3b
Fix the bias-add and add the layer-norm-eps parameter (#791)
* fix the bias-add precision and indexing and also adding the layer-norm-eps as a configurable parameter for transformer

* add ACC_HALF config

* use defined to check if ACC_Half is defined
2021-02-24 13:51:48 -08:00
Reza Yazdani 48065c06d7
Fixing the module-inject Api (#786) 2021-02-24 10:54:09 -08:00
Jeff Rasley 44bd538b11
Module replacement support (#586)
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2021-01-06 11:03:35 -08:00