Граф коммитов

6667 Коммитов

Автор SHA1 Сообщение Дата
Sylvain Gugger 461e8cacf9
Fix evaluation with label smoothing in Trainer (#10338) 2021-02-22 16:39:02 -05:00
Stas Bekman 622a8c5995
[trainer] add Trainer methods for metrics logging and saving (#10266)
* make logging and saving trainer built-in

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-22 13:02:53 -08:00
Tanmay Garg 94d8767ba3
Loading from last checkpoint functionality in Trainer.train (#10334)
Enhance resume_from_checkpoint argument of Trainer.train to accept
bool type. If True given, last saved checkpoint in self.args.output_dir
will be loaded. (#10280)
2021-02-22 15:33:00 -05:00
Stas Bekman eab0afc19c
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)
* implement gradient_accumulation_steps support in DeepSpeed integration

* typo

* cleanup

* cleanup
2021-02-22 11:15:59 -08:00
Stas Bekman f991daed18
defensive programming + expand/correct README (#10295) 2021-02-22 10:58:50 -08:00
Sylvain Gugger 9e147d31f6
Deprecate prepare_seq2seq_batch (#10287)
* Deprecate prepare_seq2seq_batch

* Fix last tests

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* More review comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-02-22 12:36:16 -05:00
Lysandre Debut e73a3e1891
Add note to resize token embeddings matrix when adding new tokens to voc (#10331) 2021-02-22 09:48:20 -05:00
Julien Plu 19e737b93e
Making TF Longformer-like models compliant with AMP (#10233)
* AMP

* Add LED

* Apply style

* Fix longformer
2021-02-22 15:41:56 +01:00
Lysandre Debut cd8c4c3fc2
DeBERTa-v2 fixes (#10328)
Co-authored-by: Pengcheng He <penhe@microsoft.com>

Co-authored-by: Pengcheng He <penhe@microsoft.com>
2021-02-22 07:45:18 -05:00
tagucci 88605f37a6
fix typo in conversion script (#10316)
* fix typo in conversion script

* style

Co-authored-by: Stas Bekman <stas@stason.org>
2021-02-21 07:54:27 -08:00
Stas Bekman cdd31b4de4
don't fail when there are no zombies (#10308) 2021-02-20 13:28:43 -08:00
Sylvain Gugger a2e379743c Fix style 2021-02-20 15:46:54 -05:00
cronoik a0dfc2d30f
fixes #10303 (#10304) 2021-02-20 15:21:33 -05:00
Pengcheng He 9a7e63729f
Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018)
* Integrate DeBERTa v2(the 1.5B model surpassed human performance on SuperGLUE); Add DeBERTa v2 900M,1.5B models;

* DeBERTa-v2

* Fix v2 model loading issue (#10129)

* Doc members

* Update src/transformers/models/deberta/modeling_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address Sylvain's comments

* Address Patrick's comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-19 18:34:44 -05:00
Sylvain Gugger f6e53e3c2b
Fix example links in the task summary (#10291) 2021-02-19 18:04:15 -05:00
Julien Plu 536aee99bb
Move the TF NER example (#10276) 2021-02-19 16:06:13 -05:00
Joe Davison cbadb5243c
Zero shot distillation script cuda patch (#10284) 2021-02-19 14:06:57 -05:00
Stas Bekman f1299f5038
Kill any run-away pytest processes (#10281) 2021-02-19 13:36:37 -05:00
Tanmay Garg 709c86b5a9
Introduce logging_strategy training argument (#10267) (#10267)
Introduce logging_strategy training argument
in TrainingArguments and TFTrainingArguments. (#9838)
2021-02-19 11:49:22 -05:00
Julien Plu 34df26ec3a
Making TF OpenAI GPT model compliant with AMP and XLA (#10261)
* Fix AMP and XLA

* Remove useless var
2021-02-19 09:33:25 -05:00
Julien Plu 3e116ed331
Making TF TransfoXL model compliant with AMP (#10264)
* Fix AMP

* Apply style

* Remove unused import
2021-02-19 06:58:07 -05:00
Julien Plu 86caeb7636
Fix XLA and AMP (#10262) 2021-02-19 06:57:16 -05:00
Julien Plu 3d72d47f09
Making TF MPNet model compliant with XLA (#10260)
* Fix XLA

* Rework cast

* Apply style
2021-02-19 06:56:41 -05:00
Julien Plu fb56bf2584
Making TF MobileBert model compliant with AMP (#10259)
* Fix AMP

* Trigger CI

* Rework cast
2021-02-19 06:55:25 -05:00
Julien Plu 2fc6284f04
Making TF Lxmert model compliant with AMP (#10257)
* Fix AMP

* Rework cast

* Apply style
2021-02-19 06:54:14 -05:00
Stas Bekman d27b28d958
[ISSUES.md] propose using google colab to reproduce problems (#10270)
* propose using google colab to reproduce problems

* Update ISSUES.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 17:15:51 -08:00
Stas Bekman 4eddc459a9
[trainer] implement support for full fp16 in evaluation/predict (#10268)
* implement --fp16_full_eval

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* add test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 17:02:35 -08:00
Stas Bekman d9a81fc0c5
fix func signature (#10271) 2021-02-18 16:44:42 -08:00
Joe Davison c6fe17557e
Script for distilling zero-shot classifier to more efficient student (#10244)
* add zero-shot distillation script

* readme wordsmithing

* clean up code

* add multi-gpu teacher inference
plus tidying up more code

* add use_fast_tokenizer arg

* update results in readme

* more readme wordsmithing

* style

* Add handle to readme

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix code block

* add error+docs about distributed & tpu

* add @sgugger format requests

* xla -> tpu

* support fp16 for teacher preds

* no checkpoint by default

* add demo colab link

* add model sharing prompt + model link

* correct resulting acc of example

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-18 17:08:45 -05:00
Stas Bekman 97e688bc22
[Trainer] memory tracker metrics (#10225)
* memory tracker metrics

* go back to eval for somewhat consistency

* handle no-gpu case

* deal with stackable eval calls

* restore callback order

* style

* simplify the API

* add test

* docs

* consistently use eval_ prefix

* improve docs

* Update src/transformers/trainer_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename method

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
Tanmay Garg d7f38c5d1d
Introduce warmup_ratio training argument (#10229)
Introduce warmup_ratio training argument in both
TrainingArguments and TFTrainingArguments classes (#6673)
2021-02-18 12:23:33 -05:00
Julien Plu 2acae50a0c
Reduce the time spent for the TF slow tests (#10152)
* rework savedmodel slow test

* Improve savedmodel tests

* Remove useless content
2021-02-18 15:52:57 +01:00
Julien Plu 14ed3b978e
Fix AMP (#10216) 2021-02-18 06:29:43 -05:00
Julien Plu bdf1669e3f
Making TF GPT2 compliant with XLA and AMP (#10230)
* Fix XLA and AMP

* Fix AMP and XLA

* Apply style

* Apply Patrick's comment
2021-02-18 09:36:01 +01:00
Stas Bekman 5da7c78ed8
update to new script; notebook notes (#10241) 2021-02-17 15:58:08 -08:00
Stas Bekman dee876ceff
[trainer] refactor place_model_on_device logic, add deepspeed (#10243)
* refactor place_model_on_device logic, add deepspeed

* doc

* style
2021-02-17 15:52:36 -08:00
Stas Bekman d1eb88f42d
[CI] 2 fixes (#10248)
* fix invalid port

* missing requirements
2021-02-17 14:12:39 -08:00
Julien Plu 7246785a67
Make TF CTRL compliant with XLA and AMP (#10209)
* Fix XLA and AMP

* Apply style

* Remove useless cast
2021-02-17 18:54:15 +01:00
Julien Plu fdb2351ebb
Making TF XLM-like models XLA and AMP compliant (#10211)
* Fix Flaubert and XLM

* Remove useless cast

* Tiny fix

* Tiny fix
2021-02-17 18:02:48 +01:00
Julien Plu 83d803ba02
Making TF BART-like models XLA and AMP compliant (#10191)
* Update BART

* Update Blenderbot

* Update BlenderbotSmall

* Update Marian

* Update MBart

* Update MBart

* Update Pegasus

* Update template

* Fix Marian and Pegasus

* Apply style

* Default initializer

* Default initializer

* Default initializer

* Remove int32 casts

* Fix template

* Remove more cast
2021-02-17 17:48:56 +01:00
Daniel Stancl 8d79e5ca49
Fix head masking for TFT5 (#9877)
* Fix head_mask and decoder_head_mask in TFT5 models

* Enable test_headmasking both fot TFT5 tester
and TFT5EncoderOnly tester

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2021-02-17 19:00:09 +03:00
Lysandre Debut 4b91965731
Factor out methods (#10215) 2021-02-17 09:53:43 -05:00
Stas Bekman e94d63f6cb
[trainer] fix ignored columns logger (#10219)
* [trainer] fix ignored columns logger

This PR fixes a confusing log entry that says:
```
The following columns in the evaluation set don't have a corresponding argument in `T5ForConditionalGeneration.forward` and have been ignored: .
```
when everything is in order.

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-16 13:35:39 -08:00
Joe Davison 4210cd96fc
fix add_token_positions fn (#10217) 2021-02-16 14:00:05 -05:00
Sylvain Gugger 7169d1ea7b
Store FLOS as floats to avoid overflow. (#10213) 2021-02-16 11:15:15 -05:00
Zhang Cheng df1b0fb54d
set tgt_lang of MBart Tokenizer for summarization (#10205) 2021-02-16 09:39:37 -05:00
Julien Plu 5c2d66a2f5
Unlock XLA test for convbert (#10207) 2021-02-16 07:59:41 -05:00
Suraj Patil 1c8c2d9ab3
[WIP][examples/seq2seq] move old s2s scripts to legacy (#10136)
* move old s2s scripts to legacy

* add the tests back

* proper rename

* restore

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-15 10:48:02 -08:00
Stas Bekman 96897a3535
make the sub-group of tests run always (#10196) 2021-02-15 13:01:35 -05:00
Lysandre Debut 8cbd0bd137
Specify dataset dtype (#10195)
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2021-02-15 12:57:17 -05:00