Commit graph

2776 Commits

Author SHA1 Message Date
Aymeric Augustin bc1715c1e0 Add black-compatible isort configuration.
lines_after_imports = 2 is a matter of taste; I like it.
2019-12-21 17:53:18 +01:00
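For reference, a minimal sketch of what a black-compatible isort setup does, using isort's Python API (isort >= 5; the commit itself edited a config file, and everything here beyond lines_after_imports is an assumption):

    import isort

    # Hedged sketch: lines_after_imports=2 leaves two blank lines after the
    # import block, matching black's spacing for top-level definitions, and
    # line_length=119 matches the black invocation used in this repo.
    print(isort.code("import os\nimport re\nx = 1\n",
                     line_length=119, lines_after_imports=2))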
Aymeric Augustin 36883c1192 Add "make style" to format code with black. 2019-12-21 17:53:18 +01:00
Aymeric Augustin 6e5291a915 Enforce black in CI. 2019-12-21 17:53:18 +01:00
Aymeric Augustin fa84ae26d6 Reformat source code with black.
This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas's preference, because it allows for explicit variable
names, which make the code easier to understand.
2019-12-21 17:52:29 +01:00
Aymeric Augustin 63e3827c6b Remove empty file.
Likely it was added by accident.
2019-12-21 15:38:08 +01:00
Thomas Wolf 645713e2cb
Merge pull request #2254 from huggingface/fix-tfroberta
adding positional embeds masking to TFRoBERTa
2019-12-21 15:33:22 +01:00
Thomas Wolf 73f6e9817c
Merge pull request #2115 from suvrat96/add_mmbt_model
[WIP] Add MMBT Model to Transformers Repo
2019-12-21 15:26:08 +01:00
thomwolf 77676c27d2 adding positional embeds masking to TFRoBERTa 2019-12-21 15:24:48 +01:00
thomwolf 344126fe58 move example to mm-imdb folder 2019-12-21 15:06:52 +01:00
Thomas Wolf 5b7fb6a4a1
Merge pull request #2134 from bkkaggle/saving-and-resuming
closes #1960 Add saving and resuming functionality for remaining examples
2019-12-21 15:03:53 +01:00
Thomas Wolf 6f68d559ab
Merge pull request #2130 from huggingface/ignored-index-coherence
[BREAKING CHANGE] Setting all ignored index to the PyTorch standard
2019-12-21 14:55:40 +01:00
thomwolf 1ab25c49d3 Merge branch 'master' into pr/2115 2019-12-21 14:54:30 +01:00
thomwolf b03872aae0 fix merge 2019-12-21 14:49:54 +01:00
Thomas Wolf 518ba748e0
Merge branch 'master' into saving-and-resuming 2019-12-21 14:41:39 +01:00
Thomas Wolf 18601c3b6e
Merge pull request #2173 from erenup/master
run_squad with roberta
2019-12-21 14:33:16 +01:00
Thomas Wolf 6e7102cfb3
Merge pull request #2203 from gthb/patch-1
fix: wrong architecture count in README
2019-12-21 14:31:44 +01:00
Thomas Wolf deceb00161
Merge pull request #2177 from mandubian/issue-2106
:zap: #2106 tokenizer.tokenize speed improvement (3-8x) by caching added_tokens in a Set
2019-12-21 14:31:20 +01:00
Thomas Wolf eeb70cdd77
Merge branch 'master' into saving-and-resuming 2019-12-21 14:29:59 +01:00
Thomas Wolf ed9b84816e
Merge pull request #1840 from huggingface/generation_sampler
[WIP] Sampling sequence generator for transformers
2019-12-21 14:27:35 +01:00
thomwolf f86ed23189 update doc 2019-12-21 14:13:06 +01:00
thomwolf cfa0380515 Merge branch 'master' into generation_sampler 2019-12-21 14:12:52 +01:00
thomwolf 300ec3003c fixing run_generation example - using torch.no_grad 2019-12-21 14:02:19 +01:00
thomwolf 1c37746892 fixing run_generation 2019-12-21 13:52:49 +01:00
Thomas Wolf 7e17f09fb5
Merge pull request #1803 from importpandas/fix-xlnet-squad2.0
fix run_squad.py during fine-tuning xlnet on squad2.0
2019-12-21 13:38:48 +01:00
thomwolf 8a2be93b4e fix merge 2019-12-21 13:31:28 +01:00
Thomas Wolf 562f864038
Merge branch 'master' into fix-xlnet-squad2.0 2019-12-21 12:48:10 +01:00
Thomas Wolf 8618bf15d6
Merge pull request #1736 from huggingface/fix-tf-xlnet
Fix TFXLNet
2019-12-21 12:42:05 +01:00
Thomas Wolf 2fa8737c44
Merge pull request #1586 from enzoampil/include_special_tokens_in_bert_examples
Add special tokens to documentation for bert examples to resolve issue: #1561
2019-12-21 12:36:11 +01:00
Thomas Wolf f15f087143
Merge pull request #1764 from DomHudson/bug-fix-1761
Bug-fix: Roberta Embeddings Not Masked
2019-12-21 12:13:27 +01:00
Thomas Wolf fae4d1c266
Merge pull request #2217 from aaugustin/test-parallelization
Support running tests in parallel
2019-12-21 11:54:23 +01:00
Aymeric Augustin b8e924e10d Restore test.
This looks like debug code accidentally committed in b18509c2.

Refs #2250.
2019-12-21 08:50:15 +01:00
Aymeric Augustin 767bc3ca68 Fix typo in model name.
This looks like a copy/paste mistake. Probably this test was never run.

Refs #2250.
2019-12-21 08:46:26 +01:00
Aymeric Augustin 343c094f21 Run examples separately from tests.
This optimizes the total run time of the Circle CI test suite.
2019-12-21 08:43:19 +01:00
Aymeric Augustin 80caf79d07 Prevent excessive parallelism in PyTorch.
We're already using as many processes in parallel as we have CPU cores.
Furthermore, the number of cores may be incorrectly calculated as 36
(we've seen this in pytest-xdist), which compounds the problem.

PyTorch performance craters without this.
2019-12-21 08:43:19 +01:00
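A hedged sketch of the fix's likely shape (the actual commit may cap threads via environment variables such as OMP_NUM_THREADS instead):

    import torch

    # One test process already runs per CPU core, so give each process a
    # single intra-op thread; otherwise PyTorch's thread pools oversubscribe
    # the machine and performance craters.
    torch.set_num_threads(1)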
Aymeric Augustin bb3bfa2d29 Distribute tests from the same file to the same worker.
This should prevent two issues:

- hitting API rate limits in tests that call the HF API
- multiplying the cost of expensive test setups
2019-12-21 08:43:19 +01:00
Aymeric Augustin 29cbab98f0 Parallelize tests on Circle CI.
Set the number of CPUs manually based on the Circle CI resource class,
or else we're getting 36 CPUs, which is far too many (perhaps that's
the underlying hardware and not what Circle CI allocates to us).

Don't parallelize the custom tokenizers tests because they take less
than one second to run and parallelization actually makes them slower.
2019-12-21 08:43:19 +01:00
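Both of the last two commits map onto pytest-xdist options; a minimal sketch (the worker count and test path are illustrative):

    import pytest

    # "-n 4" pins the worker count instead of trusting auto-detection (which
    # reported 36 CPUs here); "--dist loadfile" sends every test in a file to
    # the same worker, so rate-limited or expensive per-file setups run once.
    pytest.main(["-n", "4", "--dist", "loadfile", "tests"])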
Aymeric Augustin a4c9338b83 Prevent parallel downloads of the same file with a lock.
Since the file is written to the filesystem, a filesystem lock is the
way to go here. Add a dependency on the third-party filelock library to
get cross-platform functionality.
2019-12-21 08:43:19 +01:00
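A minimal sketch of the locking pattern, assuming a hypothetical fetch() helper rather than the library's real download code:

    import os
    from filelock import FileLock

    def cached_download(url, cache_path):
        # A lock file next to the target serializes concurrent workers:
        # the first one downloads, the rest wait and then find the file.
        with FileLock(cache_path + ".lock"):
            if not os.path.exists(cache_path):
                fetch(url, cache_path)  # hypothetical download helper
        return cache_path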
Aymeric Augustin b670c26684 Take advantage of the cache when running tests.
Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.

Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.

Fix #2222.
2019-12-21 08:43:19 +01:00
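The gettempdir() convention, sketched (the cache directory name is illustrative):

    import os
    from tempfile import gettempdir

    # gettempdir() honours the TMPDIR/TEMP/TMP environment variables, so the
    # cache location can be moved without editing the tests; "/tmp" cannot.
    cache_dir = os.path.join(gettempdir(), "transformers_test_cache")
    os.makedirs(cache_dir, exist_ok=True)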
Aymeric Augustin b67fa1a8d2 Download models directly to cache_dir.
This allows moving the file instead of copying it, which is more
reliable. It also avoids writing large amounts of data to /tmp,
which may not be large enough to accommodate it.

Refs #2222.
2019-12-21 08:43:19 +01:00
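The move-instead-of-copy idea as a hedged sketch (fetch_bytes() stands in for the real download step):

    import os
    import tempfile

    def download_to_cache(url, cache_path):
        # Stream into a temp file inside the destination directory, then
        # rename it into place: on one filesystem os.replace() is an atomic
        # move, so /tmp is never involved and no partial file is visible.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path))
        with os.fdopen(fd, "wb") as f:
            f.write(fetch_bytes(url))  # hypothetical fetch
        os.replace(tmp_path, cache_path)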
Aymeric Augustin 286d5bb6b7 Use a random temp dir for writing pruned models in tests. 2019-12-21 08:43:19 +01:00
Aymeric Augustin 478e456e83 Use a random temp dir for writing files in tests. 2019-12-21 08:43:19 +01:00
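The pattern behind these two commits, sketched (save_model() is a placeholder for whatever artifact the test writes):

    import os
    import tempfile

    # Each test gets its own throwaway directory, so parallel workers never
    # collide on a shared path, and cleanup happens even if the test fails.
    with tempfile.TemporaryDirectory() as tmp_dir:
        save_model(os.path.join(tmp_dir, "pruned_model.bin"))  # placeholder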
Aymeric Augustin 12726f8556 Remove redundant torch.jit.trace in tests.
This looks like it could be expensive, so don't run it twice.
2019-12-21 08:43:19 +01:00
Julien Chaumond ac1b449cc9 [doc] move distilroberta to more appropriate place
cc @lysandrejik
2019-12-21 00:09:01 -05:00
Julien Chaumond 3e52915fa7 [RoBERTa] Embeddings: fix dimensionality bug 2019-12-20 19:01:27 -05:00
Dom Hudson 228f52867c Bug fix: 1764 2019-12-20 18:27:35 -05:00
Francesco a80778f40e small refactoring (only aesthetic, not functional) 2019-12-20 17:21:24 -05:00
Francesco 3df1d2d144 - Create the output directory (whose name the user passes in the "save_directory" parameter), where the encoder and decoder will be saved, if it does not exist.
- Empty the output directory if it contains any files or subdirectories.
- Create the "encoder" directory inside "save_directory", if it does not exist.
- Create the "decoder" directory inside "save_directory", if it does not exist.
- Save the encoder and the decoder in these two directories, respectively.
2019-12-20 17:21:24 -05:00
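A hedged sketch of the directory layout this describes (the function name and reset-by-deletion behaviour are illustrative):

    import os
    import shutil

    def prepare_save_directory(save_directory):
        # Start from an empty output directory, then create one subdirectory
        # for the encoder and one for the decoder before saving each part.
        if os.path.isdir(save_directory):
            shutil.rmtree(save_directory)
        os.makedirs(os.path.join(save_directory, "encoder"))
        os.makedirs(os.path.join(save_directory, "decoder"))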
Lysandre a436574bfd Release: v2.3.0 2019-12-20 16:22:20 -05:00
Thomas Wolf d0f8b9a978
Merge pull request #2244 from huggingface/fix-tok-pipe
Fix Camembert and XLM-R `decode` method - Fix NER pipeline alignment
2019-12-20 22:10:39 +01:00
Thomas Wolf a557836a70
Merge pull request #2191 from huggingface/fix_sp_np
NumPy compatibility for SentencePiece
2019-12-20 22:08:08 +01:00