Aymeric Augustin
bc1715c1e0
Add black-compatible isort configuration.
...
lines_after_imports = 2 is a matter of taste; I like it.
2019-12-21 17:53:18 +01:00
Aymeric Augustin
36883c1192
Add "make style" to format code with black.
2019-12-21 17:53:18 +01:00
Aymeric Augustin
6e5291a915
Enforce black in CI.
2019-12-21 17:53:18 +01:00
Aymeric Augustin
fa84ae26d6
Reformat source code with black.
...
This is the result of:
$ black --line-length 119 examples templates transformers utils hubconf.py setup.py
There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.
This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.
2019-12-21 17:52:29 +01:00
Aymeric Augustin
63e3827c6b
Remove empty file.
...
Likely it was added by accident.
2019-12-21 15:38:08 +01:00
Thomas Wolf
645713e2cb
Merge pull request #2254 from huggingface/fix-tfroberta
...
adding positional embeds masking to TFRoBERTa
2019-12-21 15:33:22 +01:00
Thomas Wolf
73f6e9817c
Merge pull request #2115 from suvrat96/add_mmbt_model
...
[WIP] Add MMBT Model to Transformers Repo
2019-12-21 15:26:08 +01:00
thomwolf
77676c27d2
adding positional embeds masking to TFRoBERTa
2019-12-21 15:24:48 +01:00
thomwolf
344126fe58
move example to mm-imdb folder
2019-12-21 15:06:52 +01:00
Thomas Wolf
5b7fb6a4a1
Merge pull request #2134 from bkkaggle/saving-and-resuming
...
closes #1960 Add saving and resuming functionality for remaining examples
2019-12-21 15:03:53 +01:00
Thomas Wolf
6f68d559ab
Merge pull request #2130 from huggingface/ignored-index-coherence
...
[BREAKING CHANGE] Setting all ignored index to the PyTorch standard
2019-12-21 14:55:40 +01:00
thomwolf
1ab25c49d3
Merge branch 'master' into pr/2115
2019-12-21 14:54:30 +01:00
thomwolf
b03872aae0
fix merge
2019-12-21 14:49:54 +01:00
Thomas Wolf
518ba748e0
Merge branch 'master' into saving-and-resuming
2019-12-21 14:41:39 +01:00
Thomas Wolf
18601c3b6e
Merge pull request #2173 from erenup/master
...
run_squad with roberta
2019-12-21 14:33:16 +01:00
Thomas Wolf
6e7102cfb3
Merge pull request #2203 from gthb/patch-1
...
fix: wrong architecture count in README
2019-12-21 14:31:44 +01:00
Thomas Wolf
deceb00161
Merge pull request #2177 from mandubian/issue-2106
...
:zip: #2106 tokenizer.tokenize speed improvement (3-8x) by caching added_tokens in a Set
2019-12-21 14:31:20 +01:00
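The speed-up described in deceb00161 comes from replacing repeated list scans with set lookups. A minimal sketch of that idea, assuming a hypothetical `ToyTokenizer` class (not the library's actual tokenizer) that keeps a cached set in sync with its ordered list of added tokens:

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; not the transformers API."""

    def __init__(self):
        self.added_tokens = []          # ordered list of added tokens
        self._added_tokens_set = set()  # cached set for fast membership tests

    def add_tokens(self, tokens):
        for token in tokens:
            if token not in self._added_tokens_set:
                self.added_tokens.append(token)
                self._added_tokens_set.add(token)

    def is_added(self, token):
        # Set lookup instead of scanning the list on every call.
        return token in self._added_tokens_set


tok = ToyTokenizer()
tok.add_tokens(["<special1>", "<special2>", "<special1>"])
```

Membership in a list is linear in its length, while membership in a set is constant time on average, so the gain grows with the number of added tokens.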
Thomas Wolf
eeb70cdd77
Merge branch 'master' into saving-and-resuming
2019-12-21 14:29:59 +01:00
Thomas Wolf
ed9b84816e
Merge pull request #1840 from huggingface/generation_sampler
...
[WIP] Sampling sequence generator for transformers
2019-12-21 14:27:35 +01:00
thomwolf
f86ed23189
update doc
2019-12-21 14:13:06 +01:00
thomwolf
cfa0380515
Merge branch 'master' into generation_sampler
2019-12-21 14:12:52 +01:00
thomwolf
300ec3003c
fixing run_generation example - using torch.no_grad
2019-12-21 14:02:19 +01:00
thomwolf
1c37746892
fixing run_generation
2019-12-21 13:52:49 +01:00
Thomas Wolf
7e17f09fb5
Merge pull request #1803 from importpandas/fix-xlnet-squad2.0
...
fix run_squad.py during fine-tuning xlnet on squad2.0
2019-12-21 13:38:48 +01:00
thomwolf
8a2be93b4e
fix merge
2019-12-21 13:31:28 +01:00
Thomas Wolf
562f864038
Merge branch 'master' into fix-xlnet-squad2.0
2019-12-21 12:48:10 +01:00
Thomas Wolf
8618bf15d6
Merge pull request #1736 from huggingface/fix-tf-xlnet
...
Fix TFXLNet
2019-12-21 12:42:05 +01:00
Thomas Wolf
2fa8737c44
Merge pull request #1586 from enzoampil/include_special_tokens_in_bert_examples
...
Add special tokens to documentation for bert examples to resolve issue: #1561
2019-12-21 12:36:11 +01:00
Thomas Wolf
f15f087143
Merge pull request #1764 from DomHudson/bug-fix-1761
...
Bug-fix: Roberta Embeddings Not Masked
2019-12-21 12:13:27 +01:00
Thomas Wolf
fae4d1c266
Merge pull request #2217 from aaugustin/test-parallelization
...
Support running tests in parallel
2019-12-21 11:54:23 +01:00
Aymeric Augustin
b8e924e10d
Restore test.
...
This looks like debug code accidentally committed in b18509c2.
Refs #2250.
2019-12-21 08:50:15 +01:00
Aymeric Augustin
767bc3ca68
Fix typo in model name.
...
This looks like a copy/paste mistake. Probably this test was never run.
Refs #2250.
2019-12-21 08:46:26 +01:00
Aymeric Augustin
343c094f21
Run examples separately from tests.
...
This optimizes the total run time of the Circle CI test suite.
2019-12-21 08:43:19 +01:00
Aymeric Augustin
80caf79d07
Prevent excessive parallelism in PyTorch.
...
We're already using as many processes in parallel as we have CPU cores.
Furthermore, the number of cores may be incorrectly calculated as 36
(we've seen this in pytest-xdist), which would compound the problem.
PyTorch performance craters without this.
2019-12-21 08:43:19 +01:00
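One way to get the effect described in 80caf79d07 is to cap the thread pools of the numerical backends before they are imported. This is a sketch under an assumption: the commit message does not say which mechanism was used, and the environment variables below are simply a common choice.

```python
import os

# Assumption: thread counts are capped through environment variables.
# They must be set before the numerical library is imported to take effect.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, "1")

# import torch  # imported afterwards, it would typically use one thread per process
```

With one test process per CPU core already running, a single thread per process avoids oversubscribing the machine. Calling `torch.set_num_threads(1)` after import is another option.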
Aymeric Augustin
bb3bfa2d29
Distribute tests from the same file to the same worker.
...
This should prevent two issues:
- hitting API rate limits for tests that hit the HF API
- multiplying the cost of expensive test setups
2019-12-21 08:43:19 +01:00
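The per-file distribution in bb3bfa2d29 can be illustrated with a simplified model. pytest-xdist offers a `--dist loadfile` mode for this; the sketch below is not its scheduler, only a round-robin approximation with hypothetical test names.

```python
from collections import defaultdict


def assign_by_file(test_ids, num_workers):
    """Group pytest-style node IDs ("path::name") by file, then deal files to workers."""
    by_file = defaultdict(list)
    for test_id in test_ids:
        path = test_id.split("::", 1)[0]
        by_file[path].append(test_id)

    workers = [[] for _ in range(num_workers)]
    for index, path in enumerate(sorted(by_file)):
        # Every test from one file lands on the same worker.
        workers[index % num_workers].extend(by_file[path])
    return workers


tests = [
    "tests/test_a.py::test_one",
    "tests/test_b.py::test_one",
    "tests/test_a.py::test_two",
]
assignment = assign_by_file(tests, num_workers=2)
```

Keeping a file's tests together means an expensive module-level setup runs in one process only, and API-heavy tests are not multiplied across workers.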
Aymeric Augustin
29cbab98f0
Parallelize tests on Circle CI.
...
Set the number of CPUs manually based on the Circle CI resource class,
or else we're getting 36 CPUs, which is far too much (perhaps that's
the underlying hardware and not what Circle CI allocates to us).
Don't parallelize the custom tokenizers tests because they take less
than one second to run and parallelization actually makes them slower.
2019-12-21 08:43:19 +01:00
Aymeric Augustin
a4c9338b83
Prevent parallel downloads of the same file with a lock.
...
Since the file is written to the filesystem, a filesystem lock is the
way to go here. Add a dependency on the third-party filelock library to
get cross-platform functionality.
2019-12-21 08:43:19 +01:00
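The locking pattern from a4c9338b83 can be sketched as follows. To stay self-contained, this uses a stdlib-only lock file created with `O_EXCL` as a stand-in for the filelock library named in the commit; `fetch_once` and `fake_download` are hypothetical names for illustration.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def simple_file_lock(lock_path, poll_interval=0.05):
    """Stdlib-only stand-in for a file lock (the commit uses filelock instead)."""
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def fetch_once(cache_path, download):
    """Run download() only if cache_path is missing, holding the lock meanwhile."""
    with simple_file_lock(cache_path + ".lock"):
        if not os.path.exists(cache_path):
            with open(cache_path, "wb") as f:
                f.write(download())
    return cache_path


calls = []


def fake_download():
    calls.append(1)
    return b"weights"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.bin")
    fetch_once(target, fake_download)
    fetch_once(target, fake_download)  # second call finds the cached file
    with open(target, "rb") as f:
        content = f.read()
```

Unlike this sketch, a lock from the filelock library (`with FileLock(path):`) does not leave a stale lock behind if the process is killed, which is one reason to prefer it.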
Aymeric Augustin
b670c26684
Take advantage of the cache when running tests.
...
Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.
Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.
Fix #2222.
2019-12-21 08:43:19 +01:00
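The `gettempdir()` change in b670c26684 amounts to deriving the cache location from the platform's temporary directory. A minimal sketch, where the directory name is a hypothetical choice for illustration:

```python
import os
import tempfile

# "transformers_test_cache" is a hypothetical name chosen for this sketch.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "transformers_test_cache")
os.makedirs(CACHE_DIR, exist_ok=True)
```

`tempfile.gettempdir()` checks the TMPDIR, TEMP, and TMP environment variables in that order before falling back to platform defaults, and caches the result after the first call.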
Aymeric Augustin
b67fa1a8d2
Download models directly to cache_dir.
...
This allows moving the file instead of copying it, which is more
reliable. Also it avoids writing large amounts of data to /tmp,
which may not be large enough to accommodate it.
Refs #2222.
2019-12-21 08:43:19 +01:00
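The approach in b67fa1a8d2 can be sketched as writing to a temporary file inside the cache directory and renaming it into place. `store_in_cache` and its parameters are hypothetical; the library's real download function is not shown here.

```python
import os
import tempfile


def store_in_cache(cache_dir, filename, chunks):
    """Write chunks to a temp file inside cache_dir, then rename it into place."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, filename)
    # The temp file lives in cache_dir, so source and destination share a
    # filesystem and the last step is a rename rather than a copy.
    with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False) as tmp:
        for chunk in chunks:
            tmp.write(chunk)
        tmp_path = tmp.name
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as cache_dir:
    path = store_in_cache(cache_dir, "model.bin", [b"abc", b"def"])
    with open(path, "rb") as f:
        data = f.read()
    leftovers = os.listdir(cache_dir)
```

Because the rename happens only after the whole file is written, readers never see a partially downloaded file under the final name.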
Aymeric Augustin
286d5bb6b7
Use a random temp dir for writing pruned models in tests.
2019-12-21 08:43:19 +01:00
Aymeric Augustin
478e456e83
Use a random temp dir for writing files in tests.
2019-12-21 08:43:19 +01:00
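The random temp dir pattern from 286d5bb6b7 and 478e456e83 might look like the sketch below. `save_and_reload` and the file name are hypothetical; the actual tests write models, not raw bytes.

```python
import os
import tempfile


def save_and_reload(payload):
    # TemporaryDirectory picks a unique name and cleans up on exit, so
    # parallel test workers do not overwrite each other's files.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "output.bin")
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            return f.read(), tmp_dir


roundtrip, used_dir = save_and_reload(b"pruned")
```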
Aymeric Augustin
12726f8556
Remove redundant torch.jit.trace in tests.
...
This looks like it could be expensive, so don't run it twice.
2019-12-21 08:43:19 +01:00
Julien Chaumond
ac1b449cc9
[doc] move distilroberta to more appropriate place
...
cc @lysandrejik
2019-12-21 00:09:01 -05:00
Julien Chaumond
3e52915fa7
[RoBERTa] Embeddings: fix dimensionality bug
2019-12-20 19:01:27 -05:00
Dom Hudson
228f52867c
Bug fix: 1764
2019-12-20 18:27:35 -05:00
Francesco
a80778f40e
small refactoring (only esthetic, not functional)
2019-12-20 17:21:24 -05:00
Francesco
3df1d2d144
- Create the output directory (named by the user in the "save_directory" parameter), where the encoder and decoder will be saved, if it does not exist.
...
- Empty the output directory if it contains any files or subdirectories.
- Create the "encoder" directory inside "save_directory" if it does not exist.
- Create the "decoder" directory inside "save_directory" if it does not exist.
- Save the encoder and the decoder in these two directories, respectively.
2019-12-20 17:21:24 -05:00
Lysandre
a436574bfd
Release: v2.3.0
2019-12-20 16:22:20 -05:00
Thomas Wolf
d0f8b9a978
Merge pull request #2244 from huggingface/fix-tok-pipe
...
Fix Camembert and XLM-R `decode` method - Fix NER pipeline alignment
2019-12-20 22:10:39 +01:00
Thomas Wolf
a557836a70
Merge pull request #2191 from huggingface/fix_sp_np
...
Numpy compatibility for sentence piece
2019-12-20 22:08:08 +01:00