Rico Sennrich
75a69fc153
add some more umlauts to tests to check behavior in different locales
2020-02-21 17:39:42 +01:00
Rico Sennrich
5c7b56ea97
apply BPE dropout on list, not set of symbol pairs (in line with what Provilkov et al. did)
...
simplify and optimize apply_bpe code
2019-11-14 15:14:39 +01:00
Kweonwoo Jung
f7c03abf79
apply bpe-dropout in subword-nmt cli mode
2019-11-07 13:30:05 +09:00
Rico Sennrich
a40db4510c
documentation
2019-10-30 09:07:54 +01:00
Rico Sennrich
c4aa49a086
BPE dropout (Provilkov et al., 2019)
2019-10-30 08:59:25 +01:00
Rico Sennrich
18a5c87046
Merge pull request #70 from alvations/patch-4
...
Use a single regex match with optional operator
2019-01-14 16:13:57 +00:00
alvations
6728e93e3f
Cast filter generator to list for Python3
2019-01-14 23:12:35 +08:00
alvations
f4f430acaf
re.split can catch groups and save the delimiter
2019-01-14 23:05:08 +08:00
alvations
8a94d6e6bf
added missing parameter
2019-01-14 22:53:07 +08:00
alvations
ee99a507f3
Use a single regex match with optional operator
2019-01-14 15:42:59 +08:00
Rico Sennrich
955abfe7e5
enable encoding fix in subword-bpe
...
relevant code was not run because subword_bpe.py is never executed as a script.
2018-11-12 17:56:02 +00:00
Rico Sennrich
d21ced8f86
fix subword-bpe learn-bpe in Python 2
...
fixes regression from commit 06352. Error was:
AttributeError: 'Namespace' object has no attribute 'separator'
2018-09-17 11:57:06 +01:00
Joost Bastings
bdcf459c27
pass `total_symbols` to learn_bpe
...
pass `total_symbols` to learn_bpe when using the `subword-nmt learn-bpe` command
2018-08-22 22:09:08 +02:00
Rico Sennrich
73a6e55d5b
support argument --total-symbols in learn_joint_bpe_and_vocab
2018-08-20 12:07:45 +01:00
Jean A. Senellart
8450bd3231
condition parameter conversion to python 2
2018-07-18 07:36:11 +10:00
Jean Senellart
d92491ff12
Merge branch 'master' into fix_unicode_separator
2018-07-18 07:25:48 +10:00
Rico Sennrich
06352533dd
enable unicode separators in Python2
...
thanks @jsenellart
2018-07-17 16:40:51 +10:00
Jean A. Senellart
a36b489094
same for glossaries
2018-07-13 04:23:54 +09:00
Jean A. Senellart
9df8997c78
enable unicode separator
2018-07-12 11:52:30 +09:00
Proyag
ba1db43457
add unittest (and fix python3 integer division in unittest)
2018-07-09 11:12:25 +02:00
Proyag
c06e87d396
handle regex as glossaries
2018-07-09 11:12:17 +02:00
Rico Sennrich
48ba99e657
fix typo in previous commit
2018-06-28 11:48:40 +01:00
Rico Sennrich
61ad855cf0
new option --total-symbols in learn-bpe
...
redefines "--symbols" to be the number of merge operations,
minus the character vocabulary size, so that "--symbols" becomes
an estimate of the final symbol vocabulary size.
thx @phikoehn
2018-06-28 11:43:56 +01:00
Lenz
7e336e0e1f
new method segment_tokens that takes and returns a list
2018-06-05 23:13:51 +03:00
Lenz
d643c5ff9a
fix: spurious .format() operation
2018-06-05 23:06:43 +03:00
Rico Sennrich
8012fd6607
fix pip package with Python3
2018-05-21 10:53:59 +01:00
Rico Sennrich
f61c957926
more consistent command line names for get-vocab
2018-05-16 16:44:15 +01:00
Rico Sennrich
748377374e
recommend subword_nmt.py as alternative to pip install in README
2018-05-16 16:32:55 +01:00
Rico Sennrich
bbf885decb
help text for subword-nmt command (and remove little-used segment_char_ngrams from command)
2018-05-16 16:10:19 +01:00
Rico Sennrich
f678226440
bugfixes to packaging
2018-05-16 14:47:59 +01:00
Rico Sennrich
65db9c5407
create symlink in old script location (with deprecation warning)
2018-05-16 14:47:23 +01:00
Rico Sennrich
4a1d3a777b
modify files for packaging; thanks to universome
2018-05-16 14:35:23 +01:00
Rico Sennrich
2a4a44b5c0
move files to package structure; add setup.py
2018-05-16 11:44:24 +01:00