Commit graph

33 commits

Author SHA1 Message Date
Rico Sennrich 75a69fc153 add some more umlauts to tests to check behavior in different locales 2020-02-21 17:39:42 +01:00
Rico Sennrich 5c7b56ea97 apply BPE dropout on list, not set of symbol pairs (in line with what Provilkov et al. did)
simplify and optimize apply_bpe code
2019-11-14 15:14:39 +01:00
Kweonwoo Jung f7c03abf79 apply bpe-dropout in subword-nmt cli mode 2019-11-07 13:30:05 +09:00
Rico Sennrich a40db4510c documentation 2019-10-30 09:07:54 +01:00
Rico Sennrich c4aa49a086 BPE dropout (Provilkov et al., 2019) 2019-10-30 08:59:25 +01:00
Rico Sennrich 18a5c87046 Merge pull request #70 from alvations/patch-4
Use a single regex match with optional operator
2019-01-14 16:13:57 +00:00
alvations 6728e93e3f Cast filter generator to list for Python3 2019-01-14 23:12:35 +08:00
alvations f4f430acaf re.split can catch groups and save the delimiter 2019-01-14 23:05:08 +08:00
alvations 8a94d6e6bf added missing parameter 2019-01-14 22:53:07 +08:00
alvations ee99a507f3 Use a single regex match with optional operator 2019-01-14 15:42:59 +08:00
Rico Sennrich 955abfe7e5 enable encoding fix in subword-bpe
relevant code was not run because subword_bpe.py is never executed as a script.
2018-11-12 17:56:02 +00:00
Rico Sennrich d21ced8f86 fix subword-bpe learn-bpe in Python 2
fixes regression from commit 06352. Error was:
AttributeError: 'Namespace' object has no attribute 'separator'
2018-09-17 11:57:06 +01:00
Joost Bastings bdcf459c27 pass `total_symbols` to learn_bpe
pass `total_symbols` to learn_bpe when using the `subword-nmt learn-bpe` command
2018-08-22 22:09:08 +02:00
Rico Sennrich 73a6e55d5b suppert argument --total-symbols in learn_joint_bpe_and_vocab 2018-08-20 12:07:45 +01:00
Jean A. Senellart 8450bd3231 condition parameter conversion to python 2 2018-07-18 07:36:11 +10:00
Jean Senellart d92491ff12 Merge branch 'master' into fix_unicode_separator 2018-07-18 07:25:48 +10:00
Rico Sennrich 06352533dd enable unicode separators in Python2
thanks @jsenellart
2018-07-17 16:40:51 +10:00
Jean A. Senellart a36b489094 same for glossaries 2018-07-13 04:23:54 +09:00
Jean A. Senellart 9df8997c78 enable unicode separator 2018-07-12 11:52:30 +09:00
Proyag ba1db43457 add unittest (and fix python3 integer division in unittest) 2018-07-09 11:12:25 +02:00
Proyag c06e87d396 handle regex as glossaries 2018-07-09 11:12:17 +02:00
Rico Sennrich 48ba99e657 fix typo in previous commit 2018-06-28 11:48:40 +01:00
Rico Sennrich 61ad855cf0 new option --total-symbols in learn-bpe
redefines "--symbols" to be the number of merge operations,
minus the character vocabulary size, so that "--symbols" becomes
an estimate of the final symbol vocabulary size.

thx @phikoehn
2018-06-28 11:43:56 +01:00
Lenz 7e336e0e1f new method segment_tokens that takes and returns a list 2018-06-05 23:13:51 +03:00
Lenz d643c5ff9a fix: spurious .format() operation 2018-06-05 23:06:43 +03:00
Rico Sennrich 8012fd6607 fix pip package with Python3 2018-05-21 10:53:59 +01:00
Rico Sennrich f61c957926 more consistent command line names for get-vocab 2018-05-16 16:44:15 +01:00
Rico Sennrich 748377374e recommend subword_nmt.py as alternative to pip install in README 2018-05-16 16:32:55 +01:00
Rico Sennrich bbf885decb help text for subword-nmt command (and remove little-used segment_char_ngrams from command) 2018-05-16 16:10:19 +01:00
Rico Sennrich f678226440 bugfixes to packaging 2018-05-16 14:47:59 +01:00
Rico Sennrich 65db9c5407 create symlink in old script location (with deprecation warning) 2018-05-16 14:47:23 +01:00
Rico Sennrich 4a1d3a777b modify files for packaging; thanks to universome 2018-05-16 14:35:23 +01:00
Rico Sennrich 2a4a44b5c0 move files to package structure; add setup.py 2018-05-16 11:44:24 +01:00