Darío Hereñú 2018-06-10 22:08:22 -03:00 committed by GitHub
Parent 20e68fa7c4
Commit b915766b48
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 1 addition and 1 deletion

@@ -29,7 +29,7 @@ You can use custom normalization rule by preparing a TSV file formatted as follo
```
In this sample, UCS4 sequence [41 302 300] (hex) is converted into [1EA6] (hex). When there are ambiguities in the conversions, the longest rule is used.
Note that the tab is used as a delimiter for source and target sequence and space is used as a delimiter for UCS4 characters.
-See data/nfkc.tsv as an example. Once a TSV file is prepared, you can specify it with `--normalization_rule_tsv` flag.
+See [data/nfkc.tsv](data/nfkc.tsv) as an example. Once a TSV file is prepared, you can specify it with `--normalization_rule_tsv` flag.
```
% spm_train --normalization_rule_tsv=<rule tsv file> --input=<input> --model_prefix=<model file> --vocab_size=8000
```
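
The rule format described in this hunk (hex code points separated by spaces, with a tab between source and target) can be sketched in Python as below. This is only an illustration: `format_rule`, `write_rules`, and the output file name are hypothetical helpers, not part of the project, and the uppercase hex simply mirrors the sample; whether case matters is not stated here.

```python
from typing import Mapping


def format_rule(source: str, target: str) -> str:
    """Build one rule line: hex code points of source, a tab, hex code points of target."""

    def to_hex(text: str) -> str:
        # One hex code point per character, separated by single spaces.
        return " ".join(format(ord(ch), "X") for ch in text)

    return f"{to_hex(source)}\t{to_hex(target)}"


def write_rules(path: str, rules: Mapping[str, str]) -> None:
    """Write one rule line per (source, target) pair to a TSV file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for source, target in rules.items():
            f.write(format_rule(source, target) + "\n")


# The sample rule from the text: A + combining circumflex + combining grave -> U+1EA6.
sample_line = format_rule("A\u0302\u0300", "\u1EA6")
```

A file produced with `write_rules("my_rules.tsv", {...})` (the file name is an assumption) would then be passed to `--normalization_rule_tsv` as in the command above.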