Update api.md
Parent: 2784cafd8b
Commit: c18b5949cc
doc/api.md (12 lines changed)
@@ -54,6 +54,18 @@ processor.Decode(ids, &text);
std::cout << text << std::endl;
```

## Sampling (subword regularization)

Call the `SentencePieceProcessor::SampleEncode` method to sample one segmentation.

```C++
// Arguments: input, nbest_size, alpha, output.
std::vector<std::string> pieces;
processor.SampleEncode("This is a test.", -1, 0.2, &pieces);

std::vector<int> ids;
processor.SampleEncode("This is a test.", -1, 0.2, &ids);
```

`SampleEncode` has two sampling parameters, `nbest_size` and `alpha`, which correspond to `l` and `alpha` in the [original paper](https://arxiv.org/abs/1804.10959). When `nbest_size` is -1, one segmentation is sampled from all hypotheses using the forward-filtering and backward-sampling algorithm.
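
To make the role of `alpha` more concrete, the following is a minimal, self-contained sketch of the n-best sampling rule described in the paper, where candidate `i` is drawn with probability proportional to `P(x_i)^alpha`. It is a toy illustration and not SentencePiece's implementation: the function names `SmoothedDistribution` and `SampleCandidate` are hypothetical, and the candidate log-probabilities are assumed to be given.

```C++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Toy illustration only (hypothetical helper, not part of SentencePiece).
// Given the log-probabilities of the n-best segmentations, returns the
// smoothed distribution P(x_i)^alpha / sum_j P(x_j)^alpha.
std::vector<double> SmoothedDistribution(const std::vector<double>& log_probs,
                                         double alpha) {
  assert(!log_probs.empty());
  // Subtract the maximum before exponentiating for numerical stability.
  const double max_lp = *std::max_element(log_probs.begin(), log_probs.end());
  std::vector<double> probs;
  probs.reserve(log_probs.size());
  double total = 0.0;
  for (double lp : log_probs) {
    const double w = std::exp(alpha * (lp - max_lp));
    probs.push_back(w);
    total += w;
  }
  for (double& p : probs) p /= total;
  return probs;
}

// Draws the index of one candidate from the smoothed distribution.
std::size_t SampleCandidate(const std::vector<double>& log_probs, double alpha,
                            std::mt19937& rng) {
  const std::vector<double> probs = SmoothedDistribution(log_probs, alpha);
  std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
  return dist(rng);
}
```

With `alpha` set to 0 every candidate is equally likely, while larger values of `alpha` concentrate the probability on the best-scoring segmentation. Passing a seeded `std::mt19937` to `SampleCandidate` makes the draws reproducible.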
## SentencePieceText proto

Use the `SentencePieceText` proto to obtain the pieces and ids at the same time. This proto also records the UTF-8 byte offset of each piece in the user input or the detokenized text.
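
Because the offsets count bytes rather than characters, a multi-byte UTF-8 character advances them by more than one. The sketch below illustrates that idea without using the proto itself: `SurfaceSpan` and `ComputeByteSpans` are hypothetical names introduced here, and the half-open `[begin, end)` convention is an assumption of this sketch.

```C++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical struct for illustration; the real data is carried by the
// SentencePieceText proto.
struct SurfaceSpan {
  std::string surface;  // substring of the text covered by one piece
  std::size_t begin;    // byte offset where the surface starts
  std::size_t end;      // byte offset one past the last byte (assumed)
};

// Assigns byte offsets to consecutive surfaces that together spell `text`.
std::vector<SurfaceSpan> ComputeByteSpans(
    const std::string& text, const std::vector<std::string>& surfaces) {
  std::vector<SurfaceSpan> spans;
  std::size_t pos = 0;
  for (const std::string& s : surfaces) {
    // Each surface must match the text at the current byte position.
    assert(pos <= text.size() && text.compare(pos, s.size(), s) == 0);
    spans.push_back({s, pos, pos + s.size()});
    pos += s.size();  // std::string::size() counts bytes, not characters
  }
  assert(pos == text.size());
  return spans;
}
```

For a text that starts with a single three-byte character, the first span covers bytes 0 to 3, so the next surface begins at byte 3 even though it is only the second character.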