Added C++ tag for all code blocks

resec 2017-07-20 10:54:12 +08:00 committed by GitHub
Parent fe95e7c092
Commit 266f274da0
1 changed file: 7 additions and 7 deletions

@@ -4,7 +4,7 @@
To start working with the SentencePiece model, you will want to include the `sentencepiece_processor.h` header file.
Then instantiate the `sentencepiece::SentencePieceProcessor` class and call the `Load` or `LoadOrDie` method to load the model from a file path or a `std::istream`.
```
```C++
#include <sentencepiece_processor.h>
sentencepiece::SentencePieceProcessor processor;
@@ -17,7 +17,7 @@ processor.LoadOrDie("//path/to/model.model");
## Tokenize text (preprocessing)
Call the `SentencePieceProcessor::Encode` method to tokenize text.
```
```C++
std::vector<std::string> pieces;
processor.Encode("This is a test.", &pieces);
for (const std::string &token : pieces) {
@@ -27,7 +27,7 @@ for (const std::string &token : pieces) {
You will obtain the sequence of vocab ids as follows:
```
```C++
std::vector<int> ids;
processor.Encode("This is a test.", &ids);
for (const int id : ids) {
@@ -38,7 +38,7 @@ for (const int id : ids) {
## Detokenize text (postprocessing)
Call the `SentencePieceProcessor::Decode` method to detokenize a sequence of pieces or ids into text. In general, detokenization is guaranteed to be the inverse operation of `Encode`, i.e., `Decode(Encode(Normalize(input))) == Normalize(input)`.
```
```C++
std::vector<std::string> pieces = { "▁This", "▁is", "▁a", "▁", "te", "st", "." }; // sequence of pieces
std::string text;
processor.Decode(pieces, &text);
@@ -52,7 +52,7 @@ std::cout << text << std::endl;
## SentencePieceText proto
Use the `SentencePieceText` class to obtain the pieces and ids at the same time. This proto also records the UTF-8 byte offset of each piece within the user input or the detokenized text.
```
```C++
#include <sentencepiece.pb.h>
sentencepiece::SentencePieceText spt;
@@ -80,7 +80,7 @@ for (const auto &piece : spt.pieces()) {
## Vocabulary management
Use the following methods to convert between pieces and ids.
```
```C++
processor.GetPieceSize(); // returns the size of the vocabulary.
processor.PieceToId("foo"); // returns the vocab id of "foo"
processor.IdToPiece(10); // returns the string representation of id 10.
@@ -91,7 +91,7 @@ processor.IsControl(10); // returns true if the given id is a control token.
## Extra Options
Use the `SetEncodeExtraOptions` and `SetDecodeExtraOptions` methods to set extra options for encoding and decoding, respectively. These methods must be called right after `Load` or `LoadOrDie`.
```
```C++
processor.SetEncodeExtraOptions("bos:eos"); // add <s> and </s>.
processor.SetEncodeExtraOptions("reverse:bos:eos"); // reverse the input and then add <s> and </s>.