From f31bdaf120ad3c4b0b313daf4947e1fb724ca8b6 Mon Sep 17 00:00:00 2001
From: Frank Xu
Date: Sun, 28 May 2017 04:44:22 +0800
Subject: [PATCH 1/3] fixed a typo in the spm_train help text

---
 src/spm_train_main.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/spm_train_main.cc b/src/spm_train_main.cc
index 0e2696e..42471b4 100644
--- a/src/spm_train_main.cc
+++ b/src/spm_train_main.cc
@@ -66,7 +66,7 @@ DEFINE_string(normalization_rule_name, "nfkc",
               "Choose from nfkc or identity");
 DEFINE_string(normalization_rule_tsv, "", "Normalization rule TSV file. ");
 DEFINE_bool(add_dummy_prefix, kDefaultNormalizerSpec.add_dummy_prefix(),
-            "Add dummy whitespace at the begging of text");
+            "Add dummy whitespace at the beginning of text");
 DEFINE_bool(remove_extra_whitespaces,
             kDefaultNormalizerSpec.remove_extra_whitespaces(),
             "Removes leading, trailing, and "

From 186cb0e3a6974caf3627fe591d9dbaeaa7715cb3 Mon Sep 17 00:00:00 2001
From: Lauri Apple
Date: Sat, 3 Jun 2017 19:15:57 +0200
Subject: [PATCH 2/3] Update and rename CONTRIBUTING to CONTRIBUTING.md

Made some light edits and added the .md so the Markdown would work.
---
 CONTRIBUTING => CONTRIBUTING.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)
 rename CONTRIBUTING => CONTRIBUTING.md (67%)

diff --git a/CONTRIBUTING b/CONTRIBUTING.md
similarity index 67%
rename from CONTRIBUTING
rename to CONTRIBUTING.md
index 2827b7d..d0b993c 100644
--- a/CONTRIBUTING
+++ b/CONTRIBUTING.md
@@ -2,18 +2,17 @@ Want to contribute? Great! First, read this page (including the small print at t
 
 ### Before you contribute
 Before we can use your code, you must sign the
-[Google Individual Contributor License Agreement]
-(https://cla.developers.google.com/about/google-individual)
+[Google Individual Contributor License Agreement](https://cla.developers.google.com/about/google-individual)
 (CLA), which you can do online. The CLA is necessary mainly because you own the
-copyright to your changes, even after your contribution becomes part of our
+copyright to your changes even after your contribution becomes part of our
 codebase, so we need your permission to use and distribute your code. We also
-need to be sure of various other things—for instance that you'll tell us if you
+need to be sure of various other things—for instance, that you'll tell us if you
 know that your code infringes on other people's patents. You don't have to sign
 the CLA until after you've submitted your code for review and a member has
 approved it, but you must do it before we can put your code into our codebase.
 Before you start working on a larger contribution, you should get in touch with
 us first through the issue tracker with your idea so that we can help out and
-possibly guide you. Coordinating up front makes it much easier to avoid
+possibly guide you. Coordinating up-front makes it much easier to avoid
 frustration later on.
 
 ### Code reviews
@@ -22,6 +21,4 @@ use Github pull requests for this purpose.
 
 ### The small print
 Contributions made by corporations are covered by a different agreement than
-the one above, the
-[Software Grant and Corporate Contributor License Agreement]
-(https://cla.developers.google.com/about/google-corporate).
+the one above, the [Software Grant and Corporate Contributor License Agreement](https://cla.developers.google.com/about/google-corporate).

From 07c5c44ae6e2cbdf088ddff3e0680f6f3ec506a4 Mon Sep 17 00:00:00 2001
From: TSUCHIYA Masatoshi
Date: Wed, 14 Jun 2017 17:18:46 +0900
Subject: [PATCH 3/3] Do not ignore empty lines.

---
 src/spm_encode_main.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/spm_encode_main.cc b/src/spm_encode_main.cc
index 3d66070..d140e41 100644
--- a/src/spm_encode_main.cc
+++ b/src/spm_encode_main.cc
@@ -69,6 +69,7 @@ int main(int argc, char *argv[]) {
     sentencepiece::io::InputBuffer input(filename);
     while (input.ReadLine(&line)) {
       if (line.empty()) {
+        output.WriteLine("");
         continue;
       }
       process(line);
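Patch 3/3 makes spm_encode write an empty output line for every empty input line instead of silently skipping it, so output line N keeps corresponding to input line N. The loop can be sketched in standalone C++ as follows. This is a minimal illustration under stated assumptions, not the spm_encode source: `EncodeLine` and `EncodeStream` are hypothetical names, `EncodeLine` is a whitespace-splitting placeholder rather than the SentencePiece encoder, and `std::getline` on a standard stream stands in for `sentencepiece::io::InputBuffer::ReadLine`.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical placeholder for the real encoder (NOT the SentencePiece API):
// splits the line on whitespace and rejoins the tokens with single spaces.
std::string EncodeLine(const std::string& line) {
  std::istringstream words(line);
  std::string word, result;
  while (words >> word) {
    if (!result.empty()) result += ' ';
    result += word;
  }
  return result;
}

// Hypothetical driver: reads `in` line by line and writes exactly one output
// line per input line. Empty input lines produce empty output lines instead
// of being dropped, so line numbers stay aligned between input and output.
void EncodeStream(std::istream& in, std::ostream& out) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) {
      out << '\n';  // plays the role of output.WriteLine("") in the patch
      continue;
    }
    out << EncodeLine(line) << '\n';
  }
}
```

For the input `"a  b\n\nc\n"` this sketch writes `"a b\n\nc\n"`. Without the write in the empty-line branch, the blank line would vanish and `c` would move up to output line 2, which is the misalignment the patch avoids.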