From 06770a3a651c25ed94ebc86d4b552e16c6d64ff9 Mon Sep 17 00:00:00 2001
From: Taku Kudo
Date: Tue, 1 May 2018 10:40:55 +0900
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index d7414d8..050324e 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 SentencePiece is an unsupervised text tokenizer and detokenizer mainly for
 Neural Network-based text generation systems where the vocabulary size
 is predetermined prior to the neural model training. SentencePiece implements
 **subword units** (e.g., **byte-pair-encoding (BPE)** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and
-**unigram language model** [[Kudo.](http://acl2018.org/conference/accepted-papers/)])
+**unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)])
 with the extension of direct training from raw sentences. Subword segmentation with unigram language model supports probabilistic subword sampling for **subword regularization** [[Kudo.](http://acl2018.org/conference/accepted-papers/)], a simple technique to improve the robustness of NMT models. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postp\
 rocessing.
@@ -14,7 +14,7 @@ rocessing.
 **This is not an official Google product.**
 
 ## Technical highlights
-- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](http://acl2018.org/conference/accepted-papers/)] are supported.
+- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
 - **Subword regularization**: SentencePiece implements subword sampling for subword regularization, which helps to improve the robustness and accuracy of NMT models (available only for the unigram language model).
 - **Purely data driven**: SentencePiece trains tokenization and detokenization models from only raw sentences. No pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is required.
@@ -29,7 +29,7 @@ rocessing.
 |:---|:---:|:---:|:---:|
 |Supported algorithm|BPE, unigram, char, word|BPE|BPE*|
 |OSS?|Yes|Yes|Google internal|
-|[Subword regularization](http://acl2018.org/conference/accepted-papers/)|Yes (unigram only)|No|No|
+|[Subword regularization](https://arxiv.org/abs/1804.10959)|Yes (unigram only)|No|No|
 |Python Library (pip)|Yes|No|N/A|
 |C++ Library|Yes|No|N/A|
 |Pre-segmentation required?|No|Yes|Yes|
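
Note for readers of this patch: the subword regularization feature referenced in the updated README text is exposed through the Python library listed in the comparison table. Below is a minimal sketch using the `sentencepiece` pip package; the corpus path, model prefix, vocabulary size, sample text, and sampling hyperparameters are illustrative assumptions, not part of this patch.

    import sentencepiece as spm

    # Train a unigram LM model directly from raw sentences
    # (no pre-tokenization with Moses/MeCab/KyTea is needed).
    spm.SentencePieceTrainer.Train(
        '--input=corpus.txt --model_prefix=m --vocab_size=8000 --model_type=unigram')

    sp = spm.SentencePieceProcessor()
    sp.Load('m.model')

    # Deterministic (best) segmentation.
    print(sp.EncodeAsPieces('New York is large.'))

    # Probabilistic subword sampling used for subword regularization:
    # nbest_size=-1 samples from all segmentation hypotheses, and
    # alpha smooths the sampling distribution.
    for _ in range(3):
        print(sp.SampleEncodeAsPieces('New York is large.', -1, 0.1))

Each call to SampleEncodeAsPieces may return a different segmentation of the same sentence, which is what makes on-the-fly sampling during NMT training possible; sampling is available only for the unigram language model, as the table above notes.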