This commit is contained in:
Chris Basoglu 2017-02-02 18:23:41 -08:00
Родитель 6a603b0c79
Коммит ae372d7b09
4 изменённых файлов: 711 добавлений и 88 удалений

Различия файлов скрыты, потому что одна или несколько строк слишком длинны

Просмотреть файл

@ -1,5 +1,16 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from IPython.display import Image"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -11,20 +22,78 @@
"\n",
"This hands-on tutorial will take you through both the basics of sequence-to-sequence networks, and how to implement them in the Microsoft Cognitive Toolkit. In particular, we will implement a sequence-to-sequence model to perform grapheme to phoneme translation. We will start with some basic theory and then explain the data in more detail, and how you can download it.\n",
"\n",
"Andrej Karpathy has a [nice visualization](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) of the five paradigms of neural network architectures:\n",
"\n",
"<img src=http://cntk.ai/jup/paradigms.jpg width=750px>\n",
"\n",
"Andrej Karpathy has a [nice visualization](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) of the five paradigms of neural network architectures:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"http://cntk.ai/jup/paradigms.jpg\" width=\"750\"/>"
],
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Figure 1\n",
"Image(url=\"http://cntk.ai/jup/paradigms.jpg\", width=750)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, we are going to be talking about the fourth paradigm: many-to-many, also known as sequence-to-sequence networks. The input is a sequence with a dynamic length, and the output is also a sequence with some dynamic length. It is the logical extension of the many-to-one paradigm in that previously we were predicting some category (which could easily be one of `V` words where `V` is an entire vocabulary) and now we want to predict a whole sequence of those categories.\n",
"\n",
"The applications of sequence-to-sequence networks are nearly limitless. It is a natural fit for machine translation (e.g. English input sequences, French output sequences); automatic text summarization (e.g. full document input sequence, summary output sequence); word to pronunciation models (e.g. character [grapheme] input sequence, pronunciation [phoneme] output sequence); and even parse tree generation (e.g. regular text input, flat parse tree output).\n",
"\n",
"## Basic theory\n",
"\n",
"A sequence-to-sequence model consists of two main pieces: (1) an encoder; and (2) a decoder. Both the encoder and the decoder are recurrent neural network (RNN) layers that can be implemented using a vanilla RNN, an LSTM, or GRU cells (here we will use LSTM). In the basic sequence-to-sequence model, the encoder processes the input sequence into a fixed representation that is fed into the decoder as a context. The decoder then uses some mechanism (discussed below) to decode the processed information into an output sequence. The decoder is a language model that is augmented with some \"strong context\" by the encoder, and so each symbol that it generates is fed back into the decoder for additional context (like a traditional LM). For an English to German translation task, the most basic setup might look something like this:\n",
"\n",
"<img src=http://cntk.ai/jup/s2s.png width=700px>\n",
"\n",
"A sequence-to-sequence model consists of two main pieces: (1) an encoder; and (2) a decoder. Both the encoder and the decoder are recurrent neural network (RNN) layers that can be implemented using a vanilla RNN, an LSTM, or GRU cells (here we will use LSTM). In the basic sequence-to-sequence model, the encoder processes the input sequence into a fixed representation that is fed into the decoder as a context. The decoder then uses some mechanism (discussed below) to decode the processed information into an output sequence. The decoder is a language model that is augmented with some \"strong context\" by the encoder, and so each symbol that it generates is fed back into the decoder for additional context (like a traditional LM). For an English to German translation task, the most basic setup might look something like this:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"http://cntk.ai/jup/s2s.png\" width=\"700\"/>"
],
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Figure 2\n",
"Image(url=\"http://cntk.ai/jup/s2s.png\", width=700)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic sequence-to-sequence network passes the information from the encoder to the decoder by initializing the decoder RNN with the final hidden state of the encoder as its initial hidden state. The input is then a \"sequence start\" tag (`<s>` in the diagram above) which primes the decoder to start generating an output sequence. Then, whatever word (or note or image, etc.) it generates at that step is fed in as the input for the next step. The decoder keeps generating outputs until it hits the special \"end sequence\" tag (`</s>` above).\n",
"\n",
"A more complex and powerful version of the basic sequence-to-sequence network uses an attention model. While the above setup works well, it can start to break down when the input sequences get long. At each step, the hidden state `h` is getting updated with the most recent information, and therefore `h` might be getting \"diluted\" in information as it processes each token. Further, even with a relatively short sequence, the last token will always get the last say and therefore the thought vector will be somewhat biased/weighted towards that last word. To deal with this problem, we use an \"attention\" mechanism that allows the decoder to look not only at all of the hidden states from the input, but it also learns which hidden states, for each step in decoding, to put the most weight on. We will discuss an attention implementation in a later version of this tutorial."
@ -87,7 +156,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {
"collapsed": false
},
@ -137,11 +206,21 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reusing locally cached: ..\\Examples\\SequenceToSequence\\CMUDict\\Data\\tiny.ctf\n",
"Reusing locally cached: ..\\Examples\\SequenceToSequence\\CMUDict\\Data\\cmudict-0.7b.train-dev-20-21.ctf\n",
"Reusing locally cached: ..\\Examples\\SequenceToSequence\\CMUDict\\Data\\cmudict-0.7b.mapping\n"
]
}
],
"source": [
"import requests\n",
"\n",
@ -187,7 +266,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {
"collapsed": true
},
@ -207,7 +286,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {
"collapsed": true
},
@ -230,11 +309,24 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Vocabulary size is 69\n",
"First 15 letters are:\n",
"[\"'\", '</s>', '<s/>', '<s>', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']\n",
"\n",
"Print dictionary with the vocabulary mapping:\n",
"{0: \"'\", 1: '</s>', 2: '<s/>', 3: '<s>', 4: 'A', 5: 'B', 6: 'C', 7: 'D', 8: 'E', 9: 'F', 10: 'G', 11: 'H', 12: 'I', 13: 'J', 14: 'K', 15: 'L', 16: 'M', 17: 'N', 18: 'O', 19: 'P', 20: 'Q', 21: 'R', 22: 'S', 23: 'T', 24: 'U', 25: 'V', 26: 'W', 27: 'X', 28: 'Y', 29: 'Z', 30: '~AA', 31: '~AE', 32: '~AH', 33: '~AO', 34: '~AW', 35: '~AY', 36: '~B', 37: '~CH', 38: '~D', 39: '~DH', 40: '~EH', 41: '~ER', 42: '~EY', 43: '~F', 44: '~G', 45: '~HH', 46: '~IH', 47: '~IY', 48: '~JH', 49: '~K', 50: '~L', 51: '~M', 52: '~N', 53: '~NG', 54: '~OW', 55: '~OY', 56: '~P', 57: '~R', 58: '~S', 59: '~SH', 60: '~T', 61: '~TH', 62: '~UH', 63: '~UW', 64: '~V', 65: '~W', 66: '~Y', 67: '~Z', 68: '~ZH'}\n"
]
}
],
"source": [
"# Print vocab and the correspoding mapping to the phonemes\n",
"print(\"Vocabulary size is\", len(vocab))\n",
@ -254,7 +346,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"metadata": {
"collapsed": false
},
@ -289,7 +381,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"metadata": {
"collapsed": false
},
@ -325,7 +417,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"metadata": {
"collapsed": false
},
@ -363,7 +455,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"metadata": {
"collapsed": false
},
@ -404,7 +496,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"metadata": {
"collapsed": true
},
@ -454,7 +546,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"metadata": {
"collapsed": true
},
@ -500,7 +592,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"metadata": {
"collapsed": false
},
@ -522,7 +614,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 19,
"metadata": {
"collapsed": false
},
@ -549,7 +641,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 20,
"metadata": {
"collapsed": true
},
@ -599,7 +691,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 21,
"metadata": {
"collapsed": true
},
@ -624,7 +716,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 22,
"metadata": {
"collapsed": true
},
@ -713,11 +805,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['raw_labels', 'raw_input']\n"
]
}
],
"source": [
"model = create_model()\n",
"label_sequence = model.find_by_name('label_sequence')\n",
@ -739,7 +839,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 24,
"metadata": {
"collapsed": false
},
@ -767,7 +867,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 25,
"metadata": {
"collapsed": false
},
@ -794,11 +894,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Minibatch: 0, Train Loss: 4.234, Train Evaluation Criterion: 1.000\n"
]
}
],
"source": [
"training_progress_output_freq = 100\n",
"max_num_minibatch = 100 if isFast else 1000\n",
@ -827,7 +935,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 27,
"metadata": {
"collapsed": true
},
@ -846,7 +954,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 28,
"metadata": {
"collapsed": true
},
@ -881,7 +989,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 29,
"metadata": {
"collapsed": false
},
@ -984,11 +1092,30 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Minibatch: 0, Train Loss: 4.234, Train Evaluation Criterion: 1.000\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"['</s>', '</s>', '</s>', '</s>', 'C', '</s>']\n",
"--- EPOCH 0 DONE: loss = 3.817505, errs = 87.036621 ---\n"
]
}
],
"source": [
"# Given a vocab and tensor, print the output\n",
"def print_sequences(sequences, i2w):\n",
@ -1009,11 +1136,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"87.03662098404853\n"
]
}
],
"source": [
"# Print the training error\n",
"print(error)"
@ -1051,7 +1186,7 @@
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@ -1065,7 +1200,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.5"
"version": "3.5.2"
}
},
"nbformat": 4,

Просмотреть файл

@ -493,7 +493,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
"version": "3.5.2"
}
},
"nbformat": 4,

Различия файлов скрыты, потому что одна или несколько строк слишком длинны